






This document describes the D-Stream algorithm for clustering multi-dimensional data in the form of a stream, using density grids and strategies for managing the sparse grid space. The paper covers concepts such as density grids, neighboring grids, grid clusters, and the dynamic detection and removal of sporadic grids.







Existing data-stream clustering algorithms such as CluStream are based on k-means. These algorithms cannot find clusters of arbitrary shapes and cannot handle outliers. Further, they require knowledge of k and a user-specified time window. To address these issues, this paper proposes D-Stream, a framework for clustering stream data using a density-based approach. The algorithm uses an online component, which maps each input data record into a grid, and an offline component, which computes the grid density and clusters the grids based on the density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream. Exploiting the intricate relationships between the decay factor, data density and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped to by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.
H.2.8 [Database Management]: Database Applications— data mining
Algorithms, Experimentation, Performance, Theory
Stream data mining, density-based clustering, D-Stream, sporadic grids
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD’07, August 12–15, 2007, San Jose, California, USA. Copyright 2007 ACM 978-1-59593-609-7/07/0008 ...$5.00.
Clustering high-dimensional stream data in real time is a difficult and important problem with ample applications such as network intrusion detection, weather monitoring, emergency response systems, stock trading, electronic business, telecommunication, planetary remote sensing, and web site analysis. In these applications, large volumes of multi-dimensional data streams arrive at the data collection center in real time. Examples such as the transactions in a supermarket and the phone records of a mobile phone company illustrate that the raw data typically have massive volume and can only be scanned once following the temporal order [7, 8]. Recently, there has been active research on how to store, query and analyze data streams.
Clustering is a key data mining task. In this paper, we consider clustering multi-dimensional data in the form of a stream, i.e. a sequence of data records stamped and ordered by time. Stream data clustering analysis causes unprecedented difficulty for traditional clustering algorithms. There are several key challenges. First, the data can only be examined in one pass. Second, viewing a data stream as a long vector of data is not adequate in many applications. In fact, in many applications of data stream clustering, users are more interested in the evolving behaviors of clusters.
Recently, there have been different views and approaches to stream data clustering. Earlier clustering algorithms for data streams use a single-phase model which treats data stream clustering as a continuous version of static data clustering [9]. These algorithms use divide-and-conquer schemes that partition data streams into segments and discover clusters in data streams based on a k-means algorithm in finite space [10, 12]. A limitation of such schemes is that they put equal weights on outdated and recent data and cannot capture the evolving characteristics of stream data.
Moving-window techniques are proposed to partially address this problem [2, 4].
Another recent data stream clustering paradigm proposed by Aggarwal et al. uses a two-phase scheme [1] which consists of an online component that processes the raw data stream and produces summary statistics and an offline component that uses the summary data to generate clusters. Strategies for dividing the time horizon and managing the statistics are studied. The design leads to the CluStream system [1]. Many recent data stream clustering algorithms are based on CluStream's two-phase framework. Wang et al. proposed an improved offline component using an incomplete partitioning strategy [17]. Extensions of this work include clustering multiple data streams [6], parallel data streams [5], and distributed data streams [3], and applications of data stream mining [11, 16, 13].
A number of limitations of CluStream and other related work lie in the k-means algorithm used in their offline component. First, a fundamental drawback of k-means is that it aims at identifying spherical clusters but is incapable of revealing clusters of arbitrary shapes. However, nonconvex and interwoven clusters are seen in many applications. Second, the k-means algorithm is unable to detect noise and outliers. Third, the k-means algorithm requires multiple scans of the data, making it not directly applicable to large-volume data streams. For this reason, the CluStream architecture uses online processing that compresses the raw data stream into micro-clusters, which are used as the basic elements in the offline phase.
Density-based clustering has long been proposed as another major clustering algorithm [14, 15]. We find the density-based method a natural and attractive basic clustering algorithm for data streams, because it can find arbitrarily shaped clusters, it can handle noise, and it is a one-scan algorithm that needs to examine the raw data only once. Further, it does not demand prior knowledge of the number of clusters k as the k-means algorithm does.
In this paper, we propose D-Stream, a density-based clustering framework for data streams. It is not a simple switch-over to use density-based instead of k-means algorithms for data streams. There are two main technical challenges. First, it is not desirable to treat the data stream as a long sequence of static data since we are interested in the evolving temporal feature of the data stream. To capture the dynamic changing of clusters, we propose an innovative scheme that associates a decay factor to the density of each data point.
Unlike the CluStream architecture, which asks the users to input the target time duration for clustering, the decay factor provides a novel mechanism for the system to dynamically and automatically form the clusters by placing more weight on the most recent data without totally discarding the historical information. In addition, D-Stream does not require the user to specify the number of clusters k. Thus, D-Stream is particularly suitable for users with little domain knowledge of the application data.
Second, due to the large volume of stream data, it is impossible to retain the density information for every data record. Therefore, we propose to partition the data space into discretized fine grids and map new data records into the corresponding grid. Thus, we do not need to retain the raw data and only need to operate on the grids. However, for high-dimensional data, the number of grids can be large. Therefore, how to handle high dimensionality and improve scalability is a critical issue. Fortunately, in practice, most grids are empty or contain only a few records, and a memory-efficient technique for managing such a sparse grid space is developed in D-Stream.
By addressing the above issues, we propose D-Stream, a density-based stream data clustering framework. We study in depth the relationship between time horizon, decay factor, and data density to ensure the generation of high quality clusters, and develop novel strategies for controlling the decay factor and detecting outliers. D-Stream automatically and dynamically adjusts the clusters without requiring user specification of the target time horizon and number of clusters. The experimental results show that D-Stream can find clusters of arbitrary shapes. Compared to CluStream, D-Stream is better in terms of both clustering quality and efficiency, and it exhibits high scalability for large-scale and high-dimensional stream data.
The rest of the paper is organized as follows. In Section 2, we give an overview of the architecture of D-Stream. In Section 3, we present the concept and theory of the proposed density grid and decay factor. In Section 4, we give the algorithmic details and theoretical analysis for D-Stream. We conduct an experimental study of D-Stream and compare D-Stream to CluStream on real-world and synthetic data sets in Section 5 and conclude the paper in Section 6.
Figure 1: The overall process of D-Stream.
2. OVERALL ALGORITHM OF D-STREAM
We give an overview of the architecture of D-Stream, which assumes a discrete time step model, where the time stamp is labelled by integers 0, 1, 2, ···, n, ···. Like CluStream [1], D-Stream has an online component and an offline component. The overall algorithm is outlined in Figure 1.
For a data stream, at each time step, the online component of D-Stream continuously reads a new data record, places the multi-dimensional data into a corresponding discretized density grid in the multi-dimensional space, and updates the characteristic vector of the density grid (Lines 5-8 of Figure 1). The density grid and characteristic vector are described in detail in Section 3. The offline component dynamically adjusts the clusters every gap time steps, where gap is an integer parameter. After the first gap, the algorithm generates the initial cluster (Lines 9-11). Then, the algorithm periodically removes sporadic grids and regulates the clusters (Lines 12-15).
3. DENSITY GRIDS
In this section, we introduce the concept of density grid and other associated definitions, which form the basis for the D-Stream algorithm.
Since it is impossible to retain the raw data, D-Stream partitions the multi-dimensional data space into many density grids and forms clusters of these grids. This concept is schematically illustrated in Figure 2.
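The online/offline loop described in Section 2 can be sketched roughly as follows. This is only an illustrative skeleton, not the procedure of Figure 1 itself: the function name `d_stream_loop`, the hook functions passed in, the dictionary layout of the grid list, and the update rule (decay the stored density, then add 1 for the new record) are all assumptions made for the example.

```python
def d_stream_loop(stream, gap, decay, grid_of,
                  initial_clustering, adjust_clustering, remove_sporadic):
    """Skeleton of the online/offline loop; every callable is a hypothetical hook."""
    grid_list = {}  # grid key -> (last update time, density at that time)
    for t, record in enumerate(stream):
        # Online component: map the record to its grid and update the grid.
        g = grid_of(record)
        t_g, density = grid_list.get(g, (t, 0.0))
        # Assumed update rule: decay the old density, then count the new record as 1.
        grid_list[g] = (t, density * decay ** (t - t_g) + 1.0)
        # Offline component: runs every `gap` time steps.
        if t == gap:
            initial_clustering(grid_list, t)
        elif t > gap and t % gap == 0:
            remove_sporadic(grid_list, t)
            adjust_clustering(grid_list, t)
    return grid_list
```

Keeping only non-empty grids in a dictionary mirrors the idea of never materializing the full grid space; the three hooks stand in for the procedures detailed in Section 4.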
3.1 Basic definitions
In this paper, we assume that the input data has d dimensions, and each input data record is defined within the
Proposition 3.2 shows that the sum of the density of all data records in the system will never exceed $\frac{1}{1-\lambda}$. Since there are $N = \prod_{i=1}^{d} p_i$ grids, the average density of each grid is no more than but approaching $\frac{1}{N(1-\lambda)}$. This observation motivates the following definitions.
At time $t$, for a grid $g$, we call it a dense grid if
$$D(g,t) \ge \frac{C_m}{N(1-\lambda)} = D_m, \tag{8}$$
where $C_m > 1$ is a parameter controlling the threshold. For example, we set $C_m = 3$. We require $N > C_m$ since $D(g,t)$ cannot exceed $\frac{1}{1-\lambda}$.
At time $t$, for a grid $g$, we call it a sparse grid if
$$D(g,t) \le \frac{C_l}{N(1-\lambda)} = D_l, \tag{9}$$
where $0 < C_l < 1$. For example, we set $C_l = 0.8$.
At time $t$, for a grid $g$, we call it a transitional grid if
$$\frac{C_l}{N(1-\lambda)} \le D(g,t) \le \frac{C_m}{N(1-\lambda)}.$$
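As a small illustration of the three definitions above, the following sketch computes the thresholds $D_l$ and $D_m$ and labels a grid by its density. The function names are hypothetical. Because the inequalities as stated overlap at the boundaries, the sketch makes an arbitrary choice there: a density equal to $D_m$ is labelled dense and a density equal to $D_l$ is labelled sparse.

```python
def density_thresholds(n_grids, decay, c_m=3.0, c_l=0.8):
    """Return (D_l, D_m) following equations (8) and (9)."""
    base = 1.0 / (n_grids * (1.0 - decay))
    return c_l * base, c_m * base


def classify_grid(density, n_grids, decay, c_m=3.0, c_l=0.8):
    """Label a grid as 'dense', 'sparse' or 'transitional' by its density."""
    d_l, d_m = density_thresholds(n_grids, decay, c_m, c_l)
    if density >= d_m:
        return "dense"
    if density <= d_l:  # boundary case resolved in favour of 'sparse' (assumption)
        return "sparse"
    return "transitional"
```

With N = 400 grids and λ = 0.998, the thresholds come out at about 1.0 and 3.75, so a grid needs to hold the equivalent of several recent records before it counts as dense.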
In the multi-dimensional space, we consider connecting neighboring grids, defined below, in order to form clusters.
Definition 3.3. (Neighboring Grids) Consider two density grids $g_1 = (j_1^1, j_2^1, \cdots, j_d^1)$ and $g_2 = (j_1^2, j_2^2, \cdots, j_d^2)$, if there exists $k$, $1 \le k \le d$, such that:
Definition 3.4. (Grid Group) A set of density grids $G = (g_1, \cdots, g_m)$ is a grid group if for any two grids $g_i, g_j \in G$, there exists a sequence of grids $g_{k_1}, \cdots, g_{k_l}$ such that $g_{k_1} = g_i$, $g_{k_l} = g_j$, and $g_{k_1} \sim g_{k_2}$, $g_{k_2} \sim g_{k_3}$, $\cdots$, and $g_{k_{l-1}} \sim g_{k_l}$.
Definition 3.5. (Inside and Outside Grids) Consider a grid group $G$ and a grid $g \in G$, suppose $g = (j_1, \cdots, j_d)$, if $g$ has neighboring grids in every dimension $i = 1, \cdots, d$, then $g$ is an inside grid in $G$. Otherwise $g$ is an outside grid in $G$.
Now we are ready to define how to form clusters based on the density of grids.
Definition 3.6. (Grid Cluster) Let $G = (g_1, \cdots, g_m)$ be a grid group, if every inside grid of $G$ is a dense grid and every outside grid is either a dense grid or a transitional grid, then $G$ is a grid cluster.
Intuitively, a grid cluster is a connected grid group which has higher density than the surrounding grids. Note that we always try to merge clusters whenever possible, so the resulting clusters are surrounded by sparse grids.
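The definitions of grid group, inside grid, and grid cluster can be checked mechanically. The sketch below is illustrative only. Since the condition in Definition 3.3 is not reproduced in this text, the code assumes that two grids are neighbors when their index vectors differ by exactly 1 in one dimension and agree in all others; all function names are hypothetical.

```python
from collections import deque


def are_neighbors(g1, g2):
    """Assumed rule: indices differ by exactly 1 in one dimension, equal elsewhere."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) == 1


def is_grid_group(grids):
    """True if any two grids are linked by a chain of neighboring grids."""
    grids = set(grids)
    if not grids:
        return False
    start = next(iter(grids))
    seen, queue = {start}, deque([start])
    while queue:  # breadth-first search over the neighbor relation
        g = queue.popleft()
        for h in grids - seen:
            if are_neighbors(g, h):
                seen.add(h)
                queue.append(h)
    return seen == grids


def is_inside(g, grids):
    """True if g has a neighboring grid in the group along every dimension."""
    grids = set(grids)
    return all(
        g[:k] + (g[k] - 1,) + g[k + 1:] in grids
        or g[:k] + (g[k] + 1,) + g[k + 1:] in grids
        for k in range(len(g))
    )


def is_grid_cluster(grids, label):
    """label maps each grid to 'dense', 'transitional' or 'sparse'."""
    if not is_grid_group(grids):
        return False
    for g in grids:
        allowed = ("dense",) if is_inside(g, grids) else ("dense", "transitional")
        if label[g] not in allowed:
            return False
    return True
```

For instance, a straight line of three grids in 2-D has no inside grids, so it still forms a grid cluster when its middle grid is only transitional.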
4. COMPONENTS OF D-STREAM
We now describe in detail the key components of D-Stream outlined in Figure 1. As we have discussed in the last section, for each new data record x, we map it to a grid g and use (5) to update the density of g (Lines 5-8 of Figure 1). We then periodically (every gap time steps) form clusters and remove sporadic grids. In the following, we describe our strategies for determining gap, managing the list of active grids, and generating clusters.
4.1 Grid inspection and time interval gap
To mine the dynamic characteristics of data streams, our density grid scheme developed in Section 3 gradually reduces the density of each data record and grid. A dense grid may degenerate to a transitional or sparse grid if it receives no new data for a long time. On the other hand, a sparse grid can be upgraded to a transitional or dense grid after it receives some new data records. Therefore, after a period of time, the density of each grid should be inspected and the clusters adjusted.
A key decision is the length of the time interval for grid inspection. The value of the time interval gap can be neither too large nor too small. If gap is too large, dynamic changes of data streams will not be adequately recognized. If gap is too small, it will result in frequent computation by the offline component and increase the workload. When such computation load is too heavy, the processing speed of the offline component may not match the speed of the input data stream.
We propose the following strategy to determine a suitable value of gap. We consider the minimum time needed for a dense grid to degenerate to a sparse grid as well as the minimum time needed for a sparse grid to become a dense grid. Then we set gap to be the minimum of these two minimum times in order to ensure that the inspection is frequent enough to detect the density changes of any grid.
Proposition 4.1. For any dense grid $g$, the minimum time needed for $g$ to become a sparse grid from being a dense grid is
$$\delta_0 = \log_\lambda\left(\frac{C_l}{C_m}\right).$$
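Proof. According to (8), if at time $t$, a grid $g$ is a dense grid, then we have:
$$D(g,t) \ge D_m = \frac{C_m}{N(1-\lambda)}. \tag{12}$$
Suppose after $\delta t$ time, $g$ becomes a sparse grid, then we have:
$$D(g,t+\delta t) \le D_l = \frac{C_l}{N(1-\lambda)}. \tag{13}$$
On the other hand, let $E(g,t)$ be the set of data records in $g$ at time $t$, we have $E(g,t) \subseteq E(g,t+\delta t)$ and:
$$D(g,t+\delta t) = \sum_{x\in E(g,t+\delta t)} D(x,t+\delta t) \ge \sum_{x\in E(g,t)} D(x,t+\delta t) = \sum_{x\in E(g,t)} \lambda^{\delta t} D(x,t) = \lambda^{\delta t} D(g,t). \tag{14}$$
Combining (13) and (14) we get:
$$\lambda^{\delta t} D(g,t) \le D(g,t+\delta t) \le \frac{C_l}{N(1-\lambda)}. \tag{15}$$
Combining (12) and (15) we get:
$$\lambda^{\delta t}\,\frac{C_m}{N(1-\lambda)} \le \lambda^{\delta t} D(g,t) \le \frac{C_l}{N(1-\lambda)},$$
which yields:
$$\delta t \ge \log_\lambda\left(\frac{C_l}{C_m}\right).$$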
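Proposition 4.2. For any sparse grid $g$, the minimum time needed for $g$ to become a dense grid from being a sparse grid is
$$\delta_1 = \log_\lambda\left(\frac{N - C_m}{N - C_l}\right).$$
Proof. According to (9), if at time $t$, a grid $g$ is a sparse grid, then we have:
$$D(g,t) \le D_l = \frac{C_l}{N(1-\lambda)}. \tag{19}$$
Suppose after $\delta t$ time, $g$ becomes a dense grid, then we have:
$$D(g,t+\delta t) \ge D_m = \frac{C_m}{N(1-\lambda)}. \tag{20}$$
We also know that:
$$D(g,t+\delta t) = \sum_{x\in E(g,t+\delta t)} D(x,t+\delta t). \tag{21}$$
$E(g,t+\delta t)$ can be divided into those points in $E(g,t)$ and those that come after $t$. The least time for a sparse grid $g$ to become dense is achieved when all the new data records are mapped to $g$. In this case, there is a new data record mapped to $g$ for each of the time steps from $t+1$ until $t+\delta t$. The sum of the density of all these new records at time $t+\delta t$ is $\sum_{i=0}^{\delta t-1}\lambda^i$. Therefore we have:
$$\begin{aligned}
D(g,t+\delta t) &\le \sum_{x\in E(g,t)} D(x,t+\delta t) + \sum_{i=0}^{\delta t-1}\lambda^i \\
&= \sum_{x\in E(g,t)} \lambda^{\delta t} D(x,t) + \frac{1-\lambda^{\delta t}}{1-\lambda} \\
&= \lambda^{\delta t} D(g,t) + \frac{1-\lambda^{\delta t}}{1-\lambda}.
\end{aligned} \tag{22}$$
Now we plug (20) and (19) into (22) to obtain:
$$\frac{C_m}{N(1-\lambda)} \le D(g,t+\delta t) \le \lambda^{\delta t} D(g,t) + \frac{1-\lambda^{\delta t}}{1-\lambda} \le \frac{\lambda^{\delta t} C_l}{N(1-\lambda)} + \frac{1-\lambda^{\delta t}}{1-\lambda}. \tag{23}$$
Solving (23) yields:
$$\lambda^{\delta t} \le \frac{N - C_m}{N - C_l},$$
which results in:
$$\delta t \ge \log_\lambda\left(\frac{N - C_m}{N - C_l}\right).$$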
Note $N - C_m > 0$ since $C_m < N$ according to (8).
Based on the two propositions above, we choose gap to be small enough so that any change of a grid from dense to sparse or from sparse to dense can be recognized. Thus, in D-Stream we set:
$$\mathit{gap} = \min\{\delta_0, \delta_1\} = \min\left\{\log_\lambda\frac{C_l}{C_m},\ \log_\lambda\frac{N - C_m}{N - C_l}\right\} = \log_\lambda\left(\max\left\{\frac{C_l}{C_m},\ \frac{N - C_m}{N - C_l}\right\}\right).$$
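To get a feel for the magnitudes involved, the two bounds and the resulting gap can be evaluated numerically. The sketch below uses the parameter values reported in Section 5 ($C_m = 3$, $C_l = 0.8$, $\lambda = 0.998$) and, as an assumption for the example, N = 400 grids (two dimensions with 20 segments each). The function names are hypothetical, and the values are left unrounded even though gap is used as an integer number of time steps.

```python
import math


def log_base(base, x):
    """Logarithm of x in the given base."""
    return math.log(x) / math.log(base)


def inspection_gap(n_grids, decay, c_m=3.0, c_l=0.8):
    """Return (delta_0, delta_1, gap) from Propositions 4.1 and 4.2."""
    delta0 = log_base(decay, c_l / c_m)                         # dense -> sparse
    delta1 = log_base(decay, (n_grids - c_m) / (n_grids - c_l))  # sparse -> dense
    return delta0, delta1, min(delta0, delta1)


if __name__ == "__main__":
    d0, d1, gap = inspection_gap(n_grids=400, decay=0.998)
    print(f"delta_0 = {d0:.2f}, delta_1 = {d1:.2f}, gap = {gap:.2f}")
```

Under these assumed values, $\delta_0$ is a little above 660 time steps while $\delta_1$ is below 3, so the sparse-to-dense bound is the one that determines gap.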
4.2 Detecting and removing sporadic grids
A serious challenge for the density grid scheme is the large number of grids, especially for high-dimensional data. For example, if each dimension is divided into 20 regions, there will be $20^d$ possible grids.
A key observation is that most of the grids in the space are empty or receive data very infrequently. In our implementation, we allocate memory to store the characteristic vectors only for those grids that are not empty, which form a very small subset of the grid space. Unfortunately, in practice, this is still not efficient enough because outlier data caused by errors lead to a continual increase of non-empty grids that will be processed during clustering. We call such grids sporadic grids since they contain very few data. Since a data stream flows in at massive volume and high speed and could run for a very long time, sporadic grids accumulate and their number can become exceedingly large, causing the system to operate more and more slowly. Therefore, it is imperative to detect and remove such sporadic grids periodically. This is done in Line 13 of the D-Stream algorithm in Figure 1.
Sparse grids with $D \le D_l$ are candidates for sporadic grids. However, there are two reasons for the density of a grid to be less than $D_l$. The first cause is that it has received very few data, while the second cause is that the grid has previously received many data but the density is reduced by the effect of the decay factor. Only the grids in the former case are true sporadic grids that we aim to remove. The sparse grids in the latter case should not be removed since they contain many data records and are often upgraded to transitional or dense grids. We have found through extensive experimentation that wrongly removing these grids in the latter case can significantly deteriorate the clustering quality.
We define a density threshold function to differentiate these two classes of sparse grids.
Definition 4.1. (Density Threshold Function) Suppose the last update time of a grid $g$ is $t_g$, then at time $t$ ($t > t_g$), the density threshold function is
$$\pi(t_g, t) = \frac{C_l}{N}\sum_{i=0}^{t-t_g}\lambda^i = \frac{C_l\,(1-\lambda^{t-t_g+1})}{N(1-\lambda)}.$$
Proposition 4.3. The function $\pi(t_g, t)$ has the following properties.
(1) If $t_1 \le t_2 \le t_3$, then
$$\lambda^{t_3-t_2}\,\pi(t_1,t_2) + \pi(t_2+1,t_3) = \pi(t_1,t_3).$$
(2) If $t_1 \le t_2$, then $\pi(t_1,t) \ge \pi(t_2,t)$ for any $t > t_1, t_2$.
Proof. (1) We see that:
$$\begin{aligned}
\lambda^{t_3-t_2}\,\pi(t_1,t_2) + \pi(t_2+1,t_3)
&= \frac{C_l}{N}\sum_{i=0}^{t_2-t_1}\lambda^{t_3-t_2+i} + \frac{C_l}{N}\sum_{i=0}^{t_3-t_2-1}\lambda^i \\
&= \frac{C_l}{N}\sum_{i=t_3-t_2}^{t_3-t_1}\lambda^i + \frac{C_l}{N}\sum_{i=0}^{t_3-t_2-1}\lambda^i \\
&= \frac{C_l}{N}\sum_{i=0}^{t_3-t_1}\lambda^i = \pi(t_1,t_3).
\end{aligned}$$
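The density threshold function and the two properties of Proposition 4.3 are easy to check numerically. The following is a minimal sketch with hypothetical function names; it evaluates $\pi$ both through the closed form and through the explicit sum of Definition 4.1.

```python
def density_threshold(t_g, t, n_grids, decay, c_l=0.8):
    """pi(t_g, t) via the closed form of Definition 4.1."""
    return c_l * (1.0 - decay ** (t - t_g + 1)) / (n_grids * (1.0 - decay))


def density_threshold_sum(t_g, t, n_grids, decay, c_l=0.8):
    """pi(t_g, t) via the explicit geometric sum, for comparison."""
    return c_l / n_grids * sum(decay ** i for i in range(t - t_g + 1))


if __name__ == "__main__":
    n, lam = 400, 0.998
    t1, t2, t3 = 5, 12, 20
    # Property (1): decayed threshold over [t1, t2] plus threshold over [t2 + 1, t3].
    lhs = lam ** (t3 - t2) * density_threshold(t1, t2, n, lam) \
        + density_threshold(t2 + 1, t3, n, lam)
    print(lhs, density_threshold(t1, t3, n, lam))
    # Property (2): an earlier last-update time gives a larger threshold.
    print(density_threshold(t1, t3, n, lam) >= density_threshold(t2, t3, n, lam))
```

Property (2) reflects the intuition that a grid which has existed longer is expected to have accumulated more density before it is considered non-sporadic.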
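Proof. We prove (a). (b) can be proved similarly. Suppose the grid $g$ has been previously deleted for the periods of $(0, t_1), (t_1+1, t_2), \cdots, (t_{m-1}+1, t_m)$, then:
$$D^a(g, t+\mathit{gap}) = \sum_{i=1}^{m} D(g,t_i)\,\lambda^{t-t_i+\mathit{gap}} + D(g, t+\mathit{gap}). \tag{31}$$
Since we assume that $g$ receives no data from $t+1$ to $t+\mathit{gap}$,
$$\begin{aligned}
D^a(g, t+\mathit{gap}) &= \sum_{i=1}^{m} D(g,t_i)\,\lambda^{t-t_i+\mathit{gap}} + D(g,t)\,\lambda^{\mathit{gap}} \\
&\le \sum_{i=1}^{m} \pi(t_{i-1}+1, t_i)\,\lambda^{t-t_i+\mathit{gap}} + D_l\,\lambda^{\mathit{gap}} \\
&= \pi(0,t_m)\,\lambda^{t-t_m}\lambda^{\mathit{gap}} + D_l\,\lambda^{\mathit{gap}} \\
&< \pi(0,t_m)\,\lambda^{\beta t/(1+\beta)}\lambda^{\mathit{gap}} + D_l\,\lambda^{\mathit{gap}} \quad \text{(according to (S2))}.
\end{aligned}$$
In order to ensure $D^a(g, t+\mathit{gap}) < D_l$, we require:
$$\pi(0,t_m)\,\lambda^{\beta t/(1+\beta)}\lambda^{\mathit{gap}} + D_l\,\lambda^{\mathit{gap}} < D_l \;\Rightarrow\; \lambda^{\beta t/(1+\beta)} < \frac{(1-\lambda^{\mathit{gap}})\,D_l}{\lambda^{\mathit{gap}}\,\pi(0,t_m)} = \frac{1-\lambda^{\mathit{gap}}}{\lambda^{\mathit{gap}}\,(1-\lambda^{t_m+1})}.$$
Thus, (a) is true for $t_0$ satisfying:
$$t_0 > \frac{1+\beta}{\beta}\,\log_\lambda\left(\frac{1-\lambda^{\mathit{gap}}}{\lambda^{\mathit{gap}}\,(1-\lambda^{t_m+1})}\right).$$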
Proposition 4.5 is a key result showing that (S1), (S2), (D1) and (D2) work together correctly. It implies that, as time extends for long enough, we will never delete a potential transitional or dense grid due to the previous removals of data. If a grid is sparse (resp. not dense), then when it is deleted, it must be sparse (resp. not dense) even considering those deleted data. Note that Da(g, t + gap) is the density of the grid upon deletion assuming no previous deletion has ever occurred. The result shows that, after an initial phase, deleting sporadic grids does not affect the clustering results.
4.3 Clustering algorithms
We describe the algorithms for generating the initial cluster and for adjusting the clusters every gap steps. The procedure initial clustering (used in Line 10 of Figure 1) is illustrated in Figure 3. The procedure adjust clustering (used in Line 14 of Figure 1) is illustrated in Figure 4. They first update the density of all active grids to the current time. Once the density of the grids is determined at the given time, the clustering procedure is similar to the standard method used by density-based clustering.
It should be noted that, during the computation, whenever we update grids or find neighboring grids, we only consider those grids that are maintained in grid list. Therefore, although the number of possible grids is huge for high-dimensional data, most empty or infrequent grids are discarded, which saves computing time and makes our algorithm very fast without deteriorating clustering quality.
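The first step of both procedures, bringing the density of all active grids up to the current time, might look like the sketch below. It is not the code of Figures 3 and 4. It assumes, hypothetically, that grid list is a dictionary mapping a grid key to a pair (last update time, density at that time), and it relies on the fact used in the proofs above that a grid receiving no new data for $\delta t$ steps has its density multiplied by $\lambda^{\delta t}$.

```python
def current_densities(grid_list, now, decay):
    """Return {grid: density decayed to time `now`} for every active grid.

    grid_list maps a grid key to (last update time, density at that time);
    this layout is an assumption made for the example.
    """
    return {
        g: density * decay ** (now - t_g)
        for g, (t_g, density) in grid_list.items()
    }


if __name__ == "__main__":
    grid_list = {(1, 4): (0, 4.0), (2, 4): (2, 1.0)}
    # Only grids stored in grid_list are touched; empty grids cost nothing.
    print(current_densities(grid_list, now=2, decay=0.5))
```

Storing the last update time next to the density lets the decay be applied lazily, so untouched grids need no work at each time step.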
Figure 3: The procedure for initial clustering.
Figure 4: The procedure for dynamically adjusting clusters.
5. EXPERIMENTAL RESULTS
We evaluate the quality and efficiency of D-Stream and compare it with CluStream [1]. All of our experiments are conducted on a PC with a 1.7GHz CPU and 256M memory. We have implemented D-Stream in VC++ 6.0 with a Matlab graphical interface. In all experiments, we use $C_m = 3.0$, $C_l = 0.8$, $\lambda = 0.998$, and $\beta = 0.3$.
We use two testing sets. The first testing set is a real data set used by the KDD CUP-99. It contains network intrusion detection stream data collected by the MIT Lincoln laboratory [1]. This data set contains a total of five clusters and each connection record contains 42 attributes. As in [1], all the 34 continuous attributes are used for clustering. In addition, we also use some synthetic data sets to test the scalability of D-Stream. The synthetic data sets have a varying base size from 30K to 85K, the number of clusters is set to 4, and the number of dimensions is in the range of 2 to 40. In the experiments below, we normalize all the attributes of the data sets to [0, 1]. Each dimension is evenly partitioned into multiple segments, each with length len.
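Mapping a normalized record to its grid under this partitioning can be sketched as follows. The function name `grid_index` and the handling of the upper boundary (a coordinate of exactly 1 is placed in the last segment) are assumptions made for the example, not details given in the text.

```python
import math


def grid_index(record, length):
    """Map a record with attributes in [0, 1] to a tuple of segment indices."""
    # Number of segments per dimension; the small offset guards against
    # floating-point noise in 1 / length.
    segments = math.ceil(1.0 / length - 1e-9)
    # Clamp so that a coordinate equal to 1.0 falls into the last segment.
    return tuple(min(int(x / length), segments - 1) for x in record)


if __name__ == "__main__":
    print(grid_index((0.07, 0.52), 0.05))
    print(grid_index((1.0, 0.0), 0.05))
```

With len = 0.05 each dimension has 20 segments, so a 2-dimensional data set has at most 400 grids, and the index tuple can serve directly as a key in a hash map of active grids.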
5.1 Evolving data streams with many outliers
We find that the sequence order of a data stream can have a great effect on the clustering results. In order to validate the effectiveness of D-Stream, we generate the synthetic data sets according to two different orders.
First, we randomly generate a 2-dimensional data set of 30K points in 4 clusters, including 5K outlier data that are scattered in the space. The distribution of the original data set is shown in Figure 5. These clusters have nonconvex shapes and some are interwoven. We generate the data sequentially at each time step. At each time, any data point that has not been generated is equally likely to be picked as the new data record. Therefore, data points from different clusters and those outliers alternately appear in the data stream. The final test result by D-Stream is shown in Figure 6. We set len = 0.05. From Figure 6, we can see that the algorithm can discover the four clusters without the user supplying the number of clusters. It is much more effective than the k-means algorithm used by CluStream since k-means will fail on such data sets with many outliers. We can also see that our scheme for detecting sporadic grids can effectively remove most outliers.
Figure 5: Original distribution of the 30K data.
Figure 6: Final clustering results on the 30K data.
In the second test, we aim to show that D-Stream can capture the dynamic evolution of data clusters and can remove real outlier data during such an adaptive process. To this end, we order the four classes and generate them sequentially one by one. In this test, we generate 85K data points including 10K random outlier data. The data distribution is shown in Figure 7. The speed of the data stream is 1K/second, which means that there are 1K input data points coming evenly in one second and the whole stream is processed in 85 seconds. We check the clustering results at three different times, $t_1 = 25$, $t_2 = 55$, and $t_3 = 85$. The clustering results are shown in Figures 8 to 10. They clearly illustrate that D-Stream can adapt in a timely manner to the dynamic evolution of stream data and is immune to the outliers.
Figure 7: Original distribution of the 85K data.
Figure 8: Clustering results at t 1 = 25.
Figure 9: Clustering results at t 2 = 55.
5.2 Clustering quality comparison
We test D-Stream on the synthetic data set and KDD CUP-99 data set described above under different grid granularity. The correct rates of clustering results at different times are shown in Figure 11 and 12. In the figures, len
Figure 14: Efficiency comparison with varying sizes of data sets.
Figure 15: Efficiency comparison with varying dimensionality.
the grids using a density-based algorithm. In contrast to previous algorithms based on k-means, the proposed algorithm can find clusters of arbitrary shapes. The algorithm also proposes a density decaying scheme that can effectively adjust the clusters in real time and capture the evolving behaviors of the data stream. Further, a sophisticated and theoretically sound technique is developed to detect and remove the sporadic grids in order to dramatically improve the space and time efficiency without affecting the clustering results. The technique makes high-speed data stream clustering feasible without degrading the clustering quality.
7. ACKNOWLEDGEMENT
This work is supported by Microsoft Research New Faculty Fellowship and National Natural Science Foundation of China Grant 60673060.
8. REFERENCES
[1] C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In Proc. VLDB, pages 81–92, 2003.
[2] B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan. Maintaining variance and k-medians over data stream windows. In Proceedings of the twenty-second ACM symposium on Principles of database systems, pages 234–243, 2003.
[3] S. Bandyopadhyay, C. Giannella, U. Maulik, H. Kargupta, K. Liu, and S. Datta. Clustering distributed data streams in peer-to-peer environments. Information Sciences, 176(14):1952–1985, 2006.
[4] D. Barbará. Requirements for clustering data streams. SIGKDD Explorations Newsletter, 3(2):23–27, 2002.
[5] J. Beringer and E. Hüllermeier. Online-clustering of parallel data streams. Data and Knowledge Engineering, 58(2):180–204, 2006.
[6] B.R. Dai, J.W. Huang, M.Y. Yeh, and M.S. Chen. Adaptive clustering for multiple evolving streams. IEEE Transactions on Knowledge and Data Engineering, 18(9), 2006.
[7] M. Garofalakis, J. Gehrke, and R. Rastogi. Querying and mining data streams: you only get one look. In Proc. ACM SIGMOD, pages 635–635, 2002.
[8] L. Golab and M. T. Özsu. Issues in Data Stream Management. SIGMOD Record, 32(2):5–14, 2003.
[9] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. Trans. Know. Eng., 15(3):515–528, 2003.
[10] S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Annual IEEE Symp. on Foundations of Comp. Sci., pages 359–366, 2000.
[11] O. Nasraoui, C. Rojas, and C. Cardona. A framework for mining evolving trends in web data streams using dynamic learning and retrospective validation. Computer Networks, 50(10):1488–1512, 2006.
[12] L. O'Callaghan, N. Mishra, A. Meyerson, S. Guha, and R. Motwani. Streaming-data algorithms for high-quality clustering. In Proc. of 18th International Conference on Data Engineering, pages 685–694, 2002.
[13] S. Oh, J. Kang, Y. Byun, G. Park, and S. Byun. Intrusion detection based on clustering a data stream. In Third ACIS International Conference on Software Engineering Research, Management and Applications, pages 220–227, 2005.
[14] J. Sander, M. Ester, H. Kriegel, and X. Xu. Density-based clustering in spatial databases: The algorithm GDBSCAN and its applications. Data Min. Knowl. Discov., 2(2):169–194, 1998.
[15] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos. Online outlier detection in sensor data using non-parametric models. In Proc. VLDB, pages 187–198, 2006.
[16] H. Sun, G. Yu, Y. Bao, F. Zhao, and D. Wang. S-tree: an effective index for clustering arbitrary shapes in data streams. In Research Issues in Data Engineering: Stream Data Mining and Applications, 15th International Workshop on, pages 81–88, 2005.
[17] Z. Wang, B. Wang, C. Zhou, and X. Xu. Clustering Data streams on the Two-tier structure. Advanced Web Technologies and Applications, pages 416–425,