Query Estimation in Data Streams Using Micro-Clustering
Abstract
Advances in technology have led to the widespread availability of inexpensive
electronic devices. These devices, together with various applications, automatically
generate large amounts of data, and the volume is increasing exponentially. In business
and scientific applications, data can grow at a rate of millions of items per day.
Many applications generate continuous, transient, large streams of data.
Applications that naturally generate data streams include
financial tickers, log records or click-streams in web tracking and personalization,
manufacturing processes, data feeds from sensor applications and sensor networks,
performance measurements in network monitoring and traffic management, call
detail records in telecommunications, and email messages.
Analyzing the large amounts of data generated by these applications creates
many opportunities, for example diagnosing the cause of a disease from patient data,
designing marketing strategies, predicting investment strategies,
and analyzing customer behavior. We need efficient techniques to analyze and
process these unbounded data streams for useful information. However, conventional
techniques may not be applicable, because data stream processing requires
a single pass over the data with limited memory. A number of techniques
have been proposed that analyze data streams while meeting these rigid processing
requirements. These methods use various synopsis techniques such as sampling,
wavelets, and sketches.
Micro-clustering is a synopsis technique used for clustering and classification of
data streams. In this work we investigate how to estimate queries over large
data streams using micro-clustering and cosine series. We store a summary of the data
stream in micro-clusters and process these clusters to estimate queries over the
stream. To assess the technique, we conducted an experimental study.
The results of this study show that our technique outperforms the competing method.
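
As a rough illustration of the idea of summarizing a stream in micro-clusters and answering queries from the summary alone, the following minimal Python sketch may help. It is not the method of the thesis: it does not use cosine series, it handles only one-dimensional values, and the class names, the `radius` and `max_clusters` parameters, and the centroid-based range-count estimate are all assumptions chosen for illustration. Each micro-cluster is assumed to keep only a count, a linear sum, and a squared sum, so the stream is read once and memory stays bounded.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Additive summary of 1-D points: count, linear sum, squared sum."""
    n: int = 0
    ls: float = 0.0
    ss: float = 0.0

    def add(self, x: float) -> None:
        self.n += 1
        self.ls += x
        self.ss += x * x

    def merge(self, other: "MicroCluster") -> None:
        # The three statistics are additive, so merging is a simple sum.
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    @property
    def centroid(self) -> float:
        return self.ls / self.n


class MicroClusterSummary:
    """Hypothetical single-pass summary with a bounded number of micro-clusters."""

    def __init__(self, max_clusters: int = 50, radius: float = 1.0):
        self.max_clusters = max_clusters  # memory budget (assumed parameter)
        self.radius = radius              # absorption threshold (assumed parameter)
        self.clusters = []

    def insert(self, x: float) -> None:
        # Absorb the point into the nearest micro-cluster if it is close enough.
        if self.clusters:
            nearest = min(self.clusters, key=lambda c: abs(c.centroid - x))
            if abs(nearest.centroid - x) <= self.radius:
                nearest.add(x)
                return
        # Otherwise open a new micro-cluster and enforce the memory budget.
        mc = MicroCluster()
        mc.add(x)
        self.clusters.append(mc)
        if len(self.clusters) > self.max_clusters:
            self._merge_closest_pair()

    def _merge_closest_pair(self) -> None:
        # In 1-D the closest pair of centroids is adjacent after sorting.
        self.clusters.sort(key=lambda c: c.centroid)
        i = min(
            range(len(self.clusters) - 1),
            key=lambda k: self.clusters[k + 1].centroid - self.clusters[k].centroid,
        )
        self.clusters[i].merge(self.clusters.pop(i + 1))

    def estimate_count(self, low: float, high: float) -> int:
        # Crude estimate: count every point of a micro-cluster whose centroid
        # lies in [low, high]. A real estimator would model the spread too.
        return sum(c.n for c in self.clusters if low <= c.centroid <= high)

    def estimate_mean(self) -> float:
        # The overall mean is recoverable exactly from the additive statistics.
        return sum(c.ls for c in self.clusters) / sum(c.n for c in self.clusters)


if __name__ == "__main__":
    summary = MicroClusterSummary(max_clusters=10, radius=1.0)
    for value in range(100):
        summary.insert(float(value))
    print(len(summary.clusters), summary.estimate_count(20.0, 40.0))
```

Because the statistics are additive, the total count and the mean are preserved exactly under merging, while range counts become approximate as micro-clusters grow wider. That accuracy-for-memory trade-off is what a synopsis technique accepts.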
Description
Ph.D, CSED, Thesis
