Distributed Stream Processing of Twitter Data using Apache Spark

dc.contributor.author: Shruti Arora
dc.contributor.supervisor: Rani, Rinkle
dc.date.accessioned: 2018-08-08T07:43:06Z
dc.date.available: 2018-08-08T07:43:06Z
dc.date.issued: 2018-08-08
dc.description: Master of Engineering- CSE (en_US)
dc.description.abstract: Data is continuously being generated from sources such as machines, network traffic, sensor networks, etc. Twitter is an online social networking service with more than 300 million users, generating a huge amount of information every day. Twitter’s most important characteristic is that users can tweet about events, situations, feelings, opinions, or even something totally new, in real time. Several existing workflows offer real-time data analysis for Twitter, providing general-purpose processing over streaming data. This study attempts to develop an analytical framework capable of in-memory processing to extract and analyze structured and unstructured Twitter data. The proposed framework comprises data ingestion, stream processing, and data visualization components, with Apache Kafka and Apache Flume used to perform the data ingestion task. Furthermore, Spark makes it possible to run sophisticated data processing and machine learning algorithms in real time. We conducted a case study on tweets, analyzing their time and origin. We also examined the SparkML component to study the K-Means clustering algorithm. (en_US)
dc.identifier.uri: http://hdl.handle.net/10266/5179
dc.language.iso: en (en_US)
dc.subject: Stream Processing (en_US)
dc.subject: Apache Spark (en_US)
dc.subject: Twitter (en_US)
dc.subject: SparkML (en_US)
dc.subject: Apache Kafka (en_US)
dc.title: Distributed Stream Processing of Twitter Data using Apache Spark (en_US)
dc.type: Thesis (en_US)

Files

Original bundle

Name:
801632045_CSE_ShrutiArora.pdf
Size:
2.45 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
2.03 KB
Format:
Item-specific license agreed upon to submission