Document and entity centric analysis of Google+ activity data
Abstract
Social media has become a very popular means of communication among internet users in recent years. People have an ingrained tendency to share their ideas, experiences, and knowledge; doing so connects them with the rest of the world, lets them be recognized, and helps them gauge their own importance and worth. They are also eager to know what is happening around them, which is why they communicate to share ideas, observations, and queries. Social media is one such medium: it lets people be heard and satisfies their curiosity about the rest of the world. A huge amount of unstructured data is therefore available for analysis on the social web. Because users are free to enter content according to their own knowledge and interests, this data contains redundancies and must be cleaned before any analysis. In this research, Google+ activity data is extracted through the Google+ API. The dataset is first cleaned by removing the HTML tags and stopwords present in the activity content. The resulting natural-language data is then queried using TF-IDF to find the documents of interest, the similarity between these documents is computed using the cosine similarity metric, and the document similarities are visualized as a matrix diagram. The collocations present in the activity data are analyzed. Further, a summary of each activity is extracted using Luhn’s document summarization algorithm. Finally, the entities present in the posts/activities are extracted and the interactions between them are analyzed.
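The cleaning, TF-IDF querying, and cosine similarity steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the sample posts, the short stopword list, the helper names (`clean`, `tfidf_vectors`, `cosine`, `query_scores`), and the particular TF-IDF weighting (term frequency times `1 + log(N / df)`) are all assumptions chosen for illustration, and only the standard library is used.

```python
import math
import re
from collections import Counter
from html import unescape

# Tiny illustrative stopword list (an assumption; the thesis does not name its list).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on", "for", "with"}


def clean(text):
    """Strip HTML tags, lowercase, tokenize, and drop stopwords."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    tokens = re.findall(r"[a-z0-9']+", unescape(no_tags).lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Weighting (one common variant, assumed here): tf * (1 + log(N / df)).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (1.0 + math.log(n / df[term]))
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_scores(query, vectors):
    """Score each document by summing its TF-IDF weights for the query terms."""
    terms = clean(query)
    return [sum(vec.get(t, 0.0) for t in terms) for vec in vectors]


# Hypothetical activity contents standing in for Google+ API results.
posts = [
    "<p>Learning <b>data mining</b> with Python</p>",
    "Data mining and text mining in Python",
    "<div>Photos from the mountain hike</div>",
]

tokens = [clean(p) for p in posts]
vectors = tfidf_vectors(tokens)

# Pairwise similarities; this is the matrix a matrix diagram would visualize.
matrix = [[cosine(u, v) for v in vectors] for u in vectors]

# Rank documents for a query, highest score first.
scores = query_scores("mining", vectors)
ranking = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)
```

Here `matrix[i][j]` holds the similarity between posts `i` and `j`, so the two posts about data mining score higher against each other than either does against the hiking post, and `ranking` lists post indices in order of relevance to the query.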
Description
ME-CSE-Thesis
