Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/5022
Title: Efficient Pattern Mining of Big Data using Graphs
Authors: Bhatia, Vandana
Supervisor: Rani, Rinkle
Keywords: Big data; Graph Mining; Pattern Mining; Large Graphs; Frequent Subgraph Mining
Issue Date: 8-Jun-2018
Abstract: Big data holds a great amount of hidden knowledge and many insights, which have raised remarkable challenges in knowledge discovery and data mining. For certain types of data, the relationships among entities matter more than the information itself. Big data contains many such connections, and graphs can mine them efficiently. However, extracting substantial value from this complex data is very challenging. Graph mining approaches such as clustering and subgraph mining are used to overcome these challenges. In recent times, these approaches have become indispensable tools for analyzing graphs in various domains. This thesis presents research work in the field of pattern mining for large graphs. Its main objective is to investigate the benefits of scalable approaches for mining large graphs. Two fuzzy clustering algorithms for large graphs, namely 'PGFC' and 'PFCA', are proposed, each using different concepts of graph analysis. Furthermore, a scalable deep-learning-based fuzzy clustering model named 'DFuzzy' is proposed. It leverages the idea of stacked autoencoder pipelines to efficiently identify overlapping and non-overlapping clusters in large graphs. The proposed clustering approaches are shown to be effective on both small and large graph datasets and to generate high-quality clusters. For mining frequent subgraphs in a single large graph, a scalable pattern-growth-based algorithm named 'PaGro' is proposed. In PaGro, a two-step hybrid approach optimizes the subgraph isomorphism and subgraph pruning tasks at both the local and global levels to avoid excess communication overhead. Additionally, an approximate frequent subgraph mining algorithm named 'Ap-FSM' is proposed, which builds on PaGro and uses sampling for faster processing.
Ap-FSM introduces a novel sampling approach that selects an approximate subgraph while preserving the properties of the original graph, which makes analysis convenient and relatively easy. The results show that both PaGro and Ap-FSM outperform competing algorithms in processing time, number of iterations, and memory overhead. This suggests that graph clustering and frequent subgraph mining generate discriminative and significant patterns, which can help in many tasks, such as classification and indexing of big data.
Description: Doctor of Philosophy - Computer Science
URI: http://hdl.handle.net/10266/5022
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Vandana_PhD-Thesis.pdf
Size: 3.13 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.