Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/1069
Title: Knowledge Discovery in Databases Processing Using Improved Data Mining Techniques
Authors: Manchanda, Sanjeev
Supervisor: Dave, Mayank; Singh, S. B.
Keywords: Data Mining; Knowledge Discovery in Databases; KDD; Machine Learning; Classification; Supervised Learning
Issue Date: 10-May-2010
Abstract: This thesis addresses problems in supervised learning for data mining and in system development for knowledge based systems. These problems are analyzed, and solutions are sought for the limitations of earlier supervised learning processes and of earlier knowledge based system development models. The work begins with a literature survey of past developments in data mining, supervised learning, and software and knowledge engineering. It then enumerates the processing methods used for supervised learning, which are derived from a general Knowledge Discovery in Databases (KDD) process. From this general process, six specific processes are identified as widely practiced, and their limitations are examined. The search for solutions to these limitations motivated the development of a new process for supervised learning, named the Fuzzy Boundaries of Regression Based Clusters process. The proposed process is based on parallel processing of classification and regression algorithms. It allows the training data to be pruned without compromising the performance of the outcome: it reduces the size of the training set significantly while analyzing each record qualitatively through classification algorithms and quantitatively through regression algorithms. The proposed process and the previously known processes are applied to twenty classification algorithms and five regression algorithms over ten datasets gathered from internationally renowned organizations.
After the six previously used processes and the newly proposed process had been investigated, a question arose about the suitability of existing system development models for a knowledge based system that runs two or more of these processes simultaneously. All previously known models were investigated for this purpose, but none was found satisfactory for developing such a dynamic knowledge based system. This need led to the construction of the ‘Genetic Information System Development and Maintenance’ model for developing and maintaining dynamic knowledge based systems. The model postulates the need to create and maintain a team of software developers within the organization for which the system is to be developed. It further advocates that system development and maintenance activities run simultaneously throughout the life of the organization. The advantages of the proposed model are enumerated. The proposed supervised learning process and the proposed model are investigated through quantitative evaluation, and a case study is used to hypothesize and prove different aspects of the model. Extensive experimentation is performed to evaluate and compare the previously known supervised learning processes and the proposed process. The experimental setup is divided into two categories, a Major Study and a Minor Study. The Major Study includes twenty classification algorithms, five regression algorithms and ten datasets for evaluating eight processes, whereas the Minor Study includes five classification algorithms, five regression algorithms and ten datasets for evaluating eleven processes. The results of the proposed and previously practiced processes are compared through tables and graphs. The ‘Genetic Information System Development and Maintenance’ model is evaluated through a case study of Pepsi Foods Ltd., India. The proposed model was implemented in this organization, and the data gathered through this implementation is compared with data collected from the organization's earlier system development. Hypothesized results are presented for this model.
URI: http://hdl.handle.net/10266/1069
Appears in Collections: Doctoral Theses@SOM

Files in This Item:
File: 1069.pdf
Size: 3.23 MB
Format: Adobe PDF

