Please use this identifier to cite or link to this item: http://hdl.handle.net/10266/867
Title: Detecting Domain From Source Code Using Semantic Clustering
Authors: Madan, Sanjay
Supervisor: Batra, Shalini
Keywords: semantic clustering;LSI;semantics;clustering
Issue Date: 12-Aug-2009
Abstract: Many approaches have been developed for understanding software source code, and most of them focus on the program's structural information. This loses the domain semantics carried by the text and symbols of the source code. To understand software as a whole, these approaches need to be enriched with conceptual insights gained from the domain semantics. This thesis proposes mapping the domain to the code by applying information retrieval techniques to linguistic information such as identifier names and comments in the source code. We introduce an algorithm based on Semantic Clustering that groups source artifacts according to how synonymy and polysemy relate the terms they use. The algorithm is built on Latent Semantic Indexing (LSI). The main advantage of the approach is that it works at the textual level of the source code, which makes it language independent. It correlates the semantics with structural information and applies at different levels of abstraction (e.g. packages, classes, methods). Once the clusters have been detected, the program code is labeled automatically on the basis of semantic similarity and explored visually. Since three-dimensional visualization makes concept detection much easier, we have concentrated on visualizing the semantic clusters detected in the source code.
URI: http://hdl.handle.net/10266/867
Appears in Collections: Masters Theses@CSED
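
The pipeline outlined in the abstract (linguistic tokens from identifiers and comments, LSI, then grouping by semantic similarity) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (tokenize, lsi_embed, cluster), the stop-word list, the choice of k = 2 latent dimensions, the 0.7 cosine threshold, the greedy grouping step, and the four sample snippets are all assumptions made for illustration. Only NumPy's SVD is used for the LSI step.

```python
import re
import numpy as np

# Hypothetical stop-word list; a real setup would also drop language keywords.
STOP_WORDS = {"into", "from", "to", "the", "a", "of"}

def tokenize(source: str) -> list[str]:
    """Extract lowercase word tokens, splitting snake_case and camelCase."""
    tokens = []
    for word in re.findall(r"[A-Za-z]+", source):
        # "openAccount" -> ["open", "Account"]; "HTTPServer" -> ["HTTP", "Server"]
        for part in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word):
            if part.lower() not in STOP_WORDS:
                tokens.append(part.lower())
    return tokens

def lsi_embed(docs: list[str], k: int = 2) -> np.ndarray:
    """Map each document to a unit vector in a k-dimensional latent space."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-document matrix of raw term counts (rows: terms, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1.0
    # Truncated SVD: keep the k largest singular values.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document
    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    return doc_vecs / norms

def cluster(vecs: np.ndarray, threshold: float = 0.7) -> list[int]:
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    labels, seeds = [], []
    for i, v in enumerate(vecs):
        for c, s in enumerate(seeds):
            if float(v @ vecs[s]) >= threshold:  # cosine similarity of unit vectors
                labels.append(c)
                break
        else:
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels

# Illustrative "source artifacts": identifiers plus a comment each.
docs = [
    "openAccount account_balance  # deposit money into account",
    "closeAccount account_balance # withdraw money from account",
    "drawCanvas pixel_buffer # render pixel to canvas",
    "clearCanvas pixel_buffer # erase pixel from canvas",
]
labels = cluster(lsi_embed(docs, k=2))
print(labels)
```

Because the sketch reads only words found in the text of each snippet, it does not depend on the syntax of any particular programming language, which is the property the abstract highlights. The labeling and visualization steps are not covered here.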

Files in This Item:
File: 867 Sanjay Madan (80732016).pdf
Size: 2.94 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.