Detecting Domain From Source Code Using Semantic Clustering

Madan, Sanjay

Files

867 Sanjay Madan (80732016).pdf (2.87 MB)

Date

2009-08-12T08:20:35Z

Authors

Madan, Sanjay

Supervisors

Batra, Shalini

Abstract

Many approaches have been developed for understanding software source code. Most of them focus on the structural information of the program, which loses the crucial domain semantics contained in the text and symbols of the source code. To understand software as a whole, these approaches need to be enriched with conceptual insights gained from the domain semantics. This thesis proposes mapping the domain to the code by applying information retrieval techniques to linguistic information, such as identifier names and comments in the source code. It introduces an algorithm based on the concept of Semantic Clustering, which groups source artifacts that use similar vocabulary while accounting for synonymy and polysemy. The algorithm uses Latent Semantic Indexing (LSI). The biggest advantage of the approach is that it works at the textual level of the source code, which makes it language independent. It correlates the semantics with structural information and applies at different levels of abstraction (e.g. packages, classes, methods). After the clusters are detected, the program code is labeled automatically based on semantic similarity and then explored visually. Since 3-dimensional visualization makes concept detection much easier, the work concentrates on visualizing the semantic clusters detected in the source code.
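The first stage of the approach described above, turning identifier names and comments into terms and grouping artifacts by vocabulary overlap, can be sketched as follows. This is a minimal illustration and not the thesis implementation: it compares raw term-frequency vectors by cosine similarity and leaves out the rank reduction (truncated SVD) that LSI applies to the term-document matrix. The function names, the sample artifacts and the 0.3 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def extract_terms(source: str) -> list[str]:
    """Split identifiers and comment words into lowercase terms."""
    terms = []
    # Alphabetic runs only, so underscores, digits and punctuation act as separators.
    for word in re.findall(r"[A-Za-z]+", source):
        # Split camelCase and acronyms: "parseHTTPRequest" -> parse, HTTP, Request.
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word)
        terms.extend(part.lower() for part in parts)
    return terms


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(docs: dict[str, str], threshold: float = 0.3) -> list[set[str]]:
    """Group artifacts whose similarity reaches the (assumed) threshold.

    Single-link grouping: an artifact joins every cluster that contains
    at least one sufficiently similar member, merging those clusters.
    """
    vectors = {name: Counter(extract_terms(text)) for name, text in docs.items()}
    clusters: list[set[str]] = []
    for name in docs:
        linked = [
            c
            for c in clusters
            if any(cosine(vectors[name], vectors[m]) >= threshold for m in c)
        ]
        merged = {name}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


# Hypothetical source artifacts: two from a banking domain, one from graphics.
docs = {
    "Account": "class Account: # bank account balance\n"
               "def depositMoney(amount): balance += amount",
    "Transfer": "def transferMoney(fromAccount, toAccount, amount): "
                "# move balance between accounts",
    "Renderer": "def drawWindow(canvas): # paint pixels on screen canvas",
}
groups = cluster(docs)
```

With these sample artifacts the two banking artifacts share terms such as "account", "balance" and "amount" and fall into one group, while the rendering artifact stays separate. Exact term matching cannot relate synonyms such as "money" and "funds"; that limitation is what the LSI step in the thesis is meant to address.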

Keywords

semantic clustering, LSI, semantics, clustering

URI

http://hdl.handle.net/10266/867

Collections

Masters Theses@CSED
