Detecting Domain From Source Code Using Semantic Clustering
Abstract
Many approaches have been developed to understand software source code. Most of them
focus on the structural information of a program, which loses the crucial domain
semantics contained in the text and symbols of the source code. To understand software
as a whole, these approaches need to be enriched with conceptual insights gained from
the domain semantics. This thesis proposes mapping the domain to the code by applying
information retrieval techniques to linguistic information, such as identifier names and
comments in the source code. We introduce an algorithm based on the concept of Semantic
Clustering, which groups source artifacts based on how they are related through synonymy
and polysemy. The algorithm uses Latent Semantic Indexing (LSI). The biggest advantage
of the approach is that it works at the textual level of the source code, which makes it
language independent. It correlates the semantics with structural information and
applies at different levels of abstraction (e.g. packages, classes, methods).
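
The pipeline described above can be illustrated with a minimal sketch. It is not the thesis implementation: the sample artifacts, the function names, the rank of 2, the similarity threshold of 0.7 and the greedy grouping step are all assumptions made for illustration, and the decomposition is a plain power iteration used in place of a full SVD routine. The sketch splits identifiers into terms, builds a term-document count matrix, projects each artifact into a low-rank LSI space, and groups artifacts whose cosine similarity in that space is high.

```python
import math
import re
from collections import Counter


def split_identifier_terms(text):
    """Split source text into lowercase terms, breaking camelCase and snake_case."""
    terms = []
    for word in re.findall(r"[A-Za-z]+", text):
        parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
        terms += [part.lower() for part in parts]
    return terms


def top_eigenpairs(matrix, k, iterations=200):
    """Power iteration with deflation on a symmetric positive semidefinite matrix."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for index in range(k):
        vec = [1.0 + 0.1 * ((i + index) % n) for i in range(n)]  # deterministic start
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm < 1e-12:
                break
            vec = [x / norm for x in nxt]
        value = sum(vec[i] * sum(work[i][j] * vec[j] for j in range(n)) for i in range(n))
        pairs.append((value, vec))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 1e-12 else 0.0


def semantic_clusters(documents, rank=2, threshold=0.7):
    """Group artifact names by similarity in a rank-reduced (LSI) space."""
    names = list(documents)
    counts = [Counter(split_identifier_terms(documents[name])) for name in names]
    vocabulary = sorted(set().union(*counts))
    rows = [[count[term] for term in vocabulary] for count in counts]
    # Document-document Gram matrix; its eigenvectors are the right singular
    # vectors of the term-document matrix.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]
    pairs = top_eigenpairs(gram, min(rank, len(names)))
    # Document coordinates in LSI space: singular value times eigenvector entry.
    coords = [[math.sqrt(max(value, 0.0)) * vec[i] for value, vec in pairs]
              for i in range(len(names))]
    clusters = []  # greedy grouping, a simple stand-in for hierarchical clustering
    for i in range(len(names)):
        for cluster in clusters:
            if cosine(coords[i], coords[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return [[names[i] for i in cluster] for cluster in clusters]


# Hypothetical artifacts: each value stands for the identifiers found in one class.
SAMPLE_ARTIFACTS = {
    "Account": "Account deposit amount withdraw amount balance",
    "Ledger": "ledgerBalance postTransaction accountAmount",
    "Shape": "drawShape renderCircle canvasPixel",
    "Canvas": "canvas paint_pixel render_shape",
}

if __name__ == "__main__":
    print(semantic_clusters(SAMPLE_ARTIFACTS))
```

Because the sketch only reads identifier text, the same code could be applied to artifacts written in any programming language, and to packages, classes or methods alike, by changing what each entry in the input dictionary represents.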
After the clusters are detected, the program code is automatically labeled based on
semantic similarity and explored visually. Since three-dimensional visualization makes
concept detection much easier, we have concentrated on the visualization of the semantic
clusters detected in the source code.
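
The labeling step can be sketched in a similar spirit. The abstract does not state the labeling rule, so the scoring below (relative frequency of a term inside a cluster minus its relative frequency in all other clusters) is an assumed stand-in, and the function name, cluster names and term lists are hypothetical.

```python
from collections import Counter


def label_clusters(cluster_terms, labels_per_cluster=2):
    """Pick, for each cluster, terms that are frequent inside it and rare outside."""
    counts = {name: Counter(terms) for name, terms in cluster_terms.items()}
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    grand_total = sum(total.values())

    labels = {}
    for name, counter in counts.items():
        size = sum(counter.values())
        rest = grand_total - size

        def score(term):
            inside = counter[term] / size
            outside = (total[term] - counter[term]) / rest if rest else 0.0
            return inside - outside

        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(counter, key=lambda term: (-score(term), term))
        labels[name] = ranked[:labels_per_cluster]
    return labels


# Hypothetical clusters, each given as the bag of terms of its member artifacts.
SAMPLE_CLUSTER_TERMS = {
    "cluster_0": ["account", "amount", "amount", "balance",
                  "deposit", "account", "amount", "get"],
    "cluster_1": ["canvas", "pixel", "shape", "render",
                  "canvas", "shape", "canvas", "get"],
}

if __name__ == "__main__":
    print(label_clusters(SAMPLE_CLUSTER_TERMS))
```

A term such as `get`, which occurs equally often in both sample clusters, scores zero and is therefore not chosen as a label for either.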
