Design and Development of an Efficient Software Clone Detection Technique
Abstract
Reusing software by copy and paste is a frequent activity in software development. In source code and other software artifacts, an original (code) fragment is copied and pasted with or without modification. The pasted (code) fragment is called a clone, and the activity is known as (code) cloning. The presence of code clones may increase post-implementation maintenance effort (preventive and adaptive), and cloning increases the probability of bug propagation. Software clones are classified by the type of similarity between two code fragments and by the level of granularity. Several factors promote software cloning. The complexity of large systems makes it difficult for developers to understand existing functionality, which encourages copying of existing functionality and logic. Sometimes programmers are forced to copy and paste code because of the limited reuse mechanisms of programming languages. Moreover, programmers are often reluctant to introduce new ideas into existing software: it is easier to reuse existing code than to develop a fresh solution, since new code may introduce new errors. There is therefore an urgent need to detect clones in various software artifacts. Nowadays, model-driven development has become standard industry practice, so the objective of the proposed work is to detect clones in object-oriented systems by using Unified Modeling Language (UML) models. Two techniques are presented to detect clones in UML models.
We also surveyed a wide range of literature: 213 articles out of a collection of 2039 were surveyed following standard systematic literature review guidelines. We put an emphasis on clone management, model clones, and semantic clones, and classified the literature into key areas. The focus of our survey is broader than that of earlier surveys and includes the latest research work related to software clones.
In addition to clone detection tools and methods, we have addressed other issues in software clone research, such as clone analysis, clone evolution, and the impact of clones on software quality. We used a systematic method to develop a clone management map that identifies how clone management papers overlap with clone detection method papers and clone detection tool papers. We explored model-based and semantic clones in detail and compared the state-of-the-art techniques.
The first approach detects clones in UML class models. The technique accepts the XMI file of a UML class model as input. The core of our technique is the construction of a labeled, ranked tree by carefully mapping the elements parsed from the XMI file to the tree representation. Duplicate subtrees are then grouped and clustered with the aim of detecting exact and meaningful clones. The major contributions of the first approach are:
Detection of model clones in UML class diagrams at different levels of granularity, i.e., a single attribute/operation, a set of attributes/operations, and recurring classes with their members.
Detection and classification of model clones as:
o Type-1 : model clones due to standard modeling/coding practice
o Type-2 : model clones by purpose
o Type-3 : model clones due to design practices
Since UML modeling has inherent object-oriented features, our classification of model clones is inspired by these object-oriented characteristics of the UML class model. We carried out the empirical evaluation on reverse-engineered open source systems because no standard repository of UML models recognized by the modeling community is available. We are also considering forward-designed models for evaluation, to capture the essence of model-driven development and to check the practical relevance of the proposed approach. We believe that the results of the tool are accurate and relevant for practical purposes and demand further investigation.
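The idea of grouping duplicate subtrees in a tree built from a class model can be illustrated with a minimal Python sketch. This is not the thesis tool: the node labels, the `attr` and `op` helpers, and the toy model are hypothetical stand-ins for elements parsed from an XMI file, and sorting children so that sibling order is ignored is an assumption made for the illustration.

```python
from collections import defaultdict

# A node is (label, [children]). Labels are hypothetical stand-ins for the
# element kinds and names that would be parsed from an XMI file.

def canonical(node):
    """Return a canonical string for a subtree (assumption: sibling order is ignored)."""
    label, children = node
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"

def find_duplicate_subtrees(roots):
    """Group subtrees by canonical form and keep only forms that occur more than once."""
    groups = defaultdict(list)

    def visit(node, path):
        groups[canonical(node)].append(path)
        for child in node[1]:
            visit(child, path + "/" + child[0])

    for root in roots:
        visit(root, root[0])
    return {form: paths for form, paths in groups.items() if len(paths) > 1}

def attr(name, type_):
    """Hypothetical helper: a leaf node for an attribute."""
    return ("attribute:" + name + ":" + type_, [])

def op(name):
    """Hypothetical helper: a leaf node for an operation."""
    return ("operation:" + name, [])

# Toy class model (illustrative only).
model = [
    ("class:Customer", [attr("id", "int"), attr("name", "String"), op("getName")]),
    ("class:Supplier", [attr("id", "int"), attr("name", "String"), op("getName")]),
    ("class:Order", [attr("id", "int"), op("total")]),
]

clones = find_duplicate_subtrees(model)
for form, paths in sorted(clones.items()):
    print(form, "->", paths)
```

In this toy model only single attributes and operations recur, which corresponds to the finest granularity level listed above. Detecting sets of members or whole recurring classes would need a further clustering step that this sketch does not attempt.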
The second technique is based on computing similarity across object-oriented programs at different levels of granularity. The tool is able to detect concept-level similarities by applying latent semantic indexing and principal component analysis. Detecting high-level similarities can help in comprehending the design of the system for better maintenance. We have extended our tool to detect similarities between UML diagrams by measuring the distance between two class models. In addition, we mined important change patterns at method level using multi-version program analysis, applying the proposed technique throughout the evolutionary history of the open source software dnsjava. We have validated the similarity score by applying the tool at function level in the source code.
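As a rough illustration of what measuring the distance between two class models can look like, the sketch below uses a Jaccard distance over member signatures. The abstract does not state which distance the tool uses, so the metric, the dictionary layout, and the function names are all assumptions chosen for simplicity. The sketch does not implement latent semantic indexing or principal component analysis.

```python
def member_set(cls):
    """Flatten a class description into a set of member signatures."""
    attributes = {"attr:" + a for a in cls["attributes"]}
    operations = {"op:" + o for o in cls["operations"]}
    return attributes | operations

def class_distance(a, b):
    """Jaccard distance between two classes: 0.0 for identical members, 1.0 for disjoint."""
    sa, sb = member_set(a), member_set(b)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)

def directed_distance(model_a, model_b):
    """Average, over classes in model_a, of the distance to the closest class in model_b."""
    return sum(min(class_distance(a, b) for b in model_b) for a in model_a) / len(model_a)

def model_distance(model_a, model_b):
    """Symmetric distance between two class models (assumed metric, not the thesis tool's)."""
    if not model_a or not model_b:
        return 0.0 if not model_a and not model_b else 1.0
    return (directed_distance(model_a, model_b) + directed_distance(model_b, model_a)) / 2

# Two hypothetical versions of a one-class model.
v1 = [{"name": "Account", "attributes": ["id:int", "balance:float"],
       "operations": ["deposit", "withdraw"]}]
v2 = [{"name": "Account", "attributes": ["id:int", "balance:float"],
       "operations": ["deposit", "withdraw", "close"]}]

print(round(model_distance(v1, v2), 3))
```

A set-based distance like this ignores naming variations and semantics. Those are the cases a concept-level technique such as latent semantic indexing is meant to capture.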
Description
Ph.D, CSED
