Development of an Efficient Semantic Code Clone Detection Technique

Tekchandani, Rajkumar

Files

raj_951003002.pdf (1.92 MB)

Date

2018-08-16

Authors

Tekchandani, Rajkumar

Supervisors

Bhatia, Rajesh

Singh, Maninder

Abstract

Over the last few years, code clones have emerged as an active area of research because of their wide range of applications in different domains of software engineering. Code clones are the result of copy-paste activities. Similar code fragments that exist at different locations are called code clones. Code clones are reported in the form of clone pairs, which are further clustered to form code clone groups. Code clones are broadly categorized into four types, from Type 1 to Type 4. In the literature, numerous code clone detection techniques exist to find different types of code clones. Knowledge extraction from existing software resources for maintenance, re-engineering and bug removal through code clone detection is an integral part of software systems. Code clone detection techniques are mainly classified into text-based, token-based, tree-based, metric-based and semantic techniques. Most of the existing semantic code clone detection techniques in the literature are based on the comparison of program dependence graphs through subgraph isomorphism, which is NP-complete. Moreover, these techniques are unable to provide a heuristic solution for problems such as statement reordering, inversion of control predicates and insertion of irrelevant statements, which may cause a performance bottleneck. To address these issues, we propose a novel approach that finds semantic code clones between code fragments using data flow analysis on the basis of reaching definition and liveness analysis. The algorithm based on reaching definition and liveness analysis is designed to find similar code fragments which are structurally divergent but semantically equivalent. The results obtained demonstrate that the proposed approach using reaching definition and liveness analysis is effective in the detection of semantic code clones for various applications.
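
As an illustration of the data flow analysis named above, the following is a minimal sketch of classic iterative reaching definition analysis over a small control flow graph. It is a textbook formulation, not the algorithm developed in the thesis; the function name `reaching_definitions`, the block labels `B1` to `B3` and the definition ids `d1` to `d4` are hypothetical names chosen for this example.

```python
from collections import defaultdict

def reaching_definitions(blocks, succ):
    """Iterative reaching definition analysis (textbook form).

    blocks: {block: [(def_id, variable), ...]} in statement order.
    succ:   {block: [successor blocks]}.
    Returns (IN, OUT), each mapping a block to a set of def_ids.
    """
    # Collect every definition of each variable in the whole program.
    defs_of = defaultdict(set)
    for stmts in blocks.values():
        for def_id, var in stmts:
            defs_of[var].add(def_id)

    gen, kill = {}, {}
    for b, stmts in blocks.items():
        last = {}
        for def_id, var in stmts:
            last[var] = def_id  # a later definition in the block wins
        gen[b] = set(last.values())
        kill[b] = set()
        for var, def_id in last.items():
            kill[b] |= defs_of[var] - {def_id}

    # Build predecessor lists from the successor map.
    pred = defaultdict(list)
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)

    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in pred[b]))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT

# Hypothetical fragment:
#   B1: d1: x = 1 ; d2: y = 2
#   B2: d3: x = x + y      (loops back to itself)
#   B3: d4: y = 0
blocks = {
    "B1": [("d1", "x"), ("d2", "y")],
    "B2": [("d3", "x")],
    "B3": [("d4", "y")],
}
succ = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}

IN, OUT = reaching_definitions(blocks, succ)
print(sorted(IN["B2"]))   # definitions that may reach the loop block
print(sorted(OUT["B3"]))  # definitions live at the exit of B3
```

Liveness analysis follows the same fixed-point pattern but runs backward over the graph, propagating used variables from successors to predecessors. Because both analyses depend on definitions and uses rather than on textual order, they are a natural basis for comparing fragments whose independent statements have been reordered.
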
Results obtained on subject systems taken from the DaCapo benchmark confirm the effectiveness of the proposed approach. Further, code clone groups are extracted among different versions of a program file distributed over thousands of commit hashes in a distributed version control system (DVCS). Code clone group extraction has many software applications that help in refactoring and maintenance of code in open source software systems. The evolution of code clone groups across the history of a software system is termed code clone genealogy. Most of the existing solutions for code clone group extraction are based on text similarity among different versions of program files stored in a centralized version control system (CVS). However, existing proposals in the literature fail to extract code clone groups among different versions of program files stored in a distributed version control system. To address these issues, we present a novel Git code clone group extraction model based on transitive closure computation on directed acyclic graphs using Big Data technologies. Our insight is to extract clone pairs from thousands of commits on a software system in Git by transitive closure computation, and to map clone pair parameters in the genealogy to extract code clone evolution patterns in a graph database (Neo4j). We efficiently detected code clone genealogies on a Git-based e-health care system and created a scalable solution. We performed evaluations on OpenMRS, an open source e-health system on Git, and presented interesting code clone evolution relationships in the code clone genealogy. The performance of the proposed approach is evaluated using parameters such as transitive depth, ratio of similarity and count of clones.
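
The idea of deriving clone groups from clone pairs by transitive closure on a directed acyclic graph can be sketched as follows. This is only an illustrative in-memory version under stated assumptions, not the Git and Neo4j based model of the thesis: the node labels of the form `commit:fragment`, the sample edges, and the function names `transitive_closure` and `clone_groups` are all hypothetical.

```python
def transitive_closure(edges):
    """Return {node: set of nodes reachable from it} for a DAG.

    edges: iterable of (older_fragment, newer_fragment) clone pairs.
    The graph is assumed to be acyclic, as commit history is.
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    closure = {}

    def reach(node):
        # Memoized depth-first traversal; terminates because the graph is acyclic.
        if node not in closure:
            acc = set()
            for nxt in graph[node]:
                acc.add(nxt)
                acc |= reach(nxt)
            closure[node] = acc
        return closure[node]

    for node in graph:
        reach(node)
    return closure

def clone_groups(edges):
    """Group each root fragment (no incoming pair) with everything derived from it."""
    edges = list(edges)
    closure = transitive_closure(edges)
    has_incoming = {dst for _, dst in edges}
    return {
        root: sorted({root} | reachable)
        for root, reachable in closure.items()
        if root not in has_incoming
    }

# Hypothetical clone pairs across three commits c1, c2, c3.
pairs = [
    ("c1:foo", "c2:foo"),
    ("c2:foo", "c3:foo"),
    ("c2:foo", "c3:bar"),
]

closure = transitive_closure(pairs)
groups = clone_groups(pairs)
print(groups)
```

In this sketch a clone group is simply a root fragment together with every fragment reachable from it, so fragments that were never compared directly still end up in the same group when a chain of clone pairs connects them.
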

Keywords

Code clones, Clone groups, Reaching definition analysis, Liveness analysis, Code clone genealogy

URI

http://hdl.handle.net/10266/5248

Collections

Doctoral Theses@CSED
