Title: Designing and Developing a Machine Learning based Code Smell Detection Technique
Authors: Kaur, Amandeep
Supervisor: Jain, Sushma; Goel, Shivani
Keywords: Software Maintenance;Code Smells;Optimization;Machine-learning;J48;Prioritization;Software Quality attributes
Issue Date: 13-Jan-2021
Abstract: Software systems have become prevalent and significant in today’s society. These systems are becoming the core business of several industrial companies and, for this reason, they are getting bigger and more complex. In addition, they are subject to frantic modifications every day, whether to introduce new functionality or to fix bugs. Under these conditions, developers may not be able to design and execute ideal solutions, which contributes to the introduction of "code smells". Code smells refer to bad design and development practices commonly observed in software systems. These smells reflect the sub-optimal design choices applied in the source code by developers. Code smells are symptoms that indicate problems in the code that make software hard to change and maintain. Several studies have demonstrated the negative impact of code smells on the maintainability of software as well as on the ability of developers to comprehend a software system. For this reason, several automated techniques and tools have been devised to discover the parts of code affected by design flaws in order to improve their quality. Most of these techniques rely on the analysis of structural properties (e.g., method calls) mined from the source code. Despite the efforts of academicians and practitioners in recent years, there are still limitations that threaten the industrial applicability of techniques and tools for code smell identification. Specifically, there is a lack of evidence regarding the circumstances that lead to the introduction of code smells and the real effect of code smells on maintainability, since previous research focused on a small number of software projects. Furthermore, the code smell detectors described in the literature might be inadequate for detecting many code smells.
One reason for this inadequacy is the dependence of existing techniques on only the structural properties of software systems. However, a variety of code smells are intrinsically characterized not by structural properties extractable from the source code but by how code elements evolve over time. There is a continual need for high-quality software. Therefore, detecting and removing code smells in an early phase reduces maintenance cost and helps developers improve software maintainability, readability, and extensibility, while increasing the speed at which programmers write code and maintain the software. Code smell detection can be performed at various levels, such as requirements, design, and coding. The primary focus of this thesis is on the coding level, where software systems are improved in two steps: (a) detecting code smells and (b) prioritizing the code smells and analyzing their impact on software quality. To achieve these objectives, the following three contributions are made in this thesis:
• First, the J48 machine-learning algorithm is used for code smell detection. Code smell examples in the form of rules, along with metric specifications, are given to J48 as a training dataset. The code smell detection model is then trained and tested on the considered dataset to find code smells. The performance of the proposed technique is evaluated on three open-source Java software systems, namely GanttProject, Xerces-J, and Log4j, to identify the Blob, Functional Decomposition, Spaghetti Code, Feature Envy, and Data Class code smells. The results of the J48 model are compared with those of some well-known machine-learning techniques, and the analysis shows that the proposed model provides significantly better results. However, J48 suffers from a hyper-parameter tuning issue. Therefore, to tune the hyper-parameters of J48, a novel Sandpiper Optimization Algorithm (SPOA) is proposed.
SPOA is further used in conjunction with the J48 machine-learning algorithm, in a combination called "SP-J48", to find the most significant set of metrics for identifying code smells. SPOA is assessed on standard benchmark test functions to ensure its applicability in terms of convergence and computational complexity. The performance of the proposed algorithm is compared with that of other well-known optimization algorithms. Extensive experimental results indicate that the proposed SP-J48 provides significantly better results than the competing machine-learning models. Additionally, the proposed approach is extended by using C5.0 in place of J48, and named "C5.0-SPOA", because C5.0 is a modification of J48 that provides accurate results, runs faster, and generates smaller trees. C5.0-SPOA is then applied to identify eight code smells in five open-source Java software systems. The performance of the proposed approach is evaluated on the basis of the Precision, Recall, and F-measure performance metrics. The results show that the proposed approach provides significantly better results than other existing techniques.
• Second, the proposed SPOA algorithm is further employed to prioritize the identified code smells for refactoring. The identified code smells are prioritized on the basis of three parameters: versioning history, architectural relevance, and code smell relevance. The performance of the proposed prioritization approach is evaluated using three performance metrics, namely Code Smells Correction Ratio (CSCR), Estimated Effort (EE), and Severity of Fixed Code Smells of a system (SFCS). The experimental results reveal that the proposed prioritization approach helps developers reduce their effort, save time, and improve productivity.
• Third, the impact of code smell prioritization on internal software quality attributes such as cohesion, coupling, complexity, inheritance, and size is analyzed.
Three different software versions of each application are analyzed: the original version, the version generated after removing code smells in a random order, and the version generated after removing code smells in the prioritized sequence given by the proposed prioritization approach. The Chidamber and Kemerer (C&K) metric suite is used to assess the impact of code smells on internal quality attributes, as it is the most commonly used and covers all aspects of internal quality measures. The obtained results show that removing code smells in the prioritized sequence enhances the quality of the software. Moreover, to validate these results, a pair-wise t-test is also conducted at the 5% level of significance.
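As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes Precision, Recall, and F-measure for a code smell detector using the standard definitions. The function name `detection_scores` and the class names in the sample data are hypothetical and are not taken from the thesis.

```python
def detection_scores(detected, actual):
    """Return (precision, recall, f_measure) for one code smell type.

    detected: set of code elements flagged by a detector (hypothetical input)
    actual:   set of code elements that truly exhibit the smell (hypothetical input)
    """
    true_positives = len(detected & actual)
    # Precision: fraction of flagged elements that are truly smelly.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: fraction of truly smelly elements that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only: made-up class names, not results from the thesis.
detected = {"OrderManager", "ReportBuilder", "Utils", "Config"}
actual = {"OrderManager", "ReportBuilder", "Parser"}
precision, recall, f_measure = detection_scores(detected, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f_measure={f_measure:.2f}")
```

With this sample data, two of the four flagged classes are correct and two of the three truly affected classes are found, so precision is 0.50 and recall is about 0.67.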
Appears in Collections: Doctoral Theses@CSED

Files in This Item:
File: Thesis_final (Aman).pdf
Size: 20.28 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.