Selection of Change-prone Software Components using the Expertise of Semantic Web and Intelligent Computing Methods
Abstract
Software component development, like software development in general, deals with “the construction of multi-version software”, which is continually subject to change: to add new features, to restructure source code so that it can accommodate future changes, or to remove existing defects as part of corrective maintenance. Knowledge of the change-prone source code components of a software project or component is therefore vital, because these components are likely sources of modifications and defects and may also point to design issues that need to be resolved.
This thesis proposes a novel change-prone software selection mechanism that employs Semantic web technology and Intelligent Computing Methods (ICM). The mechanism selects those code components (Java files) of a software component that are expected to change in the successive release of the component. The work also identifies and establishes a cognitive aspect of the software change process.
We begin by extensively surveying the existing research in order to understand the topic from three primary standpoints:
(i) Software change prediction with an emphasis on the metrics used for software change prediction along with the prediction techniques employed, statistical tests used and validation strategies applied,
(ii) The cognitive complexity metrics introduced and validated in literature and their use to estimate any software development procedure, and
(iii) Application of Semantic web technologies to software component-based tasks, with a key focus on the year-wise trend of the research articles and the possible justification for using Semantic web technology and tools in a specific phase of component-based software development.
From the literature studied, we observed that code fragments that are difficult to comprehend are challenging to modify. Because systematic software maintenance involves extensive human activity, cognitive complexity is an intrinsic factor that could contribute to or impede efficient software maintenance, yet its empirical validation remains largely unaddressed. We therefore first conduct an experimental analysis in which the cognitive complexity, that is, the software developer’s level of difficulty in comprehending the software, is theoretically computed and empirically evaluated to estimate its relevance to actual software change. For multiple successive releases of two software components (plugin projects written in Java), where the source code of a previous release has been substantially reused in a new release, we calculate the change results and the cognitive complexity values for the source code Java files of each version. We construct eight datasets and build predictive models using statistical analysis and Machine Learning (ML) techniques. The comparative examination of the estimated cognitive complexity against prevailing metrics of software change and software complexity validates the cognitive complexity metric as a noteworthy measure of version-to-version source code change.
Secondly, several studies in the literature focus on the association between change-proneness and software metrics. Nevertheless, since the search for the best classifier for change-prone source code elements is ongoing, we present a comprehensive framework for software change prediction that investigates various techniques from the categories of Intelligent Computing Methods (ICMs) and statistical approaches for creating version-to-version change-prediction models for Java files. The performance of the models is assessed under two validation scenarios: k-fold intra-release validation and inter-release validation, where the latter is useful for estimating the trend of change-proneness of files in upcoming versions. Our experiments indicate that the prediction techniques perform differently under the selected validation settings, while also confirming the proficiency of the selected prediction techniques for developing change-proneness prediction models. Such models could aid software engineers in the initial stages of software development in classifying change-prone Java files for the analyzed plugin project versions, in turn aiding the trend estimation of change-proneness over future versions.
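The difference between the two validation scenarios can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the single-metric threshold learner, the function names, and the tiny hand-made releases `v1` and `v2` are hypothetical and are not the models or datasets used in the thesis.

```python
from typing import Callable, List, Sequence, Tuple

# (metric value of a Java file, did the file change in the next release?)
Sample = Tuple[float, bool]
Model = Callable[[float], bool]


def train_threshold(train: Sequence[Sample]) -> Model:
    """Toy learner (assumption): pick the threshold with the best training accuracy."""
    candidates = sorted({x for x, _ in train})
    best = max(candidates, key=lambda t: sum((x >= t) == y for x, y in train))
    return lambda x: x >= best


def accuracy(model: Model, test: Sequence[Sample]) -> float:
    """Fraction of files whose predicted label matches the actual label."""
    return sum(model(x) == y for x, y in test) / len(test)


def k_fold_intra_release(release: List[Sample], k: int) -> float:
    """Train and test inside ONE release: each fold is held out once."""
    scores = []
    for i in range(k):
        test = release[i::k]  # every k-th file, starting at position i
        train = [s for j, s in enumerate(release) if j % k != i]
        scores.append(accuracy(train_threshold(train), test))
    return sum(scores) / k


def inter_release(older: List[Sample], newer: List[Sample]) -> float:
    """Train on an older release and test on a later one."""
    return accuracy(train_threshold(older), newer)


# Hypothetical data for two successive releases.
v1 = [(10, False), (20, False), (30, False), (80, True), (90, True), (120, True)]
v2 = [(15, False), (25, False), (70, True), (110, True)]

intra_score = k_fold_intra_release(v1, k=3)
inter_score = inter_release(v1, v2)
```

In this sketch the intra-release score only reflects how well a model generalizes within one version, whereas the inter-release score reflects how well a model trained on one version transfers to the next, which is the setting relevant to trend estimation.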
Lastly, we present the change-prone Java file selection mechanism developed using ICM and Semantic web technology. The framework begins by constructing an ontology in Protégé for the Java files in each of the selected software component versions, using static source code metrics as attributes. It then employs the Semantic Web Rule Language (SWRL) and the Drools inference engine to formulate and induce potential rules, acquired via the most appropriate ICM, that classify a Java file as change-prone or stable. The relevant change-prone or stable files are then selected via the Semantic Query-Enhanced Web Rule Language (SQWRL), which is commonly employed for extracting information from ontologies. Additionally, we demonstrate how information (in the form of patterns or rules) that a prediction technique constructs to make its predictive decisions can be expressed in the ontology via valid SWRL rules. Because SWRL supports only monotonic inference, a rule that contains more than one negation can result in incomplete inference. We provide a methodology to express such rules in SWRL without any loss of relevant inference. The proposed selection mechanism is evaluated on various successive releases of the selected plugin projects and establishes the applicability of Semantic web principles to building a successful change-prone Java file selection mechanism.
More generally, the findings of this thesis examine whether cognitive complexity plays a role in software change and whether the existing prediction techniques for software change are capable enough to predict the version-to-version change-proneness of Java files of software components. The thesis also analyses whether the two paradigms, Semantic web technology and an ICM, can in fact be employed in unison to select change-prone Java files from a software component version.
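As an illustration of how a learned rule might be written in SWRL, the sketch below renders a list of metric conditions as rule text. It is not the methodology of the thesis, which the abstract does not detail: it shows one common workaround for negated numeric conditions, namely replacing each negated comparison with its complementary `swrlb` comparison built-in. The class and property names (`JavaFile`, `hasLOC`, `hasCBO`, `Stable`) and the thresholds are hypothetical, and the rewriting assumes that every file has exactly one value for each metric.

```python
from typing import List, Tuple

# Complement of each comparison, used to eliminate a negation:
# NOT (x > t) is rewritten as x <= t, and so on.
COMPLEMENT = {">": "<=", ">=": "<", "<": ">=", "<=": ">"}

# Standard SWRL comparison built-ins.
BUILTIN = {
    ">": "swrlb:greaterThan",
    ">=": "swrlb:greaterThanOrEqual",
    "<": "swrlb:lessThan",
    "<=": "swrlb:lessThanOrEqual",
}

# (metric name, operator, threshold, negated?)
Condition = Tuple[str, str, float, bool]


def to_swrl(conditions: List[Condition], conclusion: str) -> str:
    """Render conditions as a negation-free SWRL rule (names are hypothetical)."""
    atoms = ["JavaFile(?f)"]
    for metric, op, threshold, negated in conditions:
        if negated:
            op = COMPLEMENT[op]  # replace NOT(cmp) with the complementary cmp
        var = f"?{metric.lower()}"
        atoms.append(f"has{metric}(?f, {var})")  # bind the metric value
        atoms.append(f"{BUILTIN[op]}({var}, {threshold})")
    return " ^ ".join(atoms) + f" -> {conclusion}(?f)"


# A rule with two negations: NOT (LOC > 100) AND NOT (CBO > 5) => Stable
rule = to_swrl([("LOC", ">", 100, True), ("CBO", ">", 5, True)], "Stable")
```

Under the same hypothetical naming, files classified by such a rule could then be retrieved with an SQWRL query of the form `Stable(?f) -> sqwrl:select(?f)`.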
