Automatic Identification of Class Refactorings using Vector Space Cosine Similarity | Lecture notes Engineering

An Automatic Approach to identify Class Evolution Discontinuities

Giuliano Antoniol∗ Massimiliano Di Penta∗ and Ettore Merlo∗∗

[email protected], [email protected], [email protected]

∗RCOST - Research Centre on Software Technology

University of Sannio, Department of Engineering

Palazzo ex Poste, Via Traiano 82100 Benevento, Italy

∗∗École Polytechnique de Montréal

Montréal, Canada

Abstract

When a software system evolves, features are added, removed and changed. Moreover, refactoring activities are periodically performed to improve the software's internal structure. A class may be replaced by another, two classes can be merged, or a class may be split into two others.

As a consequence, it may not be possible to trace software features from one release to another. When studying software evolution, we should be able to trace a class's lifetime even when it disappears because it is replaced by a similar one, split or merged. Such a capability is also essential to perform impact analysis.

This paper proposes an automatic approach, inspired by vector space information retrieval, to identify class evolution discontinuities and, therefore, cases of possible refactoring. The approach has been applied to identify refactorings performed over 40 releases of a Java open source domain name server. Almost all the refactorings found were actually performed in the analyzed system, thus indicating the helpfulness of the approach and of the developed tool.

Keywords: Software Evolution, Releases, Refactoring, Traceability

1. Introduction

Software systems continuously evolve to meet ever-changing user needs. As a system evolves, new functionalities are added and existing ones are removed or modified. In particular, when we look at the evolution of an Object-Oriented (OO) software system, we see that the lifetime of a class is only a limited segment of the whole system evolution. When a class is no longer considered useful, it can be removed. Conversely, new features can imply the creation of new classes. This is, however, only part of the picture. To improve the software's internal structure, maintainability and comprehensibility, refactoring [8, 11] activities are periodically performed. At class level, such refactorings may imply that new classes are obtained by splitting or merging old ones. Moreover, a class can be obtained by factoring out part of another class or, conversely, a class can be merged into another.

Often, for different reasons, those refactorings are not documented. The lack of configuration management and, in general, of a well-defined software development process can cause the loss of traceability between related classes. As a consequence, the maintainability of the software system and, in general, its quality tend to deteriorate. Software evolution activities rapidly become extremely difficult, as any change may produce unpredictable side effects on other portions of the system.

It would be very useful to connect the independent segments representing class evolution during the system lifetime. If, at a given release, a class terminates its life and two other classes, obtained by splitting the first one, appear, then the three segments representing such classes should be connected to indicate such a relationship.
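As a rough illustration of where such lifetime segments break, the sketch below compares the class names of two consecutive releases and reports which classes disappear and which appear. It is only a minimal sketch: the function name `lifetime_breaks` and the class names are hypothetical, and comparing names alone is an assumption made here for illustration, not the method described in this paper.

```python
def lifetime_breaks(old_release, new_release):
    """Return (disappeared, appeared) class names between two releases.

    Both arguments are iterables of class names. A class that is only in
    the old release ends its lifetime segment; a class that is only in
    the new release starts one.
    """
    old_names, new_names = set(old_release), set(new_release)
    disappeared = sorted(old_names - new_names)
    appeared = sorted(new_names - old_names)
    return disappeared, appeared


# Hypothetical releases: "Cache" ends its life, two new classes appear.
release_n = ["Cache", "Message", "Resolver"]
release_n1 = ["Message", "Resolver", "CacheEntry", "CacheStore"]

disappeared, appeared = lifetime_breaks(release_n, release_n1)
print(disappeared, appeared)
```

The disappeared and appeared classes are the candidates whose segments may need to be connected; deciding whether they are actually related (for example, by a split) requires a further comparison of their content.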

The first, intuitive consequence of this information relates to understanding software evolution: the lifetime of a class should also be studied across events such as renaming, replacement, merge and split. Second, the detection of refactorings helps to locate functionalities across classes, thus giving relevant support to software maintenance and, in particular, impact analysis. Last but not least, the approach can facilitate the reuse of test cases developed for the old class(es).

This paper proposes to adopt techniques inspired by Information Retrieval (IR) approaches to automatically identify and document evolution discontinuities when analyzing the evolution of OO source code at class level. The approach is inspired by a number of studies [3, 5, 16, 17] aimed at recovering mappings between software artifacts (e.g., free text documentation and code), or between subsequent releases of a software system.
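To make the vector space idea concrete, the following sketch represents each class as a vector of term frequencies and compares two classes with the cosine of the angle between their vectors. This is a minimal sketch under stated assumptions: using raw counts of identifier-like tokens as features is a choice made here for illustration, and the helper names `term_vector` and `cosine` and the sample class sources are hypothetical. The features and weighting actually used by the approach may differ.

```python
import math
import re
from collections import Counter


def term_vector(source):
    """Count identifier-like tokens in a class's source text (assumed features)."""
    return Counter(re.findall(r"[A-Za-z_]\w*", source.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical class sources from two subsequent releases.
old_cache = "class Cache { void add(Record r); Record lookup(Name n); }"
new_cache = "class RecordCache { void add(Record r); Record lookup(Name n); }"
new_zone = "class Zone { void transfer(Socket s); }"

sim_cache = cosine(term_vector(old_cache), term_vector(new_cache))
sim_zone = cosine(term_vector(old_cache), term_vector(new_zone))
print(sim_cache > sim_zone)
```

A class that disappears in one release can then be matched against the classes that appear in the next one, with the highest-scoring candidates flagged as possible replacements for manual inspection.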

Without losing the generality of the proposed approach,


Abstract

1. Introduction

2. Related Work

3. Background

4.4 Tool Support

5. Case Study

5.1 Case Study Results

References