Software Change Analysis Via Attributed Graph-Based Representations

R. Al Zoubi and A. Prakash
Software Systems Research Laboratory
Department of Electrical Engineering and Computer Science
University of Michigan
Ann Arbor, MI 48109-2122
CSE-TR-95-91, May 15, 1991

Abstract

Software change analysis is a preliminary step in the maintenance of software systems. Software understanding and ripple-effect analysis are two major activities of this step. Experience has shown that these two activities require substantial effort. Currently, they are done manually, using little help from the software development environments. In this paper, we propose an automated approach to change analysis that supports the role of the human analyst in these activities. The basic idea behind this approach is to derive, from system code, information relevant to change analysis; to save this information in a suitable representation; and to develop software tools that can interactively manipulate this representation, generate system views, analyze software changes, and notify the analysts of possible ripple effects of such changes. The chosen representation is based on attributed dependency graphs. One advantage of using these graphs is that they explicitly reveal information about the corresponding system that is of vital importance both for software understanding and for change analysis. In addition, when change analysis tools are based on graphs, rather than on text, their design is simplified and their performance and effectiveness improved.

1 Introduction

Among all phases of software development, maintenance is the most expensive. Lientz and Swanson [16] and Boehm [3] estimate that between 50% and 80% of the software development budget is spent on maintenance. Change analysis is a preliminary step of this phase. During this step, a change analyst builds an understanding of the software system being maintained. He determines the changes to the system that are necessary to implement a desired maintenance change, and finds the ripple effects of these projected changes. These activities (software understanding and ripple-effect analysis) are laborious for many reasons:

• Understanding a software system is time-consuming [19]. Normally, software systems are represented by text that does not reveal all the information necessary for change analysis. An analyst must derive such information each time he needs it. This may require extensive navigation through numerous documents of the software system.

• The limited ability of human analysts to memorize interrelationships between the entities of a complex software system hinders their understanding.

• Change analysis has a recurrent nature: a small change can trigger a long chain of other changes, which, in turn, initiate new change analysis steps, and so on.

• Software development environments lack effective tools to support the change analysis process.

Change analysis is thus very expensive. Moreover, its failures are costly. Since the success of the whole maintenance phase depends directly on the success of change analysis, any failure in this step requires a new maintenance cycle, which adds further cost. We believe that we can considerably reduce the cost of maintenance by improving change analysis. The objective of our research is to develop automatic aids to improve change analysis. We propose the following approach to achieve it:

  1. To derive information vital for change analysis and save it for use during the analysis process.

manage software documentation but they also help software maintainers to navigate through them. The development of automated aids to manage and trace low-level information embodied in the code of the software system is of vital importance for change analysis. In this paper, we are mainly concerned with this development and we limit our discussion to it. We classify the tools that have been suggested to improve change analysis into two groups: view generators and ripple-effect analyzers.

2.1 Software view generators

These tools generate various views of software systems or answer queries about them.

• The simplest view generators, the majority of which are batch-oriented, are cross-referencers. Munro [18] discusses the importance of interactive cross-referencers and suggests one for Pascal programs.

• OMEGA [17] is a system of tools designed both to collect detailed information about programs written in a Pascal-like language and to use it later to generate program views. OMEGA uses the general-purpose relational database Ingres [10] to store this information. OMEGA stored very detailed information about programs, including individual statements, operators, etc. This large amount of detailed information and the use of Ingres tended to make the system slow [17].

• CIAS [5] is a system that provides view-generating capabilities for software systems written in C [13]. It provides better scalability than OMEGA because it stores relations only between global entities in programs. It has three tools: the C Abstractor, the Information Viewer (InfoView), and the Software Investigator. The C Abstractor collects global relational information about software systems and stores this information in a special database system. InfoView generates any of several predefined views of the system from this database. One problem with CIAS is that the generated information is stored in files or a database, rather than in memory. This makes it difficult to build efficient analysis tools that require recursive traversal of relations between entities in a program.

• REFINE^1 is a knowledge-based software development environment that provides facilities for creating abstract syntax trees from language specifications and browsing through an abstract syntax tree. The syntax trees can also be analyzed or manipulated through tools provided in REFINE. Compared to our system, REFINE is more of a powerful programming environment to help build language-specific software tools than a provider of such tools.

• VIFOR [21] is an experimental and interactive environment that uses a combination of code and graphs to model FORTRAN programs. Its graphic displays help its users visualize some relationships between the components of these programs. VIFOR does not address issues of scalability or change analysis, but focuses on visualization techniques.

2.2 Ripple-effect analyzers

The following experimental tools can find the side effects of some of the changes made to software systems. Since the widely accepted textual representation of software systems is not ideal for the development of such analyzers, all these tools are based on nontextual representations.

• Syntax-directed editors such as the Cornell Program Synthesizer [23, 27] and Gandalf [9] have tools designed to enforce the structural constraints of acceptable software systems. These tools are based on attributed grammars and, during any editing operation, implicitly analyze the software representation to find illegal operations or possible side effects of legal ones based on attribute propagation. Currently, these systems are limited to programming-in-the-small. Usually, tools based on these systems require users to be aware of the underlying syntax tree structure of programs and are limited to attribute propagation in a single file.

• MicroScope [1] is a knowledge-based tool for program understanding currently being developed. It is a system of tools designed around a knowledge base of program information which includes the source code

^1 REFINE is a registered trademark and codemark of Reasoning Systems, Inc., Palo Alto, CA.

information. The combination of the code and the graph is called a graph-based representation.

  3. Finally, develop software tools that use this representation to support change analysis. The combination of text and graphs is a powerful representation that allows the development of efficient change analysis tools.

This approach is similar in its objectives to a handful of recently published projects; these projects are described in [1, 4, 6, 29]. However, our approach is different in two ways. First, we use a special class of graphs to represent relevant program information. Second, we use these graphs to define view generators and ripple-effect analyzers that support software change analysis. In other words, our approach is different in its priorities and in its implementation of the program representation.

[Figure 1 diagram. Components: User, Interface, Program Code, Graph Generator, Program Graph, View Generator, Change Analyzer]

Figure 1: An Approach to Software Change Analysis

Figure 1 illustrates this approach. In it, Graph Generator is a set of operations that generate a Program Graph (an attributed graph) from Program Code; View Generator is a set of operations that access the Program Graph to generate a set of program views; and Change Analyzer consists of operations that analyze proposed changes to the Program Code according to the information contained in the Program Graph. The Change Analyzer operations serve another purpose: they can call graph operations to change the Program Graph. The arcs show the direction of the flow of program information between these components. We elaborate on the function of these tools in Sections 5 to 9.

Since there is an intimate relationship between change analysis of a software system and the programming language in which it is written, language-oriented change-analysis tools are more powerful than general tools. We are developing a system of tools to analyze Pascal programs. Pascal is a high-level language that shares many features with other languages such as Ada [2] and C. Similar tools can be developed for such languages.

4 Program Dependency Graphs

A Pascal program^2 consists of a finite set of entities such as procedures, types, and variables. These entities are either primitive (i.e., language-defined) or user-defined. Programmers define new entities first by using the primitive entities and later by using those that they previously defined. In this respect, if the definition of entity a uses another entity b, we say that a depends on b. For example, if procedure a uses a local variable b, then a depends on b; or if variable a is of type b, then a depends on b. The set of all such dependencies between the entities of a program is a mathematical binary relation that is defined on the set of these entities; it is called a dependency relation.

A natural representation of this view of the program is a directed graph; each node of the graph represents an entity of the program and each arc (or directed edge) represents a dependency relation between the entities corresponding to its vertices. For instance, if entity a depends on entity b, then the arc $a \xrightarrow{e} b$ represents this relation. This directed graph is called a program dependency graph (PDG).

Dependency graphs are used extensively for different purposes in software engineering. They are used for program testing [20], debugging [28], optimization [8], and maintenance [24]. These graphs are structurally similar: each graph consists of a finite set of nodes and a set of edges between them.

^2 We consider only single-file programs here. In Section 9, we discuss how to extend this representation to multi-file programs.
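
The dependency relation defined above can be sketched in C as a small adjacency matrix, together with a traversal that collects every entity depending, directly or transitively, on a changed one. This is only an illustration of the definition, not the representation used in the paper; the names `Pdg`, `pdg_add_arc`, `pdg_dependents`, and the capacity `MAX_ENTITIES` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ENTITIES 32  /* hypothetical capacity for this sketch */

/* depends[a][b] is true when the definition of entity a uses entity b,
   i.e. when the PDG contains the arc a -> b. */
typedef struct {
    int  count;
    bool depends[MAX_ENTITIES][MAX_ENTITIES];
} Pdg;

void pdg_init(Pdg *g, int count)
{
    assert(count >= 0 && count <= MAX_ENTITIES);
    g->count = count;
    memset(g->depends, 0, sizeof g->depends);
}

void pdg_add_arc(Pdg *g, int a, int b)
{
    assert(a >= 0 && a < g->count && b >= 0 && b < g->count);
    g->depends[a][b] = true;
}

/* Marks every entity that depends, directly or transitively, on `changed`.
   Returns the number of marked entities (`changed` itself is not counted). */
int pdg_dependents(const Pdg *g, int changed, bool affected[MAX_ENTITIES])
{
    int stack[MAX_ENTITIES];   /* each entity is pushed at most once */
    int top = 0, found = 0;

    assert(changed >= 0 && changed < g->count);
    memset(affected, 0, MAX_ENTITIES * sizeof affected[0]);
    stack[top++] = changed;
    while (top > 0) {
        int b = stack[--top];
        for (int a = 0; a < g->count; a++) {
            if (g->depends[a][b] && !affected[a] && a != changed) {
                affected[a] = true;
                found++;
                stack[top++] = a;
            }
        }
    }
    return found;
}
```

For instance, if a procedure uses a local variable and that variable is of some type, then calling `pdg_dependents` on the type marks both the variable and the procedure as candidates for inspection.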

  3. OBJECTS
     This class consists of all variables, variable parameters, constants, and values of user-defined types.

  4. STATEMENTS
     This class consists of statement-entities, each of which corresponds to a procedure/function in the first class. In this paper, we consider the outermost begin-end compound statement of a procedure, a function, or a program as its statement-entity. We name a statement-entity by concatenating the name of its subprogram with the string ".st".

Entities of any class have similar characteristics, but entities of different classes differ in definition and purpose. To preserve the properties of these classes, the nodes of the PDG are divided into four classes that are named like those of their associated entities. The class to which a node belongs is another attribute of the node. This attribute is vital for view generation and change analysis.

The dependency relations can be classified, too. By examining any Pascal program, we may find a PROCEDURE that depends on an OBJECT parameter, a STATEMENT that depends on a local or a global OBJECT, an OBJECT that depends on a TYPE, a TYPE that depends on an OBJECT, and so forth. These dependencies have different meanings and are thus partitioned into three mutually disjoint classes:

  1. LOCAL dependencies
     This class contains all dependencies between a procedure (or function) and its locally defined entities, and the dependencies between a record type and its field designators.
  2. PARAMETRIC dependencies
     This class includes all relations between procedures (or functions) and their formal parameters.
  3. REFERENCING dependencies
     This class includes the following relations:

     • from an object or a function to its type;
     • from a type to another type; and
     • from a statement to an object variable, a procedure, or a function that is used in the statement.

     If desired, references to variables can be further refined into two subclasses: read references and write references. This helps with data-flow analysis.

The class that an arc belongs to is an attribute of the arc. The attributes of the elements of an APDG are important for displaying the structure of the corresponding software system: they show how the entities of the program interact with each other. Other attributes, such as the location where each entity is defined or referenced, can be used to relate the APDG to its code.
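
The node and arc classes can be written down as C enumerations, and the cases listed for each dependency class can be turned into a simple consistency check. The sketch below is a reading of the lists above under stated assumptions, not code from the prototype: the function name `arc_is_plausible` is hypothetical, and it assumes that record fields are modeled as OBJECT nodes and that functions belong to the PROCEDURE class.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { PROCEDURE, TYPE, OBJECT, STATEMENT } NodeClass;
typedef enum { LOCAL, PARAMETRIC, REFERENCING } ArcClass;

/* Returns true when an arc of class `arc` from a node of class `from`
   to a node of class `to` matches one of the cases listed in the text. */
bool arc_is_plausible(ArcClass arc, NodeClass from, NodeClass to)
{
    switch (arc) {
    case LOCAL:
        /* A procedure/function and its locally defined entities, or a
           record type and its fields (assumed here to be OBJECT nodes). */
        return from == PROCEDURE || (from == TYPE && to == OBJECT);
    case PARAMETRIC:
        /* A procedure/function and its formal parameters. */
        return from == PROCEDURE && to == OBJECT;
    case REFERENCING:
        if (to == TYPE)         /* object, function, or type to a type */
            return from == OBJECT || from == PROCEDURE || from == TYPE;
        if (from == STATEMENT)  /* statement to an object or a subprogram */
            return to == OBJECT || to == PROCEDURE;
        return false;
    }
    assert(!"unknown ArcClass");
    return false;
}
```

A graph-editing operation could call such a check before inserting an arc, so that the class attributes stay meaningful for view generation and change analysis.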

4.2 An Example

Figure 2 shows a partially defined Pascal program named book. In it, the entities sort and swap are PROCEDURES; the entities class, real, and integer are TYPES; and the entities first, list, j, and temp are OBJECTS. There are three STATEMENTS, named book.st, sort.st, and swap.st, which correspond to the statement parts of book, sort, and swap. The majority of these entities are represented by the nodes of the graph of Figure 3. In this graph, each node is attributed with the name of the entity to which it corresponds. Nodes that represent entities of different classes are drawn with different shapes: a parallelogram represents a PROCEDURE, a triangle a TYPE, a square a STATEMENT, and an oval an OBJECT.

There are many dependencies between the entities of the program book. The relationships between sort and swap, between swap and swap.st, and between swap and temp are LOCAL dependencies. The relationships between sort and last and between swap and p are PARAMETRIC dependencies. The relationships between list and class and between sort.st and swap are REFERENCING dependencies. All these dependencies are represented by arcs in the APDG shown in Figure 3. For instance, the first dependency is represented by the arc $\mathit{sort} \xrightarrow{e} \mathit{swap}$, the second by the arc $\mathit{swap} \xrightarrow{e} \mathit{swap.st}$, and so on. Notice that, in this figure, a wide arc represents a LOCAL dependency, a narrow arc a PARAMETRIC dependency, and a dotted arc a REFERENCING dependency.

[Figure 3 graph. Node labels: book, book.st, sort, sort.st, swap, swap.st, class, list, first, last, i, j, p, q, temp]

Figure 3: An Attributed Program Dependency Subgraph

The graph of Figure 3 does not show all the information in the APDG of the program book. For example, the node of the type real and the arcs $\mathit{temp} \xrightarrow{e} \mathit{real}$, $p \xrightarrow{e} \mathit{real}$, and $\mathit{first} \xrightarrow{e} \mathit{integer}$ are not shown. Even a small program has numerous dependencies to show. To use this information effectively, automated tools must be used to display selected information as desired. APDGs are suitable for this purpose.
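
As a small illustration of selective display, the seven dependencies that this example names explicitly can be stored as a table of attributed arcs and filtered by class. The sketch below is hypothetical: the `Arc` type, the `book_arcs` table, and `show_arcs` are not part of the paper, and the table holds only the arcs named in the text, not the full APDG of book.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef enum { LOCAL, PARAMETRIC, REFERENCING } ArcClass;

typedef struct {
    const char *from;
    const char *to;
    ArcClass    cls;
} Arc;

/* Arcs of the program `book` that the example names explicitly. */
const Arc book_arcs[] = {
    { "sort",    "swap",    LOCAL },
    { "swap",    "swap.st", LOCAL },
    { "swap",    "temp",    LOCAL },
    { "sort",    "last",    PARAMETRIC },
    { "swap",    "p",       PARAMETRIC },
    { "list",    "class",   REFERENCING },
    { "sort.st", "swap",    REFERENCING },
};
const size_t book_arc_count = sizeof book_arcs / sizeof book_arcs[0];

/* Prints the arcs of one class and returns how many were printed. */
size_t show_arcs(const Arc *arcs, size_t n, ArcClass wanted, FILE *out)
{
    size_t shown = 0;

    assert(arcs != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (arcs[i].cls == wanted) {
            fprintf(out, "%s -> %s\n", arcs[i].from, arcs[i].to);
            shown++;
        }
    }
    return shown;
}
```

Calling `show_arcs(book_arcs, book_arc_count, LOCAL, stdout)` lists only the three LOCAL arcs, which is the kind of filtered view an analyst would request instead of reading the whole graph.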

5 APDG Generator

The APDG generator is a tool for generating an APDG for any syntactically correct Pascal program. Graph generation is basically a compilation process that generates an APDG instead of a syntax tree. During graph generation, the APDG generator creates a node representation for each entity of the program, assigns attributes to the node, and adds the node to the graph. As mentioned before, node addition is constrained so that the new graph reflects the association between the corresponding program entity and its neighbors.

Let us describe, for example, how to construct the subgraph of the procedure swap as shown in Figure 2. One way to do it is as follows:

  1. Use an operation of the graph generator to construct the subgraph of the swap heading. This operation creates a PROCEDURE node to represent this entity and assigns the name swap to it. It then links this new node to the node sort by the LOCAL arc $\mathit{sort} \xrightarrow{e} \mathit{swap}$. The new node must be inserted as a right sibling of the OBJECT node j. Similarly, this operation creates an OBJECT node for each parameter of swap, i.e., for p and q, and links them to the node swap by the PARAMETRIC arcs $\mathit{swap} \xrightarrow{e} p$ and $\mathit{swap} \xrightarrow{e} q$. It also adds the REFERENCING arcs $p \xrightarrow{e} \mathit{integer}$ and $q \xrightarrow{e} \mathit{integer}$ between each of these nodes and the node of their type integer. These two edges are not shown in Figure 3.
  2. Use another operation to graph the block of swap. Like the parameters p and q, the local variable temp is represented by an OBJECT node; this operation links it to swap by the LOCAL arc $\mathit{swap} \xrightarrow{e} \mathit{temp}$ and to real by the arc $\mathit{temp} \xrightarrow{e} \mathit{real}$ (which is not shown). It finishes the subgraph of swap by creating a STATEMENT node

6 View Generator

This tool uses the information contained in an APDG to generate views of the original program and answer many queries about it. Its capabilities play a major role in solving the problems of understanding software systems. The APDGs' ready-to-use information facilitates the derivation of many views. The list of these views could be long; it includes the following:

• Structural information. These include:
  - Maps of structured types
  - Local components of procedures/functions
  - The values of a user-defined type
  - The parameters of a given procedure

• Cross-referencing tables. These include:
  - Variables of a given type
  - Statements that use a given variable
  - Variables, procedures, and functions used within a procedure statement
  - Locations of a given reference
  - Procedures that directly call a given procedure
  - Procedures that are directly called by a given one

• Miscellaneous views. The following views can be generated from APDGs:
  - Program metrics
  - Call graphs
  - Program anomalies
  - Structure charts

Recall that these views are generated for single-file programs. Extending the graph-based representation to multi-file programs will change, in number and importance, the views that can be generated from it. More examples are given in Section 9.
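
One of the cross-referencing views listed above, the procedures that directly call a given procedure, can be derived from REFERENCING arcs alone. The following C sketch assumes a flat array encoding in which each statement node records its enclosing procedure as `parent`; the `Node` and `RefArc` types and the function `direct_callers` are hypothetical and stand in for whatever the view generator actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { PROCEDURE, TYPE, OBJECT, STATEMENT } NodeClass;

typedef struct {
    const char *name;
    NodeClass   cls;
    int         parent;   /* index of the enclosing node, or -1 */
} Node;

typedef struct { int from, to; } RefArc;   /* REFERENCING arcs only */

/* Writes into `callers` the indices of procedures whose statement part
   references procedure `callee`, without duplicates. Returns how many
   were stored; callers beyond `max_callers` are dropped. */
size_t direct_callers(const Node *nodes, size_t node_count,
                      const RefArc *refs, size_t ref_count,
                      int callee, int *callers, size_t max_callers)
{
    size_t found = 0;

    assert(callee >= 0 && (size_t)callee < node_count);
    assert(nodes[callee].cls == PROCEDURE);
    for (size_t i = 0; i < ref_count; i++) {
        assert(refs[i].from >= 0 && (size_t)refs[i].from < node_count);
        const Node *src = &nodes[refs[i].from];
        if (refs[i].to != callee || src->cls != STATEMENT || src->parent < 0)
            continue;
        bool seen = false;   /* a statement may call the callee twice */
        for (size_t k = 0; k < found; k++)
            if (callers[k] == src->parent)
                seen = true;
        if (!seen && found < max_callers)
            callers[found++] = src->parent;
    }
    return found;
}
```

The reverse view, procedures directly called by a given one, follows the same arcs in the other direction, starting from the statement node of the caller.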

7 Sample Change Analysis Tools

Changing the code of a software system usually involves changes to the system's structure, its functionality, or both. Since no automatic tools can detect these structural and functional changes, human analysts must carry this burden. We have designed several change analysis tools to assist in analyzing Pascal programs. After a structural change has been detected, these tools can, first, find the effects of this change on the program and communicate them to the user and, second, direct the human analyst's attention to the regions where functionality may be affected by the original change. The tools may also give hints to indicate the type of change that has been effected.

We based all change analysis tools on APDGs. These graphs contain the ready-to-use structural information necessary for change analysis. In our prototype application, we have designed several operations to analyze adding (or deleting) any component to (or from) an APDG. We also designed, and are now implementing, operations to analyze renaming a node of the graph. Other operations such as copy, move, replace, and undo can be implemented similarly; we plan to do so at a later time.

Change analysis operations can manipulate partially defined graphs. They do not check the completeness of the APDG; they operate on whatever information is available and react accordingly. This allows the incremental development of graph-based representations.

7.1 Operations that analyze additions

These operations analyze the addition of one or more components to an APDG. They are designed to check the validity of any additions to the graph, a step that is required to retain the valid properties of the APDG being changed. If an addition is acceptable, the analyzing operations evaluate any direct side effects of the action on the existing graph-based representation and communicate those effects to the user. If the addition is

Deleting a REFERENCING arc is easy to perform, but it might create unused or undefined entities and could thus initiate a sequence of one or more other deletions. Deleting a node is a straightforward process. However, it can create referencing problems which must be solved. In any case, the operations communicate any side effects of the change to the user; according to the user's wishes, the deletion is either implemented or aborted.

Deleting a subgraph can be seen as a sequence of the deletions described above. An operation finds all references to outside nodes and checks the way in which their deletion will affect the nodes. It also finds all references to the deleted node and checks whether these unresolved references can be redefined, using the remaining APDG. Consider, for example, the deletion of a record type subgraph r. An analyzing operation finds the set of all nodes referenced by the definitions of r's fields. It then checks whether deleting these references would leave any outside nodes unused. This operation also finds all references to type r and tries to resolve them, using the remaining graph. If resolution is not possible, the operation will either delete or flag these references. In all cases, the analyzer will communicate all its findings to the user, who can then trigger further analysis steps.
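
The first case above, deleting a REFERENCING arc and detecting that its target has become unused, can be sketched as follows. This is a minimal illustration under assumed data structures (an array of arcs with a `deleted` flag); `RefArc`, `reference_count`, and `delete_reference` are hypothetical names, and the decision to continue with further deletions is left to the caller, in line with the user-driven process described in the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int from, to; bool deleted; } RefArc;

/* Counts the live REFERENCING arcs that point at `node`. */
size_t reference_count(const RefArc *refs, size_t n, int node)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        if (!refs[i].deleted && refs[i].to == node)
            count++;
    return count;
}

/* Marks arc `index` as deleted and reports whether its target is now
   unreferenced, so the caller can offer a follow-up deletion to the user. */
bool delete_reference(RefArc *refs, size_t n, size_t index)
{
    assert(index < n && !refs[index].deleted);
    refs[index].deleted = true;
    return reference_count(refs, n, refs[index].to) == 0;
}
```

A `true` result corresponds to the situation in which a deletion leaves an entity unused and may start a chain of further deletions.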

7.3 Operations that analyze renaming

This operation analyzes the renaming of an APDG node. This change corresponds to renaming an entity of the program. First, this operation checks whether the name is acceptable to its parent node, its siblings, and its children (if any). This check is required in order to eliminate any possible naming conflicts in the surrounding environment. If the name is acceptable, this operation searches the scope of the new name for any references to a node with a similar name. If there are any, the user must decide whether to switch these references (if permissible) or to abort the renaming action. This operation also checks whether any node remains unused because all references to it have been deleted.
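
The first check of the renaming operation, comparing the proposed name against the parent, the siblings, and the children of the node, might look like the sketch below. The `Node` type here is a simplified, hypothetical stand-in for the APDG node, and the comparison ignores letter case because Pascal identifiers are not case-sensitive. The scope search for existing references that the text describes next is not covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    const char  *name;
    struct Node *parent;
    struct Node *leftmost_child;
    struct Node *right_sibling;
} Node;

/* Compares two identifiers, ignoring letter case. */
bool same_identifier(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* True when a node in the sibling chain `list`, other than `self`,
   already carries `name`. */
bool chain_has_name(const Node *list, const Node *self, const char *name)
{
    for (const Node *n = list; n != NULL; n = n->right_sibling)
        if (n != self && same_identifier(n->name, name))
            return true;
    return false;
}

/* Checks a proposed name against the node's parent, siblings, and children. */
bool rename_conflicts(const Node *node, const char *new_name)
{
    assert(node != NULL && new_name != NULL);
    if (node->parent != NULL) {
        if (same_identifier(node->parent->name, new_name))
            return true;
        if (chain_has_name(node->parent->leftmost_child, node, new_name))
            return true;
    }
    return chain_has_name(node->leftmost_child, node, new_name);
}
```

In the example of Section 4.2, renaming swap to j would be rejected because j is a sibling of swap under sort, while a fresh name would pass this first check.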

8 Current Status

We are now implementing a prototype system based on the approach described in this paper. We are using the programming language C on SUN workstations. To implement the APDG, we are using generalized linked lists with a uniform node structure. The definition of the node type of this structure is shown in Figure 4. This definition reveals the information that the APDGs have in this prototype.

typedef struct GraphNode {            /* Node attributes */
    int  IdNo;                        /* Identification number */
    int  EntityOffset;                /* Entity location */
    char Name[MaxIdentLen];           /* Node name */
    int  NodeType;                    /* Node type */

    struct GraphNode *LeftmostChild,  /* Links to represent */
                     *Parent,         /* context of the */
                     *LeftSibling,    /* corresponding entity. */
                     *RightSibling;

    struct Adjacent {                 /* Cross-references */
        struct Adjacent  *Next;
        struct GraphNode *Reference;
    } *InEdges, *OutEdges;
} GraphNode;

Figure 4: A Prototype Structure of the APDG Nodes
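
To show how the node structure of Figure 4 might be used, the sketch below adds three hypothetical graph-editing helpers: `new_node`, `append_child`, and `add_reference`. They are not the operations of the prototype. The value of `MaxIdentLen` is assumed, and the sketch further assumes that the child and sibling links carry the context (LOCAL and PARAMETRIC structure) while `InEdges` and `OutEdges` carry cross-references.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MaxIdentLen 32   /* assumed value; the paper does not state it */

typedef struct GraphNode {
    int  IdNo;
    int  EntityOffset;
    char Name[MaxIdentLen];
    int  NodeType;
    struct GraphNode *LeftmostChild, *Parent, *LeftSibling, *RightSibling;
    struct Adjacent {
        struct Adjacent  *Next;
        struct GraphNode *Reference;
    } *InEdges, *OutEdges;
} GraphNode;

/* Allocates a detached node; returns NULL if memory runs out. */
GraphNode *new_node(int id, const char *name, int type)
{
    GraphNode *n = calloc(1, sizeof *n);

    if (n == NULL)
        return NULL;
    n->IdNo = id;
    n->NodeType = type;
    strncpy(n->Name, name, MaxIdentLen - 1);  /* last byte stays zero */
    n->LeftmostChild = n->Parent = n->LeftSibling = n->RightSibling = NULL;
    n->InEdges = n->OutEdges = NULL;
    return n;
}

/* Appends `child` as the rightmost child of `parent`. */
void append_child(GraphNode *parent, GraphNode *child)
{
    assert(parent != NULL && child != NULL && child->Parent == NULL);
    child->Parent = parent;
    if (parent->LeftmostChild == NULL) {
        parent->LeftmostChild = child;
        return;
    }
    GraphNode *last = parent->LeftmostChild;
    while (last->RightSibling != NULL)
        last = last->RightSibling;
    last->RightSibling = child;
    child->LeftSibling = last;
}

/* Records the cross-reference from -> to in both adjacency lists.
   Returns 0 on success and -1 if memory runs out. */
int add_reference(GraphNode *from, GraphNode *to)
{
    struct Adjacent *out = malloc(sizeof *out);
    struct Adjacent *in  = malloc(sizeof *in);

    if (out == NULL || in == NULL) {
        free(out);
        free(in);
        return -1;
    }
    out->Reference = to;
    out->Next = from->OutEdges;
    from->OutEdges = out;
    in->Reference = from;
    in->Next = to->InEdges;
    to->InEdges = in;
    return 0;
}
```

With these helpers, appending j and then swap under sort leaves swap as the right sibling of j, which matches the placement described for the swap subgraph in Section 5. Keeping both `InEdges` and `OutEdges` lets an analysis follow a reference in either direction without scanning the whole graph.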

In this prototype, we designed three layers of operations: graph-editing operations, graph-manipulating operations, and user-interface operations. The first layer, the kernel of the system, is a set of operations used to cre-