













































This article describes a new approach to the unit testing of object-oriented programs, a set of tools based on this approach, and two case studies. In this approach, each test case consists of a tuple of sequences of messages, along with tags indicating whether these sequences should put objects of the class under test into equivalent states and/or return objects that are in equivalent states. Tests are executed by sending the sequences to objects of the class under test, then invoking a user-supplied equivalence-checking mechanism. This approach allows for substantial automation of many aspects of testing, including test case generation, test driver generation, test execution, and test checking. Experimental prototypes of tools for test generation and test execution are described. The test generation tool requires the availability of an algebraic specification of the abstract data type being tested, but the test execution tool can be used when no formal specification is available. Using the test execution tools, case studies involving execution of tens of thousands of test cases, with various sequence lengths, parameters, and combinations of operations, were performed. The relationships among likelihood of detecting an error and sequence length, range of parameters, and relative frequency of various operations were investigated for priority queue and sorted-list implementations having subtle errors. In each case, long sequences tended to be more likely to detect the error, provided that the range of parameters was sufficiently large, and the likelihood of detecting an error tended to increase up to a threshold value as the parameter range increased.
Categories and Subject Descriptors: D.2.1 [Software Engineering]: Requirements/Specifications—languages; D.2.5 [Software Engineering]: Testing and Debugging—symbolic execution; test data generators; D.3.2 [Programming Languages]: Language Classifications—object-oriented languages; D.3.3 [Programming Languages]: Language Constructs and Features—abstract data types
General Terms: Algorithms, Experimentation, Languages, Reliability
Additional Key Words and Phrases: Abstract data types, algebraic specification, object-oriented programming, software testing
This research was supported in part by NSF grants CCR-8810287 and CCR-9003006 and by the New York State Science and Technology Foundation, and was performed while the first author was at Polytechnic University.
Authors' addresses: R. K. Doong, Sun Microsystems Laboratories, 2550 Garcia Avenue, Mountain View, CA 94043; email: [email protected]. sun.tom; P. G. Frankl, Department of Computer Science, Polytechnic University, 6 Metrotech Center, Brooklyn, NY 11201; email: phyllis@morph.poly.edu.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1994 ACM 1049-331X/94/0400-0101 $03.
ACM Transactions on Software Engineering and Methodology, Vol. 3, No. 2, April 1994, pages 101-130.
Object-oriented programming, based on the concepts of data abstraction, inheritance, and dynamic binding, is becoming an increasingly popular software development methodology. Much research has been done on developing object-oriented analysis and design techniques, developing object-oriented programming languages, and exploring how the methodology changes the software development process. Yet relatively little research has addressed the question of how object-oriented programs should be tested.

We have developed a new approach to unit testing object-oriented programs, which is based on the ideas that the natural units to test are classes, and that in testing classes, one should focus on the question of whether a sequence of messages puts an object of the class under test into the “correct” state. In this approach, roughly speaking, each test case consists of a pair of sequences of messages, along with a tag indicating whether these sequences should result in objects that are in the same “abstract state.” A test case is executed by sending each sequence of messages to an object of the class under test, invoking a user-supplied equivalence-checking routine to check whether the objects are in the same abstract state, then comparing the result of this check to the tag. This testing scheme has several nice properties:
of Tools for Object-Oriented Testing, which includes an interactive specification-based test case generation tool and a tool that automatically generates test drivers. For any class C, ASTOOT can automatically generate a test driver, which in turn automatically executes test cases and checks their results. Additionally, when an algebraic specification for C is available, ASTOOT can partially automate test generation. Thus the system allows for substantial automation of the entire testing process. The current version of ASTOOT is targeted to testing programs written in Eiffel.™ Throughout this article we assume that the classes being tested are written in Eiffel. However, the underlying ideas and tools can be adapted relatively easily to other object-oriented languages.

In Section 2, we review relevant background material on software testing, object-oriented programming, and algebraic specification of abstract data types. Section 3 describes the ideas underlying ASTOOT—correctness of a
™ Eiffel is a trademark of the Nonprofit International Consortium for Eiffel (NICE).
trivial problem, for example, if there is a great deal of output, or if it is difficult to calculate the correct value [Weyuker 1982]. Our testing method uses a novel approach which allows the correctness of test cases to be checked automatically by the test execution system.
Object-oriented languages support abstract data types, inheritance, and dynamic binding. An abstract data type is an entity that encapsulates data and the operations for manipulating that data. In object-oriented programming, the programmer writes class definitions, which are implementations of abstract data types. An object is an instance of a class; it can be created dynamically by the instantiation operation, often called “new” or “create.” A language supports inheritance if classes are organized into a directed acyclic graph in which definitions are shared, reflecting common behavior of objects of related classes. A class consists of an interface, which lists the operations that can be performed on objects of that class, and a body, which implements those operations. The state of an object is stored in instance variables (sometimes called attributes), which are static variables, local to the object. A class's operations are sometimes called methods. In object-oriented programs, computation is performed by “sending mes-
some arguments. The invoked method may then modify the state of its object and/or send messages to other objects. When a method completes execution, it returns control (and in some cases returns a result) to the sender of the message. The inheritance mechanism of object-oriented languages facilitates the development of new classes which share some aspects of the behavior of old ones. A descendent (subclass) Cd of a class C inherits the instance variables and methods of C. Cd may extend the behavior of C by adding additional instance variables and methods, and/or specialize C by redefining some of C’s methods to provide alternative implementations. A dynamic binding mechanism is used to associate methods with objects. In strongly typed object-oriented languages, it is legal to assign an object of class Cd to a variable of class C, but not vice versa. After doing so, a message sent to this object will invoke the method associated with class Cd. For example, consider a class POLYGON with subclasses TRIANGLE and SQUARE, each of which redefines POLYGON’s perimeter method. Assigning an object of class SQUARE to a variable of class POLYGON, then sending the perimeter
polymorphic data types. Some examples of object-oriented languages include Smalltalk, C++, and Eiffel [Goldberg and Robson 1983; Meyer 1988; Stroustrup 1991]. While Ada and Modula-2 are not, strictly speaking, object-oriented languages, they do provide support for data abstraction; thus, some of the ideas discussed here
The ASTOOT Approach to Testing. 105
approach.
Before we can talk about how to test a class C, we must have some concept of what it means for C to be correct. Thus, we must have some means, formal or informal, of specifying the entity that C is intended to implement and of stating the conditions under which the implementation conforms to the specification. In the case where C is intended to implement an abstract data type, algebraic specifications provide a formal means of doing this.

An algebraic specification has a syntactic part and a semantic part. The syntactic part consists of function names and their signatures (the types they take as input and produce as output). In an algebraic specification of type T, functions which return values of types other than T are called observers, because they provide the only ways for us to query the contents of T. Functions which return values of type T are called constructors or transformers.¹ The distinction between constructors and transformers is clarified below. The semantic part of the specification consists of a list of axioms describing the relation among the functions. Some specification techniques allow for a list of preconditions describing the domains of the functions, while others allow functions to return error values indicating that a function has been applied to an element outside of its domain.

Term rewriting [Knuth and Bendix 1970] has been used to define a formal semantics for algebraic specifications [Goguen and Winkler 1988; Musser 1980]. Two sequences S1 and S2 of operations of ADT T are equivalent if we can use the axioms as rewrite rules to transform S1 to S2.² A specification can then be modeled by a heterogeneous word algebra, in which the elements are equivalence classes of sequences of operations.

For a specification S to be useful, it must be consistent and sufficiently complete [Guttag and Horning 1978]. A consistent specification must not contain contradictory axioms, i.e., no contradiction should be derivable from any operation sequences of the specification. Let W be the set containing all the operation sequences consisting of constructors or transformers of S. S is sufficiently complete if, for every sequence w in W, the result of applying each observer of S to w is defined. Discussion of how to construct useful algebraic specifications can be found in Antoy [1989] and Guttag [1977; 1980].

Most algebraic specification languages use a functional notation. For convenience, we have designed a specification language, LOBAS, whose syntax is similar to OO programming language syntax [Doong 1993]. The syntactic part of a LOBAS specification includes an export section which lists operations available to the users of the ADT. In LOBAS, the designer of a
¹ Transformers are called extensions in Guttag and Horning [1978].
² This definition follows the assumption of Goguen et al. [1978]; Guttag et al. [1978] makes the opposite assumption, i.e., that two sequences may be assumed to be equivalent unless provably inequivalent.
class Priority_Queue export
    create, largest, add, delete, empty, eqn
constructor
    create;
    add(x: Integer)
transformer
    delete
observer
    empty: Boolean;
    largest: Integer;
    eqn(B: Priority_Queue): Boolean
var
    A, B: Priority_Queue;
    x, y: Integer
axiom
    1: create.empty -> true;
    2: A.add(x).empty -> false;
    3: create.largest -> -∞;
    4: A.add(x).largest -> if x > A.largest then x else A.largest;
    5: create.delete -> create;
    6: A.add(x).delete -> if x > A.largest then A else A.delete.add(x);
    7: A.eqn(B) -> if A.empty and B.empty then true
                   else if (A.empty and not B.empty) or (not A.empty and B.empty) then false
                   else if A.largest = B.largest then A.delete.eqn(B.delete)
                   else false
end

(a) Specification in LOBAS
type Priority_Queue
syntax
    create: -> Priority_Queue;
    add: Priority_Queue × Integer -> Priority_Queue;
    delete: Priority_Queue -> Priority_Queue;
    empty: Priority_Queue -> Boolean;
    largest: Priority_Queue -> Integer;
    eqn: Priority_Queue × Priority_Queue -> Boolean;
declare
    A, B: Priority_Queue;
    x, y: Integer;
semantics
    1: empty(create) -> true;
    2: empty(add(A, x)) -> false;
    3: largest(create) -> -∞;
    4: largest(add(A, x)) -> if x > largest(A) then x else largest(A);
    5: delete(create) -> create;
    6: delete(add(A, x)) -> if x > largest(A) then A else add(delete(A), x);
    7: eqn(A, B) -> if empty(A) and empty(B) then true
                    else if (empty(A) and not empty(B)) or (not empty(A) and empty(B)) then false
                    else if largest(A) = largest(B) then eqn(delete(A), delete(B))
                    else false
end

(b) Specification in functional notation

Fig. 1. Specifications of the priority queue.
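The axioms of Fig. 1 can be read as executable rewrite rules. The following is a minimal Python sketch under stated assumptions, not part of ASTOOT: a priority queue value is represented by the tuple of arguments of its add operations (the normal form create.add(...)...add(...)), and the names CREATE, add, empty, largest, delete, and eqn are introduced here only to mirror the specification.

```python
import math

# Assumption: a queue value is the tuple of add-arguments in its normal form.
CREATE = ()  # normal form of `create`


def add(a, x):
    return a + (x,)


def empty(a):
    # Axioms 1 and 2: only `create` is empty.
    return len(a) == 0


def largest(a):
    if not a:
        return -math.inf                      # axiom 3
    rest, x = a[:-1], a[-1]
    rest_largest = largest(rest)
    return x if x > rest_largest else rest_largest   # axiom 4


def delete(a):
    if not a:
        return a                              # axiom 5
    rest, x = a[:-1], a[-1]
    if x > largest(rest):
        return rest                           # axiom 6, then-branch
    return add(delete(rest), x)               # axiom 6, else-branch


def eqn(a, b):
    # Axiom 7, transcribed directly.
    if empty(a) and empty(b):
        return True
    if empty(a) != empty(b):
        return False
    if largest(a) == largest(b):
        return eqn(delete(a), delete(b))
    return False


# create.add(5).add(3).delete rewrites to create.add(3)
print(delete(add(add(CREATE, 5), 3)))
```

Because delete is rewritten away, every sequence of constructors and transformers reduces to a sequence of adds, which is what makes the tuple representation adequate for this sketch.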
tions. Furthermore, testing each method individually necessitates the construction of complicated drivers and output-checking mechanisms. For exam-
priority queue and an item, and the output would be another priority queue. Thus the driver would have to initialize the input priority queue, and checking the output would entail examining the output priority queue to see if it is the correct result. In contrast, our approach to testing classes focuses on the interaction of operations. In this section, we restrict attention to classes intended to implement ADTs. We require that
(1) operations have no side effects on their parameters,
(2) functions (observers) have no side effects,
(3) functions (observers) can only appear as the last operation of a sequence, and
(4) when a sequence is passed as a parameter to an operation it must not contain any functions (observers).
The main reason for placing restrictions 1 and 2 is that we cannot specify these kinds of side effects by using either LOBAS or purely algebraic
languages. The reason behind restriction 3 is that sequences that mix functions and procedures are not syntactically valid in LOBAS or other algebraic specification languages [McLean 1984]. Restriction 4 makes it easier to generate test cases using ASTOOT. Note that restriction 4 does not hinder our ability to express test cases involving any parameters to an operation, since when function f has no side effects on its target object (the object to which the message is sent), the target object of a sequence S.f will be observationally equivalent to the target object of S. Techniques for relaxing restrictions 2, 3, and 4 are discussed in Doong [1993].
Consider a class C, intended to implement abstract data type T. Each function in T corresponds to a method of C, and inputting a value of type T to a function corresponds to sending a message to an object of class C. In Eiffel, constructors and transformers are typically coded as procedures; rather than explicitly returning an object of class C, such a procedure “returns” a value by modifying the state of the object to which it has been applied. An observer can be coded as a function which explicitly returns an object of another class. We will refer to the object to which a function or procedure message is sent as the target object and to the object returned as the returned object. For procedures, the target object and the returned object are the same (though typically the value of the target object will be changed by the procedure call). Notice that in addition to explicitly returning an object, a function also implicitly “returns” its target object. If the function is side-effect free, then the value of the target object will be unchanged by the function call. The syntax of LOBAS, unlike the functional syntax of most algebraic specification languages, allows us to differentiate between the target and returned values. For example, in the sequence create.add(5).add(3).largest the final value of the target is a priority queue whose elements are 5 and 3, and the returned value is 5.

We will say that objects O1 and O2 of class C are observationally equivalent if and only if
—C is a built-in class, and O1 and O2 have identical values; or
—C is a user-defined class, and for any sequence S of operations of C ending in a function returning an object of class C′, O1.S is observationally equivalent to O2.S as objects of class C′.
Thus, O1 is observationally equivalent to O2 if and only if it is impossible to distinguish O1 from O2 using the operations of C and related classes. Two observationally equivalent objects are in the same “abstract state,” even though the details of their representations may be different. For example, consider a circular array implementation of a first-in-first-out (FIFO) queue. Two arrays containing the same elements in the same order would be observationally equivalent (as queues), even though the elements could occupy different portions of the underlying arrays. We now define the notion of correctness that underlies our approach.
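The circular-array example can be made concrete with a small sketch. The Python class below is a hypothetical illustration (the names CircularQueue, enqueue, dequeue, contents, and the capacity of 4 are assumptions, not taken from the article): two queues hold the same elements in the same order but in different slots of their underlying arrays, so their representations differ while their abstract states agree.

```python
class CircularQueue:
    """Hypothetical fixed-capacity FIFO queue stored in a circular array."""

    def __init__(self, capacity=4):
        self._slots = [None] * capacity
        self._head = 0      # index of the oldest element
        self._size = 0

    def enqueue(self, x):
        if self._size == len(self._slots):
            raise OverflowError("queue full")
        self._slots[(self._head + self._size) % len(self._slots)] = x
        self._size += 1

    def dequeue(self):
        if self._size == 0:
            raise IndexError("queue empty")
        x = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return x

    def contents(self):
        # Elements from oldest to newest, independent of their array position.
        n = len(self._slots)
        return [self._slots[(self._head + i) % n] for i in range(self._size)]

    def eqn(self, other):
        # Implementation-level equivalence: same elements in the same order.
        return self.contents() == other.contents()


q1 = CircularQueue()
q1.enqueue(1)
q1.enqueue(2)

q2 = CircularQueue()
q2.enqueue(9)
q2.dequeue()        # shifts the head, so later elements land in other slots
q2.enqueue(1)
q2.enqueue(2)

print(q1._slots != q2._slots, q1.eqn(q2))
```

Comparing the raw arrays would wrongly distinguish q1 from q2; an equivalence check has to look through the representation to the abstract state.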
automated execution and checking of test cases. Of course, when generating such test cases, it is necessary to consider the specification of the ADT in order to derive the tags. This can either be done semiautomatically by manipulating a formal specification, as described in Section 4, or manually by reasoning about a formal or informal specification. For example, consider a priority queue of integers, whose functions are described informally as follows:
create—creates an empty priority queue,
add—adds an integer to the priority queue,
delete—removes the largest element of the priority queue,
largest—returns the value of the largest element of the priority queue, without modifying the contents of the priority queue, and
By reasoning about this informal specification, a person can generate test cases such as
(1) (create.add(5).add(3).delete, create.add(3), equivalent),
(2) (create.add(5).add(3).delete.largest, create.add(3).largest, equivalent),
(3) (create.add(5).add(3).delete, create.add(5), not-equivalent), and
applying delete should be the same as creating an empty priority queue and adding 3 to it. Test case 2 says that the objects returned by applying largest to those two priority queues should be equivalent. Test case 3 says that if we create an empty priority queue, add 5 and 3, then delete, it should not be the same as if we create an empty priority queue and add 5 to it. Test case 4 says that a priority queue obtained by adding 5 then adding 3 should be observationally equivalent to one obtained by adding 3 then adding 5. Unlike the previous three test cases, this test case captures an aspect of the informal specification that is not expressed in the formal specification,³ and thus it cannot be derived from the formal specification by using term rewriting. This indicates that, even when a formal specification that partially describes the intended semantics of an ADT is available, manual generation of additional test cases may be useful. We refer to test cases consisting of a pair of sequences along with a tag as restricted-format test cases. More general test case formats which are useful for testing classes involving side effects and dynamic binding are introduced in Doong [1993].
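Executing restricted-format test cases can be sketched as follows. This is a minimal Python sketch, not ASTOOT's Eiffel driver: the class PriorityQueue, the OBSERVERS set, and the functions run_sequence, equivalent, and run_test_case are hypothetical names, the list-based eqn is an implementation-level assumption, and create is represented implicitly by instantiating the class.

```python
import math


class PriorityQueue:
    """Hypothetical class under test; instantiation plays the role of `create`."""

    def __init__(self):
        self._items = []

    def add(self, x):
        self._items.append(x)

    def delete(self):
        if self._items:
            self._items.remove(max(self._items))

    def largest(self):
        return max(self._items, default=-math.inf)

    def empty(self):
        return not self._items

    def eqn(self, other):
        return sorted(self._items) == sorted(other._items)


OBSERVERS = {"largest", "empty"}   # functions; allowed only as the last operation


def run_sequence(cls, ops):
    """Send each message to a fresh object; return the returned object."""
    target = cls()
    returned = target              # for procedures, target and returned object coincide
    for name, *args in ops:
        result = getattr(target, name)(*args)
        if name in OBSERVERS:
            returned = result
    return returned


def equivalent(a, b):
    # User-defined classes supply eqn; built-in values are compared directly.
    return a.eqn(b) if hasattr(a, "eqn") else a == b


def run_test_case(cls, seq1, seq2, tag):
    """Return True when the observed equivalence agrees with the tag."""
    observed = equivalent(run_sequence(cls, seq1), run_sequence(cls, seq2))
    return observed == (tag == "equivalent")


TEST_CASES = [
    ([("add", 5), ("add", 3), ("delete",)], [("add", 3)], "equivalent"),
    ([("add", 5), ("add", 3), ("delete",), ("largest",)],
     [("add", 3), ("largest",)], "equivalent"),
    ([("add", 5), ("add", 3), ("delete",)], [("add", 5)], "not-equivalent"),
    ([("add", 5), ("add", 3)], [("add", 3), ("add", 5)], "equivalent"),
]

results = [run_test_case(PriorityQueue, s1, s2, tag) for s1, s2, tag in TEST_CASES]
print(results)
```

A False entry in results would mean that the observed equivalence disagrees with the tag, which is how a test case exposes an error without any hand-written expected output.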
We now discuss the EQN operation. Ideally, the EQN operation in class C should check whether two objects O1 and O2 of class C are observationally
³ If an axiom such as A.add(x).add(y) -> A.add(y).add(x) were added to the specification, this aspect of the informal specification would be captured. However, the resulting specification would no longer satisfy the finite termination condition.
equivalent; that is, it should check whether any sequence of messages ending in an observer yields the same result when sent to O1 as when sent to O2. Since it is clearly impossible to send every such message sequence to the objects, in practice EQN will approximate a check for observational equivalence. It is often quite easy to produce a recursive version of EQN from the specification of the ADT which C is intended to implement. For example, axiom 7 of Figure 1 specifies such an EQN function based on the priority queue specification. Note that this is actually only an approximation of true observational equivalence because it neglects the possible effects of “building up” the priority queues, then removing elements. Thus, it might say that two objects are equivalent when they are not.⁴ Also, since EQN calls largest, and
to mask out the error. On the other hand, the error propagation can also help in error detection, as demonstrated in Section 5.1.

Another approach to developing the EQN function is to write it at the “implementation level.” In this approach, EQN is based on detailed knowledge of how data is represented and manipulated in the class body. For example, knowing that a FIFO queue is represented as a linked list, one can traverse the two lists comparing the elements. In general, if sufficient attention is paid to the details of the representation, EQN can implement observational equivalence exactly. On the other hand, it is possible that the same misconceptions which lead to implementation errors in C's other methods may lead to errors in EQN. Furthermore, for some representations of some data structures, writing an implementation-level EQN operation may be extremely difficult and error prone, even when the other methods are relatively simple.

It is also sometimes possible to use a very coarse approximation of observational equivalence as the EQN function. For example, we might consider two FIFO queues to be equivalent if they have the same number of elements, or if they have the same front element. This version of EQN may consider two inequivalent objects to be equivalent. Naturally, using a coarser approximation of observational equivalence will lead to less accuracy in the test results. Bernot et al. [1991] discuss a closely related problem and suggest that an “oracle hypothesis” be explicitly stated. In the context of our approach to testing, such a hypothesis would describe the conditions under which the implementation of EQN is equivalent to an actual check for observational equivalence.
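To see how an approximate EQN can report equivalence for objects that are not observationally equivalent, consider an implementation that empties the priority queue when the total number of adds reaches some N > 2. The Python sketch below is illustrative only: the class LeakyPriorityQueue, the value N = 4, and the loop-based eqn (an iterative reading of axiom 7 of Figure 1, run on copies so the originals are not modified) are assumptions made for this example.

```python
import copy
import math

N = 4  # hypothetical threshold; any N > 2 shows the same effect


class LeakyPriorityQueue:
    """Priority queue with a seeded bug: it empties itself on the N-th add."""

    def __init__(self):
        self.items = []
        self.adds = 0

    def add(self, x):
        self.items.append(x)
        self.adds += 1
        if self.adds == N:
            self.items.clear()      # the seeded error

    def delete(self):
        if self.items:
            self.items.remove(max(self.items))

    def largest(self):
        return max(self.items, default=-math.inf)

    def empty(self):
        return not self.items


def eqn(a, b):
    """Specification-level EQN: compare largest elements, delete, repeat."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    while not a.empty() and not b.empty():
        if a.largest() != b.largest():
            return False
        a.delete()
        b.delete()
    return a.empty() and b.empty()


o1 = LeakyPriorityQueue()          # create.add(1).add(2).delete
o1.add(1)
o1.add(2)
o1.delete()

o2 = LeakyPriorityQueue()          # create.add(1)
o2.add(1)

eqn_verdict = eqn(o1, o2)          # EQN sees the same contents in both objects

# Distinguishing continuation: N - 2 further adds, then N - 2 deletes.
for q in (o1, o2):
    for x in range(10, 10 + N - 2):
        q.add(x)
    for _ in range(N - 2):
        q.delete()

print(eqn_verdict, o1.empty(), o2.empty())
```

EQN only inspects what can be observed by deleting elements, so it cannot see the hidden add counter; the two objects are told apart only by a sequence that first builds the queues up further.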
ASTOOT is a set of tools based on the approach described in Section 3. The current prototype, which handles test cases in the restricted format, has
⁴ For example, consider an implementation which completely empties the priority queue whenever the total number of adds performed reaches a particular number N > 2. The recursive EQN would consider O1.create.add(1).add(2).delete equivalent to O2.create.add(1), but in fact, performing an additional N - 2 adds followed by N - 2 deletes on each object would leave O1 empty and leave O2 nonempty.
other hand, drivers are complicated enough that writing them manually is a tedious and error-prone task. In particular, checking the syntactic validity of the operation sequences involves complicated parsing and type checking. For example, our driver for the priority queue class has over 400 lines of code (not counting inherited classes), most of which deals with checking the syntax of the operation sequences. Luckily, drivers for testing different classes are very similar to one another in structure. This has allowed us to write a tool, the driver generator, which automatically generates test drivers. The driver generator can be viewed as a special-purpose parser generator, which, based on the syntax described in the class interfaces, generates test drivers that parse test cases, as well as executing and checking them.

The driver generator, DG, operates in three phases. The first phase is to collect information about interfaces of the CUT, its ancestors, and all the classes which are parameter types or return types of CUT's operations. DG first checks whether each of these classes has an exported EQN operation. (If, like Eiffel, the implementation language has the facility of selective export, then we can let EQN be exported only to the test driver, so the integrity of the implementation can be preserved.) In the second phase, DG builds a test driver, which is a class in the implementation language. The current version of the driver generator is targeted to Eiffel 2.1, but the underlying ideas can be applied to other OO languages. In the third phase, DG compiles and executes the test driver with test cases supplied by the user.
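The syntax-checking part of a driver can be illustrated with a small sketch. The Python code below is not the generated Eiffel driver: the table PQ_INTERFACE (operation name mapped to its number of integer arguments), the regular expression, and the function parse_sequence are hypothetical, and only integer parameters are handled. It checks that each operation belongs to the interface, that arities match, and that create appears exactly once, as the first operation.

```python
import re

# Hypothetical interface table for the priority queue: name -> number of integer arguments.
PQ_INTERFACE = {"create": 0, "add": 1, "delete": 0, "largest": 0, "empty": 0}

# One operation: a name, optionally followed by a parenthesized list of integers.
OP = re.compile(r"(\w+)(?:\((-?\d+(?:,-?\d+)*)\))?$")


def parse_sequence(text, interface):
    """Parse e.g. 'create.add(5).add(3).delete' into [(name, [args]), ...]."""
    ops = []
    for position, token in enumerate(text.replace(" ", "").split(".")):
        match = OP.match(token)
        if match is None:
            raise SyntaxError(f"malformed operation: {token!r}")
        name, raw_args = match.group(1), match.group(2)
        args = [int(a) for a in raw_args.split(",")] if raw_args else []
        if name not in interface:
            raise SyntaxError(f"unknown operation: {name}")
        if len(args) != interface[name]:
            raise SyntaxError(f"wrong number of arguments for {name}")
        if (position == 0) != (name == "create"):
            raise SyntaxError("create must be the first operation, and only the first")
        ops.append((name, args))
    return ops


print(parse_sequence("create.add(5).add(3).delete", PQ_INTERFACE))
```

A generated driver would derive the interface table from the class text rather than have it written by hand, which is the sense in which the driver generator resembles a special-purpose parser generator.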
4.2 Test Generation Tools
ASTOOT's test generation component has two parts, the compiler and the simplifier, both of which are based on an internal representation called an ADT tree. The compiler reads in a specification written in LOBAS and does some syntactic and semantic checking on the specification, then translates each axiom into a pair of ADT trees. An ADT tree is a tree in which nodes represent operations along with their arguments. Each path from the root to a leaf of an ADT tree represents a possible state of the ADT. The branching of an ADT tree arises from axioms having IF_THEN_ELSE expressions on the right-hand side. Each edge of the ADT tree has a Boolean expression, called the edge condition, attached to it. The path condition of a path from the root to a leaf is the conjunction of all the edge conditions on that path; it indicates the conditions under which the operation sequence on that path is equivalent to the original sequence. The path conditions in a given tree are mutually exclusive. Figure 4 illustrates the ADT tree pair of Axiom 6 in Figure 1. For clarity, the edge conditions are
⁵ For ASTOOT to access functions that are hidden in the implementation, the CUT should export these functions to the test driver generated by ASTOOT. In Eiffel this can be achieved by “selective export” to the test driver; in C++ this can be achieved by making the test driver a friend class of the CUT.
⁶ Because the simplifier and the driver generator operate under the assumption that create is the instantiation operation, the compiler makes sure there is a constructor named create in the specification. Also, the simplifier will insist that the first operation of a sequence is the create operation.
Fig. 3. Screen dump of an ASTOOT session. The upper left window shows the execution of the test generator in batch mode on a priority queue specification. The file pq.seq contains an initial sequence, supplied by the user. The test generator generates five test cases based on this initial sequence and writes them, along with the corresponding constraints on the free variables, to the file pq.sim. The constraint on each test case is obtained by conjoining the condition for that test case with the negations of the conditions on previous test cases. The upper right window shows the four test cases the user has developed by instantiating the free variables with values that satisfy the constraints. (The first of the generated test cases has an unsatisfiable constraint, so it is eliminated by the user.) The driver generator is then invoked on an incorrect implementation of the priority queue (described in Section 5.1). It invokes Eiffel to compile the class under test, generates a test driver for the class, compiles it, then executes the given test cases. The first two test cases detect a bug, while the second two do not. The lower left window shows a small portion of the test driver which was automatically generated by the driver generator.
shown in rectangles in the figure; in the implementation, parameters of operations and the operands in the Boolean expressions are, themselves, represented by ADT trees.

The simplifier inputs an operation sequence, supplied by the user, translates it into an ADT tree, and applies the transformations to obtain equivalent operation sequences. The process of simplification is as follows:
(1) Search through the axioms to find an axiom with a left-hand side that matches some partial path of the ADT tree (ignoring the edge conditions).
(2) If an axiom is found, bind all the variables in the axiom to the proper arguments in the partial path of the ADT tree and simplify the argu-
Fig. 5. Simplification of the sequence create.add(x).add(y).delete.
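The rewriting idea can be illustrated on concrete values. The following is a minimal sketch, not the ASTOOT simplifier: it works on concrete integers rather than symbolic ADT trees, and it assumes two priority queue axioms that are not given in this section, namely delete(create) = create and delete(add(A, x)) = A if A is empty or x is larger than every element of A, otherwise add(delete(A), x). The function names are hypothetical.

```python
def delete_rewrite(adds):
    """Apply the assumed delete axioms to a queue built by create.add(v1)...add(vn).

    `adds` lists the arguments of the add operations in order; the result is the
    list of add arguments of the equivalent delete-free sequence.
    """
    if not adds:                      # assumed axiom: delete(create) = create
        return []
    *rest, x = adds
    if not rest or x > max(rest):     # x is the largest element: drop this add
        return rest
    return delete_rewrite(rest) + [x]  # otherwise push the delete inward


def simplify(ops):
    """Rewrite a sequence of ("add", v) and ("delete",) operations into adds only."""
    adds = []
    for op in ops:
        if op[0] == "add":
            adds.append(op[1])
        elif op[0] == "delete":
            adds = delete_rewrite(adds)
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return adds


if __name__ == "__main__":
    # create.add(5).add(3).delete and create.add(3).add(5).delete
    print(simplify([("add", 5), ("add", 3), ("delete",)]))
    print(simplify([("add", 3), ("add", 5), ("delete",)]))
```

In the symbolic setting described in the text, the comparison on x would become an edge condition in the ADT tree instead of being evaluated immediately, which is what produces several paths per original sequence.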
For an ADT tree with n paths, the simplifier will generate n test cases that
test cases that have not-equivalent tags, where n is O(m^L), L is the length of the original sequence, and m is the maximum number of branches in any axiom. Because there are too many such cases in an ADT tree, the current version of the simplifier leaves selection of such test cases to the user. Note that the test cases generated by the simplifier contain symbolic values. To make them acceptable to the test driver, the user has to resolve the path conditions (constraints) and instantiate the symbolic values with the corresponding actual values. In principle, this could sometimes be done automatically by a constraint-solving system. In the current prototype, constraint solving is left to the user. Two important questions remain: how should one select original sequences to input to the simplifier, and how should one select paths through the resulting ADT trees, in order to increase the likelihood of exposing errors?
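For small integer domains, instantiating the symbolic values can be approximated by exhaustive search. The sketch below is only an illustration of that idea under stated assumptions: the constraint is supplied as a Python predicate, the free variables range over a small integer domain, and the names `instantiate`, `variables`, and `domain` are hypothetical and not part of the ASTOOT tools.

```python
from itertools import product


def instantiate(variables, constraint, domain=range(10)):
    """Return the first assignment of domain values that satisfies the constraint.

    variables  -- names of the free variables, e.g. ["x", "y"]
    constraint -- predicate taking the variables as keyword arguments
    domain     -- candidate values tried for every variable (an assumption)
    Returns a dict mapping names to values, or None if no assignment is found.
    """
    for values in product(domain, repeat=len(variables)):
        binding = dict(zip(variables, values))
        if constraint(**binding):
            return binding
    return None


if __name__ == "__main__":
    # A satisfiable path condition and an unsatisfiable one.
    print(instantiate(["x", "y"], lambda x, y: y < x))
    print(instantiate(["x", "y"], lambda x, y: y < x and x < y))
```

A result of `None` only shows that no solution exists inside the searched domain; a constraint that is unsatisfiable in general, like the one eliminated by the user in Fig. 3, has to be recognized by other means.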
To gain insight into what kind of original sequences the person using the test generation tools should select and what kind of paths through the ADT tree should be generated in interactive mode, we performed two case studies, involving generation of many tests for a buggy priority queue implementation and for a buggy sorted-list implementation. We chose the priority queue ADT because we knew it to be sufficiently complicated to exhibit many interesting phenomena. We purposely introduced the bug, but believe that it is one which could easily occur in practice. The sorted list was based on a 2-3 tree, implemented for a graduate algorithms class. The bug was a slight variation on one which had actually occurred during program development.
The ASTOOT Approach to Testing. 117
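We wished to gain insight into the following questions:
(1) How does the length of the original sequence affect the likelihood that a test case will detect an error?
(2) How does the range of parameters in the original sequence affect the likelihood that a test case will detect an error?
(3) How does the relative frequency of the different operations in the original sequence affect the likelihood that a test case will detect an error?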
We addressed these questions by randomly generating and executing several thousand test cases with various original sequence lengths, various ranges in which parameters could lie, and various frequencies of occurrence of different operations. For each original sequence we generated the corresponding simplified sequence, then executed the test case (original sequence, simplified sequence, equivalent). Note that it would have been extremely difficult to execute and check so many test cases, had it not been for ASTOOT’s “self-checking” test case concept.
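The self-checking idea can be sketched in a few lines. The code below is an illustration only, not the generated Eiffel test driver: the class `PriorityQueue`, its list-based representation, and the helper names `run_sequence` and `run_test_case` are assumptions made for this example, and the `eqn` method stands in for the user-supplied equivalence check.

```python
class PriorityQueue:
    """A deliberately simple, correct priority queue used as the class under test."""

    def __init__(self):
        self._items = []

    def add(self, x):
        self._items.append(x)

    def delete(self):
        # Remove one occurrence of the largest element; no effect on an empty queue.
        if self._items:
            self._items.remove(max(self._items))

    def eqn(self, other):
        # Two queues are treated as equivalent if they hold the same multiset.
        return sorted(self._items) == sorted(other._items)


def run_sequence(cls, ops):
    """Create an object of cls and send it each (name, *args) message in order."""
    obj = cls()
    for name, *args in ops:
        getattr(obj, name)(*args)
    return obj


def run_test_case(cls, seq1, seq2, expect_equivalent):
    """Return True if the observed equivalence matches the test case's tag."""
    return run_sequence(cls, seq1).eqn(run_sequence(cls, seq2)) == expect_equivalent


if __name__ == "__main__":
    original = [("add", 5), ("add", 3), ("delete",)]
    simplified = [("add", 3)]
    print(run_test_case(PriorityQueue, original, simplified, True))
```

Because the expected outcome is carried by the tag, no hand-computed expected output is needed, which is what makes executing thousands of randomly generated cases practical.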
In this case study, the CUT was a priority queue, implemented using a heap
shown in the Appendix. In Figure 6, (a) is the heap resulting from sequence create. add(5).
(a); note that 1 has failed to swap with 2 in the bottom row.
propagated to EQN. Even though the original sequence in test case
the original sequence and the simplified sequence are equivalent due to the
the bug from delete to EQN.
sequence produces a heap with 3 in the root, 1 in the root’s left child, and 2 in the root’s right child. The simplified sequence produces a heap with 3 in the root, 2 in the root’s left child, and 1 in the root’s right child. These two heaps are both correct and should be observationally equivalent. However, in executing EQN to check observational equivalence, we call the erroneous
7 Recall that a heap is a complete binary tree in which each node is greater than or equal to its children; in the heap implementation of a priority queue, the delete operation is performed by removing the root, replacing it by the rightmost leaf, then “sifting” that element down to its proper position.
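The delete operation described in the footnote can be sketched as follows. This is a correct, array-based version written for illustration; it is not the Eiffel implementation from the Appendix and does not reproduce the seeded bug. The function name `heap_delete` and the list representation (children of index i at 2i+1 and 2i+2) are assumptions.

```python
def heap_delete(heap):
    """Remove the root of a max-heap stored in a list, modifying the list in place."""
    if not heap:
        return
    last = heap.pop()          # rightmost leaf in the bottom row
    if not heap:
        return                 # the heap held a single element
    heap[0] = last             # replace the root by that leaf
    i, n = 0, len(heap)
    while True:                # sift the element down to its proper position
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < n and heap[left] > heap[largest]:
            largest = left
        if right < n and heap[right] > heap[largest]:
            largest = right
        if largest == i:
            break
        heap[i], heap[largest] = heap[largest], heap[i]
        i = largest


if __name__ == "__main__":
    h = [5, 3, 4, 1, 2]
    heap_delete(h)
    print(h)
```

The bounds checks on `left` and `right` are the part most exposed to off-by-one mistakes, since they decide whether the last element of the bottom row takes part in the comparison.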
in simplified sequences is approximately

$$\frac{L(r-1)}{r+1} \quad \text{if } r \ge 1$$
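A short count makes an approximation of the form $L(r-1)/(r+1)$ plausible. It rests on assumptions that the surviving text does not state: $L$ is taken to be the length of the original sequence, $r$ the ratio of add operations to delete operations in it, and each delete is assumed to cancel exactly one add during simplification.

$$
\#\text{adds} = \frac{rL}{r+1}, \qquad
\#\text{deletes} = \frac{L}{r+1}, \qquad
\#\text{adds} - \#\text{deletes} = \frac{L(r-1)}{r+1}.
$$

Under these assumptions, a sequence with $L = 100$ and $r = 3$ has 75 adds and 25 deletes, so about 50 operations remain after simplification; with $r = 1$ the adds and deletes cancel completely.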
Results of Priority Queue Case Study. The percentages of test cases that expose the bug in each test set are shown in Figure 7. Inspection of these graphs shows the following:
(1) For large values of p, the parameter range, long original sequences are better than short ones. However, if the parameter range is too small, longer original sequences may do worse than shorter ones. In fact, the results of test sets R(100, 10, ~), R(100, 10, ~), and R(100, 10, ~) are the worst in
original sequences.
(2) As the parameter range p increases, test cases tend to get better. However, in each case there appears to be a threshold above which the error detection probability levels off.
(3) Likelihood of exposing an error depends somewhat on r.
In this buggy implementation, failure only occurred when it was necessary to swap with the rightmost element in the bottom row of the heap. Apparently, the long sequences were potentially more likely to cause the object to enter such a state, either during application of the original or simplified sequences, or through propagation of the error to the EQN operation. However, simply using a long sequence, without regard to the parameters chosen, could lead to objects that never get into these “interesting” states. If the range of parameter values is too small, there will be many duplicates in the heap, so when an item is deleted, it is less likely that the sifted item will be strictly smaller than all of the elements it is compared to; thus it is less likely that it was supposed to swap with the bottom row.
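The effect of a small parameter range on duplicates can be quantified with a standard occupancy formula. The sketch below assumes that the n added values are drawn independently and uniformly from a range of p values, which may differ from how the test generator in the case study chose parameters; the function name `expected_distinct` is hypothetical.

```python
def expected_distinct(n, p):
    """Expected number of distinct values among n uniform draws from p values.

    Each of the p values is missed by all n draws with probability (1 - 1/p)**n,
    so the expected number of values that appear is p * (1 - (1 - 1/p)**n).
    """
    if n < 0 or p <= 0:
        raise ValueError("n must be non-negative and p must be positive")
    return p * (1.0 - (1.0 - 1.0 / p) ** n)


if __name__ == "__main__":
    for p in (10, 100, 1000):
        print(p, round(expected_distinct(100, p), 1))
```

With 100 values drawn from a range of 10, at most 10 distinct values can occur, so the heap is dominated by duplicates; with a range of 1000, about 95 of the 100 values are expected to be distinct.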
Fig. 7. Results of testing the priority queue using randomly generated test suites. (Four panels, for r = 1, r = 3, r = 6, and r = 9; horizontal axis: Range of Parameters (p).)

The second case used an abstract data type, sorted list of integer, with six
preconditions, and informal specification of the sorted list are shown in Figure 8. The EQN operation compared the lengths of the lists, then compared them element by element. Note that we did not use any formal specification for this sorted list. The test cases were generated by using a C program similar to the one in the priority queue case study. The sorted list was implemented using a 2-3 tree (a special case of B-tree). The implementation has approximately 1000 lines of Eiffel 2.1 code, and the buggy version was produced by deleting one particular line from the correct version of the implementation. The absence of this statement affects the state of the 2-3 tree only when the following situation occurs: A node (0 in Figure 9(a)) has three children, such that the first child (α) has three children, and both the second child (β) and the third child (γ) have two children. One of γ’s children is then deleted. For example, after deleting 6 of γ from the 2-3 tree in Figure 9(a), the correct procedure is:
(1) copy 5 from β to γ, (2) delete 5 from β,