ASTEC project: Automated Testing

Bengt Jonsson, Uppsala University

Participating Research Groups: Uppsala University, Dept. Computer Systems.

Participating Companies: Primarily Validation AB, but also Telelogic AB and Volvo Teknisk Utveckling AB

Goal of the Project

The overall philosophy is to develop techniques for automating the testing embedded systems, communication protocols, and telecom services.

The current particular goals of the project are:

Background and some Relevant State of the Art

In current software development technology, testing is the dominant form of verification of software products. Testing occurs late in the development process, and it is often there that errors and mistakes are discovered. Therefore, techniques that can support the generation and evaluation of tests by automated tools are important.

Presently, industry performs most testing activities with a rather low degree of automation. Testing includes the following activities (among others):

The quality of a test process can be measured by different notions of coverage, which measure how much of the specification or of the implementation has been tested and found correct.

Tools and techniques have been developed to support different parts of this process. In this section, we consider some particular aspects.

Generation of Test Sequences

Testing theory starts in a classical theory for testing the input-output behavior of sequential procedures. Central problems in this area are: how to generate test sets that achieve a certain coverage, and how to detect whether the output in a test case is correct or incorrect. Work on achieving coverage is usually based on a model of the control flow of the procedure, and can very abstractly be seen as one instance of a very general principle in testing, formulated by Beizer [Bei90, p. 121, bottom] as "testing consists of defining useful graph models and covering them". In the classical theory of testing, the control flow of a program is modeled as a graph, and measures of coverage are defined on this graph. In testing of reactive systems, a graph can model the control aspect of a system's behavior, and test purposes can be translated into sets of paths in this graph. References include [MPS99]

In Protocol Testing, there is a long tradition of research on deriving test suites from a State Machine Specification of a protocol entity. Assume that a specification of an entity is given as an EFSM (Extended Finite State Diagram) state transition diagram, where nodes represent (major) control states, and where transitions are annotated with guards and operations on data variables. In the years around 1990, considerable research concerned generation of test suites to achieve a certain coverage, e.g., executing all state transitions. Some references are [SL89,VCI90,Hol91].

Current widely distributed tools for generating test suites from state machine specifications, e.g., in SDL, often do not implement these techniques. They seem not so concerned with coverage, maybe for the reason that high coverage would result in too large test suites. Instead, test sequences are derived from a combination of a test purpose, which could be e.g., an MSC or an observer which specifies a test objective, and a state machine specification in SDL. The realization of this technique in Autolink and TestComposer is described in, e.g., [SEG$^$98,Gra93]. Research papers describing related techniques are

A commonly used technique in practice for generating test suites is to exercise them manually, and record them so that next time they can be automatically executed. This is the idea of so-called Capture/Replay tools. During manual testing, the tool records the sequences of interactions in test sequences. These sequences can thereafter be replayed to the system under test in order to repeat the test, possibly in another context. The capture is recorded as a script, and some tools offer support for manipulating scripts, e.g., to change parameters in commands. It should be noted that a C/R tool is basically a tool to assist in manual definition of test sequences, it does not assist in achieving coverage, etc.

Tools for Capture/Replay testing include WinRunner, and tools from Rational. These tools work on the level of user interface in a Windows environment. they have the limitation that during playback, the tested system should be configured in the same way as during recording, e.g., that the same objects have been created on the screen of the user. Tools for Capture/Replay testing in the WWW domain include CAPBAK/Web from Software Research Inc. http://www.soft.com/Products/Web/CAPBAK/capbakweb.html This tool is based on a browser (Microsoft Explorer), with added capabilities for recording HTTP commands, and for navigation and measurement. It can be user as a Web spider, or to measure web page latency, etc.

Test Oracle Generation

A problem in any testing is to check that the output of the system conforms to its requirements, in particular safety requirements. This problem is often referred to as the oracle problem. A common approach to the oracle problem is to select certain combinations of input values, and combine them with corresponding correct output values to form test cases or test sequence. Besides the problem of finding the correct output values, this approach assumes that the system under test is deterministic, i.e., that output values are uniquely determined by input values and their occurrence in time, an assumption that is not valid in general for distributed embedded systems.

An alternative approach is to generate a test oracle from the safety requirements. A test oracle is a separate module, which observes both the input and the output of a component, and is able to determine whether the observed sequence of inputs and outputs conforms to the (safety) requirements for the system. This approach does not assume unique output values, and solves the oracle problem in one construction. A potential problem is that it may not be trivial to construct a test oracle for an arbitrary requirement.

Work on the generation of test oracles include work in the framework of the language LUSTRE. Ouabdesselam et al [OP94,PO96] and Raymond et al [RNHW98] expresses requirements as synchronous observers in LUSTRE. These can be seen as test oracles that are expressed in a high-level executable language. Raymond et al [RNHW98] also present methods for generating relevant inputs in test sequences O'Malley et al. [ORD96] present a technique for automatically generating test oracles from formulas in GIL (Graphical Interval Logic), an interval-based temporal logic. The translation resembles the generation of deterministic finite automata from regular expressions. The oracle generation is part of the TAOS (Testing with Analysis and Oracle Support) toolkit. It is described by Richardson [Ric94], where plans to produce oracles from the real-time logic RTIL are mentioned.

The Temporal Rover [Dru00] is a tool which generates executable code in, e.g., C, C++, or Java, from temporal logic assertions that are included as comments in the source file. The Temporal Rover can be seen as a test oracle generator. Generation of test oracles have also been considered in the context of simulation, where they are usually called simulation checkers. An example is FoCs [ABGaYW00], developed at IBM Haifa Research Laboratory, which generates simulation checkers from propositional temporal logic specifications.

Approaches to generating oracles for non-reactive procedures include Peters and Parnas [PP98], who derive oracles for input-output specifications from a formal notation based on boolean expressions and bounded quantification.

Test Execution

There are many tools that allow automated execution of test sequences:

Approach

Previously, the project has consisted of a number of different activities that have run in parallel, mostly as M.Sc. thesis projects. For the continuation, we will focus on one major thread, viz. testing of Web Services, and give less effort to the other tracks.

Main Track: Testing of Web Services

The purpose of the project is to contribute to the technological basis for testing, and to do this in a framework which is relevant to development of telecommunication services. It seems that the most sensible framework to choose is that of WWW based services, since this world comes with standards and a multitude of support tools that can be used for the practical work. Care must be taken not to merely repeat what already exists in the commercial world (e.g., for Capture/Playback tools.

Research work in this area includes work by Ricca and Tonella [RT01] who describe a system for creating a graph of the WWW pages at a web site. The graph is generated by a web spider (called ReWeb). The approach works for static HTML pages, maybe also for dynamically created static pages. Based on the graph, a tool can perform static analysis of the graph structure, e.g., detecting dead pages by comparing with the pages that are present in the file system, or describing which pages must be traversed in order to reach a certain page. Another tool (TestWeb) can use the graph to produce test suites according to some given coverage criterion.

The project, then will focus on the following problems, given in the order that they will be addressed

Generating Graphs of Web Sites

The aim of his work to extend the work of the ReWeb tool developed by Ricca and Tonella. A web spider collects pages by navigating the Web site. The pages are organized into a graph. For pages which are static, and have a very simple frame structure, this is not so difficult. We propose here to address the following problems in this context.

Test Generation from and Static Analysis of Hierarchic Graph Descriptions of Web Sites

. In analogy with other testing and static analysis efforts, techniques will be developed for generating tests, and for performing various analyses of the graph structure. The project will adapt existing notions of coverage, possibly extended with other notions that are developed in the Secondary Track, as a basis for test generation. For static analysis, one might possibly envisage a connection to the MetaFrame tool, where such analyses are formulated in a generic logic (mu-calculus), and thereafter compiled into analysis code.

Secondary Track: API Testing

This track has long-term goals, and aims to increase the understanding of fundamental issues in testing. Testing of sequential procedures is a rather well-researched topic. In comparison, much less is understood about how to test reactive systems with a significant internal state, e.g., the API of a file system, or a WWW service with a database component, etc. Any meaningful notion of coverage when testing these system must be based on some model of the systems internal (persistent) state. As an example, the theory of protocol conformance testing, described previously, uses a state machine model of the protocol and defines coverage in terms of this machine. For a database application, one would like to model the internal database state. Very abstractly, this state could be viewed as an input of each operation, and classical testing theory could be used to partition the possible internal states into equivalence classes. For instance, for a file system such equivalence classes could be: file system empty, full, almost empty, almost full, etc.

API testing, using a model of the internal states of the system has been done, e.g., for GUI testing [MPS00]. There are many other examples. A major problem in this context is of course to create the internal model in the first place. In protocol testing, the protocol is often defined as a state machine, so that the model is available from the start. In more general domains, such models are usually not created. A related line of work is concerned with the use of "reference models" in testing, i.e., when an independent implementation of the system is available. This is the case, e.g., when a system has been ported to a new language or a new platform; the old implementation can then serve as reference model when testing the new implementation.

The protocol will address the problem to model internal state in API testing. The major issue will be the use of how to create crude, abstract models of a system's internal state, and how to use these in system testing. Consider for example the testing of a file system. A model of the file system can be created at various level of abstraction. Very crudely, there may be (say) only three states: "empty", "full", and "neitheremptynorfull". More elaborate models may model aspects of storage, etc. at some level of detail. Similar reasoning can be made in testing of database systems, etc.

To summarize the previous paragraph, the project will address the following work items:

Implementation Work

We foresee to develop the above techniques within the context of testing systems with WWW-based interfaces. One advantage of this approach is that generic tools can be developed, that standard data formats are defined, and that there are many generic tools available. For instance, the M.Sc. project which is now in progress, has as part of its work developed a simple Capture/Replay tool from components already available in Java.

Previous Relevant Work

http://www.docs.uu.se/astec/astec-testing/report-01-01.html has a progress report from the project for 2000.

Budget

The project is expected to run from 01-01-01 and onwards. The activities will be carried out by the following people

Bibliography

ABGaYW00
Y. Abarbanel, I. Beer, L. Gluhovsky, and S. Keidar an Y. Wolfsthal.
FoCS: Automatic generation of simulation checkers from formal specifications.
In Emerson and Sistla, editors, Proc. Int. Conf. on Computer Aided Verification, volume 1855 of Lecture Notes in Computer Science, pages 538-542, 2000.

Bei90
B. Beizer.
Software Testing Techniques.
Van Nostrand Reinhold, New York, 1990.

Dru00
D. Drusinsky.
The remporal rover and the ATG rover.
In K. Havelund, editor, SPIN Model Checking and Software Verification, Proc. 7th SPIN Workshop, volume 1885 of Lecture Notes in Computer Science, pages 323-330, Stanford, California, 2000. Springer Verlag.

Gra93
J. Grabowski.
The generation of TTCN test cases from MSCs.
Technical Report IAM-93-010, University of Berne, Institute for Informatics, April 1993.

Håk00
J. Håkansson.
Automated generation of test scripts from temporal logic specifications.
Master's thesis, Uppsala University, 2000.

HJL99
J. Håkansson, B. Jonsson, and O. Lundqvist.
Automated generation of test scripts from temporal logic specifications.
In 11th Nordic Workshop on Programming Theory, Uppsala, Sweden., Oct. 1999.

HJL00
J. Håkansson, B. Jonsson, and O. Lundqvist.
Generating on-line test oracles from temporal logic specifications, 2000.
Manuscript, submitted.

Hol91
G.J. Holzmann.
Design and Validation of Computer Protocols.
Prentice Hall, 1991.

JP01
B. Jonsson and G. Padilla.
An execution semantics for NSC2000.
In Proc. 10th SDL Forum, Lecture Notes in Computer Science, Copenhagen, Denmark, June 2001. Springer Verlag.

Löf00
D. Löf.
Automated testing of WWW applications.
Master's thesis, Uppsala University, 2000.

MPS99
A.M. Memon, M.E. Pollack, and M.L. Soffa.
Using a goal-driven approach to generate test cases for GUIs.
In Proc. 21st Int. Conf. on Software Engineering, pages 257-266, May 1999.

MPS00
A.M. Memon, M.E. Pollack, and M.L. Soffa.
Automated test oracles for GUIs.
In Proc. FSE-8, 8th Int. Symp. on the Foundations of Software Engineering, San Diego, CA, Nov. 2000.

OP94
F. Ouabdesselam and I. Parissis.
Testing synchronous critical software.
In Proc. 5th Int. Symp. on Software Reliability Engineering, pages 239-248, 1994.

ORD96
T.O. O'Malley, D.J. Richardson, and L.K. Dillon.
Efficient specification-based test oracles for critical systems.
In Proc. 1996 California Software Symposium, April 1996.

PO96
I. Parissis and F. Ouabdesselam.
Specification-based testing of synchronous software.
In Proc. 4th ACM SIGSOFT Symp. on Foundations of Software Engineering, pages 127-134, 1996.

PP98
D.K. Peters and D.L. Parnas.
Using test oracles generated from program documentation.
IEEE Transactions on Software Engineering, 24(3):161-173, March 1998.

Ric94
D. J. Richardson.
TAOS: Testing with analysis and oracle support.
In Proc. ACM SIGSOFT International Symposium on Software Testing and Analysis, Aug. 1994.

RNHW98
P. Raymond, X. Nicollin, N. Halbwachs, and D. Weber.
Automatic testing of reactive systems.
In RTSS98, 1998.

RT01
F. Ricca and P. Tonella.
Building a tool for the analysis and testing of web applications: Problems and solutions.
In T. Margaria and W. Yi, editors, Proc. TACAS '01, Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, 2001.

SEG$^$98
M. Schmitt, A. Ek, J. Grabowski, D. Hogrefe, and B. Koch.
Autolink - putting sdl-based test generation into practice.
In 11th Int. Workshop on Testing of Communicating Systems (IWTCS'98), Tomsk, Russia, Sept. 1998.

SL89
D.P. Sidhu and T.K. Leung.
Formal methods in protocol testing, a detailed study.
IEEE Trans. on Software Engineering, SE-15(4):413-426, 1989.

VCI90
S.T. Vuong, W.Y.L. Chan, and M.R. Ito.
The UIOv-method for protocol test sequence generation.
In Proc. 2nd Int. Workshop on Protocol Test Systems, pages 161-176. North-Holland, 1990.