Project Plan: Process structure extraction from ERLANG /OTP systems

Project Participants

Project manager ?%
Bengt Jonsson,
Department of Computer Systems, Department of Information Technology, Uppsala University.
Ph.D. student 80%
Jan Nyström,
Department of Computer Systems, Department of Information Technology, Uppsala University.
Industrial contact
Thomas Arts,
Computer Science Laboratory, Ericsson Utveckling AB.
Industrial contact
Ulf Wiger,
ATM Multiservice Networks, Ericsson Telecom AB.

Purpose

ERLANG is a concurrent functional language, which has been successfully used for the development of complex telecommunication software within Ericsson. An ERLANG system typically creates a lanrge number of processes. This is one of the strengths of the language, since this allows modularisation of the code. However, it may also create problems when analysing and debugging system. A large ERLANG system, possibly made up of several applications, consisting of multitude of processes is hard to overview, since processes are dynamically created, destroyed, and may be automatically restarted after failures.

The purpose of this project is to develop algorithms and tools that analyse the static and the dynamic process structure of an ERLANG system. The first goal is to develop algorithms for creating the tree of created process in an ERLANG system. In this tree, each node represents a process, and the processes that it may potentially create are represented as its children. We will use some existing graph handling tool to visualise the tree and animate it to show creation of the processes at setup.

Apart from the apparent use in having a graphical global view the extracted model although rudimentary is far more amenable to analysis then the code itself. If one keeps the attributes of the standard behaviours, such as maximal restart frequency, in the model one can use the knowledge of these behaviours and, e.g., show what parts, if any, are restarted on the failure of a particular process.

Associated with the ERLANG language is the Open Telecom Platform(OTP) with library routines for standard behaviours of telecommunication systems which incorporates failure handling and recovery and other behaviours such as logging. In our work, we will assume that applications are written using the libraries provided in OTP.

Conversely, if you can visualise the system process architecture with trees annotated with attributes to the node and relationships between arcs such as ``created before'', the same means can be used to specify the system. When the same format is used as target for abstraction and for specification, the specification can readily be compared to the abstraction of the implementation.

Background

ERLANG [Armstrong,Barklund] is a simple call by value functional language with semi-lightweight processes that allows a high degree of concurrency. Processes communicates asynchronously via message passing and the distribution is non-transparent. The ERLANG runtime system supports detection of process, as well as ERLANG node, failures. The above, together with the code loading mechanisms support for hot-swap of code, provides a good framework for building fault-tolerant concurrent soft real-time systems that can be run with high availability.

Typical systems that benefit from ERLANG is telecommunication systems[Blau,Däcker,Hinde], but ERLANG has also been used to construct the Bluetail Mail Robustifier[Bluetail] that handles traffic management for clusters of mail server in a robust way.

Coupled with the ERLANG programming language is Open Telecom Platform([OTP]) which a set of libraries originally intended to aid the construction of telecommunication systems. The libraries encode ``design patterns'' used to construct telecommunication systems and effectively supplies code skeletons where the actual behaviour is realised as a call-back module.

State of the art

Here we will present the state of the art in areas connected to the work proposed in this project.

A possible approach would be to use abstract interpretation, in the context of ERLANG this has been explored by Huch in [Huch]. In Huch work the erlang system is viewed as as a set of expression evaluation in the context of the identity of the process executing the expression and the message queue. The abstract consists of truncating the terms in the expression at a predefined depth. It is mentioned in the paper how one could tailor the interpretation so that for selected terms the terms are either kept as they are or truncated at a greater depth. The interpretation which can handle small problems. The interpretation can only handle tail recursive programs and does not handle exceptions, links, nor process termination.

Another interesting method of coming to grips with concurrent systems is to reduce them to a finite state models by means of extracting a control skeleton from the code. Holzmann has extracted finite non-deterministic ERLANG models for a C with threads, by abstracting away statements and procedure not connected to the properties under inspection, in the tools FeaVer[Holzmanna] and AX[Holzmannb]. This method relies on the user to define the abstractions, i.e. deciding what procedures and variables are unimportant, and consequently is not formally sound. A similar but more soundly founded approach is applied by Corbet and colleagues for JAVA in the Bandera projec[Corbet] where they use shape analysis to determine what variables are only accessible from one thread. A more theoretically oriented approach is followed by Nielson et al[Nielson], which derive finite-state control skeleton from a program in Concurrent ML where values abstract to their types.

An important subproblem is the construction of call-graphs for the modules in the system under inspection, notably is resent work by Agrawal[Agrawal] where the call-graph is construct on demand rather then constructing the entire call-graph, which has been case in most earlier work on call-graphs.

Project description

The goal of the project is to, given an ERLANG system constructed using the mechanisms provided for application handling and supervision by the Open Telecom Platform, produce a annotated tree that describes the process structure and dynamics of the system, starting with the process structure. The project will proceed in a number stages presented below.

Stages

Stage1
In a system well written system, i.e. where the OTP System Architecture Support Libraries(SASL) have been used, as well as the generic modules and behaviours, identify the static processes created and in what order they will be created. By static processes is meant the processes that are created in the beginning of the system and exist continuously throughout the systems lifespan. The processes should be annotated by behaviour where that is known, i.e. where OTP standard behaviours such as ``generic server'' is used. The structure will be presented graphically and a simple animation of start and failure behaviours will be provided.
Stage2
Detect what processes can be dynamically created, and which processes are linked. Analyse which processes communicate with each other. Develop mechanisms for specifying systems in the same manner that they are presented when abstracted, i.e., as trees with annotated edges and nodes.
Stage3
Provide a mechanism to compare abstracted system behaviour and specified behaviour. Trough the tracing mechanism provided by the ERLANG runtime system check(possibly having annotated the code with ``names'' to identify what processes are created) that the test runs are within specified(as of Stage2) behaviour of the system.
Stage4
Possible expansion of the problems addressed or more thorough examination of one or more of the problems above.

Milestones

2001-05-01
Stage1 prototype finished.
2001-05-15
Submission to the Erlang Workshop in connection with the conference: Principles, Logics, and Implementations of high-level programming languages.
2001-05-20
ASTEC Technical report describing the Stage1 prototype.
2001-05-??
ASTEC Seminar presenting Stage1.
2001-06-01
Project Report.
2001-09-15
Revised Project Plan.
2001-10-01
Stage2 prototype finished.
2001-10-15
STEC Technical report describing the Stage2
2001-10-??
ASTEC Seminar presenting Stage2.
2001-12-15
Project Report.
2001-12-20
Revised Project Plan.

Approach

We will organise the work as follows:
Stage1)
Develop a technique for extracting code skeleton from ERLANG systems. These skeletons consists of the control flow with function calls where only functions containing primitives dealing with the concurrency and communication constructs of the language and the OTP concepts will remain.

Initially, we will retain only the following language primitives:

The function calls to functions which functional skeleton is empty will be removed until a fix-point is reached.

We will use OTP concepts to find the structure, application resource files which defines how an application should be started and what modules should be loaded etc. and behaviours.

The behaviours provide support for using ``design patterns'' via call-back functions. For example if we want to implement a supervisor we include the compiler directive ``-behaviour(supervisor)'' in our module which instructs the compiler to check for presence of the call-back functions needed. The supervisor will be started with a call to the library module supervisor:start(Name, Module, Args) with the name of the call-back module we have defined as an second argument and the arguments to the Module:init(Args) function. The library supervisor modules will handle restart of children logging etc. whereas we only have to supply the init function.

The following OTP behaviours will be considered:

application behaviour
supervisor behaviour
gen_server behaviour
gen_event behaviour
gen_fsm behaviour

An example is presented below in Figure 1 of a simple server that always starts with a setup from the Data and then waits for a message, depending of the message received it will do the following:

In Figure 2 is shown control flow of the server.

Figure 1: Code
\begin{figure}\begin{verbatim}
server(Data) ->
Setup(Data),
receive
{inform...
... server(Data);
bye -> ok;
X -> exit(''Error'')
end.\end{verbatim}\end{figure}

Figure 2: Code call-flow
\begin{figure}\epsfxsize =.9\hsize\epsffile{call-graph.1}\end{figure}

In Figure 3 is presented the code skeleton resulting from the extraction that we plan. It can be noted that the second clause of the receive statement is entirely omitted since it does not make use of any of the language constructs on which we focus. In the Figure 4 is shown control flow of the code skeleton.

Figure 3: Skeleton
\begin{figure}\begin{verbatim}server() ->
receive
{request, From, _} -> From...
..._), server();
bye -> exit(normal);
_ -> exit()
end.\end{verbatim}\end{figure}

Figure 4: Skeleton call-flow
\begin{figure}\epsfxsize =.9\hsize\epsffile{call-graph.2}\end{figure}

Stage2)
Constructing the control-flow of the system using the abstracted control-flow of the individual processes and where processes can be communication occurs make a more precise analysis tacking process identifiers in order to determine which processes actually partake in the communication.

In a similar manner one has to determine more precisely when dynamic processes can be created in the system and what properties they have.

Much of this work will entail demand driven data-flow analysis in order to track information concerning process identifiers. A limiting factor on the effectiveness of data-flow analysis is that language is dynamically typed, making a direct translation of results from [Nielson].

Stage3)

Stage4)
Since the content of Stage4 is to be decided nothing can be said concerning approach.

Industrial aspects

The use of a tool that extract the architecture, and in a manner reverse engineer is always useful. If one can also add domain specific extraction and analysis of the model that will prove a boon when trying to assert that a particular convention is followed. That particularly when the convention relies on the cooperation of several processes.

Educational aspects

Bibliography

Agrawal
Agrawal, G., 2000: Demand-Driven Construction of Call Graphs, In Proceedings of 9th International Conference in Compiler Construction, ed. D.A. Watt , LNCS Vol.1781, pp.125-140, Springer-Verlag, 2000.

Armstrong
Armstrong, J., Virding, R., Wikström, C., Williams, M., 1993: Concurrent Programming in Erlang, 2nd ed., Prentice Hall, 1996.

Barklund
Barklund, J., Virding, R., 1999: Specification of the Standard Erlang programming language, draft version 0.7, http://www.erlang.org/download/erl_spec47.ps.gz, June 1999.

Blau
Blau, S., Rooth, J., Axell, J., Hellstrand, F., Buhrgard, M., Westin, T., Wicklund, G., 1999: AXD 310: A new generation ATM switching system, Computer Networks, No.31, pp.559-582, Elsevier Science B.V., 1999.

Bluetail
Product information on Bluetail Mail Robustifier, http://www.bluetail.com/products/bmr/.

Corbet
Corbet, J.C., 2000: Using Shape Analysis to Reduce Finite-State Models of Concurrent Java Programs, ACM Transactions on Software Engineering and Methodology, Vol.9, No.1, pp.51-93, January 2000. p

Däcker
Däcker, B., 2000: Concurrent Functional Programming for Telecommunications: A Case Study of Technology Introduction, Licentiate thesis, Computer Communication System Laboratory, Department Of Teleinformatics, Royal Institute of Technology, Sweden, 2000.

Hinde
Hinde, S., 2000: Use of Erlang/OTP as a Service Creation Tool for IN Services, Proceedings of Sixth International Erlang/OTP Users Conference, http://www.erlang.se/euc/00/one2one.pdf, 2000.

Holzmanna
Holzmann, G.J., Smith, M.H., 2000: Automating software feature verification, Bell Labs Technical Journal, April-June 2000, Special Issue on Software Complexity, 2000.

Holzmannb
Holzmann, G.J., 2000: Logic Verification of ANSI-C Code with SPIN, In Proceedings of 7th International SPIN Workshop, eds. K. Havelund, J. Penix and W. Visser, LNCS Vol.1885, pp.131-148, Springer-Verlag, 2000.

Huch
Huch, F., 1999: Verification of Erlang Programs using Abstract Interpretation and Model Checking, Proceedings of ICFP'99, ACM SIGPLAN Notices, Vol.34, No.9, pp.261-272, ACM Press, 1999.

Nielson
Nielson, H.R., Amtoft, T., Nielson, F., 1998: Behaviour Analysis and Safety Conditions: a Case Study in CML, LNCS Vol.1382, pp.255-269, Springer-Verlag, 1998.

OTP
OTP, 2000: OTP Documentation,
http://www.erlang.org/doc/current/doc/index.html.



jann@DoCS.UU.SE