AMOS II
Functional Mediators
for Information Modelling, Querying, and Integration

Tore Risch
UDBL
Uppsala University
PO Box 337
SE-751 05 Uppsala
Sweden

Phone: +46-18 471 63 42 Fax: +46-18 511 925 E-mail: Tore.Risch@it.uu.se

Introduction

Computing environments have become increasingly distributed through the use of the Internet and other computer communication networks. We are experiencing ever-increasing access to more or less structured information that is, furthermore, very dynamic and continuously changing. In this environment it is becoming more and more critical to develop methods for building systems that combine relevant data from many sources and present them in a form that is comprehensible to users. It is equally important to develop tools that facilitate the development and maintenance of information systems in a highly dynamic and distributed environment.

The wrapper-mediator approach divides the functionality of a data integration system into two kinds of subsystems. The wrappers provide access to the data in the data sources using a common data model (CDM), and a common query language. The mediators provide coherent views of the data in the repositories by performing semantic reconciliation of the CDM data representations provided by the wrappers.
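The division of labour between wrappers and mediators can be illustrated with a minimal Python sketch. It is not AMOS II code: the record layout used as the common data model, the classes CsvWrapper, ListWrapper, and Mediator, and the employee example are all assumptions made for illustration only.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List, Protocol, Tuple

# Hypothetical common data model (CDM): one flat record per employee.
Record = Dict[str, object]


class Wrapper(Protocol):
    """Anything that can expose its data source as CDM records."""

    def scan(self) -> Iterator[Record]: ...


class CsvWrapper:
    """Wraps CSV text with the columns 'name' and 'salary'."""

    def __init__(self, text: str) -> None:
        self._text = text

    def scan(self) -> Iterator[Record]:
        for row in csv.DictReader(io.StringIO(self._text)):
            yield {"name": row["name"], "salary": float(row["salary"])}


class ListWrapper:
    """Wraps an in-memory list of (name, salary) tuples."""

    def __init__(self, rows: List[Tuple[str, float]]) -> None:
        self._rows = rows

    def scan(self) -> Iterator[Record]:
        for name, salary in self._rows:
            yield {"name": name, "salary": float(salary)}


class Mediator:
    """Presents one coherent view over all wrapped sources."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self._wrappers = list(wrappers)

    def employees(self) -> Iterator[Record]:
        # The view is the union of the CDM records from every wrapper.
        for wrapper in self._wrappers:
            yield from wrapper.scan()

    def high_earners(self, threshold: float) -> List[str]:
        # A query against the view; callers never see the source formats.
        return sorted(
            str(r["name"]) for r in self.employees() if r["salary"] > threshold
        )
```

A caller would build the mediator from any mix of wrappers, for example Mediator([CsvWrapper(text), ListWrapper(rows)]), and query only the view. Adding a new kind of source then means writing one more wrapper, with the mediator and its queries left unchanged.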

The purpose of the AMOS II project is to develop and demonstrate a mediator architecture for supporting information systems where applications and users combine and analyze data from many different data sources. A data source can be a conventional database, but also text files, data exchange files, WWW pages, programs that collect measurements, or even programs that perform computations and other services. The data sources are distributed over a communication network such as the Internet. The application areas for this kind of architecture include engineering, telecom, and decision support applications. In the AMOS II architecture the applications access the data sources through one or several mediator databases. A mediator database presents high-level abstractions (or views) of combinations of data sources. It makes the combined data accessible and composable through the use of high-level queries and views, and relieves the user as well as the application programmer from the details of the data sources. Furthermore, the mediator database can also store its own data. For example, there is normally application-oriented data that is not tied to any particular data source, but rather to the mediator itself.

A mediator database should represent its information in a form that is suitable for the kinds of applications that use it. Therefore a mediator database should be domain oriented, in the sense that both its meta-data (i.e. its schema) and its data operations should be specialized for its application domain. The performance requirements on mediator databases can be very high for some application domains.

With the AMOS II approach the mediator databases are designed using a very lightweight DBMS, the AMOS II mediator DBMS, which is extensible and permits high-level functional and object-oriented abstractions and queries over data sources. A central enabling technology in AMOS II is extensible and functional database query technology that can be customized for specific application areas. Since the AMOS II mediator DBMS is very lightweight, it is feasible to embed it in other systems. Furthermore, to integrate and coordinate data from many data sources that are distributed over the network, many AMOS II mediator servers can run in different locations and communicate with each other in a client-server or peer-to-peer fashion. Thus, unlike extensible heavyweight DBMSs or mediators based on relational DBMSs, the AMOS II mediator DBMS is a small, lightweight functional kernel system that allows powerful data modelling, can be extended, and uses other back-end systems to do much of the work.

Ongoing Work

The following problem areas have been studied so far:

The mediator database needs high-level data modelling primitives to support advanced applications. Central to the AMOS II approach is the use of an extensible next-generation functional and object-oriented query and modelling language. Advanced applications need not only advanced data modelling primitives, but also the possibility to extend the system on all levels. Vanja Josifovski wrote his Ph.D. thesis on the central problems of defining, processing, optimizing, and executing functional queries in a distributed mediator system.

The extensibility of AMOS II allows us to develop specialized database systems, which we call Domain Mediators, for different application areas. The Ph.D. thesis of Kjell Orsborn concerns the development of a Domain Mediator for numerical and spatial models within the Mechanical Engineering domain. His work shows that numerical models for Finite Element Analysis can be represented as high-level abstractions in AMOS II with retained (and even improved) efficiency compared with traditional implementations. The application domain requires very high performance, and many data representation techniques have been developed in the past to achieve this. It is important to make these techniques available within the mediator databases by being able to extend the mediator DBMS at the abstraction, query processing, and data representation levels.

Many problems need to be solved in order to combine information from many distributed data sources. A central problem is how to build high-level abstractions as views that integrate data from many different sources. Data sources may use different data representation techniques, such as relational databases, data exchange formats, HTML documents, etc. In order to integrate data sources with different representation formats, one first needs to translate the data from a data source into data abstractions of the functional AMOS II data model. For this one needs to develop wrappers for each kind of data source. A wrapper has knowledge of the capabilities of a class of data sources. We have, e.g., developed wrappers for relational data sources, XML, and STEP/EXPRESS. The problem of translation is non-trivial since it is not sufficient to regard a data source as a black box; rather, the wrapper needs knowledge of how to use the capabilities of a data source in order to process queries (and updates) to it. In the relational case, the wrapper needs knowledge of the functioning of relational query languages and how to translate high-level functional queries into optimized relational database queries. In the XML case, data is represented as nested structures described by some meta-data and stored in regular files.
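The relational case can be sketched in Python with the standard sqlite3 module. This is only an illustration under stated assumptions, not the AMOS II wrapper: the FunctionalQuery structure, the to_sql and run_query helpers, and the person table are hypothetical, and a real wrapper would handle far richer queries. The sketch shows the key point that the selection is pushed into the relational source as SQL instead of fetching every row and filtering in the mediator.

```python
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class FunctionalQuery:
    """Hypothetical query: return result(e) for each e where filter_fn(e) op value."""

    entity: str      # entity type, mapped to a table
    result: str      # function whose values are returned, mapped to a column
    filter_fn: str   # function used in the predicate, mapped to a column
    op: str          # comparison operator
    value: object    # constant to compare against


_ALLOWED_OPS = {"=", "<", ">", "<=", ">="}


def to_sql(q: FunctionalQuery, schema: Dict[str, Set[str]]) -> Tuple[str, tuple]:
    """Translate the functional query into a parameterized SQL statement."""
    # Identifiers are checked against the known schema before being embedded.
    if q.entity not in schema:
        raise ValueError(f"unknown entity type: {q.entity}")
    if q.result not in schema[q.entity] or q.filter_fn not in schema[q.entity]:
        raise ValueError("unknown function for this entity type")
    if q.op not in _ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {q.op}")
    sql = (
        f"SELECT {q.result} FROM {q.entity} "
        f"WHERE {q.filter_fn} {q.op} ? ORDER BY {q.result}"
    )
    return sql, (q.value,)


def run_query(conn: sqlite3.Connection, q: FunctionalQuery,
              schema: Dict[str, Set[str]]) -> List[object]:
    """Let the relational source evaluate the selection itself."""
    sql, params = to_sql(q, schema)
    return [row[0] for row in conn.execute(sql, params)]


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory relational source for experimentation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("Ada", 36), ("Bob", 25), ("Cy", 41)],
    )
    return conn
```

With the schema {"person": {"name", "age"}}, the query FunctionalQuery("person", "name", "age", ">", 30) is translated into a single SELECT with a WHERE clause, so only the matching names leave the relational source.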

Semantically similar information can be represented differently in different database schemas. This is called semantic heterogeneity, and research on how to reconcile semantic heterogeneity between databases has received substantial attention recently. However, many problems remain to be solved here, especially when combining many non-trivial schemas. In the AMOS II case, semantic heterogeneity often occurs when combining translated data from related data sources into integrated views. It is then often necessary to specify transformations and translations for representing semantically equivalent data uniformly. The AMOS II approach to semantic heterogeneity is to use a functional multi-database query language to define views that combine and transform several underlying functional AMOS II schemas. This is a very flexible solution that hides details of the integration from the user and yet allows the user to make powerful queries against the integrated view. The Ph.D. thesis of Vanja Josifovski includes the development of functional views for such data integration, and how to efficiently optimize and execute queries to such views.
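As a small illustration of reconciling semantic heterogeneity through views, consider the following Python sketch. It does not use the AMOS II query language: the two sources, their field names, and the view functions are assumptions invented for the example. One source stores a combined name and a monthly salary, the other stores separate name parts and an annual salary, and the integrated view represents both uniformly.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

# Hypothetical source A: "Last, First" names and monthly salaries.
SOURCE_A: List[Record] = [
    {"full_name": "Lovelace, Ada", "monthly_salary": 5000.0},
]

# Hypothetical source B: separate name parts and annual salaries.
SOURCE_B: List[Record] = [
    {"first": "Bob", "last": "Stone", "annual_salary": 48000.0},
]


def view_from_a(rows: Iterable[Record]) -> Iterator[Record]:
    """Transform source A records into the integrated representation."""
    for r in rows:
        last, first = [part.strip() for part in str(r["full_name"]).split(",", 1)]
        yield {
            "first": first,
            "last": last,
            # Convert the monthly salary to the annual salary used by the view.
            "annual_salary": float(r["monthly_salary"]) * 12,
        }


def view_from_b(rows: Iterable[Record]) -> Iterator[Record]:
    """Source B already matches the integrated representation."""
    for r in rows:
        yield {
            "first": r["first"],
            "last": r["last"],
            "annual_salary": float(r["annual_salary"]),
        }


def employees() -> Iterator[Record]:
    """Integrated view: the union of the transformed sources."""
    yield from view_from_a(SOURCE_A)
    yield from view_from_b(SOURCE_B)


def names_earning_at_least(threshold: float) -> List[str]:
    """A query against the integrated view, unaware of source differences."""
    return sorted(
        f"{r['first']} {r['last']}"
        for r in employees()
        if r["annual_salary"] >= threshold
    )
```

A query such as names_earning_at_least(50000) is written once against the integrated view; the name splitting and the salary conversion stay hidden inside the view definitions.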
