AMOS II
Functional Mediators
for Information Modelling, Querying, and Integration
Tore Risch
UDBL
Uppsala University
PO Box 337
SE-751 05 Uppsala
Sweden
Phone: +46-18 471 63 42 Fax: +46-18 511 925 E-mail: Tore.Risch@it.uu.se
Computing environments have become increasingly
distributed through the use of the Internet and other computer communication
networks. We are experiencing ever-increasing access to more
or less structured information that is, furthermore, highly dynamic and continuously
changing. In this environment it is becoming increasingly critical to develop
methods for building systems that combine relevant data from many sources
and present them in a form that is comprehensible to users. It is also becoming
important to develop tools that facilitate the development and maintenance
of information systems in a highly dynamic and distributed environment.
The wrapper-mediator approach divides the functionality of a
data integration system into two kinds of subsystems. The wrappers
provide access to the data in the data sources using a common data
model (CDM), and a common query language. The mediators
provide coherent views of the data in the repositories by performing semantic
reconciliation of the CDM data representations provided by the wrappers.
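The division of labour between wrappers and mediators can be illustrated with a minimal Python sketch. It is not AMOS II code: the record layout standing in for the CDM, the function names (`text_wrapper`, `table_wrapper`, `mediator_view`), and the sample data are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Stand-in for a common data model: a list of dicts with agreed keys.
# This layout is an illustrative assumption, not the AMOS II data model.
Record = Dict[str, object]


def text_wrapper(lines: Iterable[str]) -> List[Record]:
    """Wrap a text source with one 'name,salary' pair per line."""
    records = []
    for line in lines:
        name, salary = line.strip().split(",")
        records.append({"name": name, "salary": float(salary)})
    return records


def table_wrapper(rows: Iterable[dict]) -> List[Record]:
    """Wrap a source that stores the same facts under other field names."""
    return [{"name": row["employee"], "salary": float(row["pay"])} for row in rows]


def mediator_view(sources: List[List[Record]]) -> List[Record]:
    """Combine the wrapped sources into one view with one record per name."""
    merged: Dict[str, Record] = {}
    for records in sources:
        for record in records:
            # The first source that mentions a name wins; a real mediator
            # would apply a declared reconciliation rule here.
            merged.setdefault(str(record["name"]), record)
    return [merged[name] for name in sorted(merged)]


view = mediator_view([
    text_wrapper(["Ada,52000", "Bo,41000"]),
    table_wrapper([{"employee": "Cy", "pay": 45500}, {"employee": "Ada", "pay": 52000}]),
])
```

The application only sees `view`; which source a record came from, and how that source is formatted, stays hidden behind the wrappers and the mediator.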
The purpose of the AMOS II project is to develop and demonstrate a mediator
architecture for supporting information systems where applications and
users combine and analyze data from many different data sources. A data
source can be a conventional database but also text files, data exchange
files, WWW pages, programs that collect measurements, or even programs
that perform computations and other services. The data sources are distributed
over a communication network such as the Internet. The application areas
for this kind of architecture include engineering, telecom, and decision
support applications. In the AMOS II architecture the applications access
the data sources through one or several mediator databases. A mediator
database presents high-level abstractions (or views) of combinations of
data sources. It makes the combined data accessible and composable through
the use of high-level queries and views and relieves the user as well
as the application programmer from details of the data sources. Furthermore
the mediator database itself can also store its own data. For example
there is normally application oriented data that is not tied to any particular
data source, but rather to the mediator itself.
A mediator database should represent its information
in a form that is suitable for the type of applications that use it.
Therefore a mediator database should be domain oriented in the sense that
both its meta-data (i.e. its schema) and its data operations should be specialized
for its application domain. The performance requirements on mediator databases
can be very high for some application domains.
With the AMOS II approach the mediator databases are
designed using a very light-weight DBMS, the AMOS II mediator DBMS, which
is extensible and permits high-level functional and object-oriented abstractions and queries
over data sources. A central enabling technology in AMOS II is extensible
and functional database query technology that can be customized for
specific application areas. Since the AMOS II mediator DBMS is very
light-weight, it is feasible to embed it in other systems. Furthermore, to integrate
and coordinate data from many data sources that are distributed over the
network, many AMOS II mediator servers can run in different locations and
communicate with each other in a client-server or peer-to-peer fashion.
Thus, unlike extensible heavy-weight DBMSs or mediators based on relational
DBMSs, the AMOS II mediator DBMS is a small light-weight functional
kernel system that allows powerful data modelling,
can be extended, and uses other back-end systems to do much
of the work.
The following problem areas have been studied so far:
The mediator database needs high-level data modeling
primitives to support advanced applications. Central to the AMOS II approach
is the use of an extensible next-generation functional and object-oriented
query and modelling language. Advanced applications need
not only advanced data modeling primitives, but also the possibility
to extend the system on all levels. Vanja Josifovski did his Ph.D. on the
central problems of methods for defining, processing, optimizing, and executing
functional queries in a distributed mediator system.
The extensibility of AMOS II allows us to develop
specialized database systems, which we call Domain Mediators, for different
application areas. The Ph.D. Thesis of Kjell Orsborn concerns the development
of a Domain Mediator for numerical and spatial models within the Mechanical
Engineering domain. His work shows that numerical models for Finite Element
Analysis can be represented as high-level abstractions in AMOS II with retained
(and even improved) efficiency compared with traditional implementations.
The application domain requires very high performance and many data representation
techniques have been developed in the past to achieve this. It is important
to make these techniques available within the mediator databases by being
able to extend the mediator DBMS at the abstraction, query processing,
and data representation levels.
Many problems need to be solved in order to combine
information from many distributed data sources. A central problem is how
to build high-level abstractions as views that integrate data from many
different sources. Data sources may use different data representation techniques,
such as relational databases, data exchange formats, HTML-documents, etc.
In order to integrate data sources with different representation formats
one first needs to translate the data from a data source into data abstractions
of the functional AMOS II data model. For this one needs to develop wrappers
for each kind of data source. A wrapper has knowledge of the capabilities
of a class of data source. We have, e.g., developed wrappers for relational
data sources, XML, and STEP/EXPRESS. The problem of translation is non-trivial
since it is not sufficient to regard a data source as a black box, but rather
the wrapper needs knowledge of how to use the capabilities of a data source
in order to process queries (and updates) to it. In the relational case,
the wrapper needs knowledge of the functioning of relational query languages
and how to translate high-level functional queries into optimized relational
database queries. In the XML case data is represented as nested structures
described by some meta-data and stored in regular files.
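For the relational case, the following Python sketch shows the kind of translation a wrapper performs: a simple functional query is turned into a parameterized SQL statement so that the selection is evaluated by the relational source rather than in the mediator. The table and column names, the `COLUMN_OF` mapping, and the `translate` function are hypothetical, and the functional query notation in the docstring is only illustrative; the actual AMOS II translator and its optimizations are not reproduced here.

```python
import sqlite3

# Hypothetical mapping from functional attribute names to relational columns.
COLUMN_OF = {"name": "emp_name", "salary": "emp_salary"}


def translate(result_fn: str, filter_fn: str, op: str, value: float):
    """Translate a functional query of the illustrative form
    'select result_fn(e) from Employee e where filter_fn(e) op value'
    into SQL, pushing the filter down to the relational source."""
    if op not in {"<", "<=", "=", ">=", ">"}:
        raise ValueError(f"unsupported operator: {op}")
    sql = (
        f"SELECT {COLUMN_OF[result_fn]} FROM employee "
        f"WHERE {COLUMN_OF[filter_fn]} {op} ? ORDER BY {COLUMN_OF[result_fn]}"
    )
    return sql, (value,)


# An in-memory SQLite database stands in for the wrapped relational source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_name TEXT, emp_salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ada", 52000.0), ("Bo", 41000.0), ("Cy", 45500.0)],
)

sql, params = translate("name", "salary", ">", 45000.0)
names = [row[0] for row in conn.execute(sql, params)]
```

Because the comparison is part of the generated SQL, only the qualifying rows leave the data source; this is what distinguishes a capability-aware wrapper from one that treats the source as a black box and filters afterwards.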
Semantically similar information can be represented
differently in different database schemas. This is called semantic
heterogeneity, and research on how to reconcile semantic heterogeneity
between databases has received substantial attention recently. However,
many problems remain to be solved here, especially when combining
many non-trivial schemas. In the AMOS II case semantic heterogeneity
often occurs when combining translated data from related data sources
into integrated views. It is then often necessary to specify transformations
and translations for representing semantically equivalent data uniformly.
The AMOS II approach to semantic heterogeneity is to use a functional
multi-database query language to define views that combine and transform
several underlying functional AMOS II schemas. This is a very
flexible solution that hides details of the integration from the
user and yet allows him or her to make powerful queries against
the integrated view. The Ph.D. Thesis of Vanja Josifovski includes
the development of functional views for such data integration,
and how to efficiently optimize and execute queries against such views.
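The idea of a functional view that reconciles semantically equivalent data can be sketched in Python as a derived function defined over the functions exported by two sources. Everything in the sketch is an assumption made for illustration: the two sources, their units, and the names `temp_a`, `temp_b`, `temperature_c`, and `stations_colder_than`. It does not show AMOS II syntax or its query optimization.

```python
from typing import Dict, List, Optional

# Two hypothetical wrapped sources. Each exposes a function from a station
# identifier to a temperature, but they disagree on the unit.
_source_a: Dict[str, float] = {"uppsala": 20.0}  # degrees Celsius
_source_b: Dict[str, float] = {"kiruna": 14.0}   # degrees Fahrenheit


def temp_a(station: str) -> Optional[float]:
    return _source_a.get(station)


def temp_b(station: str) -> Optional[float]:
    return _source_b.get(station)


def temperature_c(station: str) -> Optional[float]:
    """Derived function (the integrated view): temperature in Celsius."""
    celsius = temp_a(station)
    if celsius is not None:
        return celsius
    fahrenheit = temp_b(station)
    if fahrenheit is not None:
        # Transformation that represents equivalent data uniformly.
        return (fahrenheit - 32.0) * 5.0 / 9.0
    return None


def stations_colder_than(limit_c: float) -> List[str]:
    """A query against the view; the caller never sees the source units."""
    all_stations = sorted(set(_source_a) | set(_source_b))
    return [s for s in all_stations if temperature_c(s) < limit_c]
```

A user of `temperature_c` can ask for all stations below freezing without knowing that one source reports Fahrenheit; the transformation is stated once, in the view definition.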