Uppsala universitet

Information Technology


GRID Stream Data Manager (GSDM)

This project was funded by Vinnova.

Project description

The GRID initiative provides an infrastructure for distributed computations among widely distributed high-performance computers. This can allow for efficient exchange and processing of very large amounts of data. For example, in Space Physics applications such as LOIS and LOFAR, the data is often produced in real time by large numbers of sensors that receive signals from space. The data is delivered as very high-volume data streams over very high-speed network connections. It will not be possible or desirable to physically store all produced data in conventional storage nodes; an alternative approach is to do as much processing as possible over moving windows of the data streams. Various numerical data reduction, stream combination, and transformation algorithms are applied to these data stream windows in real time before the data is delivered for visualization, storage, and other processing. For example, unusual transients and signal patterns can be detected, noise can be removed, and differences between data streams can be detected. Such data reduction requires customized numerical algorithms.
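As a minimal illustration of processing over moving windows, the Python sketch below reduces a stream to one mean value per window and flags samples that deviate strongly from the mean of the preceding window. It is not GSDM code; the function names, the window size, and the threshold rule are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean


def sliding_windows(stream, size):
    """Yield each full window of `size` consecutive samples as a tuple."""
    window = deque(maxlen=size)  # oldest sample is dropped automatically
    for sample in stream:
        window.append(sample)
        if len(window) == size:
            yield tuple(window)


def window_means(stream, size):
    """Data reduction: emit one mean value per moving window."""
    for window in sliding_windows(stream, size):
        yield mean(window)


def detect_transients(stream, size, threshold):
    """Yield (index, sample) for samples that differ from the mean of the
    preceding `size` samples by more than `threshold` (hypothetical rule)."""
    window = deque(maxlen=size)
    for index, sample in enumerate(stream):
        if len(window) == size and abs(sample - mean(window)) > threshold:
            yield index, sample
        window.append(sample)
```

Because both functions are generators, only the current window is held in memory, which mirrors the idea of processing the stream without storing it in full.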

GSDM (GRID Stream Data Manager) is a stream-oriented data manager and customizable query processor being developed at UDBL that allows very efficient execution of database queries accessing distributed scientific and other customized data representations using GRID technology. The approach is to utilize a number of object-relational main-memory database engines running on GRID nodes and connected through high-speed networks. We leverage high-performance, extensible, object-oriented database technology combined with high-performance LANs, WANs, and GRID technology. Applications include engineering, bioinformatics, neuroscience, and space environmental physics. Of particular interest is the development of new distributed data population and query processing techniques utilizing distributed and scalable data structures for very high-performance processing of data from digital space receivers. The architecture allows external programs to be included in stream queries for user-defined data computations.

The database community has long experience of frameworks for optimisation and processing of distributed queries that access large volumes of data. However, high-volume scientific sensor arrays require orders of magnitude better data processing performance than conventional database management systems provide, and they also require support for queries involving complex customized numerical computations in real time over data streams.

One approach to achieving high performance over large data streams is to develop a stream data manager, GSDM, that runs on highly connected clusters of main-memory database nodes and is extensible through user-defined data representations and computational models. The streamed database should scale by being able to dynamically incorporate new nodes as the database grows. Unlike conventional databases, stream-oriented databases require continuous queries (CQs) that continuously deliver result streams once they are activated.
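The difference between a conventional query and a continuous query can be sketched in Python as follows. This is only an illustration of the concept, not the GSDM or AmosQL interface: the `ContinuousQuery` class, its `activate` method, and the `first_results` helper are hypothetical names introduced for the example.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator


class ContinuousQuery:
    """Hypothetical continuous query: a filter plus a user-defined transform."""

    def __init__(self,
                 predicate: Callable[[float], bool],
                 transform: Callable[[float], float]) -> None:
        self.predicate = predicate
        self.transform = transform

    def activate(self, source: Iterable[float]) -> Iterator[float]:
        """Start the query; results are delivered for as long as the
        source keeps producing items, rather than as one finite answer."""
        for item in source:
            if self.predicate(item):
                yield self.transform(item)


def first_results(query: ContinuousQuery, source: Iterable[float], n: int) -> list:
    """Take the first `n` results from a possibly unbounded result stream."""
    return list(islice(query.activate(source), n))


# Example: an unbounded source (0, 1, 2, ...) filtered to even values
# and scaled by a user-defined function.
even_scaled = ContinuousQuery(lambda x: x % 2 == 0, lambda x: x * 10)
sample = first_results(even_scaled, count(), 3)
```

The consumer decides how much of the result stream to read; the query itself never terminates unless the source does.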

GSDM utilizes the AMOS II database management system, which provides object-relational DBMS functionality, peer-to-peer communication, the declarative query language AmosQL, and interfaces to C and Java. The kernel is being extended and modified in order to implement the architecture.


People

Tore Risch is responsible for this project. It was the basis for the PhD work of Milena Ivanova.


Last update: 22/09/2004. Responsible: Tore Risch
Copyright © 2004 Uppsala University, Department of Information Technology