GRID Stream Data Manager (GSDM)
This project was funded by Vinnova.
Project description
The GRID initiative provides an infrastructure for distributed computations among widely distributed high-performance computers. This can allow for efficient exchange and processing of very large amounts of data. For example, in space physics applications such as LOIS and LOFAR, the data is often produced in real time by large numbers of sensors that receive signals from space. The data is delivered as very high-volume data streams over very high-speed network connections. It will not be possible or desirable to physically store all produced data in conventional storage nodes; an alternative approach is to do as much processing as possible over moving windows of the data streams. Various numerical data reduction, stream combination, and transformation algorithms are applied to these data stream windows in real time before the data is delivered for visualization, storage, and other processing. For example, unusual transients and signal patterns can be detected, noise can be removed, and differences between data streams can be identified. Such data reduction requires customized numerical algorithms.
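The idea of processing a moving window instead of storing the whole stream can be illustrated with a minimal Python sketch. It is not GSDM code: the function name `detect_transients`, the window size, and the deviation threshold are assumptions chosen only for illustration, and the simple "deviation from the window mean" test stands in for the customized numerical algorithms mentioned above.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, Iterator, Tuple


def detect_transients(stream: Iterable[float],
                      window_size: int = 5,
                      threshold: float = 3.0) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for samples that deviate strongly from the
    preceding window of samples.

    Hypothetical helper for illustration: only the last `window_size`
    samples are held in memory, so the stream itself is never stored.
    """
    window = deque(maxlen=window_size)  # oldest sample drops out automatically
    for index, value in enumerate(stream):
        if len(window) == window_size:
            mu = mean(window)
            sigma = pstdev(window)
            # Guard against a zero standard deviation on a flat window.
            if abs(value - mu) > threshold * max(sigma, 1e-9):
                yield index, value
        window.append(value)


if __name__ == "__main__":
    samples = [1.0, 1.1, 0.9, 1.0, 1.0, 9.0, 1.0, 1.1]
    for index, value in detect_transients(samples):
        print(f"transient at sample {index}: {value}")
```

Because the function is a generator, it can be fed an unbounded source and still runs in constant memory, which is the property that matters when the data cannot be stored.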
GSDM (GRID Stream Data Manager) is a stream-oriented data manager and customizable query processor being developed at UDBL. It allows very efficient execution of database queries that access distributed scientific and other customized data representations using GRID technology. The approach is to utilize a number of object-relational main-memory database engines running on GRID nodes and connected through high-speed networks. We build on high-performance, extensible, object-oriented database technology combined with high-performance LANs, WANs, and GRID technology. Applications include, e.g., engineering, bioinformatics, neuroscience, and space environmental physics. Of particular interest is the development of new distributed data population and query processing techniques that utilize distributed and scalable data structures for very high-performance processing of data from digital space receivers. The architecture allows external programs for user-defined data computations to be included in stream queries.
The database community has long experience of frameworks for
optimisation and processing of distributed queries to access large
volumes of data. However, high-volume scientific sensor arrays require
orders of magnitude better data processing performance than
conventional database management systems provide, and they also
require support for queries involving complex customized numerical
computations in real time over data streams.
One approach to achieving high performance over large data streams is
to develop a stream data manager, GSDM, that runs on highly
connected clusters of main-memory database nodes and is
extensible through user-defined data representations and computational
models. The streamed database should scale by being able to
dynamically incorporate new nodes as the database grows. Unlike
conventional databases, stream-oriented databases require continuous
queries (CQs) that continuously deliver result streams once they are
activated.
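The behaviour of a continuous query with a user-defined computation can be sketched in Python as follows. This is only an illustration under stated assumptions, not the GSDM or AmosQL interface: `continuous_query`, `sensor`, and `peak_to_peak` are hypothetical names, and the query here reduces consecutive non-overlapping windows of a single stream on a single node.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, List


def continuous_query(source: Iterable[float],
                     window_size: int,
                     reduce_fn: Callable[[List[float]], float]) -> Iterator[float]:
    """Hypothetical continuous query: once activated (iterated), it keeps
    delivering one result per full window for as long as the source
    produces data."""
    buffer: List[float] = []
    for sample in source:
        buffer.append(sample)
        if len(buffer) == window_size:
            yield reduce_fn(buffer)  # user-defined computation plugged in here
            buffer = []              # start the next window


def sensor() -> Iterator[float]:
    """Stand-in for an unbounded sensor stream: 1.0, 2.0, 3.0, ..."""
    for i in count(1):
        yield float(i)


def peak_to_peak(window: List[float]) -> float:
    """Example user-defined data reduction: signal range within a window."""
    return max(window) - min(window)


if __name__ == "__main__":
    results = continuous_query(sensor(), window_size=4, reduce_fn=peak_to_peak)
    # The result stream is itself unbounded; take only the first three values.
    print(list(islice(results, 3)))
```

Unlike a conventional query, `continuous_query` never returns a finished result set; the consumer decides how much of the result stream to read.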
GSDM utilizes the AMOS II database management system, which provides
object-relational DBMS functionality, peer-to-peer communication, the
declarative query language AmosQL, and interfaces to C and Java. The
kernel is being extended and modified in order to implement the
architecture.
Publications
M.Ivanova and T.Risch: Customizable Parallel Execution of Scientific Stream Queries, Technical report 2005-012, Department of Information Technology, Uppsala University, ISSN 1404-3203, The 31st International Conference on Very Large Databases, VLDB2005, Trondheim, Norway, 2005.
M.G.Koparanova and T.Risch: High-performance Stream-oriented GRID Database Manager for Scientific Data, Proc. 1st European Across Grids Conference, Universidad de Santiago de Compostela, Spain, Feb. 13-14, 2003.
T.Risch, M.Koparanova, and B.Thide: High Performance GRID Database Manager for Scientific Data, Proc. 4th Distributed Data and Structures, WDAS'03, Paris, France, pp 99-106, Carleton Scientific, ISBN 1-894145-13-5, 2002.
Milena Ivanova: Scalable Scientific Stream Query Processing, Uppsala Dissertations from the Faculty of Science and Technology, No. 66, ISBN 91-554-6351-7, Acta Universitatis Upsaliensis, 2005.
People
Tore Risch is responsible for this project. It was the basis for the PhD work of Milena Ivanova.
Last update: 22/09/2004. Responsible: Tore Risch
Copyright © 2004 Uppsala University, Department of Information Technology