Spatial and multi-media databases
---------------------------------

Regular DBs:
- Tables
- Records
- Limited length fields
- well structured

New kinds of data:
- Text, documents
- HTML, XML, XML-Schema documents
- bitmaps, raster images
- audio
- video
- maps
- time series
- vector data, geometrical models

Properties:
- (very) large data sets
- more or less complex internal structure
- Need for special software (hardware) for presentation

Conventional solutions for storing multimedia data in relational databases
--------------------------------------------------------------------------

- Store file names in database (more later)
- BLOBs, CLOBs
  Binary/Character Large Objects (p 423, 658)
  Very long unstructured fields
  Supported by most commercial DBMSs
  Need special APIs

  FILMS | NAME STRING   | LENGTH INTEGER | CONTENTS BLOB |
  ------+---------------+----------------+---------------+
        | Gone with the | 4 hours        | large BLOB    |
        | wind          |                |               |
        +---------------+----------------+---------------+
        |               |                |               |

  select contents from films where name = 'Gone with the wind'

  - Naive implementation would retrieve several gigabytes of data -> Crash
  - Instead BLOBs are returned as stream handles with operators:
    - open
    - close
    - read(blob,from,to)
    Advanced interfaces have:
    - insert(blob,pos,length,bitstring)
    - delete(blob,pos,length)
    Enough to make video-recorder and editor!
  - Special buffer management for BLOBs
- Store file names in RDB:
  - Disadvantage: managed outside DBMS
    - file can be renamed/deleted
    - file system deleted/renamed
    - network link down
  - Modern solution, Data Links (IBM)
    - New data type, datalink
    - point to filename
    - managed by DBMS
    - OS intercepted, cannot be changed by user
    - insert into DB by handing over file to DBMS

Technical issues in multi-media databases
-----------------------------------------

- Data Model
  - BLOBs have no structure => just a data store
  - multi-media data have structure, e.g.
    - chapters, scenes, sections, frames, regions
    => need to model these meta-data
  - Should also have references between meta-data and their contents
    - e.g. be able to query contents of BLOBs
    - access inside BLOB from queries
    => need support for complex objects, 20.5.2
    => need for object-relational techniques
- Design
  - Define meta-data
  - Choice of physical clustering, indexing, etc. for efficient retrieval
- Storage methods
  - for efficient representation
  - compression
  - encryption
  - memory hierarchies (multi-level stores), 13.1.1
  - special buffering
  - replication
  - standards JPEG, MPEG
  - striping, RAID (13.10)
    parallelization, high disk speed
- Retrieval
  - Retrieval by ranking
    Information retrieval techniques
  - presentation through special devices
    3D-modeler, VRML, web browser
- Performance
  - streamed access
  - early delivery
  - real time requirements, QoS, e.g. 60 frames/s
  - special indexing
  - special query processing
  => object-relational techniques
- Indexing
  - Extract features at insert
    e.g. color spectrum, sharpness, face forms, finger print forms, bounding boxes
  - Manually extract features
    e.g. transcripts, scenes, etc.
  - Store feature vectors together with raw image
    e.g. {1.3,3.4,4.5} feature of blob B1
  - Queries often mean finding the image etc. closest to a queried image (proximity queries)
  - Method
    - extract same features from queried image
    - retrieve images close to or overlapping features of queried image using regular query conditions
    - make careful image comparison on small number of retrieved images
      e.g. foreign function in C
      => object-relational features useful
    - important that feature extraction is restrictive and relevant
  - How to make efficient retrieval of overlapping regions (bounding boxes):
    - Can always use < > etc. in queries
    - However a B-tree is good only for interval searches, 1D
      => > 1D may make B-tree very slow
    - R-tree (region tree)
      special index structure to efficiently index region overlaps (in ORACLE, IBM, ...)
      - Based on hierarchy of minimum bounding boxes
    - Indexing techniques work only for a rather small number of dimensions (< ca 8)
  - Alt. if DBMS does not have multi-media management:
    - do filtering in client
      => lots of communication and copying
    - filtering in DBMS avoids lots of copying
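
Example (sketch only): the indexing method above, i.e. filter with regular
query conditions, then compare carefully on the few images that remain.
Everything concrete here is an assumption made for illustration: the table
IMAGES, its columns, the sample rows, and the use of Euclidean distance as the
"careful comparison" (the notes suggest a foreign function in C for that step).
SQLite is used only because it ships with Python; no R-tree is involved, the
overlap test is written with plain <= and >= conditions.

```python
import math
import sqlite3

# Hypothetical schema: one row per image, with a bounding box and a
# 3-element feature vector stored as three plain columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table images (
        name text primary key,
        xmin real, ymin real, xmax real, ymax real,
        f1 real, f2 real, f3 real
    )""")
conn.executemany(
    "insert into images values (?,?,?,?,?,?,?,?)",
    [
        ("B1", 0, 0, 4, 4, 1.3, 3.4, 4.5),
        ("B2", 3, 3, 8, 8, 1.0, 3.0, 5.0),
        ("B3", 10, 10, 12, 12, 1.3, 3.4, 4.4),
        ("B4", 1, 1, 3, 3, 9.0, 0.5, 0.1),
    ],
)


def candidates(box):
    """Filter step, done inside the DBMS.

    Two boxes overlap if and only if their intervals overlap on both axes,
    which needs nothing more than ordinary comparison operators.
    """
    xmin, ymin, xmax, ymax = box
    return conn.execute(
        """select name, f1, f2, f3 from images
           where xmin <= ? and xmax >= ? and ymin <= ? and ymax >= ?""",
        (xmax, xmin, ymax, ymin),
    ).fetchall()


def closest(box, features, k=2):
    """Refine step: rank the surviving candidates by feature distance."""
    scored = [
        (math.dist(features, row[1:]), row[0]) for row in candidates(box)
    ]
    return [name for _, name in sorted(scored)[:k]]


query_box = (2, 2, 5, 5)
query_features = (1.2, 3.5, 4.5)
result = closest(query_box, query_features)
print(result)
```

Only the rows passing the box filter leave the database, so the client never
copies image B3 even though its feature vector is close to the query. With
many rows the where-clause would still scan or use 1D indexes; that is the
case the R-tree is meant for.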