spatial databases

75 %
25 %
Information about spatial databases

Published on October 23, 2007

Author: CoolDude26


PART III SPATIAL AND MULTIMEDIA DATABASES:  PART III SPATIAL AND MULTIMEDIA DATABASES Overview:  Overview Spatial and Geographic Databases Multimedia Databases Spatial and Geographic Databases:  Spatial and Geographic Databases Spatial databases store information related to spatial locations, and support efficient storage, indexing and querying of spatial data. Geographic databases store geographic information (e.g., maps): often called geographic information systems (GIS). Special purpose index structures are important for accessing spatial data, and for processing spatial join queries. Examples of geographic data map data for vehicle navigation distribution network information for power, telephones, water supply, and sewage Slide4:  Types of Spatial Data Spatial data can be either in raster or vector format. Raster data consist of bit maps or pixel maps, in two or more dimensions. Example 2-D raster image: satellite image of cloud cover, where each pixel stores the cloud visibility in a particular area. Additional dimensions might include the temperature at different altitudes at different regions, or measurements taken at different points in time. Slide5:  Vector data are constructed from basic geometric objects: points, line segments, triangles, and other polygons in two dimensions, and cylinders, spheres, cuboids, and other polyhedrons in three dimensions. Vector format often used to represent map data. Roads can be considered as two-dimensional and represented by lines and curves. Some features, such as rivers, may be represented either as complex curves or as complex polygons, depending on whether their width is relevant. Features such as regions and lakes can be depicted as polygons. Representation of geometric constructs:  Representation of geometric constructs Various geometric constructs can be represented in a spatial/geographic database in a normalized fashion: Points Line segments: by the coordinates of their endpoints. Curves: approximated by partitioning the curve into a sequence of segments and create a list of vertices in order, or represent each segment as a separate tuple that also carries with it the identifier of the curve (2D features such as roads). Closed polygons list of vertices in order, starting vertex is the same as the ending vertex, or represent boundary edges as separate tuples, with each containing identifier of the polygon, or use triangulation — divide polygon into triangles and note the polygon identifier with each of its triangles. Slide8:  Spatial data is typically queried using a graphical query language; results are also displayed in a graphical manner. Graphical interface constitutes the front-end Extensions of SQL with abstract data types, such as lines, polygons and bit maps, have been proposed to interface with back-end. allows relational databases to store and retrieve spatial information queries can use spatial conditions (e.g. contains or overlaps). queries can mix spatial and nonspatial conditions Slide9:  Types of Spatial Queries Spatial Range Queries Find all cities within 50 miles of Paris Query has associated region (location, boundary) Answer includes ovelapping or contained data regions Nearest-Neighbor Queries Find the 10 cities nearest to Paris Results must be ordered by proximity Spatial Join Queries Find all cities near a lake Expensive, join condition involves regions and proximity Slide10:  Applications of Spatial Data and common queries Geographic Information Systems (GIS) E.g., ArcInfo; OpenGIS Consortium Geospatial information All classes of spatial queries and data are common Computer-Aided Design/Manufacturing Store spatial objects such as surface of airplane fuselage Range queries and spatial join queries are common Multimedia Databases Images, video, text, etc. stored and retrieved by content First converted to feature vector form; high dimensionality Nearest-neighbor queries are the most common Indexing Spatial data:  Indexing Spatial data Single-Dimensional Indexes B+ Tree Indexes Leaf pages contain data entries, and are chained (prev & next) Non-leaf pages contain index entries and direct searches: Slide12:  B+ Tree: Most Widely Used Index B+ tree is a dynamic structure, ideal for range searches and equality searches. Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree. Typical order: 100. Typical fill-factor: 67%. Supports equality and range-searches efficiently Inserts/deletes leave tree height-balanced High fanout means depth rarely more than 3 or 4. Slide13:  Search at cost log (F N) (F = # entries/index pg, N = # leaf pgs) Search begins at root, and key comparisons direct it to a leaf find 28*? 29*? : All > 15* and < 30* . Insert/delete at cost log (F N) Find data entry in leaf, then change it. Need to adjust parent sometimes. And change sometimes bubbles up the tree Slide14:  B+ trees can be used also for spatial data. In fact, when we create a composite search key B+ tree, e.g., an index on <age, sal>, we effectively linearize the 2-dimensional space since we sort entries first by age and then by sal. Consider entries: <11, 80>, <12, 10> <12, 20>, <13, 75> Slide15:  Multidimensional Indexes A multidimensional index clusters entries so as to exploit “nearness” in multidimensional space. Keeping track of entries and maintaining a balanced index structure presents a challenge! Consider entries: <11, 80>, <12, 10> <12, 20>, <13, 75> Slide16:  Motivation for Multidimensional Indexes Spatial queries (GIS, CAD). Find all hotels within a radius of 5 miles from the conference venue. Find the city with population gt 500,000 that is nearest to Paris. Find all cities that lie on the Nile in Egypt. Ideally, want to support non-point data as well (e.g., lines, shapes). An index based on spatial location needed: one-dimensional indexes don’t support multidimensional searching efficiently Similarity queries (content-based retrieval). Given a face, find the five most similar faces. Multidimensional range queries. 50 < age < 55 AND 80KEuros < sal < 90KEuros Slide17:  Spatial Index Structures for Points Tuple t = (x1, …, xk) point in k dimensions grid file kd tree KDB tree …. Slide18:  Grid file (Nievergelt, Hinterberger & Sevcik 84) Slide19:  kd-tree (Bentley 75) kd-tree: early structure used for indexing in multiple dimensions. Each level of a kd-tree partitions the space into two. choose one dimension for partitioning at the root level of the tree. choose another dimension for partitioning in nodes at the next level and so on, cycling through the dimensions. Slide20:  Division of space with kd-Trees: each line in the figure (other than the outside box) corresponds to a node in the kd-tree the maximum number of points in a leaf node has been set to 1. The numbering of the lines in the figure indicates the level of the tree at which the corresponding node appears. Slide21:  In each node, approximately half of the points stored in the sub-tree fall on one side and half on the other. Partitioning stops when a node has less than a given maximum number of points. Range queries: Search those nodes of coordinates xy whose distance from vector (a,b) is less than d The minimum cannot be less than a-d, b-d The maximum cannot be greater than a+d, b+d Therefore at each node x,y values must be compared with values stored in the node to decide which subtree has to be discarded The k-d-B tree extends the k-d tree to allow multiple child nodes for each internal node; well-suited for secondary storage. Slide22:  Spatial Index Structures for Rectangles Rectangles are more difficult than points since do not fall into a single cell of a bucket partition. Most important solution: Overlapping bucket regions These index structures offering operations support for new queries Find regions adjacent to A. Find regions within distance d from B. Slide23:  The R-tree is a tree-structured index that remains balanced on inserts and deletes. Widely used with its variants. R-trees are useful for indexing sets of rectangles and other polygons. Each key stored in a leaf entry is intuitively a box, or collection of intervals, with one interval per dimension. Partition the space into rectangles, without requiring the rectangles to be disjoint. Enclosing rectangular regions are drawn with sides coinciding with the sides of the enclosed rectangles Supported in many modern database systems, along with variants like R+ -trees and R*-trees. Will consider only the two-dimensional case (N = 2) generalization for N > 2 is straightforward, although R-trees work well only for relatively small N The R-Tree (Guttmann 84 ) Example in 2D: Slide25:  A set of rectangles (solid line) and the bounding boxes (dashed line) of the nodes of an R-tree for the rectangles. The R-tree is shown on the right. Slide26:  R-Tree Properties Leaf entry = < n-dimensional box, rid > Box is the tightest bounding box for a data object. Non-leaf entry = < n-dim box, ptr to child node > Box covers all boxes in child node (in fact, subtree) Nodes can be kept 50% full (except root). Can choose a parameter m that is <= 50%, and ensure that every node is at least m% full. All leaves at same distance from root. The R-tree order (n,M) is the minimum and maximum number of rectangles stored at each node. A R-tree of order (n,M) contains at each node between n <= M/2 and M entries Slide27:  Search for Objects Overlapping Box Q Start at root. If current node is non-leaf, for each entry <E, ptr>, if box E overlaps Q, search subtree identified by ptr. If current node is leaf, for each entry <E, rid>, if E overlaps Q, rid identifies an object that might overlap Q. Note: May have to search several subtrees at each node! (In contrast, a B-tree equality search goes to just one leaf.) Slide28:  Insert Entry <B, ptr> Start at root and go down to “best-fit” leaf L. Go to child whose box needs least enlargement to cover B; resolve ties by going to smallest area child. If best-fit leaf L has space, insert entry and stop. Otherwise, split L into L1 and L2. Adjust entry for L in its parent so that the box now covers (only) L1. Add an entry (in the parent node of L) for L2. (This could cause the parent node to recursively split.) Slide29:  Splitting a Node During Insertion The entries in node L plus the newly inserted entry must be distributed between L1 and L2. Goal is to reduce likelihood of both L1 and L2 being searched on subsequent queries. Delete entry consists of searching for the entry to be deleted, removing it, and if the node becomes under-full, deleting the node and then re-inserting the remaining entries. Slide30:  Comments on R-Trees Search advantage: each spatial object (or key) is in a single bucket disadvantage: multiple search paths due to overlapping bucket regions. can improve search performance by using a convex polygon to approximate query shape (instead of a bounding box) and testing for polygon-box intersection. Works very well for 2D and 3D datasets. Several variants (notably, R+ and R* trees) have been proposed and are widely used. Slide31:  R-Tree Variants The R* tree uses the concept of forced reinserts to reduce overlap in tree nodes. When a node overflows, instead of splitting: Remove some (say, 30% of the) entries and reinsert them into the tree. Could result in all reinserted entries fitting on some existing pages, avoiding a split. R* trees also use a different heuristic, minimizing box perimeters rather than box areas during insertion. Another variant, the R+ tree (Beckmann et al. 90 ), avoids overlap by inserting an object into multiple leaves if necessary. Searches now take a single path to a leaf, at cost of redundancy. Slide32:  R+-tree (Sellis, Rossopoulos & Faloutsos 87, Faloutsos, Sellis & Rossopoulos 87) R+ Trees are an alternative to R Trees that avoids overlap of enclosing rectangles. One rectangle is associated with all the enclosing rectangles that it intersects. Data rectangles cut into several pieces, if necessary. The same rectangle can be reached from multiple paths starting from the root Root A B C D E F J J K L Slide33:  Indexing High-Dimensional Data Typically, high-dimensional datasets are collections of points, not regions. E.g., Feature vectors in multimedia applications. Very sparse Nearest neighbor queries are common. R-tree becomes worse than sequential scan for most datasets with more than a dozen dimensions. As dimensionality increases, contrast (ratio of distances between nearest and farthest points) usually decreases; “nearest neighbor” is not any more meaningful. In any given data set, empirically test contrast. Slide34:  Commercial extensible DBMS with spatial extensions, for example: Informix Universal Server, with Geodetic DataBlade (Informix 97) IBM DB2 Spatial Extender (Davis 98) Design Databases:  Design Databases Computer Aided Design (CAD) databases store design information about how objects are constructed e.g. designs of buildings, aircraft, layouts of integrated-circuits. They represent design components as objects (generally geometric objects); the connections between the objects indicate how the design is structured. Design databases also store non-spatial information about objects (e.g., construction material, color, etc.) Spatial integrity constraints are important. e.g., pipes should not intersect, wires should not be too close to each other, etc. Slide36:  Design databases represent different types of objects in different ways: Simple two-dimensional objects: points, lines, triangles, rectangles, polygons. Complex two-dimensional objects: formed from simple objects via union, intersection, and difference operations. Complex three-dimensional objects: formed from simpler objects such as spheres, cylinders, and cuboids, by union, intersection, and difference operations. Wireframe models represent three-dimensional surfaces as a set of simpler objects. Design databases generally do not store raster data. Slide37:  face vertex edge Formed by union, intersection…. Wireframe model Slide38:  Arbitrary polyhedra are represented by dividing them into tetrahedrons, like triangulating polygons. Alternative: List their faces, each of which is a polygon, along with an indication of which side of the face is inside the polyhedron. Representation of points and line segment in 3-D similar to 2-D, except that points have an extra z component Multimedia databases:  Multimedia databases Multimedia Database Systems: Multimedia Systems Multimedia data types Multimedia Indexing and Retrieval Multimedia Databases (compared to Retrieval and Indexing) Multimedia Systems:  Multimedia Systems Definition A computer hardware/software system used for Acquiring and Storing Managing Indexing and Filtering Manipulating (quality, editing) Transmitting (multiple platforms) Accessing large amount of visual information like, Images, video, graphics, and associated multimedia Examples: image and video databases , web media search engines, home media gateway, mobile media navigator, etc. Slide41:  Why it’s important? Adoption of Digital Video New Content Creation Tools Deployment of High-Speed Networks New Content and Services Mobile Internet 3D graphics, network games Media portals Standards become available: coding, delivery, description.... Slide42:  Multimedia systems application chain Multimedia data types :  Multimedia data types Image: storage: encoded as a set of pixels or cell values image descriptor: describes the geometric shape of the raw image: rectangle of m by n grid of cells each cell contains a pixel (=picture element) value that describes the cell content in one (black/white image) or more bits (gray scale, e.g., 8 bits or color image, e.g., 24 bits)  Slide44:  Video Storage: stream of images (sequence of frames) and audio frame = still image presentation at specified rates per time unit Typically video is divided into video segments: each segment is identified by its starting and ending frames is made up of a sequence of contiguous frames that include the same objects/activities= semantic unit and corresponding audio phrases Slide45:  Audio data: Distinguished in speech, music, ... can be structured in sequences: characterized by tone, duration, … when sequence contains speech: characteristics of a certain person’s voice: e.g., loudness, intensity, pitch and clarity when sequence contains music: beat, pitch, chords, ...   Slide46:  Composite or mixed multimedia data (e.g. video and audio data): may be physically mixed to yield a new storage format or logically mixed while retaining original types and formats additional control information describing how the information should be rendered Multimedia data formats and representations:  Multimedia data formats and representations Store and transmit multimedia data in compressed form JPEG and GIF the most widely used formats for image data. MPEG standard for video data use commonalties among a sequence of frames to achieve a greater degree of compression. MPEG-1 quality comparable to VHS video tape. stores a minute of 30-frame-per-second video and audio in approximately 12.5 MB MPEG-2 designed for digital broadcast systems and digital video disks; negligible loss of video quality. Compresses 1 minute of audio-video to approximately 17 MB. Several alternatives of audio encoding MPEG-1 Layer 3 (MP3), RealAudio, WindowsMedia format, etc. Slide48:  Multimedia objects are typically represented as set of features (e.g., as vector of features) features can be weighted (expressing uncertainty or significance of its value) can be stored and searched in an index tree …. Multimedia data indexing:  Multimedia data indexing Indexing is the process of assigning or extracting features that will be used for unstructured and structured queries (refers typically to low-level features of multimedia objects). For video, often indexing is used instead of segmentation and refers to the process of detection of video clips. Two main approaches to indexing: manual: segmentation naming of objects and their relationships with key terms (natural language or controlled language) automatic (identify the mathematical characteristics of the contents): different techniques depending on the type of multimedia source (image, text, video, or audio) based on pattern recognition. possible manual correction Automatic indexing is of great importance in Multimedia database applications Slide50:  Automatic indexing of images: segmentation in homogeneous segments: homogeneity predicate defines the conditions for automatically grouping the cells e.g., in a color image, cells that are adjacent to one another and whose pixel values are close are grouped into a segment indexing: recognition of objects: simple patterns: recognition of low level features : color histograms, textures, shapes (e.g., person, house), position Slide51:  Automatic indexing of video: segment: basic unit for retrieval objects and activities identified in each video segment: can be used to index the segment segmentation: detection of video shot breaks, camera motions boundaries in audio material (e.g., other music tune, changes in speaker) textual topic segmentation of transcripts of audio and of close-captions (see below) heuristic rules based on knowledge of: type-specific schematic structure of video (e.g., documentary, sports) certain cues: appearance of anchor person in news Slide52:  Automatic indexing of audio: segmentation into sequences (= basic units for retrieval): often manually indexing: speech recognition and indexing of the resulting transcripts (cf. indexing written text retrieval) acoustic analysis (e.g., sounds, music, songs: melody transcription: note encoding, interval and rhythm detection and chords information): translated into string (e.g., key melody extraction) An example of indexing:  An example of indexing Learning of textual descriptions of images from surrounding text (Mori et al., 2000): training: images segmented in image parts of equal size feature extraction for each image part (by quantization): 4 x 4 x 4 RGB color histogram 8 directions x 4 resolutions intensity histogram words that accompany the image are inherited by each image part: words are selected from the text of the document that contains the image by selecting nouns and adjectives that occur with a frequency above a threshold cluster similar image parts based on their extracted features: single-pass partitioning algorithm with minimum similarity threshold value An example of indexing:  An example of indexing for each word and each cluster is estimated: P(wi|cj) as where mji = total frequency of word wi in cluster cj Mj = total frequency of all words in cj testing: unknown image is divided into parts and image features are extracted for each part, the nearest cluster is found as the cluster whose centroid is most similar to the part the average likelihood of all the words of the nearest clusters is computed k words with largest average likelihood are chosen to index the new image (in example k = 3) Slide56:  source Mori et al. Slide57:  source Mori et al. Multimedia Information Retrieval :  Multimedia Information Retrieval Multimedia information retrieval : deals with the storage, retrieval, transport and presentation of different types of multi-media data (e.g., images, video clips, audio clips, texts,…) similarity matching of uncertain query and document representations result: list of documents according to relevance retrieval process: queries indexing the documents matching document and query representations Slide59:  Examples of similarity based retrieval Pictorial data: Two pictures or images that are slightly different as represented in the database may be considered the same by a user. E.g., identify similar designs for registering a new trademark. Audio data: Speech-based user interfaces allow the user to give a command or identify a data item by speaking. E.g., test user input against stored commands. Handwritten data: Identify a handwritten data item or command stored in the database QBIC system (IBM 90’s) :  QBIC system (IBM 90’s) Color: QBIC computes the average Munsell (Miyahara,, 1988) coordinates of each object and image, plus a k element color histogram (k is typically 64 or 256) that gives the percentage of the pixels in each image in each of the k colors. Texture: QBIC's texture features are based on modified versions of the coarseness, contrast, and directionality features proposed in (H. Tamura,, 1978). Coarseness measures the scale of the texture, contrast describes the vividness of the pattern, and directionality describes whether or not the image has a favored direction or is isotropic (grass versus a smooth object). Shape: QBIC has used several different sets of shape features. One is based on a combination of area, circularity, eccentricity, major axis orientation and a set of algebraic moment invariants. A second is the turning angles or tangent vectors around the perimeter of an object, computed from smooth splines fit to the perimeter. The result is a list of 64 values of turning angle. Multimedia Database Management Systems:  Multimedia Database Management Systems To provide database functions like indexing and consistency, it is desirable to store multimedia data in a database rather than storing them outside the database, in a file system The database must handle large object representation. Similarity-based retrieval must be provided by special index structures. Must provide guaranteed steady retrieval rates for continuous-media data. Slide62:  Multimedia database management systems should combine the Database Management System (DBMS) and information retrieval (IR) technology: data modeling capabilities of DBMSs with the advanced and similarity based query capabilities of IR systems Challenge = finding a data model that ensures: effective query formulation and document representation efficient storage efficient matching effective delivery Slide63:  Query formulation: must accommodate information needs of users of multimedia systems Document representations and storage: appropriate modeling of the structure and content of the wide range of data of many different formats (= indexing) XML MPEG-7 dealing with thousands of images, documents, audio and video segments, and free text modeling of physical properties for compression/ decompression, synchronization, delivery MPEG-21 Slide64:  Matching of query and document representations: taking into account the variety of attributes and their relationships of query and document representations combination of exact matching of structured data with uncertain matching of unstructured data Delivery of data: browsing, retrieval temporal constraints of video and audio presentation merging of data from different sources (e.g., in medical networks) MMDBMS Queries:  MMDBMS Queries As in many retrieval systems, the user has the opportunity to browse and navigate through hyperlinks with querying: need of: topic maps summary descriptions of the multimedia objects Queries specifying the conditions of the objects of interest idea of multimedia query language: should provide predicates for expressing conditions on the attributes, structure and content (semantics) of multimedia objects Slide66:  attribute predicates: concern the attributes of multimedia objects with an exact value (cf. traditional DB attributes): e.g., date of a picture, name of a show structural predicates: temporal predicates to specify temporal synchronization: for continuous media such as audio and video for expressing temporal relationships between the frame representations of a single audio or video e.g., “Find all the objects in which a jingle is playing for the duration of an image display” Slide67:  spatial predicates: to specify spatial layout properties for the presentation of multimedia objects: examples of predicates: contain, is contained in, intersect, is adjacent to, e.g., “Find all the objects containing an image overlapping the associated text” temporal and spatial predicates can be combined : e.g., “ Find all the objects in which the logo of the car company is displayed, and when it disappears, a graphic showing the increase in the company sales is shown in the same position where the logo was” temporal and spatial predicates can: refer to whole objects refer to subcomponents of objects: with data model that supports complex object representation Slide68:  semantic predicates: concern the semantic and unstructured content of the data involved. Are represented by the features that have been extracted and stored for each multimedia object e.g.,”Find all the objects containing the word OFFICE or “Find all red houses” uncertainty, proximity and weights can be expressed in query Slide69:  Multimedia query language: structured language users do not formulate queries in this language, but enter query conditions by means of interfaces (natural language queries) interface translates query to correct query syntax Query by example: e.g., video, audio: the query is composed by picking an example and choosing the features the object must comply with e.g., in a graphical user interface (GUI): users choose the image of a house and domain features for the query: “Retrieve all houses of similar shape and different color” e.g., music: recorded melody, note sequence being entered by Musical Instruments Digital Interface (MIDI) VideoQ:  VideoQ MMDBMS Example: Oracle’s interMedia :  MMDBMS Example: Oracle’s interMedia Enables Oracle to manage rich content, including images, audio, and video information in an integrated fashion with other traditional business data. interMedia can parse, index, and store rich content, develop content rich Web applications, deploy rich content on the Web, and tune Oracle content repositories. interMedia enables data management services to support the rich data types used in electronic commerce catalogs, corporate repositories, Web publishing, corporate communications and training, media asset management, and other applications for internet, intranet, extranet, and traditional application in an integrated fashion

Add a comment

Related presentations

Related pages

Spatial database - Wikipedia, the free encyclopedia

A spatial database, or geodatabase is a database that is optimized to store and query data that represents objects defined in a geometric space.
Read more

An Introduction to Spatial Database Systems - UF CISE

An Introduction to Spatial Database Systems Ralf Hartmut Güting Praktische Informatik IV, FernUniversität Hagen D-58084 Hagen, Germany gueting@fernuni ...
Read more

20 Minutes to Understanding Spatial Database | CUBRID Blog

Spatial RDBMS is an RDBMS that can process spatial data. Popular RDBMSs, such as Oracle, offer their own Spatial RDBMS features or add-ons so that spatial ...
Read more

What is a Spatial Database? - Definition from Techopedia

A spatial database is a database that is enhanced to store and access spatial data or data that defines a geometric space. These data are often associated ...
Read more

Spatial Database Systems - FernUniversität in Hagen ...

Spatial Database Systems Tutorial Notes Ralf Hartmut Güting Fernuniversität Hagen Praktische Informatik IV D-58084 Hagen Germany
Read more

Spatial Data (SQL Server)

Database Features Spatial Data. Spatial Data. Spatial Data. ... For a detailed description and examples of spatial features introduced in SQL Server 2012, ...
Read more

Object-based spatial database - Wikipedia, the free ...

An object-based spatial database is a spatial database that stores the location as objects. The object-based spatial model treats the world as surface ...
Read more

Working with Spatial Data (Database Engine)

SQL Server 2008 supports the geometry and geography data types for storing spatial data. These types support methods and properties that allow for the ...
Read more

Spatial Databases, 1st Edition | Philippe Rigaux, Michel ...

Elsevier Store: Spatial Databases, 1st Edition from Philippe Rigaux, Michel Scholl, Agnès Voisard. ISBN-9781558605886, Printbook , Release Date: 2001
Read more


Read more