Ranking in Spatial Database for Data Structures | CMSC 420, Study notes of Data Structures and Algorithms

Material Type: Notes; Professor: Samet; Class: Data Structures; Subject: Computer Science; University: University of Maryland; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

Copyright © 1998 Hanan Samet
These notes may not be reproduced by any means (mechanical or electronic
or any other) without the express written permission of Hanan Samet
RANKING IN SPATIAL DATABASES
GÍSLI R. HJALTASON
HANAN SAMET
COMPUTER SCIENCE DEPARTMENT AND
CENTER FOR AUTOMATION RESEARCH AND
INSTITUTE FOR ADVANCED COMPUTER STUDIES
UNIVERSITY OF MARYLAND
COLLEGE PARK, MARYLAND 20742-3411 USA

RANKING PROBLEM

  • Ex: Find all the houses in the database in the order of their distance from location p or object o

  • Ex: Find the nearest city of population greater than 30,000 to Portland, Maine
  • If we use an index on the locational attributes corresponding to the location of the cities, then we want to obtain the cities in the order of their distance from Portland, Maine
  • The process ceases once the condition on the non-locational population attribute is satisfied
  • Query is also a partial ranking query
  • We are interested in a solution where, if the closest record does not satisfy our query, we can get the next closest record by continuing from where we last left off rather than having to restart from the reference point of the index
  • Used in browsing applications
    1. can restrict query region
    2. can vary nature of features retrieved as well as query object
    3. can serve as basis of incremental spatial join operations
  • Our solution is illustrated by an example spatial index making use of a disjoint decomposition (e.g., PR quadtree), although it has also been implemented for other spatial indexes, including R-trees
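
The "continue from where we last left off" requirement can be sketched with a generator driven by a priority queue. This is only an illustration under stated assumptions, not necessarily the procedure developed in these notes: the index is a flat list of disjoint rectangular blocks rather than a PR quadtree, and the names `ranked`, `min_dist_to_box`, the city labels, and all coordinates and populations are hypothetical.

```python
import heapq
import math
from itertools import count


def min_dist_to_box(q, box):
    """Smallest Euclidean distance from point q to an axis-aligned box."""
    xmin, ymin, xmax, ymax = box
    dx = max(xmin - q[0], 0.0, q[0] - xmax)
    dy = max(ymin - q[1], 0.0, q[1] - ymax)
    return math.hypot(dx, dy)


def ranked(q, blocks):
    """Yield (distance, record) pairs in increasing distance from q.

    blocks is a list of (box, records); each record is (x, y, payload)
    and is assumed to lie inside its block's box.
    """
    tie = count()  # unique tie-breaker so the heap never compares payloads
    heap = [(min_dist_to_box(q, box), next(tie), False, recs)
            for box, recs in blocks]
    heapq.heapify(heap)
    while heap:
        d, _, is_record, item = heapq.heappop(heap)
        if is_record:
            yield d, item          # nothing left in the queue can be closer
        else:
            for rec in item:       # open the block: enqueue its records
                heapq.heappush(
                    heap, (math.dist(q, rec[:2]), next(tie), True, rec))


# Hypothetical data: (x, y, (name, population)) grouped into disjoint blocks.
blocks = [
    ((0, 0, 4, 4), [(1, 1, ("A", 12000)), (3, 2, ("B", 45000))]),
    ((4, 0, 8, 4), [(5, 1, ("C", 80000))]),
    ((0, 4, 4, 8), [(1, 6, ("D", 5000))]),
]
stream = ranked((0, 0), blocks)
# Stop as soon as the non-locational condition holds.
nearest_big = next(rec for _, rec in stream if rec[2][1] > 30000)
# Asking again resumes the same traversal instead of restarting it.
following = next(stream)
```

Because a block's key is a lower bound on the distance of every record inside it, a record that reaches the front of the queue is safe to report. Blocks that are never reached are never opened, which is what makes the partial ranking cheap.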


SPATIAL DATABASES

  • Distinguished from conventional databases, in part, by the fact that some of the attributes are locational, in which case they have a common dimensional unit, which is distance in space
    1. distance unit is the same whether we are in one, two, or three dimensions, etc.
    2. if we combine attributes and seek to find the nearest record of type t to Chicago, then the unit is distance regardless of whether there are one, two, three, ... (or even more) locational attributes associated with t
    3. nearest is not meaningful for combinations of non-locational attributes
    4. not all attributes with type distance are locational (e.g., size corresponding to pant length, height, waist, etc.)
  • Spatial databases are differentiated by the type of records that they store
    1. points (e.g., locations of features)
      • zero volumetric measure
      • discrete
    2. features (i.e., space occupied by the features)
      • nonzero volumetric measure (i.e., extent)
      • continuous
  • Records in a conventional database are always discrete
    1. can be viewed as points in a higher dimensional space
    2. difference is that for spatial attributes, dimensional unit is always distance in space


ORDERING DATA

  • Use a combination of the values of the locational attributes
  • Facilitates storage of records
  • Desirable for ordering to preserve proximity — i.e., records close to each other in the multidimensional space should also be close to each other in the ordering
  • Hashing is a way of achieving ordering
    1. explicit order
      • mapping from higher dimensional space to one-dimensional space
      • e.g., bit interleaving (Morton order), Peano-Hilbert, Sierpinski, etc.
      • result is a space-filling curve
      • no order has property that ALL records that are close to each other in the multidimensional space of the locational attributes are also close to each other in the range of the mapping
    2. implicit order
      • bucketing methods
      • sort records on the basis of the space they occupy and group into cells or buckets of finite capacity
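
Bit interleaving, the first explicit order named above, can be sketched in a few lines. The function name `morton_key`, the choice to put x in the even bit positions, and the 16-bit default are assumptions made for the example.

```python
def morton_key(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one integer.

    Assumed convention: bit i of x goes to position 2*i,
    bit i of y goes to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# The four cells of a 2 x 2 block receive consecutive keys ...
block = sorted([(1, 1), (0, 1), (1, 0), (0, 0)], key=lambda p: morton_key(*p))
# ... but the horizontally adjacent cells (7, 0) and (8, 0) do not.
gap = morton_key(8, 0) - morton_key(7, 0)
```

Here `block` comes out as `[(0, 0), (1, 0), (0, 1), (1, 1)]`, while `gap` is 43 even though the two cells touch. This is one concrete instance of the point above that no order keeps all neighbors in space close to each other in the one-dimensional range.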


R-TREES

  • Objects grouped into hierarchies, stored in another structure such as a B-tree
  • Object has single bounding rectangle, yet area that it spans may be included in several bounding rectangles
  • Does not result in disjoint decomposition of space
  • Order (m, M) R-tree
    1. between m ≤ ⌈M/2⌉ and M entries in each node except root
    2. at least 2 entries in root unless a leaf node

[Figure: line segments a through i grouped by bounding rectangles R0 through R6, with the corresponding R-tree]

R0: R1 R2
R1: R3 R4
R2: R5 R6
R3: a b
R4: d g h
R5: c i
R6: e f
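
The occupancy rules for an order (m, M) R-tree can be checked mechanically. The sketch below assumes the bound reads m ≤ ⌈M/2⌉ and stores the example tree as a plain dictionary from node name to entries. The helper `check_order` and this representation are hypothetical, and bounding rectangles are left out because only the entry counts matter here.

```python
# Example tree from the slide: node name -> list of entries.
tree = {
    "R0": ["R1", "R2"],
    "R1": ["R3", "R4"],
    "R2": ["R5", "R6"],
    "R3": ["a", "b"],
    "R4": ["d", "g", "h"],
    "R5": ["c", "i"],
    "R6": ["e", "f"],
}


def check_order(tree, root, m, M):
    """Return the names of the nodes that break the order (m, M) rules."""
    if m > (M + 1) // 2:  # (M + 1) // 2 is ceil(M / 2) for integers
        raise ValueError("an order (m, M) R-tree needs m <= ceil(M/2)")
    bad = []
    for name, entries in tree.items():
        n = len(entries)
        if name == root:
            # The root needs at least 2 entries unless it is itself a leaf.
            is_leaf = all(e not in tree for e in entries)
            ok = n <= M and (is_leaf or n >= 2)
        else:
            ok = m <= n <= M
        if not ok:
            bad.append(name)
    return bad
```

With these assumptions `check_order(tree, "R0", 2, 3)` returns an empty list, so the example is a valid order (2, 3) tree. Asking for order (3, 6) instead flags every non-root node except R4, since each of the others holds only two entries.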

SEARCHING FOR A POINT OR LINE SEGMENT IN AN R-TREE

  • May have to examine many nodes since a line segment can be contained in the covering rectangles of many nodes yet its record is contained in only one leaf node (e.g., i in R2, R3, R4, and R5)
  • Ex: Search for a line segment containing point Q
    1. Q is in R0
    2. Q can be in both R1 and R2
    3. Searching R1 first means that R4 is searched but this leads to failure even though Q is part of i which is in R5

[Figure: line segments a through i, query point Q, and bounding rectangles R0 through R6, with the corresponding R-tree]

R0: R1 R2
R1: R3 R4
R2: R5 R6
R3: a b
R4: d g h
R5: c i
R6: e f
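
The search example can be traced in code. The sketch below keeps the node names and leaf contents of the example tree, but every coordinate is invented so that Q falls inside the covering rectangles of both R1 and R2. For simplicity each object is stood in for by its bounding rectangle rather than an actual line segment, and the helpers `contains`, `search`, `leaf`, and `inner` are hypothetical.

```python
def contains(rect, p):
    """True if point p lies in the axis-aligned rect (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    return xmin <= p[0] <= xmax and ymin <= p[1] <= ymax


def search(node, p, visited):
    """Return the objects whose rectangle contains p; log every node visited."""
    visited.append(node["name"])
    if "objects" in node:  # leaf node
        return [name for name, rect in node["objects"] if contains(rect, p)]
    hits = []
    for child in node["children"]:
        # Covering rectangles may overlap, so more than one child can qualify.
        if contains(child["rect"], p):
            hits += search(child, p, visited)
    return hits


def leaf(name, rect, objects):
    return {"name": name, "rect": rect, "objects": objects}


def inner(name, rect, children):
    return {"name": name, "rect": rect, "children": children}


# Hypothetical coordinates chosen so that Q lies in both R1 and R2.
R3 = leaf("R3", (0, 6, 3, 10), [("a", (0, 8, 2, 10)), ("b", (1, 6, 3, 7))])
R4 = leaf("R4", (2, 0, 6, 6),
          [("d", (2, 0, 3, 2)), ("g", (4, 1, 6, 2)), ("h", (3, 4, 4, 6))])
R5 = leaf("R5", (4, 4, 9, 9), [("c", (7, 6, 9, 9)), ("i", (4, 4, 8, 6))])
R6 = leaf("R6", (8, 0, 10, 3), [("e", (8, 0, 10, 2)), ("f", (9, 1, 10, 3))])
R1 = inner("R1", (0, 0, 6, 10), [R3, R4])
R2 = inner("R2", (4, 0, 10, 9), [R5, R6])
R0 = inner("R0", (0, 0, 10, 10), [R1, R2])

Q = (5, 5)
visited = []
found = search(R0, Q, visited)
```

With this layout `found` is `["i"]` and `visited` is `["R0", "R1", "R4", "R2", "R5"]`. The descent through R1 and R4 is wasted work, which is the cost of a non-disjoint decomposition.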

DISJOINT CELLS

  • Objects decomposed into disjoint subobjects; each subobject in different cell
  • Drawback: in order to determine area covered by object, must retrieve all cells that it occupies
  • Techniques differ in degree of regularity
  • R+-tree (also k-d-B-tree) and cell tree are examples of this technique

[Figure: line segments a through i and query point Q decomposed over disjoint cells R3 through R6, with the corresponding tree]

R0: R1 R2
R1: R3 R4
R2: R5 R6
R3: d g h
R4: c h i
R5: c f i
R6: a b e i
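
The drawback noted above shows up directly in the leaf contents of the example: an object that spans several cells is listed once per cell. The sketch below uses only that cell-to-object listing; the helper names `cells_of` and `objects_in` are hypothetical.

```python
# Leaf cells of the disjoint decomposition and the objects that cross them.
cells = {
    "R3": ["d", "g", "h"],
    "R4": ["c", "h", "i"],
    "R5": ["c", "f", "i"],
    "R6": ["a", "b", "e", "i"],
}


def cells_of(obj, cells):
    """Every cell that must be retrieved to recover the full extent of obj."""
    return [name for name, objs in cells.items() if obj in objs]


def objects_in(cell_names, cells):
    """Objects found in the given cells, reporting each object only once."""
    seen, result = set(), []
    for name in cell_names:
        for obj in cells[name]:
            if obj not in seen:
                seen.add(obj)
                result.append(obj)
    return result
```

For instance, `cells_of("i", cells)` returns `["R4", "R5", "R6"]`, so three cells must be read to reassemble i. A query that touches R4 and R5 has to remove duplicates, and `objects_in(["R4", "R5"], cells)` gives `["c", "h", "i", "f"]` rather than six entries.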


UNIFORM GRID

  • Ideal for uniformly distributed data
  • Supports set-theoretic operations
  • Spatial data (e.g., line segment data) is rarely uniformly distributed
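
A small sketch of why a uniform grid suits uniformly distributed data and little else. The function `build_grid`, the cell size, and both point sets are made up for the illustration.

```python
from collections import defaultdict


def build_grid(points, cell_size):
    """Bucket 2-D points into square cells of side cell_size."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(int(x // cell_size), int(y // cell_size))].append((x, y))
    return grid


# 16 evenly spread points: one per cell of a 4 x 4 grid.
uniform = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
# 16 points crowded into a corner: all of them share a single cell.
clustered = [(0.05 * k, 0.05 * k) for k in range(16)]

even = build_grid(uniform, 1.0)
skewed = build_grid(clustered, 1.0)
```

In the first case every one of the 16 cells holds exactly one point. In the second, one cell holds all 16 points and the other 15 cells of the same region are empty, so the grid gives no pruning at all.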
