Data Indexing: Understanding Tree-based and Hash-based Indexes - Prof. Yicheng Tu, Study notes of Deductive Database Systems

Learn about data indexing, its types including tree-based and hash-based indexes, primary and secondary indexes, and clustered and non-clustered indexes. The concepts, features, and examples of tree-based indexes, specifically the isam and b+ tree.

Typology: Study notes

Pre 2010

Uploaded on 02/09/2009

koofers-user-7ri-1
koofers-user-7ri-1 🇺🇸

9 documents

1 / 5

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Data Indexing
Index: a data structure used to speed up selections on data records based on some attributes
Example: Index on ‘age’ attribute of Employee table.
age is a search key
An index consists of data entries that point to some raw data records
<key_value, data_pointer>
data_pointer raw data
Cost of calling pointer to raw data is constant e.g. 1 page I/O
Type of Indexes:
1) Tree-based
2) Hash-based
3) Bitmap-based
Primary Index vs. Secondary Index
Primary: uses the key attribute of a relation as the search key
Example: an index based on SSN of Employee table
Secondary: search key is NOT the key attribute
Example: an index based on ‘age’ of Employee
Clustered vs. un-clustered index
If the order of data records in file is the same as the order of data entries in the index, that index
is called CLUSTERED.
Index
interface b/w ______________________________________________________
raw data
There can only be ONE clustered index
a
0
p
0
a
1
p
1
a
0
a
1
….
a
n
-
1
a
n
pf3
pf4
pf5

Partial preview of the text

Download Data Indexing: Understanding Tree-based and Hash-based Indexes - Prof. Yicheng Tu and more Study notes Deductive Database Systems in PDF only on Docsity!

Data Indexing

Index: a data structure used to speed up selections on data records based on some attributes

Example: Index on ‘age’ attribute of Employee table. age is a search key

An index consists of data entries that point to some raw data records <key_value, data_pointer>

data_pointer raw data Cost of calling pointer to raw data is constant e.g. 1 page I/O

Type of Indexes:

  1. Tree-based
  2. Hash-based
  3. Bitmap-based

Primary Index vs. Secondary Index

Primary: uses the key attribute of a relation as the search key

Example: an index based on SSN of Employee table

Secondary: search key is NOT the key attribute

Example: an index based on ‘age’ of Employee

Clustered vs. un-clustered index

If the order of data records in file is the same as the order of data entries in the index, that index is called CLUSTERED.

Index

interface b/w ______________________________________________________

raw data

There can only be ONE clustered index

a 0 p 0 a 1 p 1

a 0 a 1 …. an- 1 an

Cluster indexes give better performance in searching a range of values.

Tree structure is static Just insert/delete by modifying the leaf nodes

Example: Insert 23, 48, 41, 42

40

root

20 33 51 63

10 15 20 27 33 37 40 46 51 55 63 97

this part of the tree is never touched

23 48 41

42

Example: Delete 51, 97

2. B+^ -tree Follows the idea of ISAM. But optimizes towards search performance (I/O time) in a dynamic environment.

F: fanout d: order – number of data entries are maintained in range [d, 2d] meaning we leave some empty space in the tree nodes.

We chain all the leaf nodes together

Example:

B+-tree in real life Fanout = 133 each node = 1 page Typical order = 100

Height 4 tree: 133^4 ≈ 312M records Height 3 tree: 235M