Download Indexing Techniques for Sequential Files and more Slides Database Management Systems (DBMS) in PDF only on Docsity!
Indexes on Sequential Files
How to Represent a Relation
- Suppose we scatter its records arbitrarily
among the blocks of the disk
- How to answer SELECT * FROM R?
- Scan every block:
- ridiculously slow
- would require lots of overhead info in each block and each record header
Indexes
- Use indexes -- special data structures -- that allow us to find all the records that satisfy a condition "efficiently"
- Possible data structures:
- simple indexes on sorted files
- secondary indexes on unsorted files
- B-trees
- hash tables
Sorted Files
- Sorted file: records (tuples) of the file (relation) are in sorted order of the field (attribute) of interest.
- This field might or might not be a key of the relation.
- This field is called the search key.
- A sorted file is also called a sequential file.
Dense Indexes
- An index with one entry for every key in the data file
- What's the point?
- Index is much smaller than data file when record contains much more than just the search key
- If index is small enough to fit in main memory, record with a certain search key can be found quickly: binary search in memory, followed by only one disk I/O
Example of a Dense Index
8
Sequential File
20
10
40
30
60
50
80
70
100
90
Dense Index 10 20 30 40 50 60 70 80 90 100 110 120
Sparse Index
- Uses less space than a dense index
- Requires more time to find a record with a
given key
- In a sparse index, there is just one
(key,pointer) pair per data block.
- The key is for the first record in the block.
Sparse Index Example
11
Sequential File
20
10
40
30
60
50
80
70
100
90
Sparse Index
10 30 50 70 90 110 130 150 170 190 210 230
Comparing Sparse and Dense
Indexes
- Sparse index uses much less space
- In the previous numeric example, sparse index size is now only 1000 index blocks, about 4 Mbytes
- Dense index, unlike sparse, lets us answer "is
there a record with key K ?" without having to retrieve a data block
Multiple Levels of Index
- Make an index for the index
- Can continue this idea for more levels, but
usually only two levels in practice
- Second and higher level indexes must be
sparse, otherwise no savings
Numeric Example Again
- Suppose we put a second-level index on the first- level sparse index
- Since first-level index uses 1000 blocks and 100 key- pointer pairs fit per block, we need 10 blocks for second-level index
- Very likely to keep the second-level index in memory
- Thus search requires at most two disk I/O's (one for block of first-level index, one for data block)
Duplicate Search Keys
- What if more than one record has a given search key value? (Then the search key is not a key of the relation.)
- Solution 1: Use a dense index and allow duplicate search keys in it.
- To find all data records with search key K , follow all the pointers in the index with search key K
Duplicate Search Keys with Dense
Index
- Solution 2: only keep record in index for first data record with each search key value (saves some space in the index)
- To find all data records with search key K, follow the one pointer in the index and then move forward in the data file
Solution 2 Example
20
10
10
20
10
30
20
30
30
45
40
10 20 30 40