Indexing Techniques for Sequential Files, Slides of Database Management Systems (DBMS)

Various methods for efficiently representing and accessing relations stored as sequential files on a disk. It covers the concept of indexes, their data structures like sorted files, dense and sparse indexes, b-trees, and hash tables. The document also compares the advantages and disadvantages of sparse and dense indexes and explains how to handle duplicate search keys.

Typology: Slides

2012/2013

Uploaded on 04/26/2013

divyesh
divyesh 🇮🇳

4.2

(6)

83 documents

1 / 39

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Indexes on Sequential Files
1
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27

Partial preview of the text

Download Indexing Techniques for Sequential Files and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Indexes on Sequential Files

How to Represent a Relation

  • Suppose we scatter its records arbitrarily

among the blocks of the disk

  • How to answer SELECT * FROM R?
  • Scan every block:
    • ridiculously slow
    • would require lots of overhead info in each block and each record header

Indexes

  • Use indexes -- special data structures -- that allow us to find all the records that satisfy a condition "efficiently"
  • Possible data structures:
    • simple indexes on sorted files
    • secondary indexes on unsorted files
    • B-trees
    • hash tables

Sorted Files

  • Sorted file: records (tuples) of the file (relation) are in sorted order of the field (attribute) of interest.
  • This field might or might not be a key of the relation.
  • This field is called the search key.
  • A sorted file is also called a sequential file.

Dense Indexes

  • An index with one entry for every key in the data file
  • What's the point?
  • Index is much smaller than data file when record contains much more than just the search key
  • If index is small enough to fit in main memory, record with a certain search key can be found quickly: binary search in memory, followed by only one disk I/O

Example of a Dense Index

8

Sequential File

20

10

40

30

60

50

80

70

100

90

Dense Index 10 20 30 40 50 60 70 80 90 100 110 120

Sparse Index

  • Uses less space than a dense index
  • Requires more time to find a record with a

given key

  • In a sparse index, there is just one

(key,pointer) pair per data block.

  • The key is for the first record in the block.

Sparse Index Example

11

Sequential File

20

10

40

30

60

50

80

70

100

90

Sparse Index

10 30 50 70 90 110 130 150 170 190 210 230

Comparing Sparse and Dense

Indexes

  • Sparse index uses much less space
    • In the previous numeric example, sparse index size is now only 1000 index blocks, about 4 Mbytes
  • Dense index, unlike sparse, lets us answer "is

there a record with key K ?" without having to retrieve a data block

Multiple Levels of Index

  • Make an index for the index
  • Can continue this idea for more levels, but

usually only two levels in practice

  • Second and higher level indexes must be

sparse, otherwise no savings

Numeric Example Again

  • Suppose we put a second-level index on the first- level sparse index
  • Since first-level index uses 1000 blocks and 100 key- pointer pairs fit per block, we need 10 blocks for second-level index
  • Very likely to keep the second-level index in memory
  • Thus search requires at most two disk I/O's (one for block of first-level index, one for data block)

Duplicate Search Keys

  • What if more than one record has a given search key value? (Then the search key is not a key of the relation.)
  • Solution 1: Use a dense index and allow duplicate search keys in it.
  • To find all data records with search key K , follow all the pointers in the index with search key K

Duplicate Search Keys with Dense

Index

  • Solution 2: only keep record in index for first data record with each search key value (saves some space in the index)
  • To find all data records with search key K, follow the one pointer in the index and then move forward in the data file

Solution 2 Example

20

10

10

20

10

30

20

30

30

45

40

10 20 30 40