Disk Management in Database Management Systems - Prof. Kristen R. Lefevre, Study notes of Database Management Systems (DBMS)

The role of disks in database management systems, the memory hierarchy, and the performance implications of data storage and retrieval. It also covers disk geometry, arranging blocks on disk, disk space management, and buffer management.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-19d
koofers-user-19d 🇺🇸

10 documents

1 / 8

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 1
Storing Data: Disks and Files
Chapter 9
2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 2
Disk
Query Evaluation Engine
SQL Query / Application
Files & Access Methods
Buffer Management
Disk Space Management
DBMS Organization
This
lecture
2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 3
Non-volatile
The Memory Hierarchy
Price Speed
cache
main memory
magnetic disk
tape
CPU Size
Performance of Microprocessors and Memory
improving faster than disks and tapes
pf3
pf4
pf5
pf8

Partial preview of the text

Download Disk Management in Database Management Systems - Prof. Kristen R. Lefevre and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 1

Storing Data: Disks and Files

Chapter 9

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 2 Disk Query Evaluation Engine SQL Query / Application Files & Access Methods Buffer Management Disk Space Management

DBMS Organization

This lecture 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 3 Non-volatile

The Memory Hierarchy

Price Speed cache main memory magnetic disk tape CPU Size Performance of Microprocessors and Memory improving faster than disks and tapes

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 4

Why Not Store Everything in Main Memory?

  • Too expensive: RAM: $90 per GB, Disk: $0.25 per GB
  • Main memory is volatile: Want data to persist between runs
  • Typical storage hierarchy:
    • Main memory (RAM) for currently used data
    • Disk for the main database (secondary storage)
      • Read and write data between disk and main memory
      • Both are high-cost operations, relative to in-memory operations, so must be planned carefully!
    • Tapes for archiving older versions of data (tertiary storage)
      • Sequential access devices 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 5

Disks

  • Secondary storage device of choice.

• Main advantage over tapes: random access vs.

sequential.

• Data is stored and retrieved in units called disk

blocks or pages.

  • Unlike RAM, time to retrieve a disk page varies

depending upon location on disk.

  • Therefore, relative placement of pages on disk has major impact on DBMS performance! 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 6

Magnetic Disks

Platters Disk head Arm movement Disk arm Tracks Sector Set of tracks with same diameter called a cylinder Data stored in blocks. Size of block is a multiple of sector size. Only one disk head reads or writes at a time.

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 10

Buffer Management in a DBMS

  • Data must be in RAM for DBMS to operate on it!
    • Can’t keep all the DBMS pages in main memory
  • Buffer Manager: Efficiently uses main memory
    • Memory divided into buffer frames: slots for holding disk pages Upper levels:
      • release pages when done
      • indicate if a page is modified DB MAIN MEMORY DISK disk page free frame BUFFER POOL^ Page Requests from Higher Levels choice of frame dictated by replacement policy 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 11

Buffer Manager

• Bookkeeping per frame:

• pin count: # of active users of the frame

  • Pinning – “I’m using this page”
  • Unpinning – “I’m no longer using this page”

• dirty bit: indicates if page has been modified

since it was brought in from disk

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 12

Buffer Manager

  • When a page is requested:
  • If page is in buffer pool
    • Increment the frame’s pin count
    • Return main memory address of the frame
  • Else
    • Choose a frame for replacement
      • Use replacement policy
      • Only choose a frame with pin count == 0
    • If frame is dirty, write old page to disk
    • Read requested page into chosen frame
      • Set pin count = 1 (“pin the page”)
      • Set dirty bit = false

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 13

Buffer Replacement Policy

  • Chose a frame for replacement
    • Least-recently-used (LRU), Clock, MRU etc.
  • LRU: queue of pointers to “empty” frames
    • Add to end of queue, grab frames from front of queue
  • Clock: variant of LRU, but lower overhead
  • Policy can have big impact on # of I/O’s; depends on the access pattern.
  • Sequential flooding : Nasty situation caused by LRU + repeated sequential scans. - # buffer frames < # pages in file - Use MRU 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 14

Files of Records (tuples)

• Pages are great for storage system, but

DBMS views data as a collection of records

• Important questions:

  • How is a record organized / implemented?
  • How are records laid out on pages?
  • How are pages organized into files? 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 15

Record Formats

• Two kinds:

  • Fixed-length records
    • Each field has a fixed length, and the number of fields is fixed
  • Variable-length records
    • Some of the fields are of variable length
    • E.g., variable-length strings

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 19

Page Formats

  • Page contains a collection of records
  • Think of page as a collection of slots, each of which contains a record
  • Record ID (RID): <page id, slot #>
  • Many different slotted page organizations.
    • Must support search, insert, delete records 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 20

Page Formats: Fixed Length Records

Record id = <page id, slot #>

First alternative: moving records changes rid

 may not be acceptable. Slot 1 Slot 2 Slot N

N PACKED Free Space number of records

... 0 1 M M ... 3 2 1 UNPACKED, BITMAP Slot 1 Slot 2 Slot N Slot M 1 1 number of slots 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 21

Page Formats: Variable Length Records

120, 40 -1, 0 560, 90 -1, 0 0, 70 70, 50 6 Book-keeping 5 4 3 2 1 0 Slot directory Free Space Pointer

  • Directory grows backwards!
  • Move records on same page
    • rid unchanged
  • Good for fixed-length records too. Rid=?Rid= (11, 1) Page num = 11 Delete a record? Slot Entry: Offset, reclen

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 22

Unordered (Heap) Files

• Simplest file structure contains records in

no particular order.

• Page contains a set of records

• As file grows and shrinks, disk pages are

allocated and de-allocated.

• Many alternatives for storing file

2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 23

Heap File Implemented as a List

  • (heap file name, header page id) recorded in a known location
  • Each page contains two pointers plus data: Pointer = Page ID (pid)
  • Pages in the free space list have “some” free space
  • What happens with variable length records?
  • Fetch a record with rid Header Page Data Page Data Page Data Page Data Page Data Page Data Page Pages with Free Space Full Pages 2/3/09 EECS 484: Database Management Systems, Kristen LeFevre 24

Heap File Using a Page Directory

  • Entry for a page:
    • Free/full
    • Number of free bytes
  • Can locate pages for new tuples faster! Data Page 1 Data Page 2 Data Page N Header Page DIRECTORY Directory itself is implemented as a linked list of pages