Understanding Disk Access & Record Layout in Database Systems, Slides of Introduction to Database Management Systems

An in-depth exploration of physical data organization in database systems, focusing on disks and the importance of minimizing disk i/o's. Topics include record and block layout, storage hierarchy, and performance tricks for optimizing disk access.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 4

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Physical Data Organization
CPS 116
Introduction to Database Systems
2
Announcements
Homework #2 due today (Nov. 2)
One free election-day extension for the part to be turned
in (affix the “I voted” sticker on your submission)
No extension for the Gradiance part (very short)
Course project milestone #2 due next Thursday
No new homework until then
3
Outline
It’s all about disks!
That’s why we always draw databases as
And why the single most important metric in database
processing is the number of disk I/O’s performed
Record layout
Block layout
4
Storage hierarchy
Registers
Cache
Memory
Disk
Tapes
5
How far away is data?
Location Cycles
Registers 1
On-chip cache 2
On-board cache 10
Memory 100
Disk 106
Tape 109
Location Time
My head 1 min.
This room 2 min.
Duke campus 10 min.
Washington D.C. 1.5 hr.
Pluto 2 yr.
Andromeda 2000 yr.
(Source: AlphaSort paper, 1995)
)I/O dominates—design your algorithms to reduce I/O!
6
A typical disk
Spindle rotation
Platter
Platter
Spindle
Platter
Tracks
Arm movement
Disk arm
Disk head
Cylinders
“Moving parts” are slow
pf3
pf4

Partial preview of the text

Download Understanding Disk Access & Record Layout in Database Systems and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Physical Data Organization

CPS 116

Introduction to Database Systems

Announcements

™ Homework #2 due today (Nov. 2)

ƒ One free election-day extension for the part to be turned

in (affix the “I voted” sticker on your submission)

ƒ No extension for the Gradiance part (very short)

™ Course project milestone #2 due next Thursday

ƒ No new homework until then

3

Outline

™ It’s all about disks!

ƒ That’s why we always draw databases as

ƒ And why the single most important metric in database

processing is the number of disk I/O’s performed

™ Record layout

™ Block layout

4

Storage hierarchy

Registers

Cache

Memory

Disk

Tapes

5

How far away is data?

Location Cycles

Registers 1

On-chip cache 2

On-board cache 10

Memory 100

Disk 10 6

Tape 10 9

Location Time

My head 1 min.

This room 2 min.

Duke campus 10 min.

Washington D.C. 1.5 hr.

Pluto 2 yr.

Andromeda 2000 yr.

(Source: AlphaSort paper, 1995)

) I/O dominates—design your algorithms to reduce I/O!

6

A typical disk

Spindle rotation

Platter

Platter Spindle

Platter

Tracks

Arm movement

Disk arm

Disk head

Cylinders

“Moving parts” are slow

Top view

Track

Track

Track

Sectors

Higher-density sectors on inner tracks and/or more sectors on outer tracks

A block is a logical unit of transfer consisting of one or more sectors

Disk access time

Sum of:

™ Seek time: time for disk heads to move to the

correct cylinder

™ Rotational delay: time for the desired block to rotate

under the disk head

™ Transfer time: time to read/write data in the block

(= time for disk to rotate over the block)

9

Random disk access

Seek time + rotational delay + transfer time

™ Average seek time

ƒ Time to skip one half of the cylinders?

ƒ Not quite; should be time to skip a third of them (why?)

ƒ “Typical” value: 5 ms

™ Average rotational delay

ƒ Time for a half rotation (a function of RPM)

ƒ “Typical” value: 4.2 ms (7200 RPM)

10

Sequential disk access

Seek time + rotational delay + transfer time

™ Seek time

ƒ 0 (assuming data is on the same track)

™ Rotational delay

ƒ 0 (assuming data is in the next block on the track)

™ Easily an order of magnitude faster than random

disk access!

11

Performance tricks

™ Disk layout strategy

ƒ Keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder

™ Double buffering

ƒ While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing)

™ Disk scheduling algorithm

ƒ Example: “elevator” algorithm

™ Track buffer

ƒ Read/write one entire track at a time

™ Parallel I/O

ƒ More disk heads working at the same time

12

Record layout

Record = row in a table

™ Variable-format records

ƒ Rare in DBMS—table schema dictates the format

ƒ Relevant for semi-structured data such as XML

™ Focus on fixed-format records

ƒ With fixed-length fields only, or

ƒ With possible variable-length fields

Cache behavior of NSM

™ Query: SELECT SID FROM Student WHERE GPA > 2.0;

™ Assumption: cache block size < record size

™ Lots of cache misses

ƒ ID and GPA are not close enough by memory standards

142 Bart 10 2.3 123 Milhouse 10 3.

456 Ralph 8 2.

857 Lisa 8 4.

142 Bart 10 2.3 123 Milhouse 10 3.1 857 Lisa 8 4. 456 Ralph 8

Cache

PAX

™ Most queries only access a few columns

™ Cluster values of the same columns in each block

ƒ When a particular column of a row is brought into the cache, the same column of the next row is brought in together

142 123 857 456

1111

Bart Milhouse Lisa Ralph

10 10 8 8

2.3 3.1 4.3 2.

4 (number of records)

1111

Reorganize after every update (for variable-length records only) and delete to keep fields together

(IS NOT NULL bitmap)

21

Summary

™ Storage hierarchy

ƒ Why I/O’s dominate the cost of database operations

™ Disk

ƒ Steps in completing a disk access ƒ Sequential versus random accesses

™ Record layout

ƒ Handling variable-length fields ƒ Handling NULL ƒ Handling modifications

™ Block layout

ƒ NSM: the traditional layout ƒ PAX: a layout that tries to improve cache performance