Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Log in Sign up

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Understanding Disk Access & Record Layout in Database Systems, Slides of Introduction to Database Management Systems

Duke University Introduction to Database Management Systems

An in-depth exploration of physical data organization in database systems, focusing on disks and the importance of minimizing disk i/o's. Topics include record and block layout, storage hierarchy, and performance tricks for optimizing disk access.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold 🇺🇸

4.7

(24)

372 documents

1 / 4

This page cannot be seen from the preview

Don't miss anything!

1

Physical Data Organization

CPS 116

Introduction to Database Systems

2

Announcements

Homework #2 due today (Nov. 2)

One free election-day extension for the part to be turned

in (affix the “I voted” sticker on your submission)

No extension for the Gradiance part (very short)

Course project milestone #2 due next Thursday

No new homework until then

3

Outline

It’s all about disks!

That’s why we always draw databases as

And why the single most important metric in database

processing is the number of disk I/O’s performed

Record layout

Block layout

4

Storage hierarchy

Registers

Cache

Memory

Disk

Tapes

5

How far away is data?

Location Cycles

Registers 1

On-chip cache 2

On-board cache 10

Memory 100

Disk 106

Tape 109

Location Time

My head 1 min.

This room 2 min.

Duke campus 10 min.

Washington D.C. 1.5 hr.

Pluto 2 yr.

Andromeda 2000 yr.

(Source: AlphaSort paper, 1995)

)I/O dominates—design your algorithms to reduce I/O!

6

A typical disk

Spindle rotation

Platter

Spindle

Platter

Tracks

Arm movement

Disk arm

Disk head

Cylinders

“Moving parts” are slow

Discover Slides of Introduction to Database Management Systems Duke University

Partial preview of the text

Download Understanding Disk Access & Record Layout in Database Systems and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Physical Data Organization

CPS 116

Introduction to Database Systems

Announcements

Homework #2 due today (Nov. 2)

One free election-day extension for the part to be turned

in (affix the “I voted” sticker on your submission)

No extension for the Gradiance part (very short)

Course project milestone #2 due next Thursday

No new homework until then

3

Outline

It’s all about disks!

That’s why we always draw databases as

And why the single most important metric in database

processing is the number of disk I/O’s performed

Record layout

Block layout

4

Storage hierarchy

Registers

Cache

Memory

Disk

Tapes

5

How far away is data?

Location Cycles

Registers 1

On-chip cache 2

On-board cache 10

Memory 100

Disk 10 6

Tape 10 9

Location Time

My head 1 min.

This room 2 min.

Duke campus 10 min.

Washington D.C. 1.5 hr.

Pluto 2 yr.

Andromeda 2000 yr.

(Source: AlphaSort paper, 1995)

) I/O dominates—design your algorithms to reduce I/O!

6

A typical disk

Spindle rotation

Platter

Platter Spindle

Platter

Tracks

Arm movement

Disk arm

Disk head

Cylinders

“Moving parts” are slow

Top view

Track

Sectors

Higher-density sectors on inner tracks and/or more sectors on outer tracks

A block is a logical unit of transfer consisting of one or more sectors

Disk access time

Sum of:

Seek time: time for disk heads to move to the

correct cylinder

Rotational delay: time for the desired block to rotate

under the disk head

Transfer time: time to read/write data in the block

(= time for disk to rotate over the block)

9

Random disk access

Seek time + rotational delay + transfer time

Average seek time

Time to skip one half of the cylinders?

Not quite; should be time to skip a third of them (why?)

“Typical” value: 5 ms

Average rotational delay

Time for a half rotation (a function of RPM)

“Typical” value: 4.2 ms (7200 RPM)

10

Sequential disk access

Seek time + rotational delay + transfer time

Seek time

0 (assuming data is on the same track)

Rotational delay

0 (assuming data is in the next block on the track)

Easily an order of magnitude faster than random

disk access!

11

Performance tricks

Disk layout strategy

Keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder

Double buffering

While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing)

Disk scheduling algorithm

Example: “elevator” algorithm

Track buffer

Read/write one entire track at a time

Parallel I/O

More disk heads working at the same time

12

Record layout

Record = row in a table

Variable-format records

Rare in DBMS—table schema dictates the format

Relevant for semi-structured data such as XML

Focus on fixed-format records

With fixed-length fields only, or

With possible variable-length fields

Cache behavior of NSM

Query: SELECT SID FROM Student WHERE GPA > 2.0;

Assumption: cache block size < record size

Lots of cache misses

ID and GPA are not close enough by memory standards

142 Bart 10 2.3 123 Milhouse 10 3.

456 Ralph 8 2.

857 Lisa 8 4.

142 Bart 10 2.3 123 Milhouse 10 3.1 857 Lisa 8 4. 456 Ralph 8

Cache

PAX

Most queries only access a few columns

Cluster values of the same columns in each block

When a particular column of a row is brought into the cache, the same column of the next row is brought in together

142 123 857 456

1111

Bart Milhouse Lisa Ralph

10 10 8 8

2.3 3.1 4.3 2.

4 (number of records)

1111

Reorganize after every update (for variable-length records only) and delete to keep fields together

(IS NOT NULL bitmap)

21

Summary

Storage hierarchy

Why I/O’s dominate the cost of database operations

Disk

Steps in completing a disk access Sequential versus random accesses

Record layout

Handling variable-length fields Handling NULL Handling modifications

Block layout

NSM: the traditional layout PAX: a layout that tries to improve cache performance

Understanding Disk Access & Record Layout in Database Systems, Slides of Introduction to Database Management Systems

Related documents

Partial preview of the text

Download Understanding Disk Access & Record Layout in Database Systems and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Physical Data Organization

CPS 116

Introduction to Database Systems

Announcements

 Homework #2 due today (Nov. 2)

 One free election-day extension for the part to be turned

in (affix the “I voted” sticker on your submission)

 No extension for the Gradiance part (very short)

 Course project milestone #2 due next Thursday

 No new homework until then

Outline

 It’s all about disks!

 That’s why we always draw databases as

 And why the single most important metric in database

processing is the number of disk I/O’s performed

 Record layout

 Block layout

Storage hierarchy

How far away is data?

Location Cycles

Registers 1

On-chip cache 2

On-board cache 10

Memory 100

Disk 10 6

Tape 10 9

Location Time

My head 1 min.

This room 2 min.

Duke campus 10 min.

Washington D.C. 1.5 hr.

Pluto 2 yr.

Andromeda 2000 yr.

) I/O dominates—design your algorithms to reduce I/O!

A typical disk

Top view

Disk access time

Sum of:

 Seek time: time for disk heads to move to the

correct cylinder

 Rotational delay: time for the desired block to rotate

under the disk head

 Transfer time: time to read/write data in the block

(= time for disk to rotate over the block)

Random disk access

Seek time + rotational delay + transfer time

 Average seek time

 Time to skip one half of the cylinders?

 Not quite; should be time to skip a third of them (why?)

 “Typical” value: 5 ms

 Average rotational delay

 Time for a half rotation (a function of RPM)

 “Typical” value: 4.2 ms (7200 RPM)

Sequential disk access

Seek time + rotational delay + transfer time

 Seek time

 0 (assuming data is on the same track)

 Rotational delay

 0 (assuming data is in the next block on the track)

 Easily an order of magnitude faster than random

disk access!

Performance tricks

 Disk layout strategy

 Double buffering

 Disk scheduling algorithm

 Track buffer

 Parallel I/O

Record layout

Record = row in a table

 Variable-format records

 Rare in DBMS—table schema dictates the format

 Relevant for semi-structured data such as XML

 Focus on fixed-format records

 With fixed-length fields only, or

 With possible variable-length fields

Cache behavior of NSM

Homework #2 due today (Nov. 2)

One free election-day extension for the part to be turned

No extension for the Gradiance part (very short)

Course project milestone #2 due next Thursday

No new homework until then

It’s all about disks!

That’s why we always draw databases as

And why the single most important metric in database

Record layout

Block layout

Seek time: time for disk heads to move to the

Rotational delay: time for the desired block to rotate

Transfer time: time to read/write data in the block

Average seek time

Time to skip one half of the cylinders?

Not quite; should be time to skip a third of them (why?)

“Typical” value: 5 ms

Average rotational delay

Time for a half rotation (a function of RPM)

“Typical” value: 4.2 ms (7200 RPM)

Seek time

0 (assuming data is on the same track)

Rotational delay

0 (assuming data is in the next block on the track)

Easily an order of magnitude faster than random

Disk layout strategy

Double buffering

Disk scheduling algorithm

Track buffer

Parallel I/O

Variable-format records

Rare in DBMS—table schema dictates the format

Relevant for semi-structured data such as XML

Focus on fixed-format records

With fixed-length fields only, or

With possible variable-length fields

Query: SELECT SID FROM Student WHERE GPA > 2.0;

Assumption: cache block size < record size

Lots of cache misses

ID and GPA are not close enough by memory standards

Most queries only access a few columns

Cluster values of the same columns in each block

Storage hierarchy

Disk

Record layout

Block layout