Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Understanding Disk Access & Record Layout in Advanced Database Systems, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

This document from the cps 216 advanced database systems course explores physical data organization, focusing on disks and record layout. It covers topics such as disk access time, record layout for fixed-length and variable-length records, and block layout using nsm and pax. Students will learn about the importance of minimizing disk i/o's and the role of cache in database systems.

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold 🇺🇸

4.7

(24)

372 documents

1 / 9

This page cannot be seen from the preview

Don't miss anything!

Physical Data Organization

CPS 216

Advanced Database Systems

Announcements (January 27)

Reading assignment for next week

System R paper and Lomet’s B+-tree tricks

Due next Thursday night

Homework #1 due in 12 days

Outline

It’s all about disks!

That’s why we always draw databases as

And why the single most important metric in database

processing is the number of disk I/O’s performed

Record layout

Block layout

Discover Slides of Database Management Systems (DBMS) Duke University

Partial preview of the text

Download Understanding Disk Access & Record Layout in Advanced Database Systems and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Physical Data Organization

CPS 216

Advanced Database Systems

Announcements (January 27)

Reading assignment for next week

System R paper and Lomet’s B+^ -tree tricks

Due next Thursday night

Homework #1 due in 12 days

Outline

It’s all about disks!

That’s why we always draw databases as

And why the single most important metric in database

processing is the number of disk I/O’s performed

Record layout

Block layout

Storage hierarchy

Registers

Cache

Memory

Disk

Tapes

How far away is data?

Location Cycles

Registers 1

On-chip cache 2

On-board cache 10

Memory 100

Disk 10 6

Tape 10 9

Location Time

(Source: AlphaSort paper, 1995)

) I/O dominates—design your algorithms to reduce I/O!

A typical disk

Spindle rotation

Platter

Platter Spindle

Platter

Tracks

Arm movement

Disk arm

Disk head

Cylinders

“Moving parts” are slow

Sequential disk access

Seek time + rotational delay + transfer time

Seek time

0 (assuming data is on the same track)

Rotational delay

0 (assuming data is in the next block on the track)

Easily an order of magnitude faster than random

disk access!

Performance tricks

Disk layout strategy

Keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder

Double buffering

While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing)

Disk scheduling algorithm

Example: “elevator” algorithm

Track buffer

Read/write one entire track at a time

Parallel I/O

More disk heads working at the same time

Record layout

Record = row in a table

Variable-format records

Number and types of fields not known in advance

Rare in DBMS—table schema dictates the format

Relevant for semi-structured data such as XML

Focus on fixed-format records

With fixed-length fields only, or

With possible variable-length fields

Fixed-length fields

All field lengths and offsets are constant

Can be pre-computed from schema

Example: CREATE TABLE Student(SID INT, name

CHAR(20), age INT, GPA FLOAT);

Bart (padded with space)

Watch out for alignment

May need to pad; reorder columns if that helps

What about NULL?

Variable-length records

Example: CREATE TABLE Student

(SID INT, name VARCHAR(20), age INT, GPA FLOAT, comment VARCHAR(100));

Approach 1: use field delimiters (“\0” okay?)

Approach 2: use an offset array

Put all variable-length fields at the end (why?)

Update is messy if it changes the length of a field

10 2.3 Bart\

Weird kid!\

10 2.3 Bart

Weird kid!

Record layout in commercial systems

DB2, SQL Server, Informix, Sybase: all variants of

the offset array approach

DB2: in the fixed-length part of the record, store (offset,

length) for a variable-length field, where offset points to

the start of the field in the variable-length part of the

record; no need to reorder fields

Oracle: records are structured as if all fields are

potentially of variable length

A record is a sequence of (length, data) pairs, with a

special length value denoting NULL

Options

Reorganize after every update/delete to avoid

fragmentation (gaps between records)

Need to rewrite half of the block on average

What if records are fixed-length?

Reorganize after delete

Only need to move one record
In slot directory, keep a pointer to the beginning of free space

Do not reorganize after update

In slot directory, keep a bitmap showing which slots are in use

Cache behavior of NSM

Query: SELECT SID FROM Student WHERE GPA > 2.0;

Say cache block size < record size

Lots of cache misses

ID and GPA are not close enough by memory standard

142 Bart 10 2.3 123 Milhouse 10 3.

456 Ralph 8 2.

857 Lisa 8 4.

142 Bart 10 2.3 123 Milhouse 10 3.1 857 Lisa 8 4. 456 Ralph 8

Cache

Do caches misses matter in DBMS?

No? Compared to disk I/O’s, memory-related stall

time is nothing

Yes?

You may mask some I/O cost

You may avoid some I/O’s by memory buffering

Percentage of memory-related stall time due to data

cache misses is high

90% for OLAP workloads (lots of large, complex, range-based queries, few updates)
50-70% for OLTP workloads (lots of small, simple, point-based queries and updates)

PAX

Most queries only access a few columns

Cluster same columns in “minipages” in each block

When a particular column of a row is brought into the cache, the same column of the next row is brought in together

142 123 857 456

1111

Bart Milhouse Lisa Ralph

10 10 8 8

2.3 3.1 4.3 2.

4 (number of records)

1111

Reorganize after every update (for variable-length records only) and delete to keep fields together

(IS NOT NULL bitmap)

PAX versus NSM

Space requirement: roughly the same

Cache performance: PAX incurs 75% less data cache misses

than NSM

Overall performance

For OLAP queries (TPC-H), PAX is 11-48% faster For updates, PAX is 10-16% faster (assuming NSM also reorganizes) Unanswered question: How about OLTP queries/updates (typically very selective)?

Understanding Disk Access & Record Layout in Advanced Database Systems, Slides of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Understanding Disk Access & Record Layout in Advanced Database Systems and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Physical Data Organization

CPS 216

Advanced Database Systems

Announcements (January 27)

 Reading assignment for next week

 System R paper and Lomet’s B+^ -tree tricks

 Due next Thursday night

 Homework #1 due in 12 days

Outline

 It’s all about disks!

 That’s why we always draw databases as

 And why the single most important metric in database

processing is the number of disk I/O’s performed

 Record layout

 Block layout

Storage hierarchy

How far away is data?

Location Cycles

Registers 1

On-chip cache 2

On-board cache 10

Memory 100

Disk 10 6

Tape 10 9

Location Time

) I/O dominates—design your algorithms to reduce I/O!

A typical disk

Sequential disk access

Seek time + rotational delay + transfer time

 Seek time

 0 (assuming data is on the same track)

 Rotational delay

 0 (assuming data is in the next block on the track)

 Easily an order of magnitude faster than random

disk access!

Performance tricks

 Disk layout strategy

 Double buffering

 Disk scheduling algorithm

 Track buffer

 Parallel I/O

Record layout

Record = row in a table

 Variable-format records

 Number and types of fields not known in advance

 Rare in DBMS—table schema dictates the format

 Relevant for semi-structured data such as XML

 Focus on fixed-format records

 With fixed-length fields only, or

 With possible variable-length fields

Fixed-length fields

 All field lengths and offsets are constant

 Can be pre-computed from schema

 Example: CREATE TABLE Student(SID INT, name

 Watch out for alignment

 May need to pad; reorder columns if that helps

 What about NULL?

Variable-length records

 Example: CREATE TABLE Student

 Approach 1: use field delimiters (“\0” okay?)

 Approach 2: use an offset array

 Put all variable-length fields at the end (why?)

 Update is messy if it changes the length of a field

Record layout in commercial systems

 DB2, SQL Server, Informix, Sybase: all variants of

the offset array approach

 DB2: in the fixed-length part of the record, store (offset,

length) for a variable-length field, where offset points to

the start of the field in the variable-length part of the

record; no need to reorder fields

 Oracle: records are structured as if all fields are

potentially of variable length

 A record is a sequence of (length, data) pairs, with a

special length value denoting NULL

Options

 Reorganize after every update/delete to avoid

Reading assignment for next week

System R paper and Lomet’s B+^ -tree tricks

Due next Thursday night

Homework #1 due in 12 days

It’s all about disks!

That’s why we always draw databases as

And why the single most important metric in database

Record layout

Block layout

Seek time

0 (assuming data is on the same track)

Rotational delay

0 (assuming data is in the next block on the track)

Easily an order of magnitude faster than random

Disk layout strategy

Double buffering

Disk scheduling algorithm

Track buffer

Parallel I/O

Variable-format records

Number and types of fields not known in advance

Rare in DBMS—table schema dictates the format

Relevant for semi-structured data such as XML

Focus on fixed-format records

With fixed-length fields only, or

With possible variable-length fields

All field lengths and offsets are constant

Can be pre-computed from schema

Example: CREATE TABLE Student(SID INT, name

Watch out for alignment

May need to pad; reorder columns if that helps

What about NULL?

Example: CREATE TABLE Student

Approach 1: use field delimiters (“\0” okay?)

Approach 2: use an offset array

Put all variable-length fields at the end (why?)

Update is messy if it changes the length of a field

DB2, SQL Server, Informix, Sybase: all variants of

DB2: in the fixed-length part of the record, store (offset,

Oracle: records are structured as if all fields are

A record is a sequence of (length, data) pairs, with a

Reorganize after every update/delete to avoid

Need to rewrite half of the block on average

What if records are fixed-length?

Reorganize after delete

Do not reorganize after update

Query: SELECT SID FROM Student WHERE GPA > 2.0;

Say cache block size < record size

Lots of cache misses

ID and GPA are not close enough by memory standard

No? Compared to disk I/O’s, memory-related stall

Yes?

You may mask some I/O cost

You may avoid some I/O’s by memory buffering

Percentage of memory-related stall time due to data

Most queries only access a few columns

Cluster same columns in “minipages” in each block

Space requirement: roughly the same

Cache performance: PAX incurs 75% less data cache misses

Overall performance

Logical record id: value of the primary key

Physical record id: (disk block id, slot number)

Pros and cons