Download Understanding Disk Access & Record Layout in Advanced Database Systems and more Slides Database Management Systems (DBMS) in PDF only on Docsity!
Physical Data Organization
CPS 216
Advanced Database Systems
2
Announcements (January 27)
Reading assignment for next week
System R paper and Lomet’s B+^ -tree tricks
Due next Thursday night
Homework #1 due in 12 days
3
Outline
It’s all about disks!
That’s why we always draw databases as
And why the single most important metric in database
processing is the number of disk I/O’s performed
Record layout
Block layout
Storage hierarchy
Registers
Cache
Memory
Disk
Tapes
5
How far away is data?
Location Cycles
Registers 1
On-chip cache 2
On-board cache 10
Memory 100
Disk 10 6
Tape 10 9
Location Time
(Source: AlphaSort paper, 1995)
) I/O dominates—design your algorithms to reduce I/O!
6
A typical disk
Spindle rotation
Platter
Platter Spindle
Platter
Tracks
Arm movement
Disk arm
Disk head
Cylinders
“Moving parts” are slow
Sequential disk access
Seek time + rotational delay + transfer time
Seek time
0 (assuming data is on the same track)
Rotational delay
0 (assuming data is in the next block on the track)
Easily an order of magnitude faster than random
disk access!
11
Performance tricks
Disk layout strategy
Keep related things (what are they?) close together: same sector/block → same track → same cylinder → adjacent cylinder
Double buffering
While processing the current block in memory, prefetch the next block from disk (overlap I/O with processing)
Disk scheduling algorithm
Example: “elevator” algorithm
Track buffer
Read/write one entire track at a time
Parallel I/O
More disk heads working at the same time
12
Record layout
Record = row in a table
Variable-format records
Number and types of fields not known in advance
Rare in DBMS—table schema dictates the format
Relevant for semi-structured data such as XML
Focus on fixed-format records
With fixed-length fields only, or
With possible variable-length fields
Fixed-length fields
All field lengths and offsets are constant
Can be pre-computed from schema
Example: CREATE TABLE Student(SID INT, name
CHAR(20), age INT, GPA FLOAT);
Bart (padded with space)
Watch out for alignment
May need to pad; reorder columns if that helps
What about NULL?
14
Variable-length records
Example: CREATE TABLE Student
(SID INT, name VARCHAR(20), age INT, GPA FLOAT, comment VARCHAR(100));
Approach 1: use field delimiters (“\0” okay?)
Approach 2: use an offset array
Put all variable-length fields at the end (why?)
Update is messy if it changes the length of a field
10 2.3 Bart\
Weird kid!\
10 2.3 Bart
Weird kid!
15
Record layout in commercial systems
DB2, SQL Server, Informix, Sybase: all variants of
the offset array approach
DB2: in the fixed-length part of the record, store (offset,
length) for a variable-length field, where offset points to
the start of the field in the variable-length part of the
record; no need to reorder fields
Oracle: records are structured as if all fields are
potentially of variable length
A record is a sequence of (length, data) pairs, with a
special length value denoting NULL
Options
Reorganize after every update/delete to avoid
fragmentation (gaps between records)
Need to rewrite half of the block on average
What if records are fixed-length?
Reorganize after delete
- Only need to move one record
- In slot directory, keep a pointer to the beginning of free space
Do not reorganize after update
- In slot directory, keep a bitmap showing which slots are in use
20
Cache behavior of NSM
Query: SELECT SID FROM Student WHERE GPA > 2.0;
Say cache block size < record size
Lots of cache misses
ID and GPA are not close enough by memory standard
142 Bart 10 2.3 123 Milhouse 10 3.
456 Ralph 8 2.
857 Lisa 8 4.
142 Bart 10 2.3 123 Milhouse 10 3.1 857 Lisa 8 4. 456 Ralph 8
Cache
21
Do caches misses matter in DBMS?
No? Compared to disk I/O’s, memory-related stall
time is nothing
Yes?
You may mask some I/O cost
You may avoid some I/O’s by memory buffering
Percentage of memory-related stall time due to data
cache misses is high
- 90% for OLAP workloads (lots of large, complex, range-based queries, few updates)
- 50-70% for OLTP workloads (lots of small, simple, point-based queries and updates)
PAX
Most queries only access a few columns
Cluster same columns in “minipages” in each block
When a particular column of a row is brought into the cache, the same column of the next row is brought in together
142 123 857 456
1111
Bart Milhouse Lisa Ralph
10 10 8 8
2.3 3.1 4.3 2.
4 (number of records)
1111
Reorganize after every update (for variable-length records only) and delete to keep fields together
(IS NOT NULL bitmap)
23
PAX versus NSM
Space requirement: roughly the same
Cache performance: PAX incurs 75% less data cache misses
than NSM
Overall performance
For OLAP queries (TPC-H), PAX is 11-48% faster For updates, PAX is 10-16% faster (assuming NSM also reorganizes) Unanswered question: How about OLTP queries/updates (typically very selective)?
) Adaptive hybrid of PAX and NSM
Hankins and Patel. “Data Morphing…” VLDB 2003
) Dynamic adjustment of layout when fetching fro
Shao et al. “Clotho: Decoupling…” VLDB 2004
24
“Pointers” to records
Logical record id: value of the primary key
Used in references (e.g., Enroll ( SID , CID ))
Physical record id: (disk block id, slot number)
Used in index entries: (key, physical record id)
Pros and cons