Advanced Database Systems-Lecture 13 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Duke University Database Management Systems (DBMS)

Query Processing, Systems View, Physical Plan Execution, Iterator Interface, Iterator for Table Scan, open(), getNext(), close(), Iterator for Nested-loop Join, open(), Iterator for 2-pass Merge Sort, Blocking vs. Non-Blocking Iterators, Execution of an Iterator Tree, Memory Management for DBMS, Buffer Manager Basics, Standard OS Replacement Policies, Problems With OS Buffer Management, Performance Problems, Replacement Policy, Prefetch Policy, Crash Recovery, Old Algorithms, Query Locality S

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold 🇺🇸



Query Processing: A Systems View

CPS 216

Advanced Database Systems

Announcements (March 1)

Reading assignment due Wednesday

Buffer management

Homework #2 due this Thursday

Course project proposal due in one week

Midterm next Thursday in class

Open book, open notes

Physical (execution) plan

A complex query may involve multiple tables and various query processing algorithms

E.g., table scan, index nested-loop join, sort-merge join, hash-based duplicate elimination…

A physical plan for a query tells the DBMS query processor how to execute the query

A tree of physical plan operators

Each operator implements a query processing algorithm

Each operator accepts a number of input tables/streams and produces a single output table/stream


Examples of physical plans

Many physical plans for a single query

Equivalent results, but different costs and assumptions!

→ DBMS query optimizer picks the “best” possible physical plan

Plan 1:

PROJECT (title)
  INDEX-NESTED-LOOP-JOIN (CID), using index on Course(CID)
    INDEX-NESTED-LOOP-JOIN (SID), using index on Enroll(SID)
      INDEX-SCAN (name = “Bart”), using index on Student(name)

Plan 2:

PROJECT (title)
  MERGE-JOIN (CID)
    SORT (CID)
      MERGE-JOIN (SID)
        SORT (SID)
          FILTER (name = “Bart”)
            SCAN (Student)
        SCAN (Enroll)
    SCAN (Course)

Query:

SELECT Course.title
FROM Student, Enroll, Course
WHERE Student.name = 'Bart'
  AND Student.SID = Enroll.SID
  AND Enroll.CID = Course.CID;

Physical plan execution

How are intermediate results passed from child operators to parent operators?

Temporary files

Compute the tree bottom-up
Children write intermediate results to temporary files
Parents read temporary files

Iterators

Do not materialize intermediate results
Children pipeline their results to parents

Iterator interface

Every physical operator maintains its own execution state and implements the following methods:

open(): Initialize state and get ready for processing

getNext(): Return the next tuple in the result (or a null pointer if there are no more tuples); adjust state to allow subsequent tuples to be obtained

close(): Clean up
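
The three methods above can be sketched in Python. This is a minimal illustration, not the course's implementation: the class names `TableScan` and `NestedLoopJoin`, the snake_case method name `get_next`, and the use of in-memory lists of tuples in place of stored tables are all assumptions made for the example.

```python
class TableScan:
    """Iterator over an in-memory list standing in for a stored table."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = None

    def open(self):
        self.pos = 0  # initialize state: start at the first tuple

    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # no more tuples
        row = self.rows[self.pos]
        self.pos += 1  # adjust state so the next call returns the next tuple
        return row

    def close(self):
        self.pos = None  # clean up


class NestedLoopJoin:
    """Joins two child iterators; rescans the inner child for each outer tuple."""

    def __init__(self, outer, inner, predicate):
        self.outer = outer
        self.inner = inner
        self.predicate = predicate
        self.current_outer = None

    def open(self):
        self.outer.open()
        self.inner.open()
        self.current_outer = self.outer.get_next()

    def get_next(self):
        while self.current_outer is not None:
            inner_row = self.inner.get_next()
            if inner_row is None:
                # Inner input exhausted: advance the outer tuple, restart the inner scan.
                self.current_outer = self.outer.get_next()
                self.inner.close()
                self.inner.open()
                continue
            if self.predicate(self.current_outer, inner_row):
                return self.current_outer + inner_row
        return None

    def close(self):
        self.outer.close()
        self.inner.close()
```

Each operator keeps its own state (`pos`, `current_outer`), so a parent can stop pulling at any point without the children having produced their full output.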

Blocking vs. non-blocking iterators

A blocking iterator must call getNext() exhaustively (or nearly exhaustively) on its children before returning its first output tuple

Examples:

A non-blocking iterator expects to make only a few getNext() calls on its children before returning its first (or next) output tuple

Examples:
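
As one illustration of the distinction (the slide leaves its example lists blank, so these choices are assumptions), a selection filter behaves as a non-blocking iterator while a sort behaves as a blocking one. The sketch below uses a hypothetical `CountingScan` leaf that counts how often its parent pulls from it, and an in-memory sort rather than a 2-pass merge sort.

```python
class CountingScan:
    """Leaf iterator over a list that counts get_next() calls made on it."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.pos = 0
        self.calls = 0

    def open(self):
        self.pos = 0
        self.calls = 0

    def get_next(self):
        self.calls += 1
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Filter:
    """Non-blocking: pulls from the child only until one tuple qualifies."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Sort:
    """Blocking: drains the child completely before returning the first tuple."""

    def __init__(self, child):
        self.child = child
        self.buffer = None

    def open(self):
        self.child.open()
        self.buffer = None

    def get_next(self):
        if self.buffer is None:
            rows = []
            while (row := self.child.get_next()) is not None:
                rows.append(row)
            rows.sort()  # in-memory sort; a real system may need external sorting
            self.buffer = iter(rows)
        return next(self.buffer, None)

    def close(self):
        self.child.close()
        self.buffer = None
```

Comparing `calls` on the child after the first `get_next()` of each operator shows the difference: the filter stops at the first qualifying tuple, while the sort has already read its whole input.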

Execution of an iterator tree

Call root.open()

Call root.getNext() repeatedly until it returns null

Call root.close()

→ Requests go down the tree

→ Intermediate result tuples go up the tree

→ No intermediate files are needed
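
The open / getNext-until-null / close protocol can be sketched as a small driver. The helper name `execute` and the two toy operators `Scan` and `Project` are hypothetical, introduced only so the example runs on its own.

```python
class Scan:
    """Leaf iterator over an in-memory list (a stand-in for a table scan)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def open(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Project:
    """Keeps only the listed column positions of each child tuple."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def open(self):
        self.child.open()  # requests go down the tree

    def get_next(self):
        row = self.child.get_next()
        if row is None:
            return None
        return tuple(row[c] for c in self.columns)  # tuples go up the tree

    def close(self):
        self.child.close()


def execute(root):
    """Run an iterator tree: open the root, pull until None, then close."""
    results = []
    root.open()
    try:
        while (row := root.get_next()) is not None:
            results.append(row)
    finally:
        root.close()
    return results
```

Only the root is driven directly; every other operator is opened, pulled, and closed by its parent, which is why no intermediate files appear anywhere in the sketch.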

Memory management for DBMS

DBMS operations require main memory

While data resides on disk, it is manipulated in memory
Sometimes the more memory the better, e.g., sort

One approach: let each operation pre-allocate some amount of “private” memory and manage it explicitly

Not very flexible
Limits sharing and reuse

Alternative approach: use a buffer manager

Responsible for reading/writing data blocks from/to disk as needed
Higher-level code can be written without worrying about whether data is in memory or not

Buffer manager basics

Buffer pool: a global pool of frames (main-memory blocks)

→ Some systems use separate pools for different objects (e.g., tables and indexes) and for different operations (e.g., sorting and others)

Higher-level code can pin and unpin a frame

Pin: I need to work on this frame in memory
Unpin: I no longer need this frame
A completely unpinned frame is a candidate for replacement
→ In some systems you can hate a frame (i.e., suggest it for replacement)

A frame becomes dirty when it is modified

Only dirty frames need to be written back to disk
→ Related to transaction processing
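
A minimal sketch of pin, unpin, and dirty tracking follows. The names `Frame` and `BufferPool`, the dict standing in for the disk, and the choice of "first unpinned frame" as the replacement victim are all assumptions for illustration; a real buffer manager would plug in one of the replacement policies discussed in these slides.

```python
class Frame:
    """One main-memory block holding a copy of a disk page."""

    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.pin_count = 0
        self.dirty = False


class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk    # dict page_id -> contents, a stand-in for disk blocks
        self.frames = {}    # page_id -> Frame currently in memory

    def pin(self, page_id):
        """Bring the page into memory if needed and mark it as in use."""
        frame = self.frames.get(page_id)
        if frame is None:
            if len(self.frames) >= self.capacity:
                self._evict()
            frame = Frame(page_id, self.disk[page_id])
            self.frames[page_id] = frame
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty=False):
        """Release one pin; the caller reports whether it modified the page."""
        frame = self.frames[page_id]
        if frame.pin_count == 0:
            raise ValueError("page is not pinned")
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        # Only completely unpinned frames are candidates for replacement.
        victim = next((f for f in self.frames.values() if f.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        if victim.dirty:
            self.disk[victim.page_id] = victim.data  # write back only dirty frames
        del self.frames[victim.page_id]
```

Using a pin count rather than a boolean lets several operators hold the same page at once; the frame becomes replaceable only when every holder has unpinned it.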

Standard OS replacement policies

Example

Current buffer pool: 0, 1, 2
Past requests: 0, 1, 2
Incoming requests: 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, …
→ Which frame to replace?

Optimal: replace the frame that will not be used for the longest time (2)

Random (0, 1, or 2 with equal probability)

LRU: least recently used (0)

LRU approximation: clock, aging

MRU: most recently used (2)
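
The victims named in the example can be checked with a few lines of Python. This is a sketch under the example's own numbers; the function names are hypothetical, and `future` holds the requests that follow the pending request for page 3 (the trailing "…" is simply cut off).

```python
def last_use(page, past):
    """Index of the most recent request for page, or -1 if never requested."""
    for i in range(len(past) - 1, -1, -1):
        if past[i] == page:
            return i
    return -1


def next_use(page, future):
    """Index of the next request for page, or infinity if never requested again."""
    return future.index(page) if page in future else float("inf")


def lru_victim(pool, past):
    return min(pool, key=lambda p: last_use(p, past))


def mru_victim(pool, past):
    return max(pool, key=lambda p: last_use(p, past))


def optimal_victim(pool, future):
    # Needs knowledge of future requests, so it serves as a yardstick only.
    return max(pool, key=lambda p: next_use(p, future))


pool = [0, 1, 2]
past = [0, 1, 2]
future = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7]  # requests after the one for page 3

print(lru_victim(pool, past), mru_victim(pool, past), optimal_victim(pool, future))
```

On this request pattern MRU happens to pick the same victim as the optimal policy, while LRU evicts exactly the page that is needed next.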

Problems with OS buffer management

Stonebraker. “Operating System Support for Database Management.” CACM, 1981.

Performance problems

Getting a page from the OS to user space is usually a system call (process switch) and copy

Replacement policy

LRU, clock, etc. often ineffective
DBMS knows access pattern in advance and therefore should dictate policy → major OS/DBMS distinction

Prefetch policy

DBMS knows of multiple “orders” for a set of records; OS only knows physical order

Crash recovery

DBMS needs more control

Hot set algorithm

→ Exploit query behavior more!

A set of pages that are accessed over and over forms a hot set

“Hot points” in the graph of buffer size vs. number of page faults
Example: For nested-loop join R ⋈ S, size of hot set is B(S) + 1 (under LRU)

Each query is given enough memory for its hot set

Admission control: Do not let a query into the system unless its hot set fits in memory

Replacement: LRU within each hot set (seems arbitrary)

Derivation of hot set assumes LRU, which may be suboptimal

Example: What is better for nested-loop join?
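
The hot point at B(S) + 1 for nested-loop join under LRU can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the reference string touches each outer page of R once and then scans every page of S, and the function names are hypothetical.

```python
from collections import OrderedDict


def nested_loop_references(outer_pages, inner_pages):
    """Page reference string: for each page of R, scan all pages of S."""
    refs = []
    for r in range(outer_pages):
        refs.append(("R", r))
        refs.extend(("S", s) for s in range(inner_pages))
    return refs


def lru_faults(refs, num_frames):
    """Count page faults for a reference string under LRU replacement."""
    frames = OrderedDict()  # least recently used first
    faults = 0
    for page in refs:
        if page in frames:
            frames.move_to_end(page)
        else:
            faults += 1
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = True
    return faults


refs = nested_loop_references(outer_pages=5, inner_pages=4)
for num_frames in range(2, 7):
    print(num_frames, lru_faults(refs, num_frames))
```

With B(R) = 5 and B(S) = 4, every reference faults until the buffer reaches B(S) + 1 = 5 frames, at which point the count drops to B(R) + B(S) and stays there; that sudden drop is the hot point.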

Query locality set model

Observations

DBMS supports a limited set of operations

Reference patterns are regular and predictable

Reference patterns can be decomposed into simple patterns

Reference pattern classification

Sequential

Random

Hierarchical

Sequential reference patterns

Straight sequential: read something sequentially once

Example: selection on unordered table
→ Each page is only touched once, so just buffer one page

Clustered sequential: repeatedly read a “chunk” sequentially

Example: merge join; rows with the same join column value are scanned multiple times
→ Keep all pages in the chunk in buffer

Looping sequential: repeatedly read something sequentially

Example: nested-loop join
→ Keep as many pages as possible in buffer, with MRU replacement
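
Why MRU suits a looping sequential pattern can be seen by counting faults on a repeated scan that does not fit in the buffer. The sketch below assumes a 4-page file scanned three times with 3 frames; `count_faults` is a hypothetical helper.

```python
def count_faults(refs, num_frames, policy):
    """Count page faults under 'LRU' or 'MRU' replacement."""
    if policy not in ("LRU", "MRU"):
        raise ValueError("policy must be 'LRU' or 'MRU'")
    recency = []  # least recently used first, most recently used last
    faults = 0
    for page in refs:
        if page in recency:
            recency.remove(page)
        else:
            faults += 1
            if len(recency) >= num_frames:
                recency.pop(0 if policy == "LRU" else -1)
        recency.append(page)
    return faults


loop = [0, 1, 2, 3] * 3  # scan a 4-page file three times
print(count_faults(loop, 3, "LRU"))  # 12: every reference faults
print(count_faults(loop, 3, "MRU"))  # 6: part of the file stays resident
```

LRU always evicts the page that the loop will need soonest, so it never gets a hit; MRU sacrifices one frame and keeps the rest of the file resident across iterations.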

Random reference patterns

Independent random: truly random accesses

Example: index scan through a non-clustered (e.g., secondary) index yields random data page access

→ The larger the buffer the better?

Clustered random: random accesses that happen to demonstrate some locality

Example: in an index nested-loop join, inner index is non-clustered and non-unique, while outer table is clustered and non-unique

→ Try to keep in buffer data pages of the inner table accessed in one cluster

Hierarchical reference patterns

Example: operations on tree indexes

Straight hierarchical: regular root-to-leaf traversal

Hierarchical with straight sequential: traversal followed by straight sequential on leaves

Hierarchical with clustered sequential: traversal followed by clustered sequential on leaves

Looping hierarchical: repeatedly traverse an index

Example: index nested-loop join

→ Keep the root index page in buffer

DBMIN algorithm

Associate a chunk of memory with each file instance (each table in FROM)

This chunk is called the file instance’s locality set
Instances of the same table may share buffered pages
But each locality set has its own replacement policy
→ Based on how query processing uses each relation (finally!)
→ No single policy for all pages accessed by a query
→ No single policy for all pages in a table

Estimate locality set sizes by examining the query plan and database statistics

Admission control: a query is allowed to run if its locality sets fit in free frames
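
The admission-control rule can be sketched as simple bookkeeping over free frames. This is an illustration only, not the DBMIN algorithm itself: the names `LocalitySet` and `AdmissionController` are hypothetical, locality set sizes are taken as given rather than estimated, and page sharing between instances of the same table is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LocalitySet:
    file_instance: str  # one entry per table instance in FROM
    size: int           # frames needed, estimated from the plan and statistics
    policy: str         # replacement policy chosen for this reference pattern


class AdmissionController:
    def __init__(self, total_frames):
        self.free_frames = total_frames
        self.running = {}  # query id -> its locality sets

    def try_admit(self, query_id, locality_sets):
        """Admit the query only if all of its locality sets fit in free frames."""
        needed = sum(ls.size for ls in locality_sets)
        if needed > self.free_frames:
            return False
        self.free_frames -= needed
        self.running[query_id] = locality_sets
        return True

    def finish(self, query_id):
        """Return the frames of a completed query to the free pool."""
        sets = self.running.pop(query_id)
        self.free_frames += sum(ls.size for ls in sets)
```

Because each `LocalitySet` carries its own `policy`, two tables in the same query can be managed differently, for example one frame replaced as needed for a straight sequential scan next to an MRU-managed set for a looping scan.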

Locality sets for more ref. patterns

Straight hierarchical, hierarchical/straight sequential: just like straight sequential

Size = 1
Just replace as needed

Hierarchical/clustered sequential: like clustered sequential

Size = number of index pages in the largest cluster
FIFO or LRU

Looping hierarchical

At each level of the index you have random access among pages
Use Yao’s formula to figure out how many pages need to be accessed at each level
Size = sum over all levels that you choose to worry about
LIFO with 3-4 buffers should be okay
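
Yao's formula gives the expected number of pages touched when k records are picked at random, without replacement, from n records spread evenly over m pages: m · (1 − C(n − n/m, k) / C(n, k)). A sketch in Python follows; the function name is hypothetical, and how the formula's inputs map onto each index level is not spelled out on the slide, so that mapping is left to the caller.

```python
from math import comb


def yao_expected_pages(n_records, n_pages, k_selected):
    """Expected number of distinct pages touched by k randomly chosen records.

    Assumes records are spread evenly, n_records / n_pages per page, and that
    the k records are chosen without replacement.
    """
    if n_pages <= 0 or n_records % n_pages != 0:
        raise ValueError("records must divide evenly among pages")
    if not 0 <= k_selected <= n_records:
        raise ValueError("k_selected must be between 0 and n_records")
    per_page = n_records // n_pages
    # Probability that a given page holds none of the selected records.
    untouched = comb(n_records - per_page, k_selected) / comb(n_records, k_selected)
    return n_pages * (1 - untouched)


for k in (1, 10, 50, 100):
    print(k, yao_expected_pages(n_records=100, n_pages=10, k_selected=k))
```

Summing this estimate over the index levels of interest gives a locality set size in the spirit of the rule above.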

Simulation study

Hybrid simulation model

Trace-driven simulation

Recorded from a real system (running Wisconsin Benchmark)
For each query, record its execution trace
- Page read/write, file open/close, etc.

Distribution-driven simulation

Generated by some stochastic model
Synthesize the workload by merging query execution traces

Simulator models CPU, memory, and one disk

Performance metric: query throughput

Workload

Mix 1: all six types equally likely

Mix 2: I and II together appear 50% of the time

Mix 3: I and II together appear 75% of the time

Mix 1 (no data sharing)

Thrashing is evident for simple algorithms with no load control

Working set (a popular OS choice) fails to capture join loops for queries with high memory demand (types V and VI)

It still functions (though suboptimally) with a large number of concurrent queries (NCQ)
