Database Query Processing: A Systems Perspective, Slides of Introduction to Database Management Systems

An in-depth look into the process of querying a database system, covering the various stages from parsing and validation to logical and physical plan execution. It includes examples of parse trees, logical plans, and physical plans, as well as discussions on iterator interfaces and the differences between blocking and non-blocking iterators.

Typology: Slides

2011/2012

Uploaded on 01/29/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 3

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Query Processing: A Systems View
CPS 116
Introduction to Database Systems
2
Announcements
Course project milestone #2 due today (Nov. 11)
No class or office hours next Tuesday (Nov. 16)
Homework #4 will be available next Thursday
3
A query’s trip through the DBMS
Parser
Validator
Optimizer
Executor
Result
SQL query SELECT title, SID
FROM Enroll, Course
WHERE Enroll.CID =
Course.CID;
Parse tree
<SFW>
<select-list>
<from-list>
<where-cond>
<table><table>
<Query>
Enroll Course
Physical plan
PROJECT (title, SID)
MERGE-JOIN (CID)
SCAN (Enroll)
SCAN (Course)
SORT (CID)
Logical plan
πtitle, SID
σEnroll.CID = Course.CID
Enroll Course
×
4
Parsing and validation
Parser: SQL parse tree
Good old lex & yacc
Detect and reject syntax errors
Validator: parse tree logical plan
Detect and reject semantic errors
Nonexistent tables/views/columns?
Insufficient access privileges?
Type mismatches?
–Examples: AVG(name), name + GPA, Student UNION Enroll
Also
•Expand *
Expand view definitions
Information required for semantic checking is found in system
catalog (contains all schema information)
5
Logical plan
Nodes are logical operators (often relational algebra
operators)
There are many equivalent logical plans
πtitle
σStudent.name=“Bart” Student.SID = Enroll.SID Enroll.CID = Course.CID
×
Enroll
Course
×
Student An equivalent plan:
πtitle
Enroll.CID = Course.CID
Enroll
Course
Student
Student.SID = Enroll.SID
σname = “Bart”
6
Physical (execution) plan
A complex query may involve multiple tables and
various query processing algorithms
E.g., table scan, index nested-loop join, sort-merge join,
hash-based duplicate elimination…
A physical plan for a query tells the DBMS query
processor how to execute the query
A tree of physical plan operators
Each operator implements a query processing algorithm
Each operator accepts a number of input tables/streams
and produces a single output table/stream
pf3

Partial preview of the text

Download Database Query Processing: A Systems Perspective and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Query Processing: A Systems View

CPS 116

Introduction to Database Systems

Announcements

™ Course project milestone #2 due today (Nov. 11)

™ No class or office hours next Tuesday (Nov. 16)

™ Homework #4 will be available next Thursday

3

A query’s trip through the DBMS

Parser

Validator

Optimizer

Executor

Result

SQL query SELECT title, SID

FROM Enroll, Course WHERE Enroll.CID = Course.CID;

Parse tree

< SFW >

< select-list > < from-list >

< where-cond >

< table > < table >

< Query >

Enroll Course

Physical plan

PROJECT ( title , SID ) MERGE-JOIN ( CID )

SCAN ( Enroll )

SCAN ( Course )

SORT ( CID )

Logical plan

π title , SID

σ Enroll. CID = Course. CID

Enroll Course

×

4

Parsing and validation

™ Parser: SQL → parse tree

ƒ Good old lex & yacc

ƒ Detect and reject syntax errors

™ Validator: parse tree → logical plan

ƒ Detect and reject semantic errors

  • Nonexistent tables/views/columns?
  • Insufficient access privileges?
  • Type mismatches?
    • Examples: AVG(name), name + GPA, Student UNION Enroll

ƒ Also

  • Expand *
  • Expand view definitions

ƒ Information required for semantic checking is found in system

catalog (contains all schema information)

5

Logical plan

™ Nodes are logical operators (often relational algebra

operators)

™ There are many equivalent logical plans

π title

σ Student. name =“Bart” ∧ Student. SID = Enroll. SID ∧ Enroll. CID = Course.CID

×

Enroll

× Course

Student An equivalent plan:^

π title

 Enroll. CID = Course. CID

Enroll

Course

Student

 Student. SID = Enroll. SID

σ name = “Bart”

6

Physical (execution) plan

™ A complex query may involve multiple tables and

various query processing algorithms

ƒ E.g., table scan, index nested-loop join, sort-merge join,

hash-based duplicate elimination…

™ A physical plan for a query tells the DBMS query

processor how to execute the query

ƒ A tree of physical plan operators

ƒ Each operator implements a query processing algorithm

ƒ Each operator accepts a number of input tables/streams

and produces a single output table/stream

Examples of physical plans

™ Many physical plans for a single query

ƒ Equivalent results, but different costs and assumptions!

)DBMS query optimizer picks the “best” possible physical plan

PROJECT ( title ) INDEX-NESTED-LOOP-JOIN ( CID )

Index on Enroll( SID )

Index on Course ( CID )

Index on Student ( name )

INDEX-SCAN ( name = “Bart”)

INDEX-NESTED-LOOP-JOIN ( SID )

PROJECT ( title ) MERGE-JOIN ( CID )

SORT ( CID )SCAN (Course) MERGE-JOIN ( SID )

SCAN ( Enroll )

SORT ( SID )

SCAN ( Student )

FILTER ( name = “Bart”)

SELECT Course.title FROM Student, Enroll, Course WHERE Student.name = ‘Bart’ AND Student.SID = Enroll.SID AND Enroll.CID = Course.CID;

Physical plan execution

™ How are intermediate results passed from child

operators to parent operators?

ƒ Temporary files

  • Compute the tree bottom-up
  • Children write intermediate results to temporary files
  • Parents read temporary files

ƒ Iterators

  • Do not materialize intermediate results
  • Children pipeline their results to parents

9

Iterator interface

™ Every physical operator maintains its own execution

state and implements the following methods:

ƒ open(): Initialize state and get ready for processing

ƒ getNext(): Return the next tuple in the result (or a null

pointer if there are no more tuples); adjust state to allow

subsequent tuples to be obtained

ƒ close(): Clean up

10

An iterator for table scan

™ open()

ƒ Allocate a block of memory

™ getNext()

ƒ If no block of R has been read yet, read the first block from the

disk and return the first tuple in the block (or the null pointer if R

is empty)

ƒ If there is no more tuple left in the current block, read the next

block of R from the disk and return the first tuple in the block (or

the null pointer if there are no more blocks in R )

ƒ Otherwise, return the next tuple in the memory block

™ close()

ƒ Deallocate the block of memory

11

An iterator for nested-loop join

R: An iterator for the left subtree

S: An iterator for the right subtree

™ open()

R.open(); S.open(); r = R.getNext();

™ getNext()

do { s = S.getNext(); if (s == null) { S.close(); S.open(); s = S.getNext(); if (s == null) return null; r = R.getNext(); if (r == null) return null; } } until (r joins with s); return rs;

™ close()

R.close(); S.close();

NESTED-LOOP-JOIN

R S

12

An iterator for 2-pass merge sort

™ open()

ƒ Allocate a number of memory blocks for sorting

ƒ Call open() on child iterator

™ getNext()

ƒ If called for the first time

  • Call getNext() on child to fill all blocks, sort the tuples, and output a run
  • Repeat until getNext() on child returns null
  • Read one block from each run into memory, and initialize pointers to point to the beginning tuple of each block

ƒ Return the smallest tuple and advance the corresponding pointer;

if a block is exhausted bring in the next block in the same run

™ close()

ƒ Call close() on child

ƒ Deallocate sorting memory and delete temporary runs