Download Understanding Selection, Projection, and Join in Database Systems and more Slides Database Management Systems (DBMS) in PDF only on Docsity!
Relational Model & Algebra
CPS 216
Advanced Database Systems
2
Announcements (January 18)
Homework #1 will be assigned on Thursday
Reading assignment for this week
Posted on course Web page
Review due on Thursday night
3
Relational data model
A database is a collection of relations (or tables)
Each relation has a list of attributes (or columns)
Set-valued attributes not allowed
Each attribute has a domain (or type)
Each relation contains a set of tuples (or rows)
Duplicates not allowed
) Simplicity is a virtue!
Example
Student Course
Enroll
Ordering of rows doesn’t matter
(even though the output is
always in some order)
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2. ... ... ... ...
CID title CPS216 Advanced Database Systems CPS230 Analysis of Algorithms CPS214 Computer Networks ... ...
SID CID
142 CPS
142 CPS
123 CPS
857 CPS
857 CPS
456 CPS
Why did Codd call them
“relations”?
Each n -tuple relates n elements
from n domains, precisely in the
mathematical sense of a “relation”
5
Schema versus instance
Schema (metadata)
Specification of how data is to be structured logically
Defined at set-up
Rarely changes
Instance
Content
Changes rapidly, but always conforms to the schema
) Compare to type and object of type in a
programming language
6
Example
Schema
Student ( SID integer, name string, age integer, GPA float)
Course ( CID string, title string)
Enroll ( SID integer, CID integer)
Instance
{ h142, Bart, 10, 2.3i, h123, Milhouse, 10, 3.1i, ...}
{ hCPS216, Advanced Database Systemsi, ...}
{ h142, CPS216i, h142, CPS214i, ...}
More on selection
Selection predicate in general can include any
column of R , constants, comparisons such as =, ·,
etc., and Boolean connectives ∧, ∨, and ¬
Example: straight A students under 18 or over 21
σ GPA ≥ 4.0 ∧ ( age < 18 ∨ age > 21) ( Student )
But you must be able to evaluate the predicate over
a single row
Example: student with the highest GPA
σ GPA ≥ all GPA in Student table ( Student )
11
Projection
Input: a table R
Notation: π L ( R )
L is a list of columns in R
Purpose: select columns to output
Output: same rows, but only the columns in L
12
Projection example
ID’s and names of all students
π SID , name ( Student )
π SID , name
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.
More on projection
Duplicate output rows must be removed
Example: student ages
π age ( Student )
π age
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.
14
Cross product
Input: two tables R and S
Notation: R × S
Purpose: pairs rows from two tables
Output: for each row r in R and each row s in S ,
output a row rs (concatenation of r and s )
15
Cross product example
Student × Enroll
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. ... ... ... ...
SID CID
142 CPS
142 CPS
123 CPS
×
SID name age GPA SID CID 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 123 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 123 CPS ... ... ... ... ... ...
Derived operator: natural join
Input: two tables R and S
Notation: R S
Purpose: relate rows from two tables, and
Enforce equality on all common attributes
Eliminate one copy of common attributes
Shorthand for π L ( R p S )
L is the union of all attributes from R and S , with
duplicates removed
p equates all attributes common to R and S
20
SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. ... ... ... ...
SID CID
142 CPS
142 CPS
123 CPS
×
SID name age GPA SID CID 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 123 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 123 CPS ... ... ... ... ... ...
Student. SID = Enroll. SID
Natural join example
Student Enroll = π? ( Student ? Enroll ) =
π Student.ID , name , age , GPA , CID ( Student Student. SID = Enroll. SID Enroll )
21
Union
Input: two tables R and S
Notation: R ∪ S
R and S must have identical schema
Output:
Has the same schema as R and S
Contains all rows in R and all rows in S , with duplicates
eliminated
Difference
Input: two tables R and S
Notation: R − S
R and S must have identical schema
Output:
Has the same schema as R and S
Contains all rows in R that are not found in S
23
Derived operator: intersection
Input: two tables R and S
Notation: R ∩ S
R and S must have identical schema
Output:
Has the same schema as R and S
Contains all rows that are in both R and S
24
Renaming
Input: a table R
Notation: ρ S ( R ), or ρ S ( A
1 ,^ A 2 , …)^
( R )
Purpose: rename a table and/or its columns
Output: a renamed table with the same rows as R
Used to
Avoid confusion caused by identical column names
Create identical columns names for natural joins
An exercise
CID’s of the courses that Lisa is NOT taking
29
A trickier exercise
SID’s of students who take exactly one course
30
Monotone operators
If some old output rows may be removed
Then the operator is non-monotone
Otherwise the operator is monotone
That is, old output rows remain “correct” when more
rows are added to the input
Formally, R ⊆ R’ implies RelOp ( R ) ⊆ RelOp ( R’ )
RelOp
Add more rows
to the input...
What happens
to the output?
Classification of relational operators
Selection: σ p ( R )
Projection: π L ( R )
Cross product: R × S
Join: R p S
Natural join: R S
Union: R ∪ S
Difference: R − S
Intersection: R ∩ S
32
Why is “−” needed for “exactly one”?
Composition of monotone operators produces a
monotone query
Old output rows remain “correct” when more rows are
added to the input
33
Why do we need core operator X?
Difference
Projection
Cross product
Union
Selection? ☺