Understanding Selection, Projection, and Join in Database Systems, Slides of Database Management Systems (DBMS)

An introduction to the relational model and algebra in advanced database systems. It covers the concepts of relations, schemas, instances, and the core set of relational algebra operators such as selection, projection, cross product, union, difference, and renaming. It also discusses additional derived operators like join and natural join.

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold
arold 🇺🇸

4.7

(24)

372 documents

1 / 12

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Relational Model & Algebra
CPS 216
Advanced Database Systems
2
Announcements (January 18)
Homework #1 will be assigned on Thursday
Reading assignment for this week
Posted on course Web page
Review due on Thursday night
3
Relational data model
A database is a collection of relations (or tables)
Each relation has a list of attributes (or columns)
Set-valued attributes not allowed
Each attribute has a domain (or type)
Each relation contains a set of tuples (or rows)
Duplicates not allowed
Simplicity is a virtue!
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Understanding Selection, Projection, and Join in Database Systems and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Relational Model & Algebra

CPS 216

Advanced Database Systems

2

Announcements (January 18)

™ Homework #1 will be assigned on Thursday

™ Reading assignment for this week

ƒ Posted on course Web page

ƒ Review due on Thursday night

3

Relational data model

™ A database is a collection of relations (or tables)

™ Each relation has a list of attributes (or columns)

ƒ Set-valued attributes not allowed

™ Each attribute has a domain (or type)

™ Each relation contains a set of tuples (or rows)

ƒ Duplicates not allowed

) Simplicity is a virtue!

Example

Student Course

Enroll

Ordering of rows doesn’t matter

(even though the output is

always in some order)

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2. ... ... ... ...

CID title CPS216 Advanced Database Systems CPS230 Analysis of Algorithms CPS214 Computer Networks ... ...

SID CID

142 CPS

142 CPS

123 CPS

857 CPS

857 CPS

456 CPS

Why did Codd call them

“relations”?

Each n -tuple relates n elements

from n domains, precisely in the

mathematical sense of a “relation”

5

Schema versus instance

™ Schema (metadata)

ƒ Specification of how data is to be structured logically

ƒ Defined at set-up

ƒ Rarely changes

™ Instance

ƒ Content

ƒ Changes rapidly, but always conforms to the schema

) Compare to type and object of type in a

programming language

6

Example

™ Schema

ƒ Student ( SID integer, name string, age integer, GPA float)

ƒ Course ( CID string, title string)

ƒ Enroll ( SID integer, CID integer)

™ Instance

ƒ { h142, Bart, 10, 2.3i, h123, Milhouse, 10, 3.1i, ...}

ƒ { hCPS216, Advanced Database Systemsi, ...}

ƒ { h142, CPS216i, h142, CPS214i, ...}

More on selection

™ Selection predicate in general can include any

column of R , constants, comparisons such as =, ·,

etc., and Boolean connectives ∧, ∨, and ¬

ƒ Example: straight A students under 18 or over 21

σ GPA ≥ 4.0 ∧ ( age < 18 ∨ age > 21) ( Student )

™ But you must be able to evaluate the predicate over

a single row

ƒ Example: student with the highest GPA

σ GPA ≥ all GPA in Student table ( Student )

11

Projection

™ Input: a table R

™ Notation: π L ( R )

ƒ L is a list of columns in R

™ Purpose: select columns to output

™ Output: same rows, but only the columns in L

12

Projection example

™ ID’s and names of all students

π SID , name ( Student )

π SID , name

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.

More on projection

™ Duplicate output rows must be removed

ƒ Example: student ages

π age ( Student )

π age

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. 857 Lisa 8 4. 456 Ralph 8 2.

14

Cross product

™ Input: two tables R and S

™ Notation: R × S

™ Purpose: pairs rows from two tables

™ Output: for each row r in R and each row s in S ,

output a row rs (concatenation of r and s )

15

Cross product example

™ Student × Enroll

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. ... ... ... ...

SID CID

142 CPS

142 CPS

123 CPS

×

SID name age GPA SID CID 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 123 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 123 CPS ... ... ... ... ... ...

Derived operator: natural join

™ Input: two tables R and S

™ Notation: R  S

™ Purpose: relate rows from two tables, and

ƒ Enforce equality on all common attributes

ƒ Eliminate one copy of common attributes

™ Shorthand for π L ( R  p S )

ƒ L is the union of all attributes from R and S , with

duplicates removed

ƒ p equates all attributes common to R and S

20

SID name age GPA 142 Bart 10 2. 123 Milhouse 10 3. ... ... ... ...

SID CID

142 CPS

142 CPS

123 CPS

×

SID name age GPA SID CID 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 142 CPS 142 Bart 10 2.3 123 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 142 CPS 123 Milhouse 10 3.1 123 CPS ... ... ... ... ... ...

Student. SID = Enroll. SID

Natural join example

™ Student  Enroll = π? ( Student ? Enroll ) =

π Student.ID , name , age , GPA , CID ( Student  Student. SID = Enroll. SID Enroll )

21

Union

™ Input: two tables R and S

™ Notation: R ∪ S

ƒ R and S must have identical schema

™ Output:

ƒ Has the same schema as R and S

ƒ Contains all rows in R and all rows in S , with duplicates

eliminated

Difference

™ Input: two tables R and S

™ Notation: R − S

ƒ R and S must have identical schema

™ Output:

ƒ Has the same schema as R and S

ƒ Contains all rows in R that are not found in S

23

Derived operator: intersection

™ Input: two tables R and S

™ Notation: R ∩ S

ƒ R and S must have identical schema

™ Output:

ƒ Has the same schema as R and S

ƒ Contains all rows that are in both R and S

24

Renaming

™ Input: a table R

™ Notation: ρ S ( R ), or ρ S ( A

1 ,^ A 2 , …)^

( R )

™ Purpose: rename a table and/or its columns

™ Output: a renamed table with the same rows as R

™ Used to

ƒ Avoid confusion caused by identical column names

ƒ Create identical columns names for natural joins

An exercise

™ CID’s of the courses that Lisa is NOT taking

29

A trickier exercise

™ SID’s of students who take exactly one course

30

Monotone operators

™ If some old output rows may be removed

ƒ Then the operator is non-monotone

™ Otherwise the operator is monotone

ƒ That is, old output rows remain “correct” when more

rows are added to the input

ƒ Formally, R ⊆ R’ implies RelOp ( R ) ⊆ RelOp ( R’ )

RelOp

Add more rows

to the input...

What happens

to the output?

Classification of relational operators

™ Selection: σ p ( R )

™ Projection: π L ( R )

™ Cross product: R × S

™ Join: R  p S

™ Natural join: R  S

™ Union: R ∪ S

™ Difference: R − S

™ Intersection: R ∩ S

32

Why is “−” needed for “exactly one”?

™ Composition of monotone operators produces a

monotone query

ƒ Old output rows remain “correct” when more rows are

added to the input

33

Why do we need core operator X?

™ Difference

™ Projection

™ Cross product

™ Union

™ Selection? ☺