Download lec7-sql-wrapup-and-RA-small.pdf and more Summaries Algebra in PDF only on Docsity!
Introduction to Database Systems
CSE 414
Lecture 7: SQL Wrap-up and
Relational Algebra
CSE 414 - Autumn 2018 1
Announcements
- Additional Office Hours and room
changes
- Website calendar is up-to-date
- Check email for Microsoft Azure invite “Action required: Accept your Azure lab assignment” CSE 414 - Autumn 2018 2
Subqueries
- A subquery is a SQL query nested inside a larger query
- Such inner-outer queries are called nested queries
- A subquery may occur in:
- A SELECT clause
- A FROM clause
- Can return multi-valued relation
- A WHERE clause
- Rule of thumb: avoid nested queries when possible
- But sometimes it’s impossible, as we will see CSE 414 - Autumn 2018 3
FWGHOS
Subqueries in FROM
Sometimes we need to compute an
intermediate table only to use it later in a
SELECT-FROM-WHERE
- Option 1: use a subquery in the FROM
clause
- Option 2: use the WITH clause
- See textbook for details CSE 414 - Autumn 2018 4
2. Subqueries in FROM
CSE 414 - Autumn 2018 5 SELECT X.pname FROM (SELECT * FROM Product AS Y WHERE price > 20) as X WHERE X.price < 500 Product (pname, price, cid) Company (cid, cname, city)
WITH myTable AS (SELECT * FROM Product AS Y WHERE price > 20) SELECT X.pname FROM myTable as X WHERE X.price < 500 A subquery whose result we called myTable
Subqueries in WHERE
- SELECT ……….. WHERE EXISTS (sub);
- SELECT ……….. WHERE NOT EXISTS (sub);
- SELECT ……….. WHERE attribute IN (sub);
- SELECT ……….. WHERE attribute NOT IN (sub);
- SELECT ……….. WHERE attribute > ANY (sub);
- SELECT ……….. WHERE attribute > ALL (sub);
Monotone Queries
- Definition A query Q is monotone if:
- Whenever we add tuples to one or more input tables, the answer to the query will not lose any of the tuples CSE 414 - Autumn 2018 7 Product (pname, price, cid) Company (cid, cname, city) Monotone Queries
- Theorem: If Q is a SELECT-FROM-WHERE query that does not have subqueries, and no aggregates, then it is monotone. CSE 414 - Autumn 2018 8 Monotone Queries
- Theorem: If Q is a SELECT-FROM-WHERE query that does not have subqueries, and no aggregates, then it is monotone.
- Proof. We use the nested loop semantics: if we insert a tuple in a relation Ri, this will not remove any tuples from the answer CSE 414 - Autumn 2018 9 SELECT a 1 , a 2 , …, ak FROM WHERE RConditions 1 AS x 1 , R 2 AS x 2 , …, Rn AS xn for x 1 in R 1 do for … x 2 in R 2 do for if x nConditions in Rn do output (a 1 ,…,ak) Monotone Queries
- The query: is not monotone 10 Find all companies s.t. all their products have price < 200 Product (pname, price, cid) Company (cid, cname, city) CSE 414 - Autumn 2018 Monotone Queries
- The query: is not monotone 11 Find all companies s.t. all their products have price < 200 pname price cid Gizmo 19.99 c cid cname city c001 Sunworks Bonn cname Sunworks Product (pname, price, cid) Company (cid, cname, city) CSE 414 - Autumn 2018 Monotone Queries
- The query: is not monotone
- Consequence: If a query is not monotonic, then we cannot write it as a SELECT-FROM-WHERE query without nested subqueries Find all companies s.t. all their products have price < 200 pname price cid Gizmo 19.99 c cid cname city c001 Sunworks Bonn cname Sunworks pname price cid Gizmo 19.99 c Gadget 999.99 c cid cname city c001 Sunworks Bonn cname Product (pname, price, cid) Company (cid, cname, city) CSE 414 - Autumn 2018 12
Finding Witnesses CSE 414 - Autumn 2018 19 To find the witnesses, compute the maximum price in a subquery (in FROM or in WITH) SELECT DISTINCT u.city, v.pname, v.price FROM Company u, Product v, (SELECT x.city, max(y.price) as maxprice FROM Company x, Product y WHERE x.cid = y.cid GROUP BY x.city) w WHERE u.cid = v.cid and u.city = w.city and v.price = w.maxprice; Product (pname, price, cid) Company (cid, cname, city) Finding Witnesses CSE 414 - Autumn 2018 20 There is a more concise solution here: SELECT u.city, v.pname, v.price FROM Company u, Product v, Company x, Product y WHERE u.cid = v.cid and u.city = x.city and x.cid = y.cid GROUP BY u.city, v.pname, v.price HAVING v.price = max(y.price) Product (pname, price, cid) Company (cid, cname, city) SQL: Our first language for the relational model
- Projections
- Selections
- Joins (inner and outer)
- Inserts, updates, and deletes
- Aggregates
- Grouping
- Ordering
- Nested queries CSE 414 - Autumn 2018 21 Relational Algebra CSE 414 - Autumn 2018 22 Relational Algebra
- Set-at-a-time algebra, which
manipulates relations
- In SQL we say what we want
- In RA we can express how to get it
- Every DBMS implementation converts a
SQL query to RA in order to execute it
- An RA expression is called a query plan CSE 414 - Autumn 2018 23 Why study another relational query language?
- RA is how SQL is implemented in
DBMS
- We will see more of this in a few weeks
- RA opens up opportunities for query
optimization
CSE 414 - Autumn 2018 24
Basics CSE 414 - Autumn 2018 25
- Relations and attributes
- Functions that are applied to relations
- Return relations R2 = σ (R1)
- Can be composed together R3 = p (σ (R1))
- Often displayed using a tree rather than linearly
- Use Greek symbols: σ, p, δ, etc Sets v.s. Bags
- Sets: {a,b,c}, {a,d,e,f}, { },...
- Bags: {a, a, b, c}, {b, b, b, b, b},...
Relational Algebra has two flavors:
- Set semantics = standard Relational Algebra
- Bag semantics = extended Relational Algebra
DB systems implement bag semantics (Why?)
CSE 414 - Autumn 2018 26 Relational Algebra Operators
- Union ∪, intersection ∩, difference -
- Selection σ
- Projection π
- Cartesian product X, join ⨝
- (Rename ρ)
- Duplicate elimination δ
- Grouping and aggregation ɣ
- Sorting # CSE 414 - Autumn 2018 27
RA
Extended RA
All operators take in 1 or more relations as inputs
and return another relation
Union and Difference CSE 414 - Autumn 2018 28
What do they mean over bags?
R1 ∪ R
R1 – R
Only make sense if R1, R2 have the same schema What about Intersection?
- Derived operator using minus
- Derived using join CSE 414 - Autumn 2018 29
R1 ∩ R2 = R1 – (R1 – R2)
R1 ∩ R2 = R1 ⨝ R
Selection
- Returns all tuples which satisfy a condition
- Examples
- σSalary > 40000 (Employee)
- σname = “Smith” (Employee)
- The condition c can be =, <, <=, >, >=, <>
combined with AND, OR, NOT
CSE 414 - Autumn 2018 30 σc(R)
Renaming
- Changes the schema, not the instance
- Example:
- Given Employee(Name, SSN)
- ρN, S(Employee) à Answer(N, S) CSE 414 - Autumn 2018 37 ρB1,…,Bn (R) Natural Join
- Meaning: R1⨝ R2 = PA(sq (R1 R2))
- Where:
- Selection sq checks equality of all common
attributes (i.e., attributes with same names)
- Projection PA eliminates duplicate common
attributes CSE 414 - Autumn 2018 38
R1 ⨝ R Natural Join Example CSE 414 - Autumn 2018 39 A B X Y X Z Y Z Z V B C Z U V W Z V A B C X Z U X Z V Y Z U Y Z V Z V W
R S
R ⨝ S =
PABC(sR.B=S.B(R S)) Natural Join Example 2 CSE 414 - Autumn 2018 40 age zip disease 54 98125 heart 20 98120 flu AnonPatient P Voters V P V name age zip Alice 54 98125 Bob 20 98120 age zip disease name 54 98125 heart Alice 20 98120 flu Bob Natural Join
- Given schemas R(A, B, C, D), S(A, C, E),
what is the schema of R ⨝ S?
- Given R(A, B, C), S(D, E), what is R ⨝ S?
- Given R(A, B), S(A, B), what is R ⨝ S? CSE 414 - Autumn 2018 41 Theta Join
- A join that involves a predicate
- Here q can be any condition
- No projection in this case!
- For our voters/patients example: 42 R1 ⨝q R2 = sq (R1 X R2)
P ⨝ P.zip = V.zip and P.age >= V.age - 1 and P.age <= V.age +1 V
AnonPatient (age, zip, disease) Voters (name, age, zip) CSE 414 - Autumn 2018
Equijoin
- A theta join where q is an equality predicate
- By far the most used variant of join in practice
- What is the relationship with natural join? CSE 414 - Autumn 2018 43 R1 ⨝q R2 = sq (R1 R2) Equijoin Example CSE 414 - Autumn 2018 44 age zip disease 54 98125 heart 20 98120 flu AnonPatient P Voters V P (^) P.age=V.age V name age zip p1 54 98125 p2 20 98120 P.age P.zip P.disease V.name V.age V.zip 54 98125 heart p1 54 98125 20 98120 flu p2 20 98120 Join Summary
- Theta-join : R ⨝q S = σq (R × S)
- Join of R and S with a join condition θ
- Cross-product followed by selection θ
- No projection
- Equijoin : R ⨝θ S = σθ (R × S)
- Join condition θ consists only of equalities
- No projection
- Natural join : R ⨝ S = πA (σθ (R × S))
- Equality on all fields with same name in R and in S
- Projection πA drops all redundant attributes CSE 414 - Autumn 2018 45 So Which Join Is It?
When we write R ⨝ S we usually mean an
equijoin, but we often omit the equality
predicate when it is clear from the context
CSE 414 - Autumn 2018 46 More Joins
- Outer join
- Include tuples with no matches in the output
- Use NULL values for missing attributes
- Does not eliminate duplicate columns
- Variants
- Left outer join
- Right outer join
- Full outer join CSE 414 - Autumn 2018 47 Outer Join Example CSE 414 - Autumn 2018 48 age zip disease 54 98125 heart 20 98120 flu 33 98120 lung AnonPatient P P ⋊ J P.age P.zip P.diseas e J.job J.age J.zip 54 98125 heart lawyer 54 98125 20 98120 flu cashier 20 98120 33 98120 lung null null null AnnonJob J job age zip lawyer 54 98125 cashier 20 98120
Using Extended RA Operators CSE 414 - Autumn 2018 55 SELECT city, sum(quantity) FROM Sales GROUP BY city HAVING count() > 100 Sales(product, city, quantity) g city, sum(quantity)→q, count() → c s (^) c > 100 P (^) city, q Answer Typical Plan for a Query (1/2) CSE 414 - Autumn 2018 56
R S
join condition σselection condition πfields join condition …
SELECT-PROJECT-JOIN
Query Answer SELECT fields FROM R, S, … WHERE condition Typical Plan for a Query (1/2) 57 πfields ɣfields, sum/count/min/max(fields) σhaving condition σwhere condition join condition … … SELECT fields FROM R, S, … WHERE condition GROUP BY fields HAVING condition CSE 414 - Autumn 2018 How about Subqueries? CSE 414 - Autumn 2018 58 Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’ and not exists (SELECT * FROM Supply P WHERE P.sno = Q.sno and P.price > 100) How about Subqueries? CSE 414 - Autumn 2018 59 Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’ and not exists (SELECT * FROM Supply P WHERE P.sno = Q.sno and P.price > 100) σ sstate=‘WA’ Supplier π sno Option 1: create nested plans not exists σ price> Supplier SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’ and not exists (SELECT * FROM Supply P WHERE P.sno = Q.sno and P.price > 100) How about Subqueries? CSE 414 - Autumn 2018 60 Correlation! Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price)
SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’ and not exists (SELECT * FROM Supply P WHERE P.sno = Q.sno and P.price > 100) How about Subqueries? CSE 414 - Autumn 2018 61 De-Correlation SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’ and Q.sno not in (SELECT P.sno FROM Supply P WHERE P.price > 100) Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’ and Q.sno not in (SELECT P.sno FROM Supply P WHERE P.price > 100) How about Subqueries? CSE 414 - Autumn 2018 62 (SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’) EXCEPT (SELECT P.sno FROM Supply P WHERE P.price > 100) EXCEPT = set difference Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) Un-nesting (SELECT Q.sno FROM Supplier Q WHERE Q.sstate = ‘WA’) EXCEPT (SELECT P.sno FROM Supply P WHERE P.price > 100) How about Subqueries? CSE 414 - Autumn 2018 63 Supply σ sstate=‘WA’ Supplier σ Price > 100
Finally… π sno^ π sno Supplier(sno,sname,scity,sstate) Part(pno,pname,psize,pcolor) Supply(sno,pno,price) Summary of RA and SQL
- SQL = a declarative language where we
say what data we want to retrieve
- RA = an algebra where we say how we
want to retrieve the data
- Theorem : SQL and RA can express
exactly the same class of queries
RDBMS translate SQLCSE 414 - Autumn 2018à RA, then optimize RA 64
Summary of RA and SQL
- SQL (and RA) cannot express ALL queries
that we could write in, say, Java
- Example:
- Parent(p,c): find all descendants of ‘Alice’
- No RA query can compute this!
- This is called a recursive query
- Next lecture: Datalog is an extension that can
compute recursive queries
CSE 414 - Autumn 2018 65