




Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Consider the following SQL query that finds all applicants who want to major in CSE, live in Seattle, and go to a school ranked better than 10 (i.e., rank < 10) ...
Typology: Exercises
1 / 8
This page cannot be seen from the preview
Don't miss anything!





Given the following SQL query: Student (sid, name, age, address) Book(bid, title, author) Checkout(sid, bid, date)
SELECT S.name FROM Student S, Book B, Checkout C WHERE S.sid = C.sid AND B.bid = C.bid AND B.author = ’Olden Fames’ AND S.age > 12 AND S.age < 20 And assuming:
(a) Show a physical query plan for this query, assuming there are no indexes and data is not sorted on any attribute. Solution: Note: many solutions are possible.
Scan: Book
Scan: Checkout
(^1) sid
Scan: Student
(^1) bid
σ 12 <age< 20 ∧author=′OldEnF ames′
Πname
On the fly
On the fly
Block nested loop
Tuple-base nested loop
Figure 1: One possible query plan (all joins are nested-loop joins)
(b) Compute the cost of this query plan and the cardinality of the result.
Solution: Cost Cardinality Remarks S 1 C B(S) + B(S) * B(C) = 1000 + 1000 * 15000 = 15001000
300000 (foreign-key join)
300000 (foreign-key join)
σ and Π On the fly 300000 * σauthor * σage = 300000 * 5001 ∗ 187 ≈ 234
Total 1515001000 234 (1) We are doing page at a time nested loop join. Also, the output is pipelined to next join. (2) The output relation is pipelined from below. Thus, we don’t need the scanning term for outer relation. (3) We assume uniform value distributions for age and author. We assume independence among participating columns.
(e) Explain the steps that the Selinger query optimizer would take to optimize this query.
Solution: A query optimizer explores the space of possible query plans to find the most promising one. The Selinger query optimizer performs the search as follows:
Consider the following SQL query that finds all applicants who want to major in CSE, live in Seattle, and go to a school ranked better than 10 (i.e., rank < 10). Relation Cardinality Number of pages Primary key Applicants (id, name, city, sid) 2,000 100 id Schools (sid, sname, srank) 100 10 sid Major (id, major) 3,000 200 (id,major)
SELECT A.name FROM Applicants A, Schools S, Major M WHERE A.sid = S.sid AND A.id = M.id AND A.city = ’Seattle’ AND S.rank < 10 AND M.major = ’CSE’
And assuming:
(a) What is the cost of the query plan below? Count only the number of page I/Os.
Applicants
sid = sid
π (^) name
(Sort-merge)
(File Scan)
(1) σ (^) city=‘Seattle’
(File scan)
( 3 )
( 6 )
( 2 ) σ (^) srank < 10
Schools
id = id
( 4 )
(B+ tree index on id)
Major
(Index nested loop)
(One-the-fly) (5) σ (^) major = ‘CSE’
(One-the-fly)
Consider the schema R(a,b), S(b,c), T(b,d), U(b,e).
(a) For the following SQL query, give two equivalent logical plans in relational algebra such that one is likely to be more efficient than the other. Indicate which one is likely to be more efficient. Explain.
SELECT R.a FROM R, S WHERE R.b = S.b AND S.c = 3
Solution:
i. πa(σc=3(R ./b=b (S))) ii. πa(R ./b=b σc=3(S))) ii. is likely to be more efficient With the select operator applied first, fewer tuples need to be joined.
(b) Recall that a left-deep plan is typically favored by optimizers. Write a left-deep plan for the following SQL query. You may either draw the plan as a tree or give the relational algebra expression. If you use relational algebra, be sure to use parentheses to indicate the order that the joins should be performed.
SELECT * FROM R, S, T, U WHERE R.b = S.b AND S.b = T.b AND T.b = U.b
Solution: ((R ./b=b S) ./b=b T ) ./b=b U
(c) Physical plans. Assume that all tables are clustered on the attribute b, and there are no secondary indexes. All tables are large. Do not assume that any of the relations fit in memory. For the left-deep plan you gave in (b), suggest an efficient physical plan. Specify the physical join operators used (hash, nested loop, sortmerge, etc.) and the access methods used to read the tables (sequential scan, index, etc.). Explain why your plan is efficient. For operations where it matters, be sure to include the details — for instance, for a hash join, which relation would be stored in the hash tables; for a loop join, which relation would be the inner or outer loop. You should specify how the topmost join reads the result of the lower one. Solution: join order doesn’t matter, sortmerge for every join, seqscan for R,S,T,U. Fully pipelined. “clustered index scan” instead of seqscan is also correct.
(d) For the physical plan you wrote for (c), give the estimated cost in terms of B(...), V (...), and T (...). Explain each term in your expression. Solution: B(R) + B(S) + B(T ) + B(U ). Just need to read each table once.