











Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
The process of translating SQL queries that contain subqueries into relational algebra, including the normalization of WHERE-subqueries into EXISTS and NOT EXISTS form, the translation of SELECT-FROM-WHERE subqueries, the de-correlation of subqueries appearing in a conjunctive WHERE condition, and the translation of SQL operations such as UNION, INTERSECT, and EXCEPT. It also covers the use of lateral subqueries in SQL-99 and the rewriting of scalar subqueries in the Select-list as LATERAL subqueries in the From-list.
Typology: Study notes
1 / 19
This page cannot be seen from the preview
Don't miss anything!












Before reading these notes, please ensure that you are familiar with (1) the relational data model as defined in Section 2.2 of “Database Management Systems: The Complete Book (second edition)” (hereafter abbreviated as “TCB”); (2) the set-based relational algebra as defined in section 2.4 of TCB; its bag-based variant and extension as defined in sections 5.1 and 5. of TCB; and (3) the SQL query language as defined in chapter 6 of TCB. Throughout these notes we will use the following example database schema about movies, as introduced in TCB Figure 2.5. The attributes of the primary key are underlined.
Translating an arbitrary SQL query into a logical query plan (i.e., a rela- tional algebra expression) is a complex task. In these course notes we try to explain the most important elements of this translation by making the following simplifying assumptions:
(comprising only the traditional select-from-where queries, aggrega- tion, etc).
We will use expressions in the extended relational algebra (see section 5.2 in the book) interpreted over sets as logical query plans.
Provisio To exclude ambiguities, we will assume without loss of generality in what follows that all occurrences of relation symbols in a SQL statement are assigned a distinct name through the alias mechanism of SQL. In other words: we hence assume implicitly that SQL statements in which a relation symbol occurs multiple times, like, for example,
SELECT * FROM R, R
is rewritten into a SQL statement of the form
SELECT * FROM R, R R
in which each occurrence is given a distinct (alias) name. Such rewriting is straightforward to do algorithmically, and can hence be done as a pre- processing step before execution of the algorithm described below.
2 Select-from-where statements without subqueries
Consider a general SELECT-FROM-WHERE statement of the form
SELECT Select-list FROM R 1 ,... , R 2 T 2 ,... WHERE Where-condition
When the statement does not use subqueries in its where-condition, we can easily translate it into the relational algebra as follows:
πSelect-list σWhere-condition(R 1 × · · · × ρT 2 (R 2 ) × · · · ).
Note that:
SELECT movieTitle FROM StarsIn WHERE EXISTS (SELECT name FROM MovieStar WHERE birthdate = 1960 AND name = starName)
Example 3. The SQL-statement
SELECT name FROM MovieExec WHERE netWorth >= ALL (SELECT E.netWorth FROM MovieExec E)
can be rewritten equivalently as
SELECT name FROM MovieExec WHERE NOT EXISTS(SELECT E.netWorth FROM MovieExec E WHERE netWorth < E.netWorth)
Example 4. Consider relations R(A, B) and S(C). Then
SELECT C FROM S WHERE C IN (SELECT SUM(B) FROM R GROUP BY A)
can be rewritten as
SELECT C FROM S WHERE EXISTS (SELECT SUM(B) FROM R GROUP BY A HAVING SUM(B) = C)
Without loss of generality we will hence assume in what follows that all subqueries in the WHERE conditions are of the form EXISTS or NOT EXISTS.
4 Context relations
To translate a query with subqueries into the relational algebra, it seems a logical strategy to work by recursion: first translate the subqueries and then combine the translated results into a translation for the entire SQL state- ment. If the subqueries contain subqueries themselves, we again translate the latter first — continuing recursively until we reach a level that does not contain subqueries.
For subqueries that do not contain subqueries themselves, we could think that we can simply apply the method from Section 2. There is one compli- cation, however: the subquery can refer to attributes of relations appearing in the FROM list of one of the outer lying queries. This is known as correlated subqueries.
Example 5. The following query contains a subquery that refers to the starName attribute of the outer relation StarsIn.
SELECT movieTitle FROM StarsIn WHERE EXISTS (SELECT name FROM MovieStar WHERE birthdate = 1960 AND name = starName)
We call the outer relations from which a correlated subquery uses certain attributes context relations for the subquery. Note that a subquery can have multiple context relations. We call the attributes of the context relations the parameters of the subquery. Note that not all of the parameters must actually occur in the subquery.
Example 6. In Example 5, StarsIn is hence a context relation for the sub- query. (In this example, it is also the only context relation.) The correspond- ing parameters are all attributes of StarsIn, i.e., movieTitle, movieYear, and starName.
Translating select-from-where subqueries To translate a SELECT– FROM–WHERE statement that is used as a subquery we must make the fol- lowing modifications to the method from Section 2:
Example 7. With these modifications, the subquery from Example 5:
SELECT name FROM MovieStar WHERE birthdate = 1960 AND name = starName
is translated into
πmovieTitle,movieYear,starName,name σ (^) birthdate= ∧name=starName
(StarsIn × MovieStar).
Note how we have added the context relation StarsIn to the cartesian prod- uct, and how we have added the parameters movieTitle, movieYear and starName (the attributes of StarsIn) to the projection π.
We will translate it using the method of Section 2, but need to also include the following context relations:
The relational algebra expression that we hence obtain is of the form
σψ (E),
where E is a cartesian product of all relations in the From-list, to which we add context relations for which parameters occur in ψ, or for which parameters occur in some NOT EXISTS subquery. In what follows, we will gradually adapt and refine E when de-correlating the subqueries.
Example 8. Consider the following nested statement over the relations R(A, B) and S(C):
SELECT R1.A, R1.B FROM R R1, S WHERE EXISTS (SELECT R2.A, R2.B FROM R R_ WHERE R2.A = R1.B AND EXISTS (SELECT R3.A, R3.B FROM R R WHERE R3.A = R2.B AND R3.B = S.C))
Let us denote the entire query by Q 1 ; the middle subquery by Q 2 ; and the inner subquery by Q 3. Now assume that we are currently translating Q 2. The subquery-free part of Q 2 is as follows:
SELECT * FROM R R_ WHERE R_2.A = R_1.B
Its translation is hence:
σR 2 .A=R 1 .B (ρR 2 (R) × ρR 1 (R)) (∗)
Note that, although S is a context relation for Q 2 , it does not appear in the translation. This is because the parameter S.C does not occur in the
subquery-free part of Q 2 , but only in the subquery Q 3 itself. Since Q 3 is an EXISTS subquery, S does not need to be included in the cartesian product. In contrast, had Q 3 been a NOT EXISTS subquery, then we would have needed to include S.
After we have translated the subquery-free part, we translate all subqueries EXISTS(Q) in turn. By applying the entire translation algorithm described in these notes recursively to Q, we can already translate Q into a relational algebra expression EQ.
Example 9. Let us continue the translation of Q 2 from Example 8. We must then first translate Q 3 as follows:
σ (^) R 3 .A=R 2 .B ∧R 3 .B=S.C
(ρR 3 (R) × ρR 2 (R) × S)
Note that EQ 3 query already specifies for which tuples in R 2 and R 3 correct tuples in S exist (together with the values of these tuples). We can use this information to de-correlate Q 2 as follows.
Let A 1 ,... , Ap be the list of all parameters of context relations of Q. We can translate EXISTS(Q) by joining E with the “space of parameters” for EQ, namely πA 1 ,...,Ap (EQ):
E := E on πA 1 ,...,Ap (EQ).
Example 10. Let us continue the translation of Q 2 from Examples 8 and
E = ρR 2 (R) × ρR 1 (R) EQ 3 = σ (^) R 3 .A=R 2 .B ∧R 3 .B=S.C
(ρR 3 (R) × ρR 2 (R) × S)
By joining E and EQ 3 on the parameters of Q 3 (i.e., R 2 .A and R 2 .B) we ensure that we “link” the correct tuples from E with the correct tuples of EQ 3. In particular, we calculate the tuples in R 1 for which tuples in R 2 , R 3 , and S exist that satisfy the requirements of Q 2 (together with the values of these tuples).
Actually, we can simplify this expression somewhat. Indeed, note that the following are equivalent because we join R 2 with a subset of itself:
(ρR 1 (R) × ρR 2 (R)) on πR 2 .A,R 2 .B,S.C σ (^) R 3 .A=R 2 .B ∧R 3 .B=S.C
(ρR 3 (R) × ρR 2 (R) × S)
and
ρR 1 (R) on πR 2 .A,R 2 .B,S.C σ (^) R 3 .A=R 2 .B ∧R 3 .B=S.C
(ρR 3 (R) × ρR 2 (R) × S)
We add such an anti-join to E for every NOT EXISTS subquery. Note, however, that in this case we cannot remove context relations from the cartesian product obtained by translating the From-list. (Why not?)
Finally, we must apply to E the projection πSelect-list. If the query that we are translating is used itself as a subquery in an outer lying query, then we must of course take care to add all parameters to Select-list.
6 Operations on relations
SQL-expressions do not simply consist of select-statements, but can also be formed using the operations UNION, INTERSECT, and EXCEPT (possibly in con- junction with the modifier ALL); and with the join-operations CROSS JOIN, NATURAL JOIN, JOIN ON, and OUTER JOIN. Notice that such expressions may also occur as subqueries. In this section, we discuss how to translate these operations. We begin with the join operations.
Consider first the SQL-expression
Q 1 CROSS JOIN Q 2 (†)
where Q 1 and Q 2 are themselves also SQL-expressions. We first translate Q 1 and Q 2 into the relational algebra; which yields relational algebra ex- pressions E 1 and E 2. At first sight we may think that we can then simply translate the expression (†) above as E 1 × E 2. This translation is incorrect, however, when Q 1 CROSS JOIN Q 2 is used as a subquery. Indeed, in that case, Q 1 and Q 2 can be correlated subqueries that may have parameters in common. In order to be able to continue using the translation strategy from Section 5, we need to synchronize the correlated subqueries on the common parameters. This can be done using a natural join over the common parameters, simply taking:
E 1 on E 2
instead of E 1 × E 2. (Since, according to the SQL specification, the operands of a CROSS JOIN must have disjoint sets of attributes, any attributes common to both E 1 and E 2 must be parameters. Hence, it is not necessary to specify that only the parameters should be equal using a theta-join. A natural join suffices.)
Example 12. To illustrate this discussion, consider the relations R(A, B) and S(C), as well as the following query.
Let us translate its EXISTS subquery containing the cross join. The trans- lations of Q 1 and Q 2 are as follows.
E 1 = πS 1 .C,S 2 .C,R 1 .A,R 1 .B σ (^) A=S 1 .C ∧B=S 2 .C
(ρR 1 (R) × ρS 1 (S) × ρS 2 (S))
E 2 = πS 1 .C,R 2 .A,R 2 .B σB=S 1 .C (ρR 2 (R) × ρS 1 (S))
Notice that Q 1 and Q 2 have one context relation in common, namely S 1. The translation of the EXISTS subquery is then:
E 1 on E 2.
Notice that the natural join occurs only on the common parameter S 1 .C.
In a similar manner we can translate the other SQL join-operations. We only have to take care to use the correct algebra operator in the translation:
on for NATURAL JOIN; theta-join for JOIN ON; and outer join
o on for OUTER JOIN.
Example 13. Here is a simple example of a theta-join. The join expression is not a correlated subquery, and hence we do not need to take into account context relations.
Movie JOIN StarsIn ON title = movieTitle AND year = movieYear
It is translated as follows:
Movie on title ∧year==movieTitlemovieYear
StarsIn.
Next consider a SQL-expression
Q 1 UNION Q 2 (‡)
7 The non-conjunctive case
How can we translate WHERE-conditions that contain subqueries, but that are not of the conjunctive form required in Section 5? We first note that we can always rewrite any WHERE condition as a disjunction (OR) of the form:
ϕ 1 OR ϕ 2 OR ...
where every ϕi is a conjunction (AND) of subqueries, possibly with an addi- tional subquery-free condition. Rewriting an arbitrary WHERE condition into this so-called disjunctive normal form can be done by repeatedly applying the laws of De Morgan^1 and distributivity^2 of AND over OR.
Example 17. • The WHERE condition
NOT EXISTS Q AND (A=B OR C<6)
with Q a subquery is already in normal form: it is a conjunction of a subquery and a condition without subqueries. In this case there is hence only one conjunction: the entire condition itself.
A=B OR EXISTS Q
is also in normal form. In this case there are two conjunctions, each consisting of a single atomic condition.
A=B AND NOT(EXISTS Q AND C<6)
is not in normal form. We can rewrite it into the desired form by one application of De Morgan followed by one application of distributivity:
A=B AND (NOT EXISTS Q OR C>=6) (A=B AND NOT EXISTS Q) OR (A=B AND C>=6)
To translate an arbitrary select-statement we hence first rewrite its WHERE condition into disjunctive normal form:
(^1) ¬(p ∨ q) = ¬p ∧ ¬q, ¬(p ∧ q) = ¬p ∨ ¬q. (^2) (p ∨ q) ∧ r = (p ∧ r) ∨ (q ∧ r)
SELECT Select-list FROM From-list WHERE ϕ 1 OR ϕ 2 OR...
We then treat the disjunction as a union:
(SELECT Select-list UNION (SELECT Select-list UNION... FROM From-list FROM From-list WHERE ϕ 1 ) WHERE ϕ 2 )
Since every select-statement subquery of this union has a conjunctive WHERE condition ϕi, we can translate these subqueries using the method from para- graph 5. The UNION operations themselves can then be translated as dis- cussed in Section 6.2.
8 Group by and Having
Up until now we have only considered select-from-where statements without aggregation. In general, however, a select-statement can also contain GROUP BY and HAVING clauses:
SELECT Select-list FROM From-list WHERE Where-condition GROUP BY Group-list HAVING Having-condition
Note that such a statement can also occur as a subquery! To translate such a statement, we first translate its FROM–WHERE part:
SELECT * FROM From-list WHERE Where-condition
Let E be the relational algebra expression hence obtained. (Note in par- ticular that the where-condition may contain subqueries in non-conjunctive form, in which case we have to apply the techniques from Section 7 to ob- tain E). Let A 1 ,... , An be the parameters of the statement (if it occurs as a subquery). The translation of the entire statement then is the following.
πA 1 ,...,An,Select-list σHaving-condition γA 1 ,...,An,Group-list,Agg-list(E).
Here, ‘Agg-list’ consists of all aggregation operations performed in the Hav- ing condition or Select-list. If the HAVING clause is absent, then the σ can be omitted.
Example 18. The SQL statement
Example 21. We hence translate the query from Example 20 as follows:
πmovieTitleσstarName=M.name(StarsIn × ρMπnameσbirthdate=1960(MovieStar)).
SQL-99 also allows FROM-subqueries to be correlated with relations or sub- queries that precede it in the From-list.
Example 22. The following statement is legal in SQL-99 (but not in SQL- 92). It uses the keyword LATERAL to indicate that a subquery is correlated.
SELECT S.movieTitle FROM (SELECT name FROM MovieStar WHERE birthdate = 1960) M, LATERAL (SELECT movieTitle FROM StarsIn WHERE starName = M.name) S
To translate such subqueries it suffices to replace the cartesian product (which we normally use to translate From-lists) by a natural join of the parameters. It is important to note, however, that context relations need no longer be base relations, but can themselves also be translations of earlier subqueries in the From-list. Just as with EXISTS subqueries, we can omit preceding context relations if those are already contained in the parameters space of a subsequent subquery.
Example 23. We translate the statement from Example 22 as follows. We start by translating the first subquery:
E 1 = πname σbirthdate=1960(MovieStar).
We then translate the second subquery, which has E 1 as a context relation:
E 2 = πname,movieTitle σstarName=name(StarsIn × E 1 ).
We would then translate the entire FROM clause by means of a join and not by means of a cartesian product (because of the correlation):
πmovieTitle(E 1 on E 2 ).
In this example, however, all relevant tuples from E 1 are already con- tained in E 2 , and hence we can omit E 1. The obtained end result is then πmovieTitle(E 2 ).
10 Subqueries in the Select-list
Finally, SQL also allows subqueries in the Select-list; the so-called scalar subqueries. These subqueries must always return a single value. They can nevertheless be correlated. In that case they must hence return a single value for each possible assignment of values to the parameters.
Example 24. Consider the relations R(A, B) and S(C), and assume that A is a key for R. Then the following query is allowed in SQL.
SELECT C, (SELECT B FROM R WHERE A=C) FROM S
Consider in addition a relation T (D, E). The following is also allowed.
SELECT D, (SELECT B FROM R WHERE A=D AND B>SUM(E)) FROM T GROUP BY D
Scalar subqueries in the select-list can be rewritten as LATERAL sub- queries in de From-list. Since we already know how to translate the latter, we hence also know how to translate the former. Formally, the rewriting of subqueries in the select-list goes as follows. Consider a general select- statement with a subquery in its select-list.
SELECT Select-list, (Q) FROM... WHERE... GROUP BY... HAVING...
Since Q must be a scalar subquery, it can only have one output attribute, say attribute A. We then rewrite as follows:
SELECT Select-list, T.A FROM (SELECT * WHERE... GROUP BY... HAVING... ), LATERAL (Q) T
If the subquery has parameters that are aggregation operators (such as, e.g., the parameter SUM(E) in the second expression of Example 24) we need to fine-tune this rewriting by including these aggregations to the γ operator in the translation of the first subquery (which corresponds to the original statement).
If we then implement the aggregation operator γ in such a way that the COUNT of a single ⊥ equals 0, the evaluation proceeds correctly.