Query decomposition and | Lecture notes Computer Architecture and Organization

Query decomposition is the first phase of query processing. The aims of query decomposition are to transform a high-level query into a relational algebra

query and to check whether the query is syntactically and semantically correct.



The typical stages of query decomposition are

Analysis:

○



In this stage, the query is lexically and syntactically analysed using the techniques of programming language compilers. In addition, this stage verifies that

the relations and attributes specified in the query are defined in the system catalogue. It also verifies that any operations applied to database objects are

appropriate for the object type. For example, consider the following query:

SELECT staffNumber

FROM Staff

WHERE position >10;

This query would be rejected on two grounds:

(1) In the select list, the attribute staffNumber is not defined for the Staff relation (should be staffNo).

(2) In the WHERE clause, the comparison “>10” is incompatible with the data type position, which is a variable character string.

On completion of this stage, the high-level query has been transformed into some internal representation that is more suitable for processing. The internal

form that is typically chosen is some kind of query tree, which is constructed as follows:

A leaf node is created for each base relation in the query.

→

A nonleaf node is created for each intermediate relation produced by a relational algebra operation.

→

The root of the tree represents the result of the query.

→

The sequence of operations is directed from the leaves to the root.

SELECT*

FROMStaff s, Branch b

WHEREs.branchNo 5 b.branchNo AND (s.position 5 ‘Manager’ ANDb.city 5 ‘London’);

→

Three equivalent relational algebra queries corresponding to this SQL statement are:

(position = ‘Manager’) (city = ‘London’) (Staff.branchNo = Branch.branchNo)(Staff Branch)

(position = ‘Manager’) (city = ‘London’) (Staff ⋈Staff.branchNo5Branch.branchNo Branch)

(Position = ‘Manager’ (Staff)) ⋈Staff.branchNo=Branch.branchNo’ (city = ‘London’(Branch))

Figure 23.2 shows an example of a query tree for the SQL statement . We refer to this type of query tree as a relational algebra tree.

Normalization:

○

The normalization stage of query processing converts the query into a normalized form that can be more easily manipulated. The predicate (in SQL, the

WHERE condition), which may be arbitrarily complex, can be converted into one of two forms by applying a few transformation rules.

Conjunctive normal form: A sequence of conjuncts that are connected with the (AND) operator. Each conjunct contains one or more terms connected

QUERY DECOMPOSITION

13 November 2022

08:14

DBMS Page 1

Partial preview of the text

Download Query decomposition and and more Lecture notes Computer Architecture and Organization in PDF only on Docsity!

Query decomposition is the first phase of query processing. The aims of query decomposition are to transform a high-level query into a relational algebra query and to check whether the query is syntactically and semantically correct.

The typical stages of query decomposition are ○ Analysis:

In this stage, the query is lexically and syntactically analysed using the techniques of programming language compilers. In addition, this stage verifies that the relations and attributes specified in the query are defined in the system catalogue. It also verifies that any operations applied to database objects are appropriate for the object type. For example, consider the following query: SELECT staffNumber FROM Staff WHERE position >10; This query would be rejected on two grounds: (1) In the select list, the attribute staffNumber is not defined for the Staff relation (should be staffNo). (2) In the WHERE clause, the comparison “>10” is incompatible with the data type position, which is a variable character string. On completion of this stage, the high-level query has been transformed into some internal representation that is more suitable for processing. The internal form that is typically chosen is some kind of query tree, which is constructed as follows: → A leaf node is created for each base relation in the query. → A nonleaf node is created for each intermediate relation produced by a relational algebra operation. → The root of the tree represents the result of the query. The sequence of operations is directed from the leaves to the root. SELECT * FROM Staff s, Branch b WHERE s.branchNo 5 b.branchNo AND (s.position 5 ‘Manager’ AND b.city 5 ‘London’);

Three equivalent relational algebra queries corresponding to this SQL statement are: a. (position = ‘Manager’)  (^) (city = ‘London’)  (^) (Staff.branchNo = Branch.branchNo)(Staff Branch) b. (position = ‘Manager’)  (^) (city = ‘London’) (Staff ⋈ (^) Staff.branchNo5Branch.branchNo Branch) c. (Position = ‘Manager’ (Staff)) ⋈ (^) Staff.branchNo=Branch.branchNo’ (city = ‘London’(Branch)) Figure 23.2 shows an example of a query tree for the SQL statement. We refer to this type of query tree as a relational algebra tree. ○ Normalization: The normalization stage of query processing converts the query into a normalized form that can be more easily manipulated. The predicate (in SQL, the WHERE condition), which may be arbitrarily complex, can be converted into one of two forms by applying a few transformation rules. Conjunctive normal form: A sequence of conjuncts that are connected with the  (AND) operator. Each conjunct contains one or more terms connected QUERY DECOMPOSITION 13 November 2022 08:

Conjunctive normal form: A sequence of conjuncts that are connected with the  (AND) operator. Each conjunct contains one or more terms connected by the  (OR) operator. For example: (position = ‘Manager’  salary > 20000)  branchNo = ‘B003’.

A conjunctive selection contains only those tuples that satisfy all conjuncts. Disjunctive normal form: A sequence of disjuncts that are connected with the ( OR) operator. Each disjunct contains one or more terms connected by the  (AND) operator. For example, we could rewrite the previous conjunctive normal form as: (position = ‘Manager’  branchNo = ‘B003’)  (salary > 20000  branchNo = ‘B003’)

A disjunctive selection contains those tuples formed by the union of all tuples that satisfy the disjuncts. ○ Semantic Analysis: The objective of semantic analysis is to reject normalized queries that are incorrectly formulated or contradictory. A query is incorrectly formulated if components do not contribute to the generation of the result, which may happen if some join specifications are missing. A query is contradictory if its predicate cannot be satisfied by any tuple. For example, the predicate (position = ‘Manager’  position = ‘Assistant’) on the Staff relation is contradictory, as a member of staff cannot be both a Manager and an Assistant simultaneously. Thus the predicate ((position = ‘Manager’  position = ’Assistant’)  salary > 20000) could be simplified to (salary > 20000) by interpreting the contradictory clause as the Boolean value FALSE. Algorithms to determine correctness exist only for the subset of queries that do not contain disjunction and negation. For these queries, we could apply the following checks:

Construct a relation connection graph: If the graph is not connected, the query is incorrectly formulated. To construct a relation connection graph, we

create a node for each relation and a node for the result. We then create edges between two nodes that represent a join, and edges between nodes that represent the source of Projection operations. a.

Construct a normalized attribute connection graph: If the graph has a cycle for which the valuation sum is negative, the query is contradictory. To

construct a normalized attribute connection graph, we b. create a node for each reference to an attribute, or constant 0. We then create a directed edge between nodes that represent a join, and a directed edge between an attribute node and a constant 0 node that represents a Selection operation. Next, we weight the edges a → b with the value c, if it represents the inequality condition (a  b + c), and weight the edges 0 → a with the value - c, if it represents the inequality condition (a  c). ○ Simplification: The objectives of the simplification stage are to detect redundant qualifications, eliminate common subexpressions, and transform the query to a semantically equivalent but more easily and efficiently computed form. Typically, access restrictions, view definitions, and integrity constraints are considered at this stage, some of which may also introduce redundancy. If the user does not have the appropriate access to all the components of the query, the query must be rejected. Assuming that the user has the appropriate access privileges, an initial optimization is to apply the well-known idempotency rules of Boolean algebra. For example, consider the following view definition and query on the view: CREATE VIEW Staff3 AS SELECT * SELECT staffNo, fName, lName, salary, branchNo FROM Staff FROM Staff WHERE (branchNo = ‘B003’ AND

Query decomposition and, Lecture notes of Computer Architecture and Organization

Related documents

Partial preview of the text

Download Query decomposition and and more Lecture notes Computer Architecture and Organization in PDF only on Docsity!

Construct a relation connection graph: If the graph is not connected, the query is incorrectly formulated. To construct a relation connection graph, we

Construct a normalized attribute connection graph: If the graph has a cycle for which the valuation sum is negative, the query is contradictory. To