Deductive Databases-Database Management Systems-Lecture 28 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Deductive Databases, Motivation, Datalog, Fixpoints, Negation, Stratification, Aggregate Operators, Magic Sets Algorithm, Database Management Systems, Raghu Ramakrishnan, Lecture Slides, Computer Science, University of Wisconsin, United States of America

Typology: Slides

2011/2012

Uploaded on 02/15/2012

arien
arien 🇺🇸

4.8

(24)

309 documents

1 / 8

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
1Database Management Systems 3ed, R. Ramakrishnanand J. Gehrke
Deductive Databases
Chapter 25
2Database Management Systems 3ed, R. Ramakrishnanand J. Gehrke
Motivation
SQL-92 cannot express some queries:
Are we running low on any parts needed
to build a ZX600 sports car?
What is the total component and assembly
cost to build a ZX600 at today's part prices?
Can we extend the query language to cover
such queries?
Yes, by adding recursion.
3Database Management Systems 3ed, R. Ramakrishnanand J. Gehrke
Datalog
SQL queries can be read as follows:
If some tuples exist in the From tables that
satisfy the Where conditions,
then the Select tuple is in the answer.”
Datalog is a query language that has the same
if-then flavor:
New: The answer table can appear in the
From clause, i.e., be defined recursively.
Prolog style syntax is commonly used.
pf3
pf4
pf5
pf8

Partial preview of the text

Download Deductive Databases-Database Management Systems-Lecture 28 Slides-Computer Science and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Deductive Databases

Chapter 25

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 2

Motivation

™ SQL-92 cannot express some queries: ƒ Are we running low on any parts needed to build a ZX600 sports car? ƒ What is the total component and assembly cost to build a ZX600 at today's part prices? ™ Can we extend the query language to cover such queries? ƒ Yes, by adding recursion.

Datalog

™ SQL queries can be read as follows: “If some tuples exist in the From tables that satisfy the Where conditions, then the Select tuple is in the answer.” ™ Datalog is a query language that has the same if-then flavor: ƒ New: The answer table can appear in the From clause, i.e., be defined recursively. ƒ Prolog style syntax is commonly used.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 4

Example

™ Find the components of a trike? ™ We can write a relational algebra query to compute the answer on the given instance of Assembly****. ™ But there is no R.A. (or SQL-92) query that computes the answer on all Assembly instances.

trike wheel 3 trike frame 1 frame seat 1 frame pedal 1 wheel spoke 2 wheel tire 1 tire rim 1 tire tube 1

Assembly instance

part subpartnumber

trike

wheel frame

spoke tire seat pedal

rim tube

3 1

2 1 1 1

1 1

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 5

The Problem with R.A. and SQL-

™ Intuitively, we must join Assembly with itself to deduce that trike contains spoke and tire. ƒ Takes us one level down Assembly hierarchy. ƒ To find components that are one level deeper (e.g., rim), need another join. ƒ To find all components, need as many joins as there are levels in the given instance! ™ For any relational algebra expression, we can create an Assembly instance for which some answers are not computed by including more levels than the number of joins in the expression!

A Datalog Query that Does the Job

Comp(Part, Subpt) :- Assembly(Part, Subpt, Qty). Comp(Part, Subpt) :- Assembly(Part, Part2, Qty), Comp(Part2, Subpt).

Can read the second rule as follows: “ For all values of Part, Subpt and Qty, if there is a tuple (Part, Part2, Qty) in Assembly and a tuple (Part2, Subpt) in Comp, then there must be a tuple (Part, Subpt) in Comp.”

head of rule implication body of rule

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 10

Fixpoints

™ Let f be a function that takes values from domain D and returns values from D. A value v in D is a fixpoint of f if f(v)=v. ™ Consider the fn double+ , which is applied to a set of integers and returns a set of integers (I.e., D is the set of all sets of integers). ƒ E.g., double+({1,2,5})={2,4,10} Union {1,2,5} ƒ The set of all integers is a fixpoint of double+. ƒ The set of all even integers is another fixpoint of double+ ; it is smaller than the first fixpoint.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 11

Least Fixpoint Semantics for Datalog

™ The least fixpoint of a function f is a fixpoint v of f such that every other fixpoint of f is smaller than or equal to v. ™ In general, there may be no least fixpoint (we could have two minimal fixpoints, neither of which is smaller than the other). ™ If we think of a Datalog program as a function that is applied to a set of tuples and returns another set of tuples, this function (fortunately!) always has a least fixpoint.

Negation

™ If rules contain not there may not be a least fixpoint. Consider the Assembly instance; trike is the only part that has 3 or more copies of some subpart. Intuitively, it should be in Big, and it will be if we apply Rule 1 first. ƒ But we have Small(trike) if Rule 2 is applied first! ƒ There are two minimal fixpoints for this program: Big is empty in one, and contains trike in the other (and all other parts are in Small in both fixpoints). ™ Need a way to choose the intended fixpoint.

Big(Part) :- Assembly(Part, Subpt, Qty), Qty >2, not Small(Part). Small(Part) :- Assembly(Part, Subpt, Qty), not Big(Part).

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 13

Stratification

™ T depends on S if some rule with T in the head contains S or (recursively) some predicate that depends on S, in the body. ™ Stratified program: If T depends on not S, then S cannot depend on T (or not T). ™ If a program is stratified, the tables in the program can be partitioned into strata: ƒ Stratum 0: All database tables. ƒ Stratum I: Tables defined in terms of tables in Stratum I and lower strata. ƒ If T depends on not S, S is in lower stratum than T.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 14

Fixpoint Semantics for Stratified Pgms

™ The semantics of a stratified program is given by one of the minimal fixpoints, which is identified by the following operational defn: ƒ First, compute the least fixpoint of all tables in Stratum 1. (Stratum 0 tables are fixed.) ƒ Then, compute the least fixpoint of tables in Stratum 2; then the lfp of tables in Stratum 3, and so on, stratum-by-stratum. ™ Note that Big/Small program is not stratified.

Aggregate Operators

™ The < … > notation in the head indicates grouping; the remaining arguments (Part, in this example) are the GROUP BY^ fields. ™ In order to apply such a rule, must have all of Assembly relation available. ™ Stratification with respect to use of < … > is the usual restriction to deal with this problem; similar to negation.

NumParts(Part, SUM()) :- Assembly(Part, Subpt, Qty).

SELECT A.Part, SUM(A.Qty) FROM Assembly A GROUP BY A.Part

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 19

Avoiding Unnecessary Inferences

™ Suppose that we want to find all SameLev tuples with spoke in the first column. We should “push” this selection into the fixpoint computation to avoid unnecessary inferences. ™ But we can’t just compute SameLev tuples with spoke in the first column, because some other SameLev tuples are needed to compute all such tuples: SameLev(spoke,seat) :- Assembly(wheel,spoke,2), SameLev(wheel,frame), Assembly(frame,seat,1).

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 20

“Magic Sets” Idea

™ Idea: Define a “filter” table that computes all relevant values, and restrict the computation of SameLev to infer only tuples with a relevant value in the first column. Magic_SL(P1) :- Magic_SL(S1), Assembly(P1,S1,Q1). Magic(spoke).

SameLev(S1,S2) :- Magic_SL(S1), Assembly(P1,S1,Q1), Assembly(P2,S2,Q2). SameLev(S1,S2) :- Magic_SL(S1), Assembly(P1,S1,Q1), SameLev(P1,P2), Assembly(P2,S2,Q2).

The Magic Sets Algorithm

™ Generate an “adorned” program ƒ Program is rewritten to make the pattern of bound and free arguments in the query explicit; evaluating SameLevel with the first argument bound to a constant is quite different from evaluating it with the second argument bound ƒ This step was omitted for simplicity in previous slide ™ Add filters of the form “Magic_P” to each rule in the adorned program that defines a predicate P to restrict these rules ™ Define new rules to define the filter tables of the form Magic_P

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 22

Generating Adorned Rules

™ The adorned program for the query pattern SameLev bf^ , assuming a right-to-left order of rule evaluation : SameLev bf^ (S1,S2) :- Assembly(P1,S1,Q1), Assembly(P2,S2,Q2).

SameLev bf^ (S1,S2) :- Assembly(P1,S1,Q1),

SameLev bf^ (P1,P2), Assembly(P2,S2,Q2). ™ An argument of (a given body occurrence of) SameLev is b if it appears to the left in the body, or in a b arg of the head of the rule. ™ Assembly is not adorned because it is an explicitly stored table.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 23

Defining Magic Tables

™ After modifying each rule in the adorned program by adding filter “Magic” predicates, a rule for Magic_P is generated from each occurrence O of P in the body of such a rule: ƒ Delete everything to the right of O ƒ Add the prefix “Magic” and delete the free columns of O ƒ Move O, with these changes, into the head of the rule SameLev bf^ (S1,S2) :- Magic_SL(S1), Assembly(P1,S1,Q1), SameLev bf^ (P1,P2), Assembly(P2,S2,Q2). Magic_SL(P1) :- Magic_SL(S1), Assembly(P1,S1,Q1).

Summary

™ Adding recursion extends relational algebra and SQL-92 in a fundamental way; included in SQL:1999, though not the core subset. ™ Semantics based on iterative fixpoint evaluation. Programs with negation are restricted to be stratified to ensure that semantics is intuitive and unambiguous. ™ Evaluation must avoid repeated and unnecessary inferences. ƒ “Seminaive” fixpoint evaluation ƒ “Magic Sets” query transformation