Download Understanding Recursive Queries in Datalog and SQL and more Study Guides, Projects, Research Database Management Systems (DBMS) in PDF only on Docsity!
Winter 2003 1
Today’s Lecture
- Useful queries that RA/RC/basic SQL cannot express?
- Datalog: a logical query language
- Recursion in Datalog
- Semantics and computation
Winter 2003 2
Some queries cannot be written in basic
SQ L
relation schema
Assembly(Part, Subpart,
qty)
subparts of part “Wheel”
Tire Tube 1
Tire rim 1
Wheel Tire 1
Wheel Spoke 2
Frame Pedal 1
Frame Seat 1
Trike Frame 1
Trike Wheel 3
Part Subpart Qty
Winter 2003 3
Tree v iew of A ssem b ly rel ati on
trike
wheel frame
spoke tire
rim
seat pedal
tube
- First, find all subparts of “wheel” - a select, followed by project - Answer: spoke, tire
- Then, find all subparts of the results above - join result above with Assembly relation - Answer: rim, tube
- Take the union of answers above - spoke, tire, rim, tube
- But what if there are subparts beneath rim or tube?
Winter 2003 4
Tree v iew of A ssem b ly rel ati on
trike
wheel frame
spoke tire
rim
seat pedal
tube
- The above query works only for this given instance of relation
- More precisely, it works if wheel has at most subparts of depth 2
- However, we cannot run the same query in another instance of the relation where wheel has subparts of depth 3 or 4 or … It is impossible to write a query, (whether in RA, RC, or basic SQL), that will find all subparts of the part wheel for every possible instance of the Assembly relation
Winter 2003 7
D atal og : U sing log ic as a query lang uag e
- The general idea behind Datalog is to use Horn-clauses -- “if-then” rules -- as a query language for relational databases.
- Predicates correspond to relations
- The predicate Assembly(X, Y, Z) is true iff (X, Y, Z) is a tuple in the Assembly relation
- Note the positional interpretation to arguments
- X, Y, Z represents values in the 1st, 2nd, and 3rd^ columns of the Assembly relation respectively
- Ground predicate: all arguments are constants
- Assembly(wheel, spoke, 2)
- Convention: variables as represented as uppercase letters, constants are represented in lowercase
Winter 2003 8
D atal og rul es
- A rule has the form p :- q
- “:-” to be read as “if”
- This means if q is true then p is true
- Subparts(X, Y) :- Assembly(X, Y, Z), Assembly(X,U,V)
- Body consists of a logical AND or zero or more subgoals
- A datalog program is a collection of rules
Head (^) Body Subgoal
Winter 2003 9
E x tensional v s. I ntensional P red icates
- EDB – extensional database consists of predicates
which are defined in relations
Assembly(trike, wheel, 3)
Assembly(trike, frame, 1)
Assembly(frame, wheel, 1)
- IDB – intensional database consists of predicates that
are defined through some datalog rules.
Winter 2003 10
E x am p le
- Suppose the Assembly relation on the right is defined in the EDB
- Find all immediate subparts of “frame”
- FrameSubparts(X) :- Assembly(Frame, X, Y)
Answer:
FrameSubparts(Seat)
FrameSubparts(Pedal)
Tire Tube 1
Tire rim 1
Wheel Tire 1
Wheel Spoke 2
Frame Pedal 1
Frame Seat 1
Trike Frame 1
Trike Wheel 3
Part Subpart Qty
Winter 2003 13
E x am p le
- Output so far: WheelSubparts(spoke), WheelSubparts(tire)
- Consider the second rule:
- If WheelSupbparts(tire), Assembly(tire,rim,1), then WheelSubpart(rim)
- If WheelSupbparts(tire), Assembly(tire,tube,1), then WheelSubpart(tube)
- Output so far: WheelSubparts(spoke), WheelSubparts(tire), WheelSubpart(rim), WheelSubpart(tube)
- Consider the second rule again:
- No new predicates can be inferred!
- Therefore the result is { WheelSubparts(spoke), WheelSubparts(tire), WheelSubpart(rim), WheelSubpart(tube) }
Winter 2003 14
W h en is a datal og p rog ram recursi v e?
- The datalog program, WheelSubparts, is recursive
- Construct a dependency graph from the datalog program
- Example: P :- S1, …, Sn
- The program is recursive if the graph contains a cycle
P S
Sn
P depends on Si
Winter 2003 15
E x am p les
WheelSubparts(Y) :- Assembly(wheel, Y, _)
WheelSubparts(Y) :- WheelSupbparts(X), Assembly(X,Y,_)
WheelSubparts Assembly
S(X,Y,Z) :- T(X,Y,Z)
P(X,Y) :- R(X, a, Y), S(Y, b, c)
P(X, Z):- R(X, Z, Y)
S T
P (^) R
S(X,Y,Z) :- T(X,Y,Z)
P(X,Y) :- R(X, a, Y), S(Y, b, c) T(X,Y,Y) :- P(X,Y)
S T
P (^) R
Winter 2003 16
S Q L w ith recursi on
- SQL:1999 syntax
- WITH RECURSIVE IDB(arguments) AS
<def of IDB (a query involving IDB)>
Select * from IDB
Winter 2003 19
S af e D atal og P rog ram s
- A rule is safe if
- every variable X in the rule occurs at least once in a positive goal predicate P in the body and P is safe (note that it does not count if a variable occurs in some arithmetic expression)
- a predicate P is safe if P is a relational predicate or every rule defining P is safe
- Except for the datalog programs in the previous slide, all datalog programs you have seen so far are safe
Unsafe Single(X) :- Married(Y,Z) Spouse(X,Y) :- Married(X,Z) Divorced(X) :- Married(X,Z), Z > 30 Age(Z) :- Employee(X,Y,A), Z > A
Safe Spouse(X,Y) :- Married(X,Y) Persons(X) :- Married(X,Y)
Winter 2003 20
S em an ti cs of a D atal og p rog ram
- To understand the meaning of a datalog program (with
no negation for now), we need to know
- The relationship between rules and logical sentences
- What is a model?
- What is the intended model?
Winter 2003 21
T h e rel ationsh ip between rul es and log ical
sentences
- R 1 (u 1 ) :- R 2 (u 2 ) , …, Rn(un) where ui are sequences of
variables
- ∀X 1 , …, Xm (R 2 (u 2 ), …, Rn(un) → R 1 (u 1 )) where Xi are
variables in the datalog rule
- Informally, the model of a datalog program P is an
instance over the edb and idb of P that satisfies the
logical sentences of P
Winter 2003 22
T h e rel ationsh ip between rul es and log ical
sentences
- Obviously, there can be many models of a datalog
program
S(a):-.
R(x) :- S(x)
- R(a), S(a) is a model. R(a), R(b), S(a) is also a model.
- There are infinitely many models for this program. Which
is the right one?
- The least model semantics: The semantics of the
datalog program is the least model; the least model of a
set of sentences is the smallest possible model
Winter 2003 25
E x am p le
WheelSubparts(Y) :- Assembly(wheel, Y, _)
WheelSubparts(Y) :- WheelSupbparts(X), Assembly(X,Y,_)
Tire Tube 1
Tire rim 1
Wheel Tire 1
Wheel Spoke 2
Frame Pedal 1
Frame Seat 1
Trike Frame 1
Trike Wheel 3
- Round 1:
- New facts inferred = WS(spoke), WS(tire)
- WS^1 = { WS(spoke), WS(tire) }
Winter 2003 26
E x am p le
- Round 2:
- New facts inferred – WS(rim), WS(tube)
- If WheelSupbparts(tire), Assembly(tire,rim,1), then WheelSubpart(rim)
- If WheelSubparts(tire), Assembly(tire, tube, 1), then WheelSubparts(tube)
- WS^2 = { WS(spoke), WS(tire), WS(rim), WS(tube) }
- Round 3:
- No new facts can be inferred!
- STOP
Note that WS^0 WS^1 WS^2 = WS^3
Winter 2003 27
Th e f ix p oi n t op erator
- WS^3 is a fixpoint for the WheelSubparts program
- A fixpoint v of a function f is such that f(v) = v
- Example:
- Double = doubles every element in an input set
- Double({1,2,3}) = {2,4,6}
- Define Double’(L) = L ∪ Double(L)
- The set of all integers is a fixpoint for Double’
- The set of all even integers is another fixpoint for
Double’
- The minimal model of a datalog program is the smallest
fixpoint
Winter 2003 28
R A to D atal og
- Selection σC(R):
- Projection π$2(R):
- Cross-Product R x S:
- Result(X,Y) :- R(X), S(Y)
- Set-Difference R – S:
- Union R ∪ S:
Result(X) :- S(X)
Winter 2003 31
P roduct an d J oi n
- The product of two relations R x S is expressed by a single rule with both relations as subgoals using distinct variables
- The head contains all the variables that occur in the subgoals
- Result(X,Y,Z,W) :- Students(X,Y), Enrolled(Z,W)
- Recall that given R(A,B) and S(B,C), R S = σR.B=S.B R x S
- So we can apply the rules we just learnt
- Result(X,Y,W) :- R(X,Y), S(Z,W), Y=Z, OR
- Result(X,Z,W) :- R(X,Y), S(Z,W), Y=Z
- Or, we could substitute out the equality above to “reuse” variables
- Result(X,Y,Z) :- R(X,Y), S(Y,Z)
Winter 2003 32
U n ion an d D if f eren ce
Winter 2003 33
C on v erti n g a com p lex R A ex p ressi on
in to D atal og
- How can we convert the following relational algebra expression into datalog? π name(σgrade<3.0 (Students^ Enrolled))
- For each internal node in the query tree, create the corresponding rule
- The IDB corresponding to root is the result.
- Tem1(S,N,Z,G) :- Student(S,N), Enrolled(S,Z,G)
- Tem2(S,N,Z,G) :- Tem1(S,N,Z,G), G<3.
- Ans(N) :- Tem2(S,N,Z,G)
Student Enrolled
σgrade<3.
πname
Winter 2003 34
D atal og to R A?
- We have seen that RA can be translated into Datalog
- Obviously, not every datalog program can be translated
into relational algebra
- A recursive datalog program can be translated into
RA if a fixpoint operator is added to RA
- In general, only safe non-recursive datalog programs
can be translated into relational algebra
Winter 2003 37
S um m ary
- RA/basic SQL/RC cannot compute all our desired queries; in particular, recursive queries cannot be expressed in RA/basic SQL/RC unless a fix point operator is added
- Datalog is a logical query language where recursive queries can be naturally expressed
- Semantics of datalog programs (no negation) is the minimal model semantics
- Naïve evaluation computes the minimal model