Understanding Recursive Queries in Datalog and SQL, Study Guides, Projects, Research of Database Management Systems (DBMS)

The limitations of using relational algebra (ra), sql, and recursive computation (rc) for recursive queries and introduces datalog as a solution. Datalog is a logic-based query language that allows for the natural expression of recursive queries. The semantics of datalog programs and provides examples of datalog rules and sql queries with recursion.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 08/19/2009

koofers-user-19d
koofers-user-19d 🇺🇸

10 documents

1 / 19

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Winter 2003 1
Today’s Lecture
Useful queries that RA/RC/basic SQL cannot express?
Datalog: a logical query language
Recursion in Datalog
Semantics and computation
Winter 2003 2
Some queries cannot be written in basic
SQ L
Consider the following
relation schema
Assembly(Part, Subpart,
qty)
Query: Find all the
subparts of part “Wheel”
1TubeTire
1rimTire
1TireWheel
2SpokeWheel
1PedalFrame
1SeatFrame
1FrameTrike
3WheelTrike
QtySubpartPart
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13

Partial preview of the text

Download Understanding Recursive Queries in Datalog and SQL and more Study Guides, Projects, Research Database Management Systems (DBMS) in PDF only on Docsity!

Winter 2003 1

Today’s Lecture

  • Useful queries that RA/RC/basic SQL cannot express?
  • Datalog: a logical query language
  • Recursion in Datalog
  • Semantics and computation

Winter 2003 2

Some queries cannot be written in basic

SQ L

  • Consider the following

relation schema

Assembly(Part, Subpart,

qty)

  • Query: Find all the

subparts of part “Wheel”

Tire Tube 1

Tire rim 1

Wheel Tire 1

Wheel Spoke 2

Frame Pedal 1

Frame Seat 1

Trike Frame 1

Trike Wheel 3

Part Subpart Qty

Winter 2003 3

Tree v iew of A ssem b ly rel ati on

trike

wheel frame

spoke tire

rim

seat pedal

tube

  • First, find all subparts of “wheel” - a select, followed by project - Answer: spoke, tire
  • Then, find all subparts of the results above - join result above with Assembly relation - Answer: rim, tube
  • Take the union of answers above - spoke, tire, rim, tube
  • But what if there are subparts beneath rim or tube?

Winter 2003 4

Tree v iew of A ssem b ly rel ati on

trike

wheel frame

spoke tire

rim

seat pedal

tube

  • The above query works only for this given instance of relation
  • More precisely, it works if wheel has at most subparts of depth 2
  • However, we cannot run the same query in another instance of the relation where wheel has subparts of depth 3 or 4 or … It is impossible to write a query, (whether in RA, RC, or basic SQL), that will find all subparts of the part wheel for every possible instance of the Assembly relation

Winter 2003 7

D atal og : U sing log ic as a query lang uag e

  • The general idea behind Datalog is to use Horn-clauses -- “if-then” rules -- as a query language for relational databases.
  • Predicates correspond to relations
    • The predicate Assembly(X, Y, Z) is true iff (X, Y, Z) is a tuple in the Assembly relation
  • Note the positional interpretation to arguments
    • X, Y, Z represents values in the 1st, 2nd, and 3rd^ columns of the Assembly relation respectively
  • Ground predicate: all arguments are constants
    • Assembly(wheel, spoke, 2)
  • Convention: variables as represented as uppercase letters, constants are represented in lowercase

Winter 2003 8

D atal og rul es

  • A rule has the form p :- q
    • “:-” to be read as “if”
    • This means if q is true then p is true
    • Subparts(X, Y) :- Assembly(X, Y, Z), Assembly(X,U,V)
  • Body consists of a logical AND or zero or more subgoals
  • A datalog program is a collection of rules

Head (^) Body Subgoal

Winter 2003 9

E x tensional v s. I ntensional P red icates

  • EDB – extensional database consists of predicates

which are defined in relations

Assembly(trike, wheel, 3)

Assembly(trike, frame, 1)

Assembly(frame, wheel, 1)

  • IDB – intensional database consists of predicates that

are defined through some datalog rules.

Winter 2003 10

E x am p le

  • Suppose the Assembly relation on the right is defined in the EDB
  • Find all immediate subparts of “frame”
  • FrameSubparts(X) :- Assembly(Frame, X, Y)

Answer:

FrameSubparts(Seat)

FrameSubparts(Pedal)

Tire Tube 1

Tire rim 1

Wheel Tire 1

Wheel Spoke 2

Frame Pedal 1

Frame Seat 1

Trike Frame 1

Trike Wheel 3

Part Subpart Qty

Winter 2003 13

E x am p le

  • Output so far: WheelSubparts(spoke), WheelSubparts(tire)
  • Consider the second rule:
    • If WheelSupbparts(tire), Assembly(tire,rim,1), then WheelSubpart(rim)
    • If WheelSupbparts(tire), Assembly(tire,tube,1), then WheelSubpart(tube)
  • Output so far: WheelSubparts(spoke), WheelSubparts(tire), WheelSubpart(rim), WheelSubpart(tube)
  • Consider the second rule again:
    • No new predicates can be inferred!
    • Therefore the result is { WheelSubparts(spoke), WheelSubparts(tire), WheelSubpart(rim), WheelSubpart(tube) }

Winter 2003 14

W h en is a datal og p rog ram recursi v e?

  • The datalog program, WheelSubparts, is recursive
  • Construct a dependency graph from the datalog program
  • Example: P :- S1, …, Sn
  • The program is recursive if the graph contains a cycle

P S

Sn

P depends on Si

Winter 2003 15

E x am p les

WheelSubparts(Y) :- Assembly(wheel, Y, _)

WheelSubparts(Y) :- WheelSupbparts(X), Assembly(X,Y,_)

WheelSubparts Assembly

S(X,Y,Z) :- T(X,Y,Z)

P(X,Y) :- R(X, a, Y), S(Y, b, c)

P(X, Z):- R(X, Z, Y)

S T

P (^) R

S(X,Y,Z) :- T(X,Y,Z)

P(X,Y) :- R(X, a, Y), S(Y, b, c) T(X,Y,Y) :- P(X,Y)

S T

P (^) R

Winter 2003 16

S Q L w ith recursi on

  • SQL:1999 syntax
  • WITH RECURSIVE IDB(arguments) AS

<def of IDB (a query involving IDB)>

Select * from IDB

Winter 2003 19

S af e D atal og P rog ram s

  • A rule is safe if
    • every variable X in the rule occurs at least once in a positive goal predicate P in the body and P is safe (note that it does not count if a variable occurs in some arithmetic expression)
    • a predicate P is safe if P is a relational predicate or every rule defining P is safe
  • Except for the datalog programs in the previous slide, all datalog programs you have seen so far are safe

Unsafe Single(X) :- Married(Y,Z) Spouse(X,Y) :- Married(X,Z) Divorced(X) :- Married(X,Z), Z > 30 Age(Z) :- Employee(X,Y,A), Z > A

Safe Spouse(X,Y) :- Married(X,Y) Persons(X) :- Married(X,Y)

Winter 2003 20

S em an ti cs of a D atal og p rog ram

  • To understand the meaning of a datalog program (with

no negation for now), we need to know

  • The relationship between rules and logical sentences
  • What is a model?
  • What is the intended model?

Winter 2003 21

T h e rel ationsh ip between rul es and log ical

sentences

  • R 1 (u 1 ) :- R 2 (u 2 ) , …, Rn(un) where ui are sequences of

variables

  • ∀X 1 , …, Xm (R 2 (u 2 ), …, Rn(un) → R 1 (u 1 )) where Xi are

variables in the datalog rule

  • Informally, the model of a datalog program P is an

instance over the edb and idb of P that satisfies the

logical sentences of P

Winter 2003 22

T h e rel ationsh ip between rul es and log ical

sentences

  • Obviously, there can be many models of a datalog

program

S(a):-.

R(x) :- S(x)

  • R(a), S(a) is a model. R(a), R(b), S(a) is also a model.
  • There are infinitely many models for this program. Which

is the right one?

  • The least model semantics: The semantics of the

datalog program is the least model; the least model of a

set of sentences is the smallest possible model

Winter 2003 25

E x am p le

WheelSubparts(Y) :- Assembly(wheel, Y, _)

WheelSubparts(Y) :- WheelSupbparts(X), Assembly(X,Y,_)

Tire Tube 1

Tire rim 1

Wheel Tire 1

Wheel Spoke 2

Frame Pedal 1

Frame Seat 1

Trike Frame 1

Trike Wheel 3

  • WS Part Subpart Qty
  • Round 1:
    • New facts inferred = WS(spoke), WS(tire)
    • WS^1 = { WS(spoke), WS(tire) }

Winter 2003 26

E x am p le

  • Round 2:
    • New facts inferred – WS(rim), WS(tube)
      • If WheelSupbparts(tire), Assembly(tire,rim,1), then WheelSubpart(rim)
      • If WheelSubparts(tire), Assembly(tire, tube, 1), then WheelSubparts(tube)
    • WS^2 = { WS(spoke), WS(tire), WS(rim), WS(tube) }
  • Round 3:
    • No new facts can be inferred!
    • STOP

Note that WS^0  WS^1  WS^2 = WS^3

Winter 2003 27

Th e f ix p oi n t op erator

  • WS^3 is a fixpoint for the WheelSubparts program
  • A fixpoint v of a function f is such that f(v) = v
  • Example:
    • Double = doubles every element in an input set
    • Double({1,2,3}) = {2,4,6}
    • Define Double’(L) = L ∪ Double(L)
    • The set of all integers is a fixpoint for Double’
    • The set of all even integers is another fixpoint for

Double’

  • The minimal model of a datalog program is the smallest

fixpoint

Winter 2003 28

R A to D atal og

  • Selection σC(R):
    • Result(X) :- R(X), C
  • Projection π$2(R):
    • Result(Y) :- R(X,Y)
  • Cross-Product R x S:
    • Result(X,Y) :- R(X), S(Y)
  • Set-Difference R – S:
    • Result(X) :- R(X), ¬S(X)
  • Union R ∪ S:
    • Result(X) :- R(X)

Result(X) :- S(X)

Winter 2003 31

P roduct an d J oi n

  • The product of two relations R x S is expressed by a single rule with both relations as subgoals using distinct variables
  • The head contains all the variables that occur in the subgoals
    • Result(X,Y,Z,W) :- Students(X,Y), Enrolled(Z,W)
  • Recall that given R(A,B) and S(B,C), R  S = σR.B=S.B R x S
  • So we can apply the rules we just learnt
    • Result(X,Y,W) :- R(X,Y), S(Z,W), Y=Z, OR
    • Result(X,Z,W) :- R(X,Y), S(Z,W), Y=Z
  • Or, we could substitute out the equality above to “reuse” variables
    • Result(X,Y,Z) :- R(X,Y), S(Y,Z)

Winter 2003 32

U n ion an d D if f eren ce

  • In class

Winter 2003 33

C on v erti n g a com p lex R A ex p ressi on

in to D atal og

  • How can we convert the following relational algebra expression into datalog?  π name(σgrade<3.0 (Students^  Enrolled))
  • For each internal node in the query tree, create the corresponding rule
  • The IDB corresponding to root is the result.
  • Tem1(S,N,Z,G) :- Student(S,N), Enrolled(S,Z,G)
  • Tem2(S,N,Z,G) :- Tem1(S,N,Z,G), G<3.
  • Ans(N) :- Tem2(S,N,Z,G)

Student Enrolled



σgrade<3.

πname

Winter 2003 34

D atal og to R A?

  • We have seen that RA can be translated into Datalog
  • Obviously, not every datalog program can be translated

into relational algebra

  • A recursive datalog program can be translated into

RA if a fixpoint operator is added to RA

  • In general, only safe non-recursive datalog programs

can be translated into relational algebra

Winter 2003 37

S um m ary

  • RA/basic SQL/RC cannot compute all our desired queries; in particular, recursive queries cannot be expressed in RA/basic SQL/RC unless a fix point operator is added
  • Datalog is a logical query language where recursive queries can be naturally expressed
  • Semantics of datalog programs (no negation) is the minimal model semantics
  • Naïve evaluation computes the minimal model