Bucket Algorithm for Query Expansion: Understanding Variable Mapping and Containment, Slides of Information Integration

The bucket algorithm for query expansion helps answer queries using views by ensuring that each query subgoal is covered by some view. The algorithm distinguishes between distinguished and shared variables and handles variable mapping and containment. The slides also work through examples and solutions for handling distinguished and shared variables in query expansions.

Typology: Slides

2011/2012

Uploaded on 07/15/2012

saji

The Bucket Algorithm

- We can "answer queries using views" by trying all conjunctive queries (CQs) with no more (view) subgoals than the query has subgoals.

- However, a more organized exploration of the possibilities, called the bucket algorithm, looks at how views can "cover" each of the query subgoals and each variable of the query.

- The basic idea is due to Levy; it was improved to consider the query variables as well as the subgoals by Prasenjit Mitra (and independently by Levy and Pottinger).

Limits on the Containment Mapping from Query to Expansion

The key to the bucket algorithm is understanding what can happen to variables in the query when they are mapped to variables in the expansion.

- Each subgoal of the query must be covered by some view in the solution. That is, the query subgoal must map to some subgoal in some expansion; otherwise there is no containment mapping from the query to the expansion, and the expansion is not contained in the query.

- We must differentiate between distinguished variables (those that appear in the query head or the head of a view definition) and nondistinguished variables (all others).

- We must also differentiate between unique nondistinguished variables (those that appear only once in the query) and shared variables (those that appear more than once).
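These variable classes can be computed directly from a conjunctive query. The Python sketch below assumes an encoding that is not part of the slides: an atom is a pair (predicate, tuple of variable names), and a query is a head atom plus a list of body atoms. The function name `classify_variables` is hypothetical. It counts occurrences in the body only, and membership in the head takes precedence.

```python
from collections import Counter

# Hypothetical encoding (not from the slides): an atom is
# (predicate, (var, var, ...)) and variables are plain strings.
Atom = tuple[str, tuple[str, ...]]


def classify_variables(head: Atom, body: list[Atom]) -> dict[str, str]:
    """Label each body variable 'distinguished', 'shared', or 'unique'."""
    head_vars = set(head[1])
    # Number of occurrences of each variable across all body subgoals.
    counts = Counter(var for _, args in body for var in args)
    labels = {}
    for var, n in counts.items():
        if var in head_vars:
            labels[var] = "distinguished"  # appears in the query head
        elif n > 1:
            labels[var] = "shared"  # nondistinguished, appears more than once
        else:
            labels[var] = "unique"  # nondistinguished, appears exactly once
    return labels


# The six-subgoal "parent" query from the example later in these slides.
query_head = ("query", ("A", "B"))
query_body = [("p", ("A", "C")), ("p", ("C", "D")), ("p", ("D", "E")),
              ("p", ("E", "F")), ("p", ("F", "G")), ("p", ("G", "B"))]
print(classify_variables(query_head, query_body))
# {'A': 'distinguished', 'C': 'shared', 'D': 'shared', 'E': 'shared',
#  'F': 'shared', 'G': 'shared', 'B': 'distinguished'}
```

For a body such as r(A,B) & s(B,C) with only A in the head, the same function labels B shared and C unique.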

R1. A distinguished variable in the query must map to a distinguished variable (of some view) in the expansion.

R2. A shared variable X must either map to a distinguished variable of a view expansion or all query subgoals involving X must map to the same nondistinguished variable in the expansion of a single view.

  - The reason is that two view expansions can never have the same nondistinguished variable.
  - Even if the local variable name is the same in the view definition, the expansions must use different names.

Examples

Figure 0.1 shows one consequence of rules R1 and R2.

[Figure 0.1: Covering the query subgoal r(A,B). Query: answer(U,V) :- ... r(A,B) ... s(B,C). View v(X,Y) with expansion p(X,W) q(W,Z) r(Z,Y). Solution: v together with a second view v'(... Y ...) that shares Y.]

- A is a unique, nondistinguished variable. It can map to the nondistinguished variable Z in the expansion of view v.

- B is a shared, nondistinguished variable. One possible treatment, shown in Fig. 0.1, is that B maps to the distinguished variable Y, and the other occurrences of B, such as the one in the subgoal s(B,C), are mapped so that each also maps to a distinguished variable, here of some other view.

  - Note that in designing the solution, we have the freedom to equate the variables of the views, and thus to use the same distinguished variable in two or more view expansions.
  - This capability makes it possible to map all the occurrences of B to the same variable, even though they come from expansions of different views.

Figure 0.2 suggests what happens if a shared variable is forced to map to a nondistinguished variable in a view expansion. The s subgoal now has nowhere to go: there is no s subgoal in the expansion of v, and no other expansion could have Z as a variable.

[Figure 0.2: We cannot map B to local variable Z. Query: answer(U,V) :- ... q(A,B) ... s(B,C). View v(X,Y) with expansion p(X,W) q(W,Z) r(Z,Y). Solution: ?? (no subgoal is available for s(B,C)).]

On the other hand, if the other subgoal(s) with B can also map to subgoals in the same expansion, then B can be "covered." Figure 0.3 shows what happens when the s subgoal becomes an r subgoal.
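The situations in Figs. 0.1 through 0.3 can be checked mechanically by searching for a containment mapping from the query subgoals shown into the available expansion subgoals. The brute-force search below is a sketch under several assumptions: it ignores the parts of the query elided as "..." in the figures, writes v's expansion with the variable names used in the figures, and invents a hypothetical expansion s(Y,K) for the second view v' of Fig. 0.1 (the figure does not show its body). The function name is also hypothetical.

```python
from itertools import product

# Hypothetical encoding: an atom is (predicate, (var, ...)).
Atom = tuple[str, tuple[str, ...]]


def find_containment_mapping(source: list[Atom], target: list[Atom]):
    """Return a variable mapping that sends every source subgoal to a target
    subgoal with the same predicate, or None if there is none (brute force)."""
    # For each source subgoal, the target subgoals it could map to.
    choices = [[t for t in target if t[0] == s[0] and len(t[1]) == len(s[1])]
               for s in source]
    for pick in product(*choices):
        mapping: dict[str, str] = {}
        consistent = True
        for (_, s_args), (_, t_args) in zip(source, pick):
            for x, y in zip(s_args, t_args):
                # A variable must always map to the same target variable.
                if mapping.setdefault(x, y) != y:
                    consistent = False
        if consistent:
            return mapping
    return None


# Expansion of v(X,Y), using the variable names shown in the figures.
v_exp = [("p", ("X", "W")), ("q", ("W", "Z")), ("r", ("Z", "Y"))]

# Fig. 0.1: the solution equates Y with an argument of a second view v',
# whose expansion is assumed (hypothetically) to contribute s(Y,K).
print(find_containment_mapping([("r", ("A", "B")), ("s", ("B", "C"))],
                               v_exp + [("s", ("Y", "K"))]))
# {'A': 'Z', 'B': 'Y', 'C': 'K'}

# Fig. 0.2: B is forced onto local Z, and s(B,C) has nowhere to go.
print(find_containment_mapping([("q", ("A", "B")), ("s", ("B", "C"))], v_exp))
# None
# Another expansion supplying an s subgoal does not help: its local
# variable is renamed (here Z_1), so B cannot map to both Z and Z_1.
print(find_containment_mapping([("q", ("A", "B")), ("s", ("B", "C"))],
                               v_exp + [("s", ("Z_1", "K_1"))]))
# None

# Fig. 0.3: s becomes r, so both subgoals with B map into v's expansion.
print(find_containment_mapping([("q", ("A", "B")), ("r", ("B", "C"))], v_exp))
# {'A': 'W', 'B': 'Z', 'C': 'Y'}
```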

- If B is a shared variable, then the bucket for B includes each view v and each set of subgoals in the expansion of v such that there is a containment mapping from all the query subgoals containing B to that set of subgoals in v's expansion.

  - The containment mapping must map distinguished variables of the query to distinguished variables of the view definition.

Example

Consider the following views v and w, and query:

v(X,Y) :- p(X,Z) & p(Z,Y)
w(U,V) :- p(U,S) & p(S,T) & p(T,V)

query(A,B) :- p(A,C) & p(C,D) & p(D,E) & p(E,F) & p(F,G) & p(G,B)

Think of p as "parent," v as "grandparent," and w as "great-grandparent."

- The bucket for p(A,C) is empty, because A is distinguished and C is shared. Since this bucket covers p(A,C) alone, C would also have to map to a distinguished variable, but no view subgoal has two distinguished variables.

- The bucket for p(C,D) is empty because both variables are shared, and again no view subgoal has two distinguished variables.

- Similarly, the buckets for the other query subgoals are empty.

- The bucket for shared variable C includes {p(X,Z), p(Z,Y)} from v, and {p(U,S), p(S,T)} from w.

  - It does not include {p(S,T), p(T,V)} from w, because p(A,C) would have to map to p(S,T), and A is distinguished, but S is not.

- The bucket for shared variable D includes {p(X,Z), p(Z,Y)} from v, and both {p(U,S), p(S,T)} and {p(S,T), p(T,V)} from w.

  - Note that we cannot at this point guarantee a consistent treatment of the variables C and E that are involved in the mapping of the subgoals with D; a more careful analysis can check that the requirements of these shared variables can be met in conjunction with D's requirements.
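The buckets for C and D can be reproduced by brute force from the bucket definition for shared variables. In this sketch (hypothetical names, same encoding as before), every query subgoal containing the shared variable is tried against every subgoal of each view definition; a combination is kept when the variable assignments are consistent and each distinguished query variable lands on a distinguished view variable. A bucket entry records the view name and the set of view subgoals hit. The sketch does not attempt the cross-variable consistency check mentioned in the note above.

```python
from itertools import product

# Hypothetical encoding: an atom is (predicate, (var, ...)); a view maps its
# name to (head variables, body atoms).
Atom = tuple[str, tuple[str, ...]]


def _maps_ok(goals, pick, q_dist, v_dist) -> bool:
    """Check that goals[i] -> pick[i] is a containment mapping that sends
    distinguished query variables to distinguished view variables."""
    mapping: dict[str, str] = {}
    for (q_pred, q_args), (v_pred, v_args) in zip(goals, pick):
        if q_pred != v_pred or len(q_args) != len(v_args):
            return False
        for x, y in zip(q_args, v_args):
            if mapping.setdefault(x, y) != y:
                return False  # x is already mapped elsewhere
            if x in q_dist and y not in v_dist:
                return False  # distinguished must map to distinguished
    return True


def shared_variable_bucket(var, query_head_vars, query_body, views):
    """Entries (view name, frozenset of view subgoals) for shared variable var."""
    goals = [g for g in query_body if var in g[1]]
    q_dist = set(query_head_vars)
    bucket = set()
    for name, (head_vars, body) in views.items():
        for pick in product(body, repeat=len(goals)):
            if _maps_ok(goals, pick, q_dist, set(head_vars)):
                bucket.add((name, frozenset(pick)))
    return bucket


views = {
    "v": (("X", "Y"), [("p", ("X", "Z")), ("p", ("Z", "Y"))]),
    "w": (("U", "V"), [("p", ("U", "S")), ("p", ("S", "T")), ("p", ("T", "V"))]),
}
query_body = [("p", ("A", "C")), ("p", ("C", "D")), ("p", ("D", "E")),
              ("p", ("E", "F")), ("p", ("F", "G")), ("p", ("G", "B"))]

for shared in ("C", "D"):
    entries = shared_variable_bucket(shared, ("A", "B"), query_body, views)
    # Sort for a deterministic listing (sets have no fixed order).
    for name, subgoals in sorted((n, sorted(s)) for n, s in entries):
        print(shared, name, subgoals)
```

Run on the example, this lists one entry from v and one from w for C, and one entry from v and two from w for D, matching the buckets described above.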

We have to put together some views whose expansions cover all the subgoals.

- One possibility is to use expansions of v as the buckets for shared variables C, E, and G.

  - The other shared variables (D and F) fortunately can map to distinguished variables of the expansions, so we can handle that sharing by equating the variables in the view subgoals of the solution. Likewise, the distinguished variables of the query can map to distinguished variables of the expansions. The resulting solution:

query(A,B) :- v(A,H) & v(H,I) & v(I,B)

- Another solution uses one expansion of w to cover the buckets for both C and D and a second expansion of w for F and G.

  - Again, the sharing of E can be handled by using the same variable in the two subgoals of the solution, and the distinguished variables of the query fortuitously map to distinguished variables of the view expansions. The solution:

query(A,B) :- w(A,H) & w(H,B)

- Notice that there is no covering with one use of v and one use of w. However we do it, one query subgoal needs to be covered by a third view expansion, and there is no way to make both variables of the "lost" subgoal map to distinguished variables of that expansion.
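The two solutions, and the failure of the two rewritings that use v once and w once chained through a variable H, can be confirmed by expanding each rewriting and searching for a containment mapping from the query to the expansion that keeps the head variables A and B fixed. This is a self-contained sketch with the same hypothetical encoding and renaming scheme as before; it tests only these particular rewritings and does not by itself prove that no other covering exists.

```python
from itertools import product

# Hypothetical encoding: a view maps its name to (head variables, body atoms).
VIEWS = {
    "v": (("X", "Y"), [("p", ("X", "Z")), ("p", ("Z", "Y"))]),
    "w": (("U", "V"), [("p", ("U", "S")), ("p", ("S", "T")), ("p", ("T", "V"))]),
}
QUERY_HEAD = ("A", "B")
QUERY_BODY = [("p", ("A", "C")), ("p", ("C", "D")), ("p", ("D", "E")),
              ("p", ("E", "F")), ("p", ("F", "G")), ("p", ("G", "B"))]


def expand(rewriting_body):
    """Substitute view definitions, renaming local view variables per use."""
    expansion = []
    for i, (view_name, args) in enumerate(rewriting_body):
        head_vars, view_body = VIEWS[view_name]
        subst = dict(zip(head_vars, args))
        for pred, view_args in view_body:
            expansion.append((pred, tuple(subst.get(x, f"{x}_{i}") for x in view_args)))
    return expansion


def contained_in_query(rewriting_body) -> bool:
    """True if a containment mapping sends the query body into the expansion
    while mapping the head variables A and B to themselves."""
    target = expand(rewriting_body)
    choices = [[t for t in target if t[0] == pred] for pred, _ in QUERY_BODY]
    for pick in product(*choices):
        mapping = {x: x for x in QUERY_HEAD}  # rewriting head is query(A,B)
        if all(mapping.setdefault(x, y) == y
               for (_, q_args), (_, t_args) in zip(QUERY_BODY, pick)
               for x, y in zip(q_args, t_args)):
            return True
    return False


print(contained_in_query([("v", ("A", "H")), ("v", ("H", "I")), ("v", ("I", "B"))]))  # True
print(contained_in_query([("w", ("A", "H")), ("w", ("H", "B"))]))  # True
print(contained_in_query([("v", ("A", "H")), ("w", ("H", "B"))]))  # False
print(contained_in_query([("w", ("A", "H")), ("v", ("H", "B"))]))  # False
```

A rewriting such as v(A,H) & w(H,B) expands to a chain of only five p subgoals from A to B, so the query's six-subgoal chain cannot map into it with A and B fixed.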