















These slides cover the challenges of information integration, which arise because related data exists in various places with different models, schemas, and conventions. They introduce two approaches to the problem: warehousing and mediation. They also explain semistructured data, its representation using graphs, and the role of XML and XQuery in information integration.
















Plan
Information Integration
Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, and conventions (e.g., terminology).
Example
In the real world, every bar has its own database.
Some may have relations like beer-price; others have an MS Word file from which the menu is printed.
Some keep phones of manufacturers but not addresses.
Some distinguish beers and ales; others do not.
Warehousing
[Figure: two Wrappers feed a Combiner, which loads the Warehouse. The user query goes to the Warehouse, which returns the result.]
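The wrapper, combiner, and warehouse roles can be sketched in a few lines. This is only an illustration under stated assumptions: the two sources, the bar names, the prices, and every function name below are hypothetical and do not come from the slides. The point it shows is that the data is translated and copied into the warehouse ahead of time, so the user query never touches the sources.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Two hypothetical sources with different schemas and conventions.
def source_a() -> List[dict]:
    return [{"beer": "Bud", "price": 2.50}, {"beer": "M'lob", "price": 3.00}]

def source_b() -> List[tuple]:
    return [("Bud", "3.00"), ("Export", "4.00")]  # prices stored as text

# Wrappers: translate each source into one common schema (bar, beer, price).
def wrap_a() -> List[Row]:
    return [{"bar": "Joe's", "beer": r["beer"], "price": float(r["price"])}
            for r in source_a()]

def wrap_b() -> List[Row]:
    return [{"bar": "Sue's", "beer": name, "price": float(price)}
            for name, price in source_b()]

# Combiner: merge the wrapped data into a single store.
def combine(wrappers: List[Callable[[], List[Row]]]) -> List[Row]:
    warehouse: List[Row] = []
    for wrapper in wrappers:
        warehouse.extend(wrapper())
    return warehouse

# The warehouse is built in advance; queries run against this copy only.
WAREHOUSE = combine([wrap_a, wrap_b])

def query_warehouse(beer: str) -> List[Row]:
    return [row for row in WAREHOUSE if row["beer"] == beer]
```

A call such as `query_warehouse("Bud")` scans only `WAREHOUSE`; a change in a source becomes visible only after the combiner runs again.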
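Mediation
[Figure: the user query goes to the Mediator, which sends queries to two Wrappers. Each Wrapper queries its source and passes the result back to the Mediator, which returns the combined result to the user.]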
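By contrast with a warehouse, a mediator stores no data and forwards each query to the wrappers at query time. The sketch below is a minimal illustration; the sources, bar names, and class names are hypothetical assumptions, not part of the slides.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical sources; each keeps its own format and stays where it is.
SOURCE_A = [{"beer": "Bud", "price": 2.50}, {"beer": "M'lob", "price": 3.00}]
SOURCE_B = [("Bud", "3.00"), ("Export", "4.00")]

class WrapperA:
    """Translates a query on the common schema into a lookup on SOURCE_A."""
    def query(self, beer: str) -> List[Row]:
        return [{"bar": "Joe's", "beer": r["beer"], "price": r["price"]}
                for r in SOURCE_A if r["beer"] == beer]

class WrapperB:
    """Translates the same query into a lookup on SOURCE_B's tuples."""
    def query(self, beer: str) -> List[Row]:
        return [{"bar": "Sue's", "beer": name, "price": float(price)}
                for name, price in SOURCE_B if name == beer]

class Mediator:
    """Holds no data: sends each user query to every wrapper and merges results."""
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, beer: str) -> List[Row]:
        results: List[Row] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(beer))
        return results

mediator = Mediator([WrapperA(), WrapperB()])
```

Because nothing is copied, a row appended to `SOURCE_A` is visible to the very next `mediator.query(...)` call.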
Graph Representation of Semistructured Data
Nodes = objects.
Nodes connected in a general rooted graph structure.
Labels on arcs.
Atomic values on leaf nodes.
Big deal: no restriction on labels (roughly = attributes).
Zero, one, or many children of a given label type are all OK.
Example
[Figure: rooted graph. Arc labels: bar, beer, name, addr, manf, servedAt, award, year, prize. Leaf values: Joe's, Maple, Bud, A.B., M'lob, 1995, Gold.]
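A rooted, arc-labeled graph of this kind can be sketched with a small node class. The labels and leaf values below are the ones that appear in the example, but how they are wired together (which beer carries the award, which one is served at the bar) is an assumption, and the `Node` class is hypothetical.

```python
from typing import List, Optional, Tuple

class Node:
    """An object: a leaf holding an atomic value, or an interior node with labeled arcs."""
    def __init__(self, value: Optional[object] = None):
        self.value = value
        self.arcs: List[Tuple[str, "Node"]] = []

    def add(self, label: str, child: "Node") -> "Node":
        self.arcs.append((label, child))
        return child

    def children(self, label: str) -> List["Node"]:
        # Zero, one, or many children with the same label are all allowed.
        return [child for (lab, child) in self.arcs if lab == label]

root = Node()

bar = root.add("bar", Node())
bar.add("name", Node("Joe's"))
bar.add("addr", Node("Maple"))

bud = root.add("beer", Node())
bud.add("name", Node("Bud"))
bud.add("manf", Node("A.B."))
bud.add("servedAt", bar)   # arc to an existing node: a graph, not just a tree

mlob = root.add("beer", Node())
mlob.add("name", Node("M'lob"))
mlob.add("manf", Node("A.B."))
award = mlob.add("award", Node())
award.add("year", Node(1995))
award.add("prize", Node("Gold"))
```

Here `root.children("beer")` has two entries, `bud.children("award")` has none, and `bud.children("servedAt")[0]` is the same object as `bar`.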
Well-Formed XML
Normal declaration is <?xml version="1.0" standalone="yes"?>
"Standalone" means that there is no DTD specified.
Option of tags that don't require balance, as with some tags in HTML.
Example
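A minimal sketch of checking well-formedness with Python's standard `xml.etree.ElementTree` parser, which rejects unbalanced tags. The BARS, BAR, BEER, NAME, and PRICE tag names and all the values are hypothetical, chosen only to match the bars-and-beers theme.

```python
import xml.etree.ElementTree as ET

# A well-formed document: one declaration, one root, every tag balanced.
well_formed = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
</BARS>"""

# Not well formed: <NAME> is never closed before </BAR>.
unbalanced = "<BARS><BAR><NAME>Joe's Bar</BAR></BARS>"

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

root = ET.fromstring(well_formed)
beer_names = [e.text for e in root.findall("BAR/BEER/NAME")]
```

Note that this checks well-formedness only; `ElementTree` does not validate a document against a DTD.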
Elements of a DTD
An element is a name (its tag) and a parenthesized description of tags within an element.
Special case: (#PCDATA) after an element name means it is text.
Example
Components
Each element name is a tag.
Its components are the tags that appear nested within, in the order specified.
Multiplicity of a tag is controlled by:
a) * = zero or more of.
b) + = one or more of.
c) ? = zero or one of.
In addition, | = "or."
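The multiplicity symbols behave like the same operators in regular expressions, so a content model can be checked by translating it into a pattern over the sequence of child tags. The sketch below is an illustration, not a DTD validator: it handles element names, commas, parentheses, `*`, `+`, `?`, and `|` only (no #PCDATA), and the function names and example models are hypothetical.

```python
import re
from typing import List

def model_to_regex(model: str) -> re.Pattern:
    """Translate a content model such as '(NAME, BEER+)' into a compiled regex."""
    pieces = []
    for tok in re.findall(r"[A-Za-z_][\w.-]*|[()|*+?,]", model):
        if tok == ",":
            continue                      # sequence = plain concatenation
        elif tok == "(":
            pieces.append("(?:")          # grouping only, no capture
        elif tok in ")|*+?":
            pieces.append(tok)            # same meaning as in a regex
        else:
            # A tag name, followed by the separator space used in matches().
            pieces.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(pieces))

def matches(model: str, children: List[str]) -> bool:
    """True if the child tags, in order, satisfy the content model."""
    sequence = "".join(tag + " " for tag in children)
    return model_to_regex(model).fullmatch(sequence) is not None
```

For instance, `matches("(NAME, BEER+)", ["NAME", "BEER", "BEER"])` holds, while the same model rejects `["NAME"]` because `+` requires at least one BEER.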
Example of (a)
Example of (b)
Suppose our bars DTD is in file bar.dtd.
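One way a document can point at a DTD kept in a separate file is a DOCTYPE line with a SYSTEM identifier. The sketch below assumes a root element named BARS and a hypothetical document body; only the file name bar.dtd comes from the text above. Python's `xml.dom.minidom` records the reference without reading or validating against the file.

```python
from xml.dom import minidom

# standalone="no" because the document relies on an external DTD.
document = """<?xml version="1.0" standalone="no"?>
<!DOCTYPE BARS SYSTEM "bar.dtd">
<BARS>
  <BAR><NAME>Joe's Bar</NAME></BAR>
</BARS>"""

dom = minidom.parseString(document)

# The parser keeps the declared root name and the DTD location,
# but it does not open bar.dtd or check the document against it.
doctype_name = dom.doctype.name
dtd_location = dom.doctype.systemId
```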
ID's and IDREF's
These are pointers from one object to another, analogous to NAME = "foo" and HREF = "#foo" in HTML.
Allows the structure of an XML document to be a general graph, rather than just a tree.
An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.
An attribute of type IDREF refers to some object by its identifier.
Also IDREFS to allow multiple object references within one tag.
Example
Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.
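A sketch of how such links could be declared and then followed. The DTD, the attribute names `id` and `manf`, and the data are hypothetical. Python's `ElementTree` does not interpret ID and IDREF attribute types, so the lookup table below is built by hand from the attribute names, which is an assumption of this sketch rather than something the parser enforces.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<!DOCTYPE BARS [
  <!ELEMENT BARS (BAR*, MANF*)>
  <!ELEMENT BAR (NAME, BEER+)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT BEER (NAME)>
  <!ATTLIST BEER manf IDREF #REQUIRED>
  <!ELEMENT MANF (NAME)>
  <!ATTLIST MANF id ID #REQUIRED>
]>
<BARS>
  <BAR>
    <NAME>Joe's Bar</NAME>
    <BEER manf="ab"><NAME>Bud</NAME></BEER>
    <BEER manf="ab"><NAME>Michelob</NAME></BEER>
  </BAR>
  <MANF id="ab"><NAME>Anheuser-Busch</NAME></MANF>
</BARS>"""

root = ET.fromstring(doc)

# Index every manufacturer object by its ID attribute.
manf_by_id = {m.get("id"): m for m in root.iter("MANF")}

def manufacturer_of(beer: ET.Element) -> str:
    """Follow the beer's IDREF to the manufacturer object and return its name."""
    return manf_by_id[beer.get("manf")].findtext("NAME")

links = {b.findtext("NAME"): manufacturer_of(b) for b in root.iter("BEER")}
```

Both beers refer to the same MANF element, so the document describes a graph even though its tags nest as a tree.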