Semistructured Data and XML: Information Integration Approaches, Slides of Database Management Systems (DBMS)

These slides cover the challenges of information integration that arise because related data exists in many places with different models, schemas, and conventions. They introduce two approaches to the problem: warehousing and mediation. They also explain semistructured data, its representation as a graph, and the role of XML and XQuery in information integration.

Typology: Slides

2011/2012

Uploaded on 01/31/2012

marphy
Plan
1. Information integration: important new
application that motivates what follows.
2. Semistructured data: a new data model
designed to cope with problems of information
integration.
3. XML: a new Web standard that is essentially
semistructured data.
4. XQUERY: an emerging standard query
language for XML data.


Information Integration

Problem: related data exists in many places. The sources talk about the same things, but differ in model, schema, and conventions (e.g., terminology).

Example

In the real world, every bar has its own database.

• Some may have relations like beer-price; others have an MS Word file from which the menu is printed.

• Some keep phones of manufacturers but not addresses.

• Some distinguish beers and ales; others do not.

Warehousing

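[Figure: warehousing architecture. A wrapper sits over each source database (DB1 and a second database); a combiner loads the wrapped data into the warehouse. The user's query is posed to the warehouse, which returns the result.]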

Mediation

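[Figure: mediation architecture. The user's query goes to the mediator, which sends queries to a wrapper for each source database (DB1 and a second database). Each wrapper queries its source, and the results flow back through the wrappers and the mediator to the user.]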
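The difference between the two architectures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source "databases", their schemas, the common schema (bar, beer, price), and every function name below are hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Two hypothetical sources that describe the same things with different conventions.
DB1 = [{"beer": "Bud", "price": 2.50}, {"beer": "Miller", "price": 3.00}]
DB2 = [("Bud", "3.25"), ("Export", "4.00")]  # (name, price stored as text)


def wrapper1() -> List[Row]:
    """Translate DB1 rows into the common schema (bar, beer, price)."""
    return [{"bar": "bar1", "beer": r["beer"], "price": float(r["price"])} for r in DB1]


def wrapper2() -> List[Row]:
    """Translate DB2 rows into the same common schema."""
    return [{"bar": "bar2", "beer": name, "price": float(price)} for name, price in DB2]


WRAPPERS: List[Callable[[], List[Row]]] = [wrapper1, wrapper2]


def build_warehouse() -> List[Row]:
    """Warehousing: the combiner copies all wrapped data into one store ahead of time."""
    warehouse: List[Row] = []
    for wrap in WRAPPERS:
        warehouse.extend(wrap())
    return warehouse


def query_warehouse(warehouse: List[Row], beer: str) -> List[Row]:
    """User queries touch only the warehouse copy, never the sources."""
    return [row for row in warehouse if row["beer"] == beer]


def mediate(beer: str) -> List[Row]:
    """Mediation: nothing is stored; each query is forwarded to the wrappers at query time."""
    result: List[Row] = []
    for wrap in WRAPPERS:
        result.extend(row for row in wrap() if row["beer"] == beer)
    return result
```

In this sketch both approaches give the same answer until a source changes. After that, the mediator sees the change immediately, while the warehouse stays stale until it is rebuilt.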

Graph Representation of Semistructured Data

• Nodes = objects.

• Nodes connected in a general rooted graph structure.

• Labels on arcs.

• Atomic values on leaf nodes.

• Big deal: no restriction on labels (roughly = attributes).

    ◦ Zero, one, or many children of a given label type are all OK.

Example

[Figure: example semistructured-data graph. Arc labels: bar, beer, name, addr, manf, servedAt, prize, year, award. Leaf values: Joe's, Maple, Bud, A.B., M'lob, 1995, Gold.]
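A rooted graph with labeled arcs and atomic values on the leaves can be sketched directly in Python. The `Node` class and its methods are hypothetical names. The arc structure built below is only one plausible reading of the example's labels and values, so it should be treated as an assumption rather than a transcription of the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)  # compare nodes by identity; the graph may share nodes
class Node:
    value: Optional[str] = None  # atomic value, present only on leaf nodes
    arcs: List[Tuple[str, "Node"]] = field(default_factory=list)

    def add(self, label: str, child: "Node") -> "Node":
        """Attach child under an arc with the given label and return the child."""
        self.arcs.append((label, child))
        return child

    def children(self, label: str) -> List["Node"]:
        """All children reached by arcs with this label (zero, one, or many)."""
        return [child for (arc_label, child) in self.arcs if arc_label == label]


# Assumed structure: one bar and two beers hang off the root.
root = Node()
bar = root.add("bar", Node())
bar.add("name", Node("Joe's"))
bar.add("addr", Node("Maple"))

bud = root.add("beer", Node())
bud.add("name", Node("Bud"))
bud.add("manf", Node("A.B."))
bud.add("servedAt", bar)  # arc to an existing node: a graph, not just a tree

mlob = root.add("beer", Node())
mlob.add("name", Node("M'lob"))
mlob.add("manf", Node("A.B."))
prize = mlob.add("prize", Node())
prize.add("year", Node("1995"))
prize.add("award", Node("Gold"))
```

Nothing forces the two beer nodes to carry the same set of labels: in this sketch one has a `prize` child and the other has none, which is the flexibility the slide calls the "big deal".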

Well-Formed XML

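  1. Declaration = <?xml ... ?>.
       ◦ Normal declaration is <?xml version="1.0" standalone="yes"?>
       ◦ "Standalone" means that the document does not depend on an external DTD.
  2. Root tag surrounds the entire balance of the document.
       ◦ Each opening tag is balanced by a matching closing tag, as in HTML.
  3. Any balanced structure of tags OK.
       ◦ Option of tags that don't require a separate closing tag (as with some tags in HTML); in XML these are empty-element tags, written with a trailing slash as in <TAG/>.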

Example

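    <?xml version="1.0" standalone="yes"?>
    <BARS>
        <BAR><NAME>Joe's Bar</NAME>
            <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
        </BAR>
        ...
    </BARS>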
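Well-formedness can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` on input that is not well formed. The helper name `is_well_formed` and the sample documents are assumptions made for illustration; the tag names BARS, BAR, and PRICE in particular are illustrative choices.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A declaration, one root element, and properly nested tags.
GOOD = """<?xml version="1.0" standalone="yes"?>
<BARS>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
  </BAR>
</BARS>"""

# Tags closed in the wrong order: not balanced.
BAD_NESTING = "<BARS><BAR><NAME>Joe's Bar</BAR></NAME></BARS>"

# Two top-level elements: no single root surrounding the document.
TWO_ROOTS = "<BAR>a</BAR><BAR>b</BAR>"

# An empty-element tag needs no separate closing tag.
EMPTY_ELEMENT = "<BARS><BAR/></BARS>"
```

Calling `is_well_formed` on each sample accepts `GOOD` and `EMPTY_ELEMENT` and rejects the other two. Note that this checks well-formedness only; it says nothing about validity against a DTD.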

Elements of a DTD

An element is a name (its tag) and a parenthesized description of tags within an element.

    ◦ Special case: (#PCDATA) after an element name means it is text.

Example

    <!DOCTYPE Bars [
        ...
    ]>

Components

• Each element name is a tag.

• Its components are the tags that appear nested within, in the order specified.

• Multiplicity of a tag is controlled by:

    a) * = zero or more of.
    b) + = one or more of.
    c) ? = zero or one of.

• In addition, | = "or."
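The multiplicity operators behave like regular-expression operators over the sequence of child tags, which suggests a small checker. The following is a minimal sketch, not a real DTD validator: the content models in `BARS_DTD` are hypothetical (they are not the DTD from the slides), the function names are invented, and mixed content and attributes are ignored.

```python
import re
import xml.etree.ElementTree as ET

TAG_NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model: str):
    """Turn a content model such as "(NAME, BEER+)" into a regex over child tags."""
    if model.strip() == "(#PCDATA)":
        return re.compile("")  # text only: no child elements allowed
    # Each tag name becomes a group ending in ";" so *, + and ? apply to the whole name.
    pattern = TAG_NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", model)
    # A comma means "followed by", which is plain concatenation; whitespace is layout only.
    pattern = re.sub(r"[\s,]", "", pattern)
    return re.compile(pattern)


def children_match(element: ET.Element, model: str) -> bool:
    """Check the element's child tags, in order, against the content model."""
    sequence = "".join(child.tag + ";" for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None


# Hypothetical content models, written in DTD notation.
BARS_DTD = {
    "BARS": "(BAR*)",          # zero or more bars
    "BAR": "(NAME, BEER+)",    # a name, then one or more beers
    "NAME": "(#PCDATA)",
    "BEER": "(NAME, PRICE?)",  # price is optional in this sketch
    "PRICE": "(#PCDATA)",
}


def validate(root: ET.Element, dtd: dict) -> bool:
    """True if every element is declared and its children fit its content model."""
    return all(e.tag in dtd and children_match(e, dtd[e.tag]) for e in root.iter())
```

For instance, under these assumed models a BAR with a NAME but no BEER fails because `+` demands at least one beer, while an empty BARS element passes because `*` allows zero bars.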

Example of (a)

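    <!DOCTYPE Bars [
        ...
    ]>
    <BARS>
        <BAR><NAME>Joe's Bar</NAME>
            <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
        </BAR>
        ...
    </BARS>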

Example of (b)

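Suppose our bars DTD is in file bar.dtd.

    <!DOCTYPE Bars SYSTEM "bar.dtd">
    <BARS>
        <BAR><NAME>Joe's Bar</NAME>
            <BEER><NAME>Bud</NAME> <PRICE>2.50</PRICE></BEER>
            <BEER><NAME>Miller</NAME> <PRICE>3.00</PRICE></BEER>
        </BAR>
        ...
    </BARS>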

ID's and IDREF's

These are pointers from one object to another, analogous to NAME = "foo" and HREF = "#foo" in HTML.

• Allows the structure of an XML document to be a general graph, rather than just a tree.

• An attribute of type ID can be used to give the object (string between opening and closing tags) a unique string identifier.

• An attribute of type IDREF refers to some object by its identifier.

    ◦ Also IDREFS to allow multiple object references within one tag.

Example

Let us include in our Bars document type elements that are the manufacturers of beers, and have each beer object link, with an IDREF, to the proper manufacturer object.

    <!DOCTYPE Bars [
        ...
    ]>
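How ID and IDREF attributes turn a tree into a graph can be sketched with Python's standard `xml.etree.ElementTree`. Everything specific here is an assumption: the sample document, the MANF element, the attribute names `name` and `manf`, and the helper functions are hypothetical. Because ElementTree does not read attribute types from a DTD, the sketch lists them in two dictionaries instead.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Hypothetical document: each BEER points at its manufacturer by identifier.
DOC = """<BARS>
  <MANF name="m1"><NAME>Maker One</NAME></MANF>
  <MANF name="m2"><NAME>Maker Two</NAME></MANF>
  <BAR><NAME>Joe's Bar</NAME>
    <BEER manf="m1"><NAME>Bud</NAME><PRICE>2.50</PRICE></BEER>
    <BEER manf="m2"><NAME>Miller</NAME><PRICE>3.00</PRICE></BEER>
  </BAR>
  <BAR><NAME>Sue's Bar</NAME>
    <BEER manf="m1"><NAME>Bud</NAME><PRICE>2.75</PRICE></BEER>
  </BAR>
</BARS>"""

# Stand-ins for the DTD's attribute declarations (assumed, not parsed from a DTD).
ID_ATTRS = {"MANF": "name"}     # element tag -> attribute of type ID
IDREF_ATTRS = {"BEER": "manf"}  # element tag -> attribute of type IDREF


def build_id_index(root: ET.Element) -> Dict[str, ET.Element]:
    """Map every ID value to its element, rejecting duplicate identifiers."""
    index: Dict[str, ET.Element] = {}
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and attr in elem.attrib:
            ident = elem.attrib[attr]
            if ident in index:
                raise ValueError(f"duplicate ID {ident!r}")
            index[ident] = elem
    return index


def resolve(elem: ET.Element, index: Dict[str, ET.Element]) -> Optional[ET.Element]:
    """Follow elem's IDREF attribute to its target; None if elem has no IDREF."""
    attr = IDREF_ATTRS.get(elem.tag)
    if attr is None or attr not in elem.attrib:
        return None
    return index[elem.attrib[attr]]  # a KeyError here means a dangling reference
```

In this sketch two BEER elements in different bars resolve to the very same MANF element, so the document read through its references is a general graph even though the markup itself nests as a tree.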