Advanced Database Systems-Lecture 14 Slides-Computer Science, Slides of Database Management Systems (DBMS)

XML, DTD, XPath, From HTML to XML, Portability, Flexibility, Extensibility, XML Terminology, Well-formed XML Documents, XML Features, Valid XML documents, DTD Explained, XML Versus Relational Data, Query Languages for XML, XQuery, XSLT, Predicates in Path Expressions, De-referencing IDREF’s, General XPath Location Steps, XPath Operators and Functions

Typology: Slides

2011/2012

Uploaded on 01/28/2012

arold

XML, DTD, and XPath
CPS 216
Advanced Database Systems
## From HTML to XML (eXtensible Markup Language)
• HTML describes the presentation of the content
<h1>Bibliography</h1>
<p><i>Foundations of Databases</i>
Abiteboul, Hull, and Vianu
<br>Addison Wesley, 1995
<p>…
• XML describes only the content
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <book>…</book>
</bibliography>
⇒ Separation of content from presentation simplifies content extraction and allows the same content to be presented easily in different looks
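
As a rough illustration of why content-only markup simplifies extraction, the sketch below parses the bibliography fragment with Python's standard `xml.etree.ElementTree` module and produces one possible plain-text presentation. The helper names `summarize_books` and `render_plain` are hypothetical, and the elided second book is left out.

```python
import xml.etree.ElementTree as ET

# The bibliography fragment from the slide (the elided second book is omitted).
BIB_XML = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""


def summarize_books(xml_text):
    """Return one (title, authors, year) tuple per <book> element."""
    root = ET.fromstring(xml_text)
    return [
        (
            book.findtext("title"),
            [author.text for author in book.findall("author")],
            int(book.findtext("year")),
        )
        for book in root.findall("book")
    ]


def render_plain(xml_text):
    """One possible presentation: a plain-text citation line per book."""
    return [
        f"{title} ({year}), by {', '.join(authors)}"
        for title, authors, year in summarize_books(xml_text)
    ]
```

The same `summarize_books` output could feed an HTML renderer or any other presentation without touching the data.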
## Other nice features of XML
• Portability: Just like HTML, you can ship XML data across platforms
  ◦ Relational data requires heavy-weight protocols, e.g., JDBC
• Flexibility: You can represent any information (structured, semi-structured, documents, …)
  ◦ Relational data is best suited for structured data
• Extensibility: Since data describes itself, you can change the schema easily
  ◦ Relational schema is rigid and difficult to change

## XML terminology

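• Tag names: book, title, …
• Start tags: <book>, <title>, …
• End tags: </book>, </title>, …
• An element is enclosed by a pair of start and end tags: <book>…</book>
  ◦ Elements can be nested: <book>…<title>…</title>…</book>
  ◦ Empty elements have nothing between the start tag and the end tag
    • Can be abbreviated to a single self-closing tag
• Elements can also have attributes, written as name="value" pairs inside the start tag:
<book …>
  <title>Foundations of Databases</title>
  <author>Abiteboul</author>
  <author>Hull</author>
  <author>Vianu</author>
  <publisher>Addison Wesley</publisher>
  <year>1995</year>
  …
</book>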
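
To make the terminology concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The `price` value and the empty `<note>` element are made up for illustration, and `describe` is a hypothetical helper. It suggests that the long and abbreviated forms of an empty element parse to the same structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the price value and the <note> element are made up.
LONG_FORM = (
    '<book price="80"><title>Foundations of Databases</title>'
    "<note></note></book>"
)
SHORT_FORM = (
    '<book price="80"><title>Foundations of Databases</title>'
    "<note/></book>"
)


def describe(xml_text):
    """Return (tag name, attributes, child tag names, child texts) of the root."""
    root = ET.fromstring(xml_text)
    return (
        root.tag,                              # tag name of the root element
        dict(root.attrib),                     # attributes as a name -> value map
        [child.tag for child in root],         # nested elements, in document order
        [child.text or "" for child in root],  # empty elements carry no text
    )
```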

## Well-formed XML documents
A well-formed XML document
• Follows XML lexical conventions
  ◦ Wrong: We show that x < 0…
  ◦ Right: We show that x &lt; 0…
    • Other special entities: > becomes &gt; and & becomes &amp;
• Contains a single root element
• Has tags that are properly matched and elements that are properly nested
  ◦ Right: <book>…<title>…</title>…</book>
  ◦ Wrong: <book>…<title>…</book>…</title>
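
The three conditions above can be checked mechanically with any XML parser. The sketch below assumes Python's standard `xml.etree.ElementTree` module, which raises `ParseError` on input that is not well-formed. `is_well_formed` is a hypothetical helper, and the element names in the usage lines are placeholders.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Lexical conventions: a literal "<" in text must be written as &lt;
escaped_ok = is_well_formed("<p>We show that x &lt; 0</p>")
unescaped_ok = is_well_formed("<p>We show that x < 0</p>")

# Proper nesting versus interleaved tags
nested_ok = is_well_formed("<a><b></b></a>")
interleaved_ok = is_well_formed("<a><b></a></b>")

# A document must have exactly one root element
two_roots_ok = is_well_formed("<a/><b/>")
```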

## More XML features
• Comments: <!-- … -->
• CDATA: <![CDATA[ … ]]>
• ID’s and references
  Homer… Marge… Bart……
• Namespaces allow external schemas and qualified names
  … ……
• Processing instructions for apps: <? … ?>
• And more…

## Using DTD
• DTD can be included in the XML source file
  ◦ <!DOCTYPE bibliography [ … ]> …
• DTD can be external
  ◦ <!DOCTYPE bibliography SYSTEM "…"> …
  ◦ <!DOCTYPE … PUBLIC "…" "…"> …

## Why use DTD’s?
• Benefits of using DTD
  ◦ DTD can serve as a schema for the XML data
    • Guards against errors
    • Helps with processing
  ◦ DTD facilitates information exchange
    • People can agree to use a common DTD to exchange data (e.g., XHTML)
• Benefits of not using DTD
  ◦ Unstructured data is easy to represent
  ◦ Overhead of DTD validation is avoided

## XML versus relational data
Relational data
• Schema is always fixed in advance and difficult to change
• Simple, flat table structures
• Ordering of rows and columns is unimportant
• Data exchange is problematic
• “Native” support in all serious commercial DBMS

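XML data
• Schema is optional and easy to change, since the data describes itself
• Nested, tree-like structures
• Elements appear in a definite document order
• Designed to be shipped across platforms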

Which one is more intuitive? Which one is easier to implement?

## Query languages for XML
• XPath
  ◦ Path expressions with conditions
  ⇒ Building block of other standards (XQuery, XSLT, XPointer, etc.)
• XQuery
  ◦ XPath + full-fledged SQL-like query language
• XSLT
  ◦ XPath + transformation templates

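## Example DTD and XML
<!DOCTYPE bibliography [
…
]>
<bibliography>
  <book …>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
    …
  </book>
  …
</bibliography>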

## A tree representation
bibliography
  book
    title: Foundations of Databases
    author: Abiteboul
    author: Hull
    author: Vianu
    publisher: Addison Wesley
    year: 1995
    section
      title: Introduction
      (text) In this section we introduce …
      section …
      section …
      …
  book …

## Predicates in path expressions
[condition] matches the current element if condition evaluates to true on the current element
• Books with price lower than $50
  /bibliography/book[@price<50]
  ◦ XPath will automatically convert the price string to a numeric value for comparison
  ◦ Note: “<” must be escaped (as &lt;) if this expression appears in an XML document
• Books with author “Abiteboul”
  /bibliography/book[author='Abiteboul']
• Books with a publisher child element
  /bibliography/book[publisher]
• Prices of books authored by “Abiteboul”
  /bibliography/book[author='Abiteboul']/@price
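
The four queries above can be approximated with Python's standard `xml.etree.ElementTree` module, which implements only a subset of XPath: child-element and text-equality predicates work directly, while numeric comparisons and attribute steps have to be written in Python. This is a sketch on hypothetical sample data. All prices and every book except the first are made up, and paths are relative to the root element because ElementTree does not accept absolute paths on an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: all prices and the second and third books are made up.
BIB = ET.fromstring("""<bibliography>
  <book price="80"><title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher></book>
  <book price="35"><title>Book B</title><author>Abiteboul</author></book>
  <book price="45"><title>Book C</title><author>Someone</author>
    <publisher>Some Press</publisher></book>
</bibliography>""")


def titles(books):
    """Return the title text of each book, for easy inspection."""
    return [book.findtext("title") for book in books]


# /bibliography/book[@price<50]
# ElementTree has no numeric comparison, so the string-to-number conversion
# that XPath performs implicitly is done by hand with float().
cheap = [b for b in BIB.findall("book") if float(b.get("price")) < 50]

# /bibliography/book[author='Abiteboul']
by_abiteboul = BIB.findall("book[author='Abiteboul']")

# /bibliography/book[publisher]
with_publisher = BIB.findall("book[publisher]")

# /bibliography/book[author='Abiteboul']/@price
# Attribute steps are not supported either, so the attribute is read directly.
abiteboul_prices = [b.get("price") for b in by_abiteboul]
```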

## More complex predicates
Predicates can have and’s and or’s
• Books with price between $40 and $50
  /bibliography/book[40<=@price and @price<=50]
• Books authored by “Abiteboul” or those with price lower than $50
  /bibliography/book[author="Abiteboul" or @price<50]
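
A minimal sketch of the same two conditions evaluated in Python over elements parsed with the standard `xml.etree.ElementTree` module, whose XPath subset has no `and`, `or`, or numeric comparison. The sample prices and authors other than “Abiteboul” and “Hull” are made up, and `price` and `has_author` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: prices and the second and third books are made up.
BIB = ET.fromstring("""<bibliography>
  <book price="80"><author>Abiteboul</author><author>Hull</author></book>
  <book price="45"><author>Someone</author></book>
  <book price="30"><author>Someone</author></book>
</bibliography>""")


def price(book):
    """Numeric value of the price attribute."""
    return float(book.get("price"))


def has_author(book, name):
    """True if at least one author child has the given text."""
    return any(a.text == name for a in book.findall("author"))


books = BIB.findall("book")

# /bibliography/book[40<=@price and @price<=50]
mid_range = [price(b) for b in books if 40 <= price(b) <= 50]

# /bibliography/book[author="Abiteboul" or @price<50]
either = [price(b) for b in books if has_author(b, "Abiteboul") or price(b) < 50]
```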

## Predicates involving node-sets
/bibliography/book[author='Abiteboul']
• There may be multiple authors, so author in general returns a node-set (in XPath terminology)
• The predicate evaluates to true as long as it evaluates true for at least one node in the node-set, i.e., at least one author is “Abiteboul”
• Tricky query
  /bibliography/book[author='Abiteboul' and author!='Abiteboul']
  ◦ Will it return any books?
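
Under the "at least one node" rule stated above, each comparison in the tricky query is existential on its own, so the two halves can be satisfied by different author nodes. The sketch below spells this out in Python with `any()`, using the standard `xml.etree.ElementTree` module only for parsing. `matches_tricky` is a hypothetical helper, and the single-author book is made up.

```python
import xml.etree.ElementTree as ET


def matches_tricky(book):
    """Emulate [author='Abiteboul' and author!='Abiteboul'] on one book."""
    authors = [a.text for a in book.findall("author")]
    # author='Abiteboul': true if SOME author node equals the string
    some_equal = any(a == "Abiteboul" for a in authors)
    # author!='Abiteboul': true if SOME author node differs from the string
    some_different = any(a != "Abiteboul" for a in authors)
    return some_equal and some_different


three_authors = ET.fromstring(
    "<book><author>Abiteboul</author><author>Hull</author>"
    "<author>Vianu</author></book>"
)
# Hypothetical single-author book.
one_author = ET.fromstring("<book><author>Abiteboul</author></book>")
```

Under this reading, `matches_tricky(three_authors)` holds while `matches_tricky(one_author)` does not: the query selects books with “Abiteboul” and at least one other author.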

## XPath operators and functions
Frequently used in conditions:
• x + y, x - y, x * y, x div y, x mod y
• contains(x, y): true if string x contains string y
• count(node-set): counts the number of nodes in node-set
• position(): returns the position of the current node in the currently selected node-set
• last(): returns the size of the currently selected node-set
• name(): returns the tag name of the current element

## More XPath examples
• All elements whose tag names contain “section” (e.g., “subsection”)
  //*[contains(name(), 'section')]
• Title of the first section in each book
  /bibliography/book/section[position()=1]/title
  ◦ A shorthand: /bibliography/book/section[1]/title
• Title of the last section in each book
  /bibliography/book/section[position()=last()]/title
• Books with fewer than 10 sections
  /bibliography/book[count(section)<10]
• All elements whose parent’s tag name is not “book”
  //*[name()!='book']/*
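
For experimentation, these examples can be approximated with Python's standard `xml.etree.ElementTree` module. Its XPath subset supports the positional shorthands `[1]` and `[last()]` but not `contains()`, `count()`, or `name()`, so those are expressed in Python here. This is a sketch on hypothetical data: the section titles other than “Introduction” and the `subsection` element are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: section titles and the subsection are made up.
BIB = ET.fromstring("""<bibliography>
  <book><title>Foundations of Databases</title>
    <section><title>Introduction</title>
      <subsection><title>Overview</title></subsection>
    </section>
    <section><title>Background</title></section>
    <section><title>Summary</title></section>
  </book>
</bibliography>""")

# //*[contains(name(), 'section')]
section_like = [e.tag for e in BIB.iter() if "section" in e.tag]

# /bibliography/book/section[1]/title
first_titles = [t.text for t in BIB.findall("book/section[1]/title")]

# /bibliography/book/section[position()=last()]/title
last_titles = [t.text for t in BIB.findall("book/section[last()]/title")]

# /bibliography/book[count(section)<10]
small_books = [b for b in BIB.findall("book") if len(b.findall("section")) < 10]

# //*[name()!='book']/*  (children of every element that is not a book)
not_under_book = [child.tag for parent in BIB.iter()
                  if parent.tag != "book" for child in parent]
```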

## A tricky example
• Suppose that price is a child element of book, and there may be multiple prices per book
• Books with some price in range [20, 50]
  ◦ How about:
    /bibliography/book[price >= 20 and price <= 50]
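
Because each comparison against a node-set holds as soon as some node satisfies it, the two conditions in this expression may be met by different price nodes. The Python sketch below contrasts that reading with a same-node test on a hypothetical book whose two prices both lie outside the range. The alternative form `book[price[. >= 20 and . <= 50]]` is one common way to write the same-node test in XPath 1.0 and is offered here as an assumption, not as the slide's own answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with two prices, neither of which lies in [20, 50].
book = ET.fromstring("<book><price>10</price><price>60</price></book>")
prices = [float(p.text) for p in book.findall("price")]

# [price >= 20 and price <= 50]
# Each comparison is satisfied by SOME price, possibly a different one each time.
naive_match = any(p >= 20 for p in prices) and any(p <= 50 for p in prices)

# [price[. >= 20 and . <= 50]]
# Both tests are applied to the SAME price node.
same_node_match = any(20 <= p <= 50 for p in prices)
```

Here `naive_match` is true (60 satisfies the first test, 10 the second) even though no single price is in range, while `same_node_match` is false.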