XML, DTD, and XPath
CPS 216
Advanced Database Systems
From HTML to XML (eXtensible Markup Language)
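HTML describes the presentation of the content:
Bibliography
Foundations of Databases Abiteboul, Hull, and Vianu
Addison Wesley, 1995
…
XML describes only the content:
`<bibliography><book><title>Foundations of Databases</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year></book>…</bibliography>`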
⇒ Separating content from presentation simplifies content extraction and makes it easy to present the same content in different ways
Other nice features of XML
Portability: just like HTML, you can ship XML data across platforms
- Relational data requires heavy-weight protocols, e.g., JDBC
Flexibility: you can represent any information (structured, semi-structured, documents, …)
- Relational data is best suited for structured data
Extensibility: since data describes itself, you can change the schema easily
- Relational schema is rigid and difficult to change
## XML terminology
Tag names: book, title, …
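Start tags: `<book>`, `<title>`, …
End tags: `</book>`, `</title>`, …
An element is enclosed by a pair of start and end tags: `<book>…</book>`
Elements can be nested: `<book>…<title>…</title>…</book>`
Empty elements: an element with no content, e.g., `<year></year>`, can be abbreviated as `<year/>`
Elements can also have attributes:
`<book price="…"><title>Foundations of Databases</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year>…</book>`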
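To see these terms in practice, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module (a tool choice of ours, not part of the lecture). The document mirrors the book example above; the `price` value 45 is hypothetical.

```python
import xml.etree.ElementTree as ET

# Mirrors the book example; the price value is hypothetical.
DOC = """<bibliography>
  <book price="45">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

root = ET.fromstring(DOC)            # the root element <bibliography>
book = root.find("book")             # a nested element

print(root.tag)                      # tag name of the root element
print(book.get("price"))             # attribute values are strings
print(book.find("title").text)       # content of the nested <title>
print([a.text for a in book.findall("author")])  # repeated children, in order

# <year></year> and its abbreviation <year/> denote the same empty element.
long_form = ET.fromstring("<year></year>")
short_form = ET.fromstring("<year/>")
print(long_form.text, short_form.text)  # no content in either form
```

Note that attribute values come back as strings; any numeric interpretation is up to the query language or the application.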
Well-formed XML documents
A well-formed XML document
Follows XML lexical conventions
- Wrong: We show that x < 0… Right: We show that x `&lt;` 0…
- Other special entities: `>` becomes `&gt;` and `&` becomes `&amp;`
Contains a single root element
Has tags that are properly matched and elements that are properly nested
- Right: `<section>…<subsection>…</subsection>…</section>` Wrong: `<section>…<subsection>…</section>…</subsection>`
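A quick way to check these rules is to hand each fragment to a conforming parser. The sketch below uses Python's standard `xml.etree.ElementTree`, which raises `ParseError` on input that is not well-formed; the helper name `is_well_formed` and the test fragments are our own illustrations.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if the parser accepts text as a complete XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

cases = {
    "raw <": "<section>We show that x < 0</section>",
    "escaped &lt;": "<section>We show that x &lt; 0</section>",
    "two roots": "<book/><book/>",
    "proper nesting": "<section><subsection></subsection></section>",
    "bad nesting": "<section><subsection></section></subsection>",
}
for label, text in cases.items():
    print(f"{label:15} {is_well_formed(text)}")

# After parsing, the entities are decoded back to the characters they stand for.
print(ET.fromstring("<p>x &lt; 0 &amp;&amp; y &gt; 1</p>").text)
```

Only the escaped, single-root, properly nested fragments are accepted.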
More XML features
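Comments: `<!-- … -->`
CDATA: `<![CDATA[ … ]]>` (character data that is not parsed as markup, so `<` and `&` need no escaping inside it)
ID’s and references: an attribute declared as ID names an element, and IDREF/IDREFS attributes refer to it
- e.g., Homer… Marge… Bart……
Namespaces allow external schemas and qualified names
- e.g., `<prefix:book xmlns:prefix="…">…</prefix:book>`
Processing instructions for apps: `<? … ?>`
And more…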
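As a small illustration of how a parser treats comments and CDATA, the sketch below uses Python's standard `xml.etree.ElementTree` with its default settings; the element name `example` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with a comment and a CDATA section.
DOC = """<section>
  <!-- a comment -->
  <example><![CDATA[Tags such as <book> and symbols such as & need no escaping here]]></example>
</section>"""

root = ET.fromstring(DOC)
print(root.find("example").text)   # the CDATA content, returned as ordinary text
print([child.tag for child in root])  # the default parser drops the comment
```

The CDATA wrapper itself disappears after parsing: applications see only the text it protected.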
Using DTD
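DTD can be included in the XML source file
- `<!DOCTYPE bibliography [ … ]>` `<bibliography>…</bibliography>`
DTD can be external
- `<!DOCTYPE bibliography SYSTEM "…">` `<bibliography>…</bibliography>`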
Why use DTD’s?
Benefits of using DTD
DTD can serve as a schema for the XML data
- Guards against errors
- Helps with processing
DTD facilitates information exchange
- People can agree to use a common DTD to exchange data (e.g., XHTML)
Benefits of not using DTD
Unstructured data is easy to represent
Overhead of DTD validation is avoided
XML versus relational data
Relational data
Schema is always fixed in advance and difficult to change
Simple, flat table structures
Ordering of rows and columns is unimportant
Data exchange is problematic
“Native” support in all serious commercial DBMS
XML data
Which one is more intuitive? Which one is easier to implement?
Query languages for XML
XPath
Path expressions with conditions
⇒ Building block of other standards (XQuery, XSLT, XPointer, etc.)
XQuery
XPath + full-fledged SQL-like query language
XSLT
XPath + transformation templates
Example DTD and XML
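`<!DOCTYPE bibliography [ … ]>`
`<bibliography><book><title>Foundations of Databases</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author><publisher>Addison Wesley</publisher><year>1995</year>…</book>…</bibliography>`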
A tree representation
- bibliography
  - book
    - title: "Foundations of Databases"
    - author: "Abiteboul"
    - author: "Hull"
    - author: "Vianu"
    - publisher: "Addison Wesley"
    - year: "1995"
    - section
      - title: "Introduction"
      - "In this section we introduce …"
      - section …
      - section …
      - …
  - book …
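The tree can be produced mechanically by a preorder walk. This is a minimal sketch with Python's standard `xml.etree.ElementTree` on a cut-down copy of the example (the `...` placeholders stand in for omitted text); the helper `tree_lines` is our own. One detail worth knowing: ElementTree stores text that follows a child element in that child's `tail`, whereas in the tree model it is just another text child of the parent.

```python
import xml.etree.ElementTree as ET

# A cut-down version of the example; "..." stands in for omitted text.
DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher><year>1995</year>
    <section>
      <title>Introduction</title>
      In this section we introduce ...
      <section><title>...</title></section>
    </section>
  </book>
</bibliography>"""

def tree_lines(elem, depth=0):
    """Preorder listing of element nodes and non-blank text nodes, indented by depth."""
    pad = "  " * (depth + 1)
    lines = ["  " * depth + elem.tag]
    if elem.text and elem.text.strip():
        lines.append(pad + repr(elem.text.strip()))
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
        # Text after a child lives in child.tail; in the tree model it is
        # a text child of elem, so it is listed at the child's depth.
        if child.tail and child.tail.strip():
            lines.append(pad + repr(child.tail.strip()))
    return lines

lines = tree_lines(ET.fromstring(DOC))
print("\n".join(lines))
```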
Predicates in path expressions
[ condition ] matches the current element if condition evaluates to true on the current element
Books with price lower than $50
/bibliography/book[@price<50]
- XPath automatically converts the price string to a numeric value for the comparison
Books with author “Abiteboul”
/bibliography/book[author='Abiteboul']
Books with a publisher child element
/bibliography/book[publisher]
Prices of books authored by “Abiteboul”
/bibliography/book[author='Abiteboul']/@price
Note: “<” must be escaped (as `&lt;`) if this expression appears in an XML document
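Python's standard `xml.etree.ElementTree` implements only a small XPath subset, but it is enough to try two of these predicates directly; the numeric comparison has to be written in Python. This is a sketch on hypothetical data (titles A, B, C and all prices are made up), with `float()` standing in for XPath's string-to-number conversion.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: titles A, B, C and all prices are made up.
DOC = """<bibliography>
  <book price="45"><title>A</title><author>Abiteboul</author><publisher>Addison Wesley</publisher></book>
  <book price="80"><title>B</title><author>Abiteboul</author></book>
  <book price="30"><title>C</title><author>Vianu</author><publisher>Addison Wesley</publisher></book>
</bibliography>"""
root = ET.fromstring(DOC)  # paths below are relative to /bibliography

def titles(books):
    return [b.findtext("title") for b in books]

# ElementTree's XPath subset supports these two predicates directly.
by_abiteboul = root.findall("book[author='Abiteboul']")
with_publisher = root.findall("book[publisher]")

# It has no '<' operator, so [@price<50] is evaluated in Python.
cheap = [b for b in root.findall("book") if float(b.get("price")) < 50]

# book[author='Abiteboul']/@price
prices = [b.get("price") for b in by_abiteboul]

print(titles(by_abiteboul), titles(with_publisher), titles(cheap), prices)
```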
More complex predicates
Predicates can have and’s and or’s
Books with price between $40 and $50
/bibliography/book[40<=@price and @price<=50]
Books authored by “Abiteboul” or those with price lower than $50
/bibliography/book[author="Abiteboul" or @price<50]
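The boolean connectives map directly onto Python's `and`/`or`. A minimal sketch, again on hypothetical data and with the comparisons evaluated in Python because ElementTree's XPath subset has no arithmetic comparisons:

```python
import xml.etree.ElementTree as ET

# Hypothetical data; titles and prices are made up.
DOC = """<bibliography>
  <book price="45"><title>A</title><author>Abiteboul</author></book>
  <book price="80"><title>B</title><author>Abiteboul</author><author>Hull</author></book>
  <book price="30"><title>C</title><author>Vianu</author></book>
  <book price="90"><title>D</title><author>Hull</author></book>
</bibliography>"""
books = ET.fromstring(DOC).findall("book")

def price(book):
    return float(book.get("price"))   # @price, converted to a number

def authors(book):
    return [a.text for a in book.findall("author")]

# book[40<=@price and @price<=50]
between = [b.findtext("title") for b in books if 40 <= price(b) and price(b) <= 50]

# book[author="Abiteboul" or @price<50]
either = [b.findtext("title") for b in books if "Abiteboul" in authors(b) or price(b) < 50]

print(between, either)
```

Book C qualifies for the second query only through its price, and book B only through its author.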
Predicates involving node-sets
/bibliography/book[author='Abiteboul']
There may be multiple authors, so author in general returns a node-set (in XPath terminology)
The predicate evaluates to true as long as it evaluates to true for at least one node in the node-set, i.e., at least one author is “Abiteboul”
Tricky query
/bibliography/book[author='Abiteboul' and author!='Abiteboul']
Will it return any books?
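It can. Each comparison is tested independently against the whole author node-set, so the two conditions may be satisfied by different authors. The sketch below spells out that "at least one node" rule in Python; the helper `some` and the book titles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical books: one sole-authored by Abiteboul, one co-authored, one without him.
DOC = """<bibliography>
  <book><title>Solo</title><author>Abiteboul</author></book>
  <book><title>Team</title><author>Abiteboul</author><author>Hull</author><author>Vianu</author></book>
  <book><title>Other</title><author>Vianu</author></book>
</bibliography>"""
root = ET.fromstring(DOC)

def some(node_set, test):
    """XPath-style comparison on a node-set: true if test holds for at least one node."""
    return any(test(n.text) for n in node_set)

# /bibliography/book[author='Abiteboul' and author!='Abiteboul']
tricky = [
    b.findtext("title")
    for b in root.findall("book")
    if some(b.findall("author"), lambda a: a == "Abiteboul")
    and some(b.findall("author"), lambda a: a != "Abiteboul")
]
print(tricky)
```

Only the co-authored book is returned: it has one author equal to “Abiteboul” and another author who is not.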
XPath operators and functions
Frequently used in conditions:
x + y, x - y, x * y, x div y, x mod y
contains(x, y): true if string x contains string y
count(node-set): counts the number of nodes in node-set
position(): returns the position of the current node in the currently selected node-set
last(): returns the size of the currently selected node-set
name(): returns the tag name of the current element
More XPath examples
All elements whose tag names contain “section” (e.g., “subsection”)
//*[contains(name(), 'section')]
Title of the first section in each book
/bibliography/book/section[position()=1]/title
- A shorthand: /bibliography/book/section[1]/title
Title of the last section in each book
/bibliography/book/section[position()=last()]/title
Books with fewer than 10 sections
/bibliography/book[count(section)<10]
All elements whose parent’s tag name is not “book”
//*[name()!='book']/*
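Some of these queries run unchanged in Python's standard `xml.etree.ElementTree`, which supports the positional predicates `[1]` and `[last()]`; for the others, this sketch emulates `count()`, `contains()`, and `name()` with plain Python (`len`, `in`, and `elem.tag`). The book and its section titles are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical book with three sections; the titles are made up.
DOC = """<bibliography>
  <book>
    <section><title>Introduction</title><subsection/></section>
    <section><title>Basics</title></section>
    <section><title>Queries</title></section>
  </book>
</bibliography>"""
root = ET.fromstring(DOC)

# /bibliography/book/section[1]/title and .../section[last()]/title
first = [t.text for t in root.findall("book/section[1]/title")]
last = [t.text for t in root.findall("book/section[last()]/title")]

# /bibliography/book[count(section)<10]: count() emulated with len()
few = [b for b in root.findall("book") if len(b.findall("section")) < 10]

# //*[contains(name(), 'section')]: name() is elem.tag, contains() is `in`
named = [e.tag for e in root.iter() if "section" in e.tag]

# //*[name()!='book']/*: every child of an element whose tag is not "book"
not_under_book = [c.tag for e in root.iter() if e.tag != "book" for c in e]

print(first, last, len(few), named, not_under_book)
```

`root.iter()` visits elements in document order, which matches the order of the corresponding XPath node-sets here.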
A tricky example
Suppose that price is a child element of book, and there may be multiple prices per book
Books with some price in range [20, 50]
How about:
/bibliography/book[price >= 20 and price <= 50]
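The same existential rule applies here: `price >= 20` and `price <= 50` are each checked against the whole price node-set, so a book with prices 10 and 60 satisfies both conditions without having any price in range. One way to require a single price to pass both tests is to move them into a nested predicate, `/bibliography/book[price[. >= 20 and . <= 50]]`. This Python sketch contrasts the two readings on hypothetical data.

```python
import xml.etree.ElementTree as ET

# Hypothetical books; titles and prices are made up.
DOC = """<bibliography>
  <book><title>InRange</title><price>35</price></book>
  <book><title>Straddles</title><price>10</price><price>60</price></book>
  <book><title>TooHigh</title><price>70</price></book>
</bibliography>"""
root = ET.fromstring(DOC)

def prices(book):
    return [float(p.text) for p in book.findall("price")]

# book[price >= 20 and price <= 50]: each comparison is existential on its own,
# so different price elements may satisfy the two conditions.
naive = [b.findtext("title") for b in root.findall("book")
         if any(p >= 20 for p in prices(b)) and any(p <= 50 for p in prices(b))]

# book[price[. >= 20 and . <= 50]]: both tests apply to the same price element.
correct = [b.findtext("title") for b in root.findall("book")
           if any(20 <= p <= 50 for p in prices(b))]

print(naive, correct)
```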