XML, DTD, and XPath
CPS 116
Introduction to Database Systems
Announcements
• Midterm has been graded
• Graded exams available in my office
• Grades posted on Blackboard
• Sample solution and score distribution emailed
From HTML to XML (eXtensible Markup Language)
• HTML describes the presentation of the content
  <h1>Bibliography</h1>
  <p><i>Foundations of Databases</i> Abiteboul, Hull, and Vianu
  <br>Addison Wesley, 1995
  <p>…
• XML describes only the content
  <bibliography>
    <book>
      <title>Foundations of Databases</title>
      <author>Abiteboul</author>
      <author>Hull</author>
      <author>Vianu</author>
      <publisher>Addison Wesley</publisher>
      <year>1995</year>
    </book>
    …
  </bibliography>
⇒ Separation of content from presentation simplifies content extraction and allows the same content to be presented easily in different looks
Other nice features of XML
• Portability: Just like HTML, you can ship XML data across platforms
  - Relational data requires heavy-weight protocols, e.g., JDBC
• Flexibility: You can represent any information (structured, semi-structured, documents, …)
  - Relational data is best suited for structured data
• Extensibility: Since data describes itself, you can change the schema easily
  - Relational schema is rigid and difficult to change
XML terminology
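• Tag names: book, title, …
• Start tags: <book>, <title>, …
• End tags: </book>, </title>, …
• An element is enclosed by a pair of start and end tags: <book>…</book>
• Elements can be nested: <book>…<title>…</title>…</book>
• Empty elements have no content: <tag></tag>, which can be abbreviated as <tag/>
• Elements can also have attributes: <book ISBN="…" price="…">
Example:
  <bibliography>
    <book ISBN="…" price="…">
      <title>Foundations of Databases</title>
      <author>Abiteboul</author>
      <author>Hull</author>
      <author>Vianu</author>
      <publisher>Addison Wesley</publisher>
      <year>1995</year>
    </book>
    …
  </bibliography>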
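The terminology above maps directly onto what a parser exposes. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the ISBN and price values in the sample document are made up for illustration and are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the ISBN and price values are assumptions.
doc = """
<bibliography>
  <book ISBN="ISBN-10" price="80.00">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>
"""

root = ET.fromstring(doc)                    # the single root element
book = root[0]                               # first element nested inside the root
child_tags = [child.tag for child in book]   # tag names of the nested elements
attributes = dict(book.attrib)               # attributes as a name -> value mapping
title_text = book.find("title").text         # character data enclosed by <title>…</title>

print(root.tag, child_tags, attributes, title_text)
```

Note that the children come back in document order, while attributes are an unordered name/value mapping.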
Well-formed XML documents
A well-formed XML document
• Follows XML lexical conventions
  - Wrong: We show that x < 0…
  - Right: We show that x &lt; 0…
  - Other special entities: > becomes &gt; and & becomes &amp;
• Contains a single root element
• Has tags that are properly matched and elements that are properly nested
  - Right: <section>…<subsection>…</subsection>…</section>
  - Wrong: <section>…<subsection>…</section>…</subsection>
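Any XML parser rejects input that breaks these rules, which gives a quick way to test well-formedness. A minimal sketch with Python's standard library follows; the helper name is_well_formed and the sample strings are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

samples = {
    "unescaped <": "<p>We show that x < 0</p>",
    "escaped &lt;": "<p>We show that x &lt; 0</p>",
    "two roots": "<a/><b/>",
    "bad nesting": "<a><b></a></b>",
    "good nesting": "<a><b></b></a>",
}
results = {name: is_well_formed(text) for name, text in samples.items()}
print(results)
```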
More XML features
• Comments: <!-- … -->
• CDATA: <![CDATA[ … ]]>
• IDs and references
  Homer… Marge… Bart……
• Namespaces allow external schemas and qualified names
  … ……
• Processing instructions for apps: <? … ?>
• And more…
Valid XML documents
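• A valid XML document conforms to a Document Type Definition (DTD)
  - A DTD is optional
  - A DTD specifies
    - A grammar for the document
    - Constraints on structures and values of elements, attributes, etc.
• Example
  <!DOCTYPE bibliography [
    <!ELEMENT bibliography (book+)>
    <!ELEMENT book (title, author*, publisher?, year?, section*)>
    <!ATTLIST book ISBN ID #REQUIRED>
    <!ATTLIST book price CDATA #IMPLIED>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT publisher (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
    …
  ]>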
DTD explained
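<!DOCTYPE bibliography [ … ]>
    bibliography is the root element of the document
<!ELEMENT bibliography (book+)>
    bibliography consists of a sequence of one or more book elements (+ means one or more)
<!ELEMENT book (title, author*, publisher?, year?, section*)>
    book consists of a title, zero or more authors, an optional publisher, and zero or more sections, in sequence (* means zero or more, ? means zero or one)
<!ATTLIST book ISBN ID #REQUIRED>
    book has a required ISBN attribute which is a unique identifier
<!ATTLIST book price CDATA #IMPLIED>
    book has an optional (#IMPLIED) price attribute which contains character data
    Other attribute types include IDREF (reference to an ID), IDREFS (space-separated list of references), enumerated list, etc.
Example:
  <bibliography>
    <book ISBN="…" price="…">
      <title>Foundations of Databases</title>
      <author>Abiteboul</author>
      <author>Hull</author>
      <author>Vianu</author>
      <publisher>Addison Wesley</publisher>
      <year>1995</year>
    </book>
    …
  </bibliography>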
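A DTD content model is essentially a regular expression over child element names, so the rule for book can be checked with an ordinary regex. The sketch below is only an illustration of that idea, not a real DTD validator (the standard library does not validate against DTDs). The function name conforms is hypothetical, and the optional year slot is an assumption based on the sample book, which lists a year after the publisher.

```python
import re
import xml.etree.ElementTree as ET

# Content model title, author*, publisher?, year?, section* written as a regex
# over space-terminated child tag names. The year? slot is an assumption.
BOOK_MODEL = re.compile(r"title (author )*(publisher )?(year )?(section )*")

def conforms(book: ET.Element) -> bool:
    """Check the required ISBN attribute and the sequence of child elements."""
    child_tags = "".join(child.tag + " " for child in book)
    return "ISBN" in book.attrib and BOOK_MODEL.fullmatch(child_tags) is not None

ok = ET.fromstring('<book ISBN="b1"><title>T</title><author>A</author><year>1995</year></book>')
no_title = ET.fromstring('<book ISBN="b2"><author>A</author></book>')
wrong_order = ET.fromstring('<book ISBN="b3"><author>A</author><title>T</title></book>')
no_isbn = ET.fromstring('<book><title>T</title></book>')

checks = [conforms(ok), conforms(no_title), conforms(wrong_order), conforms(no_isbn)]
print(checks)
```

The check does not enforce uniqueness of ISBN values across books, which a real validator would do for an ID attribute.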
DTD explained (cont'd)
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT year (#PCDATA)>
    title, author, publisher, and year all contain parsed character data (#PCDATA)
    PCDATA is text that will be parsed (<…> will be treated as a markup tag and &lt; etc. will be treated as entities); CDATA is unparsed character data
Each section starts with a title, followed by some optional text and then zero or more subsections
Example section outline:
  Introduction: In this section we introduce XML and DTD…
    XML: XML stands for…
    DTD
      Definition: DTD stands for…
      Usage: You can use DTD to…
Using DTD
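• DTD can be included in the XML source file
  <!DOCTYPE bibliography [ … ]>
  <bibliography>…</bibliography>
• DTD can be external
  <!DOCTYPE bibliography SYSTEM "…">
  <bibliography>…</bibliography>

  … …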
Why use DTDs?
• Benefits of not using DTD
  - Unstructured data is easy to represent
  - Overhead of DTD validation is avoided
• Benefits of using DTD
  - DTD can serve as a schema for the XML data
    - Guards against errors
    - Helps with processing
  - DTD facilitates information exchange
    - People can agree to use a common DTD to exchange data (e.g., XHTML)
Simple XPath examples
• All book titles
  /bibliography/book/title
• All book ISBN numbers
  /bibliography/book/@ISBN
• All title elements, anywhere in the document
  //title
• All section titles, anywhere in the document
  //section/title
• Authors of bibliographical entries (suppose there are articles, reports, etc. in addition to books)
  /bibliography/*/author
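These paths can be tried out with Python's standard xml.etree.ElementTree, which implements a limited subset of XPath. The sketch below assumes a made-up sample document (the article entry is hypothetical). Because paths are evaluated relative to the root element, /bibliography/… becomes ./…, and attribute selection is done with get() since this subset cannot return attributes from a path.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
doc = """
<bibliography>
  <book ISBN="ISBN-10">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <section><title>Introduction</title></section>
  </book>
  <article>
    <title>A Sample Article</title>
    <author>Doe</author>
  </article>
</bibliography>
"""
root = ET.fromstring(doc)  # root is the bibliography element

# /bibliography/book/title
book_titles = [t.text for t in root.findall("./book/title")]
# /bibliography/book/@ISBN (attributes are read with get(), not selected by the path)
isbns = [b.get("ISBN") for b in root.findall("./book")]
# //title
all_titles = [t.text for t in root.findall(".//title")]
# //section/title
section_titles = [t.text for t in root.findall(".//section/title")]
# /bibliography/*/author
authors = [a.text for a in root.findall("./*/author")]

print(book_titles, isbns, all_titles, section_titles, authors)
```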
Predicates in path expressions
[ condition ] matches the current element if condition evaluates to true on the current element
• Books with price lower than $50
  /bibliography/book[@price<50]
  - XPath will automatically convert the price string to a numeric value for comparison
• Books with author "Abiteboul"
  /bibliography/book[author="Abiteboul"]
• Books with a publisher child element
  /bibliography/book[publisher]
• Prices of books authored by "Abiteboul"
  /bibliography/book[author="Abiteboul"]/@price
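A rough sketch of these predicates in Python follows. The standard ElementTree module supports the child-existence and child-text predicates directly, but not numeric comparisons, so the price test is written as an explicit string-to-number conversion. The sample data and the helper name isbns_of are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second book and both prices are made up.
doc = """
<bibliography>
  <book ISBN="b1" price="80.00">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher>
  </book>
  <book ISBN="b2" price="35.50">
    <title>Hypothetical Cheap Book</title>
    <author>Doe</author>
  </book>
</bibliography>
"""
root = ET.fromstring(doc)

def isbns_of(books):
    return [b.get("ISBN") for b in books]

# /bibliography/book[@price<50]: convert the attribute string to a number first;
# a missing price becomes NaN, and NaN < 50 is false.
cheap = isbns_of(b for b in root.findall("./book")
                 if float(b.get("price", "nan")) < 50)
# /bibliography/book[author="Abiteboul"]
by_abiteboul = isbns_of(root.findall("./book[author='Abiteboul']"))
# /bibliography/book[publisher]
with_publisher = isbns_of(root.findall("./book[publisher]"))
# /bibliography/book[author="Abiteboul"]/@price
prices = [b.get("price") for b in root.findall("./book[author='Abiteboul']")]

print(cheap, by_abiteboul, with_publisher, prices)
```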
More complex predicates
Predicates can have and's and or's
• Books with price between $40 and $50
  /bibliography/book[40<=@price and @price<=50]
• Books authored by "Abiteboul" or those with price lower than $50
  /bibliography/book[author="Abiteboul" or @price<50]
Predicates involving node-sets
/bibliography/book[author="Abiteboul"]
• There may be multiple authors, so author in general returns a node-set (in XPath terminology)
• The predicate evaluates to true as long as it evaluates true for at least one node in the node-set, i.e., at least one author is "Abiteboul"
• Tricky query
  /bibliography/book[author="Abiteboul" and author!="Abiteboul"]
  - Will it return any books?
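The "at least one node" rule can be modeled in a few lines of Python, which also answers the tricky query: each comparison is satisfied independently by some author, so a book with several authors, one of them Abiteboul, is returned. This is a sketch of the semantics only; the function names are illustrative.

```python
def node_set_eq(node_set, value):
    """author = value: true if at least one node equals value."""
    return any(node == value for node in node_set)

def node_set_ne(node_set, value):
    """author != value: true if at least one node differs from value."""
    return any(node != value for node in node_set)

def tricky(authors):
    # [author="Abiteboul" and author!="Abiteboul"]
    return node_set_eq(authors, "Abiteboul") and node_set_ne(authors, "Abiteboul")

several_authors = ["Abiteboul", "Hull", "Vianu"]
sole_author = ["Abiteboul"]
no_authors = []

answers = [tricky(several_authors), tricky(sole_author), tricky(no_authors)]
print(answers)
```

In particular, author!="Abiteboul" is not the negation of author="Abiteboul" once node-sets are involved.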
XPath operators and functions
Frequently used in conditions:
  x + y, x - y, x * y, x div y, x mod y
  contains(x, y): true if string x contains string y
  count(node-set): counts the number of nodes in node-set
  position(): returns the position of the current node in the currently selected node-set
  last(): returns the size of the currently selected node-set
  name(): returns the tag name of the current element
More XPath examples
• All elements whose tag names contain "section" (e.g., "subsection")
  //*[contains(name(), "section")]
• Title of the first section in each book
  /bibliography/book/section[position()=1]/title
  - A shorthand: /bibliography/book/section[1]/title
• Title of the last section in each book
  /bibliography/book/section[position()=last()]/title
• Books with fewer than 10 sections
  /bibliography/book[count(section)<10]
• All elements whose parent's tag name is not "book"
  //*[name()!="book"]/*
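As a rough illustration, the same queries can be approximated in Python. ElementTree's XPath subset understands [1] and [last()] but not contains(), count(), or name(), so those are emulated with ordinary Python expressions. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
doc = """
<bibliography>
  <book ISBN="b1">
    <title>Book One</title>
    <section><title>Intro</title>
      <subsection><title>Scope</title></subsection>
    </section>
    <section><title>Body</title></section>
    <section><title>Summary</title></section>
  </book>
</bibliography>
"""
root = ET.fromstring(doc)

# //*[contains(name(), "section")]
section_like = [e.tag for e in root.iter() if "section" in e.tag]
# /bibliography/book/section[1]/title
first_titles = [t.text for t in root.findall("./book/section[1]/title")]
# /bibliography/book/section[position()=last()]/title
last_titles = [t.text for t in root.findall("./book/section[last()]/title")]
# /bibliography/book[count(section)<10]
small_books = [b.get("ISBN") for b in root.findall("./book")
               if len(b.findall("section")) < 10]
# //*[name()!="book"]/*  (children of every element that is not a book)
children_of_non_books = [c.tag for p in root.iter() if p.tag != "book" for c in p]

print(section_like, first_titles, last_titles, small_books, children_of_non_books)
```

The book's own title is absent from the last result because its parent is a book element.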
A tricky example
• Suppose that price is a child element of book, and there may be multiple prices per book
• Books with some price in range [20, 50]
  - How about:
    /bibliography/book[price >= 20 and price <= 50]
  - Correct answer:
    /bibliography/book[price[. >= 20 and . <= 50]]
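The difference between the two queries is where the existential quantifier sits. A small Python model of the semantics (function names are illustrative, prices are made up) shows a book priced at 10 and 60 slipping through the first query even though neither price is in range.

```python
def first_attempt(prices, low, high):
    # [price >= 20 and price <= 50]: each comparison may be satisfied
    # by a different price node.
    return any(p >= low for p in prices) and any(p <= high for p in prices)

def correct(prices, low, high):
    # [price[. >= 20 and . <= 50]]: a single price node must satisfy both bounds.
    return any(low <= p <= high for p in prices)

out_of_range = [10.0, 60.0]   # hypothetical book with two prices, neither in range
in_range = [10.0, 30.0]       # hypothetical book with one price in range

comparison = {
    "out_of_range": (first_attempt(out_of_range, 20, 50), correct(out_of_range, 20, 50)),
    "in_range": (first_attempt(in_range, 20, 50), correct(in_range, 20, 50)),
}
print(comparison)
```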
De-referencing IDREFs
id( identifier ) returns the element with the unique identifier
• Suppose that books can make references to other books
  <section><title>Introduction</title>
    XML is a hot topic these days; see <bookref ISBN="…"/> for more details…
  </section>
• Find all references to books written by "Abiteboul" in the book with "ISBN-10"
  /bibliography/book[@ISBN="ISBN-10"]//bookref[id(@ISBN)/author="Abiteboul"]
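ElementTree has no id() function, but the lookup can be sketched with a dictionary from ISBN to book element, assuming (as in the DTD) that ISBN is the ID attribute of book. The referenced books ISBN-20 and ISBN-30 below are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; ISBN-20 and ISBN-30 are made-up entries.
doc = """
<bibliography>
  <book ISBN="ISBN-10">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <section><title>Introduction</title>
      See <bookref ISBN="ISBN-20"/> and <bookref ISBN="ISBN-30"/>.
    </section>
  </book>
  <book ISBN="ISBN-20"><title>Hypothetical Book A</title><author>Abiteboul</author></book>
  <book ISBN="ISBN-30"><title>Hypothetical Book B</title><author>Doe</author></book>
</bibliography>
"""
root = ET.fromstring(doc)

# Index playing the role of id(): unique identifier -> element.
by_isbn = {book.get("ISBN"): book for book in root.iter("book")}

source = by_isbn["ISBN-10"]                      # /bibliography/book[@ISBN="ISBN-10"]
refs_to_abiteboul = []
for ref in source.iter("bookref"):               # //bookref below that book
    target = by_isbn.get(ref.get("ISBN"))        # id(@ISBN)
    if target is not None and any(a.text == "Abiteboul"
                                  for a in target.findall("author")):
        refs_to_abiteboul.append(ref.get("ISBN"))

print(refs_to_abiteboul)
```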
General XPath location steps
• Technically, each XPath query consists of a series of location steps separated by /
• Each location step consists of
  - An axis: one of self, attribute, parent, child, ancestor, ancestor-or-self, descendant, descendant-or-self, following, following-sibling, preceding, preceding-sibling, and namespace
  - A node test: either a name test (e.g., book, section, *) or a type test (e.g., text(), node(), comment()), separated from the axis by ::
  - Zero or more predicates (or conditions) enclosed in square brackets
Example of verbose syntax
Verbose (axis, node test, predicate):
/child::bibliography
/child::book[attribute::ISBN="ISBN-10"]
/descendant-or-self::node()
/child::title
Abbreviated:
/bibliography/book[@ISBN="ISBN-10"]//title
• child is the default axis
• // stands for /descendant-or-self::node()/
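To see why // expands to a descendant-or-self step followed by a child step, the two steps can be spelled out by hand and compared with the abbreviated form. This is a sketch using ElementTree on a hypothetical document; iter() visits the element itself and all descendant elements, which is enough here because text nodes have no children.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document for illustration.
doc = """
<bibliography>
  <book ISBN="ISBN-10">
    <title>Foundations of Databases</title>
    <section><title>Introduction</title>
      <section><title>Hypothetical Subsection</title></section>
    </section>
  </book>
  <book ISBN="ISBN-20"><title>Hypothetical Other Book</title></book>
</bibliography>
"""
root = ET.fromstring(doc)

# /child::bibliography/child::book[attribute::ISBN="ISBN-10"]
book = root.find("./book[@ISBN='ISBN-10']")

# /descendant-or-self::node()/child::title, spelled out step by step
verbose = [child.text
           for node in book.iter()          # descendant-or-self
           for child in node                # child axis
           if child.tag == "title"]         # name test: title

# //title relative to the selected book
abbreviated = [t.text for t in book.findall(".//title")]

print(verbose == abbreviated, abbreviated)
```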