Introduction to Database Systems, Lecture 14 Slides (Computer Science)


Typology: Slides

2011/2012

Uploaded on 01/29/2012 by arold

XML, DTD, and XPath
CPS 116
Introduction to Database Systems

## Announcements
- Midterm has been graded
  - Graded exams available in my office
  - Grades posted on Blackboard
  - Sample solution and score distribution emailed

## From HTML to XML (eXtensible Markup Language)
- HTML describes the presentation of the content:

```html
<h1>Bibliography</h1>
<p><i>Foundations of Databases</i>
Abiteboul, Hull, and Vianu
<br>Addison Wesley, 1995
<p>…
```

- XML describes only the content:

```xml
<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
  <book>…</book>
</bibliography>
```

⇒ Separation of content from presentation simplifies content extraction and allows the same content to be presented easily in different looks.
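
Here is a minimal Python sketch that makes the extraction point concrete. It uses the standard-library `xml.etree.ElementTree` module, which is our choice of tool; the slides do not prescribe one. The sketch pulls each book's title, authors, and year out of the XML version above by tag name. The same task on the HTML version would mean guessing from `<i>` and `<br>` layout.

```python
import xml.etree.ElementTree as ET

# The XML bibliography from the slide (the elided second book is left out).
BIB = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""

root = ET.fromstring(BIB)

# Because tags name the content, extraction is a lookup by tag name,
# not a guess based on fonts or line breaks.
books = [
    {
        "title": book.findtext("title"),
        "authors": [a.text for a in book.findall("author")],
        "year": int(book.findtext("year")),
    }
    for book in root.findall("book")
]
print(books)
```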

## Other nice features of XML
- Portability: just like HTML, you can ship XML data across platforms
  - Relational data requires heavy-weight protocols, e.g., JDBC
- Flexibility: you can represent any information (structured, semi-structured, documents, …)
  - Relational data is best suited for structured data
- Extensibility: since data describes itself, you can change the schema easily
  - Relational schema is rigid and difficult to change

## XML terminology
- Tag names: `book`, `title`, …
- Start tags: `<book>`, `<title>`, …
- End tags: `</book>`, `</title>`, …
- An element is enclosed by a pair of start and end tags: `<book>…</book>`
  - Elements can be nested: `<book>…<title>…</title>…</book>`
  - Empty elements: `<is_textbook></is_textbook>`
    - Can be abbreviated: `<is_textbook/>`
- Elements can also have attributes: `<book ISBN="…" price="80.00">`

```xml
<bibliography>
  <book ISBN="ISBN-10" price="80.00">
    <title>Foundations of Databases</title>
    <is_textbook/>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>…
</bibliography>
```
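
This minimal Python sketch, assuming `xml.etree.ElementTree`, shows how these terms appear once the document is parsed. Attribute values come back as strings. The abbreviated empty element `<is_textbook/>` is indistinguishable from `<is_textbook></is_textbook>`. The XML is a trimmed copy of the example above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example (one author kept).
BOOK = """<bibliography>
  <book ISBN="ISBN-10" price="80.00">
    <title>Foundations of Databases</title>
    <is_textbook/>
    <author>Abiteboul</author>
  </book>
</bibliography>"""

book = ET.fromstring(BOOK).find("book")

# Attribute values are always strings; converting them is up to the application.
isbn = book.get("ISBN")
price = float(book.get("price"))

# <is_textbook/> and <is_textbook></is_textbook> parse to the same thing:
# same tag, no text, no children.
short = ET.fromstring("<is_textbook/>")
long_form = ET.fromstring("<is_textbook></is_textbook>")
same = (short.tag, short.text, len(short)) == (long_form.tag, long_form.text, len(long_form))

# An empty element can still carry information through its presence.
is_textbook = book.find("is_textbook") is not None
print(isbn, price, same, is_textbook)  # ISBN-10 80.0 True True
```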

## Well-formed XML documents
A well-formed XML document:
- Follows XML lexical conventions
  - Wrong: `<section>We show that x < 0…</section>`
  - Right: `<section>We show that x &lt; 0…</section>`
    - Other special entities: `>` becomes `&gt;` and `&` becomes `&amp;`
- Contains a single root element
- Has tags that are properly matched and elements that are properly nested
  - Right: `<section>…<subsection>…</subsection>…</section>`
  - Wrong: `<section>…<subsection>…</section>…</subsection>`
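
These rules can be checked mechanically, because a parser rejects any document that is not well-formed. The sketch below uses Python's `xml.etree.ElementTree` to test each case from the slide. It uses `xml.sax.saxutils.escape` to produce the entity form. The helper name `is_well_formed` is our own, not part of any library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

cases = {
    "raw <": "<section>We show that x < 0</section>",
    "escaped <": "<section>We show that x &lt; 0</section>",
    "two roots": "<section/><section/>",
    "bad nesting": "<section><subsection></section></subsection>",
    "good nesting": "<section><subsection></subsection></section>",
}
for name, doc in cases.items():
    print(f"{name}: {is_well_formed(doc)}")

# escape() replaces &, > and < with &amp;, &gt; and &lt;.
safe = "<section>" + escape("We show that x < 0 & y > 1") + "</section>"
print(safe)
```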

## More XML features
- Comments: `<!-- … -->`
- CDATA: `<![CDATA[ … ]]>`
- ID’s and references
  - Homer… Marge… Bart……
- Namespaces allow external schemas and qualified names
- Processing instructions for apps: `<? … ?>`
- And more…
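
Here is a small sketch of qualified names, assuming Python's `xml.etree.ElementTree`. The namespace URI `http://example.org/bib` and the prefixes `bib` and `b` are hypothetical, chosen only for illustration. The point is that names are identified by namespace URI, not by prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI and prefix, for illustration only.
DOC = """<bib:bibliography xmlns:bib="http://example.org/bib">
  <bib:book><bib:title>Foundations of Databases</bib:title></bib:book>
</bib:bibliography>"""

root = ET.fromstring(DOC)
# The parser replaces the prefix with the full URI in braces.
print(root.tag)  # {http://example.org/bib}bibliography

# A query binds its own prefix to the URI; only the URI has to match.
ns = {"b": "http://example.org/bib"}
titles = [t.text for t in root.findall("b:book/b:title", ns)]
print(titles)  # ['Foundations of Databases']
```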

## Valid XML documents
- A valid XML document conforms to a Document Type Definition (DTD)
  - A DTD is optional
- A DTD specifies
  - A grammar for the document
  - Constraints on structures and values of elements, attributes, etc.
- Example: `<!DOCTYPE bibliography [ … ]>`

## DTD explained
- `bibliography` is the root element of the document
- `bibliography` consists of a sequence of one or more (`+`) `book` elements
- `book` consists of a `title`, zero or more (`*`) `author`s, an optional (`?`, zero or one) `publisher`, and zero or more `section`s, in sequence
- `book` has a required `ISBN` attribute which is a unique identifier
- `book` has an optional (`#IMPLIED`) `price` attribute which contains character data
  - Other attribute types include IDREF (reference to an ID), IDREFS (space-separated list of references), enumerated list, etc.
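
A DTD content model is essentially a regular expression over the sequence of child element names. The sketch below assumes the content model `(title, author*, publisher?, section*)` as described above and checks it with Python's `re` module. It also checks the required `ISBN` attribute. `book_conforms` is a hypothetical helper. The standard-library parsers used here do not validate against a DTD themselves.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model for book: title, author*, publisher?, section*
# Each child tag is written as "<tag>" so the regex matches whole names.
BOOK_MODEL = re.compile(r"<title>(<author>)*(<publisher>)?(<section>)*")

def book_conforms(book: ET.Element) -> bool:
    """Check the child-name sequence against BOOK_MODEL and the #REQUIRED ISBN.
    Uniqueness of ID values across the document is not checked here."""
    children = "".join(f"<{child.tag}>" for child in book)
    return BOOK_MODEL.fullmatch(children) is not None and "ISBN" in book.attrib

ok = ET.fromstring('<book ISBN="ISBN-10"><title/><author/><author/><section/></book>')
no_title = ET.fromstring('<book ISBN="ISBN-10"><author/></book>')
out_of_order = ET.fromstring('<book ISBN="ISBN-10"><title/><section/><author/></book>')
no_isbn = ET.fromstring("<book><title/></book>")
print([book_conforms(b) for b in (ok, no_title, out_of_order, no_isbn)])
# -> [True, False, False, False]
```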

## DTD explained (cont’d)
`<!DOCTYPE bibliography [ … ]>`
- `title`, `author`, `publisher`, and `year` all contain parsed character data (`#PCDATA`)
- PCDATA is text that will be parsed (`<…>` will be treated as a markup tag and `&lt;` etc. will be treated as entities); CDATA is unparsed character data
- Each `section` starts with a `title`, followed by some optional text and then zero or more subsections
- Example: Introduction In this section we introduce XML and DTD… XML XML stands for… DTD Definition DTD stands for… Usage You can use DTD to…
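
The PCDATA/CDATA distinction is easy to observe with a parser. This Python sketch uses the standard-library `xml.etree.ElementTree`, and the sample strings are made up. An entity in parsed character data is decoded. The same characters inside a CDATA section (`<![CDATA[ … ]]>`) are kept literally.

```python
import xml.etree.ElementTree as ET

# In parsed character data, "&lt;" is an entity and is decoded to "<".
pcdata = ET.fromstring("<title>x &lt; 0</title>")

# Inside a CDATA section nothing is parsed: "<book>" is not a tag and
# "&lt;" is not an entity, so both survive as literal text.
cdata = ET.fromstring("<title><![CDATA[<book> and &lt; stay as-is]]></title>")

print(repr(pcdata.text))  # 'x < 0'
print(repr(cdata.text))   # '<book> and &lt; stay as-is'
```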

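## Using DTD
- DTD can be included in the XML source file
  - `<!DOCTYPE bibliography [ … ]>` followed by `<bibliography>…</bibliography>`
- DTD can be external
  - `<!DOCTYPE bibliography SYSTEM "…">` followed by `<bibliography>…</bibliography>`
  - `<!DOCTYPE … PUBLIC "…" "…">` followed by the root element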

## Why use DTD’s?
- Benefits of not using DTD
  - Unstructured data is easy to represent
  - Overhead of DTD validation is avoided
- Benefits of using DTD
  - DTD can serve as a schema for the XML data
    - Guards against errors
    - Helps with processing
  - DTD facilitates information exchange
    - People can agree to use a common DTD to exchange data (e.g., XHTML)

## Simple XPath examples
- All book titles
  `/bibliography/book/title`
- All book ISBN numbers
  `/bibliography/book/@ISBN`
- All `title` elements, anywhere in the document
  `//title`
- All section titles, anywhere in the document
  `//section/title`
- Authors of bibliographical entries (suppose there are articles, reports, etc. in addition to books)
  `/bibliography/*/author`
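
Python's `xml.etree.ElementTree` implements a limited subset of XPath, which is enough for the paths above. In this sketch the sample document, including the `article` entry, is hypothetical. Paths are evaluated relative to the `bibliography` root element. Attributes are read with `get()` because ElementTree cannot return attribute nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one book with a section, plus one article.
BIB = """<bibliography>
  <book ISBN="ISBN-10">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <section><title>Introduction</title></section>
  </book>
  <article>
    <title>An Article</title>
    <author>Smith</author>
  </article>
</bibliography>"""

root = ET.fromstring(BIB)  # root is the <bibliography> element

# /bibliography/book/title  (relative to root, so "/bibliography" is dropped)
book_titles = [t.text for t in root.findall("book/title")]

# /bibliography/book/@ISBN  (attribute values are read, not selected)
isbns = [b.get("ISBN") for b in root.findall("book")]

# //title  (".//" searches all descendants of root, in document order)
all_titles = [t.text for t in root.findall(".//title")]

# //section/title
section_titles = [t.text for t in root.findall(".//section/title")]

# /bibliography/*/author  ("*" matches any child element of bibliography)
authors = [a.text for a in root.findall("*/author")]
print(book_titles, isbns, all_titles, section_titles, authors)
```

Keep one difference in mind. `.//title` searches only the descendants of the element it is called on, so it cannot match that element itself. XPath's `//title` starts from the document root and would.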

## Predicates in path expressions
`[condition]` matches the current element if `condition` evaluates to true on the current element
- Books with price lower than $50
  `/bibliography/book[@price<50]`
  - XPath will automatically convert the price string to a numeric value for comparison
- Books with author "Abiteboul"
  `/bibliography/book[author='Abiteboul']`
- Books with a `publisher` child element
  `/bibliography/book[publisher]`
- Prices of books authored by "Abiteboul"
  `/bibliography/book[author='Abiteboul']/@price`
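
Here are the same predicates sketched with `xml.etree.ElementTree` over hypothetical data. ElementTree supports `[tag='text']` and `[tag]` tests directly but has no numeric comparisons. For the `@price<50` case, the sketch therefore converts the attribute in Python, mirroring the string-to-number conversion that XPath performs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; prices are attributes, as on the slide.
BIB = """<bibliography>
  <book ISBN="ISBN-10" price="80.00">
    <title>Foundations of Databases</title>
    <author>Abiteboul</author><author>Hull</author><author>Vianu</author>
    <publisher>Addison Wesley</publisher>
  </book>
  <book ISBN="ISBN-20" price="45.00">
    <title>Another Book</title>
    <author>Smith</author>
  </book>
</bibliography>"""
root = ET.fromstring(BIB)

# /bibliography/book[author='Abiteboul']  (supported by ElementTree)
by_abiteboul = root.findall("book[author='Abiteboul']")

# /bibliography/book[publisher]  (child-existence test, also supported)
with_publisher = root.findall("book[publisher]")

# /bibliography/book[author='Abiteboul']/@price
prices = [b.get("price") for b in by_abiteboul]

# /bibliography/book[@price<50]: no numeric comparisons in ElementTree,
# so convert the string ourselves, as XPath does implicitly.
cheap = [b for b in root.findall("book") if float(b.get("price")) < 50]
print([b.get("ISBN") for b in cheap])  # ['ISBN-20']
```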

## More complex predicates
Predicates can have `and`s and `or`s
- Books with price between $40 and $50
  `/bibliography/book[40<=@price and @price<=50]`
- Books authored by "Abiteboul" or those with price lower than $50
  `/bibliography/book[author="Abiteboul" or @price<50]`

## Predicates involving node-sets
`/bibliography/book[author='Abiteboul']`
- There may be multiple authors, so `author` in general returns a node-set (in XPath terminology)
- The predicate evaluates to true as long as it evaluates to true for at least one node in the node-set, i.e., at least one author is "Abiteboul"
- Tricky query
  `/bibliography/book[author='Abiteboul' and author!='Abiteboul']`
  - Will it return any books?
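
It will, for a book with several authors. In XPath 1.0, comparing a node-set with a string is existential for both `=` and `!=`. So `author!='Abiteboul'` is not the negation of `author='Abiteboul'`. The Python sketch below emulates the two comparisons over hypothetical data, using hypothetical helpers `eq` and `ne`. The helpers use `.text`, which is adequate for text-only elements.

```python
import xml.etree.ElementTree as ET

def eq(nodes, value):
    """XPath 1.0 node-set = string: true if SOME node's text equals value."""
    return any(n.text == value for n in nodes)

def ne(nodes, value):
    """XPath 1.0 node-set != string: true if SOME node's text differs."""
    return any(n.text != value for n in nodes)

# Hypothetical data: book A has two authors, B and C have one each.
BIB = """<bibliography>
  <book><title>A</title><author>Abiteboul</author><author>Hull</author></book>
  <book><title>B</title><author>Abiteboul</author></book>
  <book><title>C</title><author>Vianu</author></book>
</bibliography>"""
root = ET.fromstring(BIB)

# /bibliography/book[author='Abiteboul' and author!='Abiteboul']
tricky = [
    b.findtext("title")
    for b in root.findall("book")
    if eq(b.findall("author"), "Abiteboul") and ne(b.findall("author"), "Abiteboul")
]
print(tricky)  # ['A']: one author equals "Abiteboul" and another one differs
```

An empty node-set makes both comparisons false, so a book with no authors is never returned.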

## XPath operators and functions
Frequently used in conditions:
- `x + y`, `x - y`, `x * y`, `x div y`, `x mod y`
- `contains(x, y)`: true if string `x` contains string `y`
- `count(node-set)`: counts the number of nodes in `node-set`
- `position()`: returns the position of the current node in the currently selected node-set
- `last()`: returns the size of the currently selected node-set
- `name()`: returns the tag name of the current element

## More XPath examples
- All elements whose tag names contain "section" (e.g., "subsection")
  `//*[contains(name(), 'section')]`
- Title of the first section in each book
  `/bibliography/book/section[position()=1]/title`
  - A shorthand: `/bibliography/book/section[1]/title`
- Title of the last section in each book
  `/bibliography/book/section[position()=last()]/title`
- Books with fewer than 10 sections
  `/bibliography/book[count(section)<10]`
- All elements whose parent’s tag name is not "book"
  `//*[name()!='book']/*`
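
Several of these queries go beyond what `xml.etree.ElementTree` supports. This Python sketch therefore mixes ElementTree's positional predicates (`[1]`, `[last()]`) with plain Python for `contains(name(), …)`, `count()`, and the parent-name test. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one book with three sections, one of which has a subsection.
BIB = """<bibliography>
  <book>
    <section><title>Intro</title><subsection><title>Motivation</title></subsection></section>
    <section><title>Model</title></section>
    <section><title>Queries</title></section>
  </book>
</bibliography>"""
root = ET.fromstring(BIB)

# //*[contains(name(), 'section')]
# root.iter() includes root itself, like //* over all elements.
section_like = [e.tag for e in root.iter() if "section" in e.tag]

# /bibliography/book/section[1]/title  (positions start at 1, as in XPath)
first_titles = [t.text for t in root.findall("book/section[1]/title")]

# /bibliography/book/section[last()]/title
last_titles = [t.text for t in root.findall("book/section[last()]/title")]

# /bibliography/book[count(section)<10]
few_sections = [b for b in root.findall("book") if len(b.findall("section")) < 10]

# //*[name()!='book']/*  (children of every element not tagged "book")
not_under_book = [
    child.tag for parent in root.iter() if parent.tag != "book" for child in parent
]
print(section_like, first_titles, last_titles, len(few_sections), not_under_book)
```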

## A tricky example
- Suppose that `price` is a child element of `book`, and there may be multiple prices per book
- Books with some price in range [20, 50]
  - How about:
    `/bibliography/book[price >= 20 and price <= 50]`
  - Correct answer:
    `/bibliography/book[price[. >= 20 and . <= 50]]`
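
In the first query each comparison is existential on its own, so two different prices can satisfy the two halves. The corrected query tests both bounds on the same `price` node. Here is a minimal Python sketch over a hypothetical book with prices 10 and 60.

```python
import xml.etree.ElementTree as ET

# One hypothetical book with two prices, neither of which lies in [20, 50].
book = ET.fromstring("<book><price>10</price><price>60</price></book>")
prices = [float(p.text) for p in book.findall("price")]

# [price >= 20 and price <= 50]: each half may be satisfied by a different price.
naive = any(p >= 20 for p in prices) and any(p <= 50 for p in prices)

# [price[. >= 20 and . <= 50]]: both bounds are checked on the same price.
correct = any(20 <= p <= 50 for p in prices)

print(naive, correct)  # True False
```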

## De-referencing IDREF’s
`id(identifier)` returns the element with the unique identifier `identifier`
- Suppose that books can make references to other books
  - Example: Introduction: XML is a hot topic these days; see `<bookref ISBN="…"/>` for more details…
- Find all references to books written by "Abiteboul" in the book with "ISBN-10"
  `/bibliography/book[@ISBN='ISBN-10']//bookref[id(@ISBN)/author='Abiteboul']`
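
ElementTree has no `id()` function. This Python sketch emulates it with an index from ID value to element, assuming (as the query implies) that `ISBN` is an ID attribute of `book` and an IDREF attribute of `bookref`. The data and the helpers `xpath_id` and `refers_to_author` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: book/@ISBN acts as an ID, bookref/@ISBN as an IDREF.
BIB = """<bibliography>
  <book ISBN="ISBN-10">
    <author>Abiteboul</author>
    <section><title>Introduction</title>
      XML is a hot topic these days; see <bookref ISBN="ISBN-20"/>
      and <bookref ISBN="ISBN-30"/> for more details.
    </section>
  </book>
  <book ISBN="ISBN-20"><author>Abiteboul</author></book>
  <book ISBN="ISBN-30"><author>Hull</author></book>
</bibliography>"""
root = ET.fromstring(BIB)

# Emulate id(): map each ID value to the element that carries it.
ids = {b.get("ISBN"): b for b in root.iter("book")}

def xpath_id(identifier):
    """Return the element whose ID equals identifier, or None."""
    return ids.get(identifier)

def refers_to_author(ref, name):
    """True if the book referenced by ref has some author equal to name."""
    target = xpath_id(ref.get("ISBN"))
    return target is not None and any(a.text == name for a in target.findall("author"))

# /bibliography/book[@ISBN='ISBN-10']//bookref[id(@ISBN)/author='Abiteboul']
hits = [
    ref.get("ISBN")
    for book in root.findall("book[@ISBN='ISBN-10']")
    for ref in book.iter("bookref")
    if refers_to_author(ref, "Abiteboul")
]
print(hits)  # ['ISBN-20']
```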

## General XPath location steps
- Technically, each XPath query consists of a series of location steps separated by `/`
- Each location step consists of
  - An axis: one of `self`, `attribute`, `parent`, `child`, `ancestor`, `ancestor-or-self`, `descendant`, `descendant-or-self`, `following`, `following-sibling`, `preceding`, `preceding-sibling`, and `namespace`
  - A node test: either a name test (e.g., `book`, `section`, `*`) or a type test (e.g., `text()`, `node()`, `comment()`), separated from the axis by `::`
  - Zero or more predicates (or conditions) enclosed in square brackets

## Example of verbose syntax
Verbose (axis, node test, predicate):

```
/child::bibliography
/child::book[attribute::ISBN='ISBN-10']
/descendant-or-self::node()
/child::title
```

Abbreviated:
`/bibliography/book[@ISBN='ISBN-10']//title`
- `child` is the default axis
- `//` stands for `/descendant-or-self::node()/`
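
This Python sketch shows that `//` is just shorthand. It evaluates the verbose steps literally with hypothetical helpers (`descendant_or_self`, `child`). It then compares the result with ElementTree's abbreviated `//` path on a small hypothetical document. The helpers handle element nodes only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data.
BIB = """<bibliography>
  <book ISBN="ISBN-10"><title>T1</title><section><title>S1</title></section></book>
  <book ISBN="ISBN-20"><title>T2</title></book>
</bibliography>"""
root = ET.fromstring(BIB)

def descendant_or_self(node):
    # Element.iter() yields the node itself, then all descendants in document order.
    return list(node.iter())

def child(nodes, name):
    # child::name applied to every node in the context node-set.
    return [c for n in nodes for c in n if c.tag == name]

# Verbose: /child::bibliography/child::book[attribute::ISBN='ISBN-10']
#          /descendant-or-self::node()/child::title
bibliography = [root]
books = [b for b in child(bibliography, "book") if b.get("ISBN") == "ISBN-10"]
expanded = [d for b in books for d in descendant_or_self(b)]
verbose = [t.text for t in child(expanded, "title")]

# Abbreviated: /bibliography/book[@ISBN='ISBN-10']//title
abbreviated = [t.text for t in root.findall("book[@ISBN='ISBN-10']//title")]
print(verbose, abbreviated)  # ['T1', 'S1'] ['T1', 'S1']
```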