XML: Extensible Markup Language - Chapter 10 of CIS 560 at Kansas State University, Slides of Database Management Systems (DBMS)

These slides come from the Database System Concepts course (CIS 560) at Kansas State University and focus on Extensible Markup Language (XML). They cover the structure of XML data, XML document schemas, querying and transformation, application program interfaces to XML, storage of XML data, and XML applications. They also discuss the differences between XML Schema and DTDs, with examples of element specification in each, and cover functions in XPath and further features of XML Schema.

Typology: Slides

2011/2012

Uploaded on 01/31/2012

Computing & Information Sciences
Kansas State University
CIS 560: Database System Concepts
Lecture 23 of 42
Wednesday, 18 October 2006
William H. Hsu
Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/va60
Course web site: http://www.kddresearch.org/Courses/Fall-2006/CIS560
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class:
First half of Chapter 10, Silberschatz et al., 5th edition
Extensible Markup Language (XML)
Discussion: Semistructured Data

XML

- Structure of XML Data
- XML Document Schema
- Querying and Transformation
- Application Program Interfaces to XML
- Storage of XML Data
- XML Applications

Introduction

- XML: Extensible Markup Language
- Defined by the World Wide Web Consortium (W3C)
- Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
- Documents have tags giving extra information about sections of the document
  - E.g. <title> XML </title> <slide> Introduction … </slide>
- Extensible, unlike HTML
  - Users can add new tags, and separately specify how the tag should be handled for display

XML Introduction (Cont.)

- The ability to specify new tags and to create nested tag structures makes XML a great way to exchange data, not just documents.
  - Much of the use of XML has been in data exchange applications, not as a replacement for HTML
- Tags make data (relatively) self-documenting
  - E.g.

    <bank>
        <account>
            <account_number> A-101 </account_number>
            <branch_name> Downtown </branch_name>
            <balance> 500 </balance>
        </account>
        <depositor>
            <account_number> A-101 </account_number>
            <customer_name> Johnson </customer_name>
        </depositor>
    </bank>
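
As a minimal sketch of what "self-documenting" buys a receiving program, the bank fragment can be read with Python's standard `xml.etree.ElementTree` module. The enclosing `bank`, `account`, `balance` and `depositor` tags and the variable names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample document mirroring the bank example on the slide.
BANK_XML = """
<bank>
  <account>
    <account_number> A-101 </account_number>
    <branch_name> Downtown </branch_name>
    <balance> 500 </balance>
  </account>
  <depositor>
    <account_number> A-101 </account_number>
    <customer_name> Johnson </customer_name>
  </depositor>
</bank>
"""

root = ET.fromstring(BANK_XML)

# The tag names double as field names, so the receiver needs no
# separate description of the record layout.
account = root.find("account")
record = {child.tag: child.text.strip() for child in account}
print(record)
```

The dictionary keys come straight from the tags, which is the sense in which the data describes itself.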

Bank DTD

]>

Attribute Specification in DTD

- Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      - more on this later
  - Whether
    - mandatory (#REQUIRED)
    - has a default value (value),
    - or neither (#IMPLIED)
- Examples

IDs and IDREFs

- An element can have at most one attribute of type ID
- The ID attribute value of each element in an XML document must be distinct
  - Thus the ID attribute value is an object identifier
- An attribute of type IDREF must contain the ID value of an element in the same document
- An attribute of type IDREFS contains a set of (0 or more) ID values. Each of these values must be the ID value of an element in the same document
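
The rules above can be sketched as a small checker. This is only an illustration in Python's standard library, which does not validate DTDs itself: the attribute typing that a DTD would normally declare is supplied by hand in two dictionaries, and the sample document, attribute names and function name are assumptions.

```python
import xml.etree.ElementTree as ET

DOC = """
<bank-2>
  <account account_number="A-401" owners="C100 C102"/>
  <customer customer_id="C100" accounts="A-401"/>
  <customer customer_id="C102" accounts="A-401"/>
</bank-2>
"""

# Assumed typing, normally taken from the DTD: the ID attribute of each
# element type, and the attribute of each element type that holds IDREFS.
ID_ATTR = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTR = {"account": "owners", "customer": "accounts"}

def check_ids(root):
    """Return a list of violations of the ID / IDREFS rules."""
    problems = []
    seen = set()
    # Rule: every ID value in the document must be distinct.
    for elem in root.iter():
        attr = ID_ATTR.get(elem.tag)
        if attr and elem.get(attr) is not None:
            value = elem.get(attr)
            if value in seen:
                problems.append(f"duplicate ID {value}")
            seen.add(value)
    # Rule: every reference must name an ID present in the same document.
    for elem in root.iter():
        attr = IDREFS_ATTR.get(elem.tag)
        if attr:
            for ref in (elem.get(attr) or "").split():
                if ref not in seen:
                    problems.append(f"dangling reference {ref}")
    return problems

root = ET.fromstring(DOC)
print(check_ids(root))
```

A validating parser would perform both checks automatically from the DTD; the sketch only makes the two rules concrete.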

Bank DTD with Attributes

- Bank DTD with ID and IDREF attribute types.

… declarations for branch, balance, customer_name, customer_street and customer_city ]>

XML Schema

- XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports
  - Typing of values
    - E.g. integer, string, etc.
    - Also, constraints on min/max values
  - User-defined, complex types
  - Many more features, including
    - uniqueness and foreign key constraints, inheritance
- XML Schema is itself specified in XML syntax, unlike DTDs
  - More-standard representation, but verbose
- XML Schema is integrated with namespaces
- BUT: XML Schema is significantly more complicated than DTDs.

XML Schema Version of Bank DTD

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="bank" type="BankType"/>
    <xs:element name="account">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="account_number" type="xs:string"/>
                <xs:element name="branch_name" type="xs:string"/>
                <xs:element name="balance" type="xs:decimal"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
    ….. definitions of customer and depositor ….
    <xs:complexType name="BankType">
        <xs:sequence>
            <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
            <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
    </xs:complexType>
    </xs:schema>

XML Schema Version of Bank DTD

- Choice of “xs:” was ours -- any other namespace prefix could be chosen
- Element “bank” has type “BankType”, which is defined separately
  - xs:complexType is used later to create the named complex type “BankType”
- Element “account” has its type defined in-line
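
Because an XML Schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the bank schema (the customer and depositor parts are left out) and reports which top-level elements use a named type and which define their type in-line. The function name `describe` is an assumption, and the code inspects the schema only; it does not validate documents against it.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Trimmed copy of the bank schema, kept only as far as needed here.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bank" type="BankType"/>
  <xs:element name="account">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="account_number" type="xs:string"/>
        <xs:element name="branch_name" type="xs:string"/>
        <xs:element name="balance" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:complexType name="BankType">
    <xs:sequence>
      <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
"""

def describe(schema):
    """Map each top-level element name to how its type is given."""
    result = {}
    # ElementTree addresses namespaced tags as {namespace-uri}localname,
    # so the prefix chosen in the schema text does not matter.
    for elem in schema.findall(f"{{{XS}}}element"):
        if elem.get("type") is not None:
            result[elem.get("name")] = "named type " + elem.get("type")
        elif elem.find(f"{{{XS}}}complexType") is not None:
            result[elem.get("name")] = "in-line complex type"
    return result

schema = ET.fromstring(SCHEMA)
print(describe(schema))
```

The lookup by namespace URI rather than by the literal `xs:` prefix reflects the first bullet: the prefix is an arbitrary local choice.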

More features of XML Schema

- Attributes specified by xs:attribute tag:
  - <xs:attribute name = "account_number"/>
  - adding the attribute use = "required" means value must be specified
- Key constraint: “account numbers form a key for account elements under the root bank element”:

    <xs:key name = "accountKey">
        <xs:selector xpath = "/bank/account"/>
        <xs:field xpath = "account_number"/>
    </xs:key>

- Foreign key constraint from depositor to account:

    <xs:keyref name = "depositorAccountKey" refer="accountKey">
        <xs:selector xpath = "/bank/depositor"/>
        <xs:field xpath = "account_number"/>
    </xs:keyref>
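
What the key and foreign key constraints demand can be sketched by hand in Python. The standard library has no XML Schema validator, so the two helper functions below are hypothetical stand-ins for the selector/field mechanism, with selector paths written relative to the root element; the sample data is also assumed.

```python
import xml.etree.ElementTree as ET

BANK = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number></account>
  <account><account_number>A-102</account_number></account>
  <depositor><account_number>A-101</account_number></depositor>
  <depositor><account_number>A-999</account_number></depositor>
</bank>
""")

def field_values(root, selector, field):
    """Field text of every element chosen by the selector path."""
    return [elem.findtext(field) for elem in root.findall(selector)]

def key_holds(root, selector, field):
    """Key: the field is present in every selected element and unique."""
    values = field_values(root, selector, field)
    return None not in values and len(values) == len(set(values))

def dangling_refs(root, selector, field, key_selector, key_field):
    """Foreign key: list field values that match no key value."""
    keys = set(field_values(root, key_selector, key_field))
    return [v for v in field_values(root, selector, field) if v not in keys]

print(key_holds(BANK, "account", "account_number"))
print(dangling_refs(BANK, "depositor", "account_number",
                    "account", "account_number"))
```

In the sample, the second depositor refers to an account number that no account carries, which is exactly the situation a keyref is meant to reject.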

XPath

- XPath is used to address (select) parts of documents using path expressions
- A path expression is a sequence of steps separated by “/”
  - Think of file names in a directory hierarchy
- Result of path expression: set of values that along with their containing elements/attributes match the specified path
- E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns

    <customer_name>Joe</customer_name>
    <customer_name>Mary</customer_name>

- E.g. /bank-2/customer/customer_name/text( ) returns the same names, but without the enclosing tags
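
The two path expressions can be tried with Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath. In this sketch the path is written relative to the root element, the `text()` step is replaced by the `.text` property, and the two-customer bank-2 document is an assumed stand-in for the data referred to on the slide.

```python
import xml.etree.ElementTree as ET

BANK2 = ET.fromstring("""
<bank-2>
  <customer>
    <customer_name>Joe</customer_name>
  </customer>
  <customer>
    <customer_name>Mary</customer_name>
  </customer>
</bank-2>
""")

# /bank-2/customer/customer_name, written relative to the root element.
name_elements = BANK2.findall("./customer/customer_name")

# With the enclosing tags (strip() drops trailing layout whitespace).
serialized = [ET.tostring(e, encoding="unicode").strip() for e in name_elements]

# Counterpart of the text() step: the content without the tags.
names = [e.text for e in name_elements]

print(serialized)
print(names)
```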

XPath (Cont.)

- The initial “/” denotes root of the document (above the top-level tag)
- Path expressions are evaluated left to right
  - Each step operates on the set of instances produced by the previous step
- Selection predicates may follow any step in a path, in [ ]
  - E.g. /bank-2/account[balance > 400]
    - returns account elements with a balance value greater than 400
    - /bank-2/account[balance] returns account elements containing a balance subelement
- Attributes are accessed using “@”
  - E.g. /bank-2/account[balance > 400]/@account_number
    - returns the account numbers of accounts with balance > 400
  - IDREF attributes are not dereferenced automatically (more on this later)
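
A rough Python counterpart of these predicates, assuming a small bank-2 document in which account_number is an attribute and balance a subelement. The existence test `[balance]` is supported by `xml.etree.ElementTree` directly; the numeric comparison is not, so it is done in Python after the path step.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

BANK2 = ET.fromstring("""
<bank-2>
  <account account_number="A-401">
    <balance>500</balance>
  </account>
  <account account_number="A-402">
    <balance>300</balance>
  </account>
  <account account_number="A-403"/>
</bank-2>
""")

# /bank-2/account[balance]: accounts that contain a balance subelement.
with_balance = BANK2.findall("./account[balance]")

# /bank-2/account[balance > 400]: filter the previous step's result.
rich = [a for a in with_balance if Decimal(a.findtext("balance")) > 400]

# .../@account_number: read the attribute of each selected account.
rich_numbers = [a.get("account_number") for a in rich]

print([a.get("account_number") for a in with_balance])
print(rich_numbers)
```

Filtering the list produced by the previous step mirrors the left-to-right evaluation described above.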

Functions in XPath

- XPath provides several functions
  - The function count() at the end of a path counts the number of elements in the set generated by the path
    - E.g. /bank-2/account[count(./customer) > 2]
      - Returns accounts with > 2 customers
  - Also function for testing position (1, 2, ..) of node w.r.t. siblings
- Boolean connectives and and or and function not() can be used in predicates
- IDREFs can be referenced using function id()
  - id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks
  - E.g. /bank-2/account/id(@owner)
    - returns all customers referred to from the owners attribute of account elements.
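
Python's standard `xml.etree.ElementTree` offers neither `count()` nor `id()`, so the sketch below emulates both by hand. It assumes that customer_id is the ID attribute of customer elements and that account elements carry a blank-separated `owners` attribute; the helper names and sample data are illustrative only.

```python
import xml.etree.ElementTree as ET

BANK2 = ET.fromstring("""
<bank-2>
  <account account_number="A-401" owners="C100 C102"/>
  <account account_number="A-402" owners="C102"/>
  <customer customer_id="C100"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C102"><customer_name>Mary</customer_name></customer>
  <customer customer_id="C103"><customer_name>Ann</customer_name></customer>
</bank-2>
""")

def build_id_index(root, id_attr="customer_id"):
    """Map each ID value to the element that carries it."""
    return {e.get(id_attr): e for e in root.iter() if e.get(id_attr) is not None}

def deref(index, refs):
    """Emulate id(): resolve a blank-separated string of references."""
    return [index[r] for r in refs.split() if r in index]

index = build_id_index(BANK2)

# Rough counterpart of /bank-2/account/id(@owners), without duplicates.
owners = []
for account in BANK2.findall("./account"):
    for customer in deref(index, account.get("owners", "")):
        if not any(customer is seen for seen in owners):
            owners.append(customer)
owner_names = [c.findtext("customer_name") for c in owners]

# Rough counterpart of /bank-2/account[count(id(@owners)) > 1].
shared = [a.get("account_number") for a in BANK2.findall("./account")
          if len(deref(index, a.get("owners", ""))) > 1]

print(owner_names)
print(shared)
```

Ann owns no account, so she is not reached through any reference, and only A-401 has more than one owner.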

More XPath Features

- Operator “|” used to implement union
  - E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)
    - Gives customers with either accounts or loans
    - However, “|” cannot be nested inside other operators.
- “//” can be used to skip multiple levels of nodes
  - E.g. /bank-2//customer_name
    - finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
- A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  - “//”, described above, is a short form for specifying “all descendants”
  - “..” specifies the parent.
- doc(name) returns the root of a named document
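
These features can be approximated with Python's standard `xml.etree.ElementTree`: descendant search is supported as `.//`, the parent step is emulated here with a child-to-parent map, and union is a hypothetical helper that merges node lists without duplicates. The document structure, including the `branch` and `manager` elements, is assumed purely for illustration.

```python
import xml.etree.ElementTree as ET

BANK2 = ET.fromstring("""
<bank-2>
  <customer>
    <customer_name>Joe</customer_name>
  </customer>
  <branch>
    <manager>
      <customer_name>Mary</customer_name>
    </manager>
  </branch>
</bank-2>
""")

# /bank-2//customer_name: any customer_name below the root, at any depth.
name_nodes = BANK2.findall(".//customer_name")
all_names = [e.text for e in name_nodes]

# "..": ElementTree elements keep no parent pointer, so build a lookup.
parent_of = {child: parent for parent in BANK2.iter() for child in parent}
parent_tags = [parent_of[e].tag for e in name_nodes]

def union(*node_lists):
    """Emulate "|": merge node lists, keeping each node once."""
    merged = []
    for nodes in node_lists:
        for node in nodes:
            if not any(node is seen for seen in merged):
                merged.append(node)
    return merged

both = union(BANK2.findall("./customer"), [parent_of[e] for e in name_nodes])

print(all_names, parent_tags, [e.tag for e in both])
```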

Joins

- Joins are specified in a manner very similar to SQL

    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/account_number = $d/account_number
      and $c/customer_name = $d/customer_name
    return <cust_acct> { $c $a } </cust_acct>

- The same query can be expressed with the selections specified as XPath selections:

    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[
               account_number = $a/account_number and
               customer_name = $c/customer_name ]
    return <cust_acct> { $c $a } </cust_acct>
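
For readers more used to procedural code, the join can be sketched as three nested loops in Python. This is a hand-written analogue over `xml.etree.ElementTree`, not an XQuery engine; the sample data and the function name `join` are assumptions.

```python
import copy
import xml.etree.ElementTree as ET

BANK = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Johnson</customer_name></customer>
  <customer><customer_name>Hayes</customer_name></customer>
  <depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Hayes</customer_name></depositor>
</bank>
""")

def join(bank):
    """Pair each customer with each of their accounts via depositor."""
    pairs = []
    for a in bank.findall("account"):            # for $a in /bank/account
        for c in bank.findall("customer"):       #     $c in /bank/customer
            for d in bank.findall("depositor"):  #     $d in /bank/depositor
                if (a.findtext("account_number") == d.findtext("account_number")
                        and c.findtext("customer_name") == d.findtext("customer_name")):
                    # return <cust_acct> { $c $a } </cust_acct>
                    cust_acct = ET.Element("cust_acct")
                    cust_acct.append(copy.deepcopy(c))
                    cust_acct.append(copy.deepcopy(a))
                    pairs.append(cust_acct)
    return pairs

result = join(BANK)
print([ET.tostring(p, encoding="unicode") for p in result])
```

The copies keep the source document untouched, since an element placed in a new tree would otherwise be shared between both trees.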

Nested Queries

- The following query converts data from the flat structure for bank information into the nested structure used in bank-1

    <bank-1> {
        for $c in /bank/customer
        return
            <customer>
                { $c/* }
                { for $d in /bank/depositor[customer_name = $c/customer_name],
                      $a in /bank/account[account_number=$d/account_number]
                  return $a }
            </customer>
    } </bank-1>

- $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
- $c/text() gives text content of an element without any subelements / tags
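
The flat-to-nested conversion can be mirrored in Python as a sketch. The function name `nest` and the sample data are assumptions, and the code is an analogue of the query built on `xml.etree.ElementTree` rather than a translation produced by any tool.

```python
import copy
import xml.etree.ElementTree as ET

BANK = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Johnson</customer_name></customer>
  <customer><customer_name>Hayes</customer_name></customer>
  <depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
</bank>
""")

def nest(bank):
    """Build a bank-1 tree in which accounts sit inside their customer."""
    bank1 = ET.Element("bank-1")
    for c in bank.findall("customer"):
        customer = ET.SubElement(bank1, "customer")
        # { $c/* }: copy the children of the customer, not the customer tag.
        customer.extend([copy.deepcopy(child) for child in c])
        for d in bank.findall("depositor"):
            if d.findtext("customer_name") != c.findtext("customer_name"):
                continue
            for a in bank.findall("account"):
                if a.findtext("account_number") == d.findtext("account_number"):
                    customer.append(copy.deepcopy(a))  # return $a
    return bank1

nested = nest(BANK)
summary = [(c.findtext("customer_name"),
            [a.findtext("account_number") for a in c.findall("account")])
           for c in nested]
print(summary)
```

A customer without accounts, such as Hayes in the sample, still appears in the output with an empty account list, just as the outer `for` in the query emits every customer.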

Sorting in XQuery

- The order by clause can be used at the end of any expression. E.g. to return customers sorted by name

    for $c in /bank/customer
    order by $c/customer_name
    return <customer> { $c/* } </customer>

- Use order by $c/customer_name descending to sort in descending order
- Can sort at multiple levels of nesting (sort by customer_name, and by account_number within each customer)

    <bank-1> {
        for $c in /bank/customer
        order by $c/customer_name
        return
            <customer>
                { $c/* }
                { for $d in /bank/depositor[customer_name=$c/customer_name],
                      $a in /bank/account[account_number=$d/account_number]
                  order by $a/account_number
                  return $a/* }
            </customer>
    } </bank-1>
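
As a rough analogue, `order by` corresponds to sorting with a key function in Python. The sketch uses the built-in `sorted` over elements parsed with `xml.etree.ElementTree`; the sample data and the helper name `by_name` are assumptions.

```python
import xml.etree.ElementTree as ET

BANK = ET.fromstring("""
<bank>
  <customer><customer_name>Mary</customer_name></customer>
  <customer><customer_name>Joe</customer_name></customer>
  <customer><customer_name>Zoe</customer_name></customer>
</bank>
""")

def by_name(customer):
    """Sort key playing the role of $c/customer_name."""
    return customer.findtext("customer_name")

customers = BANK.findall("customer")

# order by $c/customer_name
ascending = [by_name(c) for c in sorted(customers, key=by_name)]

# order by $c/customer_name descending
descending = [by_name(c) for c in sorted(customers, key=by_name, reverse=True)]

print(ascending)
print(descending)
```

Sorting at a second level of nesting would apply the same call again to each customer's accounts with a key on account_number.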

Functions and Other XQuery Features

- User-defined functions with the type system of XML Schema

    function balances(xs:string $c) returns list(xs:decimal*) {
        for $d in /bank/depositor[customer_name = $c],
            $a in /bank/account[account_number = $d/account_number]
        return $a/balance
    }

- Types are optional for function parameters and return values
- The * (as in decimal*) indicates a sequence of values of that type
- Universal and existential quantification in where clause predicates
  - some $e in path satisfies P
  - every $e in path satisfies P
- XQuery also supports if-then-else clauses
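
The same ideas have close Python counterparts, sketched below under assumed sample data: a typed function returning a list of decimals, `any` and `all` in place of `some` and `every`, and a conditional expression in place of if-then-else. The extra `bank` parameter is an addition needed because Python has no implicit document context.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import List

BANK = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <depositor><account_number>A-101</account_number><customer_name>Johnson</customer_name></depositor>
  <depositor><account_number>A-102</account_number><customer_name>Johnson</customer_name></depositor>
</bank>
""")

def balances(bank: ET.Element, customer_name: str) -> List[Decimal]:
    """Balances of all accounts held by the named customer."""
    result = []
    for d in bank.findall("depositor"):
        if d.findtext("customer_name") != customer_name:
            continue
        for a in bank.findall("account"):
            if a.findtext("account_number") == d.findtext("account_number"):
                result.append(Decimal(a.findtext("balance")))
    return result

johnson = balances(BANK, "Johnson")

# some $b in balances satisfies $b > 400
some_above = any(b > 400 for b in johnson)

# every $b in balances satisfies $b > 400
every_above = all(b > 400 for b in johnson)

# if-then-else as a conditional expression
label = "all high" if every_above else "mixed"

print(johnson, some_above, every_above, label)
```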

Creating XML Output

- Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is
- E.g. to wrap results in new XML elements:

    <xsl:template match="/bank-2/customer">
        <customer>
            <xsl:value-of select="customer_name"/>
        </customer>
    </xsl:template>
    <xsl:template match="*"/>

  - Example output:

    <customer> Joe </customer>
    <customer> Mary </customer>

Creating XML Output (Cont.)

- Note: Cannot directly insert a xsl:value-of tag inside another tag
  - E.g. cannot create an attribute for <customer> in the previous example by directly using xsl:value-of
  - XSLT provides a construct xsl:attribute to handle this situation
    - xsl:attribute adds attribute to the preceding element
    - E.g.

    <customer>
        <xsl:attribute name="customer_id">
            <xsl:value-of select = "customer_id"/>
        </xsl:attribute>
    </customer>

      results in output of the form

    <customer customer_id="…."> ….

- xsl:element is used to create output elements with computed names
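
Python's standard library has no XSLT processor, so the following is only an analogue of the effect described: an output element is created and an attribute is attached to it with a computed value. It assumes that customer_id is a subelement of customer in the input, as the `select` expression suggests; the function name `transform` and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

BANK2 = ET.fromstring("""
<bank-2>
  <customer>
    <customer_id>C100</customer_id>
    <customer_name>Joe</customer_name>
  </customer>
  <customer>
    <customer_id>C102</customer_id>
    <customer_name>Mary</customer_name>
  </customer>
</bank-2>
""")

def transform(root):
    """Emit one <customer customer_id="..."> element per input customer."""
    output = []
    for c in root.findall("./customer"):
        new = ET.Element("customer")                        # literal result element
        new.set("customer_id", c.findtext("customer_id"))   # role of xsl:attribute
        new.text = c.findtext("customer_name")              # role of xsl:value-of
        output.append(ET.tostring(new, encoding="unicode"))
    return output

print(transform(BANK2))
```

Setting the attribute through a separate call, rather than splicing a value into the tag text, parallels the reason XSLT needs xsl:attribute.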

Structural Recursion

- Template action can apply templates recursively to the contents of a matched element

    <xsl:template match="/bank">
        <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="customer">
        <xsl:value-of select="customer_name"/>
    </xsl:template>
    <xsl:template match="*"/>

- Example output: John Mary
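
The recursive control flow can be sketched in Python as a dispatch table of handler functions keyed by tag name. This is an analogue of template matching written for illustration, not an XSLT implementation; all function names and the sample document are assumptions.

```python
import xml.etree.ElementTree as ET

BANK = ET.fromstring("""
<bank>
  <customer><customer_name>John</customer_name></customer>
  <account><account_number>A-101</account_number></account>
  <customer><customer_name>Mary</customer_name></customer>
</bank>
""")

def apply_templates(node, templates):
    """Process every child of node and concatenate the results."""
    return "".join(transform(child, templates) for child in node)

def bank_template(node, templates):
    # Recurse into the contents of the matched element.
    return apply_templates(node, templates)

def customer_template(node, templates):
    # Counterpart of value-of select="customer_name".
    return node.findtext("customer_name") + "\n"

def ignore(node, templates):
    # Counterpart of the catch-all template with an empty body.
    return ""

def transform(node, templates):
    """Pick the handler for this node's tag, falling back to ignore."""
    handler = templates.get(node.tag, ignore)
    return handler(node, templates)

TEMPLATES = {"bank": bank_template, "customer": customer_template}
output = transform(BANK, TEMPLATES)
print(output)
```

The account element falls through to the catch-all handler and contributes nothing, which is why only the two names appear.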