









A chapter from the Database System Concepts course (CIS 560) at Kansas State University, focusing on the Extensible Markup Language (XML). It covers the structure of XML data, XML document schemas, querying and transformation, application program interfaces to XML, storage of XML data, and XML applications. It also discusses the differences between XML Schema and DTDs, gives examples of element specification in DTD and XML Schema, and covers functions in XPath and further features of XML Schema.










Computing & Information Sciences CIS 560: Database System Concepts Wednesday, 18 Oct 2006 Kansas State University
Wednesday, 18 October 2006
William H. Hsu Department of Computing and Information Sciences, KSU
KSOL course page: http://snipurl.com/va
Course web site: http://www.kddresearch.org/Courses/Fall-2006/CIS
Instructor home page: http://www.cis.ksu.edu/~bhsu
Reading for Next Class: First half of Chapter 10, Silberschatz et al., 5th edition
- Structure of XML Data
- XML Document Schema
- Querying and Transformation
- Application Program Interfaces to XML
- Storage of XML Data
- XML Applications
- XML: Extensible Markup Language
- Defined by the WWW Consortium (W3C)
- Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
- Documents have tags giving extra information about sections of the document
- The ability to specify new tags and to create nested tag structures makes XML a great way to exchange data, not just documents.
  - Much of the use of XML has been in data exchange applications, not as a replacement for HTML
- Tags make data (relatively) self-documenting
- Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier), IDREF (ID reference), or IDREFS (multiple IDREFs); more on this later
  - Whether the attribute is
    - mandatory (#REQUIRED),
    - has a default value (value),
    - or neither (#IMPLIED)
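To make the three options concrete, here is a small Python sketch of how a validating parser might apply one attribute-list declaration to an element. The `AttrSpec` class, the `apply_attlist` function and the sample `account` attributes are hypothetical illustrations, not part of any DTD tool; in practice the XML parser performs this step itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttrSpec:
    """One attribute declaration from a hypothetical <!ATTLIST ...>."""
    name: str
    attr_type: str                  # "CDATA", "ID", "IDREF" or "IDREFS"
    required: bool = False          # True models #REQUIRED
    default: Optional[str] = None   # literal default; None with required=False models #IMPLIED


def apply_attlist(attrs: dict, specs: list) -> dict:
    """Return an element's attributes after applying DTD-style rules.

    Raises ValueError when a #REQUIRED attribute is missing.
    """
    result = dict(attrs)
    for spec in specs:
        if spec.name in result:
            continue                              # value given explicitly
        if spec.required:
            raise ValueError(f"missing #REQUIRED attribute {spec.name!r}")
        if spec.default is not None:
            result[spec.name] = spec.default      # default value filled in
        # otherwise #IMPLIED: the attribute is simply absent
    return result


# Hypothetical declaration, roughly:
# <!ATTLIST account acct-type      CDATA "checking"
#                   account_number ID    #REQUIRED
#                   note           CDATA #IMPLIED>
specs = [
    AttrSpec("acct-type", "CDATA", default="checking"),
    AttrSpec("account_number", "ID", required=True),
    AttrSpec("note", "CDATA"),
]

print(apply_attlist({"account_number": "A-101"}, specs))
# {'account_number': 'A-101', 'acct-type': 'checking'}
```

Note that the #IMPLIED attribute `note` does not appear in the result at all, while the omitted `acct-type` receives its declared default.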
- An element can have at most one attribute of type ID
- The ID attribute value of each element in an XML document must be distinct
  - Thus the ID attribute value is an object identifier
- An attribute of type IDREF must contain the ID value of an element in the same document
- An attribute of type IDREFS contains a set of (0 or more) ID values, separated by whitespace. Each of these values must be the ID value of an element in the same document
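The rules above can be checked mechanically. The following is a minimal Python sketch using the standard-library `xml.etree.ElementTree`, which does not read DTDs; the sets of ID-typed and IDREFS-typed attribute names (`account_number`, `customer_id`, `owners`, `accounts`) and the sample document are therefore assumptions supplied by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 style document.
DOC = """
<bank-2>
  <account account_number="A-401" owners="C100 C102"><balance>500</balance></account>
  <customer customer_id="C100" accounts="A-401"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C102" accounts="A-401"><customer_name>Mary</customer_name></customer>
</bank-2>
"""

ID_ATTRS = {"account_number", "customer_id"}   # assumed ID-typed attributes
IDREFS_ATTRS = {"owners", "accounts"}          # assumed IDREFS-typed attributes


def check_ids(root):
    """Return a list of ID/IDREFS violations; an empty list means consistent."""
    errors = []
    ids = {}
    for elem in root.iter():
        id_values = [elem.get(a) for a in ID_ATTRS if elem.get(a) is not None]
        if len(id_values) > 1:
            errors.append(f"<{elem.tag}> has more than one ID attribute")
        for value in id_values:
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids[value] = elem
    for elem in root.iter():
        for attr in IDREFS_ATTRS:
            # IDREFS is a whitespace-separated list of zero or more ID values
            for ref in (elem.get(attr) or "").split():
                if ref not in ids:
                    errors.append(f"{attr}={ref!r} on <{elem.tag}> matches no ID")
    return errors


root = ET.fromstring(DOC)
print(check_ids(root))  # []
```

The two passes mirror the definitions: IDs must be collected from the whole document before references can be resolved, since an IDREF may point forward.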
- Bank DTD with ID and IDREF attribute types:

  … declarations for branch, balance, customer_name, customer_street and customer_city
  ]>
- XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. It supports
  - Typing of values
    - E.g. integer, string, etc.
    - Also, constraints on min/max values
  - User-defined, complex types
  - Many more features, including
    - uniqueness and foreign key constraints, inheritance
- XML Schema is itself specified in XML syntax, unlike DTDs
  - More standard representation, but verbose
- XML Schema is integrated with namespaces
- BUT: XML Schema is significantly more complicated than DTDs.
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="bank" type="BankType"/>
  <xs:element name="account">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="account_number" type="xs:string"/>
        <xs:element name="branch_name" type="xs:string"/>
        <xs:element name="balance" type="xs:decimal"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- ….. definitions of customer and depositor …. -->
  <xs:complexType name="BankType">
    <xs:sequence>
      <xs:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
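To see what the in-line `account` type demands of an instance document, here is a hand-written Python check for that one type. It is a sketch, not a general XML Schema validator (the standard library does not include one), and the sample `bank` document is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Child sequence required by the in-line complexType of "account".
ACCOUNT_SEQUENCE = ["account_number", "branch_name", "balance"]


def check_account(account):
    """Return a list of problems with one <account> element (empty if it conforms)."""
    problems = []
    children = [child.tag for child in account]
    if children != ACCOUNT_SEQUENCE:                  # xs:sequence: exact order, once each
        problems.append(f"expected children {ACCOUNT_SEQUENCE}, found {children}")
    balance = account.find("balance")
    if balance is not None:
        try:
            # Approximate xs:decimal check: Python's Decimal also accepts
            # forms such as "1E5" or "NaN" that xs:decimal rejects.
            Decimal((balance.text or "").strip())
        except InvalidOperation:
            problems.append(f"balance {balance.text!r} is not a decimal")
    return problems


bank = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><branch_name>Downtown</branch_name><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>lots</balance></account>
</bank>
""")
for acct in bank.findall("account"):   # BankType allows 0..unbounded accounts
    print(acct.findtext("account_number"), check_account(acct))
```

The first account conforms; the second is missing `branch_name` and has a non-decimal balance, so both problems are reported.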
- The choice of “xs:” was ours; any other namespace prefix could be chosen
- Element “bank” has type “BankType”, which is defined separately
  - xs:complexType is used later to create the named complex type “BankType”
- Element “account” has its type defined in-line
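The point that the prefix is arbitrary can be demonstrated with Python's `xml.etree.ElementTree`, which replaces each prefix with the namespace URI it is bound to. The two schema fragments below are illustrative.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# The same schema fragment written with two different prefixes.
with_xs = ('<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
           '<xs:element name="bank" type="BankType"/></xs:schema>')
with_xsd = ('<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">'
            '<xsd:element name="bank" type="BankType"/></xsd:schema>')

for text in (with_xs, with_xsd):
    root = ET.fromstring(text)
    # ElementTree expands prefixed names to "{namespace-URI}localname"
    print(root.tag, root[0].tag)
```

Both iterations print the same expanded names, because a schema processor identifies `schema` and `element` by namespace URI, not by prefix. One caveat: prefixes that appear inside attribute values, such as `type="xs:string"`, are not expanded by ElementTree, so a schema tool has to resolve those separately.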
- Attributes are specified by the xs:attribute tag:
    <xs:attribute name="account_number"/>
  - adding the attribute use="required" means the value must be specified
- Key constraint: “account numbers form a key for account elements under the root bank element”. Selector paths in XML Schema are relative to the element in whose declaration the constraint appears, so this constraint is placed inside the declaration of bank:
    <xs:key name="accountKey">
      <xs:selector xpath="account"/>
      <xs:field xpath="account_number"/>
    </xs:key>
- Foreign key constraint from depositor to account:
    <xs:keyref name="depositorAccountKey" refer="accountKey">
      <xs:selector xpath="depositor"/>
      <xs:field xpath="account_number"/>
    </xs:keyref>
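The effect of the key and keyref pair can be sketched in Python: collect the key values selected under `bank`, reject missing or duplicate ones, then require every depositor's `account_number` to match a collected key. The function `check_key_and_keyref` and the sample data are hypothetical; a real schema validator enforces these constraints itself.

```python
import xml.etree.ElementTree as ET


def check_key_and_keyref(bank):
    """Mimic accountKey and depositorAccountKey, evaluated relative to <bank>.

    Key: account_number is present and unique among account children.
    Keyref: each depositor's account_number matches some key value.
    """
    errors = []
    keys = set()
    for account in bank.findall("account"):           # selector of the key
        value = account.findtext("account_number")     # field of the key
        if value is None:
            errors.append("account without account_number")
        elif value in keys:
            errors.append(f"duplicate key {value!r}")
        else:
            keys.add(value)
    for depositor in bank.findall("depositor"):       # selector of the keyref
        value = depositor.findtext("account_number")
        if value is not None and value not in keys:
            errors.append(f"depositor refers to unknown account {value!r}")
    return errors


bank = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number></account>
  <account><account_number>A-102</account_number></account>
  <depositor><customer_name>Joe</customer_name><account_number>A-101</account_number></depositor>
  <depositor><customer_name>Mary</customer_name><account_number>A-999</account_number></depositor>
</bank>
""")
print(check_key_and_keyref(bank))  # ["depositor refers to unknown account 'A-999'"]
```

Unlike ID/IDREF, these constraints apply to element content (here, the text of `account_number`) and are scoped to the selected elements rather than to the whole document.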
- XPath is used to address (select) parts of documents using path expressions
- A path expression is a sequence of steps separated by “/”
  - Think of file names in a directory hierarchy
- Result of a path expression: the set of values that, along with their containing elements/attributes, match the specified path
- E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns
    <customer_name>Joe</customer_name>
    <customer_name>Mary</customer_name>
- E.g. /bank-2/customer/customer_name/text() returns the same names, but without the enclosing tags
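Python's `xml.etree.ElementTree` implements a small subset of XPath that is enough to try these two path expressions. The bank-2 data below is a hypothetical stand-in containing the two customers named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical bank-2 data with the two customers from the example.
bank2 = ET.fromstring("""
<bank-2>
  <customer customer_id="C100"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C102"><customer_name>Mary</customer_name></customer>
</bank-2>
""")

# /bank-2/customer/customer_name: ElementTree paths are relative to the
# element they are called on, so the root step bank-2 is already consumed.
names = bank2.findall("customer/customer_name")
print([ET.tostring(n, encoding="unicode") for n in names])
# ['<customer_name>Joe</customer_name>', '<customer_name>Mary</customer_name>']

# /bank-2/customer/customer_name/text(): ElementTree has no text() step,
# so read the .text of each matched element instead.
print([n.text for n in names])  # ['Joe', 'Mary']
```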
- The initial “/” denotes the root of the document (above the top-level tag)
- Path expressions are evaluated left to right
  - Each step operates on the set of instances produced by the previous step
- Selection predicates may follow any step in a path, in [ ]
  - E.g. /bank-2/account[balance > 400]
    - returns account elements with a balance value greater than 400
    - /bank-2/account[balance] returns account elements containing a balance subelement
- Attributes are accessed using “@”
  - E.g. /bank-2/account[balance > 400]/@account_number
    - returns the account numbers of accounts with balance > 400
  - IDREF attributes are not dereferenced automatically (more on this later)
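The predicate and attribute examples can be approximated with `xml.etree.ElementTree`. Its XPath subset supports the `[balance]` existence test but has no `>` comparison, so this sketch applies the numeric predicate in Python and reads attributes with `.get()`. The account data is hypothetical.

```python
import xml.etree.ElementTree as ET

bank2 = ET.fromstring("""
<bank-2>
  <account account_number="A-101"><branch_name>Downtown</branch_name><balance>500</balance></account>
  <account account_number="A-102"><branch_name>Perryridge</branch_name><balance>400</balance></account>
  <account account_number="A-201"><branch_name>Brighton</branch_name></account>
</bank-2>
""")

# /bank-2/account[balance]: the [tag] predicate keeps elements with such a child.
with_balance = bank2.findall("account[balance]")
print([a.get("account_number") for a in with_balance])  # ['A-101', 'A-102']

# /bank-2/account[balance > 400]/@account_number: the comparison is done
# in Python (converting the text to a number), and .get() plays the role of @.
rich = [a.get("account_number") for a in with_balance
        if float(a.findtext("balance")) > 400]
print(rich)  # ['A-101']
```

Account A-102 is excluded because 400 is not strictly greater than 400.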
- XPath provides several functions
  - The function count() counts the number of elements in the set generated by a path
    - E.g. /bank-2/account[count(./customer) > 2]
      - returns accounts with more than 2 customers
  - There is also a function for testing the position (1, 2, ...) of a node with respect to its siblings
- The Boolean connectives and and or, and the function not(), can be used in predicates
- IDREFs can be dereferenced using the function id()
  - id() can also be applied to sets of references such as IDREFS, and even to strings containing multiple references separated by blanks
  - E.g. /bank-2/account/id(@owner)
    - returns all customers referred to from the owners attribute of account elements.
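ElementTree has neither `count()` nor `id()`, so the Python sketch below emulates both: `len()` stands in for `count()`, and the hypothetical helper `xpath_id` splits reference strings on whitespace, looks each token up in an ID index, and returns the matches without duplicates in document order. Which attributes are ID-typed (`account_number`, `customer_id`) is an assumption, since ElementTree does not read the DTD. The sample documents are also hypothetical.

```python
import xml.etree.ElementTree as ET

# count(): /bank-2/account[count(./customer) > 2]
nested = ET.fromstring(
    "<bank-2>"
    '<account account_number="A-401"><customer>Joe</customer>'
    "<customer>Mary</customer><customer>Lisa</customer></account>"
    '<account account_number="A-402"><customer>Mary</customer></account>'
    "</bank-2>"
)
big = [a.get("account_number") for a in nested.findall("account")
       if len(a.findall("customer")) > 2]          # len() plays the role of count()
print(big)  # ['A-401']

# id(): /bank-2/account/id(@owner)
linked = ET.fromstring(
    "<bank-2>"
    '<account account_number="A-401" owner="C100 C102"/>'
    '<account account_number="A-402" owner="C102"/>'
    '<customer customer_id="C100"><customer_name>Joe</customer_name></customer>'
    '<customer customer_id="C102"><customer_name>Mary</customer_name></customer>'
    "</bank-2>"
)
ID_ATTRS = ("account_number", "customer_id")     # assumed ID-typed attributes
by_id = {e.get(a): e for e in linked.iter() for a in ID_ATTRS if e.get(a) is not None}
order = {e: i for i, e in enumerate(linked.iter())}


def xpath_id(ref_strings):
    """Sketch of XPath id(): split each string on whitespace, look every token
    up as an ID, and return the matches without duplicates in document order."""
    found = {by_id[tok] for s in ref_strings for tok in s.split() if tok in by_id}
    return sorted(found, key=order.get)


owners = [a.get("owner") for a in linked.findall("account")]
print([c.findtext("customer_name") for c in xpath_id(owners)])  # ['Joe', 'Mary']
```

Mary is referenced by both accounts but appears once, matching the set semantics of a path expression result.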
- The operator “|” is used to implement union
  - E.g. /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)
    - gives customers with either accounts or loans
    - However, “|” cannot be nested inside other operators.
- “//” can be used to skip multiple levels of nodes
  - E.g. /bank-2//customer_name
    - finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
- A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  - “//”, described above, is a shorthand for specifying “all descendants”
  - “..” specifies the parent.
- doc(name) returns the root of a named document
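A Python sketch of the descendant step and of union, again with `xml.etree.ElementTree`: `.//` is supported directly, while the union of two dereferenced node sets is computed as a set union and returned in document order. The helper `refs` and the sample data (including the `owner` and `borrower` attribute names taken from the example) are hypothetical.

```python
import xml.etree.ElementTree as ET

bank2 = ET.fromstring("""
<bank-2>
  <account account_number="A-401" owner="C100"/>
  <loan loan_number="L-17" borrower="C100 C104"/>
  <customer customer_id="C100"><customer_name>Joe</customer_name></customer>
  <customer customer_id="C104"><customer_name>Lisa</customer_name></customer>
  <customer customer_id="C105"><customer_name>Adam</customer_name></customer>
</bank-2>
""")

# /bank-2//customer_name: ".//" selects matching descendants at any depth.
print([n.text for n in bank2.findall(".//customer_name")])  # ['Joe', 'Lisa', 'Adam']

# /bank-2/account/id(@owner) | /bank-2/loan/id(@borrower)
order = {e: i for i, e in enumerate(bank2.iter())}
by_id = {c.get("customer_id"): c for c in bank2.findall("customer")}


def refs(path, attr):
    """Customers referenced from attribute attr of the elements at path."""
    return {by_id[t] for e in bank2.findall(path)
            for t in e.get(attr, "").split() if t in by_id}


union = sorted(refs("account", "owner") | refs("loan", "borrower"), key=order.get)
print([c.findtext("customer_name") for c in union])  # ['Joe', 'Lisa']
```

Joe is both an account owner and a borrower, yet appears only once in the union.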
- Joins are specified in a manner very similar to SQL
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/account_number = $d/account_number
      and $c/customer_name = $d/customer_name
    return <cust_acct> { $c, $a } </cust_acct>
- The same query can be expressed with the selections specified as XPath selections:
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[account_number = $a/account_number and
                              customer_name = $c/customer_name]
    return <cust_acct> { $c, $a } </cust_acct>
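The join query evaluates like a nested loop over the three `for` variables, keeping the combinations that satisfy the `where` clause. The Python sketch below spells that out with ElementTree; the wrapper element `results` and the sample data are hypothetical, and a real XQuery engine may choose a much better join strategy.

```python
import copy
import xml.etree.ElementTree as ET

bank = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-215</account_number><balance>700</balance></account>
  <customer><customer_name>Joe</customer_name></customer>
  <customer><customer_name>Mary</customer_name></customer>
  <depositor><customer_name>Joe</customer_name><account_number>A-101</account_number></depositor>
  <depositor><customer_name>Mary</customer_name><account_number>A-215</account_number></depositor>
</bank>
""")

result = ET.Element("results")   # hypothetical wrapper so the output is one tree
for a in bank.findall("account"):
    for c in bank.findall("customer"):
        for d in bank.findall("depositor"):
            # where clause: both join conditions must hold
            if (a.findtext("account_number") == d.findtext("account_number")
                    and c.findtext("customer_name") == d.findtext("customer_name")):
                pair = ET.SubElement(result, "cust_acct")
                # element constructors copy their content, hence deepcopy
                pair.extend([copy.deepcopy(c), copy.deepcopy(a)])

print([(p[0].findtext("customer_name"), p[1].findtext("account_number")) for p in result])
# [('Joe', 'A-101'), ('Mary', 'A-215')]
```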
- The following query converts data from the flat structure for bank information into the nested structure used in bank-1
    <bank-1> {
      for $c in /bank/customer
      return
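Since the query is cut off after `return`, the following Python sketch only shows one plausible nesting under stated assumptions: each output `customer` copies the customer's own fields and then nests the accounts linked to it through `depositor`. The field names and sample data are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

bank = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <customer><customer_name>Joe</customer_name><customer_city>Harrison</customer_city></customer>
  <depositor><customer_name>Joe</customer_name><account_number>A-101</account_number></depositor>
  <depositor><customer_name>Joe</customer_name><account_number>A-102</account_number></depositor>
</bank>
""")

bank1 = ET.Element("bank-1")
for c in bank.findall("customer"):                           # for $c in /bank/customer
    name = c.findtext("customer_name")
    nested = ET.SubElement(bank1, "customer")
    nested.extend([copy.deepcopy(child) for child in c])     # the customer's own fields
    for d in bank.findall("depositor"):
        if d.findtext("customer_name") != name:
            continue
        for a in bank.findall("account"):
            if a.findtext("account_number") == d.findtext("account_number"):
                nested.append(copy.deepcopy(a))              # account nested under its owner

print([child.tag for child in bank1.find("customer")])
# ['customer_name', 'customer_city', 'account', 'account']
```

The flat `depositor` relation disappears in the output: the relationship it recorded is now expressed by nesting.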
- The order by clause can be used at the end of any expression. E.g. to return customers sorted by name:
    for $c in /bank/customer
    order by $c/customer_name
    return
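In Python the same ordering is a `sorted` call keyed on each customer's name. This is a sketch over hypothetical data; it compares strings by code point, whereas XQuery's default collation is implementation-defined.

```python
import xml.etree.ElementTree as ET

bank = ET.fromstring("""
<bank>
  <customer><customer_name>Mary</customer_name></customer>
  <customer><customer_name>Adam</customer_name></customer>
  <customer><customer_name>Joe</customer_name></customer>
</bank>
""")

# for $c in /bank/customer order by $c/customer_name return $c
ordered = sorted(bank.findall("customer"), key=lambda c: c.findtext("customer_name"))
print([c.findtext("customer_name") for c in ordered])  # ['Adam', 'Joe', 'Mary']
```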
- User-defined functions with the type system of XML Schema
    declare function local:balances($c as xs:string) as xs:decimal* {
      for $d in /bank/depositor[customer_name = $c],
          $a in /bank/account[account_number = $d/account_number]
      return $a/balance
    };
- Types are optional for function parameters and return values
- The * (as in decimal*) indicates a sequence of values of that type
- Universal and existential quantification in where clause predicates
  - some $e in path satisfies P
  - every $e in path satisfies P
- XQuery also supports if-then-else expressions
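A Python analogue of `balances()` returns a list of `Decimal` values, playing the role of `xs:decimal*`, and `any`/`all` correspond to `some ... satisfies` and `every ... satisfies`. The data and the threshold 600 are hypothetical.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

bank = ET.fromstring("""
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-215</account_number><balance>700</balance></account>
  <depositor><customer_name>Joe</customer_name><account_number>A-101</account_number></depositor>
  <depositor><customer_name>Joe</customer_name><account_number>A-215</account_number></depositor>
</bank>
""")


def balances(c: str) -> list[Decimal]:
    """Sequence of balances of the accounts held by customer c."""
    return [Decimal(a.findtext("balance"))
            for d in bank.findall("depositor") if d.findtext("customer_name") == c
            for a in bank.findall("account")
            if a.findtext("account_number") == d.findtext("account_number")]


joe = balances("Joe")
print(joe)                          # [Decimal('500'), Decimal('700')]
print(any(b > 600 for b in joe))    # some $b in balances("Joe") satisfies $b > 600   -> True
print(all(b > 600 for b in joe))    # every $b in balances("Joe") satisfies $b > 600  -> False
```

As in XQuery, `every` over an empty sequence is true and `some` over an empty sequence is false, which is exactly how Python's `all` and `any` behave on an empty iterable.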
- Any text or tag in the XSL stylesheet that is not in the xsl namespace is output as is
- E.g. to wrap results in new XML elements:
    <xsl:template match="/bank-2/customer">
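The Python standard library has no XSLT processor, so this sketch only imitates the effect, assuming a template body consisting of a literal `<customer>` element that contains `<xsl:value-of select="customer_name"/>` (an assumption, since the template above is cut off). The `result` container and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

bank2 = ET.fromstring("""
<bank-2>
  <customer customer_id="C100"><customer_name>Joe</customer_name><customer_city>Harrison</customer_city></customer>
  <customer customer_id="C102"><customer_name>Mary</customer_name><customer_city>Stamford</customer_city></customer>
</bank-2>
""")

output = ET.Element("result")                       # hypothetical container for the output
for cust in bank2.findall("customer"):              # match="/bank-2/customer"
    literal = ET.SubElement(output, "customer")     # non-xsl element, copied to the output
    literal.text = cust.findtext("customer_name")   # like <xsl:value-of select="customer_name"/>

print(ET.tostring(output, encoding="unicode"))
# <result><customer>Joe</customer><customer>Mary</customer></result>
```

The `customer_city` data is dropped because the assumed template body selects only the name.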
- Note: cannot directly insert an xsl:value-of tag inside another tag
  - E.g. cannot create an attribute for
- A template action can apply templates recursively to the contents of a matched element
    <xsl:template match="/bank">
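The recursive dispatch behind `xsl:apply-templates` can be sketched in Python. The hypothetical `apply_templates` function looks up a template by element tag and otherwise recurses into the children. This is a simplification: XSLT matches patterns rather than tag names, and its built-in rules also copy text nodes, which this sketch does not. The two templates, a `bank` rule that wraps its results in `<customers>` and a `customer` rule that outputs the name, are assumed bodies, since the slide is cut off.

```python
import xml.etree.ElementTree as ET


def apply_templates(elem, templates):
    """Use the template registered for elem.tag if any; otherwise recurse
    into the children. Returns a list of output elements."""
    if elem.tag in templates:
        return [templates[elem.tag](elem, templates)]
    out = []
    for child in elem:
        out.extend(apply_templates(child, templates))
    return out


def bank_template(elem, templates):
    # Assumed body: <customers> <xsl:apply-templates/> </customers>
    result = ET.Element("customers")
    for child in elem:
        result.extend(apply_templates(child, templates))
    return result


def customer_template(elem, templates):
    # Assumed body: <customer> <xsl:value-of select="customer_name"/> </customer>
    result = ET.Element("customer")
    result.text = elem.findtext("customer_name")
    return result


templates = {"bank": bank_template, "customer": customer_template}

bank = ET.fromstring("<bank><account><account_number>A-101</account_number></account>"
                     "<customer><customer_name>Joe</customer_name></customer>"
                     "<customer><customer_name>Mary</customer_name></customer></bank>")
(result,) = apply_templates(bank, templates)
print(ET.tostring(result, encoding="unicode"))
# <customers><customer>Joe</customer><customer>Mary</customer></customers>
```

The `account` element has no template, so the fallback rule descends into it, finds nothing to output, and contributes no elements.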