XML Documents Trees - Distributed Software Develop | CS 682, Study notes of Software Engineering

Material Type: Notes; Class: Distributed Software Develop; Subject: Computer Science; University: University of San Francisco (CA); Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

Distributed Software Development
XML
Chris Brooks
Department of Computer Science
University of San Francisco
Department of Computer Science University of San Francisco p. 1/??
Outline
About XML
Structuring XML documents
Using CSS to display XML
Parsing with DOM
Parsing with SAX
XML
XML is a language for describing data
Really more of a meta-language
XML itself provides metadata
Data types, relations between data objects, etc.
Designed to be read, created, and consumed by
programs.
Advantages of XML
Well-defined, easy-to-manipulate structure
Human-readable
Extensible
Metadata can be included directly with data
Widely used
Things to note
An XML document has two components:
tags (metadata)
content (data)
Metadata serves to help an application make sense
of the data.
Example
<?xml version="1.0"?>
<book>
<author> J.R.R. Tolkien </author>
<title> The Lord of the Rings </title>
<volumes>
<volume> Fellowship of The Ring </volume>
<volume> The Two Towers </volume>
<volume> Return of the King </volume>
</volumes>
<price> 14.95 </price>
<publisher> Ballantine </publisher>
<isbn> 0345340426 </isbn>
</book>

XML documents as trees

• An XML document can also be represented as a tree.
• This makes XML very easy to parse.
• The outermost element is the root element, and elements contained within it are children of that element.
• Content is stored at the leaves.
• What would the tree for our Tolkien example look like?
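One way to answer that question is to let a parser build the tree and print it. The sketch below assumes Python 3's standard-library xml.etree.ElementTree (one of the Python options listed later in these notes) and reuses the Tolkien example verbatim; the helper name `tree_lines` is our own, not part of any library.

```python
import xml.etree.ElementTree as ET

TOLKIEN = """<?xml version="1.0"?>
<book>
<author> J.R.R. Tolkien </author>
<title> The Lord of the Rings </title>
<volumes>
<volume> Fellowship of The Ring </volume>
<volume> The Two Towers </volume>
<volume> Return of the King </volume>
</volumes>
<price> 14.95 </price>
<publisher> Ballantine </publisher>
<isbn> 0345340426 </isbn>
</book>"""


def tree_lines(elem, depth=0):
    """Return the subtree rooted at elem as indented lines.

    Inner elements show only their tag; leaves also show their (stripped)
    text, since the content lives at the leaves.
    """
    if len(elem):  # elem has child elements
        label = elem.tag
    else:
        label = f"{elem.tag}: {(elem.text or '').strip()}"
    lines = ["  " * depth + label]
    for child in elem:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(TOLKIEN)  # root is the <book> element
lines = tree_lines(root)
print("\n".join(lines))
```

The printout puts book at the root, with author, title, volumes, price, publisher, and isbn as its children, and the three volume leaves one level further down under volumes.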


Outline

• About XML
• Structuring XML documents
• Using CSS to display XML
• Parsing with DOM
• Parsing with SAX


Elements

• XML requires that every starting tag have a corresponding closing tag.
• Everything between a starting tag and a closing tag is called an element.
  • For example, <volume> Return of the King </volume> is an element.
  • So is everything between <volumes> and </volumes>,
  • as is everything between <book> and </book>.
• This means that elements must be nested.
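A quick way to see these rules enforced is to hand a few fragments to a parser. This sketch assumes Python 3's xml.etree.ElementTree, which raises ParseError for a document that is not well-formed; the three fragments and the `is_well_formed` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Properly nested: every start tag is closed, innermost first.
nested = "<book><volumes><volume>Return of the King</volume></volumes></book>"
# <volumes> is never closed, so </book> does not match the open element.
missing_close = "<book><volumes><volume>Return of the King</volume></book>"
# Overlapping elements: <i> opens inside <title> but closes outside it.
overlapping = "<book><title>The <i>Two Towers</title></i></book>"

for doc in (nested, missing_close, overlapping):
    print(is_well_formed(doc), doc)
```

Only the first fragment is accepted; the other two fail with a "mismatched tag" error, which is exactly the nesting requirement stated above.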


Tags and elements

• Tags form the boundaries of elements, and give processing instructions to parsers.
• Empty elements: <coAuthor/>
  All information is contained in the tag.
• Container elements: a start tag and an end tag enclosing content, e.g. <author> J.R.R. Tolkien </author>
• Comments: <!-- here's a comment -->
• Declaration: <!ENTITY jrrt "J.R.R. Tolkien">
  This provides a way to define variables or constants in a single location.
• Entity reference: &jrrt;
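To see how a parser distinguishes these kinds of markup, the sketch below walks the children of a small made-up <book> and classifies each one. It assumes Python 3's xml.dom.minidom, which (unlike ElementTree's default parser) keeps comments as nodes; "empty" here simply means an element with no child nodes.

```python
from xml.dom import minidom

DOC = """<book>
<!-- here's a comment -->
<coAuthor/>
<title> The Lord of the Rings </title>
</book>"""

doc = minidom.parseString(DOC)
book = doc.documentElement

kinds = []
for node in book.childNodes:
    if node.nodeType == node.COMMENT_NODE:
        kinds.append(("comment", node.data.strip()))
    elif node.nodeType == node.ELEMENT_NODE:
        # An element with nothing inside it is empty; otherwise it contains something.
        kind = "container" if node.hasChildNodes() else "empty"
        kinds.append((kind, node.tagName))
    # Whitespace-only text nodes between the tags fall through and are ignored.

for kind, value in kinds:
    print(kind, value)
```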


Attributes and Values

• You can also specify that an element has attributes.
• These attributes can take on values.
• This is helpful when you want to specify that an object belongs to one of a few types.
  <book genre="fantasy" size="large">
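Reading attribute values back out of a parsed element is a dictionary-style lookup. A minimal sketch, assuming Python 3's xml.etree.ElementTree; the "format" attribute is hypothetical and deliberately absent, to show the default value.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book genre="fantasy" size="large"><title>The Lord of the Rings</title></book>'
)

genre = book.get("genre")            # value of one attribute
size = book.get("size")
fmt = book.get("format", "unknown")  # hypothetical attribute: default when absent
print(genre, size, fmt)
print(book.attrib)                   # all attribute/value pairs as a dict
```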


Attributes vs. Sub-elements

• We could rewrite the example above using subelements instead of attributes.
• When to use one over the other is largely stylistic.
  • You can always transform one into the other.
• If a feature can only take on one of a few values, an attribute might make more sense.
• If we expect to extend the number of genres, a subelement is preferable.
• Also, order is preserved for subelements.
  • Semantically, attribute/value pairs are treated as a dictionary.
  • So, a list of authors should be done as subelements.
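The claim that one form can always be transformed into the other can be sketched in a few lines. This assumes Python 3's xml.etree.ElementTree; `attributes_to_subelements` is a hypothetical helper, and the author names "A" and "B" are placeholders. The last part shows the dictionary semantics: a repeated attribute is rejected, while repeated subelements keep their order.

```python
import xml.etree.ElementTree as ET


def attributes_to_subelements(elem):
    """Build a new element in which each attribute becomes a child element.

    Any existing child elements are reused (not deep-copied) after the new ones.
    """
    new = ET.Element(elem.tag)
    new.text = elem.text
    for name, value in elem.attrib.items():
        child = ET.SubElement(new, name)
        child.text = value
    for child in elem:
        new.append(child)
    return new


book = ET.fromstring('<book genre="fantasy" size="large"/>')
converted = attributes_to_subelements(book)
print(ET.tostring(converted, encoding="unicode"))

# Attributes behave like dictionary keys: a repeated name is not well-formed.
try:
    ET.fromstring('<book author="A" author="B"/>')
    duplicate_ok = True
except ET.ParseError:
    duplicate_ok = False

# Subelements may repeat, and document order is preserved.
authors = [a.text for a in
           ET.fromstring("<book><author>A</author><author>B</author></book>").findall("author")]
print(duplicate_ok, authors)
```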


Entities

• We could then use our entity definitions later in the document by writing the entity name between a '&' and a ';':
  The author of The Lord of the Rings is &jrrt;. He invented a grammar and semantics for Elvish, which can be found at &elvish-key;
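The parser performs the substitution for us. This sketch assumes Python 3's xml.etree.ElementTree, which expands entities declared in a document's internal DTD subset; because &elvish-key; is declared on slides not included here, only jrrt is declared, and an undeclared reference is shown to be an error.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the entity once; the body references it twice.
DOC = """<!DOCTYPE book [
  <!ENTITY jrrt "J.R.R. Tolkien">
]>
<book>
  <author>&jrrt;</author>
  <note>The author of The Lord of the Rings is &jrrt;.</note>
</book>"""

book = ET.fromstring(DOC)
author = book.findtext("author")
note = book.findtext("note")
print(author)
print(note)

# Referencing an entity that was never declared is not well-formed.
try:
    ET.fromstring("<book>&elvish-key;</book>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
print(undeclared_ok)
```

Changing the single declaration changes every expansion, which is the "single location" benefit of entities.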


Outline

• About XML
• Structuring XML documents
• Using CSS to display XML
• Parsing with DOM
• Parsing with SAX


Using CSS to display XML

• CSS can also be used to display XML documents.
• Control is limited to laying out a complete XML document.
• If we want filtering or sorting, we'll need to use XSLT.


An example

• Let's say we have an XML-based CD database.
• We can use CSS to display it in a web browser.
• (see separate examples)


Outline

• About XML
• Structuring XML documents
• Validating XML with schema
• Using CSS to display XML
• Parsing with DOM
• Parsing with SAX


Parsing XML

• XML also has the advantage of being easy for programs to parse and construct.
• There are two different approaches to parsing and manipulating XML.
• SAX: Simple API for XML
  • Event-driven parser
  • User defines actions to take when an element is found during parsing.


Parsing XML

• DOM: Document Object Model
  • Tree parser: the entire document is instantiated in memory as a tree.
  • Nice for random-access applications
  • Large documents may consume a large amount of memory
• Most languages provide support for both. We'll start with DOM.


Libraries

• The DOM model is specified in a language-independent way.
  • Implementations then follow this specification.
  • This means that they all work very similarly.
• Java
  • javax.xml.parsers is built into Java 1.5.
  • Apache's Xerces parser provides support for both SAX and DOM.
    • Xerces also has C++ and Perl implementations.
  • JDOM is also a popular tool for parsing and creating XML in Java.
• Python
  • Built-in support for SAX, DOM, and minidom
  • ElementTree is a DOM-like parser.
  • 4suite provides third-party implementations.
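The difference between the W3C DOM interface and the "DOM-like" ElementTree shows up in where the text lives. A minimal side-by-side sketch, assuming Python 3's standard library and a cut-down version of the Tolkien example.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

DOC = "<book><author>J.R.R. Tolkien</author><price>14.95</price></book>"

# DOM (minidom): generic nodes; the author's text is a child Text node.
dom = minidom.parseString(DOC)
dom_author = dom.getElementsByTagName("author")[0].firstChild.data

# ElementTree: each element carries its own text directly in .text.
tree = ET.fromstring(DOC)
et_author = tree.find("author").text

print(dom_author, "|", et_author)
```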


Libraries

• Perl
  • LibXML provides SAX and DOM functionality.
• C#
  • .NET has built-in DOM support (XmlDocument); instead of SAX, it provides XmlReader, a forward-only streaming reader.
• Ruby
  • The REXML library provides tree parsing, but not with the DOM interface.


Parsing a document in Python

• Example:
    from xml.dom import minidom
    doc = minidom.parse('library.xml')
• parse() reads in and parses a document, and creates a Document object.
• toxml() shows the XML version.
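The notes do not show library.xml, so this runnable sketch substitutes a tiny made-up library string and uses minidom.parseString, the string counterpart of minidom.parse. It assumes Python 3's xml.dom.minidom.

```python
from xml.dom import minidom

# Stand-in for library.xml (its real contents are not shown in the notes).
LIBRARY = "<library><book><title>The Lord of the Rings</title></book></library>"

doc = minidom.parseString(LIBRARY)  # like minidom.parse(), but reads from a string
root = doc.documentElement          # the root <library> element
xml_text = doc.toxml()              # serialize the whole tree back to XML

print(type(doc).__name__, root.tagName)
print(xml_text)
```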


Traversing the tree

• childNodes, firstChild, lastChild, parentNode
• childNodes can have childNodes.
• Leaves are text nodes.
  • They respond to 'data', which gives up the data they store.
• This is useful if you need to process an entire document, but annoying if you're searching.
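Processing an entire document usually means a recursive walk over childNodes. A minimal sketch, assuming Python 3's xml.dom.minidom; `collect_text` is a hypothetical helper, and the input has no whitespace between tags so that every text leaf is real content.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<book><title>The Lord of the Rings</title>"
    "<volumes><volume>The Two Towers</volume></volumes></book>"
)


def collect_text(node):
    """Walk the tree depth-first via childNodes; return the data of every text leaf."""
    if node.nodeType == node.TEXT_NODE:
        return [node.data]
    texts = []
    for child in node.childNodes:
        texts.extend(collect_text(child))
    return texts


texts = collect_text(doc)
title = doc.documentElement.firstChild  # the <title> element
print(texts)
print(title.firstChild.data, "| parent:", title.parentNode.tagName)
```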


Finding specific elements

• getElementsByTagName finds all elements according to name:
    eltlist = doc.getElementsByTagName('key')
• Can search at any node
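Because getElementsByTagName is available on elements as well as on the document, the same call can search the whole tree or just one subtree. A sketch assuming Python 3's xml.dom.minidom; the shelf/key layout and the key values are hypothetical, since the structure of library.xml is not shown.

```python
from xml.dom import minidom

# Hypothetical library layout: keys grouped under shelves.
LIBRARY = ("<library>"
           "<shelf name='fantasy'><key>k1</key><key>k2</key></shelf>"
           "<shelf name='poetry'><key>k3</key></shelf>"
           "</library>")

doc = minidom.parseString(LIBRARY)

# Searching from the document finds every <key>, in document order.
eltlist = doc.getElementsByTagName("key")
all_keys = [e.firstChild.data for e in eltlist]

# Searching from an element only looks inside that element's subtree.
fantasy = doc.getElementsByTagName("shelf")[0]
fantasy_keys = [e.firstChild.data for e in fantasy.getElementsByTagName("key")]

print(all_keys, fantasy_keys)
```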


Parsing with SAX

• DOM is very convenient to use in many cases, but not all:
  • The document is too large to hold in memory.
  • The document is malformed.
  • The document is being produced (and should be consumed) incrementally.
• In these cases, a SAX parser may be more appropriate.
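For the incremental case, Python's default SAX reader accepts input piece by piece. This sketch assumes Python 3, where xml.sax.make_parser() returns the expat-based reader with feed() and close() methods; the chunks and the `CountingHandler` class are made up for illustration, and the split points deliberately fall inside a tag name.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CountingHandler(ContentHandler):
    """Counts start tags by element name."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


handler = CountingHandler()
parser = xml.sax.make_parser()  # default expat-based reader supports feed()/close()
parser.setContentHandler(handler)

# Pretend the document arrives in pieces (e.g. from a socket); the pieces
# need not break on tag boundaries.
chunks = ["<catalog><cd><ti", "tle>A</title></cd><cd>", "<title>B</title></cd></catalog>"]
for chunk in chunks:
    parser.feed(chunk)  # callbacks run as expat parses the data buffered so far
parser.close()          # end of input: also checks that the document is complete

print(handler.counts)
```

The full document never has to exist as a single string or tree; only the handler's small count dictionary is kept.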


SAX: Simple API for XML

• SAX is an interface that was developed to provide a uniform way to integrate different XML parsers.
• Interesting contrast in origin to DOM:
  • SAX was developed 'bottom-up' by XML developers.
  • DOM was developed 'top-down' by the W3C.
• SAX is an event-driven parser.
  • You define an event handler that is passed to the parser.
  • The handler describes how to handle particular types of elements.
  • The document is processed sequentially. State must be maintained by hand.
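To make "event-driven" concrete, this sketch records every callback a SAX parser makes, in order, for a one-line document. It assumes Python 3's xml.sax; `TraceHandler` is a hypothetical name, and whitespace-only character data is skipped to keep the trace short.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TraceHandler(ContentHandler):
    """Records each SAX callback in the order the parser makes it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        if content.strip():  # ignore whitespace between tags
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


handler = TraceHandler()
xml.sax.parseString(b'<book genre="fantasy"><title>The Two Towers</title></book>', handler)
for event in handler.events:
    print(event)
```

Nothing in the events says that "The Two Towers" belongs to a book; linking the two is left to the handler's own state.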


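Using SAX within Python

• (Note: Java looks very similar.)
• Most of the work involves creating handlers.
• For example, to deal with processing content, subclass ContentHandler and override its methods:

import xml.sax
from xml.sax.handler import ContentHandler

class CDHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.books = []        # titles collected so far
        self.buffer = ''       # text of the current <title>
        self.inTitle = False   # are we inside a <title> element?

    def startElement(self, name, attrs):
        if name == 'title':
            self.inTitle = True

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.inTitle:
            self.buffer += content

    def endElement(self, name):
        if name == 'title':
            self.inTitle = False
            print(self.buffer)
            self.books.append(self.buffer)
            self.buffer = ''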


Using SAX within Python

• To use this, we then register the handler with a SAX parser.

parser = xml.sax.make_parser()
handler = CDHandler()
parser.setContentHandler(handler)
parser.parse('cdcat.xml')


SAX comments

• You must keep track of 'where you are' yourself.
  • No access to the enclosing context
  • It's hard with SAX to, for example, print the corresponding artist for each title node.
• SAX has more modest memory requirements than DOM.
  • Nodes are discarded after parsing.
• More flexible recovery from parsing errors
• Use the parser that best fits your needs.
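"Hard" is not impossible: the handler can remember what it has seen inside the current CD and pair the fields once the CD closes. This sketch assumes a hypothetical cdcat.xml layout of <catalog><cd><title/><artist/></cd></catalog> with lowercase tags (matching the 'title' check in CDHandler), placeholder data, and Python 3's xml.sax.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class ArtistTitleHandler(ContentHandler):
    """Pairs each <title> with the <artist> of the same <cd>.

    SAX gives no access to the enclosing element, so the handler keeps the
    state itself: the field being read and the fields seen in the current <cd>.
    """

    def __init__(self):
        super().__init__()
        self.current = None  # 'title' or 'artist' while inside one, else None
        self.buffer = ""
        self.cd = {}         # fields of the <cd> being parsed
        self.pairs = []      # (artist, title) for every finished <cd>

    def startElement(self, name, attrs):
        if name == "cd":
            self.cd = {}
        elif name in ("title", "artist"):
            self.current = name
            self.buffer = ""

    def characters(self, content):
        if self.current:
            self.buffer += content

    def endElement(self, name):
        if name == self.current:
            self.cd[name] = self.buffer.strip()
            self.current = None
        elif name == "cd":
            # Only now are both fields known, whichever order they came in.
            self.pairs.append((self.cd.get("artist"), self.cd.get("title")))


CATALOG = b"""<catalog>
  <cd><title>Title One</title><artist>Artist One</artist></cd>
  <cd><artist>Artist Two</artist><title>Title Two</title></cd>
</catalog>"""

handler = ArtistTitleHandler()
xml.sax.parseString(CATALOG, handler)
for artist, title in handler.pairs:
    print(f"{title}: {artist}")
```

The extra bookkeeping is the price of SAX's small memory footprint: only one CD's fields are held at a time.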
