



















CSc31800: Internet Programming, CS-CCNY, Spring 2004 Jinzhong Niu May 14, 2004
1.1 What is XML?
XML stands for eXtensible Markup Language. It is a markup language much like HTML, but there are several differences between them:
Tove Jani Reminder Don’t forget the party!
With XML, one can define whatever tags are needed, which together compose a user-defined markup language similar to HTML, and then use that language to describe data. Specifically, XML uses a Document Type Definition (DTD) or an XML Schema to define tags. In this sense, XML is viewed as a meta-language, since it is used to define a markup language rather than to describe concrete data directly. That is also why it is called extensible.
of the sender of the note. This makes it possible to find all the notes written by a specific person. So XML was designed to describe data and to focus on what data is, while HTML was designed to display data and to focus on how data looks.
Actually XML and HTML can complement each other. For example, we can store data in XML files on a web server, and when a request arrives, a servlet retrieves the data from the XML, composes an HTML file, and sends it to the client. This way you can concentrate on using HTML for data layout and display, and be sure that changes in the underlying data will not require any changes to your HTML.
Besides storing data, XML allows data to be exchanged between incompatible systems. In the real world, computer systems and databases contain data in incompatible formats. One of the most time-consuming challenges for developers has been to exchange data between such systems over the Internet. Converting the data to XML can greatly reduce this complexity and create data that can be read by many different types of applications, since XML data is stored in plain text format, which is software- and hardware-independent.
The best description of XML may be this: XML is a cross-platform, software and hardware independent tool for transmitting information. Since the creation of XML, it has been amazing to see how quickly the XML standard has been developed and how quickly a large number of software vendors have adopted the standard. It is strongly believed by the IT community at large that XML will be as important to the future of the Web as HTML has been to the foundation of the Web and that XML will be the most common tool for all data manipulation and data transmission.
The syntax rules of XML are very simple and very strict. The rules are very easy to learn, and very easy to use. Because of this, creating software that can read and manipulate XML is very easy to do.
1.2.1 An Example XML Document
XML documents use a self-describing and simple syntax. For example, the following is a complete XML file presenting a note:
Tove
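In HTML some elements can be improperly nested within each other like this:
<b><i>This text is bold and italic</b></i>
In XML all elements must be properly nested within each other like this:
<b><i>This text is bold and italic</i></b>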
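The nesting rule can be checked mechanically, because a conforming parser refuses improperly nested elements. The following is a minimal sketch using the JAXP parser bundled with Java; the class name NestingCheck and the b and i element names are assumptions chosen for illustration, not part of the original notes.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NestingCheck {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser rejects any document that breaks the syntax rules.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or input failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample strings: the inner element closes first, or not.
        String proper = "<b><i>This text is bold and italic</i></b>";
        String improper = "<b><i>This text is bold and italic</b></i>";
        System.out.println(isWellFormed(proper));
        System.out.println(isWellFormed(improper));
    }
}
```

The parser may also print its own diagnostic to standard error for the rejected string; the return value is what the sketch relies on.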
All other elements must be within this root element. All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
.....
XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:
Tove Jani
Tove Jani
The error in the first document is that the date attribute in the note element is not quoted.
This is unlike HTML. With HTML, a sentence like this:
Hello           my name is Tove,
will be displayed like this:
Hello my name is Tove,
because HTML strips off the white space.
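That an XML parser, unlike an HTML browser, hands white space inside element content to the application unchanged can be observed directly. This is a small sketch; the class name WhitespaceDemo and the body element wrapped around the sentence are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WhitespaceDemo {

    /** Parses the XML text and returns the text content of its root element. */
    public static String rootText(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return document.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical element holding a sentence with a run of spaces.
        String xml = "<body>Hello      my name is Tove,</body>";
        // Brackets make the preserved run of spaces visible in the output.
        System.out.println("[" + rootText(xml) + "]");
    }
}
```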
However as you see, there is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets.
Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially.
In an XML-aware application, however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.
XML elements define the framework of an XML document and the structure of data items.
1.3.1 XML Elements are Extensible
XML was invented to be extensible and XML documents can be extended to carry more information.
Take the above note as an example again. Let’s imagine that we created an application that extracted the , , and elements from the XML document to produce this output:
My First XML
Introduction to XML What is HTML What is XML
XML Syntax Elements must have a closing tag Elements must be properly nested
book is the root element. title, prod, and chapter are child elements of book. book is the parent element of title, prod, and chapter. title, prod, and chapter are siblings (or sister elements) because they have the same parent.
1.3.3 Elements have Content
Elements can have different content types.
An XML element is everything from (including) the element’s start tag to (including) the element’s end tag.
An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.
In the example above, book has element content, because it contains other elements. chapter has mixed content because it contains both text and other elements. para has simple content (or text content) because it contains only text. prod has empty content, because it carries no information.
In the example above only the prod element has attributes. The attribute named id has the value “33-657”. The attribute named media has the value “paper”.
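The parent, child, and attribute relationships described for the book example can be inspected through the DOM. The sketch below assumes a book document laid out as the text describes it (a title, an empty prod with id and media attributes, and chapters holding text plus para elements); the exact markup in BOOK_XML and the class name BookTree are reconstructions for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class BookTree {

    // Assumed layout of the book document, rebuilt from the prose description.
    static final String BOOK_XML =
          "<book>"
        + "<title>My First XML</title>"
        + "<prod id=\"33-657\" media=\"paper\"></prod>"
        + "<chapter>Introduction to XML"
        +   "<para>What is HTML</para>"
        +   "<para>What is XML</para>"
        + "</chapter>"
        + "<chapter>XML Syntax"
        +   "<para>Elements must have a closing tag</para>"
        +   "<para>Elements must be properly nested</para>"
        + "</chapter>"
        + "</book>";

    /** Parses the text and returns the root element of the document. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Lists the names of the direct child elements, skipping text nodes. */
    public static List<String> childElementNames(Element parent) {
        List<String> names = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    /** Reads an attribute from the first descendant element with the given tag. */
    public static String attributeOfFirst(Element root, String tag, String attribute) {
        Element element = (Element) root.getElementsByTagName(tag).item(0);
        return element.getAttribute(attribute);
    }

    public static void main(String[] args) {
        Element book = parseRoot(BOOK_XML);
        System.out.println(book.getNodeName() + " has children " + childElementNames(book));
        System.out.println("prod id = " + attributeOfFirst(book, "prod", "id"));
        System.out.println("prod media = " + attributeOfFirst(book, "prod", "media"));
    }
}
```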
1.3.4 Element Naming
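XML elements must follow these naming rules: names can contain letters, digits, and other characters; names must not start with a digit or with a punctuation character such as “-” or “.”; names must not start with the letters xml (in any combination of case); names cannot contain spaces.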
Take care when you “invent” element names and follow these simple rules:
Any name can be used, no words are reserved, but the idea is to make names descriptive. Names with an underscore separator are nice, for example and .
Avoid “-” and “.” in names. For example, if you name something “first-name,” it could be a mess if your software tries to subtract name from first. Or if you name something “first.name,” your software may think that “name” is a property of the object “first.”
Element names can be as long as you like, but don’t exaggerate. Names should be short and simple, like this: not like this: .
XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.
Non-English letters, such as accented characters, are perfectly legal in XML element names, but watch out for problems if your software vendor doesn’t support them.
The “:” should not be used in element names because it is reserved to be used for something called namespaces.
XML elements can have attributes in the start tag, just like HTML. Attributes are used to provide additional information about elements.
From HTML you will remember this: <IMG SRC="computer.gif">. The SRC attribute provides additional information about the IMG element.
In HTML (and in XML) attributes provide additional information about elements:
Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:
computer.gif
As we said before, attribute values in XML must always be enclosed in quotes, but either single or double quotes can be used. Note that if the attribute value itself contains double quotes it is necessary to use single quotes and vice versa, like in this example:
XML with correct syntax is well-formed XML. That is, a well-formed XML document is a document that conforms to the XML syntax rules described in the previous sections.
Beyond that, to be valid, a well-formed XML document must also conform to a Document Type Definition (DTD). The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD can be specified internally or externally. The following is an example of an internal DTD for the above note example:
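<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to,from,heading,body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don’t forget me this weekend</body>
</note>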
The DTD above is interpreted like this: !DOCTYPE note (in line 2) defines that this is a document of the type note. !ELEMENT note (in line 3) defines the note element as having four elements: “to,from,heading,body”. !ELEMENT to (in line 4) defines the to element to be of the type “#PCDATA”. !ELEMENT from (in line 5) defines the from element to be of the type “#PCDATA” and so on ...
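Checking a document against such a DTD can be done with a validating parser. The following is a sketch under stated assumptions: the DTD text in the DOCTYPE constant is rebuilt from the interpretation just given, and the class name NoteValidator is hypothetical. Validity errors are reported to an ErrorHandler, so the sketch rethrows them to turn them into a failed parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class NoteValidator {

    // Internal DTD rebuilt from the description in the text (an assumption).
    static final String DOCTYPE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to,from,heading,body)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT from (#PCDATA)>\n"
        + "  <!ELEMENT heading (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n";

    /** Returns true if the given note element is valid against the DTD. */
    public static boolean isValid(String noteElement) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check the document against its DTD
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(DOCTYPE + noteElement)));
            return true;
        } catch (SAXException e) {
            // Either a syntax error or a violation of the DTD.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("Parser setup or input failed", e);
        }
    }

    public static void main(String[] args) {
        String complete = "<note><to>Tove</to><from>Jani</from>"
            + "<heading>Reminder</heading><body>Don't forget me this weekend</body></note>";
        String missingHeading = "<note><to>Tove</to><from>Jani</from>"
            + "<body>Don't forget me this weekend</body></note>";
        System.out.println(isValid(complete));
        System.out.println(isValid(missingHeading));
    }
}
```

The second note is still well formed; it fails only because the DTD requires a heading element between from and body.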
If the DTD is external to your XML source file, it should be wrapped in a DOCTYPE definition with the following syntax:
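<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don’t forget me this weekend!</body>
</note>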
And the following is a copy of the file “note.dtd” containing the DTD:
W3C supports an alternative to DTD called XML Schema. If interested, you may read more about XML Schema in related books.
The W3C XML specification states that a program should not continue to process an XML document if it finds a well-formedness error. The reason is that XML software should be easy to write, and that all XML documents should be compatible.
With HTML it was possible to create documents with lots of errors (like when you forget an end tag). One of the main reasons that HTML browsers are so big and incompatible, is that they have their own ways to figure out what a document should look like when they encounter an HTML error.
With XML this should not be possible.
To view an XML document in IE 5.0 (and higher) you can click on a link, type the URL in the address bar, or double-click on the name of an XML file in a files folder. If you open an XML document in IE, it will display the document with color coded root and child elements. A plus (+) or minus sign (-) to the left of the elements can be clicked to expand or collapse the element structure. If you want to view the raw XML source, you must select “View Source” from the browser menu.
To view an XML document in Netscape 6, you’ll have to open the XML file and then right-click in the XML file and select “View Page Source”. If you open an XML document in Netscape 6, it will display the document with color coded root and child elements.
If an erroneous XML file is opened, the browser will report the error.
Since XML tags are “invented” by the author of the XML document, browsers do not know if a tag like <table> describes an HTML table or a dining table.
Without any information about how to display the data, most browsers will just display the XML document as it is. In the next section, we will take a look at different solutions to the display problem, using CSS and XSL.
Earlier we learned that CSS files can work together with HTML files: the CSS is in charge of display, while the HTML provides the concrete information. CSS can do the same thing with XML.
}
COUNTRY,PRICE,YEAR,COMPANY { display: block; color: #000000; margin-left: 20pt; }
where it is specified how to display each kind of element.
Besides CSS, XSL was invented just for displaying XML. The eXtensible Stylesheet Language (XSL) is far more sophisticated than CSS.
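XSL consists of three parts: XSLT, a language for transforming XML documents; XPath, a language for addressing parts of an XML document; and XSL-FO, a vocabulary for formatting XML documents.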
w3schools/xpath/default.asp
Look at the following simple XML document:
Empire Burlesque Bob Dylan 10.90
Hide your heart Bonnie Tyler 9.90
Greatest Hits Dolly Parton 9.90
The XPath expression below selects the ROOT element catalog:
/catalog
The XPath expression below selects all the cd elements of the catalog element:
/catalog/cd
The XPath expression below selects all the price elements of all the cd elements of the catalog element:
/catalog/cd/price
Note: If the path starts with a slash (‘/’) it represents an absolute path to an element! XPath also defines a library of standard functions for working with strings, numbers and Boolean expressions. The XPath expression below selects all the cd elements that have a price element with a value larger than 10.80:
/catalog/cd[price>10.80]
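These path expressions can be tried out with the javax.xml.xpath package that ships with Java. The sketch below is not the notes' own code: the title and artist element names in CATALOG_XML are assumptions (the text only names catalog, cd, and price), and the class name CatalogQuery is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class CatalogQuery {

    // Assumed markup: only catalog, cd, and price are named in the text.
    static final String CATALOG_XML =
          "<catalog>"
        + "<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist><price>10.90</price></cd>"
        + "<cd><title>Hide your heart</title><artist>Bonnie Tyler</artist><price>9.90</price></cd>"
        + "<cd><title>Greatest Hits</title><artist>Dolly Parton</artist><price>9.90</price></cd>"
        + "</catalog>";

    /** Evaluates the expression on the catalog and returns each node's text. */
    public static List<String> select(String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(CATALOG_XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select("/catalog/cd").size() + " cd elements");
        System.out.println("prices: " + select("/catalog/cd/price"));
        // The trailing /title step is added here only to print something readable.
        System.out.println("over 10.80: " + select("/catalog/cd[price>10.80]/title"));
    }
}
```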
Think of XSL as a set of languages that can transform XML into XHTML, filter and sort XML data, define parts of an XML document, format XML data based on the data value, like displaying negative numbers in red, and output XML data to different media, like screens, paper, or voice.
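As one illustration of the transformation role, a stylesheet can be applied from Java through the javax.xml.transform package. This is only a sketch: the stylesheet, the title element name, and the class name CatalogToHtml are assumptions made up for the example, and the output is a small XHTML list fragment rather than a full page.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CatalogToHtml {

    // Hypothetical stylesheet: one list item per cd title.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        +   "<ul>"
        +     "<xsl:for-each select='catalog/cd'>"
        +       "<li><xsl:value-of select='title'/></li>"
        +     "</xsl:for-each>"
        +   "</ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the XML text and returns the result as a string. */
    public static String transform(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumed catalog markup with a title child in each cd.
        String catalog =
              "<catalog>"
            + "<cd><title>Empire Burlesque</title><price>10.90</price></cd>"
            + "<cd><title>Hide your heart</title><price>9.90</price></cd>"
            + "</catalog>";
        System.out.println(transform(catalog));
    }
}
```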
One way to use XSL is to transform XML into HTML before it is displayed by the browser. Below is a fragment of the XML file, with an added XSL reference. The second line, an xml-stylesheet processing instruction, links the XML file to the XSL file:
To use DOM, you need a DOM-compliant parser. A list of such parsers is given at http://www.xml.com/pub/rg/Java_Parsers. In addition, the Java API for XML Processing (JAXP) is needed and is available from http://java.sun.com/. This API provides a small layer on top of DOM that lets you plug in different vendors’ parsers without making any changes to your basic code.
The use of DOM goes as follows:
public static void main(String args[]) {
    // Select the parser implementation unless one is already configured.
    String jaxPropertyName = "javax.xml.parsers.DocumentBuilderFactory";
    if (System.getProperty(jaxPropertyName) == null) {
        String apacheXercesPropertyValue = "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl";
        System.setProperty(jaxPropertyName, apacheXercesPropertyValue);
    }
    ...
}
// Create a JAXP document builder.
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// Parse the input into a DOM tree.
Document document = builder.parse(someInputStream);
// Normalize the tree so that no adjacent or empty text nodes remain.
document.getDocumentElement().normalize();
// Obtain the root element of the tree.
Element rootElement = document.getDocumentElement();
As you may imagine, you can simply replace the Property-format configuration file of your own web server with an XML one, and use the above process to access the configuration information.
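Put together, reading a configuration value through DOM might look like the sketch below. The server, port, and docroot element names and the class name XmlConfig are hypothetical, and the sketch relies on whichever JAXP parser the Java runtime provides by default instead of setting the system property.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlConfig {

    /** Returns the trimmed text of the first element with the given name, or null. */
    public static String setting(String xml, String name) {
        try {
            // Create a JAXP document builder.
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            // Parse the input, normalize the tree, and obtain the root element.
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            document.getDocumentElement().normalize();
            Element rootElement = document.getDocumentElement();
            // Look the setting up among the descendants of the root.
            NodeList matches = rootElement.getElementsByTagName(name);
            if (matches.getLength() == 0) {
                return null;
            }
            return matches.item(0).getTextContent().trim();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read configuration", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical configuration document for a small web server.
        String config =
              "<server>\n"
            + "  <port> 8080 </port>\n"
            + "  <docroot>/var/www</docroot>\n"
            + "</server>";
        System.out.println("port = " + setting(config, "port"));
        System.out.println("docroot = " + setting(config, "docroot"));
        System.out.println("missing = " + setting(config, "logfile"));
    }
}
```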