XML and DTDs: Understanding Document Type Definitions for Structuring XML Documents, Slides of Computer Science

An introduction to document type definitions (DTDs) in the context of XML documents. DTDs describe the structure of XML documents by defining elements, attributes, and entities. XML documents must be well-formed; they may additionally be valid, which requires a DTD. The slides cover the role of DTDs, parsing XML documents, and examples of DTD syntax for various types of elements. They also touch upon names and namespaces, and the relationship between DTDs and XML Schemas.

Typology: Slides

2012/2013

Uploaded on 03/19/2013

dharamnishth

DTDs

XML and DTDs

  • A DTD (Document Type Definition) describes the structure of one or more XML documents. Specifically, a DTD describes:
    • Elements
    • Attributes, and
    • Entities
    • (We will discuss each of these in turn)
  • An XML document is well-formed (here also called well-structured) if it follows certain simple syntactic rules
  • An XML document is valid if it also specifies and conforms to a DTD
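The first condition above, following the basic syntax rules (usually called well-formedness), can be checked with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the helper name `is_well_formed` and both sample strings are made up for illustration. Note that this parser does not validate against a DTD, so it only tests the first condition, not validity.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, i.e. it obeys the basic syntax rules."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Illustrative inputs (not taken from the slides).
GOOD = "<novel><paragraph>It was a dark and stormy night.</paragraph></novel>"
BAD = "<novel><paragraph>It was a dark and stormy night.</novel>"  # <paragraph> is never closed

print(is_well_formed(GOOD), is_well_formed(BAD))
```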

Parsers

• An XML parser is an API that reads the content of an XML document
  – Currently popular APIs are DOM (Document Object Model) and SAX (Simple API for XML)
• A validating parser is an XML parser that compares the XML document to a DTD and reports any errors
  – Most browsers don’t use validating parsers
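To make the difference between the two APIs concrete, here is a minimal sketch using Python's standard-library implementations, `xml.dom.minidom` (DOM) and `xml.sax` (SAX). The sample document and the `NameCollector` class are assumptions for illustration. Both parsers here are non-validating.

```python
import xml.dom.minidom
import xml.sax

# Illustrative document, not taken from the slides.
DOC = '<novel><chapter number="1"><paragraph>Suddenly, a shot rang out!</paragraph></chapter></novel>'

# DOM: the whole document is loaded into a tree that can be navigated afterwards.
dom = xml.dom.minidom.parseString(DOC)
dom_names = [node.tagName for node in dom.getElementsByTagName("*")]

# SAX: the parser streams through the document and calls back into a handler.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Called once for every start tag, in document order.
        self.names.append(name)

handler = NameCollector()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_names = handler.names

print(dom_names, sax_names)
```

DOM is convenient when the document fits in memory and needs to be revisited; SAX suits large inputs that are read once.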

An XML example

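<novel>
  <foreword>
    <paragraph>This is the great American novel.</paragraph>
  </foreword>
  <chapter number="1">
    <paragraph>It was a dark and stormy night.</paragraph>
    <paragraph>Suddenly, a shot rang out!</paragraph>
  </chapter>
</novel>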

  • An XML document contains (and the DTD describes):
    • Elements, such as novel and paragraph, consisting of tags and content
    • Attributes, such as number="1", consisting of a name and a value
    • Entities (not used in this example)

ELEMENT descriptions

• Suffixes:
    ?     optional           foreword?
    +     one or more        chapter+
    *     zero or more       appendix*
• Separators:
    ,     both, in order     foreword?, chapter+
    |     or                 section|chapter
• Grouping:
    ( )   grouping           (section|chapter)+
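The suffixes, separators, and grouping above behave like regular-expression operators applied to element names. As a rough sketch of that correspondence (not how a real validating parser is implemented), the hypothetical helpers below translate a content model into a Python regex and test a list of child names against it. `#PCDATA` is deliberately not handled.

```python
import re

NAME = r"[A-Za-z_:][\w:.\-]*"  # simplified approximation of an XML name

def model_to_regex(model):
    """Translate a content model such as "foreword?, chapter+" into a compiled regex.

    Each element name becomes a group matching that name plus one space;
    "," (sequence) becomes plain concatenation; ? + * | ( ) keep their meaning.
    """
    body = re.sub(r"\s+", "", model)                       # drop whitespace
    body = re.sub(NAME, lambda m: "(?:%s )" % re.escape(m.group()), body)
    body = body.replace(",", "")                           # sequence = concatenation
    return re.compile("(?:%s)" % body)

def children_match(model, children):
    """True if the sequence of child element names satisfies the content model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None

print(children_match("foreword?, chapter+", ["chapter", "chapter"]))
print(children_match("(section|chapter)+", []))
```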

Elements without children

• The syntax is <!ELEMENT name category>
  – The name is the element name used in start and end tags
  – The category may be EMPTY:
      • In the DTD: <!ELEMENT name EMPTY>
      • In the XML: <name></name> or just <name />
  – In the XML, an empty element may not have any content between the start tag and the end tag
  – An empty element may (and usually does) have attributes
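The two ways of writing an empty element are equivalent to a parser. A small sketch with Python's `xml.etree.ElementTree`, using a hypothetical element `br` with a hypothetical attribute `class`:

```python
import xml.etree.ElementTree as ET

# Hypothetical empty element, written with a start/end tag pair and as a single tag.
long_form = ET.fromstring('<br class="wide"></br>')
short_form = ET.fromstring('<br class="wide" />')

def describe(element):
    """Summarise an element as (tag, attributes, text, number of children)."""
    return (element.tag, element.attrib, element.text, len(element))

print(describe(long_form))
print(describe(short_form))
```

Both forms carry the attribute but have no text and no children.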

Elements with children

  • A category may describe one or more children: <!ELEMENT novel (foreword, chapter+)>
  • Parentheses are required, even if there is only one child
  • A space must precede the opening parenthesis
  • Commas (,) between elements mean that all children must appear, and must be in the order specified
  • “|” separators mean that any one child may be used
  • All child elements must themselves be declared
  • Children may have children
  • Parentheses can be used for grouping: (section|chapter)+

Elements with mixed content

• #PCDATA describes elements with only character data
• #PCDATA can be used in an “or” grouping: <!ELEMENT name (#PCDATA | child)*>
  – This is called mixed content
  – Certain (rather severe) restrictions apply:
      • #PCDATA must be first
      • The separators must be “|”
      • The group must be starred (meaning zero or more)
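Mixed content means character data and child elements can interleave freely. The sketch below shows how such content looks once parsed with Python's `xml.etree.ElementTree`. The element names `paragraph` and `emphasis`, and the declaration in the comment, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical declaration this fragment would satisfy:
#   <!ELEMENT paragraph (#PCDATA | emphasis)*>
para = ET.fromstring(
    "<paragraph>Suddenly, <emphasis>a shot</emphasis> rang out!</paragraph>"
)

def pieces(element):
    """List the text runs and child element names in document order."""
    out = []
    if element.text:
        out.append(("text", element.text))        # text before the first child
    for child in element:
        out.append(("element", child.tag))
        if child.tail:
            out.append(("text", child.tail))      # text following this child
    return out

print(pieces(para))
```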

An expanded DTD example

<!DOCTYPE novel [
  <!ELEMENT novel (foreword, chapter+, biography?, criticalEssay*)>
]>

Attributes and entities

  • In addition to elements, a DTD may declare attributes and entities
  • This slide shows examples; we will discuss each in detail
  • An attribute describes information that can be put within the start tag of an element
    • In XML: <name attribute="value">
    • In DTD: <!ATTLIST name attribute type default>
  • An entity describes text to be substituted
    • In XML: &copyright;
    • In the DTD: <!ENTITY copyright "replacement text">
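A minimal sketch of both declarations at work, using Python's `xml.etree.ElementTree`. The `note` element, its `priority` attribute, and the entity's replacement text are all invented for illustration. The parser does not validate, but it does read the internal DTD subset and substitute declared entities.

```python
import xml.etree.ElementTree as ET

# Illustrative document with an internal DTD subset (all names are assumptions).
DOC = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ATTLIST note priority CDATA #IMPLIED>
  <!ENTITY copyright "Copyright Example Press">
]>
<note priority="high">This document is &copyright; 2002.</note>"""

note = ET.fromstring(DOC)

print(note.get("priority"))  # value of the attribute from the start tag
print(note.text)             # content with the entity reference substituted
```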

Important attribute types

• There are ten attribute types
• These are the most important ones:
  – CDATA               The value is character data
  – (man|woman|child)   The value is one from this list
  – ID                  The value is a unique identifier
      • ID values must be legal XML names and must be unique within the document
  – NMTOKEN             The value is a legal XML name token (made of name characters; unlike a name, it may start with a digit)

Less important attribute types

• IDREF      The ID of another element
• IDREFS     A list of other IDs
• NMTOKENS   A list of valid XML name tokens
• ENTITY     An entity
• ENTITIES   A list of entities
• NOTATION   A notation
• xml:       Not an attribute type but a reserved name prefix, used by the predefined attributes xml:lang and xml:space
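A validating parser enforces the ID and IDREF rules itself. As an illustration of what those rules mean, the sketch below checks them by hand with Python's non-validating `xml.etree.ElementTree`. The document, the attribute names `id` and `boss`, and the helper `check_ids` are all hypothetical; a real validator would learn which attributes are ID or IDREF from the DTD's ATTLIST declarations rather than being told.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative document: "p2" is declared twice and "p9" is never declared.
DOC = """<people>
  <person id="p1"/>
  <person id="p2" boss="p1"/>
  <person id="p2" boss="p9"/>
</people>"""

def check_ids(root, id_attr="id", ref_attr="boss"):
    """Return (duplicate IDs, dangling references) for the two named attributes."""
    ids = Counter(e.get(id_attr) for e in root.iter() if e.get(id_attr) is not None)
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    dangling = sorted(
        e.get(ref_attr)
        for e in root.iter()
        if e.get(ref_attr) is not None and e.get(ref_attr) not in ids
    )
    return duplicates, dangling

print(check_ids(ET.fromstring(DOC)))
```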

Entities

  • There are exactly five predefined entities: &lt;, &gt;, &amp;, &quot;, and &apos; (standing for <, >, &, ", and ')
  • Additional entities can be defined in the DTD: <!ENTITY name "replacement text">
  • Entities can be defined in another document: <!ENTITY name SYSTEM "URI">
  • Example of use in the XML: This document is &copyright; 2002.
  • Entities are a way to include fixed text (sometimes called “boilerplate”)
  • Entities should not be confused with character references, which are numerical values between &# and ;
  • Example: &#233; or &#xE9; to indicate the character é
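The predefined entities and numeric character references can be tried out directly. A short sketch with Python's `xml.etree.ElementTree`; the wrapper element `t` is an arbitrary name chosen for illustration:

```python
import xml.etree.ElementTree as ET

# The five predefined entities need no declaration in a DTD.
predefined = ET.fromstring("<t>&lt; &gt; &amp; &quot; &apos;</t>").text

# Character references name a character by number: decimal, or hexadecimal after "x".
decimal = ET.fromstring("<t>&#233;</t>").text
hexadecimal = ET.fromstring("<t>&#xE9;</t>").text

print(predefined)
print(decimal, hexadecimal)
```

Both references yield the same character, because hexadecimal E9 equals decimal 233.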

Another example: XML

05/29/2002 Philadelphia, PA USA 84 51