Introduction to HTML and XHTML, Study notes of Web Programming and Technologies

An introduction to HTML and XHTML, covering traditional HTML and XHTML, markup elements, viewing markup locally and with a web server, the version history of HTML and XHTML, and document type statements and language versions. It explains how markup instructions found within a web page relay the structure of the document to the browser software. It also covers markup elements, empty elements, and attributes that modify the meaning of a tag, gives a brief summary of the version history of HTML and XHTML, and explains the importance of document type statements and language versions.

Typology: Study notes

2021/2022

Available from 11/13/2023

chummi-chummi

First Year Engineering, Sub Name: POP Code: BPLCK205A Module No.: 1
Module-1: Traditional HTML and XHTML:
First Look at HTML and XHTML
In the case of HTML, markup instructions found within a Web page relay the
structure of the document to the browser software.
For example, if you want to emphasize a portion of text, you enclose it within the tags
<em> and </em>, as shown here:
<em>This is important text!</em>.
When a Web browser reads a document that has HTML markup in it, it determines how
to render it onscreen by considering the HTML elements embedded within the
document.
So, an HTML document is simply a text file that contains the information we want to
publish and the appropriate markup instructions indicating how the browser should
structure or present the document.
Markup elements are made up of a start tag, such as <strong>, and an end tag, which is
indicated by a slash within the tag, such as </strong>.
The tag pair should fully enclose any content to be affected by the element, including
text and other HTML markup.
Under traditional HTML (not XHTML), the close tag for some elements is optional
because their closure can be inferred. For example, a <p> tag cannot enclose another
<p> tag, and thus the closing tag can be inferred when markup like this is encountered:
<p>This is a paragraph.
<p>This is also a paragraph.
Such shortened notations that depend on inference may be technically correct, but
stylistically they are not recommended.
It is always preferable to be exact, so use markup like this instead:
<p>This is a paragraph.</p>
<p>This is also a paragraph.</p>
There are markup elements, called empty elements, which do not enclose any content
and thus need no close tag at all or, in the case of XHTML, use a self-closing
notation.
For example, to insert a line break, use a single <br> tag, which represents the empty
br element; because it doesn’t enclose any content, it has no corresponding close
tag.
  • However, in XML markup variants, particularly XHTML, an unclosed tag is not allowed, so you need to close the tag like so:
    <br></br>
  • or, more commonly, use a self-identification of closure like so:
    <br />
  • The start tag of an element might contain attributes that modify the meaning of the tag.
  • For example, in HTML, the simple inclusion of the noshade attribute in an <hr> tag, as shown here:
    <hr noshade>
  • indicates that there should be no shading applied to this horizontal rule.
  • Under XHTML, such minimized (valueless) attributes are not allowed, because all attributes must have a value, so instead you have to use syntax like this:
    <hr noshade="noshade" />

  • As the example shows, attributes may require values, which are specified with an equal sign; these values should be enclosed within double or single quotes.
  • For example, in standard HTML syntax an <img> tag might specify four attributes that provide more information about the use of the included image.
  • Under traditional HTML, in the case of simple alphanumeric attribute values, the use of quotes is optional.
  • Regardless of the flexibility provided under standard HTML, you should always aim to use quotes on all attributes.
  • Doing so makes markup more consistent and tends to reduce errors caused by inconsistency.
  • A graphical overview of the HTML markup syntax shown so far is presented here:
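In text form, the syntax points covered so far can be sketched as follows. This is a minimal illustration rather than the original figure; the image file name and alt text are hypothetical values chosen only for the example.

```html
<!-- An element: start tag, enclosed content, end tag -->
<strong>Important</strong>

<!-- An empty element: traditional HTML form, then the XHTML self-closing form -->
<br>
<br />

<!-- A minimized attribute in HTML, then the XHTML form with an explicit value -->
<hr noshade>
<hr noshade="noshade" />

<!-- Four quoted attributes on an img tag (hypothetical file name and text) -->
<img src="dog.gif" alt="A small black dog" height="100" width="100">

<!-- Unquoted simple alphanumeric values are legal in traditional HTML, but quoting is preferred -->
<img src=dog.gif alt=Dog height=100 width=100>
```

The quoted form is the one that carries over unchanged to XHTML, apart from the trailing slash needed on empty elements.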

  • In the case of XHTML, which is a form of HTML that is based upon the syntax rules of XML, we really don’t see many major changes yet in our example.
  • The preceding examples use some of the most common elements used in (X)HTML documents, including:
    • The <!DOCTYPE> statement, which indicates the particular version of HTML or XHTML being used in the document.
    • The first example uses the strict 4.01 specification, the second uses a reduced form for HTML5, and the final example uses the XHTML 1.0 strict specification.
    • The <html>, <head>, and <body> tag pairs are used to specify the general structure of the document.
    • The required inclusion of the xmlns attribute in the <html> tag is a small difference required by XHTML.
    • The <meta> tag used in the examples indicates the MIME type of the document and the character set in use.
    • Notice that in the XHTML example, the meta element has a trailing slash to indicate that it is an empty element.
    • The <title> and </title> tag pair specifies the title of the document, which generally appears in the title bar of the Web browser.
    • A comment is specified by <!-- -->, allowing page authors to provide notes for future reference.
    • The <h1> and </h1> header tag pair indicates a headline specifying some important information.
    • The <hr> tag, which has a self-identifying end tag (<hr />) under XHTML, inserts a horizontal rule, or bar, across the screen.
    • The <p> and </p> paragraph tag pair indicates a paragraph of text.
    • A special character is inserted using a named entity (&hearts;), which in this case inserts a heart dingbat character into the text.
    • The <em> and </em> tag pair surrounds a small piece of text to emphasize, which a browser typically renders in italics.
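The example documents that this list describes are not reproduced in these notes. The following is a hedged reconstruction of the XHTML 1.0 strict variant, assembled from the element descriptions above and from the page text that survives later in the notes. The title wording, the comment text, and the exact placement of the em element are assumptions.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<!-- The trailing slash marks meta as an empty element under XHTML -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Hello XHTML World</title>
<!-- Simple hello world example in XHTML 1.0 strict -->
</head>
<body>
<h1>Welcome to the World of XHTML</h1>
<hr />
<p>XHTML <em>really</em> isn't so hard!</p>
<p>Soon you will &hearts; using XHTML.</p>
<p>You can put lots of text here if you want.</p>
</body>
</html>
```

An HTML 4.01 strict version would differ mainly in the doctype line, the absence of the xmlns attribute, and the lack of trailing slashes on meta and hr.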

Viewing Markup Locally

  • Using a simple text editor, type in any one of the previous examples and save it with a filename such as “helloworld.html” or “helloworld.htm”; you can choose which file extension to use, .htm or .html, but whichever you pick for development, aim to be consistent.
  • After you save the example file on your local file system, open it in your Web browser by opening the File menu and choosing Open, Open Page, or Open File, depending on your browser.
  • Once your browser reads the file, it should render a page like the one shown here:
  • If for some reason you didn’t save your file with the appropriate extension, the browser shouldn’t attempt to interpret the HTML markup.
  • For example, notice here what happens when you try to open the content with a .txt extension:
  • If you want to make a change to the document, you could update the markup, save the file, go back to the browser, and click the Reload or Refresh button.
  • Sometimes the browser will still reload the page from its cache; if a page does not update correctly on reload, hold down the SHIFT key while clicking the Reload button, and the browser should refresh the page.

HTML and XHTML: Version History

  • Since its initial introduction in late 1991, HTML (and later its XML-based cousin, XHTML) has undergone many changes.
  • Interestingly, the first versions of HTML used to build the earliest Web pages lacked a rigorous definition.
  • Fortunately, by 1993 the Internet Engineering Task Force (IETF) began to standardize the language and later, in 1995, released the first real HTML standard in the form of HTML 2.0.
  • You will likely encounter more than just the latest style of markup for many years to come, so the table below presents a brief summary of the version history of HTML and XHTML.

HTML and XHTML DTDs: The Specifications Up Close

  • Contrary to the markup some Web developers seem to produce, both HTML and XHTML have very well-defined syntax.
  • All (X)HTML documents should follow a formal structure defined by the World Wide Web Consortium (W3C; www.w3.org), which is the primary organization that defines Web standards.
  • Traditionally, the W3C defined HTML as an application of the Standard Generalized Markup Language (SGML).
  • SGML is a technology used to define markup languages by specifying the allowed document structure in the form of a document type definition (DTD).
  • A DTD indicates the syntax that can be used for the various elements of a language such as HTML.
  • A snippet of the HTML 4.01 DTD defining the P element, which indicates a paragraph, is shown here:
    <!--=================== Paragraphs =====================================-->
    <!ELEMENT P - O (%inline;)*            -- paragraph -->
    <!ATTLIST P
      %attrs;                              -- %coreattrs, %i18n, %events --
      >
  • The first line is a comment indicating what is below it.
  • The second line defines the P element, indicating that it has a required start tag (<p>), as shown by the dash, and an optional close tag (</p>), as indicated by the O.
  • The type of content that is allowed to be placed within a P element is defined by the entity %inline, which acts here as a shorthand for various other elements and content.
  • This idea of only allowing some types of elements within other elements is called the content model.
  • If you further explore the specification to see what that %inline entity maps out to, you will see that it contains numerous other elements, such as EM, STRONG, and so on, as well as regular typed text.
  • The final declaration defines the attributes for a <p> tag as indicated by the entity %attrs, which then expands to a number of entities like %coreattrs, %i18n, and %events, which finally expand into a variety of attributes like id, class, style, title, lang, dir, onclick, ondblclick, and many more.

  • The aim here is for you to understand the syntax of SGML in a basic sense to support your understanding of how Web browsers treat markup.
  • As another example, look at the HTML 4.01 DTD’s definition of the HR element:
    <!ELEMENT HR - O EMPTY -- horizontal rule -->
    <!ATTLIST HR
      %attrs;                              -- %coreattrs, %i18n, %events --
      >
  • From this syntax fragment, note that the HR element has a start tag but does not require a close tag.
  • More interestingly, the element does not enclose any content, as indicated by the EMPTY designation.
  • It turns out to have the same set of attributes as the P element, as defined by the %attrs entity.
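To connect the DTD rules to everyday markup, here is a small sketch of what the P and HR definitions allow and disallow. The sentences are placeholder text, and the note about how a parser treats the nested paragraph describes typical behavior rather than anything stated in the notes.

```html
<!-- Allowed: inline elements such as em and strong inside a p element -->
<p>This is <em>emphasized</em> and <strong>strong</strong> text.</p>

<!-- Not allowed by the content model: a p element nested inside another p.
     Because the close tag is optional, a parser typically ends the first
     paragraph when it meets the second start tag. -->
<p>Outer paragraph <p>Inner paragraph</p>

<!-- hr is declared EMPTY: a start tag only, with no content and no close tag -->
<hr>
```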

Document Type Statements and Language Versions

  • (X)HTML documents should begin with a <!DOCTYPE> declaration.
  • This statement identifies the type of markup that is supposedly used in a document. For example,
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
  • indicates that we are using the transitional variation of HTML 4.01 that starts with a root element html.
  • In other words, an <html> tag will serve as the ultimate parent of all the content and elements within this document.
  • A <!DOCTYPE> declaration might get a bit more specific and specify the URI (Uniform Resource Identifier) of the DTD being used as shown here:
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
  • In the case of an XHTML document, the situation really isn’t much different, for example with the transitional variation of XHTML 1.0:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  • However, do note that the root html element here is lowercase, which hints at the case sensitivity found in XHTML.
  • There are numerous doctype declarations that are found in HTML and XHTML documents, as shown in the table below.
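For reference, the commonly encountered doctype declarations are listed below. This is a selection drawn from the W3C specifications, not a reproduction of the original table. The comments label each variant, and only one declaration belongs at the top of any given document.

```html
<!-- HTML 4.01 Strict -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<!-- HTML 4.01 Transitional -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<!-- HTML 4.01 Frameset -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">

<!-- XHTML 1.0 Strict -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<!-- XHTML 1.0 Transitional -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<!-- XHTML 1.1 -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<!-- HTML5 -->
<!DOCTYPE html>
```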

(X)HTML Document Structure

  • The DTDs define the allowed syntax for documents written in that version of (X)HTML.
  • The core structure of these documents is fairly similar.
  • Given the HTML 4.01 DTD, a basic document template can be derived from the specification, as shown here:
  • In this graphical representation, the <!DOCTYPE> indicator, as previously mentioned, shows the particular version of HTML being used, in this case 4.01 Transitional.
  • Within a root html element, the basic structure of a document reveals two elements: the head and the body.
  • The head element contains information and tags describing the document, such as its title, while the body element houses the document itself, with associated markup required to specify its structure.
  • HTML5 follows the same core structure but introduces differences.
  • HTML5 does not support standard frames, though it does preserve inline frames.
  • Here we’ll concentrate on a typical document structure and drill into each element until we reach the very characters displayed.
  • Roughly speaking, the structure of a non-framed (X)HTML document breaks out like so:
  • The following sections drill into each of the document structuring markup elements and explore what’s contained inside.
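The structure described above can be sketched as a bare HTML 4.01 Transitional template. The title and paragraph text are placeholders, and the meta line is an assumption carried over from the earlier element descriptions.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<!-- Information about the document: character set, title, and so on -->
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Page title goes here</title>
</head>
<body>
<!-- The document itself: block-level elements containing inline elements and text -->
<p>Document content goes here.</p>
</body>
</html>
```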

The Document Head

  • The information in the head element of an (X)HTML document is very important because it is used to describe or augment the content of the document.
  • The head element acts like the front matter or cover page of a document.
  • In many cases, the information contained within the head element is information about the page that is useful for visual styling, defining interactivity, setting the page title, and providing other useful information that describes or controls the document.

The title Element

  • A single title element is required in the head element and is used to set the text that most browsers display in their title bar.
  • The value within a title is also used in a browser’s history system, recorded when the page is bookmarked, and consulted by search engine robots to help determine page meaning.
  • In short, it is pretty important to have a syntactically correct, descriptive, and appropriate page title.
  • Thus, given <title>Simple HTML Title Example</title>, you will see something like this:
  • When a title is not specified, most browsers display the URL path or filename instead:
  • Only one title element should appear in every document, and most user agents will ignore subsequent tag instances.
  • You should be quite careful about making sure a title element is well formed because omitting the close tag can cause many browsers to not load the document.
  • A recent version of Opera reveals what is likely happening in this situation:
  • A document title may contain standard text, but markup isn’t interpreted in a <title> tag, as shown here:
  • However, character entities such as &copy; (or, alternatively, &#169;), which specify a copyright symbol, are allowed in a title:
    <title>Simple HTML Title Example, &copy; 2010 WebMonopoly, Inc.</title>
  • For an entity to be displayed properly, you need to make sure the appropriate character set is defined and that the browser supports such characters; otherwise, you may see boxes or other odd symbols in your title.
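The title rules above can be illustrated with two alternative head sections. This is a sketch only: the first title text is a made-up example, and the meta line shows one common way of declaring the character set.

```html
<!-- Markup inside title is not interpreted: the b tags appear literally in the title bar -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title><b>Bold</b> Title Attempt</title>
</head>

<!-- Character entities are interpreted: the title bar shows a copyright symbol -->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Simple HTML Title Example, &copy; 2010 WebMonopoly, Inc.</title>
</head>
```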

Comments

  • Finally, comments are often found in the head of a document.
  • Following SGML syntax, a comment starts with <!-- and ends with -->, and may encompass many lines.
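A minimal sketch of single-line and multi-line comments in a document head follows. The title and comment wording are placeholders.

```html
<head>
<title>Comment Example</title>
<!-- A single-line comment -->
<!--
  A comment may also span
  several lines, like this one.
-->
</head>
```

Avoid placing a double hyphen inside the body of a comment, since that sequence is part of the comment delimiter and can produce a malformed comment.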

The Document Body

  • After the head section, the body of a document is delimited by <body> and </body>.
  • Under the HTML 4.01 specification and many browsers, the body element is optional, but you should always include it, particularly because it is required in stricter markup variants.
  • Only one body element can appear per document.
  • Within the body of a Web document is a variety of types of elements.
  • For example, block-level elements define structural content blocks such as paragraphs (p) or headings (h1-h6).
  • Block-level elements generally introduce line breaks visually.
  • Within nonempty blocks, inline elements are found.
  • There are numerous inline elements, such as bold (b), italic (i), strong (strong), emphasis (em), and numerous others.
  • Other miscellaneous types of elements, including those that reference other objects such as images (img), are also generally found within blocks, though in some versions of HTML they can stand on their own.
  • Typed text may include special characters that are difficult to insert from the keyboard or require special encoding.
  • To use such characters in an HTML document, they must be “escaped” by using a special code.
  • All character codes take the form &code;, where code is a word or numeric code indicating the actual character that you want to put onscreen.
  • For example, when adding a less-than symbol (<) you could use &lt; or &#60;.
  • A visual overview of all the items presented in the body is shown here:
  • The full syntax of the elements allowed in the body element is a bit more involved than the full syntax of the head.
  • This diagram shows what is directly included in the body:
  • These parse trees, often called DOM (Document Object Model) trees, are the browsers’ interpretation of the markup provided and are integral to determining how to render the page visually using both default (X)HTML style and any CSS attached.
  • JavaScript will also use this parse tree when scripts attempt to manipulate the document.
  • The parse tree serves as the skeleton of the page, so making sure that it is correct is quite important, but sadly we’ll see very often it isn’t.
  • Browsers are actually quite permissive in what they will render.
  • For example, consider the following markup:

    Hello HTML World
    Welcome to the World of HTML
    HTML really isn't so hard!
    Soon you will ♥ using HTML.
    You can put lots of text here if you want. We could go on and on with fake text for you to read, but let's get back to the book.
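The tags of this example did not survive in the notes. The following is a hedged reconstruction of the kind of sloppy markup the text describes, with a missing doctype and opening tags, no encoding declaration, a malformed comment, inconsistent casing, unclosed paragraphs, and an unknown foo element. The exact placement of each mistake is an assumption.

```html
<TITLE>Hello HTML World</title>
<!-- Simple hello malformed world -- example -->
</head>
<body>
<h1>Welcome to the World of HTML</H1>
<hr />
<p>HTML <eM>really</Em> isn't so hard!
<P>Soon you will &hearts; using HTML.
<p>You can put lots of text here if you want. We could go on and on with fake text for you to read, <foo>but</foo> let's get back to the book.
</html>
```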

  • This example misses important tags, doesn’t specify encoding types, has a malformed comment, uses inconsistent casing, doesn’t close tags, and even uses an unknown element, foo.
  • However, this will render exactly the same visually as the correct markup previously presented, as shown in the figure.
  • Now if you look at the parse tree formed by the browser, you will note that many of the mistakes appear to be magically fixed by the browser: