Distributed Document-Based Systems: Understanding HTML, HTTP, and Web Architecture, Study notes of Operating Systems

An exploration of distributed document-based systems, focusing on HTML document types, HTTP connections, and the architectural overview of clients and servers. Also covers MIME types, server-side scripts, and HTTP methods, as well as the organization of Apache Web servers and server clusters.


Uploaded on 09/17/2009


Distributed Document-Based Systems

COP 6611 Advanced Operating System

Chi Zhang

[email protected]

The World Wide Web

Overall organization of the Web.

HTML ⇒ HTTP ⇒ TCP

HTTP is a stateless application-layer protocol

Document Types

Six top-level MIME types and some common subtypes.

e.g. text/html, application/pdf

Type         Subtype       Description
Text         Plain         Unformatted text
             HTML          Text including HTML markup commands
             XML           Text including XML markup commands
Image        GIF           Still image in GIF format
             JPEG          Still image in JPEG format
Audio        Basic         Audio, 8-bit PCM sampled at 8000 Hz
             Tone          A specific audible tone
Video        MPEG          Movie in MPEG format
             Pointer       Representation of a pointer device for presentations
Application  Octet-stream  An uninterrupted byte sequence
             Postscript    A printable document in Postscript
             PDF           A printable document in PDF
Multipart    Mixed         Independent parts in the specified order
             Parallel      Parts must be viewed simultaneously
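As a small illustration of the type/subtype structure in the table, the sketch below uses Python's standard mimetypes module to guess a MIME type from a file name and split it into its two parts. The helper name split_mime and the file names are assumptions made for illustration only.

```python
import mimetypes


def split_mime(mime_type):
    """Split 'type/subtype' into (type, subtype), lowercased."""
    top_level, _, subtype = mime_type.lower().partition("/")
    return top_level, subtype


# Hypothetical file names; guess_type looks only at the extension.
for name in ["index.html", "paper.pdf", "logo.gif"]:
    guessed, _encoding = mimetypes.guess_type(name)
    print(name, "->", split_mime(guessed))
```

MIME type names are case-insensitive, which is why the helper lowercases its input before splitting.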

Architectural Overview (1)

The principle of using server-side CGI programs.

Server-side script

An HTML document containing a script (here JavaScript) that is executed by the server.

A server-side application can also be a servlet: servlets run as threads of the server, while CGI scripts run in separate processes.

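[Listing, numbered lines (1) to (16): an HTML document with an embedded server-side script. The markup and the script lines between (5) and (13) did not survive extraction. The surviving text reads "The current content of /data/file.txt is:" and "Thank you for visiting this site."]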
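The remark that CGI scripts run in separate processes can be sketched as follows. This is a minimal illustration, not how any particular Web server is implemented: a stand-in CGI program is started as a child process, receives the request through environment variables (REQUEST_METHOD and QUERY_STRING are standard CGI variables), and writes headers, a blank line, and a body to standard output. The names CGI_PROGRAM and run_cgi and the query string are assumptions.

```python
import os
import subprocess
import sys

# Stand-in for a CGI program: it reads request data from environment
# variables and writes a header, a blank line, and a body to stdout.
CGI_PROGRAM = r'''
import os
query = os.environ.get("QUERY_STRING", "")
print("Content-Type: text/plain")
print()
print("query was: " + query)
'''


def run_cgi(query_string):
    """Run the CGI stand-in in a separate process, one process per request."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        env=env, capture_output=True, text=True, check=True,
    )
    # The blank line separates the headers from the document body.
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body.strip()


if __name__ == "__main__":
    print(run_cgi("file=/data/file.txt"))
```

A servlet-style design would instead call a function inside the server process (typically on one of its threads), avoiding the cost of creating a process for every request.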

HTTP Connections

a) Using nonpersistent connections.

b) Using persistent connections (HTTP 1.1 or later)
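The difference between the two connection styles can be observed with a throwaway local server. The sketch below, built only on Python's standard http.server and http.client modules, counts how many TCP connections the server accepts when the same documents are fetched with and without connection reuse. The class and function names and the request paths are assumptions for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def setup(self):
        super().setup()
        self.server.connections += 1  # setup() runs once per TCP connection

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def count_connections(paths, persistent):
    """Fetch all paths from a local server; return the TCP connections used."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    server.connections = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = None
        for path in paths:
            if conn is None:
                conn = http.client.HTTPConnection(host, port, timeout=5)
            conn.request("GET", path)
            conn.getresponse().read()  # drain before reusing the connection
            if not persistent:
                conn.close()  # nonpersistent: one connection per request
                conn = None
        if conn is not None:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.connections


if __name__ == "__main__":
    docs = ["/index.html", "/logo.gif", "/style.css"]
    print("nonpersistent:", count_connections(docs, persistent=False))
    print("persistent:   ", count_connections(docs, persistent=True))
```

With three requests, the nonpersistent run sets up three connections and the persistent run sets up one, which is the saving that persistent connections are meant to provide.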

HTTP Methods

Request operations supported by HTTP.

Operation  Description
Head       Request to return the header of a document
Get        Request to return a document to the client
Put        Request to store a document at a certain location
Post       Provide data to be passed to a document (e.g. a CGI script)
Delete     Request to delete a document

HTTP Messages (1)

HTTP request message

Reference: URL
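A request message starts with a request line naming the operation, the reference, and the protocol version, followed by headers, a blank line, and an optional body. The sketch below assembles such a message by hand to make the layout visible; the function name build_request and the host www.example.org are assumptions, and real clients would normally leave this to an HTTP library.

```python
def build_request(method, reference, host, body=b""):
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {reference} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Lines end in CRLF; an empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


print(build_request("GET", "/index.html", "www.example.org").decode())
```

Swapping the first argument for HEAD, PUT, POST, or DELETE changes only the request line; PUT and POST would usually carry a body as well.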

Clients (1)

Using a plug-in in a Web browser.

A plug-in is a small program that can be dynamically loaded into a browser for handling a specific document (MIME) type.

The interfaces are standardized.

Clients (2)

Using a Web proxy when the browser does not speak FTP.

A Web proxy can be shared by a number of browsers.

Servers

General organization of the Apache Web server.

Apache servers are highly configurable: modules can be incorporated. Each module can provide one or more handlers that can assist in processing an incoming HTTP request.

Server Clusters (1)

A transport-layer switch passes the data of a TCP connection to one of the servers, depending on some measurement of the server's load.

With content-aware distribution, the front end also takes the content of the HTTP request into account when choosing a server.
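The two dispatching policies can be contrasted in a few lines. This is a sketch under stated assumptions: the load figures and server names are made up, and hashing the requested path is just one possible content-aware rule (it sends repeated requests for the same document to the same server).

```python
import zlib


def pick_by_load(loads):
    """Transport-layer switch: choose the server reporting the lowest load."""
    return min(loads, key=loads.get)


def pick_by_content(path, servers):
    """Content-aware front end: map the requested path to a fixed server."""
    return servers[zlib.crc32(path.encode()) % len(servers)]


loads = {"server-a": 0.7, "server-b": 0.2, "server-c": 0.5}  # made-up figures
print(pick_by_load(loads))  # server-b

servers = sorted(loads)
print(pick_by_content("/images/logo.gif", servers))
```

The load-based choice can be made as soon as the TCP connection arrives, whereas the content-based choice requires the front end to read the HTTP request first.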

Caching and Proxy

• A proxy sends a conditional HTTP request (with the header If-Modified-Since) to a server.

• To improve performance at the cost of weaker consistency, the Squid Web proxy assigns each cached document an expiration time $T_{\text{expire}} = \alpha\,(T_{\text{cached}} - T_{\text{last-modified}}) + T_{\text{cached}}$.

• Push-based mechanisms and leases

• Active cache: in some cases, it is possible to shift generation of the document from the server to the proxy.
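The expiration rule above is easy to work through numerically. The sketch below is only an illustration of the formula, not Squid's actual code; the function names, the timestamps, and the value of alpha are assumptions.

```python
def expiration_time(t_cached, t_last_modified, alpha):
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now, t_cached, t_last_modified, alpha):
    """True while the cached copy may be served without asking the server."""
    return now < expiration_time(t_cached, t_last_modified, alpha)


# Made-up timestamps in seconds; alpha = 0.25 is an arbitrary choice.
t_expire = expiration_time(t_cached=2000, t_last_modified=1000, alpha=0.25)
print(t_expire)  # 2250.0
print(is_fresh(2100, 2000, 1000, 0.25))  # True
print(is_fresh(2300, 2000, 1000, 0.25))  # False
```

A document that had been unchanged for a long time when it was cached gets a later expiration time. Once the time has passed, the proxy can fall back on a conditional request with If-Modified-Since.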

Cooperative Caching

The principle of cooperative caching

Akamai CDN (1)

• A main HTML document may contain several other documents such as images, video, and audio.

• Embedded documents are large

• Embedded documents rarely change

• Cache the embedded documents

• In the main HTML document, URLs to the embedded documents actually refer to the copies cached in the CDN.

• The CDN DNS returns the IP address of the CDN server closest to the client, or of one with less load.

• Alternative: assign the same IP address to several servers, and let the network layer direct the request to the nearest server.
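The server selection performed by the CDN DNS can be sketched as a simple ranking. The policy shown here (nearest server first, lower load as the tie-breaker) is an assumption, as are the function name, the distance and load figures, and the addresses, which come from a range reserved for documentation.

```python
def resolve(client_region, servers):
    """Return the IP of the nearest CDN server, breaking ties by lower load."""
    best = min(servers, key=lambda s: (s["distance"][client_region], s["load"]))
    return best["ip"]


# Hypothetical server table for two client regions.
SERVERS = [
    {"ip": "192.0.2.10", "load": 0.8, "distance": {"eu": 1, "us": 5}},
    {"ip": "192.0.2.20", "load": 0.3, "distance": {"eu": 1, "us": 2}},
    {"ip": "192.0.2.30", "load": 0.1, "distance": {"eu": 6, "us": 2}},
]

print(resolve("eu", SERVERS))  # 192.0.2.20
print(resolve("us", SERVERS))  # 192.0.2.30
```

Because the choice is made at name-resolution time, the client needs no special support: it simply connects to whichever address the DNS lookup returns.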

Akamai CDN (2)

The principal working of the Akamai CDN.