Distributed Document-Based Systems
COP 6611 Advanced Operating System
Chi Zhang
The World Wide Web
Overall organization of the Web.
HTML ⇒ HTTP ⇒ TCP
HTTP is a stateless application-layer protocol
Document Types
Six top-level MIME types and some common subtypes.
e.g., text/html, application/pdf
Type         Subtype       Description
Text         Plain         Unformatted text
             HTML          Text including HTML markup commands
             XML           Text including XML markup commands
Image        GIF           Still image in GIF format
             JPEG          Still image in JPEG format
Audio        Basic         Audio, 8-bit PCM sampled at 8000 Hz
             Tone          A specific audible tone
Video        MPEG          Movie in MPEG format
             Pointer       Representation of a pointer device for presentations
Application  Octet-stream  An uninterrupted byte sequence
             Postscript    A printable document in Postscript
             PDF           A printable document in PDF
Multipart    Mixed         Independent parts in the specified order
             Parallel      Parts must be viewed simultaneously
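As a small illustration of the type/subtype naming above, the following sketch uses Python's standard mimetypes module to guess the MIME type of a file from its name. The helper names mime_type_of and split_mime and the file names are made up for this example; falling back to application/octet-stream for unknown extensions is an assumption of the sketch, not something the slides prescribe.

```python
import mimetypes


def mime_type_of(filename):
    """Guess a MIME type from the file name (hypothetical helper)."""
    mime, _encoding = mimetypes.guess_type(filename)
    # Assumption: treat anything unknown as an uninterpreted byte sequence.
    return mime or "application/octet-stream"


def split_mime(mime):
    """Split 'type/subtype' into its top-level type and subtype."""
    top_level, _, subtype = mime.partition("/")
    return top_level, subtype


if __name__ == "__main__":
    for name in ("index.html", "paper.pdf", "photo.gif"):
        print(name, "->", split_mime(mime_type_of(name)))
```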
Architectural Overview (1)
The principle of using server-side CGI programs.
Server-side script
An HTML document containing a JavaScript to be executed by the server
Also, server-side application: servlet (servlets run as threads of the server, while
CGI scripts run in separate processes)
[Listing: an HTML document with an embedded server-side script, numbered lines (1) to (16). Surviving text: "The current content of /data/file.txt is:" and "Thank you for visiting this site."]
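The difference noted above between CGI programs (separate process) and servlets (thread of the server) can be sketched roughly as follows. This is only an illustration under stated assumptions: CGI_PROGRAM is a made-up stand-in for a CGI script, and handle_with_cgi and handle_with_servlet are hypothetical names. The one real convention used is that CGI passes request data to the program through environment variables such as QUERY_STRING.

```python
import os
import subprocess
import sys
import threading

# Made-up stand-in for a CGI program: reads the request from the
# environment and writes a header, a blank line, and a body to stdout.
CGI_PROGRAM = """
import os
print("Content-Type: text/plain")
print()
print("query=" + os.environ["QUERY_STRING"] + " pid=" + str(os.getpid()))
"""


def handle_with_cgi(query):
    """CGI style: start a separate process for the request."""
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query)
    done = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        env=env, capture_output=True, text=True, check=True,
    )
    return done.stdout


def handle_with_servlet(query):
    """Servlet style: run the handler as a thread inside the server process."""
    result = []

    def service():
        result.append(f"query={query} pid={os.getpid()}")

    worker = threading.Thread(target=service)
    worker.start()
    worker.join()
    return result[0]


if __name__ == "__main__":
    print(handle_with_cgi("x=1"))       # pid differs from the server's pid
    print(handle_with_servlet("x=1"))   # pid equals the server's pid
```

The process identifiers in the output make the distinction visible: the CGI-style handler reports a different pid than the server, while the thread-based handler reports the server's own pid.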
HTTP Connections
a) Using nonpersistent connections.
b) Using persistent connections (HTTP 1.1 or later)
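The two connection styles can be compared with a small local experiment. The sketch below starts a throwaway HTTP/1.1 server from Python's standard library on the loopback interface and counts how many TCP connections a client opens to fetch several documents. The names CountingHandler, fetch_nonpersistent, fetch_persistent, and count_connections are invented for this example, and the request paths are arbitrary.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connections = 0                # TCP connections accepted so far

    def setup(self):
        # setup() runs once per accepted TCP connection, not per request.
        CountingHandler.connections += 1
        super().setup()

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_nonpersistent(host, port, paths):
    """Open a new TCP connection for every request."""
    bodies = []
    for path in paths:
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", path, headers={"Connection": "close"})
        bodies.append(conn.getresponse().read())
        conn.close()
    return bodies


def fetch_persistent(host, port, paths):
    """Issue all requests over one TCP connection."""
    conn = http.client.HTTPConnection(host, port)
    bodies = []
    for path in paths:
        conn.request("GET", path)
        bodies.append(conn.getresponse().read())
    conn.close()
    return bodies


def count_connections(fetch, paths):
    """Run `fetch` against a local server and report connections used."""
    server = HTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    CountingHandler.connections = 0
    try:
        fetch("127.0.0.1", server.server_port, paths)
        return CountingHandler.connections
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    paths = ["/a", "/b", "/c"]
    print("nonpersistent:", count_connections(fetch_nonpersistent, paths))
    print("persistent:   ", count_connections(fetch_persistent, paths))
```

With three requests, the nonpersistent client pays the connection setup cost three times, while the persistent client pays it once.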
HTTP Methods
Request Operations supported by HTTP.
Operation  Description
Head       Request to return the header of a document
Get        Request to return a document to the client
Put        Request to store a document at a certain location
Post       Provide data that is to be put to a document (e.g. CGI script)
Delete     Request to delete a document
HTTP Messages (1)
HTTP request message
Reference: URL
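To make the layout of a request message concrete, here is a minimal sketch that assembles one as bytes: a request line (operation, reference, version), header lines, an empty line, and an optional body. The function name build_request and the host www.example.com are placeholders chosen for this example.

```python
def build_request(method, path, host, headers=None, body=b""):
    """Assemble an HTTP/1.1 request message (illustrative helper)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Header section ends with an empty line; lines are separated by CRLF.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", "www.example.com").decode())
```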
Clients (1)
Using a plug-in in a Web browser.
A plug-in is a small program that can be dynamically loaded into a
browser for handling a specific document (MIME) type.
The interfaces are standardized.
Clients (2)
Using a Web proxy when the browser does not speak FTP.
A Web proxy can be shared by a number of browsers.
Servers
General organization of the Apache Web server.
Apache servers are highly configurable: modules can be
incorporated. Each module can provide one or more
handlers that can assist in processing an incoming HTTP
request.
Server Clusters (1)
A transport-layer switch passes the data of a TCP connection to one
of the servers, depending on some measurement of the server’s
load.
With content-aware distribution, the front end also takes the
content of the HTTP request into account when choosing a server.
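The two dispatching policies can be sketched as follows. This is a simplified model, not how any particular switch is implemented: the load table, the server names, and the prefix-to-servers placement map are all assumptions made for the example.

```python
def pick_by_load(loads):
    """Transport-layer style: choose the server with the lowest load."""
    return min(loads, key=loads.get)


def pick_by_content(path, loads, placement):
    """Content-aware style: inspect the requested path first.

    `placement` maps a path prefix to the servers assumed to hold that
    content; among those, the least loaded one is chosen.
    """
    for prefix, servers in placement.items():
        if path.startswith(prefix):
            return min(servers, key=loads.__getitem__)
    # No rule matches: fall back to plain load-based selection.
    return pick_by_load(loads)


if __name__ == "__main__":
    loads = {"s1": 0.9, "s2": 0.2, "s3": 0.5}   # hypothetical load figures
    placement = {"/images/": ["s1", "s3"]}      # hypothetical placement
    print(pick_by_load(loads))
    print(pick_by_content("/images/logo.gif", loads, placement))
```

Note that the content-aware front end has to read the HTTP request before it can choose a server, whereas a transport-layer switch can decide when the TCP connection is set up.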
Caching and Proxy
A proxy sends a conditional HTTP request (with the
header If-Modified-Since) to a server.
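The server-side decision for such a conditional request can be sketched as below: if the document has not changed since the date the proxy supplies, the server answers 304 (Not Modified) and the proxy keeps using its cached copy; otherwise it answers 200 with the new document. The function name status_for_conditional_get is hypothetical, and only the status code is modeled.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def status_for_conditional_get(if_modified_since, last_modified):
    """Return 304 if the cached copy is still valid, else 200.

    `if_modified_since` is the header value (or None if absent);
    `last_modified` is a timezone-aware datetime for the document.
    """
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        # HTTP dates have one-second resolution.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


if __name__ == "__main__":
    cached_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    header = format_datetime(cached_at, usegmt=True)
    print(header)
    print(status_for_conditional_get(header, cached_at))
    print(status_for_conditional_get(header, cached_at + timedelta(hours=1)))
```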
To improve performance at the cost of weak
consistency, the Squid Web proxy assigns
$T_{\text{expire}} = \alpha\,(T_{\text{cached}} - T_{\text{last-modified}}) + T_{\text{cached}}$
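The expiration rule above is easy to evaluate directly. In the sketch below, times are plain numbers (for instance seconds), and the default value of alpha is an arbitrary choice for illustration, not a value taken from the slides. The intuition is that a document that had not changed for a long time when it was cached is assumed to stay valid for a proportionally long time.

```python
def squid_expiration_time(t_cached, t_last_modified, alpha=0.2):
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    age_when_cached = t_cached - t_last_modified
    return alpha * age_when_cached + t_cached


def is_fresh(now, t_cached, t_last_modified, alpha=0.2):
    """True while the cached copy may be served without revalidation."""
    return now < squid_expiration_time(t_cached, t_last_modified, alpha)


if __name__ == "__main__":
    # Document last modified at t=0, cached at t=1000, alpha=0.5.
    print(squid_expiration_time(1000.0, 0.0, alpha=0.5))
    print(is_fresh(1400.0, 1000.0, 0.0, alpha=0.5))
```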
Push-based mechanism and Leases
Active cache: In some cases, it is possible to shift
generation of the document from the server to the
proxy.
Cooperative Caching
The principle of cooperative caching
Akamai CDN (1)
A main HTML document may refer to several embedded
documents such as images, video, and audio.
Embedded documents are large
Embedded documents rarely change
Cache the embedded documents
In the main HTML, URLs to the embedded documents
actually refer to the pages cached in CDN.
The CDN DNS returns the IP address of the CDN server that is
closest to the client or that has the lowest load.
Alternative: assign the same IP address to several servers, and
let the network layer direct the request to the nearest server.
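The URL rewriting step described above might look roughly like this. The CDN host name cdn.example.net and the URL layout (origin host embedded in the path) are invented for the example and do not describe Akamai's actual naming scheme; only src attributes in double quotes are rewritten, so links to other main documents stay with the origin server.

```python
import re
from urllib.parse import urljoin, urlsplit

CDN_HOST = "cdn.example.net"  # hypothetical CDN host name


def rewrite_embedded_urls(html, origin_url):
    """Point embedded documents (src="...") at the CDN instead of the origin."""

    def to_cdn(match):
        # Resolve relative references against the main document's URL.
        absolute = urljoin(origin_url, match.group(2))
        parts = urlsplit(absolute)
        return f'{match.group(1)}"http://{CDN_HOST}/{parts.netloc}{parts.path}"'

    return re.sub(r'(\bsrc=)"([^"]*)"', to_cdn, html)


if __name__ == "__main__":
    page = '<img src="logo.gif"><a href="next.html">next</a>'
    print(rewrite_embedded_urls(page, "http://www.example.com/index.html"))
```

When the browser later resolves the CDN host name, the CDN's DNS can answer with the address of a nearby or lightly loaded server, as described above.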
Akamai CDN (2)
The working principle of the Akamai CDN.