Lecture Slides on Web and HTTP - Computer Network Programming | CSCE 515, Study notes of Computer Science

Material Type: Notes; Professor: Xu; Class: COMPUTR NETWRK PROGRAMNG; Subject: Computer Science & Engineering; University: University of South Carolina - Columbia; Term: Fall 2008;


Uploaded on 10/01/2009

CSCE 515: Computer Network Programming
Web & HTTP
Wenyuan Xu
Department of Computer Science and Engineering
University of South Carolina
CSCE515 – Computer Network Programming
WWW History
What is WWW?
An architectural framework for accessing linked documents spread over millions of machines
1989-1990 – Tim Berners-Lee invents the World Wide Web at CERN
CERN: European Organization for Nuclear Research
Means for distributing high-energy physics data
Means for transferring text and graphics simultaneously
Client/Server data transfer protocol
Communication via application level protocol
System ran on top of standard networking infrastructure
Established a common language for sharing information on computers
Text markup language
Simple and easy to use
Requires a client application to render text/graphics
WWW History contd.
1993 – Marc Andreessen and Eric Bina develop Mosaic at the National Center for Supercomputing Applications (NCSA)
First widely used graphical browser
Internet’s first “killer app”
Freely distributed
Andreessen went on to co-found Netscape in 1994
1995 (approx.) – Web traffic becomes dominant
Exponential growth
E-commerce
Web infrastructure companies
World Wide Web Consortium
How does the web work?
User input:
URL
Hypertext link/Hyperlink
Web browser
Gets the IP address of the server (via DNS)
Makes a TCP connection to port 80 (the default HTTP port) on the server
Sends an HTTP request to the web server
Receives the requested files from the web server
Releases the TCP connection
Renders the page onto the screen as specified by its HTML or other web languages
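The browser steps above map one-to-one onto socket calls. Below is a minimal Python sketch. It assumes a plain HTTP server and an HTTP/1.0 request, so the server closes the connection after its reply. The helper names `build_request` and `fetch` are hypothetical. Only the first address returned by DNS is tried, and rendering is left out.

```python
import socket

def build_request(host, path="/"):
    """Format a minimal HTTP/1.0 GET request: request line, Host header, blank line."""
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "\r\n").encode("ascii")

def fetch(host, path="/", port=80):
    # Step 1: get the server's IP address via DNS (first result only).
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # Step 2: make a TCP connection to the server's port.
    with socket.socket(family, socktype, proto) as sock:
        sock.connect(addr)
        # Step 3: send the HTTP request.
        sock.sendall(build_request(host, path))
        # Step 4: receive the reply; an HTTP/1.0 server closes when it is done.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # Step 5: leaving the "with" block closed the socket, releasing the connection.
    return b"".join(chunks)   # Step 6 (rendering) is up to the caller
```

For example, `fetch("www.cse.sc.edu", "/")` would return the raw reply (status line, headers, blank line, content), which a real browser would then parse and render.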
WWW Components
Structural Components
Clients/browsers – two dominant implementations
Servers – run on sophisticated hardware
Caches – many interesting implementations
Internet – the global infrastructure which facilitates data transfer
Semantic Components
Hypertext Transfer Protocol (HTTP)
HyperText Markup Language (HTML)
eXtensible Markup Language (XML)
Uniform Resource Identifiers (URIs)
URI: Uniform Resource Identifiers
URIs are defined in RFC 2396 (since obsoleted by RFC 3986).
They provide a simple and extensible means for identifying a resource.
Absolute URI: scheme://hostname[:port]/path
http://www.cse.sc.edu:80/foo/blah
ftp://ftp.is.co.za/rfc/rfc1808.txt
mailto:[email protected]
Relative URI: /path
/foo/blah
No server mentioned
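Python's standard `urllib.parse` module implements this URI syntax, which makes the absolute/relative distinction easy to see. The following is a small sketch. The base URI `http://www.cse.sc.edu/index.html` is an assumed example of the page on which the relative link `/foo/blah` appears.

```python
from urllib.parse import urljoin, urlsplit

# An absolute URI carries the scheme, the server (and optional port), and the path.
parts = urlsplit("http://www.cse.sc.edu:80/foo/blah")
print(parts.scheme, parts.hostname, parts.port, parts.path)
# prints: http www.cse.sc.edu 80 /foo/blah

# A relative URI names no server, so the client resolves it against the URI
# of the document that contained it (the base URI).
base = "http://www.cse.sc.edu/index.html"   # assumed example base
print(urljoin(base, "/foo/blah"))
# prints: http://www.cse.sc.edu/foo/blah
```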


[Figure: how the path /foo/blah is located in the server's directory tree]


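URL vs. URI?
The most popular form of URI is the Uniform Resource Locator (URL).
What is the difference between a URL and a URI?
URI = URL + URN: URLs and URNs are both kinds of URI. A URL says where a resource is and how to get it; a URN only names it.
URN: Uniform Resource Name
urn:isbn:0-395-36341-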

HTTP: Hypertext Transfer Protocol

Refs:

RFC 1945 (HTTP 1.0)

RFC 2616 (HTTP 1.1)


HTTP Basics
HTTP is the protocol that supports communication between web browsers and web servers.
A “Web Server” is an HTTP server.
Most clients/servers today speak version 1.1, but 1.0 is also in use.

From the RFC
“HTTP is an application-level protocol with the lightness and speed necessary for distributed, collaborative, hypermedia information systems.”
Transport Independence
The RFC states that HTTP communication generally takes place over a TCP connection, but the protocol itself does not depend on a specific transport layer.

Request - Response
HTTP has a simple structure:
client sends a request
server returns a reply.
HTTP can support multiple request-reply exchanges over a single TCP connection.


More Methods
TRACE: used to trace HTTP forwarding through proxies, tunnels, etc.
OPTIONS: used to determine the capabilities of the server, or characteristics of a named resource.


Common Usage
GET, HEAD and POST are supported everywhere.
HTTP 1.1 servers often support PUT, DELETE, OPTIONS & TRACE.


HTTP Version Number
"HTTP/1.0" or "HTTP/1.1"
HTTP 0.9 did not include a version number in a request line.
If a server gets a request line with no HTTP version number, it assumes HTTP/0.9.
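A server can apply this rule while it splits the request line. The following is a Python sketch. `parse_request_line` is a hypothetical helper, and a production server would also validate the method and the URI.

```python
def parse_request_line(line):
    """Split a request line into (method, uri, version).

    A request line without a version is treated as HTTP/0.9.
    """
    parts = line.rstrip("\r\n").split()
    if len(parts) == 2:                     # HTTP/0.9 style: "GET /path"
        method, uri = parts
        return method, uri, "HTTP/0.9"
    if len(parts) == 3:                     # "GET /path HTTP/1.x"
        method, uri, version = parts
        if not version.startswith("HTTP/"):
            raise ValueError(f"bad version: {version!r}")
        return method, uri, version
    raise ValueError(f"malformed request line: {line!r}")
```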


The Header Lines
After the Request-Line come a number (possibly zero) of HTTP header lines.
Each header line contains an attribute name followed by a “:”, a space, and the attribute value.
The Name and Value are just text, e.g. Host: www.sc.edu
Request headers provide information to the server about the client:
what kind of client
what kind of content will be accepted
who is making the request
There can be 0 headers (HTTP 1.0).
HTTP 1.1 requires a Host: header.
[Figure: request layout: Request-Line, Headers..., blank line, Content...]
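Because each header is just `Name: value` text, parsing the header section takes only a few lines. The following is a Python sketch with hypothetical helper names. It lower-cases names, since HTTP header names are case-insensitive. For simplicity, a repeated header keeps only its last value.

```python
def parse_headers(lines):
    """Turn "Name: value" header lines into a dict keyed by lower-case name.

    Parsing stops at the first empty line, which ends the header section.
    """
    headers = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            break
        name, sep, value = line.partition(":")   # split at the first ":" only
        if not sep:
            raise ValueError(f"not a header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return headers

def missing_host(version, headers):
    """True if an HTTP/1.1 request lacks the Host header it is required to carry."""
    return version == "HTTP/1.1" and "host" not in headers
```

Splitting at the first colon only matters for values such as `Referer: http://foo.com/blah`, which contain colons of their own.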

Example HTTP Headers

Accept: text/html

Host: www.sc.edu

From: [email protected]

User-Agent: Mozilla/4.

Referer: http://foo.com/blah

End of the Headers
Each header line ends with a CRLF (\r\n).
The end of the header section is marked with a blank line: just CRLF.
For GET and HEAD requests, the end of the headers is the end of the request!
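On the wire, the blank line after the last header appears as the byte sequence CRLF CRLF, so a server can find the end of the header section by searching for it. The following is a Python sketch. `split_message` is a hypothetical helper. Decoding the header bytes as ISO-8859-1 is an assumption that keeps every byte representable.

```python
def split_message(raw):
    """Split raw request bytes into (header_lines, body) at the first blank line."""
    end = raw.find(b"\r\n\r\n")       # last header's CRLF followed by the blank line
    if end == -1:
        # The header section is not complete yet: keep reading from the socket.
        raise ValueError("header section not finished")
    head = raw[:end].decode("iso-8859-1")
    body = raw[end + 4:]              # empty for GET and HEAD requests
    return head.split("\r\n"), body
```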


POST
A POST request includes some content (some data) after the headers (after the blank line).
There is no format for the data (just raw bytes).
An HTTP 1.0 POST request must include a Content-Length line in the headers (HTTP 1.1 also allows chunked transfer coding):
Content-Length: 267
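The Content-Length value must be the size of the body in bytes, so it is safer to compute it than to type it in. The following is a Python sketch. `build_post` is hypothetical. The `application/x-www-form-urlencoded` content type is an assumption that matches HTML form submissions.

```python
from urllib.parse import urlencode

def build_post(host, path, body):
    """Build an HTTP/1.1 POST request whose Content-Length matches the body."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    head = (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"  # assumed: form data
            f"Content-Length: {len(body)}\r\n"                      # bytes, not characters
            "\r\n")                                                 # blank line ends headers
    return head.encode("ascii") + body

form = urlencode({"stuid": "6660182722", "item": "test1", "grade": "99"})
request = build_post("www.cse.sc.edu", "/~wxy/changegrade.cgi", form)
print(request.decode("ascii"))
```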


Example GET Request
GET /~wyxu/index.html HTTP/1.
Accept: */*
Host: www.cse.sc.edu
User-Agent: Internet Explorer
From: [email protected]
Referer: http://foo.com/

There is a blank line here!


Example POST Request
POST /~wxy/changegrade.cgi HTTP/1.1
Accept: */*
Host: www.cse.sc.edu
User-Agent: SecretAgent V2.3
Content-Length: 36
Referer: http://monte.cs.rpi.edu/blah

stuid=6660182722&item=test1&grade=99


Typical Method Usage

GET used to retrieve an HTML document.

HEAD used to find out if a document has changed.

POST used to submit a form.
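One way to use HEAD for this is to fetch only the headers and compare the `Last-Modified` date with the date of a cached copy. The following is a Python sketch. The helper names are hypothetical and port 80 is assumed. A reply without `Last-Modified` is treated as "changed".

```python
import http.client
from email.utils import parsedate_to_datetime

def head_last_modified(host, path="/", port=80):
    """Send a HEAD request and return the Last-Modified header (or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)     # same headers as a GET, but no body in the reply
        resp = conn.getresponse()
        resp.read()                    # empty for HEAD; finishes the response cleanly
        return resp.getheader("Last-Modified")
    finally:
        conn.close()

def has_changed(last_modified, cached_time):
    """Compare a Last-Modified value with the (timezone-aware) time of our copy.

    Without a Last-Modified header we cannot tell, so assume it changed.
    """
    if last_modified is None:
        return True
    return parsedate_to_datetime(last_modified) > cached_time
```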

HTTP Response
ASCII Status Line
Headers Section
Content can be anything (not just text), typically an HTML document or some kind of image.
[Figure: response layout: Status-Line, Headers..., blank line, Content...]

Response Status Line
HTTP-Version Status-Code Message, e.g. HTTP/1.1 200 OK
Status Code is a 3-digit number (for computers)
Message is text (for humans)
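The split between a machine-readable code and a human-readable message shows up directly in client code, which branches on the number and mostly ignores the text. The following is a Python sketch with hypothetical helper names. The class names follow the first-digit groups of HTTP status codes.

```python
STATUS_CLASSES = {1: "informational", 2: "success", 3: "redirection",
                  4: "client error", 5: "server error"}

def parse_status_line(line):
    """Split "HTTP-Version Status-Code Message" into (version, code, message)."""
    version, _, rest = line.rstrip("\r\n").partition(" ")
    code, _, message = rest.partition(" ")   # the message may itself contain spaces
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"bad status code in {line!r}")
    return version, int(code), message

def status_class(code):
    """Programs usually only need the first digit of the code."""
    return STATUS_CLASSES.get(code // 100, "unknown")
```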


Content type - More
application/pdf: PDF files
application/msword: Word files
audio/mpeg: MP3 or other MPEG audio
audio/x-wav: WAV audio
image/gif: GIF image
image/jpeg: JPEG JFIF image
image/tiff: Tag Image File Format
video/mpeg: MPEG-1 video with multiplexed audio
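A server has to pick one of these types for every file it sends, and a common shortcut is to look at the file extension. The following is a Python sketch using the standard `mimetypes` module. Falling back to `application/octet-stream` (a generic binary type) for unknown extensions is an assumption.

```python
import mimetypes

def content_type_for(filename):
    """Choose a Content-Type value from a file's extension."""
    ctype, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions fall back to a generic binary type.
    return ctype or "application/octet-stream"
```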

Web browser
text/html: displayed directly
If the MIME type is not one of the built-in ones, the browser consults its table of MIME types; the table associates each MIME type with a viewer.
To interpret a rapidly growing collection of file types, browsers use:
Plug-ins
Helper applications


Plug-in
A code module that the browser fetches from a special directory on the disk and installs as an extension to itself.
[Figure: on the client machine, the plug-in is loaded into the browser and talks to the browser's base code through an interface; the browser runs as a single process.]

Helper application
A complete program, running as a separate process.
[Figure: on the client machine, the browser and the helper application run as separate processes.]

Examples
application/pdf: PDF files → Acrobat is started automatically
application/msword: Word files → Microsoft Word
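The browser's decision (display the content itself, hand it to a helper, or give up) is essentially a table lookup keyed by MIME type. The following is a Python sketch. The tables and the function name are hypothetical. A tiny Python one-liner stands in for a real helper such as Acrobat, so the example runs anywhere. The sketch waits for the helper only to keep the demo simple; a browser would not block.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical set of types the "browser" renders itself.
BUILT_IN = {"text/html", "text/plain", "image/gif", "image/jpeg"}

# Hypothetical helper table: MIME type -> command line of a helper program.
# A Python one-liner stands in for Acrobat here.
HELPERS = {
    "application/pdf": [sys.executable, "-c",
                        "import sys; print('helper got', len(open(sys.argv[1], 'rb').read()), 'bytes')"],
}

def dispatch(mime_type, content):
    if mime_type in BUILT_IN:
        return "displayed by the browser itself"
    command = HELPERS.get(mime_type)
    if command is None:
        return "no viewer: offer to save the file"
    # Hand the content to the helper through a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(content)
        path = f.name
    try:
        # The helper is a complete program running as a separate process.
        result = subprocess.run(command + [path], capture_output=True, text=True, check=True)
        return result.stdout.strip()
    finally:
        os.remove(path)
```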

Single Request/Reply
The client sends a complete request.
The server sends back the entire reply.
The server closes its socket.
If the client needs another document, it must open a new connection.
This was the default for HTTP 1.0.
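With one request per connection, the client knows the reply is complete when the server closes its socket, because `recv()` then returns an empty result. The following is a Python sketch. `read_until_close` is a hypothetical helper. In the demo, `socket.socketpair()` stands in for a real client/server TCP connection.

```python
import socket

def read_until_close(sock, bufsize=4096):
    """Read an entire reply whose end is signalled by the peer closing.

    recv() returns b"" only once the other side has closed the connection.
    """
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

if __name__ == "__main__":
    # socketpair() stands in for a client/server TCP connection in this demo.
    client, server = socket.socketpair()
    server.sendall(b"HTTP/1.0 200 OK\r\n\r\n<html></html>")
    server.close()                      # the server closes after one reply
    print(read_until_close(client))
    client.close()
```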


Persistent Connections
HTTP 1.1 supports persistent connections (this is the default).
Multiple requests can be handled over a single TCP connection.
The Connection: header is used to exchange information about persistence (HTTP/1.1), e.g. Connection: close.
HTTP 1.0 clients asked for persistence with Connection: Keep-Alive, an unofficial extension; servers could reply with a Keep-Alive: header.
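Python's standard library can demonstrate a persistent connection end to end. `http.server` can answer with HTTP/1.1 keep-alive semantics, and `http.client.HTTPConnection` reuses its socket as long as the server does not close it. The following sketch runs on the loopback interface. The handler, the paths `/a` and `/b`, and the helper names are hypothetical.

```python
import http.client
import http.server
import threading

class EchoPathHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"      # HTTP/1.1 replies: connection stays open by default

    def do_GET(self):
        body = f"you asked for {self.path}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # The connection stays open, so the client cannot wait for EOF;
        # Content-Length tells it where this reply ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the demo quiet
        pass

def two_requests_one_connection(port):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    bodies, sockets = [], []
    for path in ("/a", "/b"):
        conn.request("GET", path)
        sockets.append(conn.sock)      # the socket this request went out on
        resp = conn.getresponse()
        bodies.append(resp.read())     # finish this reply before the next request
    conn.close()
    return bodies, sockets[0] is sockets[1]

def run_demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), EchoPathHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return two_requests_one_connection(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(run_demo())
```

`run_demo()` should return both bodies together with `True`, meaning both requests travelled over the same socket.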


Example
Run wget www.google.com and capture the traffic with Wireshark.
Homework: open www.google.com in a browser and check whether its request headers differ from those sent by wget.

HTML: HyperText Markup Language


HTML Basics
I assume everyone knows something about HTML. If not, check the home page for some links.
Documents use elements to “mark up”, or identify, sections of text for different purposes or display characteristics.
Markup elements are not seen by the user when the page is displayed.
NOTE: Not all documents on the Web are HTML!

HTML Tables:
<table>, </table>: start/end a table
<tr>, </tr>: start/end a table row
<td>, </td>: start/end a table cell
<th>, </th>: start/end a table header cell

timedate.com Hit Table
hour      number of hits
12-1AM    4,320
1-2AM     18,986
2-3AM     246
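The table tags nest in a fixed pattern (table, then rows, then cells), so a table like this is easy to generate from data. The following is a Python sketch using the hit counts from the slide. The `<caption>` element and the function name are assumptions.

```python
from html import escape

def hit_table(caption, rows):
    """Render (hour, hits) pairs as an HTML table with one header row."""
    lines = ["<table>",
             f"<caption>{escape(caption)}</caption>",
             "<tr><th>hour</th><th>number of hits</th></tr>"]   # <th>: header cells
    for hour, hits in rows:
        # <tr> starts a row, <td> a data cell; hits get thousands separators.
        lines.append(f"<tr><td>{escape(hour)}</td><td>{hits:,}</td></tr>")
    lines.append("</table>")
    return "\n".join(lines)

page = hit_table("timedate.com Hit Table",
                 [("12-1AM", 4320), ("1-2AM", 18986), ("2-3AM", 246)])
print(page)
```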


Pre-fork()’d Server

  • Creating a new process for each client is expensive.
  • We can create a bunch of processes, each of which can take care of a client.
  • Each child process is an iterative server.


Pre-fork()’d TCP Server

  • The initial process creates a socket and binds it to the well-known address.
  • The process then calls fork() a bunch of times.
  • All children call accept().
  • The next incoming connection will be handed to one child.
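These four steps translate almost directly into code. The following is a Unix-only Python sketch that relies on `os.fork`. The echo service, the helper names, and the `stop()` cleanup function are hypothetical. They exist only so the example can be run and shut down.

```python
import os
import signal
import socket

def handle_client(conn):
    """Serve one client: echo its message back, prefixed with this child's pid."""
    with conn:
        data = conn.recv(1024)
        conn.sendall(f"{os.getpid()}: ".encode() + data)

def child_loop(listener):
    # Each child is an ordinary iterative server on the shared listening socket.
    while True:
        conn, _ = listener.accept()     # the kernel gives each new connection to one child
        handle_client(conn)

def prefork(n_children, host="127.0.0.1", port=0):
    """Create, bind and listen once in the parent, then fork the children."""
    listener = socket.create_server((host, port))
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:                    # in the child
            try:
                child_loop(listener)
            finally:
                os._exit(0)             # never fall back into the parent's code
        pids.append(pid)
    return listener, pids

def stop(listener, pids):
    """Demo cleanup: terminate and reap every child, then close the socket."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    listener.close()
```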


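listen()
[Figure: the two queues behind listen(). An arriving SYN creates an entry in the incomplete connection queue; when TCP's three-way handshake completes, the entry moves to the completed connection queue, from which accept() takes it. The sum of both queues cannot exceed the backlog.]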
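The queues are easy to observe: a client's `connect()` returns as soon as the kernel finishes the three-way handshake, even if the server has not called `accept()` yet. The following Python sketch runs on the loopback interface. The function name is hypothetical. The sketch keeps the number of clients at or below the backlog, because kernels differ in exactly how they apply the backlog.

```python
import socket

def handshake_before_accept(n_clients, backlog=5):
    """Show that connect() succeeds before the server ever calls accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(backlog)            # backlog bounds the queued connections
    port = listener.getsockname()[1]

    # No accept() has been called yet, but every connect() returns:
    # the kernel completed each handshake and queued the connection.
    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(n_clients)]
    client_addrs = [c.getsockname() for c in clients]

    accepted_addrs = []
    for _ in range(n_clients):
        conn, addr = listener.accept()  # taken from the completed connection queue
        accepted_addrs.append(addr)
        conn.close()

    for c in clients:
        c.close()
    listener.close()
    return client_addrs, accepted_addrs
```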


Preforking

  • As the book shows, having too many preforked children can be bad.
  • Using dynamic process allocation instead of a hard-coded number of children can avoid problems.
  • The parent process just manages the children, doesn’t worry about clients.

Sockets library vs. system call

  • A preforked TCP server won’t usually work the way we want if sockets are implemented as a user-space library rather than in the kernel: accept() is then a library call, not an atomic operation, so concurrent calls from several children can interfere with one another.
  • We can get around this by making sure only one child calls accept() at a time, using some locking scheme.
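One such locking scheme is a POSIX record lock on a scratch file. Each child takes the lock, calls `accept()`, and releases the lock before serving the client. The following is a Unix-only Python sketch. `fcntl.lockf` wraps `fcntl()` record locks, which belong to a process rather than to a file descriptor, so forked children that share the inherited descriptor still exclude one another. The helper names and the pid-reply service are hypothetical.

```python
import fcntl
import os
import signal
import socket
import tempfile

def locked_child_loop(listener, lock_fd):
    """Iterative child that holds the lock only while it is inside accept()."""
    while True:
        fcntl.lockf(lock_fd, fcntl.LOCK_EX)      # block until no other child holds it
        try:
            conn, _ = listener.accept()          # at most one child is ever here
        finally:
            fcntl.lockf(lock_fd, fcntl.LOCK_UN)  # let the next child in
        with conn:
            conn.sendall(f"{os.getpid()}\n".encode())

def prefork_locked(n_children):
    listener = socket.create_server(("127.0.0.1", 0))
    lock_file = tempfile.TemporaryFile()         # opened read/write, as LOCK_EX requires
    pids = []
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:                             # in the child
            try:
                locked_child_loop(listener, lock_file.fileno())
            finally:
                os._exit(0)
        pids.append(pid)
    return listener, lock_file, pids

def stop(listener, lock_file, pids):
    """Demo cleanup: a terminated child also releases any lock it held."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
    listener.close()
    lock_file.close()
```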

Prethreaded Server

  • Same benefits as preforking.
  • Can also have the main thread do all the calls to accept() and hand off each client to an existing thread.
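The following Python sketch shows the second variant: the main thread is the only caller of `accept()`, and it passes each connected socket through a queue to a pool of threads created in advance. The upper-casing service and the helper names are hypothetical. The demo accepts a fixed number of connections so that it terminates.

```python
import queue
import socket
import threading

def recv_all(sock):
    """Read until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)

def worker(jobs):
    """Pool thread, created in advance: serve each client handed to it."""
    while True:
        conn = jobs.get()
        if conn is None:                # sentinel: time to exit
            return
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())  # toy service: upper-case the request

def main_thread_accept(listener, jobs, n_connections):
    """Only the main thread calls accept(); a free pool thread takes each client."""
    for _ in range(n_connections):      # a real server would loop forever
        conn, _addr = listener.accept()
        jobs.put(conn)

def run_demo(n_threads=3, n_clients=5):
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    jobs = queue.Queue()
    pool = [threading.Thread(target=worker, args=(jobs,)) for _ in range(n_threads)]
    for t in pool:
        t.start()

    replies = []
    def clients():                      # clients run in their own thread for the demo
        for i in range(n_clients):
            with socket.create_connection(("127.0.0.1", port)) as c:
                c.sendall(f"msg {i}".encode())
                replies.append(recv_all(c))

    client_thread = threading.Thread(target=clients)
    client_thread.start()
    main_thread_accept(listener, jobs, n_clients)
    client_thread.join()

    for _ in pool:
        jobs.put(None)                  # one sentinel per pool thread
    for t in pool:
        t.join()
    listener.close()
    return replies
```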


What’s the best server design for my application?

  • Many factors:
expected number of simultaneous clients
transaction size (time to compute or look up the answer)
variability in transaction size
available system resources (perhaps what resources are required in order to run the service)


Server Design

  • It is important to understand the issues and options.
  • Knowledge of queuing theory can be a big help.
  • You might need to test a few alternatives to determine the best design.

Assignment & Next time

Reading:

UNP 30

RFC 2396: http://www.ietf.org/rfc/rfc2396.txt

HTTP 1.

HTTP 1.

Next Lecture:

Advanced socket programming