Web Programming: Understanding Distributed Systems and HTTP for Data Exchange - Prof. Dere, Study Guides, Projects, Research of Computer Systems Networking and Telecommunications

An introduction to web programming, focusing on the concepts of distributed systems and the hypertext transfer protocol (http) for data exchange. It covers the basics of sockets, client-server architecture, and http requests and responses. Students will learn about the differences between distributed and unitary systems, the importance of handling concurrency and partial failure, and the role of standards like ip, udp, and tcp in internet communication.


Uploaded on 08/18/2009

CSCI 553: Networking III
Unix Network Programming
Spring 2007
Web Client Programming

Today's Agenda

- Task for Thursday March 13
- Update project.txt
- Begin coding
- Use your student repo project directory
- Examples for today

[harry@nisl ~]$ mkdir webprog
[harry@nisl ~]$ cd webprog
[harry@nisl webprog]$ tar xvfz /home/csci553/classfiles/webprog.tgz
./
./spider.py
./urllibex.py
...

Web clients & application programming

Introduction

The Internet is changing everything
- Distributed programs are different from unitary ones
- Distributed teams work differently from collocated ones

This lecture looks at how to build programs that get data from the web
- The next lecture will discuss simple ways to build programs that supply data
- And the one after that will talk about how to do this securely

Small Pieces, Loosely Joined

The Unix command line was the world's first component object model
- Programmers build small pieces, then connect them in arbitrary ways
- Key features:
  - Low cost of entry: it's easy to add one more tool to the toolbox
  - Common data format: stream of strings
  - Common communication protocol: stdin, stdout, and zero/nonzero exit codes

The Web grew so quickly because it replicated these strengths
- Everything used HTML (data format) over HTTP (communication protocol)
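As a rough illustration of those three conventions, here is a tiny grep-like filter sketched in Python 3. The function name `filter_lines` and its exit-code convention (0 if something matched, 1 otherwise, as grep does) are choices made for this example rather than anything from the lecture.

```python
import io


def filter_lines(pattern, instream, outstream):
    """Copy the lines that contain `pattern` from instream to outstream.

    Returns 0 if at least one line matched and 1 otherwise, so a caller
    can use the result as a process exit code.
    """
    matched = False
    for line in instream:
        if pattern in line:
            outstream.write(line)
            matched = True
    return 0 if matched else 1


# In-memory streams stand in for stdin and stdout here.
source = io.StringIO('alpha\nbeta\ngamma\n')
sink = io.StringIO()
status = filter_lines('et', source, sink)
```

To use it in a pipeline, a script would pass `sys.stdin` and `sys.stdout` instead and hand the return value to `sys.exit`.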

Partial Failure

Difference #2: partial failure
- One component fails while others are still healthy
- If you've waited five seconds for a web site to respond, should you assume that it's down, or keep waiting?

Both differences make distributed applications much harder to debug than unitary ones
- Often have heisenbugs (which only appear intermittently)
- And it's usually impossible to get a complete picture of the system's state

Only way to get a distributed system right is to build it right in the first place
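One common way to handle the "keep waiting or give up?" question is to put an explicit timeout on the socket. The sketch below (Python 3) assumes a helper named `wait_for_reply`, which is invented for this example. It uses `socket.socketpair()` so that it can run without a real server. A timeout only tells you that you stopped waiting, not whether the other side is actually down.

```python
import socket


def wait_for_reply(sock, timeout_seconds, buffer_size=1024):
    """Return the next chunk of data, or None if nothing arrives in time."""
    sock.settimeout(timeout_seconds)
    try:
        # recv returns b'' (not None) if the other side closed the connection.
        return sock.recv(buffer_size)
    except socket.timeout:
        return None


# Two connected sockets in one process stand in for client and server.
near, far = socket.socketpair()
silent = wait_for_reply(near, 0.2)    # the far end has sent nothing yet
far.sendall(b'pong')
answered = wait_for_reply(near, 0.2)  # now there is data waiting
near.close()
far.close()
```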

Under the Hood

These days, the Internet runs on a family of standards built on the Internet Protocol (IP)
- User Datagram Protocol (UDP) moves packets across the network
  - Fast, but no guarantees of delivery or correct ordering
- Transmission Control Protocol (TCP) is much more commonly used
  - Guarantees that everything you send is received, in the right order
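In Python's socket module the choice between the two shows up as the socket type: `SOCK_DGRAM` for UDP and `SOCK_STREAM` for TCP. The sketch below (Python 3) sends a single UDP datagram over the loopback interface. The helper name `udp_round_trip` is made up for this example. Delivery is very reliable on loopback, but the receiver still sets a timeout because UDP itself promises nothing.

```python
import socket


def udp_round_trip(payload, timeout_seconds=2.0):
    """Send one datagram to ourselves over the loopback interface."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(('127.0.0.1', 0))       # port 0: let the OS pick a free port
        receiver.settimeout(timeout_seconds)  # UDP may drop packets, so never wait forever
        # No connect() step: each datagram carries its own destination address.
        sender.sendto(payload, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


reply = udp_round_trip(b'ping!')
```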

Client/Server vs. Peer-to-Peer

A client/server architecture is one in which many clients communicate with a central server
- Asymmetric: clients ask for things, servers provide them
- Web servers are the best-known examples
- But database management systems are also servers

A peer-to-peer architecture is one in which all processes exchange information equally
- Symmetric: every participant both provides and receives data

Client/server architectures are simpler to create
- But if the server fails, the whole system fails

Socket Client

import socket

buffer_size = 1024    # bytes
host = '127.0.0.1'    # local machine
port = 19073          # hope nobody else is using it...
message = b'ping!'    # what to send (sockets carry bytes)

# AF_INET means 'Internet socket'.
# SOCK_STREAM means 'TCP'.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect((host, port))

# Send the message (sendall keeps going until every byte is sent).
sock.sendall(message)

# Receive and display the reply.
data = sock.recv(buffer_size)
print('client received', data)

# Tidy up.
sock.close()
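The client needs something listening on the other end. As a hedged sketch (Python 3), the code below pairs a client like the one above with a throwaway server that answers a single connection. The names `serve_once`, `ping`, and `demo`, and the `echo: ` prefix, are invented for this example. The server binds to port 0 so the operating system picks a free port instead of hoping that 19073 is unused.

```python
import socket
import threading


def serve_once(server_sock):
    """Accept one connection, echo back what it sends, then return."""
    conn, _client_address = server_sock.accept()
    with conn:
        data = conn.recv(1024)          # assumes the short message arrives in one piece
        conn.sendall(b'echo: ' + data)


def ping(host, port, message=b'ping!'):
    """Connect, send one message, and return the first chunk of the reply."""
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(message)
        return sock.recv(1024)


def demo():
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.settimeout(5.0)         # do not block forever if no client shows up
    server_sock.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock,))
    worker.start()
    try:
        return ping('127.0.0.1', port)
    finally:
        worker.join()
        server_sock.close()
```

Calling `demo()` runs both halves in one process and returns the bytes the client received.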

The Hypertext Transfer Protocol

The Hypertext Transfer Protocol (HTTP) specifies how programs exchange documents over the web

Figure 13.2: HTTP Request Cycle


- Clients are typically browsers, such as Firefox and Internet Explorer
- Apache is the most widely used server, but many others exist

The client sends a request specifying what it wants

The server sends the contents of the file in reply
- Or an error message

HTTP is a stateless protocol
- Server doesn't remember anything between requests
- Every image in a web page must be requested and downloaded separately

Headers

An HTTP header is a key/value pair
- "Accept: text/html"
- "Accept-Language: en, fr"
- "If-Modified-Since: 16-May-2005"

Unlike a dictionary, a key may appear any number of times
- So a request can specify that it's willing to accept several types of content
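Because a key can repeat, a plain dictionary would silently lose values. One way to keep them, sketched here in Python 3, is a list of (key, value) pairs. The helper names `parse_headers` and `values_for` are made up for this example, and the lookup ignores case because HTTP header names are case-insensitive.

```python
def parse_headers(lines):
    """Turn 'Key: value' strings into a list of (key, value) pairs."""
    pairs = []
    for line in lines:
        key, _colon, value = line.partition(':')
        pairs.append((key.strip(), value.strip()))
    return pairs


def values_for(pairs, key):
    """Return every value stored under `key`, in the order it appeared."""
    wanted = key.lower()
    return [value for name, value in pairs if name.lower() == wanted]


example = [
    'Accept: text/html',
    'Accept: text/plain',
    'Accept-Language: en, fr',
]
pairs = parse_headers(example)
accepted = values_for(pairs, 'accept')
```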

Body

The body is any extra data associated with the request
- Used with web forms, to upload files, etc.

Must be a blank line between the last header and the start of the body
- Signals the end of the headers
- Forgetting it is a common mistake

The "Content-Length" header tells the server how many bytes to read

Note: there's no magic in any of this
- An HTTP request is just text: any program that wants to can create or parse one
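To make the "just text" point concrete, the sketch below (Python 3) assembles a request by hand, including the blank line and a Content-Length computed from the body. The function name `build_request`, the `/submit` path, and the form data are hypothetical, and HTTP/1.0 is used only to keep the example small.

```python
def build_request(method, path, headers, body=b''):
    """Assemble an HTTP/1.0 request as bytes.

    `headers` is a list of (key, value) pairs so that a key can repeat.
    """
    lines = ['%s %s HTTP/1.0' % (method, path)]
    for key, value in headers:
        lines.append('%s: %s' % (key, value))
    if body:
        # Tell the server exactly how many body bytes follow the blank line.
        lines.append('Content-Length: %d' % len(body))
    # '\r\n' ends each line; the extra '\r\n' is the blank line that ends the headers.
    head = '\r\n'.join(lines) + '\r\n\r\n'
    return head.encode('ascii') + body


request = build_request(
    'POST',
    '/submit',  # hypothetical path
    [('Content-Type', 'application/x-www-form-urlencoded')],
    b'name=harry',
)
```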

HTTP Response Codes

Code  Name                   Meaning
100   Continue               Client should continue sending data
200   OK                     The request has succeeded
204   No Content             The server has completed the request, but doesn't need to return any data
301   Moved Permanently      The requested resource has moved to a new permanent location
307   Temporary Redirect     The requested resource is temporarily at a different location
400   Bad Request            The request is badly formatted
401   Unauthorized           The request requires authentication
404   Not Found              The requested resource could not be found
408   Request Timeout        The server gave up waiting for the client
500   Internal Server Error  An error occurred in the server that prevented it from fulfilling the request
601   Connection Timed Out   The server did not respond before the connection timed out (not a standard HTTP status code)
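The first digit of a code gives its general class, which is often all a client needs to decide what to do next. A minimal sketch in Python 3, with the names `STATUS_CLASSES` and `classify` invented for this example:

```python
# The five classes defined for HTTP status codes, keyed by leading digit.
STATUS_CLASSES = {
    1: 'informational',
    2: 'success',
    3: 'redirection',
    4: 'client error',
    5: 'server error',
}


def classify(code):
    """Return the general class of a numeric status code."""
    return STATUS_CLASSES.get(code // 100, 'nonstandard')


ok = classify(200)
missing = classify(404)
timed_out = classify(601)   # the 601 entry falls outside the standard ranges
```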

HTTP Example

- Fetch a page from the course site
- Request has no headers, so the blank line that signals "end of headers" is right after the request line

import socket

buffer_size = 1024

# The version digit is cut off in the source; HTTP/1.0 is assumed here
# because it allows a request with no headers at all.
HttpRequest = b'GET /greeting.html HTTP/1.0\r\n\r\n'

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('www.third-bit.com', 80))
sock.sendall(HttpRequest)

# Keep reading until the server closes the connection.
response = b''
while True:
    data = sock.recv(buffer_size)
    if not data:
        break
    response += data
sock.close()

# latin-1 maps every byte to a character, so decoding cannot fail.
print(response.decode('latin-1'))

- Note: the double parentheses in the call to sock.connect are deliberate
- Method's argument is a (host, port) tuple
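What comes back from a fetch like this is the status line, the headers, a blank line, and then the body, all in one lump. The sketch below (Python 3) pulls them apart. The function name `split_response` and the `sample` bytes are made up for illustration and are not real output from the course site. The function also assumes a well-formed response that uses `\r\n` line endings.

```python
def split_response(raw):
    """Split raw response bytes into (code, reason, headers, body)."""
    # Everything before the first blank line is the status line plus headers.
    head, _separator, body = raw.partition(b'\r\n\r\n')
    lines = head.decode('latin-1').split('\r\n')
    _version, code, reason = lines[0].split(' ', 2)
    headers = []
    for line in lines[1:]:
        key, _colon, value = line.partition(':')
        headers.append((key.strip(), value.strip()))
    return int(code), reason, headers, body


# Invented sample data standing in for what the socket loop collects.
sample = (b'HTTP/1.0 200 OK\r\n'
          b'Content-Type: text/html\r\n'
          b'Content-Length: 14\r\n'
          b'\r\n'
          b'<p>Hello!</p>\n')
code, reason, headers, body = split_response(sample)
```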