COSC 6377: Computer Networks | Fall 2000
Term Project: The MSC Proxy Server
Assigned: Tuesday, October 31, 2000
Due: Tuesday, November 21, 2000, midnight.
When the CIO convinced the CEO of Midget Service Corp. (MSC) of the importance of joining the Internet revolution late last year, he thought he was doing the company a huge favor and that a big bonus would be his for the taking. Unfortunately, his predictions that MSC's profits would climb through the roof, employee productivity would soar, and the company's image would shine like never before have proven to be, how shall we put this mildly... disastrously inaccurate. A series of major problems has arisen:
• the Internet load quickly overwhelmed the initial ISDN link, so MSC had to switch to a new and more expensive Internet service provider and install an expensive T1 link;
• employee productivity has been declining steadily: on several occasions, MSC's CEO caught marketing twerps playing Doom, wasting time on IRC, and tying up the Internet link for long periods downloading material from www.playboy.com. People in other departments have been even worse;
• company profits are down, and the new web server has proven to be more of a marketing embarrassment than a marketing bonanza. The final coup came in December, when a prankster hacked their way into MSC's network and caused its computers to display nothing but pictures of snow-covered mountains, Santa Claus, and reindeer for an entire week.
After the "Christmas server" incident, MSC's CEO went ballistic. She told the CIO that he had until the end of February to solve the problems or find a replacement. A firewall was installed immediately, which helped keep outside hackers from breaking in. The next step is to control MSC's Internet link to curb its abuse by MSC employees. You have been hired as a contractor to write a configurable HTTP proxy server that allows MSC employees to access the Web on a need-to-browse basis.
1 Overview
An HTTP request involves roughly three steps: (i) the client application parses the desired URL to determine the host name and TCP port number of the web server, and opens a TCP socket connection to that host and port; (ii) the client sends a request message on the new socket connection, specifying the operation it wants performed (most often a GET operation, which retrieves the contents of a web page); and (iii) the server performs the requested operation and returns the resulting data (most often a web page in HTML format) to the client on the same TCP connection.
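Step (i) is also what the proxy itself must do for every request it forwards. The following is a minimal sketch under a few assumptions: IPv4 only, and only the http://host[:port][/path] form of URL, with no user names and no other schemes. The helper names parse_url() and connect_to_host() are hypothetical and are not part of the provided library. The sketch uses gethostbyname() because the assignment's resources name it, although newer code would normally use getaddrinfo().

```c
#define _DEFAULT_SOURCE            /* gethostbyname() is hidden in strict ISO modes */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/socket.h>
#include <unistd.h>

/* Split "http://host[:port][/path]" into its parts. The port defaults to
 * 80 and an empty path becomes "/". Returns 0 on success, or -1 for a
 * non-http URL, a bad port, or a result that does not fit the buffers. */
int parse_url(const char *url, char *host, size_t hostsz,
              int *port, char *path, size_t pathsz)
{
    static const char prefix[] = "http://";
    size_t plen = sizeof prefix - 1;

    if (strncasecmp(url, prefix, plen) != 0)
        return -1;                          /* only http: URLs */
    const char *h = url + plen;

    size_t hlen = strcspn(h, ":/");         /* host ends at ':' or '/' */
    if (hlen == 0 || hlen >= hostsz)
        return -1;
    memcpy(host, h, hlen);
    host[hlen] = '\0';

    const char *rest = h + hlen;
    *port = 80;                             /* default HTTP port */
    if (*rest == ':') {
        char *end;
        long p = strtol(rest + 1, &end, 10);
        if (end == rest + 1 || p <= 0 || p > 65535)
            return -1;
        *port = (int)p;
        rest = end;
    }
    if (*rest == '\0')
        rest = "/";                         /* no path: ask for "/" */
    else if (*rest != '/')
        return -1;                          /* junk after host[:port] */
    if (strlen(rest) >= pathsz)
        return -1;
    strcpy(path, rest);
    return 0;
}

/* Resolve host (this blocks) and connect to port, trying each address
 * gethostbyname() returns. Returns a connected socket, or -1. */
int connect_to_host(const char *host, int port)
{
    struct hostent *he = gethostbyname(host);
    if (he == NULL || he->h_addrtype != AF_INET)
        return -1;

    for (char **ap = he->h_addr_list; *ap != NULL; ap++) {
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_port = htons((unsigned short)port);  /* network byte order */
        memcpy(&addr.sin_addr, *ap, sizeof addr.sin_addr);

        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;
        if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
            return fd;
        close(fd);                          /* try the next address */
    }
    return -1;
}
```

In a proxy, parse_url() would be applied to the URL in the client's request line. connect_to_host(host, port) then opens the onward connection to the real web server.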
When a client application is configured to connect via a proxy server, the client establishes its connection to the proxy server's host and TCP port and sends its complete request to the proxy server rather than to the "real" web server. The HTTP proxy server accepts local HTTP connections and forwards them to their final destination; in effect, it introduces an extra "hop" between the client browser and the web server. There are several reasons to use a proxy server instead of connecting to the web site directly over the Internet: you might be behind a firewall that does not allow direct connections; a proxy can cache data locally to reduce the amount of global traffic; or, in our case, the proxy can control access to the Web on a per-host/per-location basis.
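Because the client sends its complete request to the proxy, the request line carries an absolute URL. RFC 1945 allows that absoluteURI form only for requests made to a proxy; requests to an origin server normally carry just the path. Below is a minimal sketch of the rewrite a proxy could perform before forwarding. rewrite_request_line() is a hypothetical helper, and it uses fixed-size buffers for brevity.

```c
#include <stdio.h>
#include <string.h>

/* Split an HTTP/1.0 request line such as
 *     GET http://www.example.com/a.html HTTP/1.0
 * into method, absolute URL and version, and build the line to send to
 * the origin server, which carries only the path:
 *     GET /a.html HTTP/1.0
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 * The caller appends "\r\n" and forwards the header fields. */
int rewrite_request_line(const char *line, char *out, size_t outsz)
{
    char method[16], url[1024], version[16];

    /* Field widths keep sscanf() inside each buffer (size - 1 chars). */
    if (sscanf(line, "%15s %1023s %15s", method, url, version) != 3)
        return -1;
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;

    /* Skip "scheme://host[:port]"; an absent path means "/". */
    const char *p = strstr(url, "://");
    if (p == NULL)
        return -1;                 /* not an absolute URL */
    const char *path = strchr(p + 3, '/');
    if (path == NULL)
        path = "/";

    int n = snprintf(out, outsz, "%s %s %s", method, path, version);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```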

The objectives of this assignment are to help you learn how to write networking code using the Berkeley sockets API, to familiarize you with one of the most widely used Internet application protocols (HTTP), and to give you experience in building client/server applications. An interesting thing about proxy servers is that they behave like a server when they accept requests from local clients, but act like a client when they forward those requests to the real web server. Thus, by implementing a proxy server, you will get a feel for how both sides of a client/server application are implemented. The following sections detail the operations that your proxy server must perform to solve MSC's Internet problem. In addition to this document, we have provided a number of other resources to help you implement the project, including:

• RFC 1945: Hypertext Transfer Protocol (HTTP/1.0). This document describes the format of HTTP requests and responses, which is what your server will be receiving and generating.

• RFC 1738: Uniform Resource Locators (URL). This document describes the format of URLs, which your server will need to parse. Most of the legal URLs specified in RFC 1738 are not supported by the MSC proxy server, however, so consider this text supplemental.

• RFC 1808: Relative Uniform Resource Locators. This document builds on RFC 1738 and, like it, is supplemental.

• Berkeley sockets user guide and examples: this collection of documents and sample programs should help you get started on programming your proxy server using Berkeley sockets. Please refer to the "Reference / Resource" section of the project web page (http://www.cs.uh.edu/~jsteach/cosc6377/).

• Unix man pages: in addition to the high-level primer described above, be sure to read the relevant Unix man pages on gethostbyname(), gethostbyaddr(), byteorder, socket(), bind(), listen(), accept(), and select().

• Unix Network Programming sampling: the book "Unix Network Programming" by W. Richard Stevens is considered the quintessential guide to programming network applications (clients or servers) on Unix. Consider getting or borrowing a copy of the complete text if you want more background material or examples.

2 Requirements

Your proxy must implement a subset of the HTTP/1.0 protocol, which is described in RFC 1945. In this section, we describe the steps that your server must perform whenever a client request arrives. Upon startup, your server should read a configuration file, open a TCP socket on the port indicated in the configuration file, and start accepting requests. Whenever a connection request arrives, you should translate the client's IP address to a host name and check whether the client is allowed to access the Web. If the client is not authorized, your server must refuse the connection immediately: those marketing dweebs have been cut off the net entirely until further notice!
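The startup and admission steps above correspond to a handful of socket calls. The following is a sketch under stated assumptions: IPv4 only, and the hypothetical helper names open_listen_socket() and client_hostname(). When reverse DNS has no answer, the sketch falls back to the dotted-quad address. That is one reasonable policy, but the assignment does not specify it. gethostbyaddr() is used because the assignment names it, although newer code would normally use getnameinfo().

```c
#define _DEFAULT_SOURCE            /* gethostbyaddr() is hidden in strict ISO modes */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP socket that listens on port (host byte order) on all local
 * interfaces. Returns the listening descriptor, or -1 on error. */
int open_listen_socket(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;                    /* allow quick restarts on the same port */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);       /* all interfaces */
    addr.sin_port = htons((unsigned short)port);    /* network byte order */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Turn the peer address filled in by accept() into a host name for the
 * refuse check (this blocks on DNS). If there is no reverse mapping,
 * fall back to the dotted-quad address so buf always holds a string. */
void client_hostname(const struct sockaddr_in *peer, char *buf, size_t bufsz)
{
    struct hostent *he = gethostbyaddr(&peer->sin_addr,
                                       sizeof peer->sin_addr, AF_INET);
    if (he != NULL && he->h_name != NULL && he->h_name[0] != '\0')
        snprintf(buf, bufsz, "%s", he->h_name);
    else
        snprintf(buf, bufsz, "%s", inet_ntoa(peer->sin_addr));
}
```

In the main loop, the peer address comes from accept(lfd, (struct sockaddr *)&peer, &len). client_hostname() then yields the string to test against the refuse patterns.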

2.1 Processing an Incoming Request

For authorized clients, you must accept the request and parse it. A request consists of a request line, followed by request header fields. The request line has three fields: method, URL and protocol

all composite (multipart) messages through an image removal filter. Your proxy server will not need to perform the actual filtering! We will provide a set of filters. However, your proxy must examine the media type of returned documents to determine whether the message body should be handed to a filter that has been registered for that type of data and, if so, replace the returned message body with the output of the specified filter. The interface for connecting to filter programs is described below.
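Before a filter can be looked up, the media type has to be separated from any parameters in the Content-Type header. RFC 1945 treats type and subtype names as case-insensitive. The following is a sketch using the hypothetical helper extract_media_type(). Whether proxy_filter() expects lower-case input is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Reduce a Content-Type value such as " text/HTML; charset=ISO-8859-1"
 * to its bare, lower-case media type ("text/html"). Returns 0 on
 * success, or -1 if nothing is left or out is too small. */
int extract_media_type(const char *value, char *out, size_t outsz)
{
    while (isspace((unsigned char)*value))
        value++;                                  /* skip leading blanks */

    size_t n = strcspn(value, ";");               /* drop any parameters */
    while (n > 0 && isspace((unsigned char)value[n - 1]))
        n--;                                      /* trim trailing blanks */
    if (n == 0 || n >= outsz)
        return -1;

    for (size_t i = 0; i < n; i++)                /* type names are case-insensitive */
        out[i] = (char)tolower((unsigned char)value[i]);
    out[n] = '\0';
    return 0;
}
```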

2.3 Other Requirements

Your server should be capable of handling multiple connections concurrently. This means that you must be very careful when using system calls that could block the server (e.g., reading from or writing to a socket). There are two common ways to handle multiple connections concurrently: (i) forking off a child process for each connection, and (ii) using the select(2) system call to multiplex I/O so that no single read or write blocks the whole server. You may use whichever technique you prefer.

Note that you are not required to fork a process or use select() while resolving host addresses through the DNS interface. Although functions such as gethostbyname() and gethostbyaddr() block while waiting for an answer, you may block the server while performing these operations.

HTTP also supports persistent connections, which are used to send and receive more than one request over a single connection. If you examine HTTP requests produced by various client browsers, and Netscape Navigator in particular, you will see the header field Proxy-Connection: Keep-Alive. This asks the proxy not to close the connection when the last byte of the reply is sent. We do not require that you implement persistent connections: you may close socket connections as soon as you finish using them.
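For the select(2) approach, the core step is waiting until one of several descriptors is readable. The following sketch covers that step only, using the hypothetical helper wait_readable(). A complete server would also track which descriptor belongs to which connection.

```c
#define _DEFAULT_SOURCE            /* expose POSIX select() declarations */
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>                /* read(), write(), close() used alongside */

/* Wait until any of the n descriptors in fds[] is readable, e.g. the
 * listening socket (a new client is waiting) or a client/server socket
 * (data has arrived). Sets ready[i] to 1 for each readable fds[i] and 0
 * otherwise; timeout_ms < 0 waits forever. Returns the number of ready
 * descriptors, 0 on timeout, or -1 on error. Each fd must be < FD_SETSIZE. */
int wait_readable(const int *fds, int n, int *ready, long timeout_ms)
{
    fd_set rset;
    int maxfd = -1;

    FD_ZERO(&rset);
    for (int i = 0; i < n; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int nready = select(maxfd + 1, &rset, NULL, NULL,
                        timeout_ms < 0 ? NULL : &tv);
    for (int i = 0; i < n; i++)
        ready[i] = nready > 0 && FD_ISSET(fds[i], &rset);
    return nready;
}
```

When the listening socket is ready, accept() will not block; when a connection is ready, the next read() will not block. The fork-per-connection design avoids this bookkeeping at the cost of one process per client. The parent must then reap those processes with waitpid().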

3 The Configuration File

The runtime behaviour of the MSC HTTP proxy server is controlled by a configuration file. Figure 1 shows a sample proxy configuration file. A configuration file consists of a sequence of lines containing comments or commands. Comments begin with a pound sign (#) and extend to the end of the line. Lines are continued by means of a backslash character (\). Each non-empty line consists of a sequence of tokens that make up a command (clause). If a token contains white space, it must be enclosed in double quotes ("). No special escape sequences are recognized in strings.
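A small tokenizer can illustrate the lexical rules above. The provided library presumably reads the configuration file itself, so this is only a sketch of the rules. It assumes that backslash-continued lines have already been joined and that a # inside double quotes does not start a comment. tokenize_line() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* True for the white-space characters that separate tokens. */
static int is_sep(char c)
{
    return c != '\0' && strchr(" \t\r\n", c) != NULL;
}

/* Split one (already joined) configuration line into tokens. Token text
 * is copied into buf and tok[i] points at the i-th token. Backslashes
 * are copied as-is, since no escape sequences are recognized. Returns
 * the number of tokens, or -1 on an unterminated quote, too many
 * tokens, or a full buffer. */
int tokenize_line(const char *line, char *buf, size_t bufsz,
                  char *tok[], int maxtok)
{
    size_t used = 0;
    int n = 0;
    const char *s = line;

    for (;;) {
        while (is_sep(*s))
            s++;                            /* skip white space */
        if (*s == '\0' || *s == '#')
            return n;                       /* end of line or comment */
        if (n == maxtok)
            return -1;
        tok[n++] = buf + used;

        int quoted = (*s == '"');
        if (quoted)
            s++;                            /* drop the opening quote */
        while (*s != '\0' &&
               (quoted ? (*s != '"') : (!is_sep(*s) && *s != '#'))) {
            if (used + 1 >= bufsz)
                return -1;
            buf[used++] = *s++;
        }
        if (quoted) {
            if (*s != '"')
                return -1;                  /* unterminated quote */
            s++;                            /* drop the closing quote */
        }
        if (used >= bufsz)
            return -1;
        buf[used++] = '\0';
    }
}
```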

3.1 Configuration File Contents

There are six clauses that can appear in a configuration file:

port n: If present, this clause specifies the TCP port number your server should listen on. If omitted, the port number is taken from the environment variable PROXY_PORT. If the clause is not present and PROXY_PORT is undefined, the default HTTP port (80) should be used.

refuse pattern: This clause specifies hosts from which connections should be refused. The argument pattern is an extended regular expression; for more information about regular expressions, consult the regex(5) man page.

block pattern: This clause specifies that all URLs matching pattern are blocked; an attempt to access them should return an error message.
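The port rule and the pattern matching behind refuse and block can be sketched with getenv() and the POSIX regex interface. This is an illustration under our own assumptions, for example that config_port is 0 when no port clause was given. The provided libraries (libproxy.a, librx.a) are expected to do the actual matching. The hypothetical helpers are resolve_port() and pattern_matches().

```c
#define _DEFAULT_SOURCE            /* POSIX declarations such as setenv() in glibc */
#include <regex.h>
#include <stdlib.h>

/* Port rule: a port clause wins (config_port > 0 is assumed to mean the
 * clause was present), then $PROXY_PORT, then the default HTTP port 80.
 * A PROXY_PORT value that is not a valid port number is ignored. */
int resolve_port(int config_port)
{
    if (config_port > 0)
        return config_port;

    const char *env = getenv("PROXY_PORT");
    if (env != NULL && *env != '\0') {
        char *end;
        long p = strtol(env, &end, 10);
        if (*end == '\0' && p > 0 && p <= 65535)
            return (int)p;
    }
    return 80;
}

/* refuse/block test: 1 if subject matches the extended regular
 * expression pattern, 0 if not, -1 if the pattern does not compile.
 * regexec() matching is unanchored: the pattern may match anywhere. */
int pattern_matches(const char *pattern, const char *subject)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

Matching is unanchored and "." matches any character. As a result, refuse marketing.mscorp.com would also refuse a host such as xmarketing-mscorp.com; ^marketing\.mscorp\.com$ is stricter. A real server would compile each pattern once at startup rather than on every call.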

    # proxyrc -- sample HTTP proxy configuration file

    # This clause defines the port the proxy should listen on.
    # The value is accessible via the proxy_port() macro in host order.
    port 5000

    # A list of refused clients and blocked/redirected URLs goes here.
    # The string that should be submitted to proxy_location() should
    # consist of the <host> part followed by `/' and the <path> part.
    # Note that you should (a) extract the port number (if any) and (b) append
    # a trailing `/' even if the <path> part of the URL is empty.
    refuse marketing.mscorp.com      # blocks complete `marketing' group
    block www.microsoft.com          # Microsloth is a bad location
    redirect www.netscape.com/(.*) "www3.netscape.com/\1"   # www3 server is preferable to www.

    # This clause probably doesn't have any real use, except if you try to
    # fool a server or browser into believing something that isn't there.
    # This clause, for example, rewrites `User-Agent' header fields in
    # such a way that any substring `Netscape foo.bar ...' is replaced
    # by `Mozilla foo.bar ...' (Remember, it's spelled M-o-z-i-l-l-a :-)
    rewrite User-Agent "Netscape (.*)" "Mozilla \1"

    # This is how you filter incoming data.
    # The HTTP protocol defines a bunch of content-types which have
    # exactly the same structure as used in MIME mail.
    # You have to parse the `Content-type' header field, extract the media
    # and pass it to proxy_filter(). If a filter has been defined, you'll
    # get as a return the string representing a shell command to be executed.
    # The sample entry here says that all text should be piped through an imaginary
    # scramble filter which groks text on stdin and dumps a swedish-chef
    # version of it on stdout. Note that the complete command line has
    # to be enclosed in double quotes if spaces are embedded.
    filter text/(.*) "scramble -swedish -\1"

Figure 1: Sample proxy configuration file
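The redirect entry in Figure 1 involves two steps. First, the host/path string is built from the requested URL as the comments describe. Then \1 in the replacement is expanded to the text matched by the first parenthesized group. The following sketch covers both steps. It assumes sed-like semantics, where only the matched portion of the subject is replaced; this is consistent with the "any substring" wording of the rewrite comment. location_key() and regex_substitute() are hypothetical, and the provided library may implement this differently.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/* Build the "host/path" string for proxy_location() from an absolute URL:
 * drop "http://", drop any ":port", and always put '/' after the host.
 * Returns 0 on success, -1 for a non-http URL or a too-small buffer. */
int location_key(const char *url, char *out, size_t outsz)
{
    if (strncasecmp(url, "http://", 7) != 0)
        return -1;
    const char *host = url + 7;
    size_t hostlen = strcspn(host, ":/");   /* host ends at ':' or '/' */
    const char *path = strchr(host, '/');   /* NULL when there is no path */
    if (path == NULL)
        path = "/";
    int n = snprintf(out, outsz, "%.*s%s", (int)hostlen, host, path);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Append len bytes of s to out; -1 if that would overflow out. */
static int append(char *out, size_t outsz, size_t *used,
                  const char *s, size_t len)
{
    if (*used + len >= outsz)
        return -1;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
    return 0;
}

/* If subject matches the extended regex pattern, write subject to out
 * with the matched part replaced by repl, where \1..\9 stand for the
 * text of the corresponding parenthesized group.
 * Returns 1 if rewritten, 0 if there was no match, -1 on error. */
int regex_substitute(const char *pattern, const char *subject,
                     const char *repl, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m[10];
    if (outsz == 0 || regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, subject, 10, m, 0);
    regfree(&re);
    if (rc != 0)
        return 0;                               /* no match: leave as is */

    size_t used = 0;
    out[0] = '\0';
    if (append(out, outsz, &used, subject, (size_t)m[0].rm_so) < 0)
        return -1;                              /* text before the match */
    for (const char *r = repl; *r != '\0'; r++) {
        if (r[0] == '\\' && r[1] >= '1' && r[1] <= '9') {
            int g = *++r - '0';                 /* consume the digit too */
            if (m[g].rm_so >= 0 &&              /* -1: group did not match */
                append(out, outsz, &used, subject + m[g].rm_so,
                       (size_t)(m[g].rm_eo - m[g].rm_so)) < 0)
                return -1;
        } else if (append(out, outsz, &used, r, 1) < 0) {
            return -1;
        }
    }
    const char *tail = subject + m[0].rm_eo;    /* text after the match */
    return append(out, outsz, &used, tail, strlen(tail)) < 0 ? -1 : 1;
}
```

For example, http://www.netscape.com/index.html gives the key www.netscape.com/index.html, which the redirect entry turns into www3.netscape.com/index.html. The same expansion turns the User-Agent value Netscape 4.7 [en] into Mozilla 4.7 [en].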

path. The host name should be followed by a slash character, even if the path in the original URL is empty. If the return value indicates that a new location should be used, the result string should be regarded as a full URL (without the http: prefix).

• int proxy_hfield(proxy_t* p, const char* key, const char* value, char* buf, size_t bufsiz): This function rewrites request header fields. The string key is the header keyword, i.e., the first token before the colon. The string value points to the remainder of the header field, i.e., the characters after the colon. The function returns PROXY_OK if the modification was successful, and PROXY_ERROR if the buffer was too small. The new header value (without the key prefix) is available in buf.

• int proxy_filter(proxy_t* p, const char* media, char* buf, size_t bufsiz): This function checks whether a filter has been specified for media. It returns PROXY_OK if no filter has been defined, and PROXY_FILTER otherwise. If a filter has been installed, buf contains a string describing the filter command to be executed. Note that the command is not split into individual arguments: to execute the command, you should either construct the argument list yourself or execute /bin/sh -c command.

The file proxy.h contains the C declarations and definitions needed to use the library routines. The library code itself can be found in the files libproxy.a and librx.a. Different versions of the library files are available for different architectures and operating systems. In general, it is important that you use only a single architecture throughout the project (e.g., do all of your development on SunOS or Linux or ...) or carefully clean up your links to library files and .o files whenever you switch platforms.
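The command string that proxy_filter() places in buf can be run through the /bin/sh -c route suggested for that function. The following sketch assumes the whole message body is already in memory. run_filter() is a hypothetical helper. Staging the body in a temporary file, rather than writing it into a pipe, is our own choice: it avoids a deadlock when a filter produces output before it has read all of its input.

```c
#define _DEFAULT_SOURCE            /* expose POSIX process declarations */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd via /bin/sh -c with body on its stdin, collecting its stdout
 * into out. Returns the number of bytes stored, or -1 if the command
 * fails, exits non-zero, or produces more than outsz bytes. */
ssize_t run_filter(const char *cmd, const char *body, size_t bodylen,
                   char *out, size_t outsz)
{
    int pfd[2];
    FILE *in = tmpfile();                   /* becomes the filter's stdin */
    if (in == NULL)
        return -1;
    if (fwrite(body, 1, bodylen, in) != bodylen || fflush(in) != 0 ||
        pipe(pfd) < 0) {
        fclose(in);
        return -1;
    }
    rewind(in);                             /* filter reads from offset 0 */

    pid_t pid = fork();
    if (pid < 0) {
        fclose(in);
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {                         /* child: becomes the filter */
        dup2(fileno(in), STDIN_FILENO);
        dup2(pfd[1], STDOUT_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                         /* exec failed */
    }

    fclose(in);
    close(pfd[1]);                          /* parent keeps only the read end */
    size_t used = 0;
    ssize_t r;
    while (used < outsz && (r = read(pfd[0], out + used, outsz - used)) > 0)
        used += (size_t)r;
    char extra;                             /* any byte left means overflow */
    int overflow = (used == outsz && read(pfd[0], &extra, 1) > 0);
    close(pfd[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0 || overflow ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;
    return (ssize_t)used;
}
```

For instance, run_filter("tr a-z A-Z", "hello", 5, out, sizeof out) stores HELLO in out and returns 5. A filter that exits with a non-zero status, or whose output does not fit in out, yields -1.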

4 Your Assignment

To complete this assignment, you need to implement an MSC proxy server that runs on a machine in the 547 PGH lab (either SunOS or Linux)² and provides the functionality described above. The MSC proxy server project will constitute 20% of your final course grade. It is a term project, and you should get started on it immediately!

Be sure to submit the source code files and your makefile; do not turn in executables or object files. Submit every file that we will need in order to rebuild your executable, including all header files, .c files and makefiles. Your makefile should build an executable called proxy. Also be sure to submit any external documentation that you've written (e.g., a README file), and comment your code thoroughly and clearly. This project will be graded primarily on correctness, on the clarity of your overall design and organization, and on clear documentation (both internal and external).

This is a group project; groups can consist of one or two people. You are on your own to form your groups, so feel free to use the class mailing list or newsgroup to help find partners. If you form a two-person group, your external documentation should state who did which part of the work.

² For grading purposes, please make sure your programs run on the Linux machines in 547 PGH. If your program can be compiled using gcc on the SunOS machines, it should also be able to run on the Linux machines.

5 Logistics

Your first step should be to read the relevant documentation cited above. Make sure you feel comfortable with what this assignment entails before spending an inordinate amount of time writing code. A little bit of careful design can go a long way! The directory ~jsteach/www/cosc6377/proxy is the base directory for this project. You should start by copying all the files in its skel directory into the directory where you intend to work. Relevant documentation for this project is in the doc subdirectory; you might want to check it every now and then for new information. Additional documentation and links will be on the class home page, http://www.cs.uh.edu/~jsteach/cosc6377/.