









Assigned: Tuesday, October 31, 2000 Due: Tuesday, November 21, 2000, midnight.
When the CIO convinced Midget Service Corp. (MSC)'s CEO of the importance of joining the Internet revolution late last year, he thought he was doing the company a huge favor and a big bonus would be his for the taking. Unfortunately, his predictions that MSC's profits would climb through the roof, employee productivity would soar, and the company's image would shine like never before have proven to be, how shall we put this mildly... disastrously inaccurate. A series of major problems have arisen:
the Internet load quickly overwhelmed the initial ISDN link, so MSC had to change to a new and more expensive Internet service provider and install an expensive T1 link;
employee productivity has been steadily declining - on several occasions, MSC's CEO caught marketing twerps playing Doom, wasting time with IRC, and tying up the Internet link for long periods of time downloading material from www.playboy.com. People in other departments have been even worse;
company profits are down, and the new web server has proven to be more of a marketing embarrassment than a marketing bonanza. The final coup came in December, when a prankster hacked their way into MSC's network and caused computers to display nothing but pictures of snow-covered mountains, Santa Claus, and reindeer for an entire week.
After the "Christmas server" incident, MSC's CEO went ballistic. She told the CIO that he had until the end of February to solve the problems or find a replacement. A firewall was installed immediately, which helped prevent outside hackers from breaking in. The next step involves controlling MSC's Internet link to restrict its abuse by MSC employees. You have been hired as a contractor to write a configurable HTTP proxy server that would allow MSC employees access to the Web on a need-to-browse basis.
HTTP requests consist of roughly three steps: (i) the client application parses the desired URL to determine the machine name and TCP port number of the desired web page, and makes a TCP "socket" connection to that machine and port; (ii) the client application sends a request message to the server on the new socket connection, specifying what operation it wants performed (most often, a GET operation to get the contents of a web page); and (iii) the server application performs the desired operation and returns the resulting data (most often a web page in HTML format) to the client on the established TCP connection.
When a client application is configured to connect via a proxy server, the client establishes its connection to the proxy server's machine and TCP port and forwards its complete request to the proxy server rather than the "real" web server. The HTTP proxy server accepts local HTTP connections and forwards them to their final destination; essentially it introduces an extra "hop" between the client browser and the web server. There are several reasons why you would use a proxy server instead of connecting to the web site via the Internet directly: you might be behind a firewall where direct connections are not allowed; a proxy can be used to locally cache data in order to reduce the amount of global traffic; or, in our case, the proxy is used to control access to the Web on a per-host/per-location basis.
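Step (i) above, splitting a URL into machine name, port, and path, could be sketched as follows. The function name split_http_url and its interface are hypothetical and are not part of anything supplied with the assignment; the sketch assumes only plain http URLs of the form http://host[:port]/path and falls back to port 80 when no port is given.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Splits "http://host[:port][/path]" into host, port and path.
 * Returns 0 on success, -1 if the URL is not a plain http URL,
 * the port is invalid, or the host does not fit in the buffer. */
int split_http_url(const char *url, char *host, size_t hostsiz,
                   int *port, const char **path)
{
    static const char scheme[] = "http://";
    const char *p, *end, *colon;
    size_t hostlen;

    if (strncmp(url, scheme, sizeof scheme - 1) != 0)
        return -1;
    p = url + sizeof scheme - 1;

    /* host[:port] runs up to the first slash or the end of the string */
    end = p + strcspn(p, "/");
    colon = memchr(p, ':', (size_t)(end - p));
    hostlen = (size_t)((colon ? colon : end) - p);
    if (hostlen == 0 || hostlen >= hostsiz)
        return -1;
    memcpy(host, p, hostlen);
    host[hostlen] = '\0';

    *port = 80;                      /* default http port */
    if (colon != NULL) {
        char *stop;
        long n = strtol(colon + 1, &stop, 10);
        if (stop == colon + 1 || stop != end || n < 1 || n > 65535)
            return -1;
        *port = (int)n;
    }

    /* an empty path is treated as "/" */
    *path = (*end == '/') ? end : "/";
    return 0;
}
```

The returned path points into the caller's URL string (or at a constant "/"), so the URL must stay alive for as long as the path is used.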
The objectives of this assignment are to help you learn how to write networking code using the Berkeley sockets API, to familiarize you with one of the most widely used Internet application protocols (HTTP), and to give you experience in building client/server applications. An interesting thing about proxy servers is that they behave like a server when they accept requests from local clients, but they act like a client when they forward their local client requests along to the real web server. Thus, by implementing a proxy server, you will get a feel for how both sides of a client/server application are implemented. The following sections detail the set of operations that your proxy server must perform to solve MSC's Internet problem. In addition to this document, we have provided a number of other resources to help you implement the project, including:
RFC 1945: Hypertext Transfer Protocol (HTTP 1.0). This document describes the format of HTTP protocol requests and responses, which is what your server will be receiving and generating.
RFC 1738: Uniform Resource Locators (URL). This document describes the format of URLs, which your server will need to parse. Most of the legal URLs specified in RFC 1738 are not supported by the MSC proxy server, however, so consider this text supplemental.
RFC 1808: Relative Uniform Resource Locators. This document builds on RFC 1738. Like that document, RFC 1808 is supplemental.
Berkeley sockets user guide and examples: This collection of documents and sample programs should prove useful for getting started on programming your proxy server using Berkeley sockets. Please refer to the "Reference / Resource" section in the project web page (http://www.cs.uh.edu/~jsteach/cosc6377/).
Unix man pages: In addition to the high-level primer described above, you should be sure to read the relevant Unix man pages on gethostbyname(), gethostbyaddr(), byteorder, socket(), bind(), listen(), accept(), and select().
Unix Network Programming sampling: The book "Unix Network Programming" by W. Richard Stevens is considered the quintessential guide to programming network applications (clients or servers) on Unix. You might consider grabbing or borrowing a copy of the complete text if you want more background material or examples.
Your proxy must implement a subset of the HTTP/1.0 protocol. The protocol is described in RFC 1945. In this section, we describe the steps that your server must perform whenever a client request arrives. Upon startup, your server should read a configuration file, open a TCP socket on a port indicated in the configuration file, and start accepting requests. Whenever a connect request arrives, you should translate the client's IP address to a host name and check whether the client is allowed to access the Web. If the client is not authorized, your server must refuse the connection immediately - those marketing dweebs have been cut off the net entirely until further notice!
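A minimal sketch of the startup and address-translation steps, using the Berkeley sockets calls named in the man page list, might look like this. The helper names open_listener and client_name are hypothetical, the backlog of 16 is an arbitrary choice, and error reporting is reduced to a return code.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP socket listening on `port`
 * (host byte order) on all local interfaces.
 * Returns the socket descriptor, or -1 on failure. */
int open_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    /* lets the server be restarted without waiting for old sockets to expire */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: translate the peer address filled in by accept()
 * into a host name. Falls back to the dotted-decimal address when the
 * reverse lookup fails. bufsiz must be at least 1. */
void client_name(const struct sockaddr_in *peer, char *buf, size_t bufsiz)
{
    struct hostent *he = gethostbyaddr(&peer->sin_addr,
                                       sizeof peer->sin_addr, AF_INET);
    const char *name = he ? he->h_name : inet_ntoa(peer->sin_addr);

    strncpy(buf, name, bufsiz - 1);
    buf[bufsiz - 1] = '\0';
}
```

After accept() returns, the name produced by client_name is what would be checked against the access rules before the request is read.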
2.1 Processing an Incoming Request
For authorized clients, you must accept the request and parse it. A request consists of a request line, followed by request header fields. The request line has three fields: method, URL and protocol
all composite (multipart) messages through an image removal filter. Your proxy server will not need to perform the actual filtering! We will provide a set of filters. However, your proxy must examine the media type of returned documents to determine if the message body should be handed to a filter that has been registered for that type of data, and if so, replace the returned message body with the output of the specified filter. The interface for connecting to filter programs is described below.
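Examining the media type of a returned document could be sketched as below. In HTTP/1.0 the media type is carried in the Content-Type response header; the helper name media_type is hypothetical, and the sketch assumes the complete header block has already been read into a NUL-terminated string.

```c
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Hypothetical helper: copy the media type (e.g. "text/html") from the
 * Content-Type field of a response header block into buf.
 * Parameters such as "; charset=..." are dropped.
 * Returns 0 if a media type was found, -1 otherwise. */
int media_type(const char *headers, char *buf, size_t bufsiz)
{
    static const char key[] = "Content-Type:";
    const char *line = headers;

    /* walk line by line; an empty line marks the end of the headers */
    while (line != NULL && *line != '\0' && *line != '\r' && *line != '\n') {
        if (strncasecmp(line, key, sizeof key - 1) == 0) {
            const char *v = line + sizeof key - 1;
            size_t n;

            while (*v == ' ' || *v == '\t')
                v++;
            n = strcspn(v, "; \t\r\n");
            if (n == 0 || n >= bufsiz)
                return -1;
            memcpy(buf, v, n);
            buf[n] = '\0';
            return 0;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

The header name is compared case-insensitively because HTTP field names are not case-sensitive.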
2.3 Other Requirements
Your server should be capable of handling multiple connections concurrently. This means that you must be very careful when using system calls that could block the server (e.g., reading from or writing to a socket). There are two common ways to handle multiple connections concurrently: (i) forking off a child process for each connection and (ii) using the select(2) system call to perform asynchronous operations. You may use whichever technique you prefer. Note that you are not required to fork a process or use select during the resolution of host addresses using the DNS interface. Although functions such as gethostbyname or gethostbyaddr block waiting for an answer, you may block the server while performing these operations.
The HTTP protocol defines persistent connections that are used to send and receive more than one request. If you investigate HTTP requests produced by various client browsers, and Netscape Navigator in particular, you will see the header field Proxy-Connection: Keep-Alive. This indicates that you should not close the connection when the last byte of the reply is sent. We do not require that you implement persistent connections; you may close socket connections as soon as you finish using them.
The runtime behaviour of the MSC HTTP proxy server is controlled by means of a configuration file. Figure 1 illustrates a sample proxy configuration file. A configuration file consists of a sequence of lines containing comments or commands. Comments are initiated by a pound sign (#) and extend to the end of the line. Line continuation is done by means of a backslash character (\). Each non-empty line consists of a sequence of tokens that make up a command (clause). If a token contains white space, it must be enclosed in double quotes ("). No special escape sequences are recognized in strings.
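To make the token rules concrete, here is an illustrative tokenizer for a single logical line. It is a sketch only: the function name tokenize is hypothetical, backslash line continuation is assumed to have been resolved before the call, and a # that appears directly after an unquoted token is treated as the start of a comment, which is an assumption the text above does not settle.

```c
#include <ctype.h>

/* Hypothetical helper: split one configuration line into tokens, in place.
 * - '#' outside quotes starts a comment that runs to the end of the line
 * - a token in double quotes may contain white space
 * - no escape sequences are interpreted
 * Returns the number of tokens, or -1 on an unterminated quote or when
 * the line holds more than max_tokens tokens. */
int tokenize(char *line, char *tokens[], int max_tokens)
{
    int n = 0;
    char *p = line;

    while (*p != '\0') {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0' || *p == '#')
            break;
        if (n == max_tokens)
            return -1;

        if (*p == '"') {
            tokens[n++] = ++p;               /* token starts after the quote */
            while (*p != '\0' && *p != '"')
                p++;
            if (*p == '\0')
                return -1;                   /* closing quote is missing */
            *p++ = '\0';
        } else {
            tokens[n++] = p;
            while (*p != '\0' && !isspace((unsigned char)*p) && *p != '#')
                p++;
            if (*p == '#') {                 /* comment begins right here */
                *p = '\0';
                break;
            }
            if (*p != '\0')
                *p++ = '\0';
        }
    }
    return n;
}
```

Applied to the rewrite line of the sample file, this yields four tokens: rewrite, User-Agent, Netscape (.*), and Mozilla \1, with the backslash left untouched.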
3.1 Configuration File Contents
There are six clauses that can appear in a configuration file:
port n If present, this clause specifies the TCP port number your server should listen on. If omitted, the port number is indicated by the environment variable PROXY_PORT. If the clause is not present and PROXY_PORT is undefined, the default http port (80) should be used.
refuse pattern This clause specifies hosts from which connections should be refused. Argument pattern is an extended regular expression; for more information about regular expressions, consult the regex(5) man page.
block pattern This clause specifies that all URLs matching pattern are blocked; an attempt to access them should return an error message.
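The port fallback and the pattern matching behind refuse and block can be illustrated with the POSIX regex interface. This is a sketch under stated assumptions: the names matches and choose_port are hypothetical, the environment variable is assumed to be spelled PROXY_PORT, and patterns are matched unanchored (a pattern matches if it occurs anywhere in the text). The supplied library may well perform this matching for you, in which case code like this is unnecessary.

```c
#include <regex.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if `text` matches the extended regular
 * expression `pattern`, 0 if it does not, and -1 if the pattern does
 * not compile. The match is unanchored. */
int matches(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* Hypothetical helper: pick the listening port.
 * clause_port is the value of the port clause, or 0 if the clause
 * was absent. Order: port clause, then PROXY_PORT, then 80. */
int choose_port(int clause_port)
{
    const char *env;

    if (clause_port > 0)
        return clause_port;
    env = getenv("PROXY_PORT");
    if (env != NULL && atoi(env) > 0)
        return atoi(env);
    return 80;
}
```

With the sample clause refuse marketing.mscorp.com, a client whose name resolves to a host inside marketing.mscorp.com would match and be refused.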
port 5000
refuse marketing.mscorp.com # blocks complete `marketing' group
block www.microsoft.com # Microsloth is a bad location
redirect www.netscape.com/(.*) "www3.netscape.com/\1" # www3 server is preferable to www.
rewrite User-Agent "Netscape (.*)" "Mozilla \1"
filter text/(.*) "scramble -swedish -\1"
Figure 1 Sample proxy configuration file
path. The host name should be followed by a slash character, even if path in the original URL is empty. If the return value indicates that a new location should be used, the result string should be regarded as a full URL (without the http: prefix).
int proxy_hfield(proxy_t* p, const char* key, const char* value, char* buf, size_t bufsiz) : This command rewrites request header fields. The string key is the header keyword, i.e., the first token before the colon. The string value points to the remainder of the header field, i.e., the characters after the colon. This function returns PROXY_OK if the modification was successful, and PROXY_ERROR if the buffer was too small. The new header value (without the key prefix) is available in buf.
int proxy_filter(proxy_t* p, const char* media, char* buf, size_t bufsiz) : This command checks to see whether a filter has been specified for media. It returns PROXY_OK if no filter has been defined, PROXY_FILTER otherwise. If a filter has been installed, buf contains a string describing the filter command to be executed. Note that the command is not split into individual arguments; to execute the command, you should either construct the argument list yourself or execute /bin/sh -c command.
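Executing a filter through /bin/sh -c command could look roughly like the sketch below, which pipes a message body into the command and collects its output. The name run_filter and its interface are hypothetical, and the sketch assumes the filter reads the body on standard input and writes the result to standard output. It also writes the whole body before reading any output, so it is only safe for bodies that fit in a pipe buffer; a complete server would interleave the two (for example with select) and ignore SIGPIPE.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: feed in[0..inlen) to `/bin/sh -c command` and
 * store up to outsiz bytes of the command's output in out.
 * Returns the number of output bytes, or -1 if the child could not
 * be started. */
ssize_t run_filter(const char *command, const char *in, size_t inlen,
                   char *out, size_t outsiz)
{
    int to_child[2], from_child[2];
    size_t total = 0;
    ssize_t n;
    pid_t pid;

    if (pipe(to_child) < 0)
        return -1;
    if (pipe(from_child) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }
    if (pid == 0) {
        /* child: wire the pipes to stdin/stdout, then run the command */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);                  /* reached only if exec failed */
    }

    /* parent: keep only the ends it uses */
    close(to_child[0]);
    close(from_child[1]);

    while (inlen > 0 && (n = write(to_child[1], in, inlen)) > 0) {
        in += n;
        inlen -= (size_t)n;
    }
    close(to_child[1]);              /* EOF tells the filter the body is complete */

    while (total < outsiz &&
           (n = read(from_child[0], out + total, outsiz - total)) > 0)
        total += (size_t)n;
    close(from_child[0]);

    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Letting the shell split the command string is the simpler of the two options; building the argument list yourself avoids the extra shell process but means reimplementing its quoting rules.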
The file proxy.h contains the C header file declarations and definitions needed to use the library routines. The library code itself can be found in the files libproxy.a and librx.a. Different versions of the library files are available for different architectures and operating systems. In general it is important that you use only a single architecture throughout the project (e.g., do all of your development on SunOS or Linux or ... ) or carefully clean up your links to library files and .o files whenever you switch platforms.
To complete this assignment, you need to implement an MSC proxy server that runs on a machine in the 547 PGH lab (either on SunOS or Linux; see footnote 2) and provides the functionality described above. The MSC proxy server project will constitute 20% of your final course grade. It is a term project, and you should get started on it immediately!
Be sure to submit the source code files and your makefile. Do not turn in executable and object files. Be sure to submit every file that we will need in order to recreate your executable, including all header files, .c files and makefiles. Your executable should be called proxy in your makefile. Also be sure to submit any external documentation that you've written (e.g., a README file) and comment your code thoroughly and clearly.
This project will be graded primarily on correctness, clarity of your overall design and organization, and clear documentation (both internal and external). This project is a group project - groups can consist of one or two people. You are on your own to form your groups; feel free to use the class mailing list or newsgroup to help find partners. If you form a two-person group, your external documentation should state who did what part of the work.
(^2) For grading purposes, please make sure your programs will run on Linux machines in 547 PGH. If your program can be compiled using gcc on SunOS machines, it should be able to run on Linux machines.
Your first step should be to read the relevant documentation cited above. Make sure you feel comfortable with what this assignment entails before spending an inordinate amount of time writing code. A little bit of careful design can go a long way! Directory ~jsteach/www/cosc6377/proxy is the base directory for this project. You should start by copying all the files in the skel directory into a directory where you intend to work. Relevant documentation for this project is in the doc subdirectory. You might want to poll it every now and then for new information. Additional documentation and links will be on the class home page http://www.cs.uh.edu/~jsteach/cosc6377/.