













































































An overview of socket programming, focusing on the use of structs, system calls, and functions. It covers the creation of a socket using socket(), the bind() system call, the connect() system call, and the send() and recv() system calls. The document also discusses the use of select() for handling multiple sockets and the getpeername() function for obtaining peer information.














































































Well, guess what! I've already done this nasty business, and I'm dying to share the information with everyone! You've come to the right place. This document should give the average competent C programmer the edge s/he needs to get a grip on this networking noise.
This document has been written as a tutorial, not a reference. It is probably at its best when read by individuals who are just starting out with socket programming and are looking for a foothold. It is certainly not the complete guide to sockets programming, by any means.
Hopefully, though, it'll be just enough for those man pages to start making sense... :-)
The code contained within this document was compiled on a Linux PC using Gnu's gcc compiler. It should, however, build on just about any platform that uses gcc. Naturally, this doesn't apply if you're programming for Windows--see the section on Windows programming, below.
The official location of this document is http://beej.us/guide/bgnet/.
When compiling for Solaris or SunOS, you need to specify some extra command-line switches for linking in the proper libraries. In order to do this, simply add "-lnsl -lsocket -lresolv" to the end of the compile command, like so:
$ cc -o server server.c -lnsl -lsocket -lresolv
If you still get errors, you could try further adding a "-lxnet" to the end of that command line. I don't know what that does, exactly, but some people seem to need it.
Another place that you might find problems is in the call to setsockopt(). The prototype differs from that on my Linux box, so instead of:
int yes=1;
enter this:
char yes='1';
As I don't have a Sun box, I haven't tested any of the above information--it's just what people have told me through email.
I have a particular dislike for Windows, and encourage you to try Linux, BSD, or Unix instead. That being said, you can still use this stuff under Windows.
First, ignore pretty much all of the system header files I mention in here. All you need to include is:
#include <winsock.h>
Wait! You also have to make a call to WSAStartup() before doing anything else with the sockets library. The code to do that looks something like this:
#include <winsock.h>

{
    WSADATA wsaData;   // if this doesn't work
    //WSAData wsaData; // then try this instead

    if (WSAStartup(MAKEWORD(1, 1), &wsaData) != 0) {
        fprintf(stderr, "WSAStartup failed.\n");
        exit(1);
    }
You also have to tell your compiler to link in the Winsock library, usually called wsock32.lib or winsock32.lib or somesuch. Under VC++, this can be done through the Project menu, under Settings.... Click the Link tab, and look for the box titled "Object/library modules". Add "wsock32.lib" to that list.
Or so I hear.
Finally, you need to call WSACleanup() when you're all through with the sockets library. See your online help for details.
If you want to translate the guide into another language, write me at [email protected] and I'll link to your translation from the main page.
Feel free to add your name and email address to the translation.
Sorry, but due to space constraints, I cannot host the translations myself.
Beej's Guide to Network Programming is Copyright © 2005 Brian "Beej" Hall.
This guide may be freely reprinted in any medium provided that its content is not altered, it is presented in its entirety, and this copyright notice remains intact.
Educators are especially encouraged to recommend or supply copies of this guide to their students.
This guide may be freely translated into any language, provided the translation is accurate, and the guide is reprinted in its entirety. The translation may also include the name and contact information for the translator.
The C source code presented in this document is hereby granted to the public domain.
Contact [email protected] for more information.
You hear talk of "sockets" all the time, and perhaps you are wondering just what they are exactly. Well, they're this: a way to speak to other programs using standard Unix file descriptors.
What?
Ok--you may have heard some Unix hacker state, "Jeez, everything in Unix is a file!" What that person may have been talking about is the fact that when Unix programs do any sort of I/O, they do it by reading or writing to a file descriptor. A file descriptor is simply an integer associated with an open file. But (and here's the catch), that file can be a network connection, a FIFO, a pipe, a terminal, a real on-the-disk file, or just about anything else. Everything in Unix is a file! So when you want to communicate with another program over the Internet you're gonna do it through a file descriptor, you'd better believe it.
"Where do I get this file descriptor for network communication, Mr. Smarty-Pants?" is probably the last question on your mind right now, but I'm going to answer it anyway: You make a call to the socket() system routine. It returns the socket descriptor, and you communicate through it using the specialized send() and recv() ( man send , man recv ) socket calls.
"But, hey!" you might be exclaiming right about now. "If it's a file descriptor, why in the name of Neptune can't I just use the normal read() and write() calls to communicate through the socket?" The short answer is, "You can!" The longer answer is, "You can, but send() and recv() offer much greater control over your data transmission."
What next? How about this: there are all kinds of sockets. There are DARPA Internet addresses (Internet Sockets), path names on a local node (Unix Sockets), CCITT X.25 addresses (X.25 Sockets that you can safely ignore), and probably many others depending on which Unix flavor you run. This document deals only with the first: Internet Sockets.
What's this? There are two types of Internet sockets? Yes. Well, no. I'm lying. There are more, but I didn't want to scare you. I'm only going to talk about two types here. Except for this sentence, where I'm going to tell you that "Raw Sockets" are also very powerful and you should look them up.
All right, already. What are the two types? One is "Stream Sockets"; the other is "Datagram Sockets", which may hereafter be referred to as "SOCK_STREAM" and "SOCK_DGRAM", respectively. Datagram sockets are sometimes called "connectionless sockets". (Though they can be connect()'d if you really want. See connect(), below.)
Stream sockets are reliable two-way connected communication streams. If you output two items into the socket in the order "1, 2", they will arrive in the order "1, 2" at the opposite end. They will also be error free. Any errors you do encounter are figments of your own deranged mind, and are not to be discussed here.
What uses stream sockets? Well, you may have heard of the telnet application, yes? It uses stream sockets. All the characters you type need to arrive in the same order you type them, right? Also, web browsers use the HTTP protocol which uses stream sockets to get pages. Indeed, if you telnet to a web site on port 80, and type "GET / HTTP/1.0" and hit RETURN twice, it'll dump the HTML back at you!
A packet's data is wrapped ("encapsulated") in a header (and rarely a footer) by the first protocol (say, the TFTP protocol), then the whole thing (TFTP header included) is encapsulated again by the next protocol (say, UDP), then again by the next (IP), then again by the final protocol on the hardware (physical) layer (say, Ethernet).
When another computer receives the packet, the hardware strips the Ethernet header, the kernel strips the IP and UDP headers, the TFTP program strips the TFTP header, and it finally has the data.
Now I can finally talk about the infamous Layered Network Model. This Network Model describes a system of network functionality that has many advantages over other models. For instance, you can write sockets programs that are exactly the same without caring how the data is physically transmitted (serial, thin Ethernet, AUI, whatever) because programs on lower levels deal with it for you. The actual network hardware and topology is transparent to the socket programmer.
Without any further ado, I'll present the layers of the full-blown model. Remember this for network class exams:
Application
Presentation
Session
Transport
Network
Data Link
Physical
The Physical Layer is the hardware (serial, Ethernet, etc.). The Application Layer is just about as far from the physical layer as you can imagine--it's the place where users interact with the network.
Now, this model is so general you could probably use it as an automobile repair guide if you really wanted to. A layered model more consistent with Unix might be:
Application Layer (telnet, ftp, etc.)
Host-to-Host Transport Layer (TCP, UDP)
Internet Layer (IP and routing)
Network Access Layer (Ethernet, ATM, or whatever)
At this point in time, you can probably see how these layers correspond to the encapsulation of the original data.
See how much work there is in building a simple packet? Jeez! And you have to type in the packet headers yourself using "cat"! Just kidding. All you have to do for stream sockets is send() the data out. All you have to do for datagram sockets is encapsulate the packet in the method of your choosing and sendto() it out. The kernel builds the Transport Layer and Internet Layer for you and the hardware does the Network Access Layer. Ah, modern technology.
So ends our brief foray into network theory. Oh yes, I forgot to tell you everything I wanted to say about routing: nothing! That's right, I'm not going to talk about it at all. The router strips the packet to the IP header, consults its routing table, blah blah blah. Check out the IP RFC if you really really care. If you never learn about it, well, you'll live.
Well, we're finally here. It's time to talk about programming. In this section, I'll cover various data types used by the sockets interface, since some of them are a real bear to figure out.
First the easy one: a socket descriptor. A socket descriptor is the following type:
int
Just a regular int.
Things get weird from here, so just read through and bear with me. Know this: there are two byte orderings: most significant byte (sometimes called an "octet") first, or least significant byte first. The former is called "Network Byte Order". Some machines store their numbers internally in Network Byte Order, some don't. When I say something has to be in Network Byte Order, you have to call a function (such as htons()) to change it from "Host Byte Order". If I don't say "Network Byte Order", then you must leave the value in Host Byte Order.
(For the curious, "Network Byte Order" is also known as "Big-Endian Byte Order".)
My First Struct(TM)--struct sockaddr. This structure holds socket address information for many types of sockets:
struct sockaddr {
    unsigned short sa_family;   // address family, AF_xxx
    char           sa_data[14]; // 14 bytes of protocol address
};
sa_family can be a variety of things, but it'll be AF_INET for everything we do in this document. sa_data contains a destination address and port number for the socket. This is rather unwieldy since you don't want to tediously pack the address in the sa_data by hand.
It's almost too easy...
You can use every combination of "n", "h", "s", and "l" you want, not counting the really stupid ones. For example, there is NOT a stolh() ("Short to Long Host") function--not at this party, anyway. But there are:
htons() -- "Host to Network Short"
htonl() -- "Host to Network Long"
ntohs() -- "Network to Host Short"
ntohl() -- "Network to Host Long"
Now, you may think you're wising up to this. You might think, "What do I do if I have to change byte order on a char?" Then you might think, "Uh, never mind." You might also think that since your 68000 machine already uses network byte order, you don't have to call htonl() on your IP addresses. You would be right, BUT if you try to port to a machine that has reverse network byte order, your program will fail. Be portable! This is a Unix world! (As much as Bill Gates would like to think otherwise.) Remember: put your bytes in Network Byte Order before you put them on the network.
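Here's a small sketch of those conversion functions in action. The helper names roundtrip_port() and port_wire_bytes() are made up for this illustration; only htons() and ntohs() come from the system. It assumes a platform where <arpa/inet.h> declares them, which is the usual case on Unix.

```c
#include <arpa/inet.h>   /* htons(), ntohs() */
#include <stdint.h>
#include <string.h>      /* memcpy() */

/* Hypothetical helper: convert a port to Network Byte Order and back.
 * Whatever the host's byte order, you get the original value back. */
uint16_t roundtrip_port(uint16_t host_port)
{
    uint16_t net_port = htons(host_port);  /* what you'd store in sin_port */
    return ntohs(net_port);                /* what you'd read back out */
}

/* Hypothetical helper: show the two bytes as they'd appear on the wire.
 * In Network Byte Order the most significant byte always comes first. */
void port_wire_bytes(uint16_t host_port, unsigned char out[2])
{
    uint16_t net_port = htons(host_port);
    memcpy(out, &net_port, sizeof net_port);
}
```

For port 3490 (0x0DA2), port_wire_bytes() fills out[] with 0x0D followed by 0xA2 on any machine, which is the whole point of calling htons() instead of guessing.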
A final point: why do sin_addr and sin_port need to be in Network Byte Order in a struct sockaddr_in, but sin_family does not? The answer: sin_addr and sin_port get encapsulated in the packet at the IP and UDP layers, respectively. Thus, they must be in Network Byte Order. However, the sin_family field is only used by the kernel to determine what type of address the structure contains, so it must be in Host Byte Order. Also, since sin_family does not get sent out on the network, it can be in Host Byte Order.
Fortunately for you, there are a bunch of functions that allow you to manipulate IP addresses. No need to figure them out by hand and stuff them in a long with the << operator.
First, let's say you have a struct sockaddr_in ina, and you have an IP address "10.12.110.57" that you want to store into it. The function you want to use, inet_addr(), converts an IP address in numbers-and-dots notation into an unsigned long. The assignment can be made as follows:
ina.sin_addr.s_addr = inet_addr("10.12.110.57");
Notice that inet_addr() returns the address in Network Byte Order already--you don't have to call htonl(). Swell!
Now, the above code snippet isn't very robust because there is no error checking. See, inet_addr() returns -1 on error. Remember binary numbers? (unsigned)-1 just happens to correspond to the IP address 255.255.255.255! That's the broadcast address! Wrongo. Remember to do your error checking properly.
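One way to do that error checking might look like the sketch below. The helper name set_ipv4_addr() is invented for this example, and it assumes your platform defines the INADDR_NONE constant (the all-ones value inet_addr() returns on failure), as Linux and the BSDs do.

```c
#include <arpa/inet.h>    /* inet_addr() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_NONE */

/* Hypothetical helper: store a numbers-and-dots address in sa.
 * Returns 0 on success, -1 if inet_addr() reports an error. */
int set_ipv4_addr(struct sockaddr_in *sa, const char *dotted)
{
    in_addr_t addr = inet_addr(dotted);   /* already in Network Byte Order */

    if (addr == INADDR_NONE) {
        /* Caveat: this also rejects the legitimate address
         * 255.255.255.255, since it has the same bit pattern. */
        return -1;
    }
    sa->sin_addr.s_addr = addr;
    return 0;
}
```

Calling set_ipv4_addr(&ina, "10.12.110.57") returns 0 and fills in the address; handing it garbage returns -1 and leaves the struct alone.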
Actually, there's a cleaner interface you can use instead of inet_addr(): it's called inet_aton() ("aton" means "ascii to network"):
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
int inet_aton(const char *cp, struct in_addr *inp);
And here's a sample usage, while packing a struct sockaddr_in (this example will make more sense to you when you get to the sections on bind() and connect().)
struct sockaddr_in my_addr;

my_addr.sin_family = AF_INET;         // host byte order
my_addr.sin_port = htons(MYPORT);     // short, network byte order
inet_aton("10.12.110.57", &(my_addr.sin_addr));
memset(&(my_addr.sin_zero), '\0', 8); // zero the rest of the struct
inet_aton(), unlike practically every other socket-related function, returns non-zero on success, and zero on failure. And the address is passed back in inp.
Unfortunately, not all platforms implement inet_aton() so, although its use is preferred, the older more common inet_addr() is used in this guide.
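If your platform does have inet_aton(), a sketch that actually checks its odd return value might look like this. The helper name fill_addr() is hypothetical, and the _DEFAULT_SOURCE line is an assumption about glibc, which may hide inet_aton() when compiling in a strict -std mode; elsewhere it should be harmless.

```c
#define _DEFAULT_SOURCE   /* assumption: may be needed on glibc in strict modes */
#include <arpa/inet.h>    /* inet_aton(), htons() */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <string.h>       /* memset() */
#include <sys/socket.h>   /* AF_INET */

/* Hypothetical helper: pack a struct sockaddr_in from a dotted string
 * and a port in Host Byte Order. Returns 0 on success, -1 on a bad address. */
int fill_addr(struct sockaddr_in *sa, const char *dotted, unsigned short port)
{
    memset(sa, 0, sizeof *sa);           /* zeroes sin_zero along the way */
    sa->sin_family = AF_INET;            /* host byte order */
    sa->sin_port = htons(port);          /* short, network byte order */

    if (inet_aton(dotted, &sa->sin_addr) == 0) {
        return -1;                       /* zero means failure here! */
    }
    return 0;
}
```

Since success and failure are reported separately from the address itself, fill_addr() happily accepts "255.255.255.255", which the inet_addr() approach cannot tell apart from an error.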
All right, now you can convert string IP addresses to their binary representations. What about the other way around? What if you have a struct in_addr and you want to print it in numbers-and-dots notation? In this case, you'll want to use the function inet_ntoa() ("ntoa" means "network to ascii") like this:
printf("%s", inet_ntoa(ina.sin_addr));
That will print the IP address. Note that inet_ntoa() takes a struct in_addr as an argument, not a long. Also notice that it returns a pointer to a char. This points to a statically stored char array within inet_ntoa() so that each time you call inet_ntoa() it will overwrite the last IP address you asked for. For example:
char *a1, *a2;
a1 = inet_ntoa(ina1.sin_addr); // this is 192.168.4.
a2 = inet_ntoa(ina2.sin_addr); // this is 10.12.110.
printf("address 1: %s\n",a1);
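Because inet_ntoa() keeps handing back the same static buffer, one way around the overwriting is to copy each result into storage of your own before calling it again. A minimal sketch, with a made-up helper name addr_to_string():

```c
#include <arpa/inet.h>    /* inet_ntoa() */
#include <netinet/in.h>   /* struct in_addr, INET_ADDRSTRLEN */
#include <string.h>       /* strncpy() */

/* Hypothetical helper: copy the numbers-and-dots form of `in` into
 * the caller's buffer so a later inet_ntoa() call can't clobber it. */
void addr_to_string(struct in_addr in, char *buf, size_t buflen)
{
    if (buflen == 0) {
        return;                            /* nowhere to put anything */
    }
    strncpy(buf, inet_ntoa(in), buflen - 1);
    buf[buflen - 1] = '\0';                /* strncpy may not terminate */
}
```

A buffer of INET_ADDRSTRLEN bytes (from <netinet/in.h>) is big enough for any IPv4 address. Call addr_to_string() once per address with a separate buffer each time and both strings survive.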
This is the section where we get into the system calls that allow you to access the network functionality of a Unix box. When you call one of these functions, the kernel takes over and does all the work for you automagically.
The place most people get stuck around here is what order to call these things in. In that, the man pages are no use, as you've probably discovered. Well, to help with that dreadful situation, I've tried to lay out the system calls in the following sections in exactly (approximately) the same order that you'll need to call them in your programs.
That, coupled with a few pieces of sample code here and there, some milk and cookies (which I fear you will have to supply yourself), and some raw guts and courage, and you'll be beaming data around the Internet like the Son of Jon Postel!
I guess I can put it off no longer--I have to talk about the socket() system call. Here's the breakdown:
#include <sys/types.h>
#include <sys/socket.h>
int socket(int domain, int type, int protocol);
But what are these arguments? First, domain should be set to "PF_INET". Next, the type argument tells the kernel what kind of socket this is: SOCK_STREAM or SOCK_DGRAM. Finally, just set protocol to "0" to have socket() choose the correct protocol based on the type. (Notes: there are many more domains than I've listed. There are many more types than I've listed. See the socket() man page. Also, there's a "better" way to get the protocol, but specifying 0 works in 99.9% of all cases. See the getprotobyname() man page if you're curious.)
socket() simply returns to you a socket descriptor that you can use in later system calls, or -1 on error. The global variable errno is set to the error's value (see the perror() man page.)
(This PF_INET thing is a close relative of the AF_INET that you used when initializing the sin_family field in your struct sockaddr_in. In fact, they're so closely related that they actually have the same value, and many programmers will call socket() and pass AF_INET as the first argument instead of PF_INET. Now, get some milk and cookies, because it's time for a story. Once upon a time, a long time ago, it was thought that maybe an address family (what the "AF" in "AF_INET" stands for) might support several protocols that were referred to by their protocol family (what the "PF" in "PF_INET" stands for). That didn't happen. And they all lived happily ever after, The End. So the most correct thing to do is to use AF_INET in your struct sockaddr_in and PF_INET in your call to socket().)
Fine, fine, fine, but what good is this socket? The answer is that it's really no good by itself, and you need to read on and make more system calls for it to make any sense.
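For the record, a call to socket() with its return value checked might be sketched like so. The wrapper name open_inet_socket() is invented for this example; the rest is just the call described above.

```c
#include <stdio.h>        /* perror() */
#include <sys/types.h>
#include <sys/socket.h>   /* socket(), PF_INET, SOCK_STREAM, SOCK_DGRAM */

/* Hypothetical wrapper: get an Internet socket of the given type
 * (SOCK_STREAM or SOCK_DGRAM). Returns the descriptor, or -1 on error. */
int open_inet_socket(int type)
{
    int sockfd = socket(PF_INET, type, 0);  /* 0: let socket() pick the protocol */

    if (sockfd == -1) {
        perror("socket");                   /* prints the reason stored in errno */
    }
    return sockfd;
}
```

The descriptor you get back is just an int, exactly as promised, and each call hands you a different one.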
Once you have a socket, you might have to associate that socket with a port on your local machine. (This is commonly done if you're going to listen() for incoming connections on a specific port--MUDs do this when they tell you to "telnet to x.y.z port 6969".) The port number is used by the kernel to match an incoming packet to a certain process's socket descriptor. If you're going to only be doing a connect(), this may be unnecessary. Read it anyway, just for kicks.
Here is the synopsis for the bind() system call:
#include <sys/types.h>
#include <sys/socket.h>
int bind(int sockfd, struct sockaddr *my_addr, int addrlen);
sockfd is the socket file descriptor returned by socket(). my_addr is a pointer to a struct sockaddr that contains information about your address, namely, port and IP address. addrlen can be set to sizeof(struct sockaddr).
Whew. That's a bit to absorb in one chunk. Let's have an example:
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MYPORT 3490

int main(void)
{
    int sockfd;
    struct sockaddr_in my_addr;

    sockfd = socket(PF_INET, SOCK_STREAM, 0); // do some error checking!
Sometimes, you might notice, you try to rerun a server and bind() fails, claiming "Address already in use." What does that mean? Well, a little bit of a socket that was connected is still hanging around in the kernel, and it's hogging the port. You can either wait for it to clear (a minute or so), or add code to your program allowing it to reuse the port, like this:
int yes=1;
//char yes='1'; // Solaris people use this

// lose the pesky "Address already in use" error message
if (setsockopt(listener,SOL_SOCKET,SO_REUSEADDR,&yes,sizeof(int)) == -1) {
    perror("setsockopt");
    exit(1);
}
One small extra final note about bind(): there are times when you won't absolutely have to call it. If you are connect()ing to a remote machine and you don't care what your local port is (as is the case with telnet where you only care about the remote port), you can simply call connect(), it'll check to see if the socket is unbound, and will bind() it to an unused local port if necessary.
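Pulling the pieces of this section together, a bind() with the port-reuse option and error checking might be sketched as follows. The helper name bind_any() is hypothetical; it assumes you already have a descriptor from socket() and want to listen on every local address.

```c
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <stdio.h>        /* perror() */
#include <string.h>       /* memset() */
#include <sys/types.h>
#include <sys/socket.h>   /* bind(), setsockopt() */

/* Hypothetical helper: bind sockfd to `port` (Host Byte Order) on any
 * local address. Returns 0 on success, -1 on error. */
int bind_any(int sockfd, unsigned short port)
{
    struct sockaddr_in my_addr;
    int yes = 1;

    /* lose the pesky "Address already in use" error message */
    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes) == -1) {
        perror("setsockopt");
        return -1;
    }

    memset(&my_addr, 0, sizeof my_addr);          /* zero the whole struct */
    my_addr.sin_family = AF_INET;                 /* host byte order */
    my_addr.sin_port = htons(port);               /* short, network byte order */
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* any of my addresses */

    if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof my_addr) == -1) {
        perror("bind");
        return -1;
    }
    return 0;
}
```

A server would call something like bind_any(sockfd, 3490) right after socket() and bail out if it returns -1.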
Let's just pretend for a few minutes that you're a telnet application. Your user commands you (just like in the movie TRON) to get a socket file descriptor. You comply and call socket(). Next, the user tells you to connect to "10.12.110.57" on port "23" (the standard telnet port.) Yow! What do you do now?
Lucky for you, program, you're now perusing the section on connect()--how to connect to a remote host. So read furiously onward! No time to lose!
The connect() call is as follows:
#include <sys/types.h>
#include <sys/socket.h>
int connect(int sockfd, struct sockaddr *serv_addr, int addrlen);
sockfd is our friendly neighborhood socket file descriptor, as returned by the socket() call, serv_addr is a struct sockaddr containing the destination port and IP address, and addrlen can be set to sizeof(struct sockaddr).
Isn't this starting to make more sense? Let's have an example:
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h> // for inet_addr()

#define DEST_IP "10.12.110.57"
#define DEST_PORT 23

int main(void)
{
    int sockfd;
    struct sockaddr_in dest_addr; // will hold the destination addr

    sockfd = socket(PF_INET, SOCK_STREAM, 0); // do some error checking!

    dest_addr.sin_family = AF_INET;          // host byte order
    dest_addr.sin_port = htons(DEST_PORT);   // short, network byte order
    dest_addr.sin_addr.s_addr = inet_addr(DEST_IP);
    memset(&(dest_addr.sin_zero), '\0', 8);  // zero the rest of the struct

    // don't forget to error check the connect()!
    connect(sockfd, (struct sockaddr *)&dest_addr, sizeof(struct sockaddr));
    .
    .
    .
Again, be sure to check the return value from connect()--it'll return -1 on error and set the variable errno.
Also, notice that we didn't call bind(). Basically, we don't care about our local port number; we only care where we're going (the remote port). The kernel will choose a local port for us, and the site we connect to will automatically get this information from us. No worries.
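Since the example leaves the error checking to you, here is one way it might be filled in. This is a sketch, not gospel: the function name connect_to() is made up, and it assumes INADDR_NONE is available for spotting a bad address string.

```c
#include <arpa/inet.h>    /* inet_addr(), htons() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_NONE */
#include <stdio.h>        /* perror(), fprintf() */
#include <string.h>       /* memset() */
#include <sys/types.h>
#include <sys/socket.h>   /* socket(), connect() */
#include <unistd.h>       /* close() */

/* Hypothetical helper: open a stream socket and connect it to ip:port.
 * Returns the connected descriptor, or -1 on any error. */
int connect_to(const char *ip, unsigned short port)
{
    struct sockaddr_in dest_addr;
    int sockfd;

    memset(&dest_addr, 0, sizeof dest_addr);
    dest_addr.sin_family = AF_INET;              /* host byte order */
    dest_addr.sin_port = htons(port);            /* short, network byte order */
    dest_addr.sin_addr.s_addr = inet_addr(ip);
    if (dest_addr.sin_addr.s_addr == INADDR_NONE) {
        fprintf(stderr, "bad address: %s\n", ip);
        return -1;
    }

    sockfd = socket(PF_INET, SOCK_STREAM, 0);
    if (sockfd == -1) {
        perror("socket");
        return -1;
    }

    /* no bind(): the kernel picks a local port for us */
    if (connect(sockfd, (struct sockaddr *)&dest_addr, sizeof dest_addr) == -1) {
        perror("connect");
        close(sockfd);                           /* don't leak the descriptor */
        return -1;
    }
    return sockfd;
}
```

With that in hand, the telnet scenario boils down to connect_to("10.12.110.57", 23) and a check for -1.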
Ok, time for a change of pace. What if you don't want to connect to a remote host? Say, just for kicks, that you want to wait for incoming connections and handle them in some way. The process has two steps: first you listen(), then you accept() (see below.)
The listen call is fairly simple, but requires a bit of explanation:
int listen(int sockfd, int backlog);
sockfd is the usual socket file descriptor from the socket() system call. backlog is the number of connections allowed on the incoming queue. What does that mean? Well, incoming connections are going to wait in this queue until you accept() them (see below) and this is the limit on how many can queue up.
Like before, this is a bunch to absorb in one chunk, so here's a sample code fragment for your perusal:
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define MYPORT 3490    // the port users will be connecting to

#define BACKLOG 10     // how many pending connections queue will hold

int main(void)
{
    int sockfd, new_fd;             // listen on sockfd, new connection on new_fd
    struct sockaddr_in my_addr;     // my address information
    struct sockaddr_in their_addr;  // connector's address information
    socklen_t sin_size;             // accept() wants a socklen_t here

    sockfd = socket(PF_INET, SOCK_STREAM, 0); // do some error checking!

    my_addr.sin_family = AF_INET;         // host byte order
    my_addr.sin_port = htons(MYPORT);     // short, network byte order
    my_addr.sin_addr.s_addr = INADDR_ANY; // auto-fill with my IP
    memset(&(my_addr.sin_zero), '\0', 8); // zero the rest of the struct

    // don't forget your error checking for these calls:
    bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

    listen(sockfd, BACKLOG);

    sin_size = sizeof(struct sockaddr_in);
    new_fd = accept(sockfd, (struct sockaddr *)&their_addr, &sin_size);
    .
    .
    .
Again, note that we will use the socket descriptor new_fd for all send() and recv() calls. If you're only getting one single connection ever, you can close() the listening sockfd in order to prevent more incoming connections on the same port, if you so desire.
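The fragment above skips its error checking, so here is a sketch of the same socket(), bind(), listen(), accept() sequence with the checks filled in. The helper names listen_on() and accept_one() are invented for this example, and the split into two functions is just one way to slice it.

```c
#include <arpa/inet.h>    /* htons(), htonl() */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_ANY */
#include <stdio.h>        /* perror() */
#include <string.h>       /* memset() */
#include <sys/types.h>
#include <sys/socket.h>   /* socket(), bind(), listen(), accept() */
#include <unistd.h>       /* close() */

/* Hypothetical helper: create a stream socket listening on `port`.
 * Returns the listening descriptor, or -1 on error. */
int listen_on(unsigned short port, int backlog)
{
    struct sockaddr_in my_addr;
    int sockfd = socket(PF_INET, SOCK_STREAM, 0);

    if (sockfd == -1) {
        perror("socket");
        return -1;
    }

    memset(&my_addr, 0, sizeof my_addr);
    my_addr.sin_family = AF_INET;                 /* host byte order */
    my_addr.sin_port = htons(port);               /* short, network byte order */
    my_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* auto-fill with my IP */

    if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof my_addr) == -1) {
        perror("bind");
        close(sockfd);
        return -1;
    }
    if (listen(sockfd, backlog) == -1) {
        perror("listen");
        close(sockfd);
        return -1;
    }
    return sockfd;
}

/* Hypothetical helper: wait for one incoming connection.
 * Returns the NEW descriptor to send()/recv() on, or -1 on error. */
int accept_one(int listener, struct sockaddr_in *their_addr)
{
    socklen_t sin_size = sizeof *their_addr;
    int new_fd = accept(listener, (struct sockaddr *)their_addr, &sin_size);

    if (new_fd == -1) {
        perror("accept");
    }
    return new_fd;
}
```

A server would call listen_on(3490, 10) once, then accept_one() for each visitor; their_addr comes back filled in with the connector's address.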
These two functions are for communicating over stream sockets or connected datagram sockets. If you want to use regular unconnected datagram sockets, you'll need to see the section on sendto() and recvfrom(), below.
The send() call:
int send(int sockfd, const void *msg, int len, int flags);
sockfd is the socket descriptor you want to send data to (whether it's the one returned by socket() or the one you got with accept().) msg is a pointer to the data you want to send, and len is the length of that data in bytes. Just set flags to 0. (See the send() man page for more information concerning flags.)
Some sample code might be:
char *msg = "Beej was here!";
int len, bytes_sent;
.
.
.
len = strlen(msg);
bytes_sent = send(sockfd, msg, len, 0);
.
.
.
send() returns the number of bytes actually sent out--this might be less than the number you told it to send! See, sometimes you tell it to send a whole gob of data and it just can't handle it. It'll fire off as much of the data as it can, and trust you to send the rest later. Remember, if the value returned by send() doesn't match the value in len, it's up to you to send the rest of the string. The good news is this: if the packet is small (less than 1K or so) it will probably manage to send the whole thing all in one go. Again, -1 is returned on error, and errno is set to the error number.
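"Sending the rest" usually means a loop. Here's a minimal sketch of one; the name send_all() is made up for this example, and it deliberately ignores finer points such as retrying after an interrupted call.

```c
#include <sys/types.h>    /* ssize_t, size_t */
#include <sys/socket.h>   /* send() */

/* Hypothetical helper: keep calling send() until all `len` bytes of
 * `buf` are out. Returns 0 on success, -1 if send() reports an error. */
int send_all(int sockfd, const char *buf, size_t len)
{
    size_t total = 0;                 /* how many bytes we've sent so far */

    while (total < len) {
        ssize_t n = send(sockfd, buf + total, len - total, 0);
        if (n == -1) {
            return -1;                /* errno says what went wrong */
        }
        total += (size_t)n;           /* might be less than we asked for */
    }
    return 0;
}
```

In place of the bare send() call in the sample, you'd write send_all(sockfd, msg, strlen(msg)) and check for -1.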
The recv() call is similar in many respects:
int recv(int sockfd, void *buf, int len, unsigned int flags);
sockfd is the socket descriptor to read from, buf is the buffer to read the information into, len is the maximum length of the buffer, and flags can again be set to 0. (See the recv() man page for flag information.)
recv() returns the number of bytes actually read into the buffer, or -1 on error (with errno set, accordingly.)
Wait! recv() can return 0. This can mean only one thing: the remote side has closed the connection on you! A return value of 0 is recv()'s way of letting you know this has occurred.
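All three outcomes of recv() (data, error, or the other side hanging up) can be handled in one loop. The sketch below uses an invented helper name, recv_until_closed(), and simply reads until the buffer is full or the remote side closes.

```c
#include <sys/types.h>    /* ssize_t, size_t */
#include <sys/socket.h>   /* recv() */

/* Hypothetical helper: read into buf until it is full or the remote
 * side closes the connection. Returns the number of bytes read,
 * or -1 if recv() reports an error. */
ssize_t recv_until_closed(int sockfd, char *buf, size_t buflen)
{
    size_t total = 0;

    while (total < buflen) {
        ssize_t n = recv(sockfd, buf + total, buflen - total, 0);
        if (n == -1) {
            return -1;            /* error; errno is set accordingly */
        }
        if (n == 0) {
            break;                /* remote side has closed the connection */
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

Note that the bytes you get back are not null-terminated; if you want to treat them as a string, leave room for a '\0' and add it yourself.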
There, that was easy, wasn't it? You can now pass data back and forth on stream sockets! Whee! You're a Unix Network Programmer!