Unix network programming: a practical approach -- part 2

By: J.D. Hildebrand

Abstract: Here's the second part of James Buchanan's tutorial on network programming in Linux and Unix.

By James Buchanan

In this article we'll have a look at some conversion functions, write a small library of reusable code for network programs, and revisit the TCP echo client and server from the previous installment before we progress to bigger and better programs.

THE SOCKET ADDRESS STRUCTURE

As we saw in part 1 of this series, the sockaddr_in structure contains these data members: sin_family, sin_port, sin_addr and sin_zero (which is unused). BSD-derived systems add a sin_len member; Linux does not have one. The sin_port and sin_addr members are stored in network byte order, which is big-endian; sin_family is kept in host byte order.

The difference between big-endian and little-endian is the order in which the bytes of a multiple-byte value are laid out at increasing memory addresses. Most data types are multiple-byte values, and unfortunately there is no standard "endianness." For example, a 16-bit word consists of two eight-bit bytes, each of which has its own address in memory. The low-order byte contains the LSB (least significant bit) and the high-order byte contains the MSB (most significant bit). There are two ways to store the word: with the low-order byte at the starting address, which is called "little-endian," or with the high-order byte at the starting address, which is called "big-endian."

Addresses and port numbers in the sockaddr_in structure are big-endian. Some machines are little-endian. So if we wish to set or print values in the structure we will need conversion functions. If our host is little-endian, the conversion functions swap the bytes to convert between the host's little-endian order and big-endian network order. If the host is big-endian, no swapping is needed and the conversion functions may be defined as macros that simply return their argument.
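To make the two layouts concrete, here is a minimal sketch that looks at how a 16-bit value sits in memory. The helper names host_is_little_endian and store_big_endian16 are hypothetical and are not part of any library or of the download.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void)
{
    uint16_t word = 0x0102;   /* high-order byte 0x01, low-order byte 0x02 */
    unsigned char bytes[sizeof word];

    memcpy(bytes, &word, sizeof word);   /* view the word as it sits in memory */
    return bytes[0] == 0x02;             /* low-order byte first: little-endian */
}

/* Hypothetical helper: writes value into out[] in big-endian (network) order,
   whatever the byte order of the host. */
void store_big_endian16(uint16_t value, unsigned char out[2])
{
    out[0] = (unsigned char)(value >> 8);     /* high-order byte at the lower address */
    out[1] = (unsigned char)(value & 0xff);   /* low-order byte at the higher address */
}
```

Because store_big_endian16 uses shifts rather than memory layout, it produces the same two bytes on every host. That is the property network byte order is meant to guarantee.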

USEFUL CONVERSION FUNCTIONS

Here are the prototypes of some conversion functions and a description of what they do.

#include <netinet/in.h>
uint16_t htons(uint16_t host16bitvalue);
htons takes a host's 16-bit value, such as a port number, and returns the network byte ordered equivalent. htons stands for "host to network short."
#include <netinet/in.h>
uint32_t htonl(uint32_t host32bitvalue);
htonl takes a host's 32-bit value and returns the network byte ordered equivalent. htonl stands for "host to network long."
#include <netinet/in.h>
uint16_t ntohs(uint16_t network16bitvalue);
ntohs takes a network byte ordered 16-bit value and returns it in the host's byte order. ntohs stands for "network to host short."
#include <netinet/in.h>
uint32_t ntohl(uint32_t network32bitvalue);
ntohl takes a network byte ordered 32-bit value and returns it in the host's byte order. ntohl stands for "network to host long."
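As a short sketch of how these four functions are typically used, the code below fills in a sockaddr_in and reads the port back. The helper names fill_any_address and port_of are hypothetical; INADDR_ANY is the standard wildcard address.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare an IPv4 socket address for the given port
   on any local interface. */
void fill_any_address(struct sockaddr_in *addr, uint16_t port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;                 /* host byte order */
    addr->sin_port = htons(port);               /* 16-bit value: host to network */
    addr->sin_addr.s_addr = htonl(INADDR_ANY);  /* 32-bit value: host to network */
}

/* Hypothetical helper: return the port of a socket address in host byte order. */
uint16_t port_of(const struct sockaddr_in *addr)
{
    return ntohs(addr->sin_port);               /* 16-bit value: network to host */
}
```

Forgetting htons is a classic mistake: on a little-endian host, port 7 stored without conversion would be sent on the wire as port 1792.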
#include <arpa/inet.h>
int inet_pton(int family, const char *ip_str, void *socket_address);
inet_pton takes an ASCII string representing an IPv4 address (dotted decimal) or an IPv6 address (colon-separated hexadecimal) and stores the address in network byte order in the buffer that the third parameter points to. Despite the parameter name used here, that buffer is not a whole socket address structure: it is a struct in_addr for AF_INET (typically the sin_addr member of a sockaddr_in) or a struct in6_addr for AF_INET6. The string is the second parameter, ip_str. The first parameter is the family, either AF_INET or AF_INET6. The p stands for "presentation" and the n stands for "numeric." Presentation means the ASCII string representation of the address, and numeric means the binary network byte ordered representation.

inet_pton returns 1 if OK, 0 if the input is not a valid presentation format, and -1 on error.

#include <arpa/inet.h>
const char *inet_ntop(int family, const void *socket_address, char *ip_str, socklen_t len);
inet_ntop does the reverse of inet_pton. It takes the binary (numeric) representation of an IP address, pointed to by the second parameter (a struct in_addr or struct in6_addr), and converts it to an ASCII string (presentation) format, storing the string in ip_str. family is AF_INET or AF_INET6 and len is the size of the ip_str buffer, which should be at least INET_ADDRSTRLEN bytes for IPv4 or INET6_ADDRSTRLEN bytes for IPv6. If ip_str is an array, we can give its size through the use of the sizeof operator.

inet_ntop returns a pointer to the result (the ASCII string IP address) if OK, or NULL on error.
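Here is a minimal sketch of a round trip through both functions for IPv4. The helper name roundtrip_ipv4 is hypothetical; the caller is assumed to supply a buffer of at least INET_ADDRSTRLEN bytes.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: parse a dotted IPv4 string into binary form, then
   format it back into out. Returns 0 on success, -1 if ip_str is not a
   valid IPv4 address or the output buffer is too small. */
int roundtrip_ipv4(const char *ip_str, char *out, socklen_t out_len)
{
    struct in_addr addr;

    memset(&addr, 0, sizeof addr);

    /* presentation to numeric: 1 means the string was valid */
    if (inet_pton(AF_INET, ip_str, &addr) != 1)
        return -1;

    /* numeric to presentation: NULL means an error occurred */
    if (inet_ntop(AF_INET, &addr, out, out_len) == NULL)
        return -1;

    return 0;
}
```

A caller would declare char buf[INET_ADDRSTRLEN]; and pass sizeof buf as the last argument. Note that the size describes the string buffer, not the address structure.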

In old network code you will see older conversion functions that work only with IPv4: inet_aton (ASCII to numeric), inet_ntoa (numeric to ASCII) and inet_addr (returns a 32-bit binary network byte ordered IPv4 address, or INADDR_NONE on error). inet_aton corresponds to inet_pton and inet_ntoa corresponds to inet_ntop. We use the newer inet_pton and inet_ntop because they work with both IPv4 and IPv6 addresses.

Here are some other functions which can be thought of as being related to conversion functions:

#include <netdb.h>
struct hostent *gethostbyaddr(const void *addr, socklen_t len, int family);
gethostbyaddr does a reverse DNS lookup and tries to find the name of the host corresponding to an IP address. It returns a pointer to a hostent structure, or NULL on error. hostent contains a data member called h_name (host name), which is what will be set for us when the function returns. The addr parameter does not point to a dotted string: it points to the binary, network byte ordered address, a struct in_addr for IPv4 or a struct in6_addr for IPv6 (older headers declare it as const char *, which is why you will often see a cast). len is the size of that address structure, and family is AF_INET or AF_INET6, so you can use it with IPv4 or IPv6 applications.

So to get the name of a host from one of its dotted string IP addresses, first convert the string with inet_pton into a struct in_addr, then pass the address of that structure, its size and AF_INET to gethostbyaddr. The hostent structure that comes back also has an h_addr_list data member, which is actually a pointer to an array of pointers, each pointing to one of the host's addresses in binary form.
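A minimal sketch of a reverse lookup, assuming IPv4: the dotted string is first converted with inet_pton, and the resulting binary address is what gethostbyaddr receives. The helper name lookup_host_by_ip is hypothetical, and the prototype assumed is the modern one taking a const void pointer.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: look up the host that owns a dotted IPv4 address.
   Returns NULL if the string is not a valid address or the lookup fails. */
struct hostent *lookup_host_by_ip(const char *ip_str)
{
    struct in_addr addr;

    /* gethostbyaddr wants the binary address, not the string */
    if (inet_pton(AF_INET, ip_str, &addr) != 1)
        return NULL;

    return gethostbyaddr(&addr, sizeof addr, AF_INET);
}
```

On success the name is available as the h_name member of the returned structure. The structure lives in static storage, so copy anything you need before calling a lookup function again.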

#include <netdb.h>
struct hostent *gethostbyname(const char *hostname);
gethostbyname does the reverse of gethostbyaddr. It takes the host's name and finds its corresponding IP addresses, which are stored in h_addr_list in binary, network byte ordered form (use inet_ntop to get dotted strings). Since a host can have many IP addresses, h_addr_list may contain more than one IP address. It returns a pointer to a hostent struct, or NULL on error.

With both gethostbyaddr and gethostbyname you can loop through the array of pointers within the hostent structure by testing for hostent->h_addr_list[array_index] == NULL, because the array is terminated by a NULL pointer.
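The loop over the NULL-terminated address list can be sketched as follows, assuming an IPv4 result. The helper name print_host_addresses is hypothetical.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>

/* Hypothetical helper: print every IPv4 address held in a hostent and
   return how many were printed. */
int print_host_addresses(const struct hostent *host)
{
    char ip_str[INET_ADDRSTRLEN];
    int i;
    int count = 0;

    if (host == NULL || host->h_addrtype != AF_INET)
        return 0;

    /* the last pointer in h_addr_list is NULL */
    for (i = 0; host->h_addr_list[i] != NULL; i++) {
        /* each entry points to a binary address, so convert it for display */
        if (inet_ntop(AF_INET, host->h_addr_list[i], ip_str, sizeof ip_str) != NULL) {
            printf("%s\n", ip_str);
            count++;
        }
    }
    return count;
}
```

A typical call would be print_host_addresses(gethostbyname(name)), where name is whatever host you want to resolve. Passing the result straight through is safe here because the helper checks for NULL.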

MAKING THE ECHO SERVER CONCURRENT

The echo server which was the download for part 1 of this series is an iterative server. That is, it cannot service multiple clients simultaneously. We can change this by forking a new server process each time a client connection is accepted. This means that each client will essentially get its own server. The child process begins executing at the point where fork returns, and ends when it calls exit or returns from main. (fork actually returns twice, once in the parent process and once in the child process, and the return value tells each process which of the two it is.)

Here is the prototype for fork:

#include <unistd.h>
pid_t fork(void);
Nice and simple, eh? The return value is the process ID of the child in the parent and 0 in the child, or -1 on error (in which case no child is created). All descriptors that are open at the time of the fork, including socket descriptors, are shared by the parent and its children. Here is an example, showing the fork and neglecting the rest:
pid_t child_pid;    /* declared somewhere before the accept loop */
/* ... server calls accept to accept a client connection, then ... */
if ( (child_pid = fork()) == 0) {
  /* child: ... service client ... */
  exit(0);    /* successful termination of child server */
} else if (child_pid < 0) {
  /* fork failed: report the error */
}
/* parent: close its copy of the connected socket, then call accept again */
The only change we need to make in the echo server is to fork a copy of the server after accept returns, let the child service the client, and then terminate the child with a call to exit. The parent should close its own copy of the connected socket and go back to accept. (Apart from this, we replace common code with our custom library functions; the client in the download for this article also has its common code replaced with our own library routines.) A concurrent echo server is overkill, but serves as a good introduction to using fork. Get a copy of the concurrent TCP echo server here.
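The fork-per-client pattern can be tried without a network at all. The sketch below is not the server from the download: it assumes a Unix-domain socketpair in place of an accepted TCP connection, and the names echo_until_eof and fork_echo_once are hypothetical. It shows the child servicing a descriptor inherited across fork, and the parent closing its unused copy and reaping the child.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child-side service routine: echo everything back until the peer closes. */
static void echo_until_eof(int fd)
{
    char buf[256];
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        if (write(fd, buf, (size_t)n) != n)
            break;
    }
}

/* Hypothetical demo: forks a child that echoes msg back to the parent.
   Returns 0 and fills reply on success, -1 on failure. */
int fork_echo_once(const char *msg, char *reply, size_t reply_size)
{
    int fds[2];
    pid_t child_pid;
    size_t len = strlen(msg);
    size_t got = 0;
    ssize_t n;

    if (len == 0 || len >= reply_size)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;

    child_pid = fork();
    if (child_pid < 0) {            /* fork failed: no child exists */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child_pid == 0) {           /* child: fork returned 0 */
        close(fds[0]);              /* the child only needs its own end */
        echo_until_eof(fds[1]);
        exit(0);                    /* successful termination of the child */
    }

    /* parent: fork returned the child's process ID */
    close(fds[1]);                  /* the child still holds its copy */
    if (write(fds[0], msg, len) == (ssize_t)len) {
        while (got < len &&
               (n = read(fds[0], reply + got, len - got)) > 0)
            got += (size_t)n;
    }
    reply[got] = '\0';
    close(fds[0]);                  /* the child sees end-of-file and exits */
    waitpid(child_pid, NULL, 0);    /* reap the child so it is not left a zombie */
    return got == len ? 0 : -1;
}
```

In a real concurrent server the parent would not wait for each child in turn, since that would make it iterative again. It would loop straight back to accept and collect finished children separately, for example from a SIGCHLD handler.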

A LIBRARY OF REUSABLE NETWORK CODE

Calling socket, bind, accept, connect and listen directly means checking each one for errors and exiting when a call fails, which clutters the main program with ugly and distracting error-detecting code. Instead, we will write wrapper functions that encapsulate the calls to socket, bind, accept, connect and listen, so that we don't have to check the return values in the main program. If an error is detected, the program is terminated from within the library code where the wrapper functions are defined. We'll prefix all the wrapper function names with custom_ so that we (and readers of our code) know that they are our own custom wrapper functions when we look at the code at a later date. So socket becomes custom_socket, accept becomes custom_accept, bind custom_bind, connect custom_connect and listen custom_listen.

The sources for the custom_networking library of wrapper functions are in the download for this article, which you can get here.

The parameters that our custom wrapper functions take are identical to those of the functions they call.
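As a rough idea of what such wrappers can look like, here is a sketch of three of them. It is an assumption about their shape, not the code from the download: the return types, the die helper and the open_listener convenience function are all hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: report the failing call using errno, then terminate. */
static void die(const char *what)
{
    perror(what);
    exit(EXIT_FAILURE);
}

/* Same parameters as socket; never returns a negative descriptor. */
int custom_socket(int family, int type, int protocol)
{
    int fd = socket(family, type, protocol);

    if (fd < 0)
        die("socket");
    return fd;
}

/* Same parameters as bind; returns only if the bind succeeded. */
void custom_bind(int fd, const struct sockaddr *addr, socklen_t len)
{
    if (bind(fd, addr, len) < 0)
        die("bind");
}

/* Same parameters as listen; returns only if the call succeeded. */
void custom_listen(int fd, int backlog)
{
    if (listen(fd, backlog) < 0)
        die("listen");
}

/* Hypothetical usage: a TCP listening socket on the given port with no
   error checks at the call site. Port 0 lets the kernel pick a free port. */
int open_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = custom_socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    custom_bind(fd, (const struct sockaddr *)&addr, sizeof addr);
    custom_listen(fd, 5);   /* backlog of 5 is an arbitrary choice */
    return fd;
}
```

Terminating inside the library keeps the main program short, which suits small tutorial programs. A long-running server would more likely want the error returned so that it can recover.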

A PREVIEW OF PART 3

In the next installment we'll write a simple iterative chat server and a chat client. You should have some friends who are willing to run the client and a server to run the chat server so you can try it out. See the instructions file in the download. Please don't email me asking how to use the chat system until you have read and followed the instructions.

I hope that you liked this article. If you found anything difficult to understand or you found any errors please email me.
