Webserver use, configuration and management

Unit WUCM1

Web servers and the Internet

Web server functions

Basic functions

In essence, the Web Server has a very simple job: to receive URLs from the TCP/IP protocol stack and to translate each one into either a filename or a program name. If the URL corresponds to a file (typically an HTML file), the server sends that file back over the Internet. If, on the other hand, the URL corresponds to a program, the server runs the program and sends back its output. All the rest of the Web Server's tasks are mere trimmings on this basic cake.
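The translation step can be sketched in a few lines of Python. This is only an illustration, not how Apache does it: the document root /var/www/html, the /cgi-bin/ prefix and the function name resolve are all assumptions made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical layout: static files live under DOC_ROOT,
# programs are requested via URLs that start with CGI_PREFIX.
DOC_ROOT = PurePosixPath("/var/www/html")
CGI_PREFIX = "/cgi-bin/"


def resolve(url_path: str) -> tuple[str, str]:
    """Map a URL path to ("program", name) or ("file", filename)."""
    if url_path.startswith(CGI_PREFIX):
        return ("program", url_path[len(CGI_PREFIX):])
    # Drop empty, "." and ".." segments so that a request cannot
    # climb out of the document root.
    parts = [p for p in url_path.split("/") if p not in ("", ".", "..")]
    if not parts:
        parts = ["index.html"]  # assumed default page for the site root
    return ("file", str(DOC_ROOT.joinpath(*parts)))


print(resolve("/index.html"))
print(resolve("/cgi-bin/hello"))
```

A real server would go on to open the file or run the program; the sketch stops at the decision between the two.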

The essence of the HyperText Transfer Protocol (HTTP) was discussed in the first year, but to recap briefly, suppose you enter http://www.port.ac.uk/index.html into your web browser. Remember, the URL has three components, <method>://<host>/<path>. The browser would interpret the example as a request to use the http: <method>, i.e. use the hypertext transfer protocol to communicate with <host> www.port.ac.uk, and then access the file index.html. If you put just a trailing slash / after the host identifier it will be interpreted as the 'root' of that web site, which in practice will never be the root of the computer's file system, for security reasons.
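The three-way split of a URL can be tried out with Python's standard urllib.parse module. Note that Python calls the <method> part the "scheme"; the variable name parts is just for this example.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.port.ac.uk/index.html")

print(parts.scheme)    # http
print(parts.hostname)  # www.port.ac.uk
print(parts.path)      # /index.html
```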

Your browser might use this parsed information about your request to send the following message to host: www.port.ac.uk:

GET /index.html HTTP/1.1 <CR><LF>
Host: www.port.ac.uk <CR><LF><CR><LF>

The request would arrive at port 80 (the default HTTP port) on the host www.port.ac.uk. The first line of the message is again in three parts: a method (an HTTP method, not a URL method!), which in this case is GET but could be PUT, POST, DELETE or CONNECT; the Uniform Resource Identifier (URI), here identifying index.html; and the version of the protocol used in the message (HTTP v1.1 in this case). An HTTP/1.1 request must also carry a Host: header naming the server. It is then up to the Web Server to make sense of this message.

The carriage returns (<CR>) and line feeds (<LF>) are an important part of the message: the blank line they produce marks the end of the request. If you use telnet to access a webserver you can type these HTTP commands manually, but you need to add the extra returns to get a response.
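The same experiment can be scripted instead of typed into telnet. The sketch below is a minimal illustration using Python's socket module; the function names build_request and fetch are invented for the example, and the Connection: close header is added only so that the server hangs up once it has answered.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Return a minimal HTTP/1.1 GET request, ending with a blank line."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # required by HTTP/1.1
        "Connection: close",   # ask the server to close when done
        "",                    # the two empty strings produce the
        "",                    # final <CR><LF><CR><LF>
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Send the request over TCP and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:       # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.port.ac.uk", "/index.html") needs a live network connection; build_request can be inspected on its own to see exactly which bytes would be sent.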

Additional functions

Laurie (2003) goes on to list a number of other features of a webserver, as lightly paraphrased below:

There are many other refinements that have been added over the years, usually with the intention of improving the situation in some way. However, the accretion of features often introduces subtle bugs that take a long time to be identified and eliminated. For the purposes of this unit we will only be looking at the basic features in any depth, but do spend some time researching the more esoteric topics on the Internet.

Recap of TCP/IP

(Ref: Laurie, 2003, pp5-13; Hawkins, 2001, pp 261-268)

In order to put the above in a relevant context, a short recap on TCP/IP might be a good idea. Firstly, the acronym stands for Transmission Control Protocol/Internet Protocol, and represents a large family of related protocols. The software to implement the TCP/IP protocols has been embedded in a wide range of operating systems and applications. TCP/IP is a network protocol, so, for example, it is only installed on a Windows system if you have at least one network peripheral (a modem or a network interface card (NIC)). Once installed, the TCP/IP stack needs configuring, of which more later.

IP addresses

Each computer on a network that will use TCP/IP needs to have an IP address, for example 192.168.27.58, the address of the main computer on a home network. (You will find that many books just happen to choose IP addresses that start with 192.168.x.y, as this is a 'private' address range, i.e. one reserved for internal networks and never assigned to a node on the public Internet. It is hence always safe to use, as none of the Internet connection devices (routers) will forward messages addressed to it.)

To check what your PC's IP address is currently configured to, use the ipconfig /all command from a Windows console, as illustrated:

There are four parts to the address, separated by 'dots'; each of these parts is one byte of the 4-byte IP address. Whilst the dotted decimal form is generally easier for people, it is quite often illuminating to convert to binary to see what is being carried out in the address manipulations. This 4-byte form is known as IPv4, but as you might expect there is a revised and much larger IP protocol suite, known as IPv6 (or sometimes IPng), though for the moment we will stick to the older format used throughout the UoP. IPv6 uses 128 bits (16 bytes), so has room for lots more addresses!
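The conversion from dotted decimal to binary is easy to try with Python's standard ipaddress module. The helper name to_binary is made up for this sketch.

```python
import ipaddress


def to_binary(dotted: str) -> str:
    """Show each byte of an IPv4 address as eight binary digits."""
    addr = ipaddress.IPv4Address(dotted)
    return ".".join(f"{byte:08b}" for byte in addr.packed)


print(to_binary("192.168.27.58"))
# 11000000.10101000.00011011.00111010
```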

The IP address has two logical components, distinct from the byte divisions above. These are the host number and the network number. The 32-bit binary IP address is split into these two components at what might seem an arbitrary and random point. There is a logic to it, however. (See Laurie 2003)

Default IP address partition

The default dividing line for this separation is given by the first few bits of the address; if the first byte is:-

There are a few other special addresses, e.g. 127.0.0.1 is reserved for loopback testing (i.e. the message is returned to yourself) and is generally given the name "localhost". As these rules are a bit restrictive, it is possible to divide any given network into smaller parts, called subnets! In order to identify which subnet a host belongs to you need to know the "subnet mask". In this case some of the bits notionally reserved for hosts are used to identify the subnet, but for the purposes of configuring Apache, these details can usually be ignored. If you need a subnet mask, ask, but it is usually safe to take the default suggested.
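As a rough illustration of both ideas, the sketch below first applies the traditional class A/B/C rule (network part of 8, 16 or 24 bits, chosen by the first byte) and then uses a subnet mask to pull an address apart. The function names default_prefix and split are invented for the example, and the class boundaries are the standard historical ones rather than anything specific to this unit.

```python
import ipaddress
from typing import Optional


def default_prefix(address: str) -> Optional[int]:
    """Classful default length of the network part, from the first byte."""
    first = ipaddress.IPv4Address(address).packed[0]
    if first < 128:
        return 8      # class A: leading bit 0
    if first < 192:
        return 16     # class B: leading bits 10
    if first < 224:
        return 24     # class C: leading bits 110
    return None       # classes D and E have no network/host split


def split(address: str, mask: str) -> tuple[str, int]:
    """Return (network address, host number) for an address and mask."""
    iface = ipaddress.IPv4Interface(f"{address}/{mask}")
    host = int(iface.ip) & int(iface.hostmask)   # keep only the host bits
    return str(iface.network.network_address), host


print(default_prefix("192.168.27.58"))            # 24
print(split("192.168.27.58", "255.255.255.0"))    # ('192.168.27.0', 58)
print(split("192.168.27.58", "255.255.255.240"))  # ('192.168.27.48', 10)
```

The last line shows subnetting at work: with four extra bits borrowed from the host part, the same address belongs to the smaller network 192.168.27.48 and is host 10 within it.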

IP communications

Now that each machine has its IP address, how do they communicate? If both are on the same physical network, e.g. the same Ethernet LAN segment, and both are correctly configured so that they have the same network number and different host numbers, then they can safely send IP packets to each other without further problems.
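The "same network number" test can be written down directly. This is a sketch under the assumption that both machines use the same subnet mask; same_network is a name invented for the example.

```python
import ipaddress


def same_network(a: str, b: str, mask: str) -> bool:
    """True if both addresses have the same network number under the mask."""
    net_a = ipaddress.IPv4Interface(f"{a}/{mask}").network
    net_b = ipaddress.IPv4Interface(f"{b}/{mask}").network
    return net_a == net_b


print(same_network("192.168.27.58", "192.168.27.52", "255.255.255.0"))  # True
print(same_network("192.168.27.58", "192.168.28.52", "255.255.255.0"))  # False
```

When the answer is False, the packets have to go via a router.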

What if they are on different networks, say in different buildings? In this case devices with the grand name of routers find a route from one network to the other, and forward the necessary IP packets. This does of course assume that all are correctly configured, know (or can discover) where the other network is, and that there is at least one physical path between the two networks, usually involving WAN links.

There are two main styles of packet that can be exchanged by two computers using IP:-

One last refinement is to consider how the various different protocols that can be used as part of the TCP/IP suite can co-exist on the same effective TCP 'circuit'. The various services that our server might offer each use a different 'port' or logical channel number. Of specific interest to us is the port used by the HTTP web protocol, which is by convention port 80. (You could configure your webserver to use a different port, but all clients would then need to know which one; it might, however, offer a degree of security by obscurity!) A few of the common ports would include:
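How a client decides which port to contact can be sketched as follows: use the port written in the URL if there is one, otherwise fall back on the conventional port for the protocol. The small table DEFAULT_PORTS and the function name effective_port are assumptions for this example; only three well-known entries are listed.

```python
from urllib.parse import urlsplit

# Conventional (well-known) ports for a few protocols.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the protocol's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.port.ac.uk/index.html"))       # 80
print(effective_port("http://www.port.ac.uk:8080/index.html"))  # 8080
```

The second call shows what a client has to do when the webserver has been moved off port 80: the port number must be spelled out in the URL.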

Each host needs an IP address for each network interface card that it has; thus a host with two Ethernet cards would have two IP addresses. Just to confuse matters, it is perfectly possible to bind more than one IP address to each interface card; more on this later.

Web server names

Now that we have an IP address for our server sorted out, we really need a name as well, so that mere mortals have a chance of remembering it and typing it in correctly. As with the IP addresses, the Internet names are divided into parts with dots, but there is no fixed relationship between dotted decimal addresses and dotted Internet names. The mapping is done using some kind of lookup table. In the simple case this is a text file (called HOSTS) on each machine, listing all the names and their corresponding numbers; otherwise a Name Service manages the lookup for you. The Internet maintains a large distributed database managed by Domain Name Servers, using the DNS protocol. This will be discussed in more detail in Session 05.

The simple HOSTS file is fine for small home networks, but gets difficult to manage when the number of hosts to be listed grows. This problem was faced in the very early days of the Internet, as the original set of mainframes that constituted the Internet did use a file called HOSTS, which is why it is still present in most computers (Windows, Linux etc.).

Older versions of Windows (Win 3.1, Win 95 etc.) used to hold the HOSTS file in the Windows directory (C:\Windows), but more recently (Win2K or WinXP) it is held in the system32\drivers\etc directory (C:\WINNT\system32\drivers\etc\ on Win2K). Below is an old example from a home system.

# Copyright (c) 1994 Microsoft Corp. 
#
# This file contains the mappings of IP addresses to host names. Each
# entry should be kept on an individual line. The IP address should
# be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one space.
#
# Additionally, comments (such as these) may be inserted on individual
# lines or following the machine name denoted by a '#' symbol.
#
# For example:
#
# 102.54.94.97 rhino.acme.com # source server
# 38.25.63.10 x.acme.com # x client host
127.0.0.1 localhost
##########################################################################
# Private network addresses
192.168.27.52 RanaEnorma.Home RanaEnorma # Linux Webserver
192.168.27.54 Cecil.Home Cecil # Mary's PC
192.168.27.55 WebbedFrog.Home WebbedFrog # Win2K Server
192.168.27.56 Iceline.Home IceLine # Fred's PC, Dead
192.168.27.57 RanaPerforma.Home RanaPerforma # Liz's PC
192.168.27.58 MeshedFrog.Home MeshedFrog # David's PC
192.168.27.59 RanaPortable.Home RanaPortable # DIS Portable 100baseT
192.168.27.60 DayBat.Home DayBat # Fred's PC
192.168.27.101 RanaNovell.Home RanaNovell # Novell PC, dynamic IP
192.168.27.2 JetServLaser.Home JetServLaser # Downstairs Laser
192.168.27.3 JetServColour.Home JetServColour # Upstairs Colour
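To show how such a file acts as a lookup table, here is a small sketch that reads HOSTS-style text into a Python dictionary. The function name parse_hosts is invented for the example, and names are lower-cased on the assumption that host names are compared without regard to case.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name (lower-cased) to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue                          # blank or comment-only line
        address, *names = line.split()
        for name in names:
            table[name.lower()] = address
    return table


sample = """\
# Private network addresses
127.0.0.1 localhost
192.168.27.52 RanaEnorma.Home RanaEnorma # Linux Webserver
192.168.27.58 MeshedFrog.Home MeshedFrog # David's PC
"""

hosts = parse_hosts(sample)
print(hosts["ranaenorma.home"])  # 192.168.27.52
print(hosts["meshedfrog"])       # 192.168.27.58
```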

As with the IP address, an Internet name can be divided into two components, a host name and a domain name. E.g. www.tech is the host name of the faculty's web server and port.ac.uk is the domain name for the University. At what point you make the division is, in most cases, a bit arbitrary, as it depends on who is asking the question. The question of Domain Names will be addressed in more detail in Session 05.

Web server hardware

In view of the fairly modest tasks required of a web server, fairly modest hardware will suffice to provide a service. The question to resolve is:

Large commercial webservers would include a number of speed enhancements (lots of memory, multiple fast processors, etc.), but many are very slim, very basic boxes with hard disc, memory, processor and NIC, set up and managed via a web browser, so with no screen or keyboard.

Web server software

Since a webserver is essentially very simple in concept, and so ubiquitous in deployment, there is a wide range of possible Web Servers to choose from. The two most popular webservers are Apache and the Microsoft Internet Information Server (IIS). Others include the Roxen webserver, Xitami, Enterprise Netware Webserver and many more. Many of these are freely downloadable, though not all will come with 'easy to install' instructions.

For an analysis of the current situation on the availability and capability of different webservers look at http://serverwatch.internet.com/webservers.html. We will concentrate on the Apache web server, which comes in two flavours, one for Unix based systems (including Linux) and one for Windows (9X and NT/2K/XP).

The CommsLab holds a lab full of Windows workstations isolated from the general University network, which will be suitable for your initial experiments. What you try at home is of course up to you, but it would be good to try out some of the configuration experiments in an environment that you can play with for a reasonable span of time.

Client software

Here the position is a little more familiar. What web browser are the users of your site going to have? In addition, what version of the browser are they using? Then, of course, what plug-ins are they going to have installed?

Web clients

  • Microsoft Internet Explorer
  • Mozilla Firefox
  • Netscape Navigator
  • Opera
  • Lynx

Plug-ins

  • For Multimedia, (e.g. Flash, Quicktime, Realplayer etc.)
  • For executing programs, (e.g. JavaScript, Java, VBscript etc.)
  • For representing data (e.g. XML)

Related servers

A significant number of current web servers are hosting websites that interact with a database for much of their content. There are many database management systems that will connect (reasonably) easily to a webserver to achieve an easily maintainable web-reachable archive of live data.

The database systems range from:

Expensive ones

  • Oracle
  • Microsoft SQL Server
  • IBM DB2
  • Informix

Free/cheap ones

  • MySQL
  • Postgres
  • Microsoft Access

It is doubtful if there will be time to explore database interaction in any depth, but a simple interaction with a basic database is reasonably easy to set up. In all cases (at least those I have tried) there is a need for some 'glue' code to drive the interaction between the database server and the webserver. The two servers may be co-located on the same hardware platform, or on completely different machines; as ever, it is a cost/performance balancing act.
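To give a feel for what 'glue' code looks like, here is a minimal sketch that reads rows from a database and wraps them in HTML. It uses Python's built-in sqlite3 module purely as a stand-in for the database servers listed above, and the staff table, its columns and the function name render_staff_page are all invented for the illustration.

```python
import sqlite3
from html import escape


def render_staff_page(conn: sqlite3.Connection) -> str:
    """Query the database and turn the rows into a simple HTML list."""
    rows = conn.execute(
        "SELECT name, room FROM staff ORDER BY name"
    ).fetchall()
    # escape() stops data from the database being read as HTML markup.
    items = "".join(
        f"<li>{escape(name)}: {escape(room)}</li>" for name, room in rows
    )
    return f"<html><body><ul>{items}</ul></body></html>"


# Hypothetical data, held in memory so nothing is written to disc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Bob", "A0.10"), ("Ann", "B1.02")],
)

print(render_staff_page(conn))
```

In a real deployment the webserver would call code like this for each request and send the returned text to the browser; the pattern is much the same whichever language and database are chosen.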

In order to build these composite applications you need to ponder the following questions:

What programming language?

  • Java
  • PHP
  • ASP
  • Perl
  • C/C++
  • PL/SQL

What application development environment?

  • Oracle WebDB
  • iPortal
  • IBM WebSphere
  • Microsoft InterDev and the various .NET tools
 

Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth