Webserver use, configuration and management

Unit WUCM1

Practical: Web Server Installation and Configuration

See Laurie (2003), Chapters 1, 2 and 3

Background

If you do this practical on a machine connected to the Internet, be warned that there is a very small chance that someone out there might be able to take advantage of an incorrectly configured web server to access your machine. If you do this on a machine on the University network, the University firewall should provide protection against this. If you do this practical in the special CommsLab environment, your server will be visible only over a local network supported by an isolated switched hub in the CommsLab. In most real cases, the server would be advertised to the world and reachable via its entry in the DNS. Useful references in this area are (Laurie, 2003), (Arnold, 2000) or (Bloom, 2002), though for longer term administration of a web server, either (Kabir, 1999) or (Iseminger, 1999) would be worth a look. All of these are in the library. Wainwright (1999) includes a wealth of material, including relevant security issues.

In order to get Apache working you need to have TCP/IP installed and configured correctly on the machine to be used as web host. For experimental use a standard Windows machine is adequate. However, for serious release to a wider audience, you will need to move it to a more secure platform - usually Unix in one of its flavours or a Windows server.

Details of how to boot up the Comms Lab PCs into CommsLab mode and login on them are available.

Practical requirements

You can do this practical alone, but if you have two people, or more particularly two networked machines, you can more easily test the client-server interaction.

If you have experimented with setting up Apache in the past you might like to take a back seat for this session. Sessions 2 and 3 are both devoted to getting this initial Apache server up and running, so you should have time to experiment, backtrack and repeat.   If you do not complete the exercise, then the CommsLab can be used when not in use by classes, so you can continue at your leisure.

Note that in the Comms Lab, each machine should be set up with a fixed IP address. These will normally be of the form 148.197.158.???. In a real network environment, users would never need to know these numeric addresses, as the DNS would resolve names such as www.port.ac.uk into the relevant addresses.

Setting up the Apache Win32 Server

Installation of Apache

The install files for Apache can be downloaded from http://httpd.apache.org/ or can be found in the WUCM1 area of the L drive. To install, run the appropriate .msi or .exe file. These notes apply to Apache version 1.3. If you want to try this at home, try using version 2.2, but note that you will only be able to install it on a machine for which you have administrator rights.

For the purposes of this tutorial, it is instructive to complete the whole process, though as with most Windows programs the install is not a very exciting one.

From past experience, most students happily click away and get Apache installed, but fail to choose the best set of options from the installer. In order to help, the following set of screen grabs illustrates installation on a laptop. (Note this PC was joined to a home network via a WiFi link, so had picked up a name and address from the home network, not a UoP one. In the CommsLab the domain would be tech.port.ac.uk, and your server name would be cl110.tech.port.ac.uk or whatever, depending on which PC you were using.)

Invent any valid email for your domain, not your normal UoP email address. You need to choose the custom setup type so that you can specify the installation folder. Choose a distinct directory on your N: drive, say N:\Apache. Do NOT install it into "C:\Program Files". If all goes well you will have an operational Web Server.

Note - choose the manual method

NB - make sure you choose a simple location; N:\Apache is a good idea, and worked well for most students last year. It is best to avoid folder names including spaces (like "C:\Program Files\") because not all programs originally designed for Unix (where spaces in names are very rare) transport well to Windows (where they are more common).

Testing the default installation

The Apache server version 1 is inherently a command line animal; no fancy windows, icons, buttons etc. You control and manage it via text files and scripts, more or less the same as when run on a Unix system. Recent releases do create an entry on the start menu. For most of the work today we will ignore this and do it the old fashioned way from a DOS type command console. Once all is well and you are satisfied with the operation of the server, it is worth setting up an icon. If you took all the defaults and have Apache set up as a Service or in a difficult location on your file system this can be fixed, but it is often quicker to uninstall and start again!

To test out your Apache, open a DOS command console, change drive and directory to Apache's, and set it going with:

N:

cd Apache

apache

The screen dump below illustrates the successful run - not very exciting!

The older versions used to report an error at this point, as the default configuration script did not know anything about your server, for example:

Apache/1.3.6

Syntax error on line 44 of /apache/conf/httpd.conf

ServerRoot must be a valid directory. 

The more recent installers will correctly substitute the name of your server, and the location of all the files, so it may well seem to work.  

Apache server running following a console start, note that there is no interface, not even a cursor!!

 

 

There are a number of things to notice.  

Firstly, Apache assumes a Unix style directory separator, so it's '/' not '\' as in DOS and Windows. Even though it is a Win32 application, internally it processes directory paths in the Unix style, so you have to enter them in that style, EXCEPT in cases where you are communicating with DOS or Windows rather than Apache. It is very easy to make mistakes here. The actual complaint in the old example above refers to a file in the conf subdirectory, called httpd.conf. This is a standard text file, so can be opened with Notepad or the DOS editor EDIT. (NB it is important to preserve its name - be careful about the habit of Notepad in adding .txt on the end of file names.)
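As a small illustration of the separator rule, the sketch below rewrites a Windows path in the forward-slash form that goes into httpd.conf. The function name to_apache_path is made up for this handout; it is not part of Apache or of any library.

```python
from pathlib import PureWindowsPath


def to_apache_path(windows_path):
    """Return a Windows path written with '/' separators, as used in httpd.conf.

    to_apache_path is a hypothetical helper name chosen for this example.
    """
    # PureWindowsPath only manipulates the text of the path, so this
    # works on any operating system and never touches the disk.
    return PureWindowsPath(windows_path).as_posix()


print(to_apache_path(r"N:\Apache\htdocs"))  # N:/Apache/htdocs
```

Remember that the conversion only applies to what you type into the conf file; commands given to the DOS console still use '\'.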

The filename httpd.conf is related to the Unix name of the program, where the Apache program runs as a daemon, hence the name HTTPD makes a kind of sense. Many of the oddities in Apache can be traced back to the common ancestry of much of the core code with the Unix version. If you open httpd.conf you will find a daunting volume of configuration information. For the purposes of this tutorial we will start to build it up from scratch. This is the approach taken by Laurie (2003) and has the virtue of leaving all the difficult stuff until I have left the room!   Keep a copy of the original by renaming the file to a *.bak version. (ren httpd.conf *.bak)   Otherwise delete it, and also srm.conf and access.conf.

Now if you run Apache you get an error message saying the file is not there, not a surprise really. Run either DOS EDIT or Notepad and create a new httpd.conf to save in the conf directory. NB: check carefully that the file you save really is called httpd.conf and not httpd.conf.txt. (To force Notepad not to add .txt, put double quotes round the name!) What we will do is to slowly add in just enough configuration detail to get things to work, and then tweak it to do what we want. Add a comment line, as below, and save:

#new config file

Rerun Apache; now we get an error message saying:

[alert] cannot determine server's fully qualified domain name.

use ServerName to set it manually

On recent versions of Apache, it may well use the current numeric address from the Windows setup as the server name and start anyway! So now open up the httpd.conf file in the editor and add in the following, substituting the name of your 'server' host and the location where you installed the software:

#new config file

ServerName "H01.CommsLab.port.ac.uk"

ServerRoot "N:/Apache/"

DocumentRoot "N:/Apache/htdocs/"
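Since a wrong ServerRoot or DocumentRoot is the classic cause of the "must be a valid directory" complaint, it can be worth checking the paths before restarting. The following is only a rough sketch: parse_directives and missing_directories are names invented here, and the parsing handles nothing beyond simple one-line "Name value" directives (Apache's real parser does a great deal more).

```python
from pathlib import Path


def parse_directives(conf_text):
    """Collect simple 'Name value' directives, skipping blanks and # comments."""
    directives = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1] if len(parts) > 1 else ""
        # Drop the surrounding double quotes used in the conf file.
        directives[parts[0]] = value.strip().strip('"')
    return directives


def missing_directories(directives, names=("ServerRoot", "DocumentRoot")):
    """Return those of the named directives whose value is not an existing directory."""
    return [n for n in names
            if n in directives and not Path(directives[n]).is_dir()]


sample = '''#new config file
ServerName "H01.CommsLab.port.ac.uk"
ServerRoot "N:/Apache/"
DocumentRoot "N:/Apache/htdocs/"
'''

conf = parse_directives(sample)
print(conf["ServerRoot"])  # N:/Apache/
```

On your own machine you would read the text from conf/httpd.conf and pass the result to missing_directories; an empty list means both directories were found.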

Serving pages!

Once Apache is running, it should be possible to get it to serve pages. Start up a browser (e.g. IE or Firefox) and try to access:

You should see this...

The first gremlin is likely to be that the page will not come up, because of the multi-language support in the Apache documentation demo: there is no file called index.html. Instead it exists in many different language versions, index.html.en being the English one, so try asking for it by name rather than by default.

If you get an index listing, then clicking on index.html.en will give you the display above, or you could try a different language version!

 

Closing down Apache

If you look back at the server, the DOS console window is just displaying a message saying the server is running - how do you get it to exit? Selecting the DOS console window and hitting Ctrl-C will work, but the next run used to give a warning message about an unclean shut down - nasty.   The preferred way is to open another DOS console window and give the command:

apache -k shutdown

You will need to change directories to Apache first of course.

As in this example, try it, but note that Apache will complete its current transactions and close down gracefully, so there will be a delay before the text cursor returns to the launch console window.

With the later versions of Apache there is a clean shutdown after a Ctrl-C, so that would be OK now; it is up to you how you kill the server.

A new web site

For the purposes of this tutorial, I assume that you can all produce simple HTML files. You could use DreamWeaver or an equivalent, if available. If not, use any existing HTML files you have or create them from scratch using Notepad or similar. You could even use this page if you have it available electronically!

Roger website

In order to get you going I have provided a few poor examples from my old University home page (it is the original Department standard!). Ideally we would not want the website to be in the same directory as the Apache program files, as errors are far too likely, so a different root directory seems a sensible minimum alternative. To this end set up a new WebRoot directory, and within it our first website directory, say Roger, and copy my HTML and image files over. By convention, all of the pages making up a website live in a subdirectory of the site named htdocs (this is what DocumentRoot will point at), so the files need to be copied into this subdirectory.

In addition to the htdocs subdirectory you will also need conf and logs subdirectories in your equivalent of Roger. The conf directory holds the configuration files, with the error logs and access logs being stored in the logs directory. The conf file you have already met, though this time it is stored as part of the website, rather than part of Apache. In addition, the conf directory holds a file that describes the MIME types allowed. Apache provides a reasonably standard one; this is a copy. The logs directory will initially be empty, though it will accumulate an access log showing which browsers access your site, the time, and what they asked for.
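If you would rather script the directory layout than create it by hand, something like the sketch below would do. The function name make_site_skeleton is invented for this handout, and the demonstration deliberately builds the tree in a temporary directory so that running it does not touch your N: drive.

```python
import tempfile
from pathlib import Path


def make_site_skeleton(web_root, site_name):
    """Create <web_root>/<site_name> with htdocs, conf and logs inside it."""
    site = Path(web_root) / site_name
    for sub in ("htdocs", "conf", "logs"):
        # exist_ok means re-running the script does no harm.
        (site / sub).mkdir(parents=True, exist_ok=True)
    return site


if __name__ == "__main__":
    # Demonstration only: build the layout in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        site = make_site_skeleton(tmp, "Roger")
        print(sorted(p.name for p in site.iterdir()))  # ['conf', 'htdocs', 'logs']
```

For the real thing you would call make_site_skeleton("N:/WebRoot", "Roger") and then copy the HTML files into htdocs and the conf files into conf.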

The httpd.conf file now looks like:

# Simple conf file for the web site Roger
# Tell Apache what sort of server it is.
ServerType standalone
ServerRoot "N:/Apache/"

# Required parameter to tell Apache its name.

ServerName "H01.CommsLab.port.ac.uk"

# Where the documents that make up the site are stored.
DocumentRoot "N:/WebRoot/Roger/htdocs"
# Where the access log should go.
TransferLog "logs/access.log"

The problem now is that firing up Apache as before gives the default "It Worked" message and the manual pages. How do we tell Apache to use the newly installed Roger website? The conf file has a clue, in that DocumentRoot is specified, but how do we tell Apache to look in N:\WebRoot\Roger\htdocs to kick things off? We could edit the conf file in the Apache install directory, but an alternative way is to start Apache with a switch that tells it where to look. Assuming your DOS console window still has Apache as its default directory, then the command:

apache -d N:\WebRoot\Roger

will start the ball rolling on the Roger website. You might want to add the full path in to the TransferLog directive so as to put the log files into your new Roger site, not in the install directories, i.e. "N:/WebRoot/Roger/logs/access.log" assuming that you constructed the logs directory parallel to the htdocs directory.

Log files

The access.log file produced by a brief wander round the Roger website will look something like:

148.197.157.52 - -  [24/Feb/2000:20:14:41 +0000] "GET / HTTP/1.1" 200 533
148.197.157.52 - -  [24/Feb/2000:20:14:41 +0000] "GET /intro.html HTTP/1.1" 200 1501
148.197.157.52 - -  [24/Feb/2000:20:14:42 +0000] "GET /banhom.htm HTTP/1.1" 200 1527
148.197.157.52 - -  [24/Feb/2000:20:14:42 +0000] "GET /images/paper.jpg HTTP/1.1" 200  11120
148.197.157.52 - -  [24/Feb/2000:20:14:43 +0000] "GET /banhom.htm HTTP/1.1" 200 1527
148.197.157.52 - -  [24/Feb/2000:20:14:44 +0000] "GET /images/buthome.GIF HTTP/1.1" 200  2433
148.197.157.52 - -  [24/Feb/2000:20:14:44 +0000] "GET /images/butadmin.GIF HTTP/1.1" 200  2506
148.197.157.52 - -  [24/Feb/2000:20:14:45 +0000] "GET /images/butres.GIF HTTP/1.1" 200  3010
148.197.157.52 - -  [24/Feb/2000:20:14:45 +0000] "GET /images/Roger3.jpg HTTP/1.1" 200  100830
148.197.157.52 - -  [24/Feb/2000:20:14:46 +0000] "GET /images/butproj.GIF HTTP/1.1" 200  3050

The 200 near the end of each line is the response code, in this case "all ok", and the last number is the number of bytes transferred. Each line starts with the IP address of the browser making the request, followed by the date and time of the request. The quoted section is the message received by Apache - a GET request in all of these cases. The error log at this point should be empty, unless something went wrong. Try removing one of the image files, say buthome.gif, and trying again. The error log is usually only committed to the disk when the server is brought down, though when the log gets large it is flushed to the disk.
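The access log fields just described can be pulled apart with a few lines of script, which is handy once the log grows beyond what you can read by eye. This is a minimal sketch that assumes the log layout shown in the extract above; parse_access_line is a name made up for this example, and the month names are assumed to be the English abbreviations.

```python
import re
from datetime import datetime

# host, two placeholder fields, [timestamp], "request", status, bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+)\s+(?P<ident>\S+)\s+(?P<user>\S+)\s+'
    r'\[(?P<time>[^\]]+)\]\s+"(?P<request>[^"]*)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return the fields of one access log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A '-' in the size field means no bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = '148.197.157.52 - -  [24/Feb/2000:20:14:41 +0000] "GET /intro.html HTTP/1.1" 200 1501'
entry = parse_access_line(line)
print(entry["host"], entry["status"], entry["size"])  # 148.197.157.52 200 1501
```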

The extract below illustrates entries in a typical error log:

[Thu Feb 24 13:18:32  2000] [error] [client 148.197.200.10] File does not exist: /webroot/roger/htdocs/butproj.gif
[Thu Feb 24 13:18:33  2000] [error] [client 148.197.200.10] File does not exist:  /webroot/roger/htdocs/butproj.gif
[Thu Feb 24 13:18:39  2000] [error] [client 148.197.200.10] File does not exist:  /webroot/roger/htdocs/butproj.gif
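When the same missing file turns up again and again, as in the extract above, a quick tally tells you which broken link to fix first. The sketch below assumes the error log wording shown in the extract ("File does not exist:"); count_missing_files is a hypothetical helper name.

```python
import re
from collections import Counter

MISSING = re.compile(
    r"\[error\] \[client (?P<client>[^\]]+)\] File does not exist:\s*(?P<path>.+)"
)


def count_missing_files(lines):
    """Count how often each path is reported as 'File does not exist'."""
    counts = Counter()
    for line in lines:
        match = MISSING.search(line)
        if match:
            counts[match.group("path").strip()] += 1
    return counts


sample = [
    "[Thu Feb 24 13:18:32  2000] [error] [client 148.197.200.10] File does not exist: /webroot/roger/htdocs/butproj.gif",
    "[Thu Feb 24 13:18:33  2000] [error] [client 148.197.200.10] File does not exist:  /webroot/roger/htdocs/butproj.gif",
    "[Thu Feb 24 13:18:39  2000] [error] [client 148.197.200.10] File does not exist:  /webroot/roger/htdocs/butproj.gif",
]
print(count_missing_files(sample).most_common(1))
# [('/webroot/roger/htdocs/butproj.gif', 3)]
```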

Now that the server is started using a -d directory directive, you need to modify the kill command.  To shut down the web server now you need:

apache -d N:\WebRoot\Roger -k shutdown

What to do about errors

Checking the error log periodically is a good idea, and would help the webmaster, but what about the user typing away on the browser? What happens at the client end?  

Error response to client

The most likely error is that the client will ask for a document that the server cannot find. There may be a variety of reasons, ranging from finger trouble by the client user to files moved on the server or links not updated.

In the event of a problem or error, Apache can be configured to do one of four things:

  1. Output a simple hard-coded error message
  2. Output a customised message
  3. Redirect to a local URL to handle the problem/error
  4. Redirect to an external URL to handle the problem/error

Apache, in common with other webservers, generates a standard set of HTTP status codes in response to requests from client browsers. Note that not all of them are errors.
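To see that not every status code is an error, it helps to group the codes by their first digit. The sketch below uses the standard phrases from Python's http module; describe_status is a name invented for this handout.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code):
    """Return the code, its standard phrase and the class its first digit puts it in."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for an unknown code
    return f"{code} {phrase} ({STATUS_CLASSES[code // 100]})"


for code in (200, 404, 500):
    print(describe_status(code))
# 200 OK (success)
# 404 Not Found (client error)
# 500 Internal Server Error (server error)
```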

The first option is the default, while the other options can be configured using the ErrorDocument directive in the conf file, followed by the HTTP response code and a message or URL. Messages in this context begin with a double quote mark that is not included in the message. URLs can be local URLs beginning with a slash ("/") or full URLs the client can resolve. Examples (from Laurie) include:

ErrorDocument 500 http://foo.example.com/cgi-bin/tester

ErrorDocument 404 /cgi-bin/bad_urls.pl

ErrorDocument 401 /subscription_info.html

ErrorDocument 403 "Sorry, can't allow you access today"

Note that for any remote URLs, i.e. those starting with the full http:// part, Apache will send a redirect to the client to tell it where to find the document. If you have time, designing a couple of error documents and adding these references into your httpd.conf file would be a good idea.
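The three kinds of ErrorDocument argument can be told apart purely from how they start, which the following sketch makes explicit. The function names are hypothetical and the check mirrors only the rules stated above (leading double quote, leading slash, full http:// URL); it is not how Apache itself parses the directive.

```python
def classify_target(target):
    """Say which kind of ErrorDocument argument this is, judged by how it starts."""
    target = target.strip()
    if target.startswith('"'):
        return "message"
    if target.startswith("/"):
        return "local URL"
    if target.lower().startswith(("http://", "https://")):
        return "external URL"  # Apache answers with a redirect to the client
    return "unknown"


def parse_error_document(line):
    """Split one 'ErrorDocument <code> <target>' line into (code, kind of target)."""
    keyword, code, target = line.split(None, 2)
    if keyword != "ErrorDocument":
        raise ValueError("not an ErrorDocument directive: " + line)
    return int(code), classify_target(target)


print(parse_error_document("ErrorDocument 404 /cgi-bin/bad_urls.pl"))
# (404, 'local URL')
```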

Other Apache features

Thus far we have established a web server with a simple set of web pages. There are many features not yet explored. In the time available they will be left for you to explore on your own. We will investigate some of them in the coming practical sessions.  

The following areas are important and worth study:

References:

Ben Laurie and Peter Laurie
Apache: The Definitive Guide (3e)
O'Reilly, 2003
ISBN: 0596002033

Mohammed J. Kabir
Apache Server Administrator's Handbook
IDG Books Worldwide, 1999
ISBN: 0764533061

David Iseminger
IIS 4 Administrator's Handbook
IDG Books Worldwide, 1999
ISBN: 0764532758

Peter Wainwright
Professional Apache
Wrox, 1999
ISBN: 1861003021

Ryan B. Bloom
The Complete Reference: Apache Server 2.0
Osborne, 2002
ISBN: 0072223448

 

 

Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth