Webserver use, configuration and management
Unit WUCM1
If you do this practical on a machine connected to the Internet, be warned that there is a very small chance that someone out there might be able to take advantage of an incorrectly configured web server to access your machine. If you do this on a machine on the University network, the University firewall should provide protection against this. If you do this practical in the special CommsLab environment, your server will be visible only over a local network supported by an isolated switched hub in the CommsLab. In most real cases, the server would be advertised to the world and reachable via its entry in the DNS. Useful references in this area are (Laurie, 2003), (Arnold, 2000) or (Bloom, 2002) though for longer term administration of a web server, either (Kabir, 1999) or (Iseminger, 1999) would be worth a look. All of these are in the library. Wainwright (1999) includes a wealth of material including relevant security issues.
In order to get Apache working you need to have TCP/IP installed and configured correctly on the machine to be used as web host. For experimental use a standard Windows machine is adequate. However, for serious release to a wider audience, you will need to move it to a more secure platform - usually Unix in one of its flavours or a Windows server.
Details of how to boot up the Comms Lab PCs into CommsLab mode and login on them are available.
You can do this practical alone, but if you have two people, or more particularly two networked machines, you can more easily test the client-server interaction.
If you have experimented with setting up Apache in the past you might like to take a back seat for this session. Sessions 2 and 3 are both devoted to getting this initial Apache server up and running, so you should have time to experiment, backtrack and repeat. If you do not complete the exercise, then the CommsLab can be used when not in use by classes, so you can continue at your leisure.
Note that in the Comms Lab, each machine should be set up with a fixed IP address. These will normally be of the form 148.197.158.???. In a real network environment, users would never need to know these numeric addresses as the DNS would resolve the names (www.port.ac.uk, etc.) into the relevant addresses.
The install files for Apache can be downloaded from http://httpd.apache.org/ or can be found in the WUCM1 area of the L drive. To install, run the appropriate .msi or .exe file. These notes apply to Apache version 1.3. If you want to try this at home, try using version 2.2, but note that you will only be able to install it on a machine for which you have administrator rights.
For the purposes of this tutorial, it is instructive to complete the whole process, though as with most Windows programs the install is not a very exciting one.
From past experience, most students happily click away and get Apache installed, but fail to choose the best set of options from the installer. In order to help, the following set of screen grabs illustrates installation on a laptop. (Note this PC was joined to a home network via a WiFi link, so had picked up a name and address from the home network, not a UoP one. In the CommsLab the domain would be tech.port.ac.uk, and your server name would be cl110.tech.port.ac.uk or whatever, depending on which PC you were using.)
Invent any valid email for your domain, not your normal UoP email address. You need to choose the custom setup type so that you can specify the installation folder. Choose a distinct directory on your N: drive, say N:\Apache. Do NOT install it into "C:\Program Files". If all goes well you will have an operational Web Server.
Note - choose the manual method
NB - make sure you choose a simple location; N:\Apache is a good idea and worked well for most students last year. It is best to avoid folder names including spaces (like "C:\Program Files\") because not all programs originally designed for Unix (where spaces in names are very rare) transport well to Windows (where they are more common).
The Apache server version 1 is inherently a command line animal; no fancy windows, icons, buttons etc. You control and manage it via text files and scripts, more or less the same as when run on a Unix system. Recent releases do create an entry on the start menu. For most of the work today we will ignore this and do it the old fashioned way from a DOS type command console. Once all is well and you are satisfied with the operation of the server, it is worth setting up an icon. If you took all the defaults and have Apache set up as a Service or in a difficult location on your file system this can be fixed, but it is often quicker to uninstall and start again!
To test out your Apache, open a DOS command console, change drive and directory to Apache's, and set it going with:
N:
cd Apache
apache
The screen dump on the next page illustrates the successful run - not very exciting!
The older versions used to report an error at this point, as the default configuration script did not know anything about your server, for example:
Apache/1.3.6
Syntax error on line 44 of /apache/conf/httpd.conf
ServerRoot must be a valid directory.
The more recent installers will correctly substitute the name of your server, and the location of all the files, so it may well seem to work.
Apache server running following a console start, note that there is no interface, not even a cursor!!
There are a number of things to notice.
Firstly Apache assumes a Unix style directory separator, so it's '/' not '\' as in DOS and Windows. Even though it is a Win32 application, internally it processes directory paths in the Unix style, so you have to enter them in that style, EXCEPT in cases where you are communicating with DOS or Windows not Apache - very easy to make mistakes. The actual complaint in the old example on the previous page refers to a file in the conf subdirectory, called httpd.conf. This is a standard text file, so can be opened with Notepad or the DOS editor EDIT. (NB it is important to preserve its name - be careful about the habit of Notepad in adding .txt on the end of file names.)
The filename httpd.conf is related to the Unix name of the program, where the Apache program runs as a daemon, hence the name HTTPD makes a kind of sense. Many of the oddities in Apache can be traced back to the common ancestry of much of the core code with the Unix version. If you open httpd.conf you will find a daunting volume of configuration information. For the purposes of this tutorial we will start to build it up from scratch. This is the approach taken by Laurie (2003) and has the virtue of leaving all the difficult stuff until I have left the room! Keep a copy of the original by renaming the file to a *.bak version (ren httpd.conf *.bak). Otherwise delete it, and also srm.conf and access.conf.
Now if you run Apache you get an error message saying the file is not there, not a surprise really. Run either DOS EDIT or Notepad and create a new httpd.conf to save in the conf directory. NB: check carefully that the file you save really is called httpd.conf and not httpd.conf.txt. (To force Notepad not to add .txt, put double quotes round the name!) What we will do is slowly add in just enough configuration detail to get things to work, and then tweak it to do what we want. Add a comment line, as below, and save:
#new config file
Rerun Apache. Now we get an error message saying:
[alert] cannot determine server's fully qualified domain name.
use ServerName to set it manually
On recent versions of Apache, it may well use the current numeric address from the Windows setup as the server name and start anyway! So now open up the httpd.conf file in the editor and add in the following, substituting the name of your 'server' host and the location where you installed the software:
#new config file
ServerName "H01.CommsLab.port.ac.uk"
ServerRoot "N:/Apache/"
DocumentRoot "N:/Apache/htdocs/"
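Because the paths in httpd.conf must use the Unix style separator, it is easy to paste a Windows path by mistake. The short Python sketch below is one possible way to convert a path before typing it into the conf file. The helper name to_apache_path is invented for illustration and is not part of Apache.

```python
from pathlib import PureWindowsPath


def to_apache_path(windows_path: str) -> str:
    """Return a Windows path rewritten with forward slashes.

    Hypothetical helper: Apache itself does not provide this; it only
    shows the '\\' to '/' conversion described in the text.
    """
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    # Paths taken from the examples in these notes.
    for path in (r"N:\Apache", r"N:\Apache\htdocs"):
        print(path, "->", to_apache_path(path))
```

The converted form is what goes inside the quotes of ServerRoot and DocumentRoot; the DOS commands themselves still use the backslash form.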
Once Apache is running, it should be possible to get it to serve pages. Start up a browser (e.g. IE or Firefox) and try to access:
You should see this...
The first gremlin is likely to be that, because of the multi-language support in the Apache documentation demo, it will not come up. There is no file index.html; it exists in many different language versions, index.html.en being the English version. Try asking for it by name rather than by default. If you get an index listing, then clicking on index.html.en will give you the display above, or you could try a different language version!
If you look back at the server, the DOS console window is just displaying a message saying the server is running - how do you get it to exit? Selecting the DOS console window and hitting Ctrl-C will work, but the next run used to give a warning message about an unclean shut down - nasty. The preferred way is to open another DOS console window and give the command:
apache -k shutdown
You will need to change directories to Apache first of course.
Try it, as in this example, but note that Apache will complete its current transactions and close down gracefully, so there will be a delay before the text cursor returns to the launch console window.
With the later versions of Apache there is a clean shutdown after a Ctrl-C, so either method is acceptable now; how you kill the server is up to you.
For the purposes of this tutorial, I assume that you can all produce simple HTML files. You could use DreamWeaver or an equivalent, if available. If not, use any existing HTML files you have or create them from scratch using Notepad or similar. You could even use this page if you have it available electronically!
In order to get you going I have provided a few poor examples from my old University home page (it is the original Department standard!!). Ideally we would not want the website to be in the same directory as the Apache program files - errors are far too likely, so a different root directory seems a sensible minimum alternative. To this end set up a new WebRoot directory, and within it our first website directory, say Roger, and copy my html and image files over. Apache expects all of the pages making up a website to be in a subdirectory of your website, with the name htdocs. So the files need to be copied into this subdirectory.
In addition to the htdocs subdirectory you will also need conf and logs subdirectories in your equivalent of Roger. The conf directory holds the configuration files, with the error logs and access logs being stored in the logs directory. You have already met the conf file, though this time it is stored as part of the website rather than part of Apache. In addition, the conf directory holds a file that describes the MIME types allowed. Apache provides a reasonably standard one; this is a copy. The logs directory will initially be empty, though it will accumulate an access log showing which browsers access your site, the time, and what they asked for.
The httpd.conf file now looks like:
# Simple conf file for the web site Roger
# Tell Apache what sort of server it is.
ServerType standalone
ServerRoot "N:/Apache/"
# Required parameter to tell Apache its name.
ServerName "H01.CommsLab.port.ac.uk"
# Where the documents that make up the site are stored.
DocumentRoot "N:/WebRoot/Roger/htdocs"
# Where the error and access logs should go.
TransferLog "logs/access.log"
The problem now is that firing up Apache as before gives the default "It Worked" message and the manual pages. How do we tell Apache to use the newly installed Roger website? The conf file has a clue, in that DocumentRoot is specified, but how do we tell Apache to look in N:\WebRoot\Roger\htdocs to kick things off? We could edit the conf file in the Apache install directory, but an alternative way is to start Apache with a switch that tells it where to look. Assuming your DOS console window still has Apache as its default directory, then the command:
apache -d N:\WebRoot\Roger
will start the ball rolling on the Roger website. You might want to add the full path in to the TransferLog directive so as to put the log files into your new Roger site, not in the install directories, i.e. "N:/WebRoot/Roger/logs/access.log" assuming that you constructed the logs directory parallel to the htdocs directory.
The access.log file produced by a brief wander round the Roger website will look something like:
148.197.157.52 - - [24/Feb/2000:20:14:41 +0000] "GET / HTTP/1.1" 200 533
148.197.157.52 - - [24/Feb/2000:20:14:41 +0000] "GET /intro.html HTTP/1.1" 200 1501
148.197.157.52 - - [24/Feb/2000:20:14:42 +0000] "GET /banhom.htm HTTP/1.1" 200 1527
148.197.157.52 - - [24/Feb/2000:20:14:42 +0000] "GET /images/paper.jpg HTTP/1.1" 200 11120
148.197.157.52 - - [24/Feb/2000:20:14:43 +0000] "GET /banhom.htm HTTP/1.1" 200 1527
148.197.157.52 - - [24/Feb/2000:20:14:44 +0000] "GET /images/buthome.GIF HTTP/1.1" 200 2433
148.197.157.52 - - [24/Feb/2000:20:14:44 +0000] "GET /images/butadmin.GIF HTTP/1.1" 200 2506
148.197.157.52 - - [24/Feb/2000:20:14:45 +0000] "GET /images/butres.GIF HTTP/1.1" 200 3010
148.197.157.52 - - [24/Feb/2000:20:14:45 +0000] "GET /images/Roger3.jpg HTTP/1.1" 200 100830
148.197.157.52 - - [24/Feb/2000:20:14:46 +0000] "GET /images/butproj.GIF HTTP/1.1" 200 3050
The 200 on the end of each line is the response code, in this case "all ok", and the last number is the number of bytes transferred. Each line starts with the IP address of the browser making the request; followed by the date and time they did it. The quoted section is the message received by Apache - a GET request in all of these cases. The error log at this point should be empty, unless something went wrong. Try removing one of the image files, say buthome.gif and trying again. The error log is usually only committed to the disk when the server is brought down, though when the log gets large it is flushed to the disk.
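To make the layout of an access log line concrete, here is a minimal Python sketch that splits one line into the fields just described. It assumes the log uses the layout shown in the extract above (client address, two placeholder fields, timestamp, quoted request, response code, byte count). The names CLF_PATTERN and parse_access_line are invented for this illustration.

```python
import re

# Assumed layout, matching the extract in these notes:
# host ident user [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line):
    """Return a dict of fields for one access log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no body was sent; treat it as zero bytes.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


if __name__ == "__main__":
    sample = ('148.197.157.52 - - [24/Feb/2000:20:14:41 +0000] '
              '"GET /intro.html HTTP/1.1" 200 1501')
    entry = parse_access_line(sample)
    print(entry["host"], entry["request"], entry["status"], entry["size"])
```

Running a whole access.log through a function like this makes it easy to total the bytes served or to pick out the requests that did not return 200.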
The extract below illustrates entries in a typical error log:
[Thu Feb 24 13:18:32 2000] [error] [client 148.197.200.10] File does not exist: /webroot/roger/htdocs/butproj.gif
[Thu Feb 24 13:18:33 2000] [error] [client 148.197.200.10] File does not exist: /webroot/roger/htdocs/butproj.gif
[Thu Feb 24 13:18:39 2000] [error] [client 148.197.200.10] File does not exist: /webroot/roger/htdocs/butproj.gif
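An error log like this soon becomes repetitive, so a webmaster may prefer a summary. The Python sketch below counts how often each missing file is reported. It assumes error lines follow the layout in the extract above; the names ERROR_PATTERN and count_missing_files are invented for illustration.

```python
import re
from collections import Counter

# Assumed layout, matching the extract in these notes:
# [timestamp] [level] [client address] message
ERROR_PATTERN = re.compile(
    r'\[(?P<time>[^\]]+)\] \[(?P<level>\w+)\] '
    r'\[client (?P<client>[^\]]+)\] (?P<message>.*)'
)

MISSING_PREFIX = "File does not exist: "


def count_missing_files(lines):
    """Count 'File does not exist' entries per file path."""
    missing = Counter()
    for line in lines:
        match = ERROR_PATTERN.match(line.strip())
        if match and match.group("message").startswith(MISSING_PREFIX):
            path = match.group("message")[len(MISSING_PREFIX):]
            missing[path] += 1
    return missing


if __name__ == "__main__":
    sample = [
        "[Thu Feb 24 13:18:32 2000] [error] [client 148.197.200.10] "
        "File does not exist: /webroot/roger/htdocs/butproj.gif",
        "[Thu Feb 24 13:18:33 2000] [error] [client 148.197.200.10] "
        "File does not exist: /webroot/roger/htdocs/butproj.gif",
    ]
    for path, count in count_missing_files(sample).items():
        print(count, path)
```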
Now that the server is started using a -d directory directive, you need to modify the kill command. To shut down the web server now you need:
apache -d N:\WebRoot\Roger -k shutdown
Checking the error log periodically is a good idea, and would help the webmaster, but what about the user typing away on the browser? What happens at the client end?
The most likely error is that the client will ask for a document that the server cannot find. There may be a variety of reasons, ranging from finger trouble by the client user to files moved on the server or links not updated.
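In the event of a problem or error, Apache can be configured to do one of four things:
1. output a simple hardcoded error message
2. output a customized message
3. redirect to a local URL to handle the problem or error
4. redirect to an external URL to handle the problem or error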
Apache (in common with other webservers), generates a standard set of HTTP status codes in response to requests from client browsers. Note that not all of them are errors.
The first option is the default, while the other options can be configured using the ErrorDocument directive in the conf file, followed by the HTTP response code and a message or URL. Messages in this context begin with a double quote mark that is not included in the message. URLs can be local URLs beginning with a slash ("/") or full URLs the client can resolve. Examples (from Laurie) include:
ErrorDocument 500 http://foo.example.com/cgi-bin/tester
ErrorDocument 404 /cgi-bin/bad_urls.pl
ErrorDocument 401 /subscription_info.html
ErrorDocument 403 "Sorry, can't allow you access today"
Note that for any remote URLs, i.e. those starting with the full http:// part, Apache will send a redirect to the client to tell it where to find the document. If you have time, designing a couple of error documents and adding these references into your httpd.conf file would be a good idea.
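When deciding which codes deserve their own error document, it helps to see which class each response code falls into, since not all of them are errors. The Python sketch below uses the standard library's HTTPStatus table to label the codes that appear in these notes. The function name describe and the class labels are chosen for this illustration only.

```python
from http import HTTPStatus

# The first digit of an HTTP response code gives its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return a one-line description such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)  # raises ValueError for a code Python does not know
    return f"{code} {status.phrase} ({STATUS_CLASSES[code // 100]})"


if __name__ == "__main__":
    # 200 is the code seen in the access log; the rest appear in the
    # ErrorDocument examples.
    for code in (200, 401, 403, 404, 500):
        print(describe(code))
```

Only the 4xx and 5xx classes are candidates for ErrorDocument; a 200 in the access log simply records a successful request.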
Thus far we have established a web server with a simple set of web pages. There are many features not yet explored. In the time available they will be left for you to explore on your own. We will investigate some of them in the coming practical sessions.
The following areas are important and worth study:
Ben Laurie and Peter Laurie
Mohammed J. Kabir
David Iseminger
Peter Wainwright
Ryan B. Bloom

Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth