Webserver use, configuration and management

Unit WUCM1

Initial web server configuration and testing

Installing Apache

Laurie (2003) Chapter 1 opens by discussing how to go about installing an Apache distribution. There are two basic options: installing the Apache Server from the source code (requires compilation and hence a validated C compiler) or from the distributed binaries targeted at the various operating systems supported.

Most Linux distributions include a binary version of Apache, and if the relevant options are selected, will install and auto-configure the web server. The normal practice for a Windows platform would be to install the binary distribution, though the sources are provided.

The practical session for this week looks at the installation and initial configuration of a Win32 version of Apache, and would be a good start for any of your own experiments.

It is important to halt any pre-existing and running servers prior to setting up a new one. In essence you need to locate and kill off all of the httpd (Linux) or apache (Windows) processes. On Linux you also need to remove the autorun instructions from the start-up scripts. Compiling and installing a Unix source distribution does require some familiarity with the Unix command line environment; see Laurie (2003) for the detail if you want to take this route, but for the purposes of this unit you can omit this stage.

Configuration

This lecture will be based on Apache - see Jepson (2001) or Iseminger (1999) for an IIS approach. Whilst IIS is usually configured with a GUI tool, Apache is usually configured using a plain text configuration file.

Before looking in more detail at Apache, the screen grab illustrates the IIS approach:

The substance of what needs to be configured is essentially the same using either Web server, though the IIS version will be more familiar to Windows users.

There are open source GUI tools, for example Comanche and TkApache, both of which use the Tcl/Tk libraries:

http://comanche.com.dtu.dk/comanche/

http://everythinglinux.org/TkApache/TkApache_content.html

LinuxConf is a general purpose GUI tool for Linux that includes an Apache configuration module; see http://www.solucorp.qc.ca/linuxconf/ or Chapter 2 of Wainwright (1999).

Configuration files

However you have installed Apache, once that is complete you need to configure it to meet your needs, not merely to serve the Apache or Linux manuals/help files. In recent distributions a great deal more of the configuration is scripted and undertaken automatically, though much remains.

The default set of config files for Apache is:

httpd.conf The main Apache server configuration file. This file has the same name under both Linux and Windows, even though the main webserver executable is different (viz. httpd under Linux and apache under Windows).
srm.conf (Apache 1 only) The server resource configuration file, now deprecated.
access.conf (Apache 1 only) The directory and file access restrictions descriptions, now deprecated.
mime.types This file relates media types to file name extensions.
magic This file lists recognisable byte sequences used by the mod_mime_magic module (if enabled) to determine a file’s media type by examining the first few bytes of the file.

The files access.conf and srm.conf are now deprecated, and magic and mime.types are usually best left as default. We will look at httpd.conf in a little more detail; if you have a default install of Apache it is well worth studying. Beware that it is some 1500 lines long, though much of it is comment. The following discussion sets out some of the basic configurations.

Configuration file syntax

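A hash symbol ( # ) at the start of a line introduces a comment - the comment starts with the hash and continues to the end of the line. Comments must sit on a line of their own; they may not follow a directive on the same line.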

There are two kinds of directives: variable assignments and block directives.

The syntax of a variable assignment is <variable name><white space><value>. For example, one of the required directives sets up the document root, i.e. it tells Apache where the root of the tree of files and directories that constitute the webspace is located in the operating system’s file space. From the tutorial example:

# Where the documents that make up the site are stored.
DocumentRoot "/WebRoot/Roger/htdocs"

Values that may contain embedded spaces must be surrounded by double quotes, as in the example. Apache v2 tightens the syntax rules and now requires more explicit quoting.
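The variable assignment syntax can be illustrated with a short Python sketch. This is not Apache's own parser; the function name parse_directive is made up for this example, and it only handles simple one-line assignments (not block directives), using Python's shlex module to respect the double quotes.

```python
import shlex


def parse_directive(line):
    """Split one configuration line into (name, arguments).

    Returns None for blank lines and for comment lines starting with '#'.
    Illustrative only: handles simple variable assignments, not blocks.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    # shlex honours double quotes, so a quoted value containing spaces
    # stays together as a single argument.
    parts = shlex.split(stripped)
    return parts[0], parts[1:]


print(parse_directive('DocumentRoot "/WebRoot/Roger/htdocs"'))
print(parse_directive("# Where the documents that make up the site are stored."))
```

Without the quotes, a value such as C:/My Apache would be split into two separate arguments, which is why values with embedded spaces must be quoted.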

A block directive looks a little like an HTML tag (but isn't one), in that it is introduced by angle brackets (greater and less than symbols), and serves to apply specific directives to specific parts of the Apache system. This might be for example to set access rules for a particular directory, to give rules for specific file types, carry out specific configuration if certain modules are loaded, etc.

For example, the following will set up a very restricted access to the root directory and all subdirectories, assuming that later directory clauses will relax the position for the directories in the document tree.

<Directory />
Options none
AllowOverride none
Order allow,deny
Deny from all
</Directory>

We will return to this later as access control is central to a well-managed and secure webserver.

Minimum basic configuration file

The following listing sets out the minimal basic Win32 configuration file from the tutorial session. It describes the location and type of the server and its files. (Note that it is targeted at Apache v1.3.29; v2.0.48 is different, principally in that it needs a good deal more pre-loading of dynamically loaded modules to offer basic services.) The minimum configuration file for Linux will also be different, as the binary Windows install adds a number of defaults into the registry.

For detail on the use of Apache v2 see Bloom (2002), though most of the practical work will make use of last year’s version of Apache 1.3, namely 1.3.29, as the initial setup is simpler. (The current version is 1.3.33.) After the first week’s practical the difference is less noticeable; either may be used for your log book exercises, or indeed the Linux version.

# Simple conf file for the web site Roger
# Tell Apache what sort of server it is. (Omit for v2.0.43, others not possible)
ServerType     standalone
# Where the install directory is located - prepended to all non-root directories
ServerRoot     "C:/Apache"
# Required parameter to tell Apache its name.
ServerName     H01.CommsLab.port.ac.uk
# Where the documents that make up the site are stored.
DocumentRoot   "D:/WebRoot/Roger/htdocs"
# Where the error and access logs should go.
TransferLog    "logs/access.log"

ServerType

The ServerType variable assignment tells Apache what type of server it is. The options are ‘standalone’ (the most common on Linux systems and the only option for Win32 systems) or ‘inetd’, in which case Apache remains inactive until inetd (the Internet daemon) explicitly runs it. Inetd listens to a number of internet ports, as set up by its configuration files (usually /etc/services and /etc/inetd.conf), and when a request arrives on a matching port (e.g. tcp port 80) it launches Apache to deal with the request; Apache then exits. This is generally only sensible for web servers with a light expected load, and there have been problems reported using it, so it is best avoided. The directive has been removed from Apache v2.

ServerRoot

The server root is the directory where Apache is installed (usually) or where its configuration files can be found (most importantly). See also DocumentRoot below.

The default is sufficient in most cases, unless you have installed (a Linux) Apache in an unusual location. Under Windows this information is included in the registry entries made at installation, so it is less vital. However, Apache version 2 returns the core assumptions to the Unix basics, so Apache v2 setups need more detail in a minimum configuration. In addition there are a number of other system level configuration variables that may usually be left at default on Windows (LockFile, PidFile, ScoreBoardFile etc.).

ServerName

The ServerName variable holds the IP hostname of the webserver. This may be the usual IP name of the host or sometimes a more web oriented name, such as www.myhost.mydomain.co.uk. In this case the new name needs to be included in the name servers dealing with your domain.

In your small-scale experiments you would need to add a second entry in all of the hosts files to associate the www based name with the same IP address as the host. It is also possible to have an extra IP address for the host, but this would need to be set up in the system network configuration as well. It is vital that whatever name you put in the conf file is the same as the name in the hosts file and matches the IP address configured under the Windows Networks utility. This is especially true of the name "localhost".
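As a rough way of checking that two names in a hosts file point at the same address, the sketch below parses hosts-file text into a name-to-address table. The function name parse_hosts, the www based alias and the address 192.168.0.10 are all invented for illustration; substitute the entries from your own hosts file.

```python
def parse_hosts(text):
    """Return a dict mapping each host name (lower-cased) to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop anything after a '#', which starts a comment in a hosts file.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping[name.lower()] = address
    return mapping


# Hypothetical hosts file content: the address and the www alias are made up.
SAMPLE_HOSTS = """
127.0.0.1     localhost
192.168.0.10  H01.CommsLab.port.ac.uk  www.CommsLab.port.ac.uk  # web server
"""

hosts = parse_hosts(SAMPLE_HOSTS)
print(hosts["h01.commslab.port.ac.uk"] == hosts["www.commslab.port.ac.uk"])
```

If the name given to ServerName is missing from the table, or maps to a different address from the host's own, that is the first thing to put right.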

DocumentRoot

The DocumentRoot variable holds the subdirectory in the file system where the default web site is located. It is possible for one webserver (i.e. one host running one instantiation of Apache) to host many different websites, but for now we will keep to the simple case. The significant effect of this directive is that Apache will not permit any access to files in directories higher up the server file system structure, so affording a basic measure of security.

Note that the DocumentRoot does not have to be a subdirectory of the ServerRoot. Indeed, it is considered good security to keep them separate, though the default DocumentRoot will be the htdocs directory of the ServerRoot if you don't specify anything else.
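The idea that requests cannot climb above the DocumentRoot can be sketched in a few lines of Python. This illustrates the principle only and is not how Apache itself is implemented (a real server may simply reject such a request); the function name map_url_to_file is hypothetical.

```python
import posixpath


def map_url_to_file(document_root, url_path):
    """Map the path part of a URL onto a file name below document_root.

    Any '..' segments are collapsed first, so the result can never
    refer to a directory above the document root.
    """
    # normpath on an absolute path discards '..' segments that would
    # climb above '/', which here stands for the document root.
    normalised = posixpath.normpath("/" + url_path.lstrip("/"))
    return document_root.rstrip("/") + normalised


print(map_url_to_file("D:/WebRoot/Roger/htdocs", "/manual/windows.html"))
print(map_url_to_file("D:/WebRoot/Roger/htdocs", "/../../secret.txt"))
```

The second call shows an attempt to escape from the web space being clamped back inside the document tree.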

TransferLog

The last variable given above is TransferLog, which describes where Apache should store the log of who has accessed your server, when, and what they asked for. Note that as there is no drive letter specified in the example, the ServerRoot as given will be prepended to the path specified. There is a range of other log files that can be set up, including an error log (useful for debugging CGI scripts). Apache v2 configurations normally use the CustomLog directive in place of TransferLog. We will get to this later, so if you want to experiment with Apache v2, look ahead or be prepared for a little tweaking.
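The rule that a relative path is taken relative to the ServerRoot can be sketched as follows. This is a simplified illustration rather than Apache's own code, and the function name resolve_path is made up; it treats a path as absolute if it starts with a slash or a Windows drive letter.

```python
import re


def resolve_path(server_root, path):
    """Return path unchanged if absolute, else prefixed with server_root."""
    # Absolute: starts with '/' (Unix) or a drive letter such as 'D:/' (Windows).
    if path.startswith("/") or re.match(r"[A-Za-z]:[\\/]", path):
        return path
    return server_root.rstrip("/") + "/" + path


print(resolve_path("C:/Apache", "logs/access.log"))
print(resolve_path("C:/Apache", "D:/WebRoot/Roger/htdocs"))
```

With the values from the minimal configuration file, the access log therefore ends up under the Apache install directory, while the DocumentRoot, which carries its own drive letter, is left alone.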

Simple configuration file

The following is a more extended set of basic directives from a default Win32 install. The httpd.conf file includes a significant level of commenting, starting with a caveat about making changes without understanding them: clearly vital on a live server, but merely a caution in respect of your experiments. Most of the Apache texts include discussion of all of the various variables and the implications of the values they can be given.

### Section 1: Global Environment
ServerType            standalone
# ServerRoot: or wherever yours ended up!
ServerRoot            "C:/Apache"
PidFile               logs/httpd.pid
ScoreBoardFile        logs/apache_runtime_status
Timeout               300
KeepAlive             On
MaxKeepAliveRequests  100
KeepAliveTimeout      15
MaxRequestsPerChild   0
ThreadsPerChild       50

### Section 2: 'Main' server configuration
Port                  80
ServerAdmin           admin@Ranvilles
ServerName            RanaWifi.Ranvilles
# DocumentRoot: or wherever you put yours.
DocumentRoot          "D:/WebRoot/Roger/htdocs"

DefaultType           text/plain
HostnameLookups       Off

ErrorLog              logs/error.log

# Possible values include: debug, info, notice, warn, error, crit, alert, emerg
LogLevel              warn

LogFormat             "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat             "%h %l %u %t \"%r\" %>s %b" common
LogFormat             "%{Referer}i -> %U" referer
LogFormat             "%{User-agent}i" agent

CustomLog             logs/access.log common

The set of variables set up after Timeout 300 relates to the way the server handles requests. Note that while the Linux version will fork many child processes to deal with the incoming requests from users, the Windows version creates many threads to accomplish the same task.

The second section relates to the basic description of the main server - we will look at virtual servers later. The fields included in the log files are: host, identity, authuser, timedate, request, status and bytes. In the example above, the access log uses the standard format usually known as common. The extract below shows the first few lines logged by a new install of Apache while browsing the manual pages.

192.168.27.58 - - [10/Feb/2002:17:59:02 +0000] "GET / HTTP/1.1" 200 1494
192.168.27.58 - - [10/Feb/2002:17:59:02 +0000] "GET /apache_pb.gif HTTP/1.1" 200 2326
192.168.27.58 - - [10/Feb/2002:17:59:21 +0000] "GET /manual/ HTTP/1.1" 200 9580
192.168.27.58 - - [10/Feb/2002:17:59:21 +0000] "GET /manual/images/apache_header.gif HTTP/1.1" 200 4084
192.168.27.58 - - [10/Feb/2002:17:59:21 +0000] "GET /manual/images/pixel.gif HTTP/1.1" 200 61
192.168.27.58 - - [10/Feb/2002:17:59:21 +0000] "GET /manual/images/index.gif HTTP/1.1" 200 1540
192.168.27.58 - - [10/Feb/2002:17:59:33 +0000] "GET /manual/windows.html HTTP/1.1" 200 27725
192.168.27.58 - - [10/Feb/2002:17:59:34 +0000] "GET /manual/images/sub.gif HTTP/1.1" 200 608

Can you relate the field data above to the fields of the common log format specified above?
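Once you have tried this by hand, one way to check your answer is to let a short Python script split a log line into the seven fields named earlier. The regular expression below is our own sketch of the common format, not something taken from Apache, and the function name parse_common_log_line is made up.

```python
import re

# One named group per field of the 'common' format: %h %l %u %t "%r" %>s %b
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<identity>\S+) (?P<authuser>\S+) '
    r'\[(?P<timedate>[^\]]+)\]\s+"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log_line(line):
    """Return a dict of the seven fields, or None if the line does not match."""
    match = COMMON_LOG.match(line)
    return match.groupdict() if match else None


line = '192.168.27.58 - - [10/Feb/2002:17:59:02 +0000] "GET / HTTP/1.1" 200 1494'
for field, value in parse_common_log_line(line).items():
    print(field, "=", value)
```

The two hyphens are the identity and authuser fields, which are empty here because no identity lookup or user authentication took place.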

Testing

Implications of running as a service/daemon

Because the Apache application is intended to be run as a service in the background (aka a daemon), it does not have much in the way of a user interface. The default Windows install of v1.3.29 should put a "Start Apache in Console" entry on the Start menu, but that is all. You can also start the Apache server from a command console using the command apache, assuming you have cd'd to the installation directory.

Most students, having installed and configured the Apache Server, are a bit put out by what happens – i.e. nothing!!! This is very noticeable under Windows, as users expect menus, icons etc. and all you get is a command window, with a brief announcement, as here:

Not even a text cursor to give commands!

Explain to yourself why this is so.

 

Under Windows NT and later, there is the additional option of running the Apache server as a "service", i.e. automatically and in the background, so it has no user interface at all. This is the equivalent of running the Apache server under Linux, where it is a daemon; that is why it is called httpd, the HTTP daemon. The benefit of setting the Apache server up as a service or daemon is that it will be brought up when the server operating system boots, and taken down when the operating system shuts the machine down.

For the purposes of the Windows experiments we will be using Console mode, but you will need to find out how Apache is set up on your Linux server so that you can restart it without a reboot. Every time you make any changes to the configuration files, Apache needs to be restarted for them to come into effect. Remember this if things seem not to be working!

Starting and stopping Apache

For a Windows based console Apache the tutorial discusses the original start and stop methods. Starting is easy: open a console window and use the apache command, with switches as necessary. Stopping, for most versions of Windows Apache, requires a separate console window and use of the -k switch, viz: apache -k shutdown. It is always possible to force a quick kill using CTRL-C in the start console window, but on earlier versions Apache would complain about improper shutdown and possible data corruption. CTRL-C works properly in v1.3.22 and after, though the tutorial will encourage the older ‘proper’ method.

Under Linux/Unix, the command httpd with the relevant directory switches will start Apache, but to kill it you need to know its process identity (PID). The command ps aux will list all of the processes, giving their PIDs. Find httpd in the list and then use the kill command, kill <PID>, with the value from the list, to kill off the Apache server. However, as you may have noted, Apache records its PID in a text file (see the PidFile directive), so you can use that to tell kill which process to stop.
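The PID file approach can be sketched in Python as below. The function names are made up for this example, and the location of the PID file depends on your ServerRoot and PidFile settings, so the path you pass in is for you to supply.

```python
import os
import signal


def read_pid(pid_file):
    """Return the process ID recorded in Apache's PID file."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def stop_apache(pid_file):
    """Send SIGTERM (the signal a plain 'kill <PID>' sends) to that process."""
    os.kill(read_pid(pid_file), signal.SIGTERM)
```

Calling stop_apache with the full path to logs/httpd.pid under your ServerRoot would then do the same job as looking the PID up with ps and typing kill by hand. You will normally need to run it as the user that started Apache.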

There is usually an extra helper program, apachectl, that can be used to start, stop, or restart Apache - for example ./apachectl stop (assuming that you are in the bin subdirectory of the Apache install directory). Laurie (2003) discusses this in detail in chapter 2.

Default web site

A default install of Apache includes a default ‘hello world’ type web site that includes a web version of the manuals current at the time of the release of the server application. If your system install is successful, then pointing your browser at http://localhost or http://myhost.mydomain should result in the following index page (v1.3.29)

References

Mark Arnold, Jeff Almeida, & Clint Miller
Administering Apache,
McGraw-Hill, (2000),
ISBN: 0072122919 (in Library)

Eric Larson & Brian Stephens
Administrating Web Servers, Security and Maintenance,
Prentice Hall, (2000),
ISBN: 0130225347 (in Library)

Ben Laurie & Peter Laurie
Apache: The Definitive Guide (3e),
O'Reilly, (2003),
ISBN: 0596002033 (in Library)

Scott Hawkins
Apache Webserver Administration & eCommerce Handbook,
Prentice Hall, (2001),
ISBN: 0130898732 (in Library)

Heather Osterloh
IP Routing: Primer Plus,
SAMS, (2002),
ISBN: 0672322102 (in Library)

Peter Wainwright
Professional Apache,
Wrox Press, (1999),
ISBN: 1861003021 (in Library)

Ben Laurie & Peter Laurie
Apache: The Definitive Guide (2e),
O'Reilly, (1999),
ISBN: 1565925289 (in Library)

Ryan B. Bloom
The Complete Reference: Apache Server 2.0,
Osborne, (2002),
ISBN: 0072223448 (in Library)

David Iseminger
IIS 4 Administrator's Handbook,
IDG Books, (1999),
ISBN: 0764532758 (in Library)

Brian Jepson & Stephen Spainhour
IIS in a Nutshell: A Desktop Quick Reference,
O’Reilly, (2001),
ISBN: 1565926072 (not in Library)

 

Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth