Webserver use, configuration and management

Unit WUCM1

Common Gateway Interface

A CGI script is a program run on the server, so how does it get its input, and what should it do with its output? CGI scripts are loaded and executed at the request of the server, which passes the details of the particular request to the script through environment variables. These tell the script the URL it was called with, any additional parameters, the HTTP method used, and general information about the request (see Wainwright, 1999). In addition, if the method was POST, input can come from the standard input (STDIN).

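Having done its task, the script passes information back to the web browser via standard output (STDOUT). The web server (e.g. Apache) ensures that data written to STDOUT is routed to the browser. The output must begin with one or more header lines (at least a Content-Type header for an ordinary page), followed by a blank line, before the content itself.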

Note: STDIN and STDOUT are Unix-based terms which, on a standalone PC, would correspond to the keyboard and the screen respectively, used for common input and output. (Note the implicit assumption of a character-based system. This goes back to the time when your computer was a remote Unix mainframe and you, the user, had a dumb terminal: no mouse, no graphics, no multimedia, etc.)
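To make these input/output conventions concrete, here is a minimal sketch of a CGI program in Python (chosen for illustration; the unit's own example later is in Perl). The function name respond is hypothetical. It takes the environment and the output stream as parameters so that it can be tried without a web server, and it shows the one rule every CGI response must follow: header lines, then a blank line, then the page.

```python
import html
import os
import sys


def respond(environ, stdout):
    """Write a complete CGI response (headers, blank line, page) to stdout."""
    # Details of the request arrive as environment variables set by the server.
    method = environ.get("REQUEST_METHOD", "GET")

    # Header lines first, then an empty line to mark the end of the headers.
    stdout.write("Content-Type: text/html\n")
    stdout.write("\n")

    # Everything after the blank line is the document the browser receives.
    stdout.write("<html><head><title>A Minimal CGI Script</title></head>")
    stdout.write("<body><p>Hello World, requested with "
                 + html.escape(method) + "</p></body></html>\n")


if __name__ == "__main__":
    # Under a web server, os.environ and sys.stdout are what Apache provides.
    respond(os.environ, sys.stdout)
```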

Simple Apache configuration for CGI

The server must be set up to recognise that some URLs actually represent programs to execute rather than files to deliver. The CGI script also needs to be somewhere in the web server's file space for it to be executed; however, a good deal of additional configuration is needed before it will work.

Assuming your Apache has mod_cgi set up correctly, the next step is to tell Apache which directory contains the scripts that are to be executable, and how to recognise them as executable. There are two ways of doing this, of which the first is the more common.

For the first method, if you have created a cgi-bin directory parallel to the htdocs directory in your web directory (see figure), you need to add the following to your httpd.conf file. The screen grab illustrates the basic server directory structure.

#     Fix to permit access to cgi-bin directory
#
<Directory "D:/WebRoot/Roger/cgi-bin">
  Options Indexes FollowSymLinks MultiViews
  AllowOverride None
  Order allow,deny
  Allow from all
</Directory>

# Where the error and access logs should go (you may already have this).
# Relative paths have the ServerRoot prepended; use a full path
# (e.g. under D:/WebRoot/Roger) if that is where you want the logs.
TransferLog "logs/access.log"
ErrorLog "logs/error.log"
LogLevel     warn

#     tell Apache where the cgi program files are to be found.
#     i.e. not in with the documents for security reasons.
ScriptAlias /cgi-bin/ "D:/WebRoot/Roger/cgi-bin/"

#     tell Apache where to put the script errors.
ScriptLog     "logs/script.log"

[Figure: Apache tree structure]

The alternative configuration using Options ExecCGI might look like:

<Directory "D:/WebRoot/Roger/htdocs/scripts">
    AllowOverride None
    Options ExecCGI
    SetHandler cgi-script
    Order allow,deny
    Allow from all
</Directory>

Specifying ExecCGI permits files in the directory to be executed as CGI scripts, and SetHandler then marks every file in the directory as a CGI script. If you wish, you can instead tune your configuration file to recognise CGI scripts by their file extension rather than by the directory they are in, though this way is more prone to errors. For example, to declare .cgi (or .pl, .bat, .exe, etc.) as an executable type:

AddType     application/x-httpd-cgi  .cgi

(In Apache 2.x the usual form of this is AddHandler cgi-script .cgi, and the directory must still permit Options ExecCGI.)

The danger with this approach is that if you make a configuration error, Apache may not recognise a URL as mapping onto a program, treat it as an ordinary file instead, and deliver the program's source code to the user. This is not usually desirable from a security standpoint!

Whichever approach you use, one important task is to make sure the operating system treats any CGI script as executable. Under Windows this is usually determined by the file extension, .bat and .exe being the usual indicators of an executable file. There are exceptions, however: if you have installed Perl on a Windows machine, Apache will run a Perl script so long as its first line (the "#!" line) points to the Perl interpreter. For example:

#!c:\Apache\Perl\bin\perl.exe -Tw
print "Content-Type: text/html\n\n";
print "<HTML><HEAD><TITLE>A Minimal CGI Script</TITLE></HEAD>";
print "<BODY><P>Hello World</P></BODY></HTML>";

Linux is somewhat different in that any file can be marked as executable; you use chmod +x filename to change its status. There are more complex uses of chmod to cater for different classes of user (Owner, Group or World) and different permissions. Use the manual pages to find out more, e.g. man chmod.
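The same change can be made from a program. This Python sketch (make_executable and describe are hypothetical helper names) uses os.chmod with constants from the stat module to add execute permission for owner, group and others. That is roughly chmod a+x; unlike plain chmod +x, it does not take the umask into account.

```python
import os
import stat


def make_executable(path):
    """Roughly `chmod a+x path`: add execute permission for owner, group and others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, new_mode)
    return new_mode


def describe(path):
    """Return the permissions as `ls -l` shows them, e.g. '-rwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)
```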

Passing data to a CGI program

To pass data to the CGI program, the web server sets up a number of environment variables containing the critical information. If there are parameters in the URL query string (i.e. after the "?"), the server detects the question mark and assigns the remainder of the URL to the environment variable QUERY_STRING, which the script can then read. If the method is POST, the parameters arrive instead as the body of the request, which the server passes to the script on its STDIN stream; the environment variable CONTENT_LENGTH gives the number of bytes to read.
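The two routes for parameters can be sketched in Python with a hypothetical raw_parameters function. It assumes the script is handed the environment and the binary STDIN stream, and it returns the parameters still encoded, exactly as the script receives them.

```python
import os
import sys


def raw_parameters(environ, stdin):
    """Return the still-encoded parameter string for the current request.

    GET:  the text after '?' in the URL, which the server puts in QUERY_STRING.
    POST: the request body on STDIN; CONTENT_LENGTH gives its size in bytes.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Read exactly CONTENT_LENGTH bytes rather than waiting for end-of-file.
        data = stdin.read(length)
        # URL-encoded form data is plain ASCII; anything else is replaced.
        return data.decode("ascii", errors="replace")
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # sys.stdin.buffer gives the raw bytes the server writes to STDIN.
    print("Content-Type: text/plain\n")
    print(raw_parameters(os.environ, sys.stdin.buffer))
```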

The parameters are passed to the CGI program in their raw, encoded form, and it is the responsibility of the program to decode them. If you write your server-side programs in Perl, there is a Perl module called CGI.pm that will do this for you.
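The decoding itself is not complicated. Below is a Python sketch of roughly what a library such as CGI.pm does for form data of type application/x-www-form-urlencoded (decode_form is a hypothetical name, and this is not CGI.pm's code): split on "&", split each pair on "=", then turn "+" into a space and %XX escapes into characters.

```python
from urllib.parse import unquote_plus


def decode_form(raw):
    """Decode an application/x-www-form-urlencoded string into a dict of lists.

    Pairs are separated by '&' and names from values by '='; '+' stands for a
    space and %XX for the character with hex code XX.
    """
    params = {}
    for pair in raw.split("&"):
        if not pair:
            continue  # tolerate 'a=1&&b=2' or a trailing '&'
        name, _, value = pair.partition("=")
        # A name may repeat (e.g. multiple checkboxes), so keep every value.
        params.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return params
```

In real Python code, urllib.parse.parse_qs(raw, keep_blank_values=True) gives the same result for input like this and is the better choice.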

Apart from QUERY_STRING, other commonly needed environment variables are:

Arnold (2000, pp126-132) discusses the full set in some detail. For an online reference, see the full list required by the CGI specification at http://hoohoo.ncsa.uiuc.edu/cgi/ . Apache also adds a few variables that the standard does not require.
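For reference, this Python sketch sorts a CGI environment into three groups: the meta-variables defined by the CGI/1.1 specification (later published as RFC 3875), the HTTP_* variables that carry the request headers, and everything else, which includes server-specific additions such as Apache's DOCUMENT_ROOT and REQUEST_URI. classify_environment is a hypothetical helper.

```python
# Meta-variables defined by the CGI/1.1 specification (RFC 3875, section 4.1).
CGI_META_VARIABLES = (
    "AUTH_TYPE", "CONTENT_LENGTH", "CONTENT_TYPE", "GATEWAY_INTERFACE",
    "PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING", "REMOTE_ADDR",
    "REMOTE_HOST", "REMOTE_IDENT", "REMOTE_USER", "REQUEST_METHOD",
    "SCRIPT_NAME", "SERVER_NAME", "SERVER_PORT", "SERVER_PROTOCOL",
    "SERVER_SOFTWARE",
)


def classify_environment(environ):
    """Split a CGI environment into standard variables, request headers and extras."""
    standard = {k: environ[k] for k in CGI_META_VARIABLES if k in environ}
    # Each request header arrives as HTTP_<NAME>, e.g. User-Agent becomes HTTP_USER_AGENT.
    headers = {k: v for k, v in environ.items() if k.startswith("HTTP_")}
    # Anything left over: server additions plus the ordinary process environment.
    extras = {k: v for k, v in environ.items()
              if k not in standard and k not in headers}
    return standard, headers, extras
```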

Debugging CGI scripts

Since CGI scripts are run through the server rather than directly, they are somewhat more difficult to debug than an ordinary program, especially as they are often on a remote server not under your direct control. Assuming you are testing your CGI script by requesting a web page with a form, filling in the form, and then clicking “submit”, where do the error and debug messages go?

Usually the answer is the log files: the ErrorLog and, if you have configured one, the ScriptLog. If the error is in your configuration of Apache, so that the script is never actually executed, the error message is likely to be in the ErrorLog. For a script that fails to run properly, mod_cgi also writes the details to the ScriptLog, including whatever the script wrote to STDOUT and STDERR. The extract below is from the ErrorLog for a variety of configuration difficulties.

[Sun Mar 17 14:08:51 2002] [error] [client 192.168.27.58] attempt to invoke directory as script: c:/apache/roger/cgi-bin
[Sun Mar 17 14:29:14 2002] [error] [client 192.168.27.58] Premature end of script headers: c:/apache/roger/cgi-bin/mycgi.bat
[Sun Mar 17 14:39:41 2002] [error] [client 192.168.27.58] (2)No such file or directory: script not found or unable to stat: c:/apache/roger/cgi-bin/command.exe mycgi2.bat
[Sun Mar 17 14:45:56 2002] [error] [client 192.168.27.58] (2)No such file or directory: script not found or unable to stat: c:/apache/roger/cgi-bin/mycgi2.cmd
[Mon Mar 18 22:45:56 2002] [error] [client 192.168.27.58] couldn't spawn child process: c:/apache/roger/cgi-bin/first.pl
[Mon Mar 18 22:50:08 2002] [error] [client 192.168.27.58] couldn't spawn child process: c:/apache/roger/cgi-bin/first.pl
[Mon Mar 18 22:51:17 2002] [error] [client 192.168.27.58] Premature end of script headers: c:/apache/roger/cgi-bin/first.pl

If you want to supplement the native error messages with your own tracing or debugging messages, have the script write them to its STDERR stream. This output is not sent to the browser; Apache records it in the ErrorLog, and for a script that fails it is also captured in the ScriptLog.
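In Python, for example, debugging output can be kept separate from the page by writing it to sys.stderr, as in this sketch (debug and main are hypothetical names). Exactly which log file the messages land in depends on your Apache version and configuration, so check both logs if you cannot find them.

```python
import sys


def debug(*parts):
    """Send a tracing message to STDERR, which goes to the logs, not the browser."""
    print("DEBUG:", *parts, file=sys.stderr, flush=True)


def main(stdout=None):
    """Produce a trivial page, with tracing messages before and after."""
    if stdout is None:
        stdout = sys.stdout
    debug("script started")
    # Only what goes to STDOUT reaches the browser.
    stdout.write("Content-Type: text/plain\n\n")
    stdout.write("OK\n")
    debug("response written")


if __name__ == "__main__":
    main()
```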

In many cases you would want to run the CGI scripts from a command line before trying to integrate them into a web site. The problem is that the CGI script is expecting to get input from the web server, not your keyboard, and to direct output back to the server. This problem is solvable, but somewhat outside the scope of this unit. Any server-side programming unit later in your course should address the issue in whatever language it uses. Both Wainwright (1999) and Laurie (2003 and 1999) discuss the issue.
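One common way round the problem is to play the server's part yourself. This Python sketch (run_cgi and its parameters are hypothetical) sets a handful of the CGI environment variables, passes the request body to the program's STDIN and captures its STDOUT. A real server sets many more variables, and the split assumes the header block ends with a bare blank line as in the Perl example, so treat it as a starting point only.

```python
import os
import subprocess
import sys


def run_cgi(command, method="GET", query="", body=b"", extra_env=None):
    """Run a CGI program outside the web server, imitating the environment it expects.

    Returns (headers, content): the output split at the first blank line.
    """
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query,
        "CONTENT_LENGTH": str(len(body)),
        "SERVER_PROTOCOL": "HTTP/1.1",
    })
    if body:
        env["CONTENT_TYPE"] = "application/x-www-form-urlencoded"
    env.update(extra_env or {})
    # The body goes to the program's STDIN; its STDOUT is captured as the response.
    result = subprocess.run(command, input=body, env=env,
                            capture_output=True, check=True)
    # Assumes header lines end with "\n", so the blank line is "\n\n".
    headers, _, content = result.stdout.partition(b"\n\n")
    return headers.decode("ascii"), content


if __name__ == "__main__":
    # e.g. python run_cgi.py myscript.py  (runs it as a GET with no parameters)
    print(run_cgi([sys.executable] + sys.argv[1:]))
```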

Languages for CGI

The original assumption when the CGI standard was first mooted was that CGI scripts would be Unix shell scripts: Bourne, Bash, Korn, etc. Since then the field has widened considerably. Perl remains a very popular choice for programmers, with C or C++ meeting the need for speed. More likely still, a modern developer will have chosen a non-CGI approach to their web programming needs.

A few security points

CGI raises a number of security issues, mainly arising from the fact that it is very easy to misconfigure your system and open security loopholes.

One serious issue concerns the privileges that CGI scripts have when running. By default, they have the same rights as the server (Apache). Since Apache needs to bind to port 80 (or port 443 if using SSL), and on Unix only root may bind to ports below 1024, it must start as root (the system administrator). In a well-configured server, once this initial binding is done, the Apache processes that handle requests switch to a very low-privilege user (set by the User directive), e.g. "nobody" or "webuser". This provides a certain amount of protection and solves the root privilege problem. You don't want scripts to run as root (unless they really, really have to) in case a rogue or buggy script, or one compromised by a hacker, damages your system or gives access to things that you don't want users to see.

Even if Apache correctly drops to a low-privilege user, every CGI script still runs with that same identity as Apache, which may not be desirable. This is especially true if CGI scripts are being uploaded by a range of users, on an ISP server for example. A solution is to use a CGI wrapper. This is a layer of code inserted between the Apache web server and the user's CGI script (i.e. it wraps the script in a security blanket). A CGI wrapper allows each CGI script to run as a different owner, group, etc., thus isolating the effects of a poorly written script. The wrapper can also subject the script to stringent security tests before running it. The two main alternatives are suEXEC (which is bundled with Apache but needs to be explicitly enabled and set up) and CGIWrap, which is produced by an independent group. See the discussion of suEXEC in Laurie (1999, pp93-99 or 2003, pp346-352).
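To give a flavour of the checks a wrapper performs, here is a Python sketch of a few of them (wrapper_checks is hypothetical). They are in the spirit of suEXEC's tests, such as refusing scripts that others can write to or that carry setuid/setgid bits, but this is not suEXEC's actual rule set. It assumes a Unix-like system, since it relies on Unix owners and permission bits.

```python
import os
import stat


def wrapper_checks(path, expected_uid):
    """Return reasons to refuse to run `path` as a CGI script; [] means no objection.

    A few checks in the spirit of a CGI wrapper such as suEXEC, not its real rule set.
    """
    problems = []
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        problems.append("not a regular file")
    # The script should belong to the user it is going to run as.
    if st.st_uid != expected_uid:
        problems.append("not owned by the user it is to run as")
    # Nobody but the owner should be able to modify the script...
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("writable by group or others")
    # ...or the directory it lives in.
    dir_mode = os.stat(os.path.dirname(os.path.abspath(path))).st_mode
    if dir_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("directory writable by group or others")
    # A setuid/setgid script could escape the identity the wrapper chose.
    if st.st_mode & (stat.S_ISUID | stat.S_ISGID):
        problems.append("setuid or setgid bit set")
    return problems
```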

Another security concern is the behaviour of the editors that programmers use to edit CGI scripts. If an editor leaves backup copies with well-known but unregistered extensions, Apache may well serve up their source to interested crackers, giving them another opportunity to identify weaknesses in your CGI scripts. For example, Programmer's File Editor saves a backup of any edited file with the extra extension .$$$. The guard here is to block any suspect file extension with a Files configuration section, e.g.

<Files *$$$>
    Order allow,deny
    Deny from all
</Files>
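The Files section stops Apache serving such backups, but it is also worth finding and removing them. This Python sketch (find_backups is a hypothetical helper) walks a document tree and lists files whose names match backup patterns. Only the *$$$ pattern comes from the text above; the others are assumptions about which editors are in use, so adjust the list to suit.

```python
import fnmatch
import os

# Backup-file patterns. Only "*$$$" comes from the text; the rest are common
# examples and are assumptions about which editors are in use.
BACKUP_PATTERNS = ("*$$$", "*~", "*.bak", "*.orig", "*.swp")


def find_backups(doc_root, patterns=BACKUP_PATTERNS):
    """Return the sorted paths under doc_root whose names match a backup pattern."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```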

A frequent worry is a script injection attack. If data provided by the user is used, unvalidated, to form a system command or a database query, a malicious user may be able to contrive to make it execute a destructive command such as "rm -rf /" on a Unix system, "del *.*" on a DOS one, or a "DROP TABLE" statement against an SQL database. See Wainwright (1999, p217) for a list of known insecure CGI scripts that you might inadvertently download from the Internet, though the list is now woefully out of date.
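The standard defences are to validate input against a whitelist and never to paste user data into a command or query string. This Python sketch shows both for the SQL case, using the standard sqlite3 module with a ? placeholder so the value is bound as data rather than becoming part of the statement. The table, the column and the username rule are assumptions made for the example.

```python
import re
import sqlite3

# Whitelist: accept only characters a username is expected to contain
# (this particular rule is an assumption for the example).
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def valid_username(value):
    """Reject anything that is not a plain username before using it at all."""
    return USERNAME_RE.fullmatch(value) is not None


def find_user(conn, name):
    """Look up a user with the value bound as a parameter, never pasted into the SQL."""
    # UNSAFE: "SELECT name FROM users WHERE name = '" + name + "'"
    # lets input such as  x'; DROP TABLE users; --  change the statement itself.
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.execute("INSERT INTO users VALUES ('roger')")
    print(find_user(conn, "roger"))
```

For system commands, the corresponding precaution is to pass the program and its arguments to subprocess.run as a list, without shell=True, so that no shell ever interprets the user's text.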

References

Ben Laurie and Peter Laurie
Apache: The Definitive Guide (2e)
O’Reilly, 1999
ISBN: 1565925289                    (in Library)

Mark Arnold, Jeff Almeida & Clint Miller
Administering Apache
McGraw-Hill, 2000
ISBN: 0072122919                    (in Library)

Peter Wainwright
Professional Apache
Wrox, 1999
ISBN: 1861003021                    (in Library)

Jennifer Niederst
Web Design in a Nutshell
O’Reilly, 1999
ISBN: 1565925157                    (in Library)

Ben Laurie and Peter Laurie
Apache: The Definitive Guide (3e)
O’Reilly, 2003
ISBN: 0596002033                    (in Library)

Scott Guelich, Shishir Gundavaram & Gunther Birznieks
CGI Programming with Perl (2e)
O’Reilly, 2000
ISBN: 1565924193                    (in Library)

CGI specification

http://hoohoo.ncsa.uiuc.edu/cgi/overview.html

 
 

Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth