Web programming

Units WEB1P and WEB2P

Alternatives to Perl and CGI

Alternative web server architectures

Perl is not the only language for developing web applications. In principle, any programming language can be used to write CGI scripts, but there is another approach to server-side web applications that has both advantages and disadvantages compared with CGI.

In Lesson 1, we drew the following picture (Figure 1) to show the architecture of a CGI application:

This shows how the web server invokes the CGI program in response to an appropriate HTTP request, and then passes its output back to the browser as the HTTP response.
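The CGI exchange described above can be sketched in a few lines. This is only an illustration, written in Python rather than Perl, of what a web server does for a CGI request: it starts a new process, passes request data in environment variables, and reads the program's output. The inline CGI_PROGRAM and the run_cgi function are hypothetical names invented for this sketch; a real CGI program would be a separate script file.

```python
import os
import subprocess
import sys

# A stand-in CGI program (hypothetical). A real one would be a separate
# script file, for example written in Perl; here it is inline Python source.
CGI_PROGRAM = r'''
import os
query = os.environ.get("QUERY_STRING", "")
print("Content-Type: text/html")
print()
print("<html><body><p>Query was: " + query + "</p></body></html>")
'''


def run_cgi(query_string):
    """Run the CGI program in a new process and return (headers, body)."""
    # Request data travels to the program in environment variables.
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        env=env, capture_output=True, text=True, check=True,
    )
    # CGI output is header lines, then a blank line, then the document body.
    headers, _, body = result.stdout.partition("\n\n")
    return headers, body


if __name__ == "__main__":
    headers, body = run_cgi("name=World")
    print(headers)
    print(body)
```

Every call to run_cgi creates a fresh process, which is exactly the per-request cost discussed next.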

This is fine except that there is usually a bit of a delay in invoking the CGI program. The precise details depend on the operating system, but the typical scenario is that the web server has to ask the operating system to create a new process that will execute the program. Most operating systems are designed to make this as fast as possible, but a process starting up another process to do some work is almost always going to be appreciably slower than the first process doing the job itself. This is the Achilles heel of CGI programming.

One solution is to have the web server execute the program itself. You might think that this would need a highly developed server program that, as well as responding to HTTP requests, had to do any other program task that might be required of it. However, the solution is simpler than that. Because most languages that are used for web applications (like Perl) are interpreted, all that needs to be done is to build an interpreter for the language into the web server, and have it execute the programs written in that language.

This sort of structure is illustrated in Figure 2 below.

Figure 2 - Server module architecture

Now the execution of the program is part of the same process as the web server, so there is no start-up cost associated with creating a new process. The connection between the conventional part of the web server and the module that executes the program is an Application Programming Interface (API) - basically a set of procedure calls that each side can make of the other.
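As a rough sketch of the idea (not of Apache's actual API), the toy server below exposes a tiny plug-in interface, and a toy "interpreter module" registers itself through it. The names Server, register_module, ScriptModule and the ".script" extension are all invented for this illustration. Requests are handled by ordinary procedure calls inside one process; as an extra assumption of this sketch, each script is compiled once and reused.

```python
class Server:
    """Toy web server core exposing a small plug-in API (hypothetical)."""

    def __init__(self):
        self._handlers = {}  # file extension -> module handler

    def register_module(self, extension, handler):
        """API call made by a module: send it requests ending in extension."""
        self._handlers[extension] = handler

    def handle(self, path, query):
        """Dispatch in-process; no new operating system process is created."""
        for extension, handler in self._handlers.items():
            if path.endswith(extension):
                return handler(path, query)
        return "404 Not Found"


class ScriptModule:
    """Toy interpreter module: compiles each script once, then reuses it."""

    def __init__(self, scripts):
        self._scripts = scripts  # path -> source text (stands in for files)
        self._compiled = {}      # path -> compiled code object

    def __call__(self, path, query):
        if path not in self._compiled:
            self._compiled[path] = compile(self._scripts[path], path, "exec")
        namespace = {"query": query}
        exec(self._compiled[path], namespace)  # run the script in this process
        return namespace["response"]


server = Server()
server.register_module(".script", ScriptModule({
    "/hello.script": "response = 'Hello, ' + query",
}))
```

Calling server.handle("/hello.script", "World") runs the script inside the server's own process and returns its response, while a path with no registered module falls through to the 404 branch.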

Most web servers (including Apache and IIS) now provide an API that allows server modules to be "plugged in". Many modules have been written for Apache; in fact, plug-in modules provide most of Apache's advanced functionality. If you are interested in the details, see http://httpd.apache.org/docs/misc/API.html. One of the most commonly used modules for web applications is mod_perl, which embeds a Perl interpreter in Apache (see http://perl.apache.org/ for more details).

There is a third model, used by the Tomcat Java servlet container. In this model, a separate auxiliary server deals with particular requests: in the case of Tomcat, those that map on to Java servlets and JavaServer Pages (though Tomcat will serve HTML and other pages as well). The auxiliary server listens on a separate TCP/IP port (the default for Tomcat is 8080). The web server forwards any relevant requests to the auxiliary server, which processes them and sends the response back to the web server to return to the client.

Figure 3 - Auxiliary server architecture

The advantage of this is that the auxiliary server, once started, can run continuously (a requirement of the Java servlet specification) and therefore doesn't need to be restarted for each request. There is a communication overhead involved in forwarding the HTTP request (and response), but this is normally slight when the two servers run on the same machine, and not onerous if they are on separate machines connected by a reasonably fast network. Being able to run the two servers on separate machines provides considerable flexibility and, potentially, resilience.
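The forwarding arrangement can be sketched with Python's standard library. This is a simplified illustration, not how Apache and Tomcat actually communicate: AuxiliaryHandler, start_auxiliary_server and front_end_handle are hypothetical names, the auxiliary server runs in a background thread rather than as a separate program, and it listens on whatever free local port the operating system assigns rather than on 8080.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class AuxiliaryHandler(BaseHTTPRequestHandler):
    """Stands in for the auxiliary server (for example a servlet container)."""

    def do_GET(self):
        body = ("Handled by auxiliary server: " + self.path).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def start_auxiliary_server():
    """Start the auxiliary server on a free local port and return it."""
    server = HTTPServer(("127.0.0.1", 0), AuxiliaryHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def front_end_handle(path, auxiliary_port):
    """What the main web server does: forward the request, relay the reply."""
    url = "http://127.0.0.1:%d%s" % (auxiliary_port, path)
    # Bypass any configured HTTP proxy so the request stays on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as reply:
        return reply.status, reply.read().decode("utf-8")


if __name__ == "__main__":
    auxiliary = start_auxiliary_server()
    try:
        print(front_end_handle("/servlet/hello", auxiliary.server_port))
    finally:
        auxiliary.shutdown()
```

The auxiliary server is started once and then answers any number of forwarded requests, so it can keep state between them; the price is the extra HTTP round trip made inside front_end_handle.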

Pros and cons of CGI, server modules and auxiliary servers

	CGI	Server modules	Auxiliary server
Characteristics	Web server creates a new process for each request that maps onto a program; data passed according to CGI; server reads the program's output	Web server invokes interpreter via API for each request that maps onto a program; data passed via API; server gets output via API	Auxiliary server runs on a different TCP/IP port (and potentially on a different machine); relevant requests forwarded by web server to auxiliary server; auxiliary server passes response back to web server
Pros	Independent of server: if the program crashes it cannot affect the server; the web server takes up less memory if it does not load any server modules; any memory (or other resources) used by the CGI program is released when the CGI program terminates	No need to create a separate process, therefore faster; for programs that access databases, the server can maintain a persistent connection to a database, saving reconnection time	No need to create a new process for each request; can maintain state (if desired), including database connections; separate from the main web server
Cons	The time to create a new process to handle the CGI request is relatively long; for programs that access databases, each new process must establish a new database connection	Server and program inextricably linked: a crash within the server module may crash the server; the web server will occupy more memory because of the size of the server module(s) it loads; if any server module needs a lot of memory, that memory will not be released (at least not until the server dies)	Overhead of forwarding HTTP requests and responses

Alternative language types

When one writes a web application in Perl, the output of the program must usually be an HTML document. Because of this, part of the program has to output (via print statements or the like) all the necessary HTML tags. This includes not just paragraphs, tables and text formatting, but also the structure of the HTML document, including its head and body.
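To make the point concrete, here is a small sketch of a program that generates a whole page through print statements. It is written in Python rather than Perl purely for illustration, and the page_lines function and its sample data are invented for this example. Notice how much of the code exists only to emit tags.

```python
import html


def page_lines(title, items):
    """Yield every line of a complete HTML document, structure included."""
    yield "<html>"
    yield "<head>"
    yield "<title>%s</title>" % html.escape(title)
    yield "</head>"
    yield "<body>"
    yield "<ul>"
    for item in items:
        # Escape the data so that characters such as & do not break the markup.
        yield "<li>%s</li>" % html.escape(item)
    yield "</ul>"
    yield "</body>"
    yield "</html>"


if __name__ == "__main__":
    # A CGI program prints a header, a blank line, then the document.
    print("Content-Type: text/html")
    print()
    for line in page_lines("Fruit", ["apples", "pears & plums"]):
        print(line)
```

Only the loop over items deals with the data; every other line is there to produce the fixed structure of the page.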

Tools, such as the CGI Wizard in Perl Builder, can make this easier to do, but there is another way to create dynamic content in web pages, and that is to embed code within an HTML page.

This way, you can use your favourite HTML editor (e.g. Dreamweaver) to design and lay out the page, and then insert program code within it to make it dynamic (Dreamweaver and some other editors give you help to do this). Languages that use this model include:

PHP - http://www.php.net/
ASP (Active Server Pages) - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbegvb/html/activeserverpages.asp
JSP (JavaServer Pages) - http://java.sun.com/products/jsp/

A PHP example

Let's look at an example in PHP.

While PHP is superficially similar to Perl (for example, as in Perl, variable names start with a dollar sign), there are many subtle differences that programmers who switch between the two languages need to remember. However, the most crucial difference is what a PHP program looks like. Here's an example (taken from http://www.php.net/manual/en/tutorial.firstpage.php).

<html>
<head>
<title>PHP Test</title>
</head>
<body>
<?php echo "Hello World<p>"; ?>
</body>
</html>

At first sight, this just looks like an HTML page. However, the sixth line has an unusual tag "<?php …", which is not part of HTML. The tag contains some PHP code. When this PHP program is executed, all the standard HTML text is simply copied to the output, while the PHP code is executed and its output inserted at that point in the text. So the output of executing this PHP program is:

<html>
<head>
<title>PHP Test</title>
</head>
<body>
Hello World<p>
</body>
</html>

There are various ways in which PHP code can be embedded in an HTML page, but what they all allow you to do is to embed programming language code inside what is to all intents and purposes an HTML page.
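The copy-and-substitute behaviour described above can be imitated in a few lines. The sketch below is not how PHP itself is implemented; it is a toy processor, written in Python, for a made-up tag syntax "<?py EXPRESSION ?>". The names EMBEDDED_CODE, render and PAGE are hypothetical. Literal text is copied to the output unchanged, and each tag is replaced by the value of the expression inside it.

```python
import re

# Hypothetical tag syntax for this sketch: <?py EXPRESSION ?>
EMBEDDED_CODE = re.compile(r"<\?py\s+(.*?)\s*\?>", re.DOTALL)


def render(template, variables=None):
    """Copy literal text through unchanged; replace each tag with its value."""
    namespace = dict(variables or {})

    def evaluate(match):
        # eval() is tolerable here only because the template is trusted
        # page source written by the developer, never user input.
        return str(eval(match.group(1), {}, namespace))

    return EMBEDDED_CODE.sub(evaluate, template)


PAGE = """<html>
<head>
<title>Embedded code test</title>
</head>
<body>
<?py "Hello " + name + "<p>" ?>
</body>
</html>"""

if __name__ == "__main__":
    print(render(PAGE, {"name": "World"}))
```

Rendering PAGE with name set to "World" leaves the surrounding HTML untouched and turns the tagged line into Hello World<p>, mirroring the PHP output shown earlier.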

Pros and cons

What are the advantages of embedded code over generated code? Both techniques can produce the same result, but typically embedded code (as in PHP) is better where the complexity of your output page is in the HTML, whereas generated code (as in Perl) is better when the complexity is in the way you acquire or process your data.

Embedded code languages can be used either via CGI or as a server module. For reasons of speed and convenience, most are normally configured as a server module, but this does not have to be the case.

	Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth
	The web programming units include some material that was formerly part of the WPRMP, WECPP, WPSSM and WEMAM units.