Webserver use, configuration and management

Unit WUCM1

Identifying Requirements

Requirements engineering

Requirements engineering is a general process needed by all computing projects. Much of what is set out as the needs of any software development project is equally applicable to a webserver/website development problem. Broadly, the activities set out by Sommerville (1997) are still needed in a web development project:

Requirements elicitation. The system requirements are discovered through consultation with stakeholders, from system documents, domain knowledge and market studies.
Requirements analysis and negotiation. The requirements are analysed in detail, and there should be some formal negotiation process involving different stakeholders to decide on which requirements are to be accepted.
Requirements validation. The identified requirements should be carefully checked for consistency and completeness.

In support of the above activities, some kind of management process is needed to oversee and control the timely delivery of outputs.

Kotonya (1998) gives the following illustrative example of some of the requirements for a library system.

"1. The system shall maintain records of all library materials, including books, serials, newspapers and magazines, video and audio tapes, reports, collections of transparencies, computer disks and CDROMs.
2. The system shall allow users to search for an item by author, title or ISBN.
3. The system’s user interface shall be implemented using a web browser.
4. The system shall support at least 20 transactions per second.
5. The system facilities which are available to public users shall be demonstrable in 10 minutes or less." Kotonya (1998)

Can you spot any problems with the 5 examples given? The above examples are reasonably typical of initial requirements and illustrate five different types of requirement, viz:

"General requirements – such as 1 above, which set out in broad terms what the system should do.
Functional requirements – such as 2, which define part of the system’s functionality.
Implementation requirements – such as 3, which state how the system must be implemented.
Performance requirements – such as 4, which specify a minimum acceptable performance for the system.
Usability requirements – such as 5, which specify the usability in a measurable way." Kotonya (1998)

How much of this general requirements engineering activity is applicable to the development of a webserver and website? Texts that focus on site design, such as Lynch (1999), include a section on planning and problem definition that can be interpreted as requirements engineering, whereas texts that concentrate on building web systems are far more likely to include a formal requirements section, e.g. Conallen (2000), who looks at using UML to develop web systems.

Is a webserver special in any way? Look for requirements engineering (or equivalent terms) in your recommended reading for other units, and ask yourself these questions of their points and advice. Jot any interesting findings below.

Reasons for the webserver/website

Before delving into the requirements engineering issues, it is worth pondering the reasons for providing the webserver with its mounted website(s) in the first place. This will help identify some of the stakeholders and requirements questions. Much of the material in this section is ‘borrowed with permission’ from Paterson (2001). Can you add any broad categories to the following? Can you give illustrative examples?

To inform or educate.
To entertain.
To market, sell or persuade.
To stroke someone’s ego …

Most websites serve more than one purpose. Most are designed to make money in one way or another.

To inform or educate

Universities, schools, colleges
Charitable foundations
Non-profit organisations
Government
Business
Political organisations
...

To entertain

Magazines, E-Zines etc.
Galleries and museums
Media organisations

To market, sell or persuade

Businesses
Political organisations
Non-profit organisations
Universities, schools, colleges
Religious organisations

To stroke someone’s ego

Personal home pages
Opinion sites
Fanzines and fan clubs
Personal resumes
...

Owners and stakeholders

In the case of the library system used as an example by Kotonya (1998), the owners and stakeholders are reasonably easily determined. For a website the position is often less clear-cut. There is usually a clear indication of who owns the webserver - the hardware etc. is purchased and mounted at a particular location by a clear "owner", but what of the data, both content and configuration, and the software (CGI etc.) that makes it work?

Consider the University website. It is mounted on two servers located in Mercantile House and James Watson Building. List who you think are the owners and separately the stakeholders.

Audience

Audience motivation

In trying to identify your audience, it is a good idea to ask the following two questions:

Why is your information needed?
- Solve a problem?
- Make someone feel good?
- Get them involved?
- Tell them something new?
- Sell them something?
- Teach them a new way to do something?
- …
What do you want the user to do?
- E-mail you?
- Fill out an order form?
- Phone you?
- Complete a survey or application form?
- Write a letter to someone?
- Vote?
- Join a mailing list?
- Come back on a regular basis?
- ...

Knowing your audience is important as it will colour many of your requirements. Whether the biggest impact is on the non-functional ‘artistic’ website type requirements, or on the more technical webserver requirements, depends on your basic goals for the webserver/website, perhaps as clarified by the questions above.

Audience types

Who is the audience?

Informational site (e.g. www.port.ac.uk): Employees; Students; Information seekers; The curious
Entertainment site (e.g. www.disney.com): Generally younger people; Sophisticated web users; Need to be ‘up-to-date’; Bells & whistles
Business site (e.g. www.ibm.com): Current customers; Potential customers; Investors; Sales force; Competitors
Non-profit site (e.g. www.aidsquilt.org): Activists; Donors; Information seekers
Ego site (e.g. www.yaboogie.com/~julie/index.html): Creator; Family & friends

Can you add to these broad observations?

How should you go about establishing information about your expected audience? Clearly asking them is best, but usually not possible. What sort of questionnaire might be worthwhile? Can you identify a few stereotypical users to build a model? Audience profiling using stereotypical users is a commonly used technique - see the more extended discussion in Powell (2000).

Audience questions

Questions from Powell (2000):

Basic questions about the user

Where are they located?
How old are they?
What gender?
What language do they speak?
How technically proficient are they?
What kind of connection would they have to the Internet?
What kind of computer would they use?
What kind of browser would they probably use?

What are they doing?

How did they get to the site?
What do you want them to do (from above discussion)?
When will they visit the site?
How long will they stay during a particular visit?
From what page(s) will they leave the site?
When will they return to the site (if ever)?

Where should the answers to these questions fit into your requirements gathering? How should results be recorded? Will there be an impact on the design, the content, the testing or the evaluation?

Content

Whilst identifying and structuring the content is important, we will be considering it in more detail in another session. What are the requirements issues concerning the content? There are a number of general points that should be considered under the heading of requirements, before moving on to the design issues.

In order to form some opinion on the capacity and bandwidth requirements to be discussed next, what content issues need to be addressed?

Volume of data

The volume of data to be served by the webserver clearly has a significant impact on both the hardware and software of the platform and the design issues to be addressed. It is important to get an estimate for the number of files and the average size of files, as well as the total size of the webspace. If the website will involve dynamic generation of pages, how big is the database to be used? All of these estimates will feed the capacity questions to be asked next. Another important estimate is the rate of growth of data to be served.
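Where an existing document tree is available, the estimates above (number of files, average file size and total size of the webspace) can be measured rather than guessed. The following is a minimal Python sketch under stated assumptions: the function name `summarise_webspace` is invented for illustration, every file is weighted equally, and any database behind dynamically generated pages is ignored.

```python
import os


def summarise_webspace(root):
    """Walk the tree under `root` and return (file_count, total_bytes, mean_bytes)."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skip broken symlinks and special files
                file_count += 1
                total_bytes += os.path.getsize(path)
    # Guard against an empty tree so the mean is always defined.
    mean_bytes = total_bytes / file_count if file_count else 0.0
    return file_count, total_bytes, mean_bytes
```

Calling `summarise_webspace` on the document root of a site returns the three figures as a tuple. Running it at intervals and comparing the totals gives a rough estimate of the rate of growth.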

‘Churn’ of data

The churn is a measure of what proportion of the total data is changed per unit of time, e.g. per hour, day or month. This aspect will have an impact on the number of repeat visits by users, and will raise issues of archival storage and version control. For what sort of site is it important to be able to backtrack through older versions of documents accessible via the webserver? Is this likely to be a part of your requirements documentation?
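Churn as defined above can be approximated from file modification times. The sketch below is one possible measure, not an established one: it assumes that a file's last-modified time is a fair proxy for "changed", and that churn is weighted by bytes rather than by number of files. The function name `churn_fraction` is invented for illustration. It cannot see deleted files or several edits to one file within the window.

```python
import os
import time


def churn_fraction(root, window_seconds, now=None):
    """Fraction of bytes under `root` modified within the last `window_seconds`."""
    now = time.time() if now is None else now
    total_bytes = 0
    changed_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue
            info = os.stat(path)
            total_bytes += info.st_size
            # Count the file as changed if its mtime falls inside the window.
            if now - info.st_mtime <= window_seconds:
                changed_bytes += info.st_size
    return changed_bytes / total_bytes if total_bytes else 0.0
```

A window of `24 * 60 * 60` seconds gives daily churn; a value near 1.0 suggests that archival storage and version control deserve an explicit requirement.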

Number of ‘hits’

Whilst this is only peripherally a content issue, it is important in considering the capacity of the webserver and its network connection. For a new site, it will clearly be based on hope and expectation, rather than on any measure, but needs to be included. Collecting this data and consciously feeding it into the ongoing management process is vital.
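Once a server is running, hit counts can be collected from its access log rather than estimated. The sketch below assumes the log is in Common Log Format, which servers such as Apache can write; the function name `hits_per_hour` and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Timestamp field of a Common Log Format line, e.g. [10/Oct/2000:13:55:36 +0000].
# Only the hour is captured.
TIMESTAMP = re.compile(r"\[\d{2}/\w{3}/\d{4}:(\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def hits_per_hour(lines):
    """Count requests per hour of day (0-23), in the time zone the server logged."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:  # lines that do not look like CLF are skipped
            counts[int(match.group(1))] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:58:01 +0000] "GET /logo.gif HTTP/1.0" 200 4523',
    '192.0.2.1 - - [10/Oct/2000:14:02:17 +0000] "GET /about.html HTTP/1.0" 200 1811',
]
for hour, hits in sorted(hits_per_hour(sample).items()):
    print(f"{hour:02d}:00  {hits} hits")
```

Passing an open log file to `hits_per_hour` in place of `sample` gives the hourly profile of a real site, which shows whether the load is evenly spread or concentrated at particular times.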

In terms of requirements, this area might give rise to requirements such as:

The web server shall be able to handle at least 150 simultaneous user sessions.
The system shall require no more than 3 seconds to retrieve and respond to a client’s request for a static web page.
The system shall require no more than 8 seconds to respond to a dynamic page.

In the above examples (from Conallen 2000) the requirements specification concentrated on the server performance requirements. Does this make any presumptions about the intervening delivery system, a LAN based intranet or the more diverse Internet? In terms of auditability, numbering and tracking these requirements in subsequent documents would follow the conventional software engineering model.
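Requirements of this kind lend themselves to numbering and mechanical checking. The sketch below records the three examples as numbered records, each with a limit and a direction, and compares them with measured values. The identifiers (`PERF-1` and so on), the field names and the measured figures are all invented for illustration; they are not taken from Conallen (2000).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    req_id: str       # hypothetical tracking number
    description: str
    limit: float
    at_least: bool    # True: measured >= limit passes; False: measured <= limit passes

    def is_met(self, measured):
        return measured >= self.limit if self.at_least else measured <= self.limit


REQUIREMENTS = [
    Requirement("PERF-1", "simultaneous user sessions", 150, at_least=True),
    Requirement("PERF-2", "seconds to serve a static page", 3.0, at_least=False),
    Requirement("PERF-3", "seconds to serve a dynamic page", 8.0, at_least=False),
]


def check(requirements, measurements):
    """Return {req_id: True/False} for each requirement that has a measurement."""
    return {
        r.req_id: r.is_met(measurements[r.req_id])
        for r in requirements
        if r.req_id in measurements
    }


# Invented test results, for illustration only.
measured = {"PERF-1": 180, "PERF-2": 2.4, "PERF-3": 9.1}
for req_id, ok in check(REQUIREMENTS, measured).items():
    print(req_id, "met" if ok else "NOT met")
```

The same identifiers can then be carried through the design and test documents, so that each acceptance test can be traced back to the requirement it verifies.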

Capacity

Capacity planning is necessarily done before any of the system has been assembled. Webserver tuning, in comparison, is done after the initial architecture has been implemented and released to the waiting world. At this stage you have real data on which to base any performance tuning exercises. Killelea (1998) makes the very cogent point that whilst perfect capacity planning would eliminate the need for performance tuning, it is impossible to achieve as you cannot predict the behaviour of the users, even if they are a well-defined cohort (for example, your employees). The very fact of offering a new service will alter their behaviour, much the same as opening a new motorway alters the traffic flows on which the size/route of the motorway was planned.

It is vital to undertake some initial planning – i.e. do the sums based on your estimates, so as to have a view on expected webserver performance at the launch of the service, and into the future. Unplanned growth is liable to throw up significant expense, especially if any of your estimates cross a ‘scalability threshold’, and you need to completely replace a system (whether this is hardware or software).
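One of the sums worth doing is how long an estimated growth rate leaves before a capacity limit is reached. The sketch below assumes steady compound monthly growth, which is itself only an estimate; the function name `months_until_threshold` and the example figures are invented for illustration.

```python
def months_until_threshold(current_load, monthly_growth, threshold):
    """Months of compound growth before `current_load` first reaches `threshold`.

    `monthly_growth` is a fraction, e.g. 0.10 for 10% per month.
    Returns 0 if the threshold is already reached, or None if the load never grows.
    """
    if current_load >= threshold:
        return 0
    if monthly_growth <= 0 or current_load <= 0:
        return None
    months = 0
    load = current_load
    while load < threshold:
        load *= 1 + monthly_growth
        months += 1
    return months


# Invented example: 20 requests/s today, 10% growth per month, limit of 100 requests/s.
print(months_until_threshold(20, 0.10, 100))
```

If the answer is shorter than the time needed to procure and commission a replacement system, the threshold needs to be addressed in the initial plan rather than left to later tuning.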

Killelea (1998 and 2002) discusses in detail the capacity planning issues relevant to establishing a new web system. The following is drawn from that material, and if you can locate a copy do read through chapter 2. (The library has several copies.) Killelea sets out the initial planning as a set of questions, to which he provides considerable elaboration and example, viz:

How many HTTP operations per unit time (httpops) do you expect? This is a refinement of the "number of hits" question. Remember that HTTP is a stateless protocol – in general each item on the page is the result of a separate request and, under HTTP/1.0, a separate connection (though HTTP/1.1 improves on this very simple approach with persistent connections). This statistic is very much dependent on the time of day; see the sketch diagram below. Killelea makes the point that a million-hits-per-day site is not a great load on a server, assuming the hits are evenly spread. If you assume an average 10 KB file is transferred, this works out at 1.2 Mbps, well within the range of a T1 connection and not a significant load for a server. (Diagram after Killelea (1998), being the graph of typical Sprint NY NAP usage data from http://www.nlanr.net/)
What is the purpose of the website? This harks back to the question raised above; for example, a website used to support a class as part of a course will get all the hits at broadly the same time at the start of the session, thus invalidating all of your calculations about the average httpops.
Have you analysed your log files? This presupposes you have a webserver up, but does give a wealth of statistics. Are the hits spread around the globe? Are the hits spread in time? Is there a diurnal periodicity? Do you have access to a suitable range of log analysers? If not, should you acquire one?
How tolerant are your users? What are your throughput and latency goals? Should you try to satisfy 90% of requests for files under 10 KB in 5 seconds or less? This does give a concrete start point for planning. User perceptions of the speed of response of your site may of course differ wildly from the actual measured data. Collecting "soft" psychological data via surveys of some kind will give you the alternative view.
What is the average size of transferred files going to be? Typical figures used to be about 10 KB, with text files being smaller and images bigger. The introduction of any significant multimedia would of course throw this assumption out of the window. Killelea (1998) offers a Perl script to find the average file size below a given directory, or get your calculator out. Don’t forget that headers and other overhead need to be included. A directory-based average assumes that each file is equally likely to be requested; an examination of the log file would prove more useful, and Killelea gives a Perl script for that too.
Will you be providing any streaming media? The characteristics of streaming media are very different to standard web traffic - much longer lasting being the most obvious. In general, it is a good idea to stream media from a separate server with different optimisations for this reason.
Will the web pages spawn additional processes? Here the question relates to the use of CGI scripts of one form or another that potentially generate considerable server load for little actual traffic. As with streaming services, an external server may be the answer.
What sort of scalability do you need? When your current system has reached capacity can you gracefully add more hardware to meet the load, or do you need to redesign the whole architecture?
How available does your site need to be? What level of redundancy are you going to build in to the hardware to cover for the failure of individual components? Can this be achieved in the case of software failure?
How much bandwidth do you need? Killelea (1998) gives a table of different modes of data transfer, ranging from a fast typist (0.000035 Mbps) to an AT&T SONET long distance fibre link (32000 Mbps). He also includes a table giving the number of transfers/sec for different sized files and network bandwidths.
How fast a server do you need? Generally the server performance is less critical than the network performance, though always worth measuring. Of all of the server performance indicators, available RAM is probably the most significant, closely followed by disc subsystem performance. Keep a careful track of both. This would link back to the performance requirements.
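The million-hits-per-day arithmetic in the first question above can be checked directly. The sketch below assumes the hits are spread evenly over 24 hours and that the average file is 10 kilobytes, taken here as 10,000 bytes. The `overhead` multiplier is an assumption introduced here to allow for headers and protocol overhead; it is not a figure from Killelea. The T1 rate of 1.544 Mbps is the standard line rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60
T1_MBPS = 1.544  # standard T1 line rate


def average_mbps(hits_per_day, mean_bytes, overhead=1.0):
    """Average bandwidth in Mbps if hits are spread evenly across the day.

    `overhead` is a multiplier for headers and protocol overhead (an assumption).
    """
    hits_per_second = hits_per_day / SECONDS_PER_DAY
    bits_per_second = hits_per_second * mean_bytes * 8 * overhead
    return bits_per_second / 1_000_000


payload_only = average_mbps(1_000_000, 10_000)
with_overhead = average_mbps(1_000_000, 10_000, overhead=1.25)
print(f"payload only:  {payload_only:.2f} Mbps")
print(f"25% overhead:  {with_overhead:.2f} Mbps")
print(f"fits in a T1:  {with_overhead < T1_MBPS}")
```

The payload alone comes to a little under 1 Mbps, and allowing a quarter extra for overhead gives a figure close to the 1.2 Mbps quoted, still within a T1. The calculation only describes the average: a site whose hits arrive in a short burst needs the peak rate, not the daily mean, in place of `hits_per_day / SECONDS_PER_DAY`.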

Conclusion

Greenberg (1999) discusses the full software engineering cycle for a typical small web system, addressing issues from the initial requirements issues we have looked at, through the graphical and structural issues, to the programming and implementation issues, finally ending up with the trauma of "going live" and its implications and issues. In respect of today’s topic Greenberg (1999) suggests that the requirements must:

Be well-defined. The requirement must be well-defined and unambiguous so that it is possible to derive a design from it and base the customer’s acceptance on it being fulfilled.
Be achievable. The requirement must be achievable using ordinary means unless extraordinary means are part of the requirements.
Be measurable and testable. There needs to be a way of determining whether the requirement has been fulfilled, especially if your fee is contingent on acceptance! If the requirement states "all functions should be easy to use" there is no clear measure. If, however, the requirement calls for the user to execute no more than two mouse clicks per function, then the requirement is measurable. User Acceptance Testing is tied to the requirements in the way that Unit and System Testing is tied to the design.

Larson (2000) gives a brief overview of the process of requirements determination in chapter 2, together with a few self-test questions, worth a few moments' study.

References

Mark Arnold, Jeff Almeida & Clint Miller, Administering Apache, McGraw-Hill, (2000), ISBN: 0072122919 (Lib)
Eric Larson & Brian Stephens, Administrating Web Servers, Security and Maintenance, Prentice Hall, (2000), ISBN: 0130225347 (Lib)
Patrick Killelea, Web Performance Tuning (2e), O'Reilly, (2002), ISBN: 059600172X (Lib)
Ian Sommerville and Pete Sawyer, Requirements Engineering: A Good Practice Guide, Wiley, (1997), ISBN: 0471974447 (Lib)
Peter Wainwright, Professional Apache, Wrox Press, (1999), ISBN: 1861003021 (Lib)
Gerald Kotonya and Ian Sommerville, Requirements Engineering: Processes and Techniques, Wiley, (1998), ISBN: 0471972088 (Lib)
Jim Conallen, Building Web Applications with UML, Addison Wesley, (2000), ISBN: 0201615770 (Lib)
Patrick J. Lynch and Sarah Horton, Web Style Guide, Yale University Press, (1999), ISBN: 0300076754 (Lib)
Pat Paterson, WDIEM Lecture notes, UoP, (2001)
Jeff Greenberg and J.R. Lakeland, Building Professional Websites with the Right Tools, Prentice-Hall, (1999), ISBN: 0130843172 (OoP, sorry)
Patrick Killelea, Web Performance Tuning, O'Reilly, (1998), ISBN: 1565923790
Thomas A. Powell, Web Design: The Complete Reference, Osborne, (2000), ISBN: 0072122978

	Last updated by Prof Jim Briggs of the School of Computing at the University of Portsmouth