Research in Computer Science and Informatics at the University of Portsmouth

Additional information examples: Manchester


OtherDetails (Manchester)

Demonstration that Ambiguity Decomposition underlies several of the most successful methods for ensemble learning, leading to a new algorithm.The Ambiguity Decomposition is a theorem known in the ensemble learning literature since 1995; it is the most highly cited justification of why ensembles work well, but has found little practical application. This paper illustrates exactly how the theorem underlies several of the most successful ensemble methods developed since 1995.Unites several of the most popular ensemble techniques under a common umbrella; as a result, a gap in the literature is identified, which is filled by creating a new algorithm to learn the weights of a neural network ensemble.

Methods for mapping binary codes to M-of-N codes for on-chip networks.Demonstrates methods that are efficient to implement in hardware and systematic to apply for a variety of codes.Efficient mappings of binary codes to M-of-N codes are essential for commercial acceptance of a self-timed network-on-chip methodology. The 3-of-6 codes and binary mappings described are now used by spin-out Silistix (John Bainbridge, Chief Technical Officer) in a commercial product ( citations 22.

New conceptual building block for information quality (IQ) services.Addresses failings of previous approaches by combining declarative specification with procedural plug-ins for domain-specific aspects. Ability to reduce noise levels demonstrated by experimental evaluation in proteomics.Industry experts estimate costs of poor IQ at 10-15% of revenue per year. Practical and cost-effective solutions are desperately needed.Led to development of IQ plug-in for Pedro (a well-known data capture tool for proteomics scientists) and SNPit web resource for quality-oriented gene polymorphism studies.

Approach to testing database applications, in which the input database states are described declaratively rather than tuple-by-tuple (or randomly generated).Shows how databases can be prepared for each test-case automatically, thus increasing the ability to locate defects.Increased effectiveness is demonstrated through experimental evaluation.Inadequate testing of database applications results in significant costs for system owners and users. This technique reduces the time required to specify tests - a major barrier to adequate testing.

PerCo, a two-level performance steering system for distributed component-based applications.Malleable software components may be tuned to make the most of their local execution environment, or they may be redeployed to other execution environments, according to dynamic changes in the ambient conditions.Focusses on the design of PerCo, while other papers [Gurd-2, Gurd-3, Lujan-1] investigate application and performance of its implementation as Grid-based middleware.

Methodology for testing asynchronous-interconnect system-on-chip devices.Addresses the testability of asynchronous circuits, a key concern for the commercial acceptance of any circuit technology, and a particularly difficult issue for asynchronous designs.Presents an improved method of generating test sequences for the basic building block of asynchronous circuits, and exploits this in a partial scan method that provides effective testability and test pattern generation for a commercial asynchronous on-chip network.

Exploration of techniques for allowing visually-impaired users to explore chemical structure diagrams, using speech-based presentation.Proposes methods for providing structurally sensitive browsing - with particular attention to overview facilities - drawing on theoretical insights from research on visual perception.Presents results of an experimental evaluation using a fully developed software system, with and without advanced features, demonstrating a clear performance gain. Results are relevant to the general class of diagrams that are graphs.

First mathematical account of games played on graphs rather than trees; allows the notion of some positions being considered equivalent.Potential applications in modelling concurrency, programming languages and (in particular linear) logic, which will be aided by an account (under development) containing full proofs.

System for retrieving relational concepts from huge texts, integrating deep parsing, machine learning, and text indexing.Provides one of the first practical demonstrations of using semantic annotation of text to improve information retrieval. Large-scale experiments using a corpus of 15 million Medline articles demonstrate realistic retrieval times and good precision, significantly improved by the inclusion of deep-parsing.The MEDIE service, offered by the National Centre for Text Mining (, is based on the results of this research and allows users to search 16 million Medline articles for linked concepts (eg what causes colon cancer?).

Method for automatically establishing a dense correspondence across sets of shapes in 3D, as is required for building statistical shape models.Adopts a correspondence-by-optimisation approach building on [Taylor-2], presenting a full derivation of the information-theoretic objective function, and methods for manipulating correspondence for shapes in 3D.Algorithm tested on several real datasets, and shown to produce quantitatively better results than other methods.Method used by commercial spin-out imorphics (Graham Vincent, Chief Technical Officer) in a three-way collaboration with AstraZeneca (Rose Maciewitz, Senior Principal Scientist) to develop a method of measuring cartilage loss in osteoarthritis for use in clinical trials.Awarded Best Paper Prize. Google-Scholar citations 63(11pa).

New form of join point which allows for the first time true separation of parallelisation concerns using aspects that can target specific loop executions in scientific-style codes.The join point model provides context exposure that assists with the problem of loop selection.An implementation is presented and evaluated against schemes published in a precursor paper [Gurd-4], which restricted itself to using pure AspectJ and required significant refactoring of the base codes.Google-Scholar citations 23(15pa).

Solution to the problem of separating the concerns of parallel execution from those of code functionality using aspects to intervene in scientific-style Java programs.First demonstration that successful separation of these concerns can be achieved, albeit at the expense of some refactorisation of the base codes due to the nature of AspectJ. Proposes the extension to AspectJ that is later described in [Gurd-1].Includes a performance evaluation of various implementation options.Google-Scholar citations 35(10pa).

First practical demonstration that continued fractions can be used effectively to represent real numbers.Proves the key result that only the seven leading terms of a continued fraction (of the appropriate form) are required to perform on-line arithmetic, providing a practical (though relatively slow) approach to exact real arithmetic.Proof was machine assisted, as over 1000 cases needed to be considered.

Thorough performance evaluation of data structures for representing sparse matrices in Java. Important because sparse matrices are ubiquitous in scientific and engineering applications.Experiments cover 9 different data structures over nearly 200 different matrices, categorised by application domain and other properties, and show that the Java Sparse Array performs better than other storage formats for most matrices.

Modification of state-based model-checking techniques to generate a synthesis algorithm for constructing the weakest environment assumption that can be incorporated automatically in assume-guarantee reasoning.The combinatorial state-explosion problem is a major limiting factor in the application of model-checking techniques. This contribution increases their applicability for automated compositional verification of parallel processes.Demonstrates the technique on two NASA-based examples, tackling verifications previously not possible with the LTSA tool.The original 2005 ACM Automated Software Engineering conference paper won the ACM SigSoft prize and Best Paper in Conference. Google-Scholar citations 81(16pa), including conference version.

Generalisation of the type-theoretic interpretation of constructive set theory into Martin-Lof type theory, leading to a deeper analysis of the relationship between constructive type theories and constructive set theories.The generalisation involves a new type theory in which logic is treated as primitive so that reinterpretations of the logic become possible.An application to the double negation interpretation is given.

Constructive general version of the classical Galois adjunction between topological spaces and locales, based on new notions of constructive topological space and set generated locale. Demonstrates how both the point-set and point-free approaches to topology can be developed in constructive set theory (CST) and applies CST to obtain results concerning the formal points of a formal topology.

Category theoretic result that is a significant generalisation of the old result of Bloom and Tindal that infinite trees form a completely iterative theory over any given signature.Makes a significant and substantial contribution to the general area of Coalgebraic Theory, an area of increasing activity in theoretical computer science, which has become important in the mathematical modelling of computer systems.Published as a 'Fundamental Study'. ISI citations 21(5pa).

The Russell-Prawitz modality, an example of a lax modality; the reinterpretation of logic given by this new modality is essentially one prefigured by Russell in 1903 and introduced by Prawitz in 1965.Provides a new insight into the treatment of logic in the Coq proof system and fresh view of the use of any lax modality to give a reinterpretation of logic.Relatively new journal in theoretical computer science with a reputation for high standards of mathematical rigour.

Method for biomedical term classification, based on the selection of verb patterns combined with a domain ontology.Exploits terminologically-based automatic extraction of preferred verb patterns, using two machine learning techniques: a generalisation technique based on a partial order relation induced by the ontology and genetic algorithms.Also novel is the inclusion of a term similarity measure to link newly recognised terms to ontology classes.Results of the research have been used by the National Centre for Text Mining to provide the TerMine service, attracting 1,500 user visits per month.

Automatic classification of biomedical terms using case-based reasoning.Novelty lies in case-based approach (at the time) and the term similarity measure, which combines syntactic, ontology-based and lexical distance features.Approach is being used by the EC project BOOTStrep ( to handle large scale classification of biology terms, demonstrating both its scalability and adaptability.

Review of the state-of-the-art in computer-aided detection (CAD) in mammography, examining the strengths and weaknesses of published trials and looking at the pitfalls of clinical evaluation of this type of technology.There have been many trials of mammographic CAD, often giving conflicting results. This detailed review puts some of the methodological weaknesses into perspective and attempts to explain the differences.

Methods for detecting linear structures in mammographic images, and classifying them according to type (eg spicules associated with tumours, blood vessels, ducts etc).Introduces several new algorithms for line-detection - a generic problem in image analysis - and a machine-learning approach to classifying linear structures from their cross-sectional profiles.Presents the results of a large-scale evaluation using both synthetic and real mammogram images, comparing the line-detection performance of the new methods with several existing approaches and the results of type classification with expert annotation.

Method for measuring the performance of algorithms for computer-aided detection (CAD) of tumours in mammograms, and for measuring the performance of human users working with CAD systems.Introduces new approach to evaluation, based on using the transportation algorithm to compare automatically detected abnormality maps with ground-truth, dealing with missed/false detection and localisation error in a single framework.

Results of first major clinical trial directly comparing single reading with computer-aided detection and double reading in 10,000 women in the NHS breast screening programme.Shows that computer-based aids could increase detection of early cancers, but at the expense of a higher recall rate.Publication was reported in the national press. The trial's success has led to us obtaining funding for a prospective multi-centre trial, which is now underway.

Demonstration that retrenchment can easily handle the 'Balance Enquiry Quandary' of the Mondex Purse, in which balance enquiries in the abstract and concrete models of the Mondex refinement can exhibit different outputs, for atomicity reasons.First generalised forward simulation refinement for Mondex.First (1,1) forward refinement for Mondex, solving affirmatively the ten-year old open problem of its existence.Invited paper following an invited talk at Abstract State Machines Workshop 2005, Paris.

Demonstration of the advantages of retrenchment over refinement in the face of changing requirements.Displays the versatility of retrenchment in capturing model change when conflicting requirements, or backwards-incompatible requirements changes arise, using, as an example, evolution of a simple POTS telephony model to include modern telephony features. Ends with a general model for layered feature engineering.Precursor workshop paper won Best Paper Prize at the MBRE workshop at 2001 conference on Automated Software Engineering, San Diego.

Parallelisation of an explicit-state CTL* model checking algorithm for a virtual shared-memory high-performance parallel machine architecture.Shows how parallelisation can be used effectively in state-based model checking; is a major step forward in combating the state-space explosion problem.Novelty lies in the generalised dynamic load balancing (based on work stealing) and in the theoretical and experimental performance analysis.Demonstrates not only the practicality and effective speedup possible but also the pitfalls that beset parallelisation of such algorithms.

New approach to the development of models in model-driven architectures, introducing the concepts of roles, responsibilities and collaborations.Describes a new approach to role-modelling and a notation for model specification and design, providing a unifying view of different class-based programming design concepts, including subject-oriented programming and aspect-oriented programming.Demonstrates the approach using two different examples.

Dynamic web linking of biology resources through the application of an ontology, the Gene Ontology, and an open hypermedia system, COHSE, deployed on the Web, integrating computing infrastructure with life sciences content.Interdisciplinary work, illustrating the benefits of Semantic Web technologies and techniques (eg OWL Ontologies) within the setting of a real application domain.Work has been adopted by the European Sealife FP6 consortium (

Construction of static thesauri using description logic technology.Extends and improves upon earlier experiences through the use of a more expressive description logic language, and demonstrates the added benefits. Applies the techniques to thesauri as used in classification and description in museums and art history.Value of techniques has been demonstrated through a software deployment for the Getty Collection.Work stimulated a follow-on project funded by the EPSRC (GR/M75426, 'COHSE: Informed WWW link navigation using ontologies') and Sun Microsystems (two grants totalling 39,064).

Development of 750K transistor asynchronous low-power high-performance Digital Signal Processor.Large asynchronous system design is not well supported by commercial tools. Presents a design flow environment using automatic generation of asynchronous control from State Transition Graphs, as well as unique architectural features contributing to low power.Shows operation at over 100 MOPs, and an energy breakdown of system components.

First systematic survey of the transcriptional response of a non-model organism and a novel analysis strategy for working with the very large data sets generated (using the maxD infrastructure [Brass-1]).The analysis found a range of genes endowing thermotolerance and revealed a previously unrecognised scale and complexity of responses at the level of cellular and tissue function.Much of the detailed methodology is presented in supplemental data and has since been published in the high-impact journal Nucleic Acids Research.ISI citations 48(16pa); Google-Scholar citations 48(16pa).

New type of biologically-plausible associative memory, capable of storing and recalling sequences coded as 'rank-ordered' memories.Previous work on associative memories has stored sequences by hetero-association; this is the first to store sequences in a single read/write operation, solely with the use of binary weights.Relevant in computers as an efficient information storage device; it is also highly relevant to the neuroscience community, as rank order codes are the current best hypothesis of how memories are coded in the brain.

Model-based testing framework that combines automated test generation with run-time verification techniques.Novelty lies in the automated plan generation (the test-cases) via a model checker extended for symbolic execution, the generation of associated temporal logic properties, and run-time monitoring based on the Eagle logic [Barringer-3].Reports on a practical demonstration of formally defined model-based testing on a real NASA rover executive, finding seeded bugs as well as 'real bugs' overlooked by the developers.Google-Scholar citations 83(16pa) including VMCAI 2004 precursor.

Method for disentangling images of overlapping chromosomes in microscope images.Applies statistical modelling methods to disentangling overlapping chromosomes, disambiguating the possible configurations of overlapping chromosomes by combining shape and chromosome classification evidence.Provides framework for expressing shape variation in probabilistic terms allowing shape evidence to be combined with classification evidence using combination of probabilities.Quantitative comparison of shape-alone, classification alone and combined measures for disambiguating overlaps using simulated data.

New approach to modelling spots on 2D electrophoresis gels, overcoming limitations of previous modelling approaches.Uses statistical model of spot appearance to describe characteristics of spots. Also describes a new approach to training shape models automatically for image features that do not have recognisable structural landmarks.Provides quantitative evaluation of the accuracy of spot characterisation compared with existing models, and qualitative evaluation of specificity of detection. Work carried out in collaboration with AstraZeneca and now contributing to a commercial feasibility study for an automated gel-analysis system, with proof-of-concept funding.ISI citations 12.

Technique for robust feature-based non-rigid registration of sets of images. Demonstrated for 2D electrophoresis gels.Extends a widely used technique using robust statistical feature matching and a new distance metric to establish correspondence.Systematically evaluates different approaches to each component of the matching problem using synthetic data. Proposes and applies quantitative criteria in an empirical validation on real images to achieve near perfect registration.Work carried out in collaboration with AstraZeneca and now contributing to a commercial feasibility study for an automated gel-analysis system, with proof-of-concept funding.

First experimental demonstration of a high-level performance control system for distributed, component-based software.Multiple copies of a timestepping simulation code are distributed over a heterogeneous Grid of computers and are then migrated during run-time in such a way as to keep the executions of the copies approximately in step with one another.This kind of facility will be invaluable when it becomes routine for users to execute large applications over distributed heterogeneous computing resources.

Introduction to the notion of performance control for distributed execution of certain kinds of scientific coupled model.Introduces formal notation for describing coupled models.Includes a new algorithm for predicting the performance of an arbitrary deployment of individual scientific model software components onto heterogeneous execution platforms.

Critical contribution to sensor noise research, showing stochastic resonance in spin valve devices.First observation of stochastic resonance effects in magnetic sensors. Shows that achieving the peak in signal-to-noise ratio requires the injection of a small amount of white noise into the sensor, providing an approach to improving sensor performance in the non-linear regime.Culmination of many years work in refining measurement techniques to be able to see these interesting non-linear effects.

First experimental measurement of domain wall motion with sufficient resolution to observe pinning of magnetisation by the lattice.Understanding how the lattice influences the movement of domain walls is of fundamental importance in magnetic systems as diverse as high coercivity permanent magnets and magnetic data storage systems.The smallest observed steps in wall motion coincide with known inter-atomic spacing.Google-Scholar citations 20.

Method for increasing the accuracy of evolutionary distance estimates between pairs of DNA sequences, for application in phylogenetics.Provides a theoretical analysis of the errors in commonly-used fast methods for estimating evolutionary distance, showing that accuracy can be improved if the substitution rate matrix is known. Describes how rate matrices can be estimated from multiple sequences, leading to a fully implementable method.Validated experimentally, using both real and simulated data.

Theoretical and experimental study of the accuracy of signal directions estimated from data via the eigenvectors of sample covariance matrices (sample Principal Component Analysis).Extends the work of [Hoyle-2] on eigenvalue spectra, providing new theoretical and simulation results for finite samples drawn from a population containing multiple signal directions, using techniques from statistical physics.Results have been presented at invited talks in Europe and the US, and will be presented by invitation at the Issac Newton Institute in January 2008.

Study of the intensity distribution of spots in gene-expression microarrays.One of the first to propose that there may be useful information in the distribution of spot intensities (expression rates) in microarray data, as well as that normally extracted through spot-by-spot comparison under different conditions.Shows experimentally that microarray intensity distributions follow a standard law, irrespective of organism, experimental platform, condition etc, and that parameters of the distribution could be used in quality control and, possibly, for estimating the genome size.ISI citations 40(7pa); Google-Scholar citations 51.

Theoretical and experimental study of the eigenvalue spectra of sample covariance matrices - of direct relevance to Principal Components Analysis and other machine learning methods.First study to apply methods from statistical mechanics to the challenging problem of determining the eigenvalue spectrum of a sample covariance matrix for data containing many orthogonal signal directions. The theoretical results are validated via extensive simulation experiments.Used as a project text in Alan Edelman's graduate course 'Infinite Random Matrix Theory' at MIT in 2004 (

Software architecture and implementation of the MAVERIK system for creating highly dynamic, interactive virtual environments.First demonstration that complex, real-world models, such as offshore oil and gas platforms, can be rendered at real-time frame rates directly from an application database, as distinct from a customised graphical representation.MAVERIK has been downloaded over 20,000 times in total, with current downloads of 200 per month. Adopted by the Free Software Foundation as an official component of its approved tools (

Results of a substantial study of the effects of delayed feedback on haptic interaction for collaborative tasks operating over wide-area networks; first attempt to explore this topic systematically and quantitatively.Demonstrates that the level at which delays begin to impact negatively on human task performance is well below the latency present in a typical internet connection, and that task performance degrades at latency levels below those that users can detect.Also explores for the first time the cross-modal effects caused by decoupling haptic and visual feedback.

Method for overcoming the effects of limited field of view in head-mounted VR displays, in which head movements made by the wearer are amplified so that a greater degree of rotation is experienced via visual feedback.Evaluated experimentally using both a visual search task and a user questionnaire, demonstrating that the amplification technique gives superior results, both in terms of user experience and quantified task performance.Contrary to expectation, subjects reported that the amplified case was more natural than the normal one.

Decision support system for location analysis, integrating different factors influencing the choice of location for commercial outlets into a single framework.Improves location-allocation models by developing an allocation rule that reflects consumer choice processes more accurately, and an objective function that incorporates future change.Prompted theoretical work on hierarchical fuzzy and hybrid systems [Zeng-3, Zeng-4].

Demonstrator system that accesses diverse sources of health and social care information using an information broker approach.Contrasts data brokering, using a service-oriented architecture, with monolithic data warehousing (as implemented in the NHS National Programme for IT - NPfIT), highlighting innovative features of the demonstrator.Resulted in 4 keynote invitations and invited demonstrations to NHS Procurement (NPfIT), Royal Society of Medicine, Tameside Primary Care Trust, Nuffield Trust, North London Hospital, Leighton Hospital, Office for National Statistics, Greater Manchester Police and Fujitsu.Extended version of keynote paper at Medical Informatics Europe 2005.

Method for data clustering and model selection (determining intrinsic number of classes) for use in data mining applications.New approach to the problem, based on multi-objective evolutionary algorithms.Algorithm tested extensively on synthetic data sets of varying size, dimension, difficulty and number of clusters; compared with established and state-of-the-art clustering algorithms, including an ensemble approach.Google-Scholar citations 33 including conference version.

Multi-objective optimisation algorithm for problems where the evaluation of solutions is expensive (eg where evaluation involves performing a physical experiment).Proposes multi-objective extensions to the well-known Efficient Global Optimization (EGO) algorithm of Schonlau et al [Journal of Global Optimization, 1998], learning a Gaussian processes model of the search landscape, which is updated after every function evaluation.Evaluates the new approach experimentally for a range of problems drawn from the literature, demonstrating significantly better performance than the widely-used NSGA-II [Deb et al, IEEE Transactions on Evolutionary Computing, 2002].Google-Scholar citations 24 including conference version.

Critical survey of cluster validation techniques, and their place in algorithm evaluation and model selection, with an emphasis on bioinformatics applications.Classifies clustering and validation algorithms by criterion optimised (in contrast to the usual distinctions between partitioning, hierarchical, and density-based methods).Leads to new observations about the correct use of cluster validation in algorithm evaluation and analysis of novel data, supported by large-scale simulations and application to a real gene-expression dataset (with algorithms and data published electronically).ISI citations 26(12pa); Google-Scholar citations 45(21pa).

Method for maintaining a bounded number of representative non-dominated objective vectors for general use in multi-objective optimisation, with particular relevance to evolutionary algorithms.Proposes a method that provably converges to a set of solutions with a well-defined diversity property and proximity to the Pareto optimal set, subject to the optimisation algorithm satisfying certain weak assumptions.Winner of the journal's 'Outstanding Paper Published in 2003 Award' (, with 1000 USD cash prize.ISI citations 12; Google-Scholar citations 33.

Formal proof that the algorithms used in the implementation of an exact arithmetic package are correct.First use of a theorem-prover to establish the correctness of an exact real arithmetic implementation - a non-trivial exercise, since the work uncovered three errors that had not been discovered in testing and which could have resulted in arbitrarily large run-time errors.Work solicited by NASA Langley Research Center (Rick Butler, Formal Methods Team Leader), as part of their programme to transfer formal methods technology to industry (

Practical method for calculating the roots of almost any polynomial to arbitrary accuracy, requiring merely the computability of the coefficients and a non-zero leading coefficient.Demonstrates there is a way that the approximate solutions to a rational approximation of the polynomial can be re-assembled into the vector of approximations required to deliver a vector of computable reals.Method used by others (eg Edelat, Imperial College London) to reconstruct the exact reals from the approximations outlined in this work.

Method for optimising OoLaLa, an object-oriented linear algebra library.Describes OoLaLa and presents the results of initial performance evaluation, showing that overheads associated with high-level abstraction make the library uncompetitive.Presents a theoretical analysis, characterising the storage formats and matrix properties that allow implementations with high-level abstraction to be transformed automatically into implementations at a low level of abstraction, eliminating the performance penalty.Work with OoLaLa informed decisions of the Java Specification Request 083 - Lujan was a member of the expert group - and led to Lujan's employment by Sun Microsystems Research Laboratories for 16 months.

Technique for eliminating array bounds checking for multi-threaded languages, when element accesses are determined by values stored in other arrays - a common situation in scientific computing applications.Proposes and implements three strategies for eliminating bounds checking, extending the innovations that were proposed by the Java Specification Request 083 (Multiarray Package), for which Lujan was a member of the expert group.Extensive experimental evaluation is used to demonstrate the performance advantage of the approach, and identify which of the three strategies is most effective.

Automatic construction of fault trees, using retrenchment.Retrenchment is used to provide a formal model of fault injection, and the simulation relation of retrenchment is used to drive a structured and 'lazy' construction of fault trees. Combinational, clocked acyclic, and clocked feedback digital circuits are covered.Key ideas are being incorporated in the FSAP/NuSMV-SA Safety Analysis Platform at Trento IT ( European conference on Computer Safety, Reliability and Security.

Eagle, a highly expressive temporal logic for specifying trace properties of system run-time behaviours.Based on recursive rules over next, previous and concatenation temporal operators, coupled with parameterisation by both data and formulae, Eagle generalises previous temporal logics for run-time monitoring.Presents an evaluation algorithm for efficient on-line monitoring, which is shown correct wrt Eagle's logical semantics; provides formal complexity analysis, and practical demonstration on one NASA application.

Developments from Eagle continue in verification work at JPL (Klaus Havelund, Laboratory for Reliable Systems) for future space flight and ground control software.Google-Scholar citations 64(18pa).

Hypermedia authoring environment, COHSE, that combines the strengths of open hypermedia and conceptual hypermedia systems.Novelty lies in the combination of an Ontology Service with an Open Hypermedia Service, the former providing an explicit conceptual model, and the latter extending the generally closed conceptual hypermedia systems in existence at the time.One of the first examples of a Semantic Web application, making use of explicit semantics in order to improve dynamic linking of documents.Prototype implementation demonstrated the approach and resulted in long-term support from Sun Microsystems (three grants totalling 108k).Google-scholar citations 129(19pa)

New technique for implementing low-power Viterbi decoding based on an asynchronous design.Completely new approach to Viterbi decoding. Asynchronous design allows conventional arithmetic to be replaced by faster serial unary-arithmetic and multiple tracing back through path history to proceed independently of, and concurrently with, placing of data in history memory.Singled out in conference concluding remarks as 'thinking outside the box'.

Comparison of power and performance for different organisations of asynchronous multipliers using the original Booth's algorithm with the commonly used radix-4 Booth's algorithm.First asynchronous implementations of original Booth's algorithm - the designs enable data-dependent delays to be exploited within each iteration and in the number of iterations performed.Demonstrates performance improvements, but with power levels increased by a factor of more than 3.

Theoretical foundation and practical evaluation of new preconditioning method for linear stress analysis.Introduces an algebraic multi-grid (AMG) solver as a 'black-box' solver for the individual components of the stress, which is shown to be an optimal choice.Methodology is rigorously tested on a number of problems in 2D and 3D that derive from a practical industrial application (microfabrication technology).

New block direct solver for saddle point problems that arise in the solution of fourth-order partial differential equations (PDEs) using mixed finite element approximations.Describes an efficient, parallelisable, and accurate method of solution based on block Gaussian elimination, exploiting the sparsity patterns that arise in this application.The method is tested on a number of practically relevant and numerically sensitive benchmark problems, and shown to be more effective than black-box application of a sparse direct solver.

Rigorous theoretical foundation for a new preconditioning method for solving the biharmonic problem.Establishes theoretical bounds for the spectrum of the preconditioned operator.Numerical results show that the performance of the method, which can be implemented using a 'black-box' multi-grid solver, is competitive with that of specialised methods that have been developed for this problem.

Preconditioned parallelised iterative solver for the biharmonic problem, which is near-optimal and easy to implement using off-the-shelf library software.Extends the class of constraint preconditioning in a new and creative way, exploiting the block structure of the coefficient matrix for the biharmonic problem.Effectiveness of the preconditioning and scalability are evaluated using several practically relevant problems.Work led to development of software that is now part of the internationally-recognised scientific computing library, HSL, maintained by the Rutherford-Appleton Laboratory.

Formal classification of tasks in bioinformatics, and one of the first to introduce the idea of workflows to bioinformatics. Demonstrates that most bioinformatics requirements may be described in terms of filters, transformers, transformer-filters, forks and collections of data.First attempt to classify bioinformatics tasks - the ideas presented have subsequently been developed and expanded into the myGrid project and Taverna workflow system [Stevens-1].Google-Scholar citations 77(11pa).

Ontology-based measure of semantic similarity, and its application in inferring the relatedness of genes.Introduces the idea that the semantic similarity between annotations associated with biological entities (eg genes) could provide information on the relatedness of the entities, augmenting information such as sequence similarity.Presents results of large-scale validation experiments, using biologically important data sources, and exploring the relationship between semantic similarity and sequence similarity.The new measure has subsequently been used in the Bioconductor package, one of the most highly used tools for microarray analysis. The measure has also been adopted by other groups, including the University of Lisbon and the US National Library of Medicine, and has been used in function prediction, co-expression analysis and protein-protein interaction network analysis.ISI citations 63(15pa); Google-Scholar citations 157(37pa).

First application of an ontological description of a protein family for sequence-based classification.Protein function classification is a key task in bioinformatics and is traditionally seen as requiring human expertise to perform effectively. In a use-case based on phosphatases, the ontology classifier is shown to perform at least as well as human experts and is able to make new discoveries regarding fungal phosphatases.Related work addresses ontology tool construction [Rector-3].

Technique for establishing a dense correspondence between a set of shapes, allowing a statistical shape model to be constructed automatically.Practical framework for solving a known important problem. Takes radically new approach, posing the correspondence problem as an optimisation of the resulting model.Provides thorough experimental evaluation of the approach.Used widely in practical applications (see citing papers) including development of an image-based biomarker for osteoarthritis, since deployed in clinical trials with industrial partner AstraZeneca (Rose Maciewicz, Senior Principal Scientist).ISI citations 53(10pa); Google-Scholar citations 164(30pa).

Technique for decomposing facial appearance variation into subspaces for identity, expression, pose, lighting effects etc, based on an incompletely sampled training set.Presents a practical algorithm for iteratively refining subspace estimates from multiple training sets, each of which systematically explores some subset of types of variation, but with contamination from others.Extensive evaluation demonstrates that the improved separation achieved between sources of variation results in better face recognition performance.Technology is used by spin-out Genemation ( to generate and animate face images for computer game and mobile device applications.

Method for building 3D statistical models of the highly variable patterns of cortical folding (sulci) from a set of magnetic resonance brain images, supporting studies of brain development and function.Solves the difficult problem of establishing correspondence between sulcal patterns, making use of rich representations of local context to extract sulcal features that are common across individuals.Presents results for 22 brain images, systematically comparing the performance of different variants of the approach using a new performance metric that does not require ground truth.

Critical appraisal of Bayesian and frequentist approaches to the development of medical image analysis algorithms.Demonstrates the superiority of frequentist over Bayesian methods using practical examples from a range of common medical image analysis tasks.Invited publication resulting from a presentation at a Royal Statistical Society meeting. In the top 20 most downloaded articles from Image and Vision Computing in 2003 (

Model that predicts the magnetic resonance (MR) signals generated by extended structures such as muscle-fibre bundles, applying ideas from the theory of liquid-crystals.Provides a principled explanation of the complicated MR signals produced by muscle tissue, which are currently only understood empirically.Presents the results of a thorough experimental validation using MR images of human leg muscle.

Demonstration that there is net flow through most human brain tissue, based on combining a physiological model with image analysis of dynamic contrast magnetic resonance images (DCMRI).Undermines the assumption of no net flow that underpins standard methods of analysing DCMRI data, demonstrating that the resultant cerebral blood flow (CBF) maps are flawed, and suggesting a new framework for obtaining accurate estimates of CBF.Controversial paper that challenges conventional wisdom, and may take some time to be widely accepted. Led to invitation to write a chapter for a NHS-recommended research textbook.

Method for measuring regional loss of grey-matter in dementing disease, from magnetic resonance (MR) images of the brain, as a basis for automated diagnosis.Uses the regional distribution of cerebro-spinal fluid as a surrogate for grey-matter loss, achieving better measurement accuracy and precision than is possible for direct measurement of grey-matter volume.Presents results of a feasibility study based on MR imaging of 57 subjects, demonstrating good diagnostic accuracy, and overturning the clinically accepted view that MR imaging is not helpful in diagnosing dementia.Software ( has been downloaded by a small number of international groups.ISI citations 12.

Improved decision-tree classification algorithm, and its application to stroke data.Describes T3, a method of building classification trees that uses a new 'split' criterion designed to result in simple and accurate decision trees.Presents the results of a large-scale experimental evaluation using real medical data, from subjects on a stroke register, to predict outcome (stroke/control or stroke-death/survival), demonstrating a level of performance that could be clinically useful and showing that T3 performs significantly better than C4.5 - a state-of-the-art tree classifier.

Method of maximum-entropy (ME) model fitting that avoids over-fitting by redefining the training problem.Proposes a formulation in which a ME model is required to fit training data to some required accuracy (providing inequality constraints) rather than exactly, resulting in sparser models with better generalisation ability.Presents the results of a text classification case-study, using two large publicly-available corpora, demonstrating that the new method produces sparser models and achieves better classification results than established methods.Used to implement practical text mining tools (GENIA tagger etc) available through the National Centre for Text Mining.

Influential multi-centre review of text mining in biological research, setting directions for future research.Provides a coherent view of what had been until then a fragmented and unfocussed field, proposing two biologically important 'Challenge Problems' as foci for internationally coordinated activity.Created a framework for effective collaboration between natural language processing researchers and biologists, leading to the development of community resources (GENIA, University of Tokyo; BioIE, University of Pennsylvania; BioThesaurus, Georgetown University) and providing the impetus for competition-oriented workshops (BioCreative 2004, 2007; NLPBA 2004).ISI citations 69(14pa); Google-Scholar citations 189(39pa).

Robust algorithm for manipulating and finding the optimal dense correspondence across a set of 2D shapes.Previous optimisation schemes were neither efficient nor robust. Using both artificial and real data, new algorithm is shown to produce quantitatively better results, and to scale linearly with number of examples.Conference precursor awarded the BMVC 2001 'Model Based Vision' prize.Google-Scholar citations 31 including conference version.

Algorithm for analysing the results of non-rigid registration of medical images using diffeomorphic warps.Constructs a common representation for a set of diffeomorphic transformations, allowing analysis of shape variation using metrics on the diffeomorphism group. Evaluated experimentally using real and artificial data, showing a quantitative improvement in the ability to classify shape change when compared to conventional Euclidean metrics.A precursor paper was runner-up for the Erbsmann prize at the IPMI 2003 conference.Google-Scholar citations 20.

Limited resource strategy that allows a theorem-prover to adapt to the available computer resources.Presents the ideas of the strategy, its implementation in VAMPIRE [Voronkov-2], and experimental results. Many problems not solvable by any existing first-order theorem prover are solved using the strategy, and it is often possible to prove within a minute problems that otherwise require several hours.Key enhancement of the award-winning VAMPIRE theorem prover - see [Voronkov-2].

Survey of complexity and expressive power of logic programming and some database languages.Presents many known results in a new unified framework and proves new results.ISI citations 49(8pa); Google-Scholar citations 204(34pa).

Overview of VAMPIRE, a high-performance first order theorem-prover that has taken first place in the major categories of the CASC international theorem proving competition ( since 2002.Provides a detailed description of the functionality, architecture and implementation of VAMPIRE, highlighting the novel contributions it has brought to the field.Technology licensed to major companies, including Intel, in the area of hardware verification, and Microsoft Research, who plan to use VAMPIRE in their program development tools.ISI citations 30(5pa); Google-Scholar citations 121(22pa).

Efficient bottom-up proof procedures for sequent calculi, exploiting new redundancy criteria.Introduces new methods of proving redundancy, allowing efficient pruning of the search space; the resulting method turns out to be stronger than automata-based approaches, since the latter cannot capture some of the redundancy criteria exploited by the new approach.Presents results of an experimental evaluation, using a substantial public-domain problem set.Work has potentially important applications in efficient classification of ontologies based on description logics.

Proof-of-concept study of the overlap in content between different texts describing the same botanical species.Analyses data content of botanical species descriptions, showing that the distribution of information across multiple texts is much sparser than had previously been assumed, thus demonstrating the importance of drawing on multiple parallel sources.First result from the MultiFlora project, one of the few 'Biodiversity Informatics' projects funded by the BBSRC/EPSRC Bioinformatics Initiative.

Critical review of agreement statistics for use in evaluating coding schemes for discourse and dialogue analysis.Identifies flaws in many of the statistical metrics commonly used as agreement statistics, arguing that only one approach provides relevant information, and making general recommendations on the assessment of reliability.

Scheme for modelling partly-known systems in which mathematical models of some components of the system are known while others are unknown - a common modelling situation.Proposes a hybrid system approach which keeps (physically meaningful) models of known components and combines these with model-free approximation schemes (such as fuzzy systems, neural networks and spline approximations) to model unknown components.Proves that the approach is generally valid and applicable since resulting hybrid systems are universal approximators, laying the theoretical foundations for systematic hybrid modelling.

New learning method which combines human knowledge and machine learning in a theoretically justified way to obtain more accurate system models.First use of interval fuzzy system to represent inaccurate human knowledge.Develops knowledge-bounded least square (KBLS) method to combine human knowledge and data-driven learning.Provides mathematical proof that KBLS method guarantees more accurate models than without using human knowledge.Method developed when author worked in spin-out KSS Ltd; has since been used in all intelligent decision support systems by KSS (Philip Cook, Head of Research).

Proof that a hierarchical fuzzy system can approximate any continuous function to arbitrary accuracy, establishing the general applicability of the approach.Explores the properties of hierarchical fuzzy systems, establishing whether they are of general application and providing the first account of the circumstances under which they are better non-linear models than standard fuzzy systems.

New cryptographic primitive and its integration into two variant protocols to support certified e-mail delivery.Provides thorough security assurance analysis and experimental evaluation of approach.Approach is significant in that it achieves 'non-repudiation of origin' and 'non-repudiation of receipt' security properties (to protect communicating parties against insider false denials of e-mail message reception) with low computational and transmission cost.

Solution to the problem of autonomous and fair signature exchange in the context of agent-based privacy-preserving e/m-commerce applications, for which existing fair exchange protocols fail to deliver an appropriate solution.First to identify and address this problem.Novel ideas incorporated into design of the cryptographic protocol include: original use of misbehaviour penalisation; and security support for autonomous signature generation and exchange. Detailed analysis is performed to prove the protocol security.Related paper describing a novel framework for implementing the cryptographic protocol won Best Paper Award at ISCC 2004 (9th IEEE Symposium on Computers and Communications).

Anonymous accountable Public Key Infrastructure (PKI) certificates as a basis for unlinkable credentials.Provides new insight into the use of PKI certificates to achieve privacy in terms of a user's identity, location and transaction history, while still holding the user accountable for the actions (s)he performs on-line.Attracted one of two Vodafone Scholarships awarded that year by wireless telecommunication company, Vodaphone Ltd.

First cryptographic approach that truly offers both fairness and anonymity as essential security properties, employing an efficient method for off-line key recovery, and placing weak requirements on the security of third parties.Proposes new technique to achieve non-repudiation and fairness while protecting users' identity.Provides thorough theoretical analysis and experimental evaluation of approach.Received highest number of download hits on the publisher's website in 2003.

Method for analysing and solving the public transport driver scheduling problem.Provides new techniques and strategies for scheduling problem decomposition and composition, a heuristic framework for structuring and analysing scheduling problems, and a complex algorithm for solving such problems.Demonstrates the method in detail on a practical example and provides results of a thorough experimental evaluation.

Approach to 'speech acts' that assumes a very sparse set of very simple speech acts, and uses an extension of the theorem-prover described in [Ramsay-1] to draw complex consequences of these acts in a variety of settings.Avoids introduction of speech acts with identical surface forms but different effects, and hence eliminates a level of ambiguity.Exploits knowledge about the goals and beliefs of the discourse participants in order to support reasoning about novel and non-canonical uses of language.Contributes with [Ramsay-2] to a body of work which has led to invitations to write reference papers.

Constructive alternative to standard revision-based semantics for a property theory logic. Establishes soundness and consistency wrt the new semantics.Describes first published inference engine for this logic. Previous work on similar logics, eg by Miller and Nadathur, has concentrated on 'uniform proofs', which cover a subset of property theory.Using property theory allows for fine-grained analysis of propositional attitudes and similar phenomena, but the lack of an inference engine has hindered its widespread acceptance.Significance for the natural language processing community marked by an invitation to speak at the Fourth International Workshop on Inference in Computational Semantics, 2003.

Extension of standard approaches to answering natural language questions, to cover cases where the answer set is not simply a small set of constants.Extends incremental approach to generation of referring expressions to cope with arbitrary inferrable identifying properties, rather than relying on lists of feature:value pairs.Exploits theorem-prover [Ramsay-1] to reason over the system's beliefs about the user's knowledge and to retrieve appropriate identifying properties.With [Ramsay-4] and related publications has led to invitations to write reference papers for the Encyclopedia of Languages and Linguistics, Oxford Handbook of Computational Linguistics, and Linguistics Encyclopedia.

Extension of PHASE software [Rattray-2] to implement novel methods for analysing protein-coding genes, applied to mammalian mitochondrial genomes.Implementation of new evolutionary model for dealing with non-stationary processes in sequence evolution, which are a common feature of protein evolution.New model is shown to eliminate significant sources of bias observed in a comprehensive analysis of mitochondrial genomic data, leading to improved consistency in the results of phylogenetic analyses using different data sources.ISI citations 16(6pa).

Methods for phylogenetic inference over RNA sequence data, implemented in the PHASE software (, applied to an important biological problem.First work to apply the currently best probabilistic sequence evolution models specific to RNA genes - the most common genes used for molecular phylogenetic analyses - using a Bayesian approach to provide samples from the posterior distribution of tree topologies.Application to RNA sequence data provides evidence regarding the early evolution of mammals that tends to support results obtained independently using DNA sequence data.ISI citations 48(9pa); Google-Scholar citations 56(11pa) including papers describing numerous biological applications of PHASE.

New technique for low-level processing of data from high-density microarray experiments.Uses Bayesian methods to associate extracted data with technical error level.In the influential Affycomp II competition, comparing 87 methods (as of June 2007), method has 2nd highest number of top scoring criteria in 2 out of 3 competitions.Won 1000 US dollar best submission (runner-up) prize for poster describing this work at MGED conference (Norway, 2005).Technical error can be propagated through suite of published applications improving performance ( as implemented in the PUMA bioconductor package.

Technique for modelling temporal regulation of gene expression by transcription factors, allowing for efficient probabilistic inference over gene expression time-series data.Application of Bayesian state space model to a challenging problem characterised by short high-dimensional time-series and sparse network connectivity.Efficient variational Bayes inference algorithm allows practical application to large-scale connectivity and expression data.Formed basis of oral presentation at Computational Methods in Systems Biology 2006. Has been applied to systems-level modelling of growth regulation [Partridge et al, Journal of Biological Chemistry 282, 2007, 11230].

Treatise on the distinction between 'granularity' - based on size - and 'collectivity' - based on collective effect or action.Key to formalising why, for example, losing cells from the skin is not an injury whereas losing a finger from the hand is. Accounts for all but two of the published list of cases where the partonomic relations are not transitive - a recurring issue in biomedical ontologies.One of a series of publications seeking to achieve consensus on basic relations to be used in biomedical ontologies. Controversial, but being taken up by medical ontology community.

Summary of the contributions made by two major research programmes: PEN&PAD, which pioneered user-centred design and knowledge-driven clinical interfaces; and GALEN, the first large-scale use of description logics for clinical terminologies.Invited paper for international leaders in Artificial Intelligence in Medicine.Work pioneered use of expressive description logics and motivated development of the FaCT classifier.Led directly to major industrial collaborations with Siemens Medical Solutions (Sam Brandt, Chief Medical Informatics Officer) and Informatics-CIS Ltd (Martin Hurrell, Managing Director). Related tools have been applied to work in bio-ontologies [Stevens-3].

Control system for a pneumatically actuated robot for medical applications.Applies new control technique of pole-placement impedance control on a robot designed for robotic physiotherapy.Extensive experimental results demonstrate improved noise rejection and controller performance when compared to conventional approaches.The iPAM robotic physiotherapy system, which uses these impedance pneumatic controllers, has recently been approved by the Medicines and Healthcare products Regulatory Agency to proceed with clinical trials.

Position and impedance control of custom-made polymer actuators.Describes first work in the UK using ionic polymer metal composite, which is difficult to manufacture, exploring experimentally performance under different control strategies. Presents first results in the world of applying impedance control on ionic polymers, incorporating integral control to prevent drift - now standard in work on the control of polymer devices.

Self-tuning control strategy for low-friction pneumatic actuators.Amongst the first demonstrations that low-friction cylinders can be controlled with precision.Implements a reduced-order model scheme to minimise the number of system parameters that need to be estimated.Presents results of an extensive experimental evaluation.

Design and analysis of an ultrasonic linear motor.Adopts a new approach that allows a single variable voltage source to induce elliptical motion at the corners of the material, creating an 'inch worm' type motor. Other designs of this type are restricted to using two separate voltage sources.

Heuristic that allocates a query plan onto distributed heterogeneous machines in a way that exploits parallelism without sacrificing the efficiency of resource utilisation.First heuristic proposed for this type of problem, which is important in Grid computing.Provides results of extensive simulation-based evaluation.Part of broad Grid query processing activity (see [Paton-4]).

Solution to the resource allocation problem in Simultaneous Multi-Threaded (SMT) processor architectures.Shows how, by considering resource allocation as a quality-of-service problem, the SMT processor can be used to meet the specific requirements of the operating system.First solution to the relatively neglected problem of tuning an SMT processor on the basis of dynamically changing requirements, rather than statically predefined objectives.

First investigation of extending description logics with aggregation functions (eg max, sum).

Proves that such extension almost always leads to undecidability, leading to the 'common knowledge' that aggregation functions can only be supported in query language.Solves a recognised open problem, and provides rigorous proofs of results.Informed the design of the web ontology language OWL.Google-Scholar citations 48(12pa).

First investigation of combining description logics with concrete key constraints.Covers most variations and proves that this combination has a detrimental effect on the computational complexity; contains hardness results and algorithms.Has informed the design of the web ontology language OWL (which excludes inverse functional datatype properties).Google-Scholar citations 20.

Method of providing models for the observational precongruence of a concurrent language.Literature shows that this, unlike modelling bisimulation, is a difficult problem.Presents an approach based on domain theory; only one other attempt is known.Proofs are provided for all results.

Abstract explanation for most existing models of linear logic, which can be seen as specific instances of abstract constructions described in the paper.Provides a framework for such models and gives guidelines for obtaining new models with specific properties; mathematical proofs are provided for all the main results.ISI citations 12; Google-Scholar citations 27.

Simple method of building models for linear logic, which allows researchers with minimal knowledge of category theory to pick desirable properties and construct a categorical model.Importance of result established by two recent overview articles: [Restall, 'Relevant and Substructural Logics', in Handbook of the History of Logic, Volume 7: Logic and the Modalities in the Twentieth Century, Gabbay and Woods (eds.), Elsevier, 2006] and [Blute and Scott, 'Category Theory for Linear Logicians', Proceedings of the Linear Logic Summer School, Ruet et al (eds.), Cambridge University Press, 2004].

Practical decision procedure based on hyper-resolution, a refinement of resolution closely related to instantiation-based and tableaux methods used in current theorem-prover implementations.Presents new decidability results for natural fragments of guarded fragments, relevant to description/modal logics.Shows how results relate to, and carry over to, tableaux and model generation approaches; also generalises existing techniques and results outside of first-order logic.Since first publication as a workshop paper, the key ideas have been explored and developed further by others: [Baumgartner-Suchanek, PPSWR 2006], [Peltier, IC 2003], [Peltier, JAR 2004] and [Peltier, JAR 2005].Google-Scholar citations 19.

Translation-based resolution decision procedure for a multimodal logic, that can be used both as a satisfiability checker and as a model builder.Takes a radically new approach, based on selection refinement rather than ordering refinement, refuting misconceptions in the area and demonstrating the considerable potential of general-purpose methods based on first-order resolution for automated reasoning in description/modal logics for semantic web and multi-agent applications.Presents practical decision procedures for expressive description/modal logics, and practical minimal model generation methods, showing for the first time that resolution provers are usable as tableaux provers.Google-Scholar citations 35.

Family of multi-agent logics for reasoning about dynamic activities and informational attitudes of agents; more expressive than related dynamic logics, yet remaining decidable with 2ExpTime complexity.Contributions include: removing the need to assume independence of agents' actions; allowing quantification over actions; making reflective information expressible; and supporting more fine-grained modelling of agents' awareness of themselves and their surroundings via a new operator.Solves a deficiency of standard combinations of doxastic/epistemic logics and propositional dynamic logic.Provides new insight into much-investigated product logics (cf Gabbay-Kurucz-Wolter-Zakharyaschev book of 2003).

Fast, on-line algorithm for learning geometrically consistent maps using only local metric information from an exploring robot.One of the first on-line Simultaneous Localisation and Map-building (SLAM) algorithms; described as the '... first efficient algorithm for solving such problems' in a recent, major textbook on probabilistic inference in robotics ['Probabilistic Robotics', Thrun et al, MIT Press, 2005, p381].Provides both theoretical and experimental validation of the approach.

Unsupervised learning algorithm which adapts model structure to training data in real-time.Extends self-organising maps to learn the appropriate number of clusters and topology.Tested on artificial data of known topology and in robotic localisation.Used by others: in robot control [Li & Duckett, International Journal of Vehicle Autonomous Systems 4(2-4), 2006, 285-307], modelling reward-seeking tasks in basal ganglia [Khamassi et al, LNCS 4095, 2006, 394-405] and for neural imaging [Ferrarini et al, LNCS 3750, 2005, 451-458)]. The latter author states '...proved to be (1) more data-driven while growing and (2) faster in learning input representations when compared with previous models.'ISI citations 23; Google-Scholar citations 53.

Modified estimation of distribution algorithm (EDA), motivated by an analytic study of failure modes in this class of metaheuristic for optimisation.Amongst the first analytic studies of EDA; shows that the basic algorithm fails unless control parameters are set with problem-dependent scaling, which ranges from low-order polynomial to exponential in the number of variables, depending on the problem.Correct scaling is shown analytically for two simple problems, based on a hypothesised cause of the scaling failure, and a modified algorithm is proposed which avoids this type of failure.

Application of workflow tools to a use-case based on an analysis of the genomic micro-deletion that causes Williams-Beuren syndrome.First published example in bioinformatics of workflow tools achieving significant biological results.Automation of tasks in bioinformatics has long been a goal of the community. This paper shows that the Taverna and myGrid infrastructures are capable of supporting this ambition.ISI citations 26(7pa), Google-Scholar citations 54(15pa).

Experimental investigation of the magnetostrictive properties of soft magnetic films - important in the development of magnetic sensors and read heads in hard disk systems.Using Hill's method for growing films on very thin glass substrates has allowed the magnetostriction at the interface to be shown to be different from the bulk film.Work has attracted commercial interest, and hard disk manufacturer Seagate provided samples for this study.

System for extracting reliable information from parallel botanical texts.Integrates information from different sources describing the same plant species into an instance of a botanical ontology, using information extracted automatically from the raw texts. Reports the results of retrieval experiments (described more fully in [Wood et al, 'Using parallel texts to improve recall in botany', Recent Advances in Natural Language Processing, 2003]), showing a threefold improvement in recall using species descriptions obtained from multiple sources compared to single sources, and illustrates the instantiation of the ontology using a small case-study.

Culmination of a 15+ year strand of research investigating the role of order in modelling hard disk storage media.Demonstrates conclusively the important role of disorder in determining the performance of magnetic storage media, establishing the case for patterned media for future data storage systems.Shows that the magnetic storage industry must take a path that will require a paradigm shift.Work sponsored by the Information Storage Industry Consortium (INSIC), the US-based industry body that funds research in data storage and creates hard disk roadmaps.

Dynamic compilation into thread-based parallel bytecode with adaptive workload distribution.Describes first system that performs truly dynamic (as opposed to non-optimising JIT) adaptive loop-parallelisation at run-time using the Jikes RVM dynamic Java compiler. Targetted at a chip multiprocessor architecture with lightweight thread support.Includes detailed evaluation using the dynamic compiler system running on a cycle-accurate simulation of the target chip multiprocessor.

Asynchronous hardware pipelining technique that translates Java bytecodes into (ARM) processor instructions.A novel hardware unit is inserted into the asynchronous processor pipeline; the resulting pipeline can vary in length depending on the complexity of the required translation.Includes a comprehensive evaluation of the technique using silicon-level simulation.

Theoretical upper bound on the number of numeric operations that can be performed before accuracy is lost.Presents first use of continuous random variables with a theorem prover, and first use of Doobs-Kolmogorov theory in the context of arithmetic round-off and control theory.Addresses an important issue in control systems, but is of potentially broad application to understanding an important source of risk in engineering applications - with the potential to establish a new field.Work is being distributed by NASA Langley Research Center (Rick Butler, Formal Methods Team Leader), as part of their programme to transfer formal methods technology to industry (

Technique, based on kernel methods, for modelling the distribution of data in high-dimensional spaces. Allows construction of non-linear models of sets of shapes or sets of images.Shows theoretically that a previous, related approach is not generally valid, proposing a new approach that avoids these limitations.Demonstrates the effectiveness of the new approach experimentally, using images of hand-written digits as training data, showing how the resulting non-linear model can be used to extract digit information from noisy images.ISI citations 11; Google-Scholar citations 25 including conference versions.

A coherent Semantic Web-based approach to knowledge management for Grid applications and resources.Adapts systematic knowledge management methodology to a real Grid application. Previous proposals have been ad-hoc.Approach evaluated and validated through a fully implemented and deployed real world application case-study in aeronautical Engineering Design Search and Optimisation as part of the GEODISE e-Science project.

Method for preserving the 'write-once-run-anywhere' paradigm of application distribution through an application neutral distribution format, such as Java Byte Codes, extending 'Program Code Conversion' techniques [Rawsthorne-2, Rawsthorne-3, Rawsthorne-4] for network-server translations.Avoids need for every client to translate applications by persistent caching on translation server.Invention related to spin-out Transitive's prize-winning dynamic binary translator products (see RA5a), which allow legacy compiled software to be readily ported across diverse hardware platforms. Adopted under licence by Apple (12 million shipments expected by December 2007) and IBM (10,000 shipments per month on pSeries computers, from 3Q07).

Means for representing sequences of register assignment operations and arithmetic and logical operations using an 'expression forest', or plurality of trees.Key to fast translation (for responsiveness) and expressive representation (for reconfigurable translators and re-optimisation).Underpins spin-out Transitive's prize-winning dynamic binary translator products (see RA5a), which allow legacy compiled software to be readily ported across diverse hardware platforms. Adopted under licence by Apple (12 million shipments expected by December 2007) and IBM (10,000 shipments per month on pSeries computers, from 3Q07).

Process for efficient translation of code that uses overlapping CPU registers (eg x86 AL, AH, AX, EAX).Describes a way of efficiently representing registers which may contain overlapping subsets of addressable fields.Essential when translating machine code between machines which have different ways of handling bytes, short words and 32-bit words.Underpins spin-out Transitive's prize-winning dynamic binary translator products (see RA5a), which allow legacy compiled software to be readily ported across diverse hardware platforms. Adopted under licence by Apple (12 million shipments expected by December 2007) and IBM (10,000 shipments per month on pSeries computers, from 3Q07).

Methods for determining when a complex instruction can be partly translated in circumstances that are either predictable at translate time or when the omitted translations may be needed at a later execution time, in which case special-case guard code is also emitted.Important in maximising the responsiveness of a dynamic binary translator.Underpins spin-out Transitive's prize-winning dynamic binary translator products (see RA5a), which allow legacy compiled software to be readily ported across diverse hardware platforms. Adopted under licence by Apple (12 million shipments expected by December 2007) and IBM (10,000 shipments per month on pSeries computers, from 3Q07).

Culmination of many years work to develop systems capable of capturing transcriptome data and meta-data in a flexible and standards-compliant way. The major innovations of the system are the flexibility with which it can describe meta-data and the provision of a web services interface allowing it to be used with modern workflow management systems (see [Stevens-1]).The software has been taken up by a number of other groups and used to submit over 25 transcriptomics data sets into the international transcriptome repository at the European Bioinformatics Institute.

Use of OWL description logic to express the relationship between ontologies, coding systems based on those ontologies, and information models expressed in UML (or similar formalisms).Hot topic in health informatics because, in healthcare, terminologies and coding systems are developed independently. Lack of adequate means to specify the binding between them is a major source of incompatibility between systems that claim to conform to the same standard.Based on consultancy and tool development for NHS Connecting for Health, which has led to further work with both NHS and international standards bodies: CEN, ISO, and HL7. Updated version now accepted for publication in the Journal of Applied Ontology.

Extension of incremental view materialisation (IVM) techniques to object-oriented databases, increasing the range of applicability of the approach.Addresses significant technical challenges - since object-oriented models are more expressive than relational ones, with richer update semantics - providing a solution which is more comprehensive than any in the prior literature.Includes a thorough empirical evaluation.

Overview of the TeraGyroid Experiment.Outlines the distributed computing challenges involved in running computationally intensive applications on a transcontinental grid, and shows how they were met using a service-oriented approach.Work was awarded 'Most Innovative Data-Intensive Application' in the HPC Challenge competition at Supercomputing 2003, and the ISC Award in the 'Integrated Data and Information Management' category at the International Supercomputer Conference 2004.Returned paper contains many technical details, but related papers (eg [Scientific Programming 13(1), 2005, 1-18]) which contain fewer technical details have been more highly cited - aggregate Google-Scholar citations 25.

Architecture, design, and implementation of the RealityGrid computational steering library and toolkit - providing support for large-scale scientific simulation. Innovations include: checkpoint tree for provenance of long-running experiments; portability across all popular modular visualisation environments; and component-friendly service-oriented architecture.Associated software has been adopted by a significant number of computational science groups internationally for use in more than 20 production and teaching codes, including NAMD, LB3D, HEMELD, LAMMPS, VMD and AVS Express.

Computational steering methods for accelerating discovery in scientific simulations.Motivates the use of computational steering and presents practical examples from computational physics, introducing the RealityGrid computational library [Pickles-1] and highlighting discoveries in physics it has enabled. ISI citations 16; Google-Scholar citations 38.

Novel use of energy-filtered transmission electron microscopy to provide quantitative information on Cr distribution in CoCr thin films, and directly relate this to magnetic interactions.Conventional magnetic media consist of heterogeneous CoCr-based thin films with grain diameters

Application of a general coupling framework for multi-scale modelling, that facilitates flexible composition and deployment on a range of architectures, including computational Grids.Describes practical requirements for implementing a hybrid molecular dynamics and continuum fluid dynamics coupled model, representative of a generic class of coupled models whose algorithmic structure presents interesting opportunities for deployment on a range of architectures.Work in collaboration with a materials scientist demonstrates the applicability of the approach.

New algorithm for determining the risk of identifying individuals in a confidential dataset for which summary statistics are published.Uses the results of an exhaustive search for unique patterns with no unique subsets to predict the risk at record level.Resulting SUDA software is used by the UK Office for National Statistics (Jane Longhurst, Joint Head of Statistical Disclosure Control) and the Australian Bureau of Statistics (Geoff Lee, First Assistant Statistician); Statistics New Zealand and Statistics Quebec are currently evaluating.Extended by [Keane-1].

An exact distributed invalidation compiler algorithm for general array-based parallel programs.Distributed invalidation allows compilers access to hardware consistency mechanisms in order to improve performance by reducing invalidation traffic.Experimental results show that this algorithm outperforms purely hardware-based sequential and release consistency approaches over a broad range of programs.

Adaptive control framework for optimising the execution time of component-based coupled-model applications in a distributed and dynamically varying computing environment.Adopts an adaptive migration strategy, based on predictions of execution time provided by regression analysis and time-series analysis.Evaluated experimentally for a real coupled-model scientific application, demonstrating effective adaptation to varying ambient load.Fills a gap in the scientific literature, providing the first account of adaptive control for component-based applications in dynamically varying distributed environments, such as Grids.

Method for automatically extracting candidate specialist terms from texts, despite variation in their form.Identifies classes of variability typically encountered in the use of terms and proposes a set of 'normalisation' methods designed to transform occurrences to canonical form, improving the power of statistical methods for assessing the likelihood that a word or phrase is a term (corresponding to a domain concept).Presents results of a large-scale evaluation, using a public-domain corpus containing 2000 biomedical abstracts with over 75,000 manually annotated terms, showing that normalisation results in significant improvement in term recognition.

Critical review of the role of text mining and ontologies in managing and interpreting biomedical texts.Explores the state-of-the-art in natural language processing as applied to information retrieval, extraction and mining from biomedical texts, highlighting the relationship between ontologies and term recognition as a key but under-researched area.Sets a research agenda for moving beyond the relatively limited achievements possible using traditional text mining methods in the biomedical domain.

Analysis of three text corpora, providing evidence that syntactic context can aid term recognition.Addresses the important problem in text mining of identifying specialist terms (concepts), testing the hypothesis that terms are more likely to occur in some grammatical constructions than others.Presents results that support the hypothesis, showing that the frequency of term occurrence varies considerably between grammatical constructs, and that these effects are domain-specific - suggesting that syntactic context could be used to improve the accuracy of term recognition, and reinforcing the importance of using domain-specific corpora in learning-based analysis of natural language.

Critical review exploring the potential for text mining to make a major contribution to research in systems biology.Places text mining in the context of a holistic approach (as opposed to the traditional hypothesis-driven approach) to research in systems biology, providing a clear roadmap of the opportunities (eg using text mining results as evidence to support and enrich metabolic pathway models) and challenges.Only the second paper in the literature to address this topic.

Approach to acronym recognition that copes with cases where letter matching approaches fail.Novelty lies in the statistically-based approach to recognising possible expanded forms for an acronym.Achieves 82-95% recall with 99% precision, outperforming the best methods previously reported.Approach adopted and implemented by the National Centre for Text Mining as a national service (AcroMine) covering the whole of MEDLINE. 5,500 user visits per month. Used by Pfizer, AstraZeneca and a number of bioinformatics centres in the UK.

Demonstration that the mutual information (MI) objective function commonly used for image registration is in fact a biased maximum likelihood estimator, leading to an estimate of registration errors using the minimum variance bound.Provides additional theoretical insight into MI, and allows quantification of results from algorithms based on it.Rigorous evaluation via two large Monte-Carlo simulations.Incorporated into the TINA open-source software package ( average ~1000 downloads per month since 2004).

Validation of an equivalent electrical circuit model of blood and cerebro-spinal fluid flows within the skull, through testing predictions of the jugular venous outflow.The model allows measurement of parameters that are not routinely observable, such as vascular compliance, which may act as biomarkers for cerebral vascular disease.Demonstrates accurate prediction of the jugular flow on 16 normal subjects.

Reinterpretation of image subtraction as a hypothesis test, leading to a superior algorithm (compared with straightforward image subtraction) with predictable performance.Provides an example of the improvements gained through a principled statistical approach to algorithm design.Incorporated into the TINA open-source software package ( average ~1000 downloads per month since 2004), and used in a wide range of industrial and biomedical applications.

Instantiation-based method for first-order reasoning, including proof of completeness.Allows the use of industrial-strength propositional SAT solvers in reasoning with a much more expressive first-order logic. These results are expected to have significant impact on automated reasoning since they give a foundational basis for an alternative to well-studied resolution-based methods.iProver - an experimental theorem prover, based on this theoretical work (with extensions to support equational and theory reasoning) - was amongst the leading systems in CASC 21, the recent 'world championships' for automated theorem provers (, gaining a special prize for the most improved system.

Proof that the orientability problem for Knuth-Bendix orders is decidable, and an algorithm for solving it in polynomial-time.The orientability problem for simplification orderings has a number of important applications including program termination analysis, pruning search space for automated reasoners, and equational reasoning.The orientability algorithm has been implemented in a number of termination-checking tools including: Tyrolean Termination Tool (Innsbruck University, Austria) and the world-leading termination tool AProVE (University of Aachen, Germany).

Proof of the decidability and NP-completeness of the constraint solving problem for Knuth-Bendix orders, the decidability of which has been an open problem for a number of years.Techniques introduced in this paper have since been used by T. Zhang, H.B. Sipma and Z. Manna (Stanford University, USA) to prove results related to extensions of term algebras, including the decidability of the first-order theory of term algebras with the Knuth-Bendix ordering.Google-Scholar citations 19.

Method for automated composition of web services by mapping from OWL-S web service descriptions to a standard AI planning system.Provides a provably correct mapping from OWL-S to a hierarchical task network (HTN) planning system - covering more of OWL-S than previous mappings - and describes the first end-to-end HTN-based planning system for web service composition.Start of a larger program on semantic web service (SWS) composition, leading to membership of the OWL-S coalition, SWSI, and chairmanship of W3C's SWS interest group. Used by Lockheed-Martin (Kevin Newman, Research Scientist); related work used by Fujitsu Laboratories of America (Ryusuke Masuoka, Director, Trusted Systems Innovation Group).Google-Scholar citations 91(26pa).

Interpretation of sunspot record database (1874-2006), applying rigorous search in parameter space.Previous studies have not thoroughly examined the full parameter space and have not separated clearly the hypothesis to be tested from the methodology of exploring the phase space.Software published on the web to aid independent confirmation of research results.Latest in a series of analyses of sunspot records by the authors (1999-2006); Brooke's input is in theory of symmetries of non-linear dynamical systems.

First goal-directed decision procedure (including correctness proof) for the description logic SHOIQ, and thus the first such procedure for the web ontology language OWL-DL.Problem was known to be decidable, yet the existence of a goal-directed decision procedure was an open and actively researched problem for over 5 years.Procedure has been implemented in description logic reasoners, including Pellet and Racer, and is the first decision procedure for a NExpTime-hard logic to be implemented in a commercial system.Google-Scholar citations 84(33pa) for conference version (IJCAI 2005).

Experimental protocol using an immersive 3D computer graphics virtual environment for the treatment of 'phantom limb pain' (PLP) - a debilitating and often chronic condition experienced by many amputees.Novel approach to treating PLP, involving the presentation of a virtual limb, driven by movements of the remaining anatomical limb, in the space of the phantom limb. Subsequent trials using the protocol [Howard-2] have demonstrated patient benefit.Has potentially important clinical and social benefits for treating chronic PLP and has led to much interest from other researchers and the media.

System for tracking the human body in real-time, using images from multiple video cameras, but without the need for any special markers attached to the subject - with potential applications ranging from film production (virtual characters), through unencumbered human-computer interaction, to gait analysis for medical diagnosis.Uses a combination of intelligent preprocessing and recursive probabilistic estimation of the parameters of a 3D kinematic model to achieve accurate motion capture of whole-body pose/posture.Demonstrates the effectiveness of the approach using examples of complex and fast-moving human motion.

Rigorous formal approach to the description (definition and composition) of loosely coupled applications.Builds on earlier approaches [Freeman-4] by developing a formal framework for the description of general coupled models, and an associated CSP-based description language (NOAH).Demonstrates effectiveness of the approach through the description of two coupled applications: a prototype predator-prey model and a real bio-medical fluid-structure interaction problem.

Use of semantic web technologies to generate code for software components for the assessment of information quality (IQ), and to assist in their discovery and re-use.Key contribution is mapping between semantic world of the IQ ontology and technical world of web service implementations.Demonstrates practical value experimentally through a transcriptomics case-study.Resourcing issues are key in information quality, where often the solution is clear but prohibitively expensive. Re-use and adaptation of IQ components helps to maximise the return on investment.

First successful solution to the challenging problem of allowing two users to manipulate a common object in a virtual environment concurrently, with both visual and haptic feedback.Presents a generic peer-to-peer architecture that maintains synchrony between users' experience, whilst providing responsive feedback via local modelling of both discrete events (eg collisions) and continuous (Newtonian physics) motion.Evaluated experimentally in extensive trials involving a cooperative assembly task, demonstrating equivalent performance, both qualitatively and quantitatively, in high-latency (transatlantic) WAN and low-latency (intra-laboratory) LAN experiments.

Scheme for extending the expressiveness of Service Level Agreements (SLAs) by linking terms using appropriate analytical expressions to capture the constraints embedded in the SLA.Aims to minimise the overhead of renegotiation in service-oriented architectures by allowing service providers more flexibility in meeting the terms of SLAs. First to challenge the current predominantly static view of the terms of agreement in an SLA, paving the way for the specification of dynamically changing SLAs.Provides illustrative examples, and presents the results of extensive simulation-based evaluation.

Scheduling policy for executing workflows - represented as directed acyclic graphs - on heterogeneous systems where there are no accurate predictions for the execution time of tasks in the graph.Proposes a low cost approach that invokes rescheduling selectively at run-time, and is practical to apply in a Grid environment.Presents results of a systematic simulation-based evaluation, demonstrating that the performance gains are comparable with those obtained using a fully dynamic allocation strategy.Google-Scholar citations 23 including earlier conference version.

Theoretical foundations and implementation of retrenchment - a judicious weakening of the proof obligations of typical refinement theories, designed to accommodate a wider range of requirements issues.First of a series of reference papers: makes the case for retrenchment, describes key technical aspects, reviews achievements to date, and identifies further technical issues to be explored.Exemplifies retrenchment using two case studies: on digital control redesign, and on the Mondex Purse.

Low-power techniques applied to reducing switching activity at different levels of the design hierarchy: from architectural to logic.Provides insight into effectiveness of power-saving techniques at different levels and demonstrates that the most significant savings result at logic level. In particular, a large contribution from multiplexer switching is identified.Demonstrates a factor of two power improvement in power dissipation with the dominant contribution from optimising the logic.

Model-based approach to Principal Components Analysis, and its application to the analysis of microarray gene-expression data.Shows how the noise inherent in microarray data can be accounted for correctly, through a probabilistic model that considers the variances associated with each gene's expression level in each condition, leading to improved estimates of gene-expression profiles.Presents results of applying the method to data obtained in a study of human foetal development, demonstrating reduced profile noise and tighter, more coherent clusters of profiles.Software implementing the method is now available through the widely used Bioconductor package (

Probabilistic method for estimating the activity of transcription factors from microarray data - a vital step in decoding interaction networks in cells. First to combine ChIP-chip data with microarray data in a genome-wide manner, allowing temporal predictions of gene-specific transcription factor activities and, given the probabilistic formulation, their associated credibility intervals.Presents results of applying the method to two yeast cell datasets, showing that predictions from the model are consistent with the underlying biology, and offering novel quantitative insights into the regulatory structure of the yeast cell.Associated software has attracted 46 registered downloads since January 2006.

Method for inferring parameters for non-linear discriminant analysis.Provides a new interpretation of Fisher's discriminant, which allows a Bayesian treatment. Builds on breakthrough work [Lawrence & Schoelkopf, International Conference on Machine Learning, 2001] which shows how non-linear discriminant analysis can be achieved in the presence of corrupted labels.Associated software has attracted 1350 registered downloads since December 2004.

New method for non-linear dimensionality reduction, of broad relevance to machine learning applications. Advances both theory and practice by describing a true non-linear generalisation of probabilistic Principal Components Analysis.Practical impact of the work evidenced by 1183 registered software downloads since January 2004, and commercial licence to the University of Washington. Widely used in graphics and vision for human motion data (follow-up work published at SIGGRAPH, CVPR and ICCV).Has resulted in 22 international invited talks (including at Columbia, Berkeley, Microsoft Research, Intel Research, IBM Research, EPFL and Google Research).Google-Scholar citations 54(15pa) including conference versions.

Client for lightweight PDA devices that provides full access to the functionality of a large-scale computational Grid.The first client to provide pervasive access to Grid resources without the need for special portals; problems of implementing PKI security on PDAs are solved.Concept of a lightweight client for Unicore was later adopted by Intel (Hans-Christian Hoppe, Principal Engineer) in the UniGrids project. Unicore is a major Grid middleware used in the DEISA European Supercomputing project.

Architecture for combining local rendering of data on graphical processor units - using a novel rendering method - with high-fidelity rendering on HPC clusters.Supports selection of hardware for visualisation pipeline on the basis of data locality and functional requirements of visualisation, such as high-fidelity vs frame-rate.Enables the rapid increase in capacity and functionality of GPU cards to be utilised for processing very large data sets in e-Science.Assessment of methodology given by frame rendering rates against data size.

Methodology for systematic exploration of behaviour changes induced by trajectories in parameter space in simulations of stellar dynamos. Demonstrates a paradox, that two different solution behaviours believed to result from the same parameter can occur in two non-intersecting regions of parameter space, therefore a single parameterisation cannot explain both.Suggestions from visualisation of simulation results independently confirmed by Fourier analysis of time-series.Brooke invited to Isaac Newton Institute programme on Magnetohydrodynamics of Stellar Interiors (2004) to discuss issues following from paper.ISI citations 10.

Software system for generating artificial gene networks of arbitrary size, topology and dynamics, allowing objective evaluation of methods for analysing gene-expression data.Proposes a generic approach to modelling the kinetics of gene transcription, which conforms to the known properties of the phenomenon; adopts several different models of network topology.Datasets made publicly available ( have been used extensively to evaluate gene expression analysis methods, eg identifying differentially expressed genes [Genomics 89(4), 2007, 460-470], gene network inference [IEE Proceedings on Systems Biology 153, 2006, 247-256], simulating microarray data [BMC Bioinformatics 7, 2006, 349].Google-Scholar citations 51(11pa).

Comparison of global numerical optimisation algorithms for application in estimating the parameters of biochemical pathway models from experimental data.Uses a simulated data set obtained from a small biochemical system which has non-linear dynamics; the parameter estimation problem is in 36 dimensions and demonstrates practical issues common in systems biology inverse problems.Provides the results of extensive experimental evaluation, comparing seven different global optimisation methods, and concluding that the most successful approaches are based on evolutionary search strategies. ISI citations 50(13pa); Google-Scholar citations 80(20pa).

Systems biology markup language (SBML), designed to facilitate interoperability between data sources and software packages. Describes the main elements of the language and provides practical examples.SBML has been adopted by over 100 systems biology software packages (, and the SBML Forum has become a vibrant community of computational systems biologists. Mendes is one of the original proponents of SBML; he released the first simulator supporting SBML and contributed to its specification.ISI citations 209(46pa); Google-Scholar citations 395(87pa).

Architecture, algorithms and functionality of the COPASI biochemical pathway modelling and simulation software, illustrated with practical examples. Proposes objective criteria for deciding between stochastic or ODE-based representation and simulation of biochemical reaction networks, and describes an automated procedure for converting ODE-based models to a stochastic representation.Related software has been downloaded ( by over 5000 users (distinct IP addresses). There are currently three commercial licensees. COPASI is one of the major packages for computational systems biology.

Results from a chip designed to both exercise the Balsa asynchronous synthesis system [Edwards-4] and investigate the security implicit in certain asynchronous circuit implementations.First test of an asynchronous synthesis tool on this scale of design.Novel exploitation of an existing technique to investigate security issues.Design was manufactured and functioned with no post-layout simulation/analysis. Security issues were tested for commercial application and offered more security against differential power analysis attack than current smartcards.

Extension of earlier work in asynchronous cache and memory system design with the addition of important performance-enhancing features such as an asynchronous 'copy-back' mechanism, write-buffering and a victim cache.Earlier innovations, such as an independent line-fetch process, are incorporated; several other processes can run at their own rates whilst still guaranteeing functionally correct operation. This is the first description of such a system for an asynchronous processor.The work is supported by extensive simulation of the effect of the cache on real program operation and analysis of the cache behaviour.

Application of machine learning techniques to select an appropriate garbage-collection algorithm given a single profile run of an otherwise unknown (Java) program.Evaluation shows 5% average improvement in performance compared with using the default garbage-collection algorithm; an ideal selector could deliver 17% average improvement.Evaluations employ a wide variety of test-cases, including the highly demanding DaCapo benchmarks.

Comparative evaluation of two different methods for requirements analysis and specification, for use in developing reusable generic problem models.Provides new insight into the use of generic problem models in domain analysis and model development, a new model-driven approach to software development.Provides a thorough experimental evaluation of the two approaches using a real-world case-study.

Text-to-speech system for Arabic, which uses fine-grained linguistic knowledge to diacriticise the input text and to assign an appropriate intonation contour.One of the first systems to attempt to impose an intonation contour on synthesised Arabic.Contains reports of intelligibility tests that show it is at least as accurate in assigning diacritics as the leading commercial software (though over a more restricted vocabulary) and that the intonation contours are perceived as being appropriate for the utterance types.

Extension of SHIQ logic and associated reasoning technique to complex role inclusions (including correctness proofs); shows undecidability of stronger extension.SHIQ underpins the web ontology language OWL; it is extended to overcome problems reported by users in various papers - the solution is technically involved.OWL 1.1 (a W3C member submission, subject of new W3C working group) is based on this extension. Two leading description logic reasoners, Pellet and Fact++, support the new features and algorithm. Has influenced work on other reasoning techniques.Google-Scholar citations 45(16pa) including conference versions.

Application of description logic reasoning techniques in domain of heterogeneous biological pathway data, represented in the BioPAX standard community reference ontology.Shows how the OWL ontology can be used effectively to answer sophisticated queries not forseen by the designers. The BioPAX ontology was developed as a 'schema' for exchange; we show the semantic and modelling weaknesses in the ontology and their 'fixes' that add value, whilst retaining the integrity of the instance data.First to demonstrate this approach with this data.

Empirical evaluation of different algorithms for estimating the parameters of a homography from a set of noisy point correspondences between a pair of images - a common problem in computer vision.Demonstrates that the failure of simple methods to treat measurement errors and their correlations correctly results in 'unexpected' loss of accuracy in parameter estimates and an inability to reliably characterise performance.Provides practical guidance on the trade-off between accuracy and complexity, given a knowledge of the noise regime.

Analysis of transactional parallel programming style applied to Lee's routing algorithm for layout in electronic CAD.First known analysis of the behaviour of coarse-grain transactions viewed from algorithmic perspective; demonstrates that 'easy' programming comes with severe performance penalty.Proposes application-level approaches to improving performance, and evaluates their effect using a realistic test-case of circuit layout and a simple iterative model of transactional execution.

Design, implementation and trialing of an immersive 3D computer graphics virtual environment for the treatment of 'phantom limb pain' (PLP) - a debilitating and often chronic condition experienced by many amputees.Builds on [Howard-1] presenting the results of case-studies based on the experience of treating three amputees, each over a three week period. Initial results suggest a beneficial treatment effect.Several other institutions wish to share the technology we have developed, for both standalone replicative studies and future collaborations. Work has stimulated a large amount of media interest, leading to newspaper and magazine articles (The Times, Times Higher Education Supplement, The Guardian), and radio and TV interviews (BBC, Discovery Channel) and on-line media (EurekAlert, BBC Focus, Science Daily, PhysOrg).

System for automatically analysing video sequences to extract camera parameters, using these to guide a semi-automatic 3D computer graphics reconstruction of the scene.Provides practical examples of reconstructions obtained using a consumer-quality video camera.Related software (ICARUS) is publicly available, with more than 10,000 distinct downloads. Software was subsequently licensed to The Pixel Farm and incorporated in their products, which are now used extensively in film and TV post-production, including high-profile productions such as Cold Mountain and Harry Potter ( 'User Stories').Won 2nd Prize in Best Paper Awards for the journal. Google-Scholar citations 25.

Method for combining first-order logic theorem-proving with decision procedures, for axioms containing quantifiers.Addresses a fundamental problem that arises in using theorem-provers for software verification and program analysis - when some data types are axiomatised using first-order axioms containing quantifiers. The standard approach is for the theorem-prover to guess ground instances of such axioms. This is the first demonstration that a more direct approach to working with decision procedures and quantifiers is possible.

Technique for trajectory-based motion recognition, able to classify hand activity patterns associated with repetitive manual tasks.Provides a practical framework for analysing object handling tasks, using a combination of skin and hand-held-object segmentation with trajectory modelling.Thorough comparative evaluation using 6 different motion indexing schemes on two different datasets.Technology is currently being exploited as part of a KTP programme with industrial partner Quick Solutions Ltd (Robin Howie, Technical Director). Patent applied for.

New algorithm to find Minimal Sample Uniques (MSUs) - important to statistical agencies for estimating disclosure risk in published datasets.Extends [Ford-1], identifying new properties of MSUs, which are used to prune and traverse the search space in a recursive algorithm that exhibits better scaling with data dimensionality than existing methods.Evaluated experimentally using both real and synthetic data, demonstrating orders of magnitude speed-up compared to a state-of-the-art method.Resulting SUDA software (currently an earlier version) is used by the UK Office for National Statistics (Jane Longhurst, Joint Head of Statistical Disclosure Control) and the Australian Bureau of Statistics (Geoff Lee, First Assistant Statistician); Statistics New Zealand and Statistics Quebec are currently evaluating.

Experimental study of the potential of Graphene, an exciting new material, for use in spin electronics with application in new nanoscale electronic devices for data storage and processing.Provides first experimental evidence of spin injection from ferromagnetic electrodes into Graphene, a necessary requirement for spin electronic devices. Three recent independent publications have confirmed that spin injection occurs.

Outcome of a major collaborative project between Physics and Computer Science in the Centre for Mesoscience and Nanotechnology.Google Scholar citations 11(11pa).

Solution to a major open problem in translation-based automated reasoning for modal/description logics, allowing relational properties such as transitivity - which are difficult for first-order theorem-provers - to be handled efficiently.New axiomatic translation principle replaces 'difficult' relational properties by simpler theories, which are more readily automated and efficiently solved.Soundness and decidability are automatic, and the resulting proof of completeness sheds new light on standard logics and reasoning methods.Ideas led to resolution decision procedures for the description logics which underpin web ontology languages OWL-DL and OWL 1.1.

Efficient implementation of a biologically motivated model of associative memory.Takes an established model of associative memory - Kanerva's Sparse Distributed Memory (SDM) - and simplifies it to yield a much lower-cost implementation, based on N-of-M codes, that can be implemented using spiking neurons and binary unipolar synaptic weights with a simple Hebbian learning rule.The new SDM is analysed theoretically and experimentally (by numerical simulation) and is shown to be efficient, robust to input noise, and more 'biologically-plausible' than Kanerva's original SDM.

Detailed comparison of five implementations of the web services resource framework (WS-RF) and web services notification (WS-Notification) specifications.Identifies both commonalities and differences, and provides insight into effective implementation strategies.Important for restoring community confidence in the then developing WS-RF and WS-Notification suite of specifications, leading to subsequent ratification of these specifications in April 2006 as OASIS standards.Google-Scholar citations 35(14pa).

Simple model of the learning mechanism in animals for expected-time-to-reward in delayed-reward reinforcement learning, based on a spiking neuron model and the Sutton-Barto model.First attempt to explain, via an ab initio model, the 'scalar property' which occurs in a wide range of interval-timing experiments across many animal species, including birds, non-human mammals, and even fish.Successfully accounts for many aspects of the experimental data.

Efficient method for computing similarity between pixel scale-orientation signatures, applied to identifying structures of interest in mammograms.Shows that a variant of the transportation algorithm provides a natural but computationally expensive basis for measuring the similarity between pixel signatures, and presents a method for learning a non-linear transformation (for a particular class of images) that projects pixel signatures into a space where Euclidean distance approximates transportation distance.The method is evaluated on synthetic data and real mammograms.By dramatically reducing computational requirements, the method opens the way for more widespread use of pixel signatures.

Fundamental principles underlying the Automatically Tuned Linear Algebra Software (ATLAS) library.Instantiates new paradigm for high-performance library production and maintenance, termed AEOS (Automated Empirical Optimisation of Software), that has been created to allow software to keep up with the rapid hardware advancement inherent in Moore's Law.Applies this paradigm to linear algebra software, with emphasis on the Basic Linear Algebra Subprograms (BLAS), a widely used, performance-critical, linear algebra kernel.ISI citations 93(13pa); Google-Scholar citations 348(52pa). Related ATLAS software downloads currently exceed 3,000 per month.

Software tools, programming models and run-time services for scientific Grid computing, developed as part of a collaborative effort to simplify the use of distributed heterogeneous resources.Describes vision and strategies for base Grid software architecture, strategies and tools for construction of applications, and development of innovative science and engineering applications that exploit these new technologies.Provides a concise discussion of key issues in collaborative grid computing, especially barriers to uptake by ordinary scientific users.ISI citations 45(7pa); Google-Scholar citations 206(35pa).

Concepts and methods for automatic performance tuning for dense and sparse linear algebra computations.Describes general ideas (beyond those in [Dongarra-1]) for design of numerical software that can adapt to the execution environment.Discusses automatic performance tuning for direct and iterative sparse linear algebra problems which must occur partly at run-time because the structure of sparse matrices, not available at compile-time, crucially determines the performance of different storage schemes.Google-Scholar citations 36(13pa).

Multi-sensor image fusion algorithm that overcomes significant problems with artefacts and feature selection inherent in earlier methods.Uses a multi-resolution gradient-based approach which is simpler to implement and has lower memory requirements than state-of-the-art methods.Demonstrates the superior performance of the new method, using both psychophysical testing and objective performance measures on a large corpus of multi-sensor images.Work was carried out in collaboration with BAE Systems, and provides a practical approach for real image fusion applications. Google-Scholar citations 23.

Large-scale study of the effects of sensor noise on the performance of image fusion algorithms.Introduces objective quality measures for fused images derived from multiple sensors, and uses them to compare the performance of 17 existing image fusion algorithms in trials involving many thousands of images.Shows for the first time that objective evaluation of image fusion algorithms can be usefully applied to practical problems, offering a real alternative to costly subjective trials.Provides valuable insight into the performance of fusion algorithms under a range of noise conditions.New journal in information fusion. Google-Scholar citations 22.

Objective method for evaluating multi-sensor image fusion algorithms, exploiting knowledge of visual perception.Extends an existing method for evaluating image fusion algorithms [Petrovic-1], using a model of the human visual system to predict the visibility of fusion errors.Extensive experimental evaluation, using a broad range of scenarios, demonstrates improved agreement with the results of psychophysical testing.Makes a significant contribution to the important practical problem of generating perceptually optimal displays which fuse information from different sources, allowing real-time optimisation of fused displays.

Comprehensive framework for characterising the performance of multi-sensor image fusion algorithms.Provides an in-depth analysis of fusion performance, quantifying information contribution per sensor, fusion gain, fusion information loss and fusion artefact, significantly extending existing methods [Petrovic-1, Petrovic-3].Presents results of a large-scale comparison of fusion algorithms, providing insight into subtle but important effects not obvious by direct observation of the results.Provides a scientific basis for the evaluation and development of fusion algorithms, where currently qualitative methods dominate.

New statistic and series of conditions, sufficient to establish the global and local exponential stability of 'attractor' neural networks.Generalises existing results for the convergence of Hopfield-like neural networks - which solve problems by a dynamical process of convergence to fixed points - providing conditions for convergence and allowing the size of the attractor region to be calculated.The results are illustrated using practical examples.ISI citations 27; Google-Scholar citations 45.

Analysis of the stability of hybrid control for a bilinear system with discrete jumps, modelled as a Markov process with time delays. Relevant to many real-world applications.A closed-loop controller is shown to be exponentially stable in a mean-squared sense. Sufficient conditions are established to guarantee the existence of the desired robust controllers in terms of solutions to linear or quadratic inequalities.Extends established methods to include uncertainties in the parameters, unknown state time-delays and unknown non-linear deterministic disturbances.Theory is tested through simulation.ISI citations 23; Google-Scholar citations 40.

Unified treatment of two standard approaches to representing temporal dynamics in neural networks.Establishes invariant properties of the transformation between dynamical systems based on weighted sum of non-linear functions (eg Hopfield-Tank) and non-linear functions of weighted sums (eg recurrent backpropagation networks).Provides proofs of correspondence between properties of the two models, including global convergence and global attractiveness, providing a new approach to analysing the properties of such systems.Application of the approach is illustrated with an example.

Decomposition algorithm for solving the support vector machine (SVM) training problem for very large data sets.Presents an iterative algorithm for decomposing the constrained quadratic programming problem involved in SVM training, which has provably lower computational cost and faster convergence than previous decomposition algorithms.Important in real-world applications of SVM, where it is often not practical to optimise over the whole training set simultaneously.

System for tracking complex human motions from multiple views.Tracking is achieved by propagating candidate poses with local and global dynamics in a low dimensional latent space learnt using a Back-Constrained Gaussian Process Latent Variable Model.Primitive units of motions are discovered from the latent space representation of training motion sequences by using a clustering algorithm that takes into account the uncertainty associated with the low dimensional embedding.Provides thorough experimental evaluation.

Overview of the application of the web ontology language OWL to biomedical ontologies and research, summarising cross-disciplinary work with the biological sciences community.Provides examples of the practical use of OWL, and related tools, to support significant biological research.Brings together unusual design patterns, published separately, for representing sequences and lists, exceptions, and data types.Work that contributed to the OWL 1.1 specification, which was a specific response to user experience with OWL 1.0.

Method for assessing bone-mineral density (BMD) by image analysis of panoramic dental radiographs, with potential application in screening for osteoporosis.

Uses a modified Active Shape Model to solve the difficult image analysis problem of measuring cortical bone thickness in radiographs of the jaw, as a surrogate for BMD measurement using dual energy x-ray absorptiometry.

Presents the results of extensive experimental evaluation, demonstrating that widths measured automatically are significantly correlated with BMD and exhibit less variability than manual measurements made by different experts.

Currently being trialed in a clinical setting by two dental equipment suppliers.

Reference architecture (S-OGSA) for the development of Semantic Grid applications, with examples of how it can be used.First comprehensive Semantic Grid reference architecture where semantics is treated as another Grid resource. Inspired the Service-Oriented Knowledge Utility initiative in FP7.Has attracted international attention, with several invited talks, keynotes, tutorials and seminars around it organised by third parties.

Demonstration that e-Science tools are sufficiently mature to systematically process complex biological data, providing a framework for the explicit declaration and re-use of experimental methods.Highlights problems suffered by manual analysis of microarray and quantitative trait loci (QTL) data for the discovery of candidate genes underlying complex phenotypes. Then shows how automated approaches provide a systematic means to investigate genotype-phenotype correlations.Methodology is applied to a use-case of resistance to African trypanosomiasis in the mouse, identifying Daxx as a strong candidate gene within the QTL.

Investigation of the impact of solving systems of linear equations using Gaussian elimination in single-precision then using iterative refinement of the solution to double-precision accuracy.Experimental results on IBM Cell processor demonstrate substantial performance gains. The technique is likely to be equally effective for other modern processors which offer single-precision performance that is much higher than double-precision performance.Precursor for research into other uses of mixed-precision numerical computing.

Study of methods for improving website accessibility for visually impaired users.Proposes a set of principles for effective web mobility, based on an analysis of the problems faced by blind websurfers, and the strategies they use to overcome them - using analogies from navigation in the physical world.Presents an experimental system based on these principles, using manual mark-up of 'mobility objects' together with a browser plug-in. Evaluates the approach with both sighted and visually impaired users, showing significant improvement in the navigation performance of visually impaired (but not sighted) users.Establishes a research agenda for automated enhancement of accessibility.

Comparison of different strategies for providing webpage summaries to aid navigation for visually impaired users.Reviews existing approaches to summarisation and outlines the results of a formative evaluation of different strategies, involving 42 subjects, leading to a proposal for a simple approach based on initial sentences.Describes an implementation of the approach as a browser extension, and presents the results of a small summative evaluation which demonstrates a clear advantage for the new approach when compared with existing methods.

Transcoding method for improving the accessibility of websites for visually impaired users, without compromising design or incurring significant authoring overhead.Describes an approach to inferring the 'structural semantics' of a website, by linking the elements defined in its cascading style sheet to a generic ontology describing functional significance. This allows a browser application to present pages from the site in a manner appropriate for, and under the control of, the visually impaired user.Presents two case-studies, and the results of a technical evaluation using 40 sites of different types.Led to collaboration with IBM Tokyo.

Ontology representing the concepts and relationships necessary to model the structural and navigational organisation of web pages, as a basis for creating accessible websites.Published in the journal's 'Ontology' section, the ontology itself - comprising 2248 lines of OWL RDF, available at - forms part of the peer-reviewed material. Offers the potential to provide a flexible and scalable basis for transcoding webpages to improve accessibility for visually impaired users. Ontology and related tools have been adopted by several European groups, including De Troyer (VUB), Rammer (UInnsbruck/DERI) and Paliza (U Carlos III de Madrid).

Method for efficient look-up of terms in large biomedical dictionaries (eg BioThesaurus) allowing for variability in term usage.Uses logistic regression to learn a 'soft' measure of term similarity, rather than adopting one based on design.Tested on two large, publicly available datasets; the method outperforms other soft matching methods (Levenshtein distance, Jaro-Winkler measure, Hidden Markov Model, SoftTFIDF) improving, for example, top-rank lookup performance from 62.5% to 69.3%.Approach adopted by the BOOTStrep project (, as part of its user interface for accessing Biolexica, and will be also offered as part of the services of NaCTeM.

Automated approach for configuring component-based systems, using a genetic algorithm to optimise system performance.Takes a holistic approach to complex system design, treating human-computer interaction as part of the system, defining objective measures of performance, and optimising under specific 'mission-critical' scenarios.Method is evaluated using a case-study based on a naval command and control system, demonstrating the feasibility of the approach, quantifying the improvements in performance, and exploring the trade-off between different aspects of performance.

Further development of [Neville-2] and [Neville-3], showing how the idea of information reuse in neural nets can be extended to generate a richer set of generalisations.Introduces a framework for categorising 'orders' of generalisation, showing how a general form of second order generalisation can be achieved via symmetry transformation of the weights of a trained network, and third order generalisation by using data generated by such derived networks to train networks to perform their inverses.Validates the approach experimentally, measuring the accuracy and isometry of the functions approximated by the derived networks.

Enhanced data recovery in patterned-media magnetic storage systems, using advanced coding techniques.Investigates, via detailed end-to-end simulation, the influence of pattern irregularity on read error rate - an important practical and commercial issue, since the limitations of current methods for fabricating patterned media are seen as a barrier to realising their potential for delivering very-high-density storage.Demonstrates that, under realistic manufacturing and operating conditions, the effects of pattern jitter are significant, but can be alleviated using low-density parity-check coding - achieving acceptably low error rates at recording densities up to 1.6 Tbit per square inch.

Solution to the problem of linking OWL ontologies using the well-understood notion of e-connections. Provides first integrated algorithm for (a refinement of) e-connections, including extensions of OWL-DL, an OWL reasoner (Pellet) and ontology editor (Swoop).Tackles an important issue in the flexible design and reuse of ontologies, improving significantly on the owl:imports mechanism for linking ontologies.Has stimulated research on linking ontologies, eg distributed description logics and package-based description logics.Google-Scholar citations 65(22pa) including ISWC 2004 version.

Theoretical analysis of 'diversity' in ensemble learning, leading to a practical algorithm for managing diversity.Provides proofs of how existing theoretical frameworks are related, using this to develop a practical algorithm for explicitly managing diversity by including a penalty derived from the bias-variance decomposition.Demonstrates, though a large-scale experimental evaluation, that the performance of the new method is competitive with state-of-the-art ensemble learning methods.First to make clear the distinction between implicitly encouraging diversity and explicitly managing it. Paper was main contribution of Brown's PhD, which won the BCS Distinguished Dissertation Prize, 2004.

Survey and taxonomy of 'ensemble' learning machines.Provides theoretical insight by unifying two previously disparate models, and uses this to present a partial solution to the major open question of how to measure 'diversity' in ensemble learning.Headline article for a journal special issue on this topic.Has wide applicability, as a 'meta-learning' system; the concepts apply to any learning technique, including feature selection as a special case.Google-Scholar citations 48(19pa).

Biologically plausible neural network model for visual perception of geometrical patterns.Uncovers, for the first time, the intrinsic relationship between two well-known visual perception tasks: the spiral problem and the inside-outside relation problem.Provides a thorough experimental evaluation of the approach, using various stimuli, demonstrating that the performance of the computational model matches, qualitatively, that of human subjects.

Modification of a biologically plausible neural network, introducing dynamic links to encode perceptual grouping rules for image segmentation.Provides a simplified algorithm for fast computation which enables the approach to be applied in a real-time image segmentation system.Provides a formal analysis of behaviours of the dynamically coupled neural network.Provides a thorough experimental evaluation of the approach in various benchmark and real-world tasks.ISI citations 13; Google-Scholar citations 20.

Methods for threshold setting and on-line threshold update in a real speaker verification system.Provides a thorough experimental evaluation of the techniques under different conditions, using public-domain speech corpora. The techniques were deployed in ISIS - a multilingual spoken dialogue system - as a prototype of the Hengseng on-line banking and financial system ( citations 21.

New technique for robust adaptive smoothing using two different discontinuity measures simultaneously.Practical approach for solving a known termination problem in adaptive smoothing; our technique is insensitive to the number of iterations of adaptive smoothing.Provides a formal analysis of stability and behaviour of our algorithm, theoretical connection to the anisotropic diffusion framework, and a thorough experimental evaluation of the approach for benchmark and real tasks.

Statistical model of face shape and appearance, for tracking faces in single or multiple views whilst they undergo large changes in pose.Shows that a small number of coupled linear models of 2D shape and appearance can capture face appearance over a wide range of viewpoints and provide a basis for: tracking in one or more simultaneous views, estimating 3D orientation, and predicting new views of a given face.Used by spin-out Genemation (John Bickley, Chief Executive Officer) to predict facial appearance of an individual in 3D from a single 2D photograph.ISI citations 14; Google-Scholar citations 35.

Statistical model of appearance that captures the way faces change as they age.First to demonstrate that using a low-order polynomial model it is possible both to predict age from a face image, and synthetically age a face.Results of experiments demonstrate potential utility of modelling ageing effects, including improved recognition performance.Used by spin-out Genemation (John Bickley, Chief Executive Officer) to generate synthetic faces.ISI citations 13; Google-Scholar citations 43.

Efficient algorithm for matching a statistical model of appearance for objects of a given class to new images of the same class.Shows how models with possibly hundreds of parameters can be matched rapidly and accurately, by learning off-line to estimate the shape of the search space.Widely adopted in practical applications, particularly for faces and medical images; used by two spin-out companies - imorphics (Graham Vincent, Chief Technical Officer), for medical image analysis, and Genemation (John Bickley, Chief Executive Officer), for face modelling.ISI citations 142(22pa); Google-Scholar citations 1075(172pa). Virtually every major computer vision conference since publication has included papers developing or using the method.

System for lip reading from video sequences using a range of derived features, including the parameters of an Active Appearance Model (AAM) tracking the mouth.Demonstrates experimentally that the visual information improves speech recognition performance, with the AAM parameters providing particularly strong cues.ISI citations 19; Google-Scholar citations 61(11pa).

Balsa language and design framework for asynchronous circuits, and a case-study demonstrating its use in synthesising a DMA (direct memory access) controller.Balsa is the leading asynchronous synthesis tool available under a GPL licence. It comprises a complete tool-chain: hardware description language, compiler, simulation tools and netlisters interfacing to commercial CAD tools with routes to silicon fabrication foundries.Attracts 100 downloads/month, 300 for new releases of the software, and is widely used internationally. It underpins several funded EPSRC and EU research projects, often in collaboration with other institutions both in the UK and overseas.Google-Scholar citations 34.

Invited review of the state-of-the-art in asynchronous network-on-chip design, including original work as detailed below.Major contribution is a detailed case-study of the ASPIDA network-on-chip, addressing architecture, testability and performance.Demonstrates the viability of the CHAIN approach/architecture, by measurements of a test chip that was fabricated.University spin-out Silistix (John Bainbridge, Chief Technical Officer) is exploiting the CHAIN architecture. The company received samples of the ASPIDA chips and cite the results in their sales publicity.

First application of database repair techniques for reconciliation of data-level heterogeneity. Data reconciliation is a major challenge in data integration. Reliable and cost-effective tool support is not yet available. Paper proposes a novel form of support for one particular class of data reconciliation problems.Thorough empirical evaluation of these techniques, identifying incompleteness in existing repair techniques when applied in practice, and outlining a preliminary solution.

Comprehensive model for personalisable hypermedia and a system that instantiates it. In contrast with related work, the model is both abstract and rigorous.Explains why and how hyperlink-based systems (HLBSs) are tributary to retrieval and rendering engines (database systems being an example of the former, and browsers of the latter).Brings out the unique characteristics of HLBSs at a higher, more formalised level of abstraction than previous work.

General technique for monitoring execution of query plans using self-monitoring operators.Previous work was ad-hoc and problem-specific, requiring extra tooling to implement monitoring. The new approach subsumes most previous monitoring schemes by seamlessly encapsulating the additional functionality into physical-algebraic operators. This general result has wide applicability in adaptive query processing.Provides a thorough experimental evaluation.

Comprehensive and formal characterisation of a spatio-historical data model.Spatio-historical databases were previously formalised using logics or constraint theories. While these offer rigour, they do not lead naturally to database implementations. The algebraic approach adopted here is both rigorous and informative in this sense, as demonstrated by the implementation described.Provides an extended example application.

Theoretical development that provides a mathematically rigorous proof of convergence of the Feedback Guided Dynamic Loop Scheduling (FGDLS) algorithm.Uses techniques of applied analysis to establish necessary conditions for the convergence of the FGDLS algorithm, an open problem in the literature.

Generative probabilistic model of context-free grammar (CFG) parse trees.Approach is an extension of probabilistic CFG (PCFG), with non-terminal symbols augmented by latent variables. Instead of manually crafted non-terminals, fine-grained symbols are automatically induced from a parsed corpus.Tested on a large standard corpus, showing that the automatically induced model gives a performance of 86.6% (F-Score), which is comparable to that of a PCFG parser created using extensive manual feature selection.The work has already been followed up by influential groups in statistics-based natural language processing, including UC Berkeley, Carnegie Mellon, Johns Hopkins and University of Amsterdam.

Computer-assisted assessment system for marking short text answers.Describes the use of agglomerative clustering to group similar free-form text responses to summative assessment questions, for presentation to a human marker.Presents experimental results using real examination scripts, demonstrating the potential for improved consistency and significant reduction in marking effort, whilst noting some of the limitations of the approach.System is in routine use for selected examinations across the University, and is being commercialised by spin-out Assessment21, which has seed-corn funding and commitment in principle from its first major user - a prestigious professional certifying institute.

Proof that a multilayer neural network with one common neuron suffices to approximate any function on a discrete space to any accuracy.Main result shows that creating neural networks for function approximation on discrete spaces is much simpler than for function approximation on continuous spaces.Behind this surprising result lies the discovery of a discrete version of Kolmogorov's theorem: for any n-dimensional function f(x1,x2,...,xn) on a discrete space, there exists a one-dimensional function g(y) such that f(x1,x2,...,xn) = g(w0+w1*x1+...+wn*xn).

Formalism for studying and characterising the relationship between programming features and design patterns, based on group theory and associated concepts.Establishes a foundation for pattern taxonomies, and for differentiating patterns from heuristics, rules, and arbitrary micro-architectures.Demonstrates the formalism on many well-known software patterns.Collaboration with James O. Coplien of Bell Labs, well-known for his work on C++ idioms, multi-paradigm design and organisational patterns.Paper extending this work has recently been accepted for publication in Communications of the ACM.

Architecture and implementation of GCF (General Coupling Framework) - a lightweight and flexible approach to the development of coupled models.Identifies a new paradigm for the description of loosely coupled applications, based on a hierarchical separation of concerns: description of individual models; composition of these models to form a coupled model; deployment of this coupled model on suitable hardware.The approach has been adopted by the CIAS Integrated Modelling group at the Tyndall Centre, UEA (Rachel Warren) and by the GENIE project group (co-ordinated by Tim Lenton at the Tyndall Centre, UEA), is being evaluated by other research groups, and has strongly influenced the software strategy of the UK Met Office (Mick Carter, Manager, Climate Research IT Applications).

New circuits that implement on-chip communication using ternary (3-state) logic, which is particularly suited to self-timed communication as it allows 0-null-1 to be transmitted on a single wire.Shows that the problems of complexity and high power associated with the earlier application of ternary logic to combinatorial functions can be avoided in on-chip communication, where there are potential advantages in terms of reduced wiring and power-saving. On-chip communication increasingly dominates the performance and power-consumption of modern chips.

Overview of an innovative self-timed Network-on-Chip (NoC) technology - the world's first fully-developed self-timed NoC fabric, supporting packet-switched on-chip communications using asynchronous circuits.Describes the technology behind Silistix Ltd (John Bainbridge, Chief Technical Officer), a company spun-out to commercialise tools for self-timed NoC design. Since its launch in December 2003 Silistix has attracted over 6M of venture capital funding from a consortium led by Intel Capital, and currently employs 14 engineers in Manchester with a sales office in California.ISI citations 11; Google-Scholar citations 55(11pa).

Methodology for the design, analysis and implementation of asynchronous communication systems, using a new inter-chip communications protocol (devised by Furber) as an example.Proposes direct translation from a Petri-net specification to a logic implementation, resulting in a provably correct and efficient implementation.Demonstrates that the protocol solves the problem of an idle system (inactive for power-efficiency reasons) being woken by both ends simultaneously, creating a difficult asynchronous arbitration problem. Efficient implementations of the protocol have proved elusive.Presents the results of extensive simulation-based testing showing significant performance gains over conventional implementation methods.

Novel prediction and evaluation methods for a particle filter-based human body tracker using multiple cameras.An advantage over previous work is that the evaluation of particles is performed in 3D instead of 2D, using volumetric reconstruction and Gaussian blob fitting, thereby achieving real-time performance.Experimental evaluation addresses important issues such as tracking long sequences and tracker self-recovery.

Real-time hand tracking system.Main contribution is the integration of annealed particle filtering with a variable length Markov model (VLMM). This is the first time VLMMs have been applied for visual tracking and is a substantial extension of previous work on VLMMs [Galata-1].Provides a thorough experimental evaluation, demonstrating significantly more robust performance than competing methods. Also presents a study of the effects of latency on the user interface for a video-based hand tracking system, a problem that has received little attention in previous research.Initial version presented at IEEE V4HCI Conference, 2005, receiving Best Paper Award.

Method for automatically acquiring stochastic models of complex and semantically rich behaviours such as ballet dancing or sign language. These models are used for recognition, prediction and behaviour synthesis.First paper to propose the use of variable length Markov models in the field of computer vision as a powerful and efficient mechanism for capturing behavioural dependencies and constraints.Provides thorough experimental evaluation.ISI citations 30(5); Google-Scholar citations 91(14pa).

Description and evaluation of several techniques particular to asynchronous microprocessors to yield very energy-efficient processing.Introduces several new micro-architectural techniques, such as the 'counterflow colour' to reduce speculation penalties. Presents discussion and analysis of asynchronous cache and memory system design; Amulet2e was the first chip in the world to include asynchronous on-chip cache. Describes patented asynchronous 'halt' technique.Evaluation is supported by measurements from manufactured silicon.

Description and analysis of 'SPCAM', an invention by the authors which reduces the energy cost of associative look-up in caches etc. Power-efficiency is increasingly important in current, highly integrated devices. The technique is also adaptive, trading power for speed.Results are generated through rigorous simulation and compared to other approaches to the problem.

Design and use of the Taverna workflow workbench, developed by the myGrid e-Science pilot.Introduces three-tiered architecture and functional model for open services without a common type system. Shows how the design is influenced by practical, environmental and user demands.38,000+ downloads, adopted by 200+ sites, 1000+ users: industry (Kooprime, Pfizer, Chimatica), projects (AstroGrid, EGEE2, EGEE3, NBIC, VLe), institutes (Roslin, EBI, NEBC, VBI) and teaching worldwide. Core component of OMII-UK.Google-Scholar citations 46(42pa).

Complete description of TAMBIS, a multi-source ontology-based integration system for the Life Sciences that made access to bioinformatics sources transparent by placing a mediator between the user and the sources.First Global-as-View ontology-based mediator system in the life sciences. The underpinning ontology has been widely adopted independently of the software.Highly influential: commercially redeveloped by Insilico Discovery (Brian Donnelly, Chief Executive Officer); academically redeveloped in the ComparaGrid e-Science project; led to the myGrid e-Science pilot, which built the Taverna workflow workbench.ISI citations 36(5pa), Google-Scholar citations 119(18pa).

Approach to supporting web-service discovery by life scientists.Proposes new suite of ontologies, represented in a modern ontology language, required to discover and adopt services by life scientists undertaking in silico experimentation. Developed and demonstrated in the practical context of the myGrid workflow middleware platform.One of the first papers to systematically describe an ontology for annotating real web services for a real application.A central component of the widely used Taverna and BioMOBY Life Science platforms.ISI citations 21(5pa); Google-Scholar citations 128(30pa).

Theoretical study of magnetisation reversal processes in hard disk storage media.Shows that both collective and incoherent modes can occur and that these significantly affect the switching speed (and thus the attainable data rate).Explains experimental observations of different values of damping constant in different experiments, and shows that rotating fields can produce faster switching than axial fields.Undertaken in partnership with Hitachi Global Storage Technologies during a funded sabbatical.

A 3D dynamic hierarchical micromagnetic model for predicting the performance of perpendicular magnetic recording media.First demonstration that vector models are required to predict the performance of perpendicular media, showing that writing data is easier and faster than was predicted by 1D models.Overturned the conventional wisdom (based on 1D modelling) that writing was difficult on perpendicular media, which had delayed their adoption by industry.Undertaken in partnership with Hitachi Global Storage Technologies during a funded sabbatical.

Analytical and numerical techniques for modelling the storage media in hard disks.Rigorous demonstration that hard disks can store data at densities of 1Tbit per square inch.Funded by Hitachi Global Storage Technology (HGST) and by the Information Storage Industry Consortium (INSIC). The work was presented to HGST and INSIC, and formed a central part of the latest INSIC hard disk drive roadmap (2003/4).ISI citations 24(5pa); Google-Scholar citations 23.

Unsupervised learning method for classifying object trajectories in video sequences, applied to learning motion activity patterns and anomaly detection in visual surveillance.Overcomes two main drawbacks with previous approaches to trajectory classification: reduced feature space makes on-line learning of motion patterns feasible even for long sequences; unsupervised approach removes need for training sequences often difficult to obtain in practice.Extensive experimental evaluation on three different datasets.

Method for extracting 3D measurements from structured light images of the face, for use in analysing the effects of orthodontic treatment.Combines structured light stereo with active shape model identification of landmarks to obtain a canonical description of patients' midline profiles; statistical shape analysis of the profile is used to detect significant differences/changes.Evaluated experimentally for a large sample of orthodontic patients, demonstrating the ability to detect subtle differences in profile due to the presence of a bite block, and suggesting sufficient sensitivity for use in auditing the outcomes of orthodontic treatment.

Scene interpretation system for traffic lane detection and classification using analysis of vehicle trajectories.Provides first known integrated approach for combining road detection and lane classification, building a scene classification model using a minimum of a priori knowledge.Presents results of experimental evaluation using image sequences obtained from uncalibrated pan-tilt-zoom cameras and information fusion from multiple views.Spin-out company PsiVision (Jose Melo, Founding Director) has been formed to exploit the technology.

Critical review of research on term identification - a key problem in mining the biomedical literature.Proposes a modular framework for term identification, clearly separating functional subtasks.Maps existing work on biomedical term identification into this framework, identifying the main challenges and areas for future work. Also relates the framework to emerging community challenges, such as BioCreAtIvE.ISI citations 16(5pa); Google-Scholar citations 41(15pa).

Machine-learning approach to inferring protein function (there are many proteins whose function is unknown) using hidden and weak evidence from the literature.Uses a set of SVM classifiers to recognise patterns of textual co-occurrence between proteins and terms indicative of their function.Describes the first known integration of SVM classification with a relational database, allowing efficient evaluation of decision functions.Method used in the BioCreAtIvE international challenge to predict protein function from text (20 groups worldwide, 2 from UK), demonstrating some promise in tackling an exceptionally difficult task.

Terminology-driven methodology for developing text mining applications using terms from the domain as an integrative means to access literature.Provides new conceptual framework by integrating recognition, normalisation and unification of terminological variants and acronyms, with extraction of relationships through term clustering.Implemented and extensively tested using a large corpus of Medline abstracts, improving both precision and recall of term and acronym identification when compared with existing methods.Adopted by industrial partner Lion Bioscience (Dietrich Rebholz-Schuhmann, Research Scientist).ISI citations 11; Google-Scholar citations 30.

Method for extracting and predicting relationships between entities based on their behaviour in literature, using a hybrid approach that combines lexical, syntactic and contextual similarities.Improves on previous approaches by identifying implicit and latent links between related terms through 'contextual pattern mining'. Evaluated experimentally using a large corpus of biomedical abstracts, demonstrating significantly improved recall when compared to a baseline method using term-co-occurrence.

Extension of the result of [Neville-2], showing how the function approximated by a complex multi-layer network can be transformed by transforming its weight matrix.Demonstrates theoretically that vertical and horizontal reflection and scaling of the function learnt by a network can be achieved by transforming its weight matrix and applying it to a network of the same architecture.Validates the approach experimentally.

Mathematical framework for transforming the function approximated by a simple neural network, by transforming its weight matrix.Demonstrates theoretically that vertical and horizontal reflection and scaling of the learnt function can be achieved by simple matrix manipulation of the weight matrix, without further learning, and validates the result experimentally.Provides a new approach to studying generalisation - a citing paper states that the principles will '... expand the generalization capabilities of traditional connectionist models' [Valle-Lisboa et al, Neural Networks, 2005].

Micron-scale cold-atom trapping and manipulation, using laser patterning of magnetic thin films.Outlines work using microscopic patterns of permanent magnetisation to overcome the limitations of other common approaches to cold-atom trapping - made possible through the use of magnetic thin film media produced at Manchester.Demonstrates the creation of micron-scale trapping structures with potential use in atom-trap arrays suitable for quantum information processing.ISI citations 13.

First systematic investigation of the effect of media geometry on the reliability of recovering stored data from very-high-density patterned-media magnetic storage systems.Uses extensive simulation, based on a 3D replay model, to explore the dependency of recovery performance on a range of media characteristics, providing a basis for optimising the performance of future storage systems.Has attracted industry attention, leading in 2006 to an invited presentation at a closed meeting of the Extremely High Density Recording group of the Information Storage Industry Consortium (INSIC), USA.

New modelling approach for predicting replay waveforms from patterned magnetic media for very-high-density storage, taking account of 3D geometry.Provides detailed analysis of the role that the media geometry plays in shaping the readout waveform in patterned-media systems, demonstrating, via extensive simulation, the importance of taking a fully 3D approach as opposed to conventional 2D modelling.Contributes to a body of work on patterned media for future storage devices that has attracted industry attention - see [Nutter-1].

Swoop, a web-oriented ontology editor which pioneered a hypertext/web-browser user-interface model, particularly well-suited to the web ontology language OWL.Introduces a browser-like interaction paradigm with systematic exploitation of hyperlinks, view source, history etc, and which serves as a test bed for effective interaction with new services (eg, explanation, profiling, modularisation, ontology diffing, visualisation, query). While Protege has dominated traditional ontology editors, Swoop explores a radically different approach exploiting the 'webiness' of OWL and the interaction with reasoners; many of its innovations have been adopted by mainstream editors, such as the commercial product TopBraid Composer by TopQuadrant (Dean Allamang, Chief Scientist).Google-Scholar citations 47(18pa).

Algorithms and user interface to support debugging of description logic (DL) ontologies.Describes practical methods for explaining and localising unsatisfiable concepts in DL ontologies - rather than simply detecting them using a reasoner - addressing a recognised problem previously believed to be intractable.Presents results of a user study, demonstrating the practical benefits of the approach and providing further insight into user needs.Work awarded best paper prize at the European Semantic Web Conference 2006, and adopted by the National Cancer Institute Center for Bioinformatics (Frank Hartel, Director, Enterprise Vocabulary Services), funding a production version, IBM Research (Edith Schonberg, Research Scientist), in their latest reasoner, and TopQuadrant (Dean Allamang, Chief Scientist), in their TopBraid Composer.Google-Scholar citations 51(20pa) including conference versions.

Model-driven approach for capturing and sharing proteomics experimental data.Describes a data model and model-driven toolset that have been adopted widely by the proteomics community, with over 3000 downloads of the software.Many authors from different sites, since Nature Biotechnology wanted a demonstration of wide community support. Model and toolset developed at Manchester, and refined in the light of feedback from other authors.ISI citations 123(27pa); Google-Scholar citations 163(36pa).

Extension to UML, allowing user interface design to be captured alongside other aspects of an application.Provides conservative extensions to UML to support task modelling, and both abstract and concrete presentation modelling.Presents results of a case-study, using design metrics to demonstrate a significant reduction in structural complexity compared with vanilla UML.Google-Scholar citations 81(19pa) including conference version.

Teallach, a model-based user interface development environment for data-intensive applications.Contributions include: close integration of the models describing different aspects of the interface; support for automatic generation of fully functional interfaces from the models; and a methodology in which the models can be designed and composed in any order.Illustrates the approach using a practical example, and provides initial results on usability.Google-Scholar citations 49.

Integration of distributed query processing (DQP) with Grids - the benefit for DQP is dynamic access to computational resources, the benefit for Grids is declarative data integration.Describes an architecture, supporting pipelined and partitioned parallelism for query evaluation over grid middleware, detailing the implementation of a prototype system.Provides experimental results that demonstrate a speedup for tasks combining database access with computational analysis, using a practical bioinformatics application running on a real Grid.Related papers address parallel scheduling of queries on heterogeneous resources [Sakellariou-1] and monitoring of query progress for adaptive scheduling [Fernandes-1].Google-Scholar citations 72(13pa) including conference version.

Experimental study of adaptive and selective prototype optimisation schemes in vector spaces, taking ideas from nearest-neighbour classification and applying them in a novel setting.Demonstrates that, when adaptive prototype methods are combined with linear dissimilarity-based classifiers, they provide the best trade-off between small condensed prototype sets and high classification accuracy.Supported by a comprehensive experimental evaluation to establish reduction rates and classification performance for several different prototype optimisation methods, using real data.

Normal-density parametric classifier for pair-wise dissimilarity data.Shows that summation-based dissimilarity measures are approximately normally distributed, suggesting that a Gaussian parametric classifier should perform well using dissimilarity data.Demonstrates experimentally that the performance of such classifiers can improve significantly on nearest-neighbour classifiers applied to the same data.

Related software has been incorporated in the Matlab PRTools toolbox and licensed to industrial partners including Unilever, Philips Medical and DeBeers.

ISI citations 19; Google-Scholar citations 30.

Prototype selection for dissimilarity-based linear and quadratic classifiers.Evaluates experimentally the usefulness of several different prototype selection techniques in building simple classifiers for metric and non-metric dissimilarity data.Demonstrates that dissimilarity-based linear and quadratic classifiers can be both accurate and computationally efficient when defined on small but suitable prototype sets.First dedicated study of the problem, suggesting that there may be useful alternatives to kernel-based methods.

Generalisation of kernel-based classification methods to deal with data presented as sets of pair-wise dissimilarity measurements between objects.Proposes simple but effective statistical techniques, based on generalised kernels, that learn from arbitrary metric or non-metric proximity data.Amongst the first to address this problem and the related topics of indefinite kernels and pseudo-Euclidean spaces.Demonstrates practical use of pseudo-Euclidean and dissimilarity spaces for classification, backed up by extensive experimental evaluation.

Related software has been incorporated in the Matlab PRTools toolbox and licensed to industrial partners including Unilever, Philips Medical and DeBeers.

ISI citations 17; Google-Scholar citations 84(15pa).

Experimental evaluation of a virtual reality (VR) environment for developing and testing manual skills, using a fairground 'hoop and wire' game.Compares users' performance in identical real and virtual versions of the game, finding error rates up to two orders of magnitude higher for the virtual game.Provides important insight which should inform the development of safety-critical applications of VR such as virtual surgery and surgical training.

Experimental evaluation of the use of virtual reality (VR) as a training aid for real-world route learning - often claimed to be an important application of VR.Provides evidence that a paper map is much more efficient than hours spent in a VR simulator.

Hybrid network architecture for collaborative virtual reality.Critically evaluates existing network architectures for collaborative virtual reality, using performance data from real wide-area networks, and proposes a new hybrid 'roaming-server' architecture that addresses weaknesses of existing schemes.Demonstrates the feasibility of the approach using both simulated and real wide-area networks.

Computational complexity results for determining logical relationships between sentences in various fragments of English.Applies techniques from model-theory, proof theory and complexity theory to natural language.Extends earlier work [Journal of Logic, Language and Information 13, 2004, 207-223], where techniques of complexity theory were applied to semantics of fragments of natural language for the first time.

Investigation of the logical power of English expressions involving temporal prepositions, using a formal semantics to map sentences featuring temporal prepositions into an interval temporal logic.Shows that the satisfiability problem for the resulting temporal logic is NExpTime-complete.

Proof that satisfiability and finite satisfiability problems for the two-variable fragment with counting are both in NExpTime, even with succinct coding of counting quantifiers.Settles open problem first mentioned in print by Pacholski et al in 1997.Results used by others to establish new complexity-theoretic results for various expressive description logics (see citing papers).One of a series establishing the complexity of various logics with counting quantifiers including: finite satisfiability problem for the guarded two-variable fragment with counting is in ExpTime [Journal of Logic and Computation 17, 2007, 133-155]; one-variable fragment with counting is in NP [submitted].

Analysis of the expressive power of a spatial description language, where variables range over polyhedral regions of three-dimensional Euclidean space.Shows that this language is expressive enough to characterise any tuple of polyhedral regions up to topological equivalence.Last in a series of papers on spatial description languages interpreted over the Euclidean plane, which form the basis of an extended treatment of the subject in the 'Handbook of Spatial Logics' edited by Marco Aiello, Ian Pratt-Hartmann and Johan van Benthem, Springer, 2007.