Algoval: Evaluation Server
Past, Present and Future
Simon Lucas
Computer Science Dept
Essex University
25 January, 2002
Architecture Evolution
• Version 1: Centralised evaluation of
Java submissions (Spring 2000)
• Version 2: Distributed evaluation using
Java RMI (Summer 2001)
• Version 3: Distributed evaluation using
XML over HTTP (Spring 2002)
Competitions
• Post-Office Sponsored OCR Competition
(Autumn 2000)
• IEEE Congress on Evolutionary Computation
2001
• IEEE WCCI 2002
• ICDAR 2003
• Wide range of contests – OCR, Sequence
Recognition, Object Recognition
Sample Results: Statistics, Details, More Details
(screenshot slides)
Parameterised Algorithms
• Note that league table entries can include the
parameters that were used to configure the
algorithm
• This allows developers to observe the results
of different parameter settings on the
performance measures
• E.g.:
problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01
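
A minimal sketch of how such a parameter string might be decoded. The class name and the parameters n, gap and eps come from the example above; the parsing code itself is illustrative, not Algoval's actual implementation.

import java.util.LinkedHashMap;
import java.util.Map;

public class ParamSpec {
    final String className;
    final Map<String, String> params = new LinkedHashMap<String, String>();

    // Split "some.Class?k1=v1&k2=v2" into a class name and key/value pairs.
    ParamSpec(String spec) {
        int q = spec.indexOf('?');
        className = (q < 0) ? spec : spec.substring(0, q);
        if (q >= 0) {
            for (String pair : spec.substring(q + 1).split("&")) {
                String[] kv = pair.split("=", 2);
                params.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        }
    }

    public static void main(String[] args) {
        ParamSpec p = new ParamSpec(
                "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01");
        System.out.println(p.className); // problems.seqrec.SNTupleRecognizer
        System.out.println(p.params);    // {n=4, gap=11, eps=0.01}
    }
}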
Centralised
• System restricted submissions to be written in Java – for security reasons
– Java programs can be run within a highly restrictive security manager (sketched below)
• Does not scale well under heavy load
• Many researchers unwilling to convert
their algorithm implementations to Java
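
A minimal sketch of the kind of sandbox the slide refers to, using the SecurityManager API of 2002-era Java (deprecated in recent JDKs); the specific checks below are illustrative, not Algoval's actual policy.

import java.security.Permission;

public class RestrictiveManager extends SecurityManager {
    // Deny file and network access to submitted code. Everything else is
    // allowed here only to keep the sketch short; a real policy would be
    // far more thorough.
    public void checkPermission(Permission perm) {
        String kind = perm.getClass().getSimpleName();
        if (kind.equals("FilePermission") || kind.equals("SocketPermission")) {
            throw new SecurityException("denied: " + perm);
        }
    }

    public static void main(String[] args) {
        System.setSecurityManager(new RestrictiveManager());
        // Submitted code executed after this point is subject to the checks above.
    }
}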
Centralised II
• Can measure every aspect of an algorithm's performance
– Speed
– Memory requirements (static, dynamic)
• All algorithms compete on a level
playing field
• Very difficult for an algorithm to cheat
Distributed
• Researchers can test their algorithms against
others without submitting their code
• Results on new datasets can be generated
immediately for all clients that are connected
to the evaluation server
• Results are generated by the same
evaluation method.
• Hence meaningful comparisons can be made
between different algorithms.
Distributed (RMI)
• Based on Java’s Remote Method Invocation
(RMI)
• Works okay, but client machines still need a Java Virtual Machine
• BUT: the algorithms can now be implemented
in any language
• However: there may still be some work
converting the Java data structures to the
native language
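
A minimal sketch of what a remote evaluation interface might look like under RMI; the interface and class names below are hypothetical, not Algoval's actual API.

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical data carriers; RMI serializes these between server and client.
class ProblemSpec implements Serializable { String name; String[] items; }
class Answer implements Serializable { String label; }
class Result implements Serializable { double score; }

// Hypothetical remote interface: the client proxy fetches a problem,
// runs the local algorithm on it, and submits answers for scoring.
interface Evaluator extends Remote {
    ProblemSpec getProblem(String problemName) throws RemoteException;
    Result evaluate(String problemName, Answer[] answers) throws RemoteException;
}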
Distributed II
• Since most computation is done on the
clients' machines, it scales well.
• Researchers can implement their algorithms
in any language they choose - it just has to
talk to the evaluation proxy on their machine.
• When submitting an algorithm it is also
possible to specify URLs for the author and
the algorithm
• Visitors to the web-site can view league
tables then follow links to the algorithm and
its implementer.
Distributed (RMI): UML Sequence
(sequence diagram slide)
Remote Participation
• Developers download a kit
• Interface their algorithm to the spec.
• Run a command-line batch file to invoke
their algorithm on a specified problem
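
A minimal sketch of the kind of interface such a kit might specify; the names (Algorithm, train, classify) are hypothetical, chosen only to illustrate the idea.

// Hypothetical developer-kit interface; the real kit's names and types may
// differ. The batch file would load an implementation by class name and run
// it against the chosen problem.
public interface Algorithm {
    void train(String[] samples, String[] labels); // learn from the training set
    String classify(String sample);                // label a single test item
}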
Features of RMI
• Handles Object Serialization
• Hence: problem specifications can easily
include complex data structures
• Fragile! – changes to the Java classes may require developers to download a new developer kit (see the sketch below)
• Does not work well through firewalls
• HTTP Tunnelling can solve some problems,
but has limitations (e.g. no callbacks)
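
One concrete face of that fragility: Java's default serialization derives a version ID from the class's structure, so editing a class can break compatibility with previously downloaded kits. Pinning serialVersionUID is the standard mitigation; whether Algoval did this is not stated, so the sketch below is illustrative only.

import java.io.Serializable;

public class PinnedSpec implements Serializable {
    // With an explicit serialVersionUID, backward-compatible edits (such as
    // adding a field) no longer break deserialization of old streams; without
    // it, Java computes the ID from the class shape, so almost any edit
    // changes it.
    private static final long serialVersionUID = 1L;
    public String name;
    public String[] items;
}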
XML Version (Future)
• While Java RMI is platform independent
(any platform with a JVM), XML is
language independent
• XML version is HTTP based
• No known problems with firewalls
XML Version
• Each client (algorithm under test)
– parses XML objects (e.g. datasets)
– sends back XML objects (e.g. pattern
classifications) to the server
Pattern Recognition Servers
• Reside at particular URLs
• Can be trained on specified or supplied
datasets
• Can respond to recognition requests
Example Request
• Recognize this word: (word image omitted)
• Given the dictionary at:
– http://ace.essex.ac.uk/viadocs/dic/pygenera.txt
• And the OCR training set at:
– http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml
• Respond with your 10 best word hypotheses
Example Response
1. MELISSOBLAPTES
2. ENDOMMMASIS
3. HETEROGRAPHIS
4. TRICHOBAPTES
5. HETEROCHROSIS
6. PHLOEOGRAPTIS
7. HETEROCNEPHES
8. DRESCOMPOSIS
9. MESOGRAPHE
10. DIPSOCHARES
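
A minimal sketch of this request/response exchange from the client side, assuming a plain HTTP POST carrying an XML request. Only the dictionary and training-set URLs come from the slides; the endpoint URL and the XML element names are invented for illustration.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RecognitionClient {
    public static void main(String[] args) throws IOException {
        // Hypothetical request document; element names are illustrative.
        String request =
            "<recognitionRequest>" +
            "<dictionary url='http://ace.essex.ac.uk/viadocs/dic/pygenera.txt'/>" +
            "<trainingSet url='http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml'/>" +
            "<nBest>10</nBest>" +
            "</recognitionRequest>";

        // Hypothetical server endpoint.
        URL server = new URL("http://ace.essex.ac.uk/algoval/recognize");
        HttpURLConnection con = (HttpURLConnection) server.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "text/xml");
        OutputStream out = con.getOutputStream();
        out.write(request.getBytes("UTF-8"));
        out.close();

        // Read back the XML response listing the word hypotheses.
        BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line);
        }
        in.close();
    }
}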
Issues
• How general to make problem specs
– Could set up separate problems for OCR
and face recognition, or a single problem
called ImageRecognition
• How does the software effort scale?
Software Scalability
• Suppose we have:
– A algorithms implemented in L languages
– D datasets
– P problems
– E algorithm evaluators
• How will our software effort scale with
respect to these numbers?
Scalability (contd.)
• Consider server and clients
• More effort at the server can mean less
effort for clients
• For example, language specific
interfaces and wrappers can be defined
• This makes participation in a particular
language much less effort
• This could be done on demand
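
A rough way to make this trade-off concrete, under the assumption (mine, not the slides') that protocol glue dominates interfacing effort: let c_glue be the cost of one author writing glue from scratch, c_wrap the cost of one server-side language wrapper, and c_use the small cost of plugging an algorithm into an existing wrapper. Then, with A algorithms in L languages:

E_{\text{no wrappers}} \approx A \cdot c_{\text{glue}},
\qquad
E_{\text{wrappers}} \approx L \cdot c_{\text{wrap}} + A \cdot c_{\text{use}},
\qquad c_{\text{use}} \ll c_{\text{glue}}

Once the L wrappers exist, each extra algorithm costs only c_use, which is why providing wrappers at the server, even on demand, pays off as A grows.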
Summary
• Independent, automatic algorithm evaluation
• Makes sound scientific and economic sense
• Existing system works but has some
limitations
• Future XML-based system will overcome
these
• Then need to get people using this
• Future contests will help
• Industry support will benefit both academic
research and commercial exploitation