Algoval: Evaluation Server Past, Present and Future
Simon Lucas
Computer Science Dept, Essex University
25 January 2002

Architecture Evolution
• Version 1: Centralised evaluation of Java submissions (Spring 2000)
• Version 2: Distributed evaluation using Java RMI (Summer 2001)
• Version 3: Distributed evaluation using XML over HTTP (Spring 2002)

Competitions
• Post-Office Sponsored OCR Competition (Autumn 2000)
• IEEE Congress on Evolutionary Computation 2001
• IEEE WCCI 2002
• ICDAR 2003
• Wide range of contests: OCR, Sequence Recognition, Object Recognition

Sample Results / Statistics / Details / More Details
(example output slides)

Parameterised Algorithms
• Note that league table entries can include the parameters that were used to configure the algorithm
• This allows developers to observe the effect of different parameter settings on the performance measures
• E.g.: problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01

Centralised
• System restricted submissions to be written in Java, for security reasons
  – Java programs can be run within a highly restrictive security manager
• Does not scale well under heavy load
• Many researchers unwilling to convert their algorithm implementations to Java

Centralised II
• Can measure every aspect of an algorithm's performance
  – Speed
  – Memory requirements (static, dynamic)
• All algorithms compete on a level playing field
• Very difficult for an algorithm to cheat

Distributed
• Researchers can test their algorithms against others without submitting their code
• Results on new datasets can be generated immediately for all clients connected to the evaluation server
• Results are generated by the same evaluation method, so meaningful comparisons can be made between different algorithms
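The parameterised league-table entries described above pair an algorithm's class name with a query-style parameter string. A minimal sketch of how such an entry key could be built is shown below; the `EntryKey` class and `build` method are my own illustrative assumptions, not part of the actual Algoval API.

```java
// Hypothetical sketch: builds a parameterised league-table entry key
// like "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01".
// Class and method names are assumptions, not the real Algoval API.
import java.util.LinkedHashMap;
import java.util.Map;

public class EntryKey {

    // Appends each parameter to the algorithm name, using '?' before
    // the first parameter and '&' before the rest.
    public static String build(String algorithm, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(algorithm);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order, so the key is stable.
        Map<String, String> p = new LinkedHashMap<>();
        p.put("n", "4");
        p.put("gap", "11");
        p.put("eps", "0.01");
        System.out.println(build("problems.seqrec.SNTupleRecognizer", p));
    }
}
```

Because the parameters are part of the entry key, two runs of the same algorithm with different settings appear as distinct league-table rows, which is what lets developers compare parameter settings directly.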
Distributed (RMI)
• Based on Java's Remote Method Invocation (RMI)
• Works okay, but client programs still need access to a Java Virtual Machine
• BUT: the algorithms can now be implemented in any language
• However, there may still be some work converting the Java data structures to the native language

Distributed II
• Since most computation is done on the clients' machines, the system scales well
• Researchers can implement their algorithms in any language they choose; the implementation just has to talk to the evaluation proxy on their machine
• When submitting an algorithm it is also possible to specify URLs for the author and the algorithm
• Visitors to the web site can view league tables, then follow links to the algorithm and its implementer

Distributed (RMI) UML Sequence
(sequence diagram slide)

Remote Participation
• Developers download a kit
• Interface their algorithm to the spec
• Run a command-line batch file to invoke their algorithm on a specified problem

Features of RMI
• Handles object serialization
  – Hence: problem specifications can easily include complex data structures
• Fragile: changes to the Java classes may require developers to download a new developer kit
• Does not work well through firewalls
  – HTTP tunnelling can solve some problems, but has limitations (e.g. no callbacks)

XML Version (Future)
• While Java RMI is platform independent (any platform with a JVM), XML is language independent
• XML version is HTTP based
• No known problems with firewalls

XML Version
• Each client (algorithm under test):
  – parses XML objects (e.g. datasets)
  – sends back XML objects (e.g.
pattern classifications) to the server

Pattern Recognition Servers
• Reside at particular URLs
• Can be trained on specified or supplied datasets
• Can respond to recognition requests

Example Request
• Recognize this word: [word image]
• Given the dictionary at:
  – http://ace.essex.ac.uk/viadocs/dic/pygenera.txt
• And the OCR training set at:
  – http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml
• Respond with your 10 best word hypotheses

Example Response
1. MELISSOBLAPTES
2. ENDOMMMASIS
3. HETEROGRAPHIS
4. TRICHOBAPTES
5. HETEROCHROSIS
6. PHLOEOGRAPTIS
7. HETEROCNEPHES
8. DRESCOMPOSIS
9. MESOGRAPHE
10. DIPSOCHARES

Issues
• How general to make the problem specs
  – Could set up separate problems for OCR and face recognition, or a single problem called ImageRecognition
• How does the software effort scale?

Software Scalability
• Suppose we have:
  – A algorithms implemented in L languages
  – D datasets
  – P problems
  – E algorithm evaluators
• How will our software effort scale with respect to these numbers?

Scalability (contd.)
• Consider server and clients
• More effort at the server can mean less effort for clients
• For example, language-specific interfaces and wrappers can be defined
• This makes participation in a particular language much less effort
• This could be done on demand

Summary
• Independent, automatic algorithm evaluation
• Makes sound scientific and economic sense
• Existing system works but has some limitations
• Future XML-based system will overcome these
• Then we need to get people using it
• Future contests will help
• Industry support will benefit both academic research and commercial exploitation
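To make the planned XML-over-HTTP participation concrete, the example OCR request above might be encoded along the following lines. This is purely a sketch: the slides do not show the actual Algoval schema, so every element name here is an assumption; only the two URLs are taken from the slides.

```xml
<!-- Hypothetical sketch: element names are assumptions, not the real
     Algoval schema. Only the two URLs come from the slides. -->
<recognitionRequest>
  <dictionary url="http://ace.essex.ac.uk/viadocs/dic/pygenera.txt"/>
  <trainingSet url="http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml"/>
  <wordImage encoding="base64"><!-- word image data --></wordImage>
  <maxHypotheses>10</maxHypotheses>
</recognitionRequest>
```

The server's reply would then be another XML object carrying the ranked word hypotheses, which any client language with an XML parser can consume: exactly the language independence the XML version is meant to provide.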