Cindy M. Wong, August 2011
Electronic Imaging Lab, University of Alberta
HUMAN-BASED COMPUTATION FOR MICROFOSSIL IDENTIFICATION

Outline
  Introduction
  Evolutionary Prototyping
  Human Interaction
  Computation Algorithms
  Conclusion
Cindy Wong Aug-11

Introduction

Introduction: Motivation
  Image understanding is considered an artificial intelligence (AI) complete problem
  Human-based computation is gaining popularity as a method to solve AI-complete problems
  Progress in this area may be made with a concrete application of sufficient importance
  Microfossil identification is one such application, and it is the focus of this work

Introduction: Crowdsourcing
  [Figure: Humans, Computers]

Introduction: Foraminifera
  Microfossils help to locate hydrocarbon deposits via biostratigraphy and to study prehistoric environmental conditions via geochemistry
  Foraminifera (forams): single-celled protozoa with shells (~1 mm) that live in bodies of water
  [Images: Acarinina, Subbotina, Morozovella]
  Identified manually by experts at present
  Research has been performed on automated identification with limited success

Introduction: Automated Identification
  Rule-based approaches need a person to input features
    Require experts to manually view and manipulate specimens (example: VIDES)
  Artificial neural network based approaches involve training a system
    Need high-quality SEM images (COGNIS), generate high incorrect rates (COGNIS Light), or are difficult to understand (SYRACO)

Evolutionary Prototyping

Evolutionary Prototyping: Design Cycle
  [Diagram: Requirements Refinement, Testing and Validation, Prototype Modification]
  Ideal design cycle because:
    Exploratory research requires validation
    Crowdsourcing is unpredictable
    Modifying old prototypes saves time

Evolutionary Prototyping: Prototypes
  Prototype timeline in years (year 0 is Jan. 1, 2006):
    CASSIE 1 (0 to 1 1/12)
    CASSIE 2 (1 1/12 to 3 1/2)
    Microfossil Quest (3 1/2 to 5 2/3)

Evolutionary Prototyping: First Prototype
  [Diagram: Specimen Acquisition, Computation Algorithms, Human Interaction]
  Computer-aided system for specimen identification and examination (CASSIE) 1 prototype (Jan. 2006–Feb. 2007)
  Requirement: reduce expert workload
  Modification: clustering using image correlation to compare similarity
  Validation: identifications obtained via Microfossil Wiki for analysis

Evolutionary Prototyping: Second Prototype
  [Diagram: Specimen Acquisition, Computation Algorithms, Human Interaction, Specimen Dissemination]
  CASSIE 2 prototype (Feb. 2007–Jun. 2009)
  Requirement: improve digital representations to account for illumination variability
  Modification: automatic video capture
  Validation: ground truth identifications were difficult to obtain, but illumination variability was addressed

Evolutionary Prototyping: Third Prototype
  [Diagram: Specimen Acquisition, Human Interaction, Specimen Dissemination, Computation Algorithms]
  Microfossil Quest prototype (Jun. 2009–Aug. 2011)
  Requirement: transition from a computer-aided to a crowdsourcing system
  Modification: leverage crowdsourcing
  Validation: individual components validated

Evolutionary Prototyping: Languages and Architectures
  [Figure: Quest code organization, execution location, inter- and intra-component interaction, and programming languages]

Human Interaction

Human Interaction: Overview
  Created the Microfossil Quest website to interact with volunteers and inform users
  For this human-based computation system, the human interaction part incorporates citizen science in its design

Human Interaction: Organization
  The Microfossil Quest site uses a menu for non-linear navigation
  Layout goes left-to-right from more specific information to more general information
  [Diagram: Specific to General]

Human Interaction: Home
  Users search the database for a subset of specimens or use the default search
  Users update captions to update specimen identifications
  Website demo (http://www.ece.ualberta.ca/~imagesci/microfossilQuestO865)

Human Interaction: Tutorial
  Training for volunteers and information for other users
  Focus is placed on teaching features
  Topics are organized top-to-bottom, from those requiring the least knowledge to those requiring the most

Human Interaction: System
  Gives an overview of the Microfossil Quest system
  Users are able to click on the different modules to get more details
  [Diagram modules: Users, Specimen Acquisition, Knowledge Base, Computer Intelligence, Human Intelligence]

Computation Algorithms

Computation Algorithms: Overview
  Dynamic hierarchical identification (DHI)
    Unsupervised learning
    Supervised learning
    Dynamic learning
  Experimental results

Computation Algorithms: Unsupervised Learning
  Generates clusters to increase thoroughness
  Does not require user input
  Uses agglomerative hierarchical clustering
  Formation of clusters visualized with trees
  [Figure, five slides: step-by-step clustering of specimens 2104, 2105, 1472, 1205 and 1633 from pairwise similarity values; the final tree shows merge levels 0.9, 0.7, 0.5 and 0.2]

Computation Algorithms: Supervised Learning
  Propagates identifications reliably
  Assumes only some specimen identifications are known (direct identifications)
  Uses the trees to propagate identifications (indirect identifications)
  Propagates identifications according to the majority identification in the cluster
  Assigns a confidence level to indirect identifications according to merge level
  [Figure: tree with leaves labelled M. subb or M. vela and merge levels 0.9, 0.75, 0.51, 0.35 and 0.108]

Computation Algorithms: Dynamic Learning
  Serves to increase throughput with a priority generation algorithm
  Assumes users are only able to identify a small number of specimens at a time
  Encourages users to identify the specimens that increase the average confidence of the dataset the most
  Calculates distance, or amount of improvement if identified, to determine priority (one minus merge level equals new priority)
  [Figure: worked example of priority generation for specimens 2011 to 2017, showing distances (including ∞ and −∞) and the resulting priority order (2) (6) (4) (5) (3) (1)]

Computation Algorithms: Multiple Trees
  [Table: columns Order, Genus, Species; each cell marked known or unknown]
  Computation algorithms depend on the taxonomic detail available for specimens in the tree
  Run algorithms with different trees using specimens from the top to the bottom of the table

Computation Algorithms: Experimental Results
  Results were validated by comparing DHI to a standard classification algorithm: k-nearest neighbours (KNN)
  Testing materials used were 238 specimens with particle-based identifications (ground truth)
  Examined:
    correct identification rates
    incorrect identification rates
    impact of thresholding
    average confidences

Computation Algorithms: Correct Rates
  Correct rates illustrate the thoroughness in dataset identification
  DHI has more thorough and predictable results than KNN

Computation Algorithms: Incorrect Rates
  Incorrect rates show the reliability of the generated identifications in the dataset
  DHI is more reliable and predictable than KNN

Computation Algorithms: Threshold Results
  Lower thresholds imply more leveraging
  Comparing threshold results illustrates how limiting propagation confidence affects throughput of identification

Computation Algorithms: Average Confidence
  Average confidence illustrates how quickly dataset identifications are propagated
  Results predict thoroughness of correct rates

Conclusion

Conclusion: Contributions (Evolutionary Prototyping)
  Created the first crowdsourcing design for microfossil identification
  Developed components of the Microfossil Quest prototype, a crowdsourcing approach evolved from a computer-aided approach
  Provided a case study on developing a crowdsourcing project using the evolutionary prototyping design cycle

Conclusion: Contributions (Human Interaction)
  Unlike most crowdsourcing projects that involve websites, the Microfossil Quest design:
    Enables volunteer control over identification tasks
    Incorporates educational material on the system
  A new interactive digital representation, which presents illumination and depth information, was included in the website; it is a contribution to a coauthored Journal of Microscopy paper

Conclusion: Contributions (Computation Algorithms)
  Created a supervised learning algorithm to propagate identifications using tree structures computed by unsupervised learning
  Created a dynamic learning algorithm, which prioritizes specimens for identification
  Testing of the DHI algorithm verifies an increase in thoroughness, reliability, predictability, and throughput when compared to a benchmark KNN identification algorithm

Acknowledgements
  Thank you to Dr. Dileepan Joseph, Dr. Kamal Ranaweera, and Adam Harrison for their guidance and support
  Thank you to family and friends for their support through both undergraduate and graduate school

Appendix

Special Cases: Genus
  [Figure: correct and incorrect genus rates versus image quality: (left) using specialist ratings of quality (S. Bains); (right) using automatic ratings of quality (Fourier method)]

Special Cases: Species
  [Figure: correct and incorrect species rates versus image quality: (left) using specialist ratings of quality (S. Bains); (right) using automatic ratings of quality (Fourier method)]
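
Sketch: tree-based propagation of identifications

The Unsupervised Learning and Supervised Learning slides describe building a cluster tree and then propagating the majority identification through it, with confidence set by merge level. The following is a minimal Python sketch of that idea, not the thesis implementation. Assumptions not taken from the slides: single linkage, the function names `build_tree` and `propagate`, the rule of using the smallest cluster that contains at least one direct identification, arbitrary tie-breaking, and the specimen names A to E with made-up similarity values.

```python
from collections import Counter


def build_tree(ids, sim):
    """Agglomerative clustering over pairwise similarities.

    Single linkage is an assumption. Returns the merges in the order
    they happen, each as (members, merge_level).
    """
    clusters = [frozenset([i]) for i in ids]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: similarity of the most similar member pair.
                level = max(sim[frozenset((x, y))]
                            for x in clusters[a] for y in clusters[b])
                if best is None or level > best[0]:
                    best = (level, a, b)
        level, a, b = best
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        merges.append((merged, level))
    return merges


def propagate(merges, direct):
    """Assign indirect identifications from direct ones.

    Each unidentified specimen takes the majority label of the smallest
    cluster that contains it and at least one direct identification;
    the confidence is that cluster's merge level. Ties are broken
    arbitrarily (an assumption).
    """
    indirect = {}
    for members, level in merges:  # clusters containing a specimen only grow
        votes = Counter(direct[m] for m in members if m in direct)
        if not votes:
            continue
        label = votes.most_common(1)[0][0]
        for m in members:
            if m not in direct and m not in indirect:
                indirect[m] = (label, level)
    return indirect


# Hypothetical similarity values for five hypothetical specimens.
raw = {
    ("A", "B"): 0.9, ("C", "D"): 0.7, ("A", "C"): 0.5, ("B", "C"): 0.4,
    ("A", "D"): 0.3, ("B", "D"): 0.3, ("A", "E"): 0.2, ("D", "E"): 0.15,
    ("B", "E"): 0.1, ("C", "E"): 0.1,
}
sim = {frozenset(pair): s for pair, s in raw.items()}

merges = build_tree(["A", "B", "C", "D", "E"], sim)
direct = {"A": "M. subb", "C": "M. vela", "D": "M. vela"}
indirect = propagate(merges, direct)
print(indirect)
```

In this toy run, specimen B joins A at a high merge level and so inherits A's identification with high confidence, while E only joins the tree at the root and receives the overall majority label with low confidence. A confidence threshold, as on the Threshold Results slide, could be applied by discarding indirect identifications whose level falls below it.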