Cindy M. Wong, August 2011
Electronic Imaging Lab, University of Alberta

HUMAN-BASED COMPUTATION FOR MICROFOSSIL IDENTIFICATION

Outline
- Introduction
- Evolutionary Prototyping
- Human Interaction
- Computation Algorithms
- Conclusion

Introduction

Introduction: Motivation
- Image understanding is considered an artificial intelligence (AI) complete problem
- Human-based computation is gaining popularity as a method to solve AI-complete problems
- Progress in this area may be made with a concrete application of sufficient importance
- Microfossil identification is one such application, and it is the focus of this work

Introduction: Crowdsourcing
[Figure labels: Humans, Computers]

Introduction: Foraminifera
- Microfossils help to locate hydrocarbon deposits via biostratigraphy and to study prehistoric environmental conditions via geochemistry
- Foraminifera (forams): single-celled protozoa with shells (~1 mm) that live in bodies of water
  [Images: Acarinina, Subbotina, Morozovella]
- Identified manually by experts at present
- Research has been performed on automated identification with limited success

Introduction: Automated Identification
- Rule-based approaches need a person to input features
  - Require experts to manually view and manipulate specimens (example: VIDES)
- Artificial neural network based approaches involve training a system
  - Need high-quality SEM images (COGNIS), generate high incorrect rates (COGNIS Light), or are difficult to understand (SYRACO)

Evolutionary Prototyping

Evolutionary Prototyping: Design Cycle
[Diagram labels: Requirements Refinement, Testing and Validation, Prototype Modification]
- Ideal design cycle because:
  - Exploratory research requires validation
  - Crowdsourcing is unpredictable
  - Modifying old prototypes saves time

Evolutionary Prototyping: Prototypes
- Prototype timeline in years (year 0 is Jan. 1, 2006):
  - CASSIE 1 (0 to 1 1/12)
  - CASSIE 2 (1 1/12 to 3 1/2)
  - Microfossil Quest (3 1/2 to 5 2/3)

Evolutionary Prototyping: First Prototype
[Diagram labels: Specimen Acquisition, Computation Algorithms, Human Interaction]
- Computer-aided system for specimen identification and examination (CASSIE) 1 prototype (Jan. 2006–Feb. 2007)
- Requirement: reduce expert workload
- Modification: clustering using image correlation to compare similarity
- Validation: identifications obtained via Microfossil Wiki for analysis

Evolutionary Prototyping: Second Prototype
[Diagram labels: Specimen Acquisition, Computation Algorithms, Human Interaction, Specimen Dissemination]
- CASSIE 2 prototype (Feb. 2007–Jun. 2009)
- Requirement: improve digital representations to account for illumination variability
- Modification: automatic video capture
- Validation: difficulty obtaining ground truth identifications, but variability addressed

Evolutionary Prototyping: Third Prototype
[Diagram labels: Specimen Acquisition, Human Interaction, Specimen Dissemination, Computation Algorithms]
- Microfossil Quest prototype (Jun. 2009–Aug. 2011)
- Requirement: transition from computer-aided to crowdsourcing system
- Modification: leverage crowdsourcing
- Validation: individual components validated

Evolutionary Prototyping: Languages and Architectures
- Quest code organization, execution location, inter- and intra-component interaction, and programming languages

Human Interaction

Human Interaction: Overview
- Created the Microfossil Quest website to interact with volunteers and inform users
- For this human-based computation system, the human interaction part incorporates citizen science in its design

Human Interaction: Organization
- The Microfossil Quest site uses a menu for non-linear navigation
- Layout goes left-to-right from more specific information to more general information
  [Diagram labels: Specific, General]

Human Interaction: Home
- Users search the database for a subset of specimens or use the default search
- Users update captions to update specimen identifications
- Website demo (http://www.ece.ualberta.ca/~imagesci/microfossilQuestO865)

Human Interaction: Tutorial
- Training for volunteers and information for other users
- Focus is placed on teaching features
- Topics are organized top-to-bottom, from those requiring the least knowledge to those requiring the most

Human Interaction: System
- Gives an overview of the Microfossil Quest system
  [Diagram labels: Users, Specimen Acquisition, Knowledge Base, Computer Intelligence, Human Intelligence]
- Users are able to click on the different modules to get more details

Computation Algorithms

Computation Algorithms: Overview
- Dynamic hierarchical identification (DHI)
  - Unsupervised learning
  - Supervised learning
  - Dynamic learning
- Experimental results

Computation Algorithms: Unsupervised Learning
- Generates clusters to increase thoroughness
- Does not require user input
- Uses agglomerative hierarchical clustering
- Formation of clusters visualized with trees

Computation Algorithms: Unsupervised Learning
[Figure, shown over five slides: step-by-step formation of a cluster tree for specimens 2104, 2105, 1472, 1205, and 1633; the final tree has merge levels 0.9, 0.7, 0.5, and 0.2]

Computation Algorithms: Supervised Learning
- Propagates identifications reliably
- Assumes only some specimen identifications are known (direct identifications)
- Uses the trees to propagate identifications (indirect identifications)
- Propagates identifications according to majority identification in the cluster
- Assigns confidence level for indirect identifications according to merge level

Computation Algorithms: Supervised Learning
[Figure: cluster tree with identifications M. subb and M. vela propagated between specimens; merge levels shown are 0.9, 0.75, 0.51, 0.35, and 0.108]

Computation Algorithms: Dynamic Learning
- Serves to increase throughput with a priority generation algorithm
- Assumes users are only able to identify a small number of specimens at a time
- Encourages users to identify the specimens that would increase the average confidence of the dataset the most
- Calculates distance, or amount of improvement if identified, to determine priority (one minus merge level equals new priority)

Computation Algorithms: Dynamic Learning
[Figure: worked priority generation example on a tree with leaves labelled 2011 to 2017, including the computation 1 - 0.9; the resulting priority ranks are (2), (6), (4), (5), (3), (1)]

Computation Algorithms: Multiple Trees
[Table: availability of Order, Genus, and Species identifications for each group of specimens; legend: unknown, known]
- Computation algorithms depend on taxonomic detail available for specimens in the tree
- Run algorithms with different trees using specimens from the top to the bottom of the table

Computation Algorithms: Experimental Results
- Results were validated by comparing DHI to a standard classification algorithm: k-nearest neighbours (KNN)
- Testing materials used were 238 specimens with particle-based identifications (ground truth)
- Examined:
  - correct identification rates
  - incorrect identification rates
  - impact of thresholding
  - average confidences

Computation Algorithms: Correct Rates
- Correct rates illustrate the thoroughness in dataset identification
- DHI has more thorough and predictable results than KNN

Computation Algorithms: Incorrect Rates
- Incorrect rates show the reliability of the generated identifications in the dataset
- DHI is more reliable and predictable than KNN

Computation Algorithms: Threshold Results
- Lower thresholds imply more leveraging
- Comparing threshold results illustrates how limiting propagation confidence affects throughput of identification

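The supervised propagation and thresholding steps described in the slides above can be illustrated with a minimal Python sketch. It is not the DHI implementation: the `Node` class, the `propagate` function, the tree shape, the choice of labels, the confidence of 1.0 for direct identifications, and the handling of ties are all assumptions made for illustration. Only the general idea (majority identification within a cluster, confidence taken from the merge level, a threshold limiting propagation) comes from the slides.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A cluster in the tree. Leaves carry a specimen id; internal nodes a merge level."""
    merge_level: float = 1.0
    children: List["Node"] = field(default_factory=list)
    specimen: Optional[str] = None


def leaves(node: Node) -> List[str]:
    """Return the specimen ids under a node, left to right."""
    if node.specimen is not None:
        return [node.specimen]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def propagate(root: Node, direct: Dict[str, str],
              threshold: float = 0.0) -> Dict[str, Tuple[str, float]]:
    """Map each identified specimen to (label, confidence).

    Direct identifications get confidence 1.0 (assumption). An unidentified
    specimen takes the majority direct label of the tightest cluster that
    has a clear majority, with confidence equal to that cluster's merge
    level. Clusters whose merge level is below `threshold` do not propagate.
    """
    result = {s: (label, 1.0) for s, label in direct.items()}

    def visit(node: Node) -> None:
        if node.specimen is not None:
            return
        # Visit children first so tighter clusters assign labels first.
        for child in node.children:
            visit(child)
        if node.merge_level < threshold:
            return
        members = leaves(node)
        votes = Counter(direct[s] for s in members if s in direct)
        if not votes:
            return
        (label, count), *rest = votes.most_common()
        if rest and rest[0][1] == count:
            return  # tie: leave undecided at this level (assumption)
        for s in members:
            result.setdefault(s, (label, node.merge_level))

    visit(root)
    return result


# Hypothetical tree over the specimen ids and merge levels shown on the slides.
tight = Node(0.9, [Node(specimen="2104"), Node(specimen="2105")])
pair = Node(0.7, [Node(specimen="1472"), Node(specimen="1205")])
root = Node(0.2, [Node(0.5, [tight, pair]), Node(specimen="1633")])
direct = {"2104": "M. subb", "1472": "M. vela", "1205": "M. vela"}

loose = propagate(root, direct, threshold=0.0)
strict = propagate(root, direct, threshold=0.5)
print(loose["2105"], loose["1633"], "1633" in strict)
```

With the lower threshold, specimen 1633 receives a low-confidence indirect identification from the root cluster; with a threshold of 0.5 it stays unidentified, which is one way to read "lower thresholds imply more leveraging".
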
Computation Algorithms: Average Confidence
- Average confidence illustrates how quickly dataset identifications are propagated
- Results predict thoroughness of correct rates

Conclusion

Conclusion: Contributions (Evolutionary Prototyping)
- Created the first crowdsourcing design for microfossil identification
- Developed components of the Microfossil Quest prototype, a crowdsourcing approach evolved from a computer-aided approach
- Provided a case study on developing a crowdsourcing project using the evolutionary prototyping design cycle

Conclusion: Contributions (Human Interaction)
- Unlike most crowdsourcing projects that involve websites, the Microfossil Quest design:
  - Enables volunteer control over identification tasks
  - Incorporates educational material on the system
- A new interactive digital representation, which presents illumination and depth information, was included in the website; it is a contribution to a coauthored Journal of Microscopy paper

Conclusion: Contributions (Computation Algorithms)
- Created a supervised learning algorithm to propagate identifications using tree structures computed by unsupervised learning
- Created a dynamic learning algorithm, which prioritizes specimens for identification
- Testing of the DHI algorithm verifies an increase in thoroughness, reliability, predictability, and throughput when compared to a benchmark KNN identification algorithm

Acknowledgements
- Thank you to Dr. Dileepan Joseph, Dr. Kamal Ranaweera, and Adam Harrison for their guidance and support
- Thank you to family and friends for their support through both undergraduate and graduate school

Appendix

Special Cases: Genus
- Correct and incorrect genus rates versus image quality: (left) using specialist ratings of quality (S. Bains); (right) using automatic ratings of quality (Fourier method)

Special Cases: Species
- Correct and incorrect species rates versus image quality: (left) using specialist ratings of quality (S. Bains); (right) using automatic ratings of quality (Fourier method)
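
As a supplementary illustration of the agglomerative hierarchical clustering named under Unsupervised Learning, the following minimal Python sketch builds a merge sequence from pairwise similarities. The slides do not state which linkage rule is used, so average linkage is an assumption here, as are the `agglomerate` function, the similarity values, and the use of a dictionary keyed by specimen pairs. The specimen ids are borrowed from the slides only as labels.

```python
from itertools import combinations


def agglomerate(ids, sim):
    """Greedy agglomerative clustering on a similarity table.

    ids: list of specimen ids.
    sim: dict mapping frozenset({a, b}) to a similarity in [0, 1].
    Returns the merges in order as (members_a, members_b, merge_level).
    """
    clusters = [(i,) for i in ids]
    merges = []

    def linkage(a, b):
        # Average linkage (assumption): mean similarity over all cross pairs.
        total = sum(sim[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the two most similar clusters; that similarity is the merge level.
        a, b = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        level = linkage(a, b)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, level))
    return merges


# Hypothetical similarities, e.g. from image correlation between specimens.
ids = ["2104", "2105", "1472", "1205"]
sim = {frozenset(pair): 0.5 for pair in combinations(ids, 2)}
sim[frozenset(("2104", "2105"))] = 0.9
sim[frozenset(("1472", "1205"))] = 0.7

merges = agglomerate(ids, sim)
for a, b, level in merges:
    print(a, "+", b, "at merge level", level)
```

Each merge becomes an internal node of the tree, and its merge level is the value later used as the confidence of identifications propagated through that node.
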