Semi-Supervised Training for Appearance-Based Statistical Object Detection Methods
Charles Rosenberg
Thesis Oral, May 10, 2004
Thesis Committee: Martial Hebert (co-chair), Sebastian Thrun (co-chair), Henry Schneiderman, Avrim Blum, Tom Minka (Microsoft Research)

Motivation: Object Detection
[Figure: example eye detections from the Schneiderman detector.]
• Modern object detection systems "work".
• They require a lot of manually labeled training data.
• How can we reduce the cost of training data?

Approach: Semi-Supervised Training
• Supervised training: costly, fully labeled data.
• Semi-supervised training: fully and weakly labeled data.
• Goal: develop a semi-supervised approach for the object detection problem and characterize the issues.

What is Semi-Supervised Training?
• Supervised training
  – The standard training approach
  – Training with fully labeled data
• Semi-supervised training
  – Training with a combination of fully labeled data and unlabeled or weakly labeled data
• Weakly labeled data
  – Certain label values are unknown
  – E.g. the object is present, but its location and scale are unknown
  – Labeling is relatively "cheap"
• Unlabeled data
  – No label information is known

Issues for Object Detection
• What semi-supervised approaches are applicable?
  – Ability to handle the unique aspects of the object detection problem
  – Compatibility with existing detector implementations
• What are the practical concerns?
  – Object detector interactions
  – Training data issues
  – Detector parameter settings
• What kind of performance gain is possible?
  – How much labeled training data is needed?

Contributions
• Devised an approach which achieves substantial performance gains through semi-supervised training.
• Comprehensive evaluation of semi-supervised training applied to object detection.
• Detailed characterization and comparison of the semi-supervised approaches used.
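The label categories defined above can be made concrete with a small data-structure sketch. This is illustrative only; the field names are hypothetical, and the thesis detectors actually represent fully labeled examples as images plus pixel masks or landmark locations.

```python
# Illustrative sketch of the three label types defined above.
# Field names are hypothetical stand-ins, not from the thesis.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Example:
    image_path: str
    object_present: Optional[bool] = None       # None => no label information
    location: Optional[Tuple[int, int]] = None  # None => location unknown
    scale: Optional[float] = None               # None => scale unknown

full = Example("img1.png", True, (120, 64), 1.0)  # fully labeled: all fields known
weak = Example("img2.png", True)                  # weakly labeled: object present,
                                                  # location/scale unknown ("cheap")
unlab = Example("img3.png")                       # unlabeled: nothing known
```

The point of the sketch is that a weakly labeled example carries strictly less information than a fully labeled one, which is what makes it cheaper to collect.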
Presentation Outline
• Introduction
• Background
• Semi-Supervised Training Approach
• Analysis: Filter-Based Detector
• Analysis: Schneiderman Detector
• Conclusions and Future Work

What is Unique About Object Detection?
• Complex feature set: high dimensional, continuous, with a complex distribution.
• Large inherent variation: lighting, viewpoint, scale, location, etc.
• Many examples per training image: many negative examples and a very small number of positive examples.
• Negative examples are free.
• Large class overlap: the object class is a "subset" of the clutter class P(X).

Background
• Graph-based approaches
  – A graph is constructed to represent the relationships between the labeled and unlabeled data; the construction method is important.
  – Edges in the graph are weighted according to a distance measure.
  – Blum and Chawla, ICML 2001; Szummer and Jaakkola, NIPS 2001; Zhu, Ghahramani, and Lafferty, ICML 2003.
• Information regularization
  – Explicit about the information transferred from P(X) to P(Y|X).
  – Szummer and Jaakkola, NIPS 2002; Corduneanu and Jaakkola, UAI 2003.
• Multiple instance learning
  – Addresses multiple examples per data element.
  – Dietterich, Lathrop, and Lozano-Perez, AI 1997; Maron and Lozano-Perez, NIPS 1998; Zhang and Goldman, NIPS 2001.
• Transduction and other methods.

Semi-Supervised Training Approaches
• Expectation-Maximization (EM)
  – Batch algorithm: all data are processed at each iteration.
  – Soft class assignments: a likelihood distribution over class labels, recomputed at each iteration.
• Self-training
  – Incremental algorithm: data are added to the active pool at each iteration.
  – Hard class assignments: the most likely class is assigned, and labels do not change once assigned.

Semi-Supervised Training with EM
1. Train the initial detector model with the initial labeled data set.
2. Expectation step: run the detector on the weakly labeled set and compute the most likely detection.
3. Compute the expected statistics of the fully labeled examples and of the weakly labeled examples weighted by their class likelihoods.
4. Maximization step: update the parameters of the detection model.
5. Repeat for a fixed number of iterations or until convergence.
(Dempster, Laird, and Rubin, 1977; Nigam, McCallum, Thrun, and Mitchell, 1999.)

Semi-Supervised Training with Self-Training
1. Train the detector model with the labeled data set.
2. Run the detector on the weakly labeled set and compute the most likely detection.
3. Score each detection with the selection metric.
4. Select the m best-scoring examples and add them to the labeled training set.
5. Repeat until the weakly labeled data are exhausted or until some other stopping criterion is met.
(Nigam and Ghani, 2000; Moreno and Agaarwal, ICML 2003.)

Self-Training Selection Metrics
• Detector confidence
  – Score = detection confidence.
  – Intuitively appealing, but can prove problematic in practice.
• Nearest-neighbor (NN) distance
  – Score = minimum distance between the detection and the labeled examples (data point score = minimum distance).

Selection Metric Behavior
[Animation sequence over several slides comparing the confidence metric and the nearest-neighbor (NN) metric as class 1, class 2, and unlabeled points are incorporated.]

Semi-Supervised Training & Computer Vision
• EM approaches
  – S. Baluja. Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data. NIPS 1998.
  – R. Fergus, P. Perona, A. Zisserman. Object Class Recognition by Unsupervised Scale-Invariant Learning. CVPR 2003.
• Self-training
  – A. Selinger. Minimally Supervised Acquisition of 3D Recognition Models from Cluttered Images. CVPR 2001.
• Summary
  – Reasonable performance improvements are reported.
  – "One of" experiments.
  – No insight into the issues or into general applicability.

Filter-Based Detector
[Diagram: input image x_i → filter bank → feature vector f_i → object and clutter Gaussian mixture models (M_o + M_c).]

Filter-Based Detector Overview
• Input features and model
  – Features = the outputs of 20 filters at each pixel location.
  – Generative model = a separate Gaussian mixture model for the object class and the clutter class.
  – A single model is used for all locations on the object.
• Detection
  – Compute the filter responses and the likelihood under the object and clutter models at each pixel location.
  – A "spatial model" is used to aggregate pixel responses into object-level responses.

Spatial Model
[Figure: training images and object masks → log-likelihood ratio → spatial model → example detection.]

Typical Example Filter Model Detections
[Figure: sample detection plots and log-likelihood ratio plots.]

Filter-Based Detector Overview
• Fully supervised training
  – A fully labeled example = an image + a pixel mask.
  – The Gaussian mixture model parameters are trained.
  – The spatial model is trained from the pixel masks.
• Semi-supervised training
  – A weakly labeled example = an image containing the object.
  – The initial model is trained using the fully labeled object and clutter data.
  – The spatial model and clutter class model are fixed once trained with the initial labeled data set.
  – EM and self-training variants are evaluated.

Self-Training Selection Metrics
• Confidence-based selection metric
  – The selection score is the detector odds ratio P(Y = object | X) / P(Y = clutter | X).
• Nearest-neighbor (NN) selection metric
  – The selection score is the distance to the closest labeled example.
  – The distance is based on a model of each weakly labeled example (data point score = minimum distance).

Filter-Based Experiment Details
• Training data
  – 12 images of a desktop telephone + clutter, viewpoints +/- 90 degrees.
  – Roughly constant scale and lighting conditions.
  – 96 clutter-only images.
• Experimental variations
  – 12 repetitions with different fully/weakly labeled training data splits.
• Testing data
  – 12 images, a disjoint set, similar imaging conditions.
[Figure: examples of a correct detection and an incorrect detection.]

Example Filter Model Results
[Figure: detections using labeled data only, self-training with the confidence metric, expectation-maximization, and self-training with the NN metric.]

Single Image Semi-Supervised Results
• Labeled only = 26.7%
• Confidence metric = 34.2%
• Expectation-maximization = 19.2%
• 1-NN selection metric = 47.5%

Two Image Semi-Supervised Results
[Figure: reference image with close, near, and far pairings.]
• Labeled data only + near pair = 52.5%
• 4-NN metric + near pair = 85.8%

Example Schneiderman Face Detections
[Figure: face detection examples.]

Schneiderman Detector Details (Schneiderman 1998, 2000, 2003, 2004)
[Diagram: detection process — wavelet transform, feature construction, search over location and scale, classifier score λ_1 log(P(F_1|o)/P(F_1|c)) + …; training process — wavelet transform, feature search, feature selection.]
(Feature selection in the training process uses AdaBoost.)

Schneiderman Detector Training Data
• Fully supervised training
  – Fully labeled examples with landmark locations.
• Semi-supervised training
  – A weakly labeled example = an image containing the object.
  – The initial model is trained using the fully labeled data.
  – Variants of self-training are evaluated.

Self-Training Selection Metrics
• Confidence-based selection metric
  – The classifier output / odds ratio:
    λ_1 log(P(F_1|o)/P(F_1|c)) + λ_2 log(P(F_2|o)/P(F_2|c)) + … + λ_r log(P(F_r|o)/P(F_r|c))
• Nearest-neighbor selection metric
  – Preprocessing = a high-pass filter + variance normalization.
  – Mahalanobis distance to the closest labeled example:
    Score(W_i) = min_j Mah(g(W_i), g(L_j), Σ)
[Figure: labeled images and a candidate image.]

Schneiderman Experiment Details
• Training data
  – 231 images from the FERET data set and the web.
  – Multiple eyes per image = 480 training examples.
  – 80 synthetic variations in position, scale, and orientation.
  – Native object resolution = 24x16 pixels.
  – 15,000 non-object examples from clutter images.
• Evaluation metric
  – Detections within +/- 0.5 object radius and +/- 1 scale octave are counted as correct.
  – Area under the ROC curve (AUC) is the performance measure.
  – ROC curve = receiver operating characteristic curve: detection rate (in percent) vs. number of false positives.
• Experimental variations
  – 5-10 runs with random data splits per experiment.
• Experimental complexity
  – Training the detector = one iteration.
  – One iteration = 12 CPU hours on a 2 GHz class machine.
  – One run = 10 iterations = 120 CPU hours = 5 CPU days.
  – One experiment = 10 runs = 50 CPU days.
  – All experiments took approximately 3 CPU years.
• Testing data
  – A separate set of 44 images with 102 examples.

Example Detection Results
[Figures: detections with fully labeled data only vs. fully labeled + weakly labeled data.]

When Can Weakly Labeled Data Help?
[Plot: full-data-normalized AUC performance vs. fully labeled training set size on a log scale, with saturated, smooth, and failure regimes marked.]
• There are three regimes of operation: saturated, smooth, and failure.
• Weakly labeled data can help in the "smooth" regime.

Performance of Confidence Metric Self-Training
[Plot: full-data-normalized AUC for confidence metric self-training at fully labeled training set sizes of 24, 30, 34, 40, 48, and 60.]
• Improved performance over a range of data set sizes.
• Not all improvements are significant at the 95% level.

Performance of NN Metric Self-Training
[Plot: full-data-normalized AUC for NN metric self-training at fully labeled training set sizes of 24, 30, 34, 40, 48, and 60.]
• Improved performance over a range of data set sizes.
• All improvements are significant at the 95% level.

Changes to Self-Training Behavior
[Plots: base-data-normalized AUC vs. iteration number for the confidence metric and for the MSE/NN metrics.]
• The NN metric performance trend is level or upwards.

Example Training Image Progression
[Figures: training images selected at iterations 1-5, with scores under the confidence metric (0.822, 0.770, 0.798, 0.798, 0.745, 0.759) and the NN metric (0.822, 0.867, 0.882, 0.922, 0.931, 0.906).]

Training Data Size
[Plots: weakly labeled data set size, and the ratio of weakly to fully labeled data, vs. fully labeled training set size (24, 30, 34, 40, 48, 60).]
• How much weakly labeled data is used? The amount is relatively constant over the initial data set size.

Contributions
• Devised an approach which achieves substantial performance gains through semi-supervised training.
• Comprehensive evaluation (3 CPU years) of semi-supervised training applied to object detection.
• Detailed characterization and comparison of the semi-supervised approaches used; much more analysis and many more details are in the thesis.

Future Work
• Enabling the use of training images with clutter for context
  – Context priming: A. Torralba, P. Sinha. ICCV 2001; A. Torralba, K. Murphy, W. Freeman, M. Rubin. ICCV 2003.
• Training with weakly labeled data only
  – Online robot learning.
  – Mining the web for object detection: K. Barnard, D. Forsyth. ICCV 2001; K. Barnard, P. Duygulu, N. de Freitas, D. Forsyth, D. Blei, M. Jordan. JMLR 2003.

Conclusions
• Semi-supervised training can be practically applied to object detection to good effect.
• The self-training approach can substantially outperform EM.
• The selection metric is crucial for self-training performance.
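The self-training procedure evaluated above (train on the labeled set, score detections on the weakly labeled images with a selection metric, promote the best-scoring examples with hard labels) can be sketched as follows. This is a minimal sketch, not the thesis implementation: `fit`, `detect`, and `dist` are hypothetical stand-ins for the actual detector training, detection, and distance routines, and the nearest-neighbor selection metric is shown because it performed best in the experiments.

```python
# Sketch of the self-training loop with a nearest-neighbor selection metric.
# fit, detect, and dist are hypothetical stand-ins for the real detector;
# the thesis uses the filter-based and Schneiderman detectors here.

def nn_score(example, labeled_examples, dist):
    """NN selection metric: minimum distance to a labeled example (lower = better)."""
    return min(dist(example, l) for l in labeled_examples)

def self_train(labeled, weakly_labeled, fit, detect, dist, m=1):
    """Iteratively promote the m best-scoring weakly labeled detections."""
    labeled = list(labeled)
    pool = list(weakly_labeled)
    while pool:
        model = fit(labeled)                       # retrain on the current labeled set
        detections = [detect(model, img) for img in pool]
        # Score each detection; the NN metric prefers examples close to labeled data.
        scored = sorted(zip(pool, detections),
                        key=lambda p: nn_score(p[1], labeled, dist))
        for img, det in scored[:m]:
            labeled.append(det)                    # hard label, never revisited
            pool.remove(img)
    return fit(labeled)
```

Note how this differs from EM: examples are added incrementally with hard labels that are never revised, rather than reweighting the whole pool with soft class likelihoods at every iteration.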
Backup Slides

Filter Model Results

Algorithm             Single Image Accuracy   Close Pair Accuracy   Near Pair Accuracy   Far Pair Accuracy
Full Data Set         100.0%                  100.0%                100.0%               100.0%
True Location         86.7%                   95.8%                 98.3%                98.3%
Labeled Only          26.7%                   40.8%                 52.5%                50.8%
Batch EM              19.2%                   35.8%                 52.5%                54.2%
Confidence Metric     34.2%                   48.3%                 73.3%                52.5%
1-NN Metric           47.5%                   64.2%                 82.5%                70.8%
4-NN / 40-MM Metric   53.3%                   69.2%                 85.8%                76.7%

• Key points
  – Batch EM does not provide a performance increase.
  – Self-training provides a performance increase.
  – The 1-NN and 4-NN metrics work better than the confidence metric.
  – "Near pair" accuracy is highest.

Weakly Labeled Point Performance
• Does confidence metric self-training improve point performance? Yes, over a range of data set sizes.
• Does MSE metric self-training improve point performance? Yes, to a significant level over a range of data set sizes.

Schneiderman Features / Detection Process / Sample Face Detections
[Figures.]

Simulation Data
[Figures: labeled and unlabeled data with hidden labels; the nearest-neighbor metric vs. the confidence metric; the model-based confidence metric.]

Future Work – Mining the Web
[Figures: "Clinton" vs. "Not-Clinton" colors, green regions are "Not-Clinton"; "Flag" vs. "Not-Flag" colors, green regions are "Not-Flag".]
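The nearest-neighbor selection metric used for the Schneiderman detector, Score(W_i) = min_j Mah(g(W_i), g(L_j), Σ), can be sketched as below. This is a simplified illustration: the preprocessing g (a high-pass filter plus variance normalization in the talk) is replaced by an identity stand-in, and the covariance Σ is supplied directly.

```python
# Sketch of the NN selection metric: minimum Mahalanobis distance
# from a candidate window to the labeled examples.
# The preprocessing g (high-pass filter + variance normalization in the talk)
# is replaced here by an identity stand-in.
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between feature vectors x and y."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

def nn_selection_score(candidate, labeled, cov):
    """Score(W_i) = min_j Mah(g(W_i), g(L_j), cov); lower scores are selected first."""
    cov_inv = np.linalg.inv(cov)
    g = lambda v: v  # identity stand-in for the preprocessing step
    return min(mahalanobis(g(candidate), g(l), cov_inv) for l in labeled)
```

With Σ = I this reduces to plain Euclidean nearest-neighbor distance; the Mahalanobis form lets correlated or high-variance feature dimensions count for less.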