Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Image Understanding & Web Security Henry Baird Joint work with: Richard Fateman, Allison Coates, Kris Popat, Monica Chew, Tom Breuel, & Mark Luk 1 A fast-emerging research topic Human Interactive Proofs (definition later): – – – – first instance in 1999 research took hold in CS security theory field first intersects image understanding, cog sci, etc etc fast attracting researchers, engineers, & users This talk: – – – – – A brief history of HIPs Professional activities, so far -- incl. the 1st Int’l Workshop Existing systems -- w/ my critiques Next steps for the field In detail: PARC’s PessimalPrint & BaffleText H. Baird & K. Popat, “Web Security & Document Image Analysis,” in J. Hu & A. Antonacopoulos (Eds.), Web Document Analysis, World Scientific, 2003 (in press). CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 2 Early rumblings… 90’s: spammers trolling for email addresses – in defense, people disguise them, e.g. “baird at parc dot com” 1997: abuse of ‘Add-URL’ feature at AltaVista – some write programs to add their URL many times – skewed the popularity rankings Andrei Broder et al (then at DEC SRC) – a user action which is legitimate when performed once becomes abusive when repeated many times – no effective legal recourse – how to block or slow down these programs … CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 3 The first known instance: Altavista’s AddURL filter An image of text, not ASCII 1999: “ransom note filter” – randomly pick letters, fonts, rotations – render as an image – every user required to read and type it in correctly – reduced “spam add_URL” by “over 95%” Weaknesses: isolated chars, filterable noise, affine deformations M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for Selectively Restricting Access to Computer Systems,” U.S. Patent No. 6,195,698, Issued February 27, 2001. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 4 Yahoo!’s “Chat Room Problem” September 2000 Udi Manber asked Prof. Manuel Blum’s group at CMS-SCS: – programs impersonate people in chat rooms, then hand out ads – ugh! – how can all machines be denied access to a Web site without inconveniencing any human users? I.e., how to distinguish between machines and people on-line … some variation on ‘Turing tests’ ! CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 5 Alan Turing (1912-1954) 1936 a universal model of computation 1940s helped break Enigma (U-boat) cipher 1949 first serious uses of a working computer including plans to read printed text (he expected it would be easy) 1950 CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) proposed strong test for machine intelligence 6 Turing Tests How to judge that a machine can ‘think’: – play an ‘imitation game’ conducted via teletypes – a human judge & two invisible interlocutors: a human a machine `pretending’ to be human – after asking any questions (challenges) he/she wishes, the judge decides which is human – failure to decide correctly would be convincing evidence of machine intelligence (Turing asserted) Modern GUIs invite richer challenges than teletypes…. A. Turing, “Computing Machinery & Intelligence,” Mind, Vol. 59(236), 1950. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 7 “CAPTCHAs”: Completely Automated Public Turing Tests to Tell Computers & Humans Apart (M. Blum, L. A. von Ahn, J. Langford, et al, CMU SCS) challenges can be generated & graded automatically (i.e. the judge is a machine) accepts virtually all humans, quickly & easily rejects virtually all machines resists automatic attack for many years (even assuming that its algorithms are known?) NOTE: the machine administers, but cannot pass the test! L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8, 2003 [to appear]. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 8 CMU’s ‘Gimpy’ CAPTCHA Randomly pick: English words, deformations, occlusions, backgrounds, etc Challenge user to type in any three of the words Designed by CMU team: tried out by Yahoo! Problem: users hated it --- it was withdrawn L. Von Ahn, M. Blum, N. J. Hopper, J. Langford, The CAPTCHA Web Page, http://www.captcha.net. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 9 Yahoo!’s present CAPTCHA: “EZ-Gimpy” Randomly pick: one English word, deformations, degradations, occlusions, colored backgrounds Better tolerated by users Now used on a large scale to protect various services: Well tolerated by users Weaknesses: a single typeface, English lexicon CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 10 PayPal’s CAPTCHA Nothing published Seems to use one typeface Picks, at random: letters, overlain pattern Weaknesses: single typeface, simple grid, no image degradations, spaced apart CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 11 Cropping up everywhere… In use today, defending against: – – – – – …have you seen others? Coming up over the horizon: they can discourage… – – – – skewing search-engine rankings (Altavista, 1999) infesting chat rooms, etc (Yahoo!, 2000) gaming financial accounts (PayPal, 2001) robot spamming (SpamArrest, MailBlock, 2002) also: Overture, Chinese website, CD-rebate, TicketMaster, … password guessing denial-of-service attacks ballot stuffing …many others Similar problems w/ scrapers; also, likely on Intranets. D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford Graduate School of Business, Stanford Univ., 2001. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 12 The Known Limits of Image Understanding Technology There remains a large gap in ability between human and machine vision systems, even in reading printed text The performance of OCR machines has been systematically studied: 7 year olds can consistently do better! Researchers have developed stochastic models of document image degradation: so we can generate challenging word images pseudorandomly S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the Frontier, Kluwer Academic Publishers: 1999. H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.), Structured Document Image Analysis, Springer-Verlag: New York, 1992. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 13 Can You Read These Degraded Images? Of course you can …. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) but OCR machines cannot! 14 Experiments by PARC & UCB-CS Pick words at random: – 70 words commonly used on the Web – w/out ascenders or descenders (cf. Spitz) Vary physics-based image degradation parameters: blur, threshold, x-scale -- within certain ranges Pick fonts at random from a large set: Times Roman (TR), Times Italic (TI), Palatino Roman (PR), Palatino Italic (PI), Courier Roman (CR), Courier Oblique (CO), etc Test legibility on: – ten human volunteers (UC Berkeley CS Dept grad students) – three OCR machines: Expervision TR (E), ABBYY FineReader (A), IRIS Reader (I) CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 15 Results: OCR Accuracy, by machine 1 0.9 0.8 0.7 fraction 0.6 of words 0.5 correct 0.4 Times R 0.3 Times I 0.2 0.1 Courier O 0 Palatino R Palatino I total Expervis'n ABBYY IRIS OCR machine Each machine has its peculiar blind spots CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 16 OCR Accuracy: varying blur & threshold They share some blind spots CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 17 PessimalPrint: exploiting image degradations Three OCR machines fail when: – blur = 0.0 & threshold 0.02 - 0.08 OCR outputs ~~~.I~~~ ~~i1~~ N/A N/A – threshold = 0.02 & any value of blur … but people find these easy to read A. Coates, H. Baird, R. Fateman, “PessimalPrint: A Reverse Turing Test,” Proc. 6th IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) N/A ~~I~~ 18 Jan 2002: High Time for a Workshop! Manuel Blum proposes it, rounds up some key speakers Henry Baird offers PARC as venue; Kris Popat helps run it Goals: Invite known principals: theory, systems, engineers, users Describe the state of the art Plan next steps for the field Organization: – – – – – – ~30 attendees abstracts only, 1-5 pages, no refereeing, no archival publication 100% participation: everyone gives a (short) talk “mixing it up”: panel & working group discussions 2-1/2 days, lots of breaks for informal socializing plenary talk by John McCarthy ‘Father of AI’ CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 19 NSF 1st Int’l HIP Workshop Jan 9-11, 2002, Palo Alto, CA CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 20 HIP’2002 Participants CMU - SCS, Aladdin Center Manuel Blum, Lenore Blum, Luis von Ahn, John Langford, Guy Blelloch, Nick Hopper, Ke Yang, Brighten Godfrey, Bartosz Przydatek, Rachel Rue PARC - SPIA/Security/Theory Henry Baird, Kris Popat, Tom Breuel, Prateek Sarkar, Tom Berson, Dirk Balfanz, David Goldberg UCB - CS & SIMS Richard Fateman, Allison Coates, Jitendra Malik, Doug Tygar, Alma Whitten, Rachna Dhamija, Monica Chew, Adrian Perrig, Dawn Song RPI George Nagy Stanford John McCarthy Altavista Andrei Broder Yahoo! Udi Manber Bell Labs Dan Lopresti IBM T.J. Watson Charles Bennett InterTrust Star Labs Stuart Haber City Univ. of Hong Hong Nancy Chan Weizmann Institute Moni Naor RSA Security Laboratories Ari Juels Document Recognition Techs, Inc NSF Robert Sloan CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) Larry Spitz 21 Variations & Generalizations CAPTCHA Completely Automatic Public Turing test to tell Computers and Humans Apart HUMANOID Text-based dialogue which an individual can use to authenticate that he/she is himself/herself (‘naked in a glass bubble’) PHONOID Individual authentication using spoken language Human Interactive Proof (HIP) An automatically administered challenge/response protocol allowing a person to authenticate him/herself as belonging to a certain group over a network without the burden of passwords, biometrics, mechanical aids, or special training. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 22 Highlights Theory – some text-based CAPTCHAs are provably breakable Ability Gaps – – – – vision: gestalt, segmentation, noise immunity, style consistency speech: noise of many kinds, clutter (cocktail party effect) intelligence: puzzles, analogical reasoning, weak logic gestures, reflexes, common knowledge, … Applications – subtle system-level vulnerabilties – aggressive arms race with shadowy enemies CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 23 Disciplines Participating Cryptography Security Document Image Analysis Computer Vision Artificial Intelligence Needed Cognitive Science Psychophysics (esp. of Reading) Biometrics eCommerce, Business ….? CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 25 Weaknesses of Existing Reading-Based CAPTCHAs English lexicon is too predictable: – dictionaries are too small – only 1.2 bits of entropy per character (cf. Shannon) Physics-based image degradations vulnerable to well-studied image restoration attacks, e.g. Complex images irritate people – even when they can read them – need user-tolerance experiments CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 26 Strengths of Human Reading Literature on the psychophysics of reading is relevant: familiarity helps, e.g. English words optimal word-image size (subtended angle) is known (0.3-2 degrees) optimal contrast conditions known other factors measured for the best performance: to achieve and sustain “critical reading speed” BUT gives no answer to: where’s the optimal comfort zone? G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading: I. normal vision,” Vision Research 25(2), 1985. AJ. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word Recognition,’ Perception & Psychophysics 47, 1990.. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 27 Designing a Stronger CAPTCHA: BaffleText principles Nonsense words. – generate ‘pronounceable’ – not ‘spellable’ – words using a variable-length character n-gram Markov model – they look familiar, but aren’t in any lexicon, e.g. ablithan wouquire quasis Gestalt perception. – force inference of a whole word-image from fragmentary or occluded characters, e.g. – using a single familiar typeface also helps people M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 28 Mask Degradations Parameters of pseudorandom mask generator: – shape type: square, circle, ellipse, mixed – density: black-area / whole-area – range of radii of shapes CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 29 BaffleText Experiment at PARC Goal: map the margins of accurate & comfortable human reading on this family of images Metrics: – – – – objectiive difficulty: accuracy subjective difficulty: rating response time exit survey: how tolerable overall Participation: – 41 individual sessions – >1200 challenge/response trials – 18 exit surveys CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 30 BaffleText challenge webpage CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 31 BaffleText user rating CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 32 User Acceptance % Subjects who say they’re willing to solve a BaffleText… 17% every time they send email 39% … if it cut spam by 10x 89% every time they register for an e-commerce site 94% … if it led to more trustworthy recommendations 100% every time they register for an email account Out of 18 responses to the exit survey. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 33 Subjective difficulty tracks objective difficulty CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 34 How to engineer BaffleText When we generate a challenge, – need to be able to estimate its difficulty – throw away if too easy or too hard Apply an idea from the psychophysics of reading: – image “complexity” metric: how hard to read – simple to compute: CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) perimeter** / black-area 35 Image complexity predicts objective difficulty CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 36 Image complexity predicts subjective difficulty CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 37 Engineering guidelines For high performance, image complexity should fall in the range 50-100; e.g. 50 100 Within this regime, BaffleText performs well: – – – – – 100% human subjects willing to try to read it 89% accuracy by humans 0% accuracy by commercial OCR 3.3 difficulty rating, out of 10 (on average) 8.7 seconds / trial on average CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 38 G. Mori & J. Malik, “Recognizing Objects in Adversarial Clutter,” submitted to CVPR’03, Madison, WI, June 16-22, 2003. The latest serious (known or published) attack… Greg Mori & Jitendra Malik (UCB-CS) – Generalized Shape Context CV method – requires known lexicon – else, fails completely – expects known font (or fonts) – else, does worse Results of Mori-Malik attacks (Dec 2002) with foreknowledge of both lexicon and font: CAPTCHA EZ-GIMPY Attack success rate 83% Yahoo! + CMU PessimalPrint 40% PARC + UCB BaffleText 25% PARC + UCB CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 39 BaffleText: the strongest known CAPTCHA? Resists many known attacks: – physics-based image restoration – recognizing into a lexicon – typeface targeting – segmenting then recognizing Exploits hard-to-automate human cognition powers: – Gestalt perception – “semi-linguistic” familiarity – style consistency CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 40 PARC’s Leadership Role Published 1st refereed paper on CAPTCHAs: A. L. Coates, H. S. Baird, R. Fateman, “Pessimal Print: a Reverse Turing Test,” Proc., 6th IAPR Int’l Conf. On Document Analysis & Recognition, Seattle, WA, Sept. 10-13, 2001. Hosted first professional event: 1st NSF Int’l Workshop on HIPs, Jan. 9-11, 2002, Palo Alto, CA. Plays both offense & defense: – attacks CAPTCHAs; builds high-performance OCR systems – builds strong CAPTCHAs Validates using human-factors research: – human-subject trials measuring accuracy & tolerance – PARC’s interdisciplinary tradition: social + computer sciences CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 41 The Arms Race Will serious technical attacks be launched? – ‘spam kings’ make $$ millions – two spam-blocking e-commerce firms use CAPTCHAs How long can a CAPTCHA stand against attack? – especially if its algorithms are published or guessed Keep a pipeline of defenses in reserve: – a long partnership between research & users CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 42 Lots of Open Research Questions What are the most intractable obstacles to machine vision? segmentation, occlusion, degradations, …? Under what conditions is human reading most robust? linguistic & semantic context, Gestalt, style consistency…? Where are ‘ability gaps’ located? quantitatively, not just qualitatively How to generate challenges strictly within ability gaps? fully automatically an indefinitely long sequence of distinct challenges CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 43 HIP Research Community HIP Website at Aladdin Center, CMU SCS www.captcha.net Volunteers for a CAPTCHA usability test? PARC CAPTCHA experimental software tools: – FreeType-based, C++, C for Linux etc (T. Breuel) – Doc. image degradation generator (H. Baird) – New Gestalt-inspired degradations (M. Chew, UCB) – PHP4 code for CAPTCHA test web site (M. Chew, M. Luk) … would a free GPL license be acceptable? A 2nd HIP Workshop soon? CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 44 Alan Turing might have enjoyed the irony … A technical problem – machine reading – which he thought would be easy, has resisted attack for 50 years, and now allows the first widespread practical use of variants of his test for artificial intelligence. CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 45 Contact Henry S. Baird [email protected] www.parc.com/baird CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 46 47 CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 48 CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 49 OCR Accuracy: varying blur Performance for threshold = 0.7 1 fraction correct 0.8 E TR 0.6 A TR 0.4 0.2 0 0 0.1 0.2 Blur parameter CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 50 OCR Accuracy: varying blur Performance for threshold = 0.7 1 0.8 fraction 0.6 correct 0.4 E TI A TI 0.2 0 0 0.1 0.2 Blur parameter CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 51 OCR Accuracy: varying blur Performance for threshold = 0.7 1 E TR A TR 0.8 fraction 0.6 correct 0.4 E TI A TI 0.2 0 0 0.1 0.2 Blur parameter CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 52 Yahoo!’s current CAPTCHA Randomly pick: one English word, typeface, distortions, occlusions, background More tolerable to users Used on a large scale to protect various services CSD, Nasa Ames, Moffat Field - Feb 19, 2003 (HSB) 53