Download The Limits of OCR

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Computer vision wikipedia , lookup

Intelligence explosion wikipedia , lookup

Human–computer interaction wikipedia , lookup

Existential risk from artificial general intelligence wikipedia , lookup

Turing test wikipedia , lookup

Visual Turing Test wikipedia , lookup

Ethics of artificial intelligence wikipedia , lookup

History of artificial intelligence wikipedia , lookup

Philosophy of artificial intelligence wikipedia , lookup

Transcript
Image Understanding
& Web Security
Henry Baird
Joint work with:
Richard Fateman, Allison Coates, Kris Popat,
Monica Chew, Tom Breuel, & Mark Luk
A fast-emerging research topic
Human Interactive Proofs (HIPs; definition later):
–
–
–
–
first instance in 1999
research took hold in CS security theory field first
intersects image understanding, cog sci, etc etc
fast attracting researchers, engineers, & users
This talk:
–
–
–
–
A brief history of HIPs
Existing systems -- w/ my critiques
Professional activities, so far -- incl. the 1st Int’l Workshop
In detail: PARC’s PessimalPrint & BaffleText
H. Baird & K. Popat, “Web Security & Document Image Analysis,”
in J. Hu & A. Antonacopoulos (Eds.), Web Document Analysis,
World Scientific, 2003 (in press).
DIAR, Madison, WI – June 21, 2003 (HSB)
2
Straws in the wind…

90’s: spammers trolling for email addresses
– in defense, people disguise them, e.g.
“baird AT parc DOT com”

1997: abuse of ‘Add-URL’ feature at AltaVista
– some write programs to add their URL many times
– skewed the search rankings

Andrei Broder et al (then at DEC SRC)
– a user action which is legitimate when performed once
becomes abusive when repeated many times
– no effective legal recourse
– how to block or slow down these programs …
DIAR, Madison, WI – June 21, 2003 (HSB)
3
The first known instance…
Altavista’s AddURL filter
An image of
text, not ASCII

1999: “ransom note filter”
– randomly pick letters, fonts, rotations – render as an image
– every user is required to read and type it in correctly
– reduced “spam add_URL” by “over 95%”

Weaknesses: isolated chars, filterable noise, affine deformations
M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z. Broder, “Method for
Selectively Restricting Access to Computer Systems,” U.S. Patent
No. 6,195,698, Filed April 13, 1998, Issued February 27, 2001.
DIAR, Madison, WI – June 21, 2003 (HSB)
4
Yahoo!’s “Chat Room Problem”
September 2000
Udi Manber asked Prof. Manuel Blum’s group at CMU:
– programs impersonate people in chat rooms,
then hand out ads – ugh!
– how can all machines be denied access to a Web site
without inconveniencing any human users?
I.e., how to distinguish between machines and people on-line
… a kind of ‘Turing test’ !
DIAR, Madison, WI – June 21, 2003 (HSB)
5
Alan Turing (1912-1954)
1936
a universal model of computation
1940s helped break Enigma (U-boat) cipher
1949
first serious uses of a working computer
including plans to read printed text
(he expected it would be easy)
1950
DIAR, Madison, WI – June 21, 2003 (HSB)
proposed a test for machine intelligence
6
Turing’s Test for AI
How to judge that a machine can ‘think’:
– play an ‘imitation game’ conducted via teletypes
– a human judge & two invisible interlocutors:

a human

a machine `pretending’ to be human
– after asking any questions (challenges) he/she
wishes, the judge decides which is human
– failure to decide correctly would be convincing
evidence of machine intelligence (Turing asserted)
Modern GUIs invite richer challenges than teletypes….
A. Turing, “Computing Machinery & Intelligence,” Mind, Vol.
59(236), 1950.
DIAR, Madison, WI – June 21, 2003 (HSB)
7
“CAPTCHAs”:
Completely Automated Public Turing Tests
to Tell Computers & Humans Apart
(M. Blum, L. A. von Ahn, J. Langford, et al, CMU-SCS)




challenges can be generated & graded automatically
(i.e. the judge is a machine)
accepts virtually all humans, quickly & easily
rejects virtually all machines
resists automatic attack for many years
(even assuming that its algorithms are known?)
NOTE: the machine administers, but cannot pass the test!
L. von Ahn, M. Blum, N.J. Hopper, J. Langford, “CAPTCHA: Using
Hard AI Problems For Security,” Proc., EuroCrypt 2003, Warsaw,
Poland, May 4-8, 2003 [to appear].
DIAR, Madison, WI – June 21, 2003 (HSB)
8
CMU’s ‘Gimpy’ CAPTCHA

Randomly pick:
English words, deformations, occlusions, backgrounds, etc



Challenge user to type in any three of the words
Designed by CMU team: tried out by Yahoo!
Problem: users hated it --- Yahoo! withdrew it
L. Von Ahn, M. Blum, N. J. Hopper, J. Langford, The CAPTCHA
Web Page, http://www.captcha.net.
DIAR, Madison, WI – June 21, 2003 (HSB)
9
Yahoo!’s present CAPTCHA:
“EZ-Gimpy”

Randomly pick:
one English word, deformations, degradations, occlusions,
colored backgrounds, etc



Better tolerated by users
Now used on a large scale to protect various services
Weaknesses: a single typeface, English lexicon
DIAR, Madison, WI – June 21, 2003 (HSB)
10
PayPal’s CAPTCHA



Nothing published
Seems to use a single typeface
Picks, at random:
letters, overlain pattern

DIAR, Madison, WI – June 21, 2003 (HSB)
Weaknesses: single typeface, simple grid,
no image degradations, spaced apart
11
Cropping up everywhere…

In use today, to defend against:
–
–
–
–
–
skewing search-engine rankings (Altavista, 1999)
infesting chat rooms, etc (Yahoo!, 2000)
gaming financial accounts (PayPal, 2001)
robot spamming (MailBlocks, SpamArrest 2002)
In the last few months: Overture, Chinese website, HotMail,
CD-rebate, TicketMaster, MailFrontier, Qurb, Madonnarama, …
…have you seen others?

On the horizon:
– ballot stuffing, password guessing, denial-of-service attacks
– `blunt force’ attacks (e.g. UT Austin break-in, Mar ’03)
– …many others
Similar problems w/ scrapers; also, likely on Intranets.
D. P. Baron, “eBay and Database Protection,” Case No. P-33, Case Writing Office, Stanford
Graduate School of Business, Stanford Univ., 2001.
DIAR, Madison, WI – June 21, 2003 (HSB)
12
The Known Limits of
Image Understanding Technology
There remains a large gap in ability
between human and machine vision systems,
even when reading printed text
Performance of OCR machines has been systematically studied:
7 year olds can consistently do better!
This ability gap has been mapped quantitatively
S. Rice, G. Nagy, T. Nartker, OCR: An Illustrated Guide to the
Frontier, Kluwer Academic Publishers: 1999.
DIAR, Madison, WI – June 21, 2003 (HSB)
13
Image Degradation Modeling
thrs x blur
Effects of printing & imaging:
blur
thrs
sens
We can generate challenging
images pseudorandomly
H. Baird, “Document Image Defect Models,” in H. Baird, H. Bunke, & K. Yamamoto (Eds.),
Structured Document Image Analysis, Springer-Verlag: New York, 1992.
DIAR, Madison, WI – June 21, 2003 (HSB)
14
Machine Accuracy is a Smooth
Monotonic Function of Parameters
T. K. Ho & H. S. Baird, “Large Scale Simulation Studies in Image Pattern Recognition,”
IEEE Trans. on PAMI, Vol. 19, No. 10, p. 1067-1079, October 1997.
DIAR, Madison, WI – June 21, 2003 (HSB)
15
Can You Read
These Degraded Images?
Of course you can …. but OCR machines cannot!
DIAR, Madison, WI – June 21, 2003 (HSB)
16
Experiments by PARC & UCB-CS


Pick words at random:
–
70 words commonly used on the Web
–
w/out ascenders or descenders (cf. Spitz)
Vary physics-based image degradation parameters:
blur, threshold, x-scale -- within certain ranges

Pick fonts at random from a large set:
Times Roman (TR), Times Italic (TI),
Palatino Roman (PR), Palatino Italic (PI),
Courier Roman (CR), Courier Oblique (CO), etc

Test legibility on:
– ten human volunteers (UC Berkeley CS Dept grad students)
– three OCR machines:
Expervision TR (E), ABBYY FineReader (A), IRIS Reader (I)
DIAR, Madison, WI – June 21, 2003 (HSB)
17
Results:
OCR Accuracy, by machine
1
0.9
0.8
0.7
fraction 0.6
of words 0.5
correct 0.4
Times R
0.3
Times I
0.2
0.1
Courier O
0
Palatino R
Palatino I
total
Expervis'n
ABBYY
IRIS
OCR machine
Each machine has its peculiar blind spots
DIAR, Madison, WI – June 21, 2003 (HSB)
18
OCR Accuracy:
varying blur & threshold
The machines share some blind spots
DIAR, Madison, WI – June 21, 2003 (HSB)
19
PessimalPrint:
exploiting image degradations
Three OCR machines fail when:
– blur = 0.0
& threshold  0.02 - 0.08
OCR outputs
~~~.I~~~
~~i1~~
N/A
N/A
– threshold = 0.02
& any value of blur
… but people find all these easy to read
N/A
~~I~~
A. Coates, H. Baird, R. Fateman, “Pessimal Print: A Reverse Turing Test,” Proc. 6th
IAPR Int’l Conf. On Doc. Anal. & Recogn. (ICDAR’01), Seattle, WA, Sep 10-13, 2001.
DIAR, Madison, WI – June 21, 2003 (HSB)
20
High Time for a Workshop!
Manuel Blum proposes it, rounds up some key speakers
Henry Baird offers PARC as venue; Kris Popat helps run it
Goals:
Invite all known principals: theory, systems, engineers, users
Describe the state of the art
Plan next steps for the field
Organization:
–
–
–
–
–
–
~30 attendees
abstracts only, 1-5 pages, no refereeing, no archival publication
100% participation: everyone gives a (short) talk
“mixing it up”: panel & working group discussions
2-1/2 days, lots of breaks for informal socializing
plenary talk by John McCarthy ‘Father of AI’
DIAR, Madison, WI – June 21, 2003 (HSB)
21
1st NSF Int’l Workshop on
Human Interactive Proofs
PARC, Palo Alto, CA, January 9-11, 2002
DIAR, Madison, WI – June 21, 2003 (HSB)
22
HIP’2002 Participants
CMU - SCS, Aladdin Center
Manuel Blum, Lenore Blum, Luis von
Ahn, John Langford, Guy Blelloch,
Nick Hopper, Ke Yang, Brighten
Godfrey, Bartosz Przydatek, Rachel
Rue
PARC - SPIA/Security/Theory
Henry Baird, Kris Popat, Tom Breuel,
Prateek Sarkar, Tom Berson, Dirk
Balfanz, David Goldberg
UCB - CS & SIMS
Richard Fateman, Allison Coates,
Jitendra Malik, Doug Tygar, Alma
Whitten, Rachna Dhamija, Monica
Chew, Adrian Perrig, Dawn Song
RPI
George Nagy
Stanford
John McCarthy
Altavista
Andrei Broder
Yahoo!
Udi Manber
Bell Labs
Dan Lopresti
IBM T.J. Watson
Charles Bennett
InterTrust Star Labs
Stuart Haber
City Univ. of Hong Hong
Nancy Chan
Weizmann Institute
Moni Naor
RSA Security Laboratories
Ari Juels
Document Recognition Techs, Inc
NSF
Robert Sloan
DIAR, Madison, WI – June 21, 2003 (HSB)
Larry Spitz
23
Variations & Generalizations

CAPTCHA
Completely Automatic Public Turing test to tell Computers and
Humans Apart

HUMANOID
Text-based dialogue which an individual can use to authenticate
that he/she is himself/herself (‘naked in a glass bubble’)

PHONOID
Individual authentication using spoken language
Human Interactive Proof (HIP)
An automatically administered challenge/response protocol
allowing a person to authenticate him/herself as belonging to a
certain group over a network without the burden of passwords,
biometrics, mechanical aids, or special training.
DIAR, Madison, WI – June 21, 2003 (HSB)
24
Highlights of HIP’2002

Theory
– some text-based CAPTCHAs are provably breakable

Ability Gaps
–
–
–
–

vision: gestalt, segmentation, noise immunity, style consistency
speech: noise of many kinds, clutter (cocktail party effect)
intelligence: puzzles, analogical reasoning, weak logic
gestures, reflexes, common knowledge, …
Applications
– subtle system-level vulnerabilties
– aggressive arms race with shadowy enemies
http://www.parc.com/istl/groups/did/HIP2002
DIAR, Madison, WI – June 21, 2003 (HSB)
25
Funding & Partnerships

NSF
– Robert Sloan, Dir, Theory of Computing Pgm
– strongly supportive of this newborn field
– encouraged grant proposals

Yahoo!
– willing to run field trials
– user acceptance laboratory
– able to detect intrusion
DIAR, Madison, WI – June 21, 2003 (HSB)
26
Disciplines
Participating now:






Cryptography
Security
Pattern Recognition
Computer Vision
Artificial Intelligence
eCommerce
Needed:





Cognitive Science
Psychophysics (esp. of Reading)
Biometrics
Business, Law, …
….?
DIAR, Madison, WI – June 21, 2003 (HSB)
27
Weaknesses of Existing
Reading-Based CAPTCHAs

English lexicon is too predictable:
– dictionaries are too small
– only 1.2 bits of entropy per character (cf. Shannon)

Physics-based image degradations vulnerable
to well-studied image restoration attacks, e.g.


Complex images irritate people
– even when they can read them
– need user-tolerance experiments
DIAR, Madison, WI – June 21, 2003 (HSB)
28
Strengths of
Human Reading
Literature on the psychophysics of reading is relevant:




familiarity helps, e.g. English words
optimal word-image size (subtended angle)
is known (0.3-2 degrees)
optimal contrast conditions known
other factors measured for the best performance:
to achieve and sustain “critical reading speed”
BUT gives no answer to:
where’s the optimal comfort zone?
G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M. Schleske, “Psychophysics of Reading:
I. normal vision,” Vision Research 25(2), 1985.
A. J. Grainger & J. Segui, “Neighborhood Frequency Effects in Visual Word
Recognition,’ Perception & Psychophysics 47, 1990..
DIAR, Madison, WI – June 21, 2003 (HSB)
29
Designing a Stronger CAPTCHA:
BaffleText principles

Nonsense words.
– generate ‘pronounceable’ – not ‘spellable’ – words
using a variable-length character n-gram Markov model
– they look familiar, but aren’t in any lexicon, e.g.
ablithan

wouquire
quasis
Gestalt perception.
– force inference of a whole word-image
from fragmentary or occluded characters, e.g.
– using a single familiar typeface also helps
M. Chew & H. S. Baird, “BaffleText: A Human Interactive Proof,” Proc., SPIE/IS&T
Conf. on Document Recognition & Retrieval X, Santa Clara, CA, January 23-24, 2003.
DIAR, Madison, WI – June 21, 2003 (HSB)
30
Mask Degradations
Parameters of pseudorandom mask generator:
– shape type: square, circle, ellipse, mixed
– density: black-area / whole-area
– range of radii of shapes
DIAR, Madison, WI – June 21, 2003 (HSB)
31
BaffleText Experiments at PARC


Goal: map the margins of accurate & comfortable
human reading on this family of images
Metrics:
–
–
–
–

objective difficulty: accuracy
subjective difficulty: rating
response time
exit survey: how tolerable overall
Participation:
– 41 individual sessions
– >1200 challenge/response trials
– 18 exit surveys
DIAR, Madison, WI – June 21, 2003 (HSB)
32
BaffleText challenge webpage
DIAR, Madison, WI – June 21, 2003 (HSB)
33
BaffleText user ratings
DIAR, Madison, WI – June 21, 2003 (HSB)
34
User Acceptance
% Subjects willing to solve a BaffleText…
17% every time they send email
39% … if it cut spam by 10x
89% every time they register for an e-commerce site
94% … if it led to more trustworthy recommendations
100% every time they register for an email account
Out of 18 responses to the exit survey.
DIAR, Madison, WI – June 21, 2003 (HSB)
35
Subjective difficulty
tracks objective difficulty
DIAR, Madison, WI – June 21, 2003 (HSB)
36
How to engineer BaffleText

When we generate a challenge,
– need to estimate its difficulty
– throw away if too easy or too hard

Apply an idea from the psychophysics of reading:
– image “complexity” metric: how hard to read
– simple to compute:
DIAR, Madison, WI – June 21, 2003 (HSB)
perimeter** / black-area
37
Image complexity
predicts objective difficulty
DIAR, Madison, WI – June 21, 2003 (HSB)
38
Image complexity
predicts subjective difficulty
DIAR, Madison, WI – June 21, 2003 (HSB)
39
Engineering guidelines

For high performance, image complexity
should fall in the range 50-100; e.g.
50

100
Within this regime, BaffleText performs well:
– 100% human subjects willing to try to read it
–
–
–
–
89% accuracy by humans
0% accuracy by commercial OCR
3.3 difficulty rating, out of 10 (on average)
8.7 seconds / trial on average
DIAR, Madison, WI – June 21, 2003 (HSB)
40
G. Mori & J. Malik, “Recognizing Objects in
Adversarial Clutter,” submitted to CVPR’03,
Madison, WI, June 16-22, 2003.
The latest serious
(known or published) attack…
Greg Mori & Jitendra Malik (UCB-CS)
– Generalized Shape Context CV method
– requires known lexicon – else, fails completely
– expects known font (or fonts) – else, does worse
Results of Mori-Malik attacks (Dec 2002) given
perfect foreknowledge of both lexicon and font:
CAPTCHA
EZ-GIMPY
Attack success rate
83%
Yahoo! + CMU
PessimalPrint
40%
PARC + UCB
BaffleText
25%
PARC + UCB
DIAR, Madison, WI – June 21, 2003 (HSB)
41
BaffleText:
the strongest known CAPTCHA?

Resists many known algorithmic attacks:
– physics-based image restoration
– recognizing into a lexicon
– known-typeface targeting
– segmenting then recognizing

Exploits hard-to-automate human cognition powers:
– Gestalt perception
– “semi-linguistic” familiarity
– within-typeface “style consistency”
DIAR, Madison, WI – June 21, 2003 (HSB)
42
Recent Microsoft CAPTCHA
• Random strings, local space-warping; plus meaningless
curving strokes, both black (overlaid) and white (erasing)
• Fielded Dec 2002 on Passport (HotMail, etc)
• Immediate reduction in new Hotmail accounts, with
virtually no user complaints
P. Y. Simard, R. Szeliski, J. Benaloh, J. Couvreur, I. Calinov, “Using Character Recognition and
Segmentation to Tell Computer from Humans,” Proc., Int’l Conf. on Document Analysis &
Recognition, Edinburgh, Scotland, August, 2003 [to appear].
DIAR, Madison, WI – June 21, 2003 (HSB)
43
PARC’s Leadership in R&D on
Reading-based CAPTCHAs

First refereed article on CAPTCHAs:
A. L. Coates, H. S. Baird, R. Fateman, “Pessimal Print: a Reverse Turing Test,” Proc., 6th
IAPR Int’l Conf. On Document Analysis & Recognition, Seattle, WA, Sept. 10-13, 2001.

First professional HIP event, organized by PARC:
1st NSF Int’l Workshop on HIPs, Jan. 9-11, 2002, PARC, Palo Alto, CA.

First to ‘play both offense & defense’:
– builds high-performance OCR systems; attacks CAPTCHAs
– builds strong CAPTCHAs

First to validate using human-factors research:
– human-subject trials measuring both accuracy & tolerance
– PARC’s interdisciplinary tradition: social + computer sciences
DIAR, Madison, WI – June 21, 2003 (HSB)
44
The Arms Race

When will serious technical attacks be launched?
– ‘spam kings’ make $$ millions
– two spam-blocking e-commerce firms now use CAPTCHAs

How long can a CAPTCHA withstand attack?
– especially if its algorithms are published or guessed

Strategy: keep a pipeline of defenses in reserve:
– continuing partnership between R&D & users
DIAR, Madison, WI – June 21, 2003 (HSB)
45
Lots of Open Research Questions

What are the most intractable obstacles to machine vision?
segmentation, occlusion, degradations, …?

Under what conditions is human reading most robust?
linguistic & semantic context, Gestalt, style consistency…?

Where are ‘ability gaps’ located?
quantitatively, not just qualitatively

How to generate challenges strictly within ability gaps?
fully automatically
an indefinitely long sequence of distinct challenges
DIAR, Madison, WI – June 21, 2003 (HSB)
46
HIP Research Community

PARC CAPTCHA website
www.parc.com/istl/projects/captcha

HIP’2002 Workshop
www.parc.com/istl/groups/did/HIP2002

HIP Website at Aladdin Center, CMU-SCS
www.captcha.net

Volunteers for a PARC CAPTCHA usability test?

A 2nd HIP Workshop soon?
DIAR, Madison, WI – June 21, 2003 (HSB)
47
Alan Turing might have
enjoyed the irony …
A technical problem – machine reading –
which he thought would be easy,
has resisted attack for 50 years, and
now allows the first widespread
practical use of variants of
his test for artificial intelligence.
DIAR, Madison, WI – June 21, 2003 (HSB)
48
Contact
Henry S. Baird
[email protected]
www.parc.com/baird
DIAR, Madison, WI – June 21, 2003 (HSB)
49