Lecture 19: Conclusion
Roadmap
• Natural language understanding
• Computer vision
• Summary of CS221
• Next courses
• History of AI
• Food for thought
Paradigm
[utterance: user input] → semantic parsing → [program] → execute → [denotation: user output]
Induce hidden program to accomplish end goal

Course plan
From "low-level intelligence" to "high-level intelligence":
• Reflex
• States: search problems, Markov decision processes, adversarial games
• Variables: constraint satisfaction problems, Bayesian networks
• Logic
Machine learning underlies all of the above.
State-based models
[figure: search graph over states]
Key idea: state
A state is a summary of all the past actions sufficient to choose future actions optimally.

Variable-based models
[figure: map-coloring factor graph over the Australian territories WA, NT, SA, Q, NSW, V, T]
Key idea: factor graphs
Graph structure captures conditional independence.
Logic-based models
Syntax: formulas, inference rules
Semantics: models
Key idea: logic
Formulas enable more powerful models (infinite).

Machine learning
Objective: loss minimization
min_w Σ_{(x,y) ∈ Dtrain} Loss(x, y, w)
Algorithm: stochastic gradient descent
w ← w − η_t ∇Loss(x, y, w)   (gradient ≈ prediction − target)
Applies to many models!
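To make the loss-minimization objective and the SGD update concrete, here is a minimal sketch for least-squares linear regression; the data, the model, and the step-size schedule are illustrative assumptions, not something specified in the lecture:

```python
import numpy as np

# Toy data: y ≈ true_w · x plus noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

def loss_grad(x, y_target, w):
    """Gradient of the squared loss (prediction - target)^2 / 2 for one example."""
    prediction = w @ x
    return (prediction - y_target) * x   # the (prediction - target) term scales the gradient

w = np.zeros(3)
t = 0
for epoch in range(5):                       # a few passes over the data
    for i in rng.permutation(len(X)):
        t += 1
        eta = 0.1 / np.sqrt(t)               # decreasing step size eta_t
        w = w - eta * loss_grad(X[i], y[i], w)   # w <- w - eta_t * grad Loss(x, y, w)

print(w)   # roughly [1.0, -2.0, 0.5]
```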
Tools
• CS221 provides a set of tools to solve real-world problems
• Start with the nail, and figure out what tool to use

Roadmap
• Natural language understanding
• Computer vision
• Summary of CS221
• Next courses
• History of AI
• Food for thought
Other AI-related courses
http://ai.stanford.edu/courses/

Foundations:
• CS228: Probabilistic Graphical Models
• CS229: Machine Learning
• CS229T: Statistical Learning Theory
• CS334A: Convex Optimization
• ...

Applications:
• CS224N: Natural Language Processing (with Deep Learning)
• CS224U: Natural Language Understanding
• CS231A: Introduction to Computer Vision
• CS231N: Convolutional Neural Networks for Visual Recognition
• CS223A: Introduction to Robotics
• CS276: Information Retrieval and Web Search
• CS227B: General Game Playing
• CS224W: Social and Information Network Analysis
Probabilistic graphical models (CS228)
• Discrete ⇒ continuous
• Forward-backward, variable elimination ⇒ belief propagation, variational inference
• Gibbs sampling ⇒ Markov Chain Monte Carlo (MCMC)
• Learning the structure

Machine learning (CS229)
• Boosting, bagging, feature selection
• K-means ⇒ mixture of Gaussians
• Q-learning ⇒ policy gradient
Statistical learning theory (CS229T)
Question: what are the mathematical principles behind learning?
Uniform convergence: with probability at least 0.95, your algorithm will return a predictor h ∈ H such that
TestError(h) ≤ TrainError(h) + √(Complexity(H) / n)

Cognitive science
Question: how does the human mind work?
• Cognitive science and AI grew up together
• Humans can learn from few examples on many tasks
Computation and cognitive science (PSYCH204):
• Cognition as Bayesian modeling: probabilistic programs [Tenenbaum, Goodman, Griffiths]
Neuroscience
• Neuroscience: hardware; cognitive science: software
• Artificial neural networks as computational models of the brain
• Modern neural networks (GPUs + backpropagation) are not biologically plausible
• Analogy: birds versus airplanes; what are the principles of intelligence?

Online materials
• Online courses (Coursera, edX)
• Videolectures.net: tons of recorded talks from major leaders of AI (and other fields)
• arXiv.org: latest research (pre-prints)
Conferences
• AI: IJCAI, AAAI
• Machine learning: ICML, NIPS, UAI, COLT
• Data mining: KDD, CIKM, WWW
• Natural language processing: ACL, EMNLP, NAACL
• Computer vision: CVPR, ICCV, ECCV
• Robotics: RSS, ICRA

Roadmap
• Natural language understanding
• Computer vision
• Summary of CS221
• Next courses
• History of AI
• Food for thought
Pre-AI developments
• Philosophy: intelligence can be achieved via mechanical computation (e.g., Aristotle)
• Church-Turing thesis (1930s): any computable function is computable by a Turing machine
• Real computers (1940s): actual hardware to do it: Heath Robinson, Z-3, ABC/ENIAC

Birth of AI
1956: workshop at Dartmouth College; attendees: John McCarthy, Marvin Minsky, Claude Shannon, etc.
Aim for general principles:
"Every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it."
Birth of AI, early successes
• Checkers (1952): Samuel's program learned weights and played at strong amateur level
• Problem solving (1955): Newell & Simon's Logic Theorist: prove theorems in Principia Mathematica using search + heuristics; later, General Problem Solver (GPS)

Overwhelming optimism...
"Machines will be capable, within twenty years, of doing any work a man can do." (Herbert Simon)
"Within 10 years the problems of artificial intelligence will be substantially solved." (Marvin Minsky)
"I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines." (Claude Shannon)
...underwhelming results
Example: machine translation
"The spirit is willing but the flesh is weak."
→ (Russian) →
"The vodka is good but the meat is rotten."
1966: ALPAC report cut off government funding for MT

Implications of early era
Problems:
• Limited computation: search space grew exponentially, outpacing hardware (100! ≈ 10^157 > 10^80)
• Limited information: complexity of AI problems (number of words, objects, concepts in the world)
Contributions:
• Lisp, garbage collection, time-sharing (John McCarthy)
• Key paradigm: separate modeling (declarative) and algorithms (procedural)
Knowledge-based systems (70-80s)
Expert systems: elicit specific domain knowledge from experts in the form of rules:
if [premises] then [conclusion]
• DENDRAL: infer molecular structure from mass spectrometry
• MYCIN: diagnose blood infections, recommend antibiotics
• XCON: convert customer orders into parts specifications; saved DEC $40 million a year by 1986
Knowledge-based systems
Contributions:
• First real application that impacted industry
• Knowledge helped curb the exponential growth
Problems:
• Knowledge is not deterministic rules; need to model uncertainty
• Requires considerable manual effort to create rules; hard to maintain

The Complexity Barrier
"A number of people have suggested to me that large programs like the SHRDLU program for understanding natural language represent a kind of dead end in AI programming. Complex interactions between its components give the program much of its power, but at the same time they present a formidable obstacle to understanding and extending it. In order to grasp any part, it is necessary to understand how it fits with other parts, presents a dense mass, with no easy footholds. Even having written the program, I find it near the limit of what I can keep in mind at once." (Terry Winograd, 1972)
Modern AI (90s-present)
• Probability: Pearl (1988) promoted Bayesian networks in AI to model uncertainty (based on Bayes' rule from the 1700s)
• Machine learning: Vapnik (1995) invented support vector machines to tune parameters (based on statistical models from the early 1900s)
Roadmap
• Summary of CS221
• Next courses
• History of AI
• My research
• Food for thought
Property 1: compositionality
What is the largest city?
What is largest city in Europe?
What is second largest city in Europe?
What is the population of the second largest city in Europe?
⇒ build complex meanings from primitives

Property 2: ambiguity
What is largest city in Europe?
What is largest grossing movie in 2016?
Who's in town?
Who's in?
⇒ infer full meaning from context
A communication system
compositionality + ambiguity

Language as programs
[database]
"What is the largest city in Europe by population?"
→ semantic parsing →
argmax(Cities ∩ ContainedBy(Europe), Population)
→ execute →
Istanbul
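As a concrete illustration of the execute step, here is a minimal sketch that evaluates an argmax(Cities ∩ ContainedBy(Europe), Population)-style logical form against a tiny hand-built database; the table contents and helper names are illustrative assumptions, not part of the lecture:

```python
# Toy knowledge base: city -> (containing region, population). Values are illustrative.
CITIES = {
    "Istanbul": ("Europe", 15_000_000),
    "London":   ("Europe", 9_000_000),
    "Tokyo":    ("Asia",   14_000_000),
}

def contained_by(region):
    """Denotation of ContainedBy(region): the set of cities in that region."""
    return {c for c, (r, _) in CITIES.items() if r == region}

def argmax(entities, key):
    """Denotation of argmax(set, Population): the entity with the largest key value."""
    return max(entities, key=key)

# argmax(Cities ∩ ContainedBy(Europe), Population)
cities = set(CITIES)
answer = argmax(cities & contained_by("Europe"), key=lambda c: CITIES[c][1])
print(answer)  # Istanbul
```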
Language as programs
[calendar]
"Remind me to buy milk after my last meeting on Monday."
→ semantic parsing →
Add(Buy(Milk), argmax(Meetings ∩ HasDate(2016-07-18), EndTime))
→ execute →
[reminder added]

Language as programs
[context]
[sentence] → semantic parsing → [program] → execute → [behavior]
A brief history of semantic parsing
• Richard Montague (compositional semantics)
• GeoQuery [Zelle & Mooney 1996]
• Inductive logic programming [Tang & Mooney 2001]
• CCG [Zettlemoyer & Collins 2005]
• String kernels [Kate & Mooney 2006]
• Synchronous grammars [Wong & Mooney 2007]
• Relaxed CCG [Zettlemoyer & Collins 2007]
• Learning from the world [Clarke et al. 2010]
• Higher-order unification [Kwiatkowski et al. 2011]
• Learning from answers [Liang et al. 2011]
• Language + vision [Matuszek et al. 2012]
• Regular expressions [Kushman et al. 2013]
• Instruction following [Artzi & Zettlemoyer 2013]
• Large-scale KBs [Berant et al. 2013; Kwiatkowski et al. 2013]
• Reduction to paraphrasing [Berant & Liang 2014]
• Compositionality on tables [Pasupat & Liang 2015]
• Dataset from logical forms [Wang et al. 2015]

Compositional semantics
"cities in Europe" ⇒ Cities ∩ ContainedBy(Europe)
cities ⇒ Cities
in ⇒ ContainedBy
Europe ⇒ Europe
in Europe ⇒ ContainedBy(Europe)
cities in Europe ⇒ Cities ∩ ContainedBy(Europe)
Language variation
cities in Europe
European cities
cities that are in Europe
cities located in Europe
cities on the European continent
⇒ Cities ∩ ContainedBy(Europe)

Deep learning
• Object recognition: Krizhevsky/Sutskever/Hinton (2012) [figure: image classified as "car"]
• Machine translation: Sutskever/Vinyals/Le (2014) [figure: "This is a blue house" → "C'est une maison bleue"]
accuracy + simplicity
[Robin Jia, ACL 2016]
Neural semantic parsing
[figure: sequence-to-sequence model mapping "what is the largest city" to the token sequence "argmax ( Type . City , Population )"]
• Learn semantic composition without predefined grammar
• Encode compositionality through data recombination

Neural semantic parsing
Dataset: US Geography dataset (Zelle & Mooney, 1996)
Example: What is the highest point in Florida?
Data recombination:
what's the capital of Germany? ⇒ CapitalOf(Germany)
what countries border France? ⇒ Borders(France)
⇒ recombined: what's the capital of France? ⇒ CapitalOf(France); what countries border Germany? ⇒ Borders(Germany)
[bar chart: test accuracy for WM07, ZC07, KZGS11, RNN, and RNN+recomb (values shown: 86.1, 86.6, 88.9, 89.3); RNN+recomb: state-of-the-art, simpler]
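The data-recombination idea can be illustrated with a small sketch that swaps entities across training examples to create new (utterance, logical form) pairs; the seed examples and the entity-swapping rule below are illustrative assumptions, not the actual ACL 2016 implementation:

```python
# Seed training examples: (utterance, logical form, entity mentioned).
SEED = [
    ("what's the capital of Germany?", "CapitalOf(Germany)", "Germany"),
    ("what countries border France?",  "Borders(France)",    "France"),
]

def recombine(examples):
    """Abstract each example over its entity, then plug in entities from other examples."""
    entities = {e for _, _, e in examples}
    new = []
    for utterance, lf, ent in examples:
        for other in entities - {ent}:
            new.append((utterance.replace(ent, other), lf.replace(ent, other)))
    return new

for utt, lf in recombine(SEED):
    print(utt, "=>", lf)
# what's the capital of France? => CapitalOf(France)
# what countries border Germany? => Borders(Germany)
```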
Summary so far
[context]
"What is the largest city in Europe by population?"
→ semantic parsing → argmax(Cities ∩ ContainedBy(Europe), Population) → execute → Istanbul
• Language encodes computation
• Semantic parsing represents language as programs
• Recurrent neural networks: conceptually simpler

[Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Clarke et al. 2010; Liang et al., 2011]
Training data for semantic parsing
Heavy supervision (utterance ⇒ logical form):
What's Bulgaria's capital? ⇒ CapitalOf(Bulgaria)
When was Walmart started? ⇒ DateFounded(Walmart)
What movies has Tom Cruise been in? ⇒ Movies ∩ Starring(TomCruise)
...
Light supervision (utterance ⇒ answer):
What's Bulgaria's capital? ⇒ Sofia
When was Walmart started? ⇒ 1962
What movies has Tom Cruise been in? ⇒ TopGun, VanillaSky, ...
...
Training intuition
Where did Mozart tupress?  (answer: Vienna)
PlaceOfBirth(WolfgangMozart) ⇒ Salzburg
PlaceOfDeath(WolfgangMozart) ⇒ Vienna
PlaceOfMarriage(WolfgangMozart) ⇒ Vienna
Where did Hogarth tupress?  (answer: London)
PlaceOfBirth(WilliamHogarth) ⇒ London
PlaceOfDeath(WilliamHogarth) ⇒ London
PlaceOfMarriage(WilliamHogarth) ⇒ Paddington

Searching...
Greece held its last Summer Olympics in which year?
R[Date].R[Year].argmax(Country.Greece, Index) ⇒ 2004
Year | City      | Country | Nations
1896 | Athens    | Greece  | 14
1900 | Paris     | France  | 24
1904 | St. Louis | USA     | 12
...  | ...       | ...     | ...
2004 | Athens    | Greece  | 201
2008 | Beijing   | China   | 204
2012 | London    | UK      | 204
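The training intuition above (learn from answers rather than logical forms) can be sketched as: enumerate candidate programs for an utterance, keep the ones whose execution matches the observed answer, and give them credit. The candidate set, toy database, and credit-assignment scheme below are illustrative assumptions:

```python
from collections import defaultdict

# Toy database (illustrative values).
DB = {
    ("PlaceOfBirth", "WolfgangMozart"): "Salzburg",
    ("PlaceOfDeath", "WolfgangMozart"): "Vienna",
    ("PlaceOfMarriage", "WolfgangMozart"): "Vienna",
    ("PlaceOfBirth", "WilliamHogarth"): "London",
    ("PlaceOfDeath", "WilliamHogarth"): "London",
    ("PlaceOfMarriage", "WilliamHogarth"): "Paddington",
}
PREDICATES = ["PlaceOfBirth", "PlaceOfDeath", "PlaceOfMarriage"]

# Light supervision: (entity mentioned, observed answer); no logical forms given.
TRAIN = [("WolfgangMozart", "Vienna"), ("WilliamHogarth", "London")]

# "tupress" could mean any predicate; credit the ones consistent with the answer.
scores = defaultdict(float)
for entity, answer in TRAIN:
    consistent = [p for p in PREDICATES if DB[(p, entity)] == answer]
    for p in consistent:
        scores[p] += 1.0 / len(consistent)   # spread credit over consistent programs

print(max(scores, key=scores.get))
# PlaceOfDeath gets the most credit: it alone is consistent with both examples.
```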
Fighting combinatorial explosion
Search from points of certainty [Berant & Liang, 2014]:
What was Tom Cruise in? ⇒ MoviesOf(TomCruise)
Dynamic programming [Pasupat & Liang, 2015]:
PopulationOf.CapitalOf.Colorado
PopulationOf.argmax(Type.City ⊓ ContainedBy.Colorado, Population)
both collapse to ⇒ PopulationOf.Denver
Reinforcement learning [Berant & Liang, 2015]:
s0, a1, s1, a2, s2, ...
[Bollacker, 2008; Google, 2013]
Freebase
100M entities (nodes), 1B assertions (edges)
[figure: knowledge graph fragment around BarackObama and MichelleObama, with edges such as Spouse (marriage event, StartDate 1992.10.03), PlaceOfBirth (Honolulu), DateOfBirth (1961.08.04), Profession (Politician), PlacesLived (Chicago, Hawaii), Gender (Female), Type (Person, City, USState), and ContainedBy (UnitedStates)]

[Pasupat & Liang 2015]
Tables
154 million tables on the web [Cafarella et al. 2008]
[Pasupat & Liang 2014]
Web pages
4.7 billion web pages

WikiTableQuestions
• 22,000 question/answers, 2,100 tables
[bar chart: answer accuracy: predict table cell 12.7, semantic parser 37.1]

Natural language understanding
How long is the CS221 lecture?
2:50 PM - 1:30 PM = 1:20
Natural language understanding
[context]
[diagram: language ("What is the largest city in Europe by population?") → semantic parsing → ? → execute → Istanbul, connecting language to the world]

Summary so far
• Learn without annotated programs
• Scale up to large knowledge bases, tables, web pages
• Search is hard, needed even to get training signal
• Programs connect language with the world
Language game
Wittgenstein (1953): Language derives its meaning from use.

SHRDLURN
Human: sees the goal, has language
Computer: performs actions, has no language
"remove red" ⇒ candidate programs: add(hascolor(red)), add(hascolor(brown)), remove(hascolor(red)), remove(hascolor(brown))

Experiments
100 players from Amazon Mechanical Turk
6 hours ⇒ 10K utterances
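A rough sketch of the interaction loop: the computer enumerates candidate programs for the player's utterance, ranks them with learned (word, program) association weights, and updates the weights toward the candidate whose result matches the player's goal. Everything here (block-world representation, feature design, perceptron-style update) is a simplified illustration, not the actual SHRDLURN system:

```python
from collections import defaultdict

# Simplified block world: the state is just a list of block colors.
def add(color, state):
    return state + [color]

def remove(color, state):
    return [c for c in state if c != color]

COLORS = ["red", "brown"]
CANDIDATES = [("add", c) for c in COLORS] + [("remove", c) for c in COLORS]

weights = defaultdict(float)          # (word, action, color) association weights

def score(utterance, action, color):
    return sum(weights[(w, action, color)] for w in utterance.split())

def interpret(utterance):
    """Rank candidate programs by score, best first."""
    return sorted(CANDIDATES, key=lambda ac: -score(utterance, *ac))

def learn(utterance, state, goal):
    """Perceptron-style update toward the candidate whose result matches the goal."""
    guess = interpret(utterance)[0]
    correct = next(ac for ac in CANDIDATES
                   if (add if ac[0] == "add" else remove)(ac[1], state) == goal)
    if guess != correct:
        for w in utterance.split():
            weights[(w, *correct)] += 1.0
            weights[(w, *guess)] -= 1.0

# One round of the game: the player says "remove red" and selects the state they wanted.
state = ["red", "brown", "red"]
learn("remove red", state, goal=["brown"])
print(interpret("remove red")[0])   # ('remove', 'red') after the update
```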
Results: top players (rank 1-20)
(player scores: 3.01, 9.17, 2.78, 7.18)

Remove the center block
Remove the red block
Remove all red blocks
Remove the first orange block
Put a brown block on the first brown block
Add blue block on first blue block

reinsert pink
take brown
put in pink
remove two pink from second layer
Add two red to second layer in odd intervals
Add five pink to second layer
Remove one blue and one brown from bottom layer

remove the brown block
remove all orange blocks
put brown block on orange blocks
put orange blocks on all blocks
put blue block on leftmost blue block in top row

Results: average players (rank 21-50)
(player scores: 2.72, 8.37)

rem cy pos 1
stack or blk pos 4
rem blk pos 2 thru 5
rem blk pos 2 thru 4
stack bn blk pos 1 thru 2
fill bn blk
stack or blk pos 2 thru 6
rem cy blk pos 2 fill rd blk

move second cube
double red with blue
double first red with red
triple second and fourth with orange
add red
remove orange on row two
add blue to column two
add brown on first and third

remove red
remove 1 red
remove 2 4 orange
add 2 red
add 1 2 3 4 blue
emove 1 3 5 orange
add 2 4 orange
add 2 orange
remove 2 3 brown
add 1 2 3 4 5 red
remove 2 3 4 5 6
remove 2
add 1 2 3 4 6 red
Results: worst players (rank 51-100)
(player scores: 12.6, 14.15, 14.32)

add red cubes on center left, center right, far left and far right
remove blue blocks on row two column two, row two column four
remove red blocks in center left and center right on second row

laugh with me
red blocks with one aqua
aqua red alternate
brown red red orange aqua orange
red brown red brown red brown
space red orange red
second level red space red space red space

Results: interesting players

(Polish)
usuń brazowe klocki [remove the brown blocks]
postaw pomarańczowy klocek na pierwszym klocku [put an orange block on the first block]
postaw czerwone klocki na pomarańczowych [put red blocks on the orange ones]
usuń pomarańczowe klocki w górnym rzedzie [remove the orange blocks in the top row]

(Polish notation)
holdleftmost
holdbrown
holdleftmost
blueonblue
brownonblue1
blueonorange
holdblue
holdorange2
blueonred2
holdends1
holdrightend
hold2
orangeonorangerightmost

rm scat + 1 c
+1c
rm sh
+ 1 2 4 sh
+1c
-4o
rm 1 r
+13o
full fill c
rm o
full fill sh
-13
full fill sh
rm sh
rm r
+23r
rm o
+ 3 sh
+ 2 3 sh
Summary so far
• Downstream goal drives language learning
• Both human and computer learn and adapt

worksheets.codalab.org
Roadmap
• Summary of CS221
• Next courses
• History of AI
• My research
• Food for thought

[van den Oord et al., 2016]
WaveNet for audio generation
• Works with raw audio (16K observations / second)
• Key idea: dilated convolutions capture multiple scales of resolution, not recurrent
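To see why dilated convolutions cover long contexts cheaply, here is a minimal numpy sketch of stacked causal convolutions with dilations 1, 2, 4, 8; the filter weights and signal are toy values, and this shows only the receptive-field mechanics, not the real WaveNet architecture:

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """y[t] = sum_k w[k] * x[t - k*dilation], treating samples before the start as zero."""
    y = np.zeros_like(x)
    for t in range(len(x)):
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:
                y[t] += wk * x[idx]
    return y

x = np.random.randn(32)
w = np.array([0.5, 0.5])          # kernel size 2
h = x
receptive_field = 1
for d in [1, 2, 4, 8]:            # doubling dilations
    h = causal_dilated_conv(h, w, d)
    receptive_field += (len(w) - 1) * d

print(receptive_field)            # 16: each output depends on the last 16 samples
```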
[Szegedy et al., 2013; Goodfellow et al., 2014]
Adversaries
[figure: original image on the left, adversarially perturbed copy on the right]
AlexNet predicts correctly on the left; AlexNet predicts ostrich on the right.

[Isola et al., 2016]
Conditional adversarial networks
Key idea: game between
• Generator: generates fake images
• Discriminator: distinguishes between fake/real images
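A minimal sketch of the generator/discriminator game, just to show how the two losses oppose each other; the one-parameter logistic discriminator, the 1-D "images", and all numbers are illustrative assumptions, far from a real (conditional) GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(x, theta):
    """D(x): probability that x is real (a 1-parameter logistic model on 1-D 'images')."""
    return 1.0 / (1.0 + np.exp(-theta * x))

def discriminator_loss(real, fake, theta):
    # D wants real samples classified as real and generated samples as fake.
    return -np.mean(np.log(discriminator(real, theta))
                    + np.log(1.0 - discriminator(fake, theta)))

def generator_loss(fake, theta):
    # G wants its samples to be classified as real by D.
    return -np.mean(np.log(discriminator(fake, theta)))

real = rng.normal(loc=2.0, size=100)       # "real data"
fake = rng.normal(loc=-1.0, size=100)      # "generated data" from a poor generator
theta = 1.0

print(discriminator_loss(real, fake, theta))  # fairly low: D separates real from fake
print(generator_loss(fake, theta))            # high: the generator is losing the game
```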
[Mykel Kochenderfer]
Safety guarantees
• For air-traffic control, threshold level of safety: probability 10^-9 of a catastrophic failure (e.g., collision) per flight hour
• Move from human-designed rules to a numeric Q-value table?

Privacy
• Do not reveal sensitive information (income, health, communication)
• Compute average statistics (how many people have cancer?)
[figure: column of individuals' yes/no answers]
[Warner, 1965]
Randomized response
Do you have a sibling?
Method:
• Flip two coins.
• If both heads: answer yes/no randomly
• Otherwise: answer yes/no truthfully
Analysis:
true-prob = (4/3) × (observed-prob − 1/8)

Fairness
Prediction task: personal information ⇒ grant insurance?
Fallacy 1: it's just math, right?
no... learning algorithms ⇐ data ⇐ reflection of our society
Fallacy 2: remove race, gender, etc.
no... zipcode is still there, but correlated with race
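Returning to the randomized-response mechanism above, here is a quick sketch that simulates it and recovers the true proportion with the stated correction; the population size and true rate are made-up numbers:

```python
import random

random.seed(0)
N = 100_000
TRUE_RATE = 0.30                      # made-up fraction of true "yes" answers

def respond(truth):
    """One respondent: flip two coins; if both heads, answer randomly, else truthfully."""
    if random.random() < 0.25:        # both coins heads (probability 1/4)
        return random.random() < 0.5  # random yes/no
    return truth

answers = [respond(random.random() < TRUE_RATE) for _ in range(N)]
observed = sum(answers) / N

# From the slide: observed = 1/8 + (3/4) * true, so invert:
estimate = (4 / 3) * (observed - 1 / 8)
print(round(observed, 3), round(estimate, 3))   # estimate should be close to 0.30
```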
Causality
Goal: figure out the effect of a treatment on survival
Data:
• For untreated patients, 80% survive
• For treated patients, 30% survive
Does the treatment help? We can conclude nothing! Sick people are more likely to undergo treatment...

Feedback loops
[cycle: Data → Hypothesis → Predictions → back to Data]
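A small simulation of the confounding in this example: the treatment is given mostly to sick patients, so the raw survival rates make a genuinely helpful treatment look harmful. The baseline survival rates, treatment effect, and assignment probabilities below are invented solely to reproduce roughly the 80% vs 30% pattern on the slide:

```python
import random

random.seed(0)
N = 100_000

# Invented model: treatment truly helps (+10 points of survival for everyone),
# but sick patients are far more likely to be treated (confounding).
SURVIVAL = {"healthy": 0.85, "sick": 0.15}     # baseline survival without treatment
TREATMENT_BOOST = 0.10
P_TREATED = {"healthy": 0.05, "sick": 0.90}

counts = {(t, s): 0 for t in (True, False) for s in (True, False)}
for _ in range(N):
    condition = random.choice(["healthy", "sick"])
    treated = random.random() < P_TREATED[condition]
    p_survive = SURVIVAL[condition] + (TREATMENT_BOOST if treated else 0.0)
    survived = random.random() < p_survive
    counts[(treated, survived)] += 1

for treated in (False, True):
    total = counts[(treated, True)] + counts[(treated, False)]
    rate = counts[(treated, True)] / total
    print("treated" if treated else "untreated", round(rate, 2))
# untreated ~0.78, treated ~0.29: the treatment looks harmful even though it helps.
```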
[course plan, from "low-level intelligence" to "high-level intelligence": Reflex; States (search problems, Markov decision processes, adversarial games); Variables (constraint satisfaction problems, Bayesian networks); Logic; with machine learning underlying all of them]

Please fill out course evaluations on Axess.
Thanks for an exciting quarter!