Can Inductive Learning Work?
[Figure: the inductive learning setup. Each example x is picked from the example set X with probability p(x) and added, with its + or – label, to a training set D of size m. The learning algorithm L searches the hypothesis space H (size |H|) for a hypothesis h that agrees with all examples in D.]
Approximately Correct Hypothesis
• h ∈ H is approximately correct (AC) with accuracy ε iff:
Pr[h(x) correct] > 1 – ε
where x is an example picked with probability distribution p from X
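As an illustration, the condition Pr[h(x) correct] > 1 – ε can be estimated empirically by sampling from p. A minimal sketch, assuming a hypothetical example set X = {0, …, 99} with uniform p and a made-up target concept (none of these specifics come from the slides):

```python
import random

# Hypothetical setup: X = {0, ..., 99}, p uniform, the target labels
# x positive iff x < 50, and the hypothesis h errs on x in [45, 50).
def target(x):
    return x < 50

def h(x):
    return x < 45  # disagrees with the target on 45..49

def estimated_accuracy(hyp, n_samples=100_000, seed=0):
    """Monte Carlo estimate of Pr[hyp(x) correct] for x drawn from p."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_samples):
        x = rng.randrange(100)          # x ~ p (uniform on X)
        correct += (hyp(x) == target(x))
    return correct / n_samples

acc = estimated_accuracy(h)
# True accuracy is 0.95, so h is AC for any accuracy epsilon > 0.05.
```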
PAC Learning Algorithm
• A learning algorithm L is Probably Approximately Correct (PAC) with confidence 1 – γ iff the probability that it generates a non-AC hypothesis h is ≤ γ:
Pr[h is non-AC] ≤ γ
• Can L be PAC if the size m of the training set D is large enough?
• If yes, how big should m be?
Intuition
• If m is large enough and g ∈ H is not AC, it is unlikely that g agrees with all examples in the training set D
• So, if m is large enough, there should be few non-AC hypotheses that agree with all examples in D
• Hence, it is unlikely that L will pick one
Can L Be PAC?
• Let g be an arbitrary hypothesis in H that is not approximately correct
(Recall: h ∈ H is AC iff Pr[h(x) correct] > 1 – ε)
• Since g is not AC, we have:
Pr[g(x) correct] ≤ 1 – ε
• Since the m examples are drawn independently, the probability that g is consistent with all the examples in D is at most (1 – ε)^m
• The probability that there exists a non-AC hypothesis matching all examples in D is at most |H|(1 – ε)^m
• Therefore, L is PAC if m verifies: |H|(1 – ε)^m ≤ γ
(L is PAC if Pr[h is non-AC] ≤ γ)
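The per-hypothesis bound (1 – ε)^m can be sanity-checked numerically. A sketch, assuming (for illustration, not from the slides) a worst-case non-AC hypothesis whose per-example error is exactly ε:

```python
import random

# A non-AC hypothesis g with error exactly eps disagrees with each i.i.d.
# example independently with probability eps, so it agrees with ("survives")
# all m examples with probability (1 - eps)**m.
def survival_rate(eps, m, trials=200_000, seed=1):
    rng = random.Random(seed)
    survived = sum(
        all(rng.random() >= eps for _ in range(m))  # correct on all m draws
        for _ in range(trials)
    )
    return survived / trials

eps, m = 0.1, 20
estimate = survival_rate(eps, m)  # close to (1 - 0.1)**20, about 0.122
```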
Calculus
• H = {h1, h2, …, h|H|}
• Pr(hi is non-AC and agrees with D) ≤ (1 – ε)^m
• Pr(h1, or h2, …, is non-AC and agrees with D)
≤ Σi=1,…,|H| Pr(hi is non-AC and agrees with D)  (union bound)
≤ |H| (1 – ε)^m
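The summation step above is the union bound, and it can be checked by simulation. A sketch assuming, purely for illustration, a pool of independent bad hypotheses each with error exactly ε (these specifics are not from the slides):

```python
import random

# Union-bound check: with h_count bad hypotheses, each agreeing with a fresh
# example independently with probability 1 - eps, the chance that at least
# one agrees with all m examples is at most h_count * (1 - eps)**m.
def any_bad_survives(h_count, eps, m, trials=50_000, seed=2):
    rng = random.Random(seed)
    hits = sum(
        any(all(rng.random() >= eps for _ in range(m)) for _ in range(h_count))
        for _ in range(trials)
    )
    return hits / trials

h_count, eps, m = 10, 0.2, 30
estimate = any_bad_survives(h_count, eps, m)
bound = h_count * (1 - eps) ** m
# estimate stays at or below bound (up to sampling noise); the bound has
# slack because the "survives" events can overlap.
```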
Size of Training Set
• From |H|(1 – ε)^m ≤ γ we derive (dividing by ln(1 – ε) < 0 flips the inequality):
m ≥ ln(γ/|H|) / ln(1 – ε)
• Since ε < –ln(1 – ε) for 0 < ε < 1, it suffices that:
m ≥ ln(γ/|H|) / (–ε) = ln(|H|/γ) / ε
• So, m increases logarithmically with the size of the hypothesis space
But how big is |H|?
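The final bound m ≥ ln(|H|/γ)/ε is straightforward to evaluate. A sketch with made-up illustrative values for |H|, ε, and γ:

```python
import math

def pac_sample_size(h_size, eps, gamma):
    """Smallest integer m with m >= ln(h_size / gamma) / eps (sufficient for PAC)."""
    return math.ceil(math.log(h_size / gamma) / eps)

# Illustrative values (not from the slides): |H| = 2**10, eps = 0.1, gamma = 0.05
m = pac_sample_size(2 ** 10, 0.1, 0.05)  # 100 examples suffice
# Check against the original condition |H| * (1 - eps)**m <= gamma:
ok = (2 ** 10) * (1 - 0.1) ** m <= 0.05
```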
Importance of KIS Bias
• If H is the set of all logical sentences over n observable predicates, then |H| = 2^(2^n), and m is exponential in n
• If H is the set of all conjunctions of k << n observable predicates picked among the n predicates, then |H| = O(n^k) and m is logarithmic in n
⇒ Importance of choosing a “good” KIS (Keep It Simple) bias
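The contrast can be made concrete by plugging ln|H| into m ≥ ln(|H|/γ)/ε, working in log space to avoid materializing the astronomically large 2^(2^n); the values of n, k, ε, γ below are made-up for illustration:

```python
import math

def m_bound_from_log(ln_h, eps, gamma):
    # m >= ln(|H|/gamma) / eps  =  (ln|H| - ln(gamma)) / eps
    return math.ceil((ln_h - math.log(gamma)) / eps)

n, k, eps, gamma = 20, 3, 0.1, 0.05

# Unbiased H: all boolean functions of n predicates, |H| = 2**(2**n),
# so ln|H| = 2**n * ln 2 and the bound is exponential in n.
m_all = m_bound_from_log((2 ** n) * math.log(2), eps, gamma)

# KIS-biased H: conjunctions of k of the n predicates, |H| = C(n, k) = O(n**k),
# so ln|H| = O(k * ln n) and the bound is logarithmic in n.
m_conj = m_bound_from_log(math.log(math.comb(n, k)), eps, gamma)
# m_all is in the millions; m_conj is about a hundred.
```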