Probability Logic
Sebastiaan Terwijn
Radboud University Nijmegen
Logic Colloquium 2015, Helsinki
Human probabilistic reasoning
Humans are notoriously bad at probabilistic reasoning.
• Examples from Kahneman and Tversky, where 85% of the
respondents rated the probability of an event A ∧ B higher
than the probability of just A.
• Substituting plausible for probable leads to violations of the laws
of probability.
• Even experts perform badly when reasoning with conditionals
(cf. Stenning–van Lambalgen 2008).
• Common mistake: ignoring base rates (cf. Kahneman).
• Random data generated by humans is too regular. (Example:
Stapel.)
Probability logic
Two kinds of probability logic:
• “Probabilities over models”
  Examples: many! Large literature, e.g.
  – logics assigning probabilities to sentences, including those of
    Carnap, Gaifman, Scott and Kraus, Nilsson, Väänänen,
  – PCTL in model checking,
  – Valiant’s robust logics.
• “Models with probabilities”
  – H. Friedman’s measure quantifier Q,
  – Keisler and Hoover,
  – ε-logic,
  – Approximate Measure Logic (Goldbring and Towsner).
See also Leitgeb 2014 for a survey of many of these.
How to interpret a universal statement
• Aristotle: “. . . every belief comes either through syllogism or
from induction.”
• Sextus Empiricus, Against the logicians: universal statements are
problematic outside the realm of mathematics; it is impossible to
state “All ravens are black”.
• Hume: Problem of induction. Popper: Falsification instead of
verification.
• Long history of alternative interpretations of universal
statements.
Probability quantifiers
• Carnap: Probability as degree of confirmation. However, in
Carnap’s inductive logic (1945), ∀xR(x) always has degree of
confirmation zero in infinite models.
• H. Friedman’s measure quantifier Qx. Borel structures
(cf. Steinhorn 1985).
• Keisler introduced probability quantifiers of the form
(P x > r)ϕ(x).
Hard to combine with classical ∃ because projections of
measurable sets need not be measurable.
Pac-learning
pac (for “probably approximately correct”) is a quality mark for
learning algorithms in computational learning theory. The
pac-model of learning was introduced by Valiant in 1984.
• X sample space with an unknown probability distribution D.
• C ⊆ P(X) concept class.
In a typical learning situation, given an unknown concept c ∈ C, we
have to formulate an hypothesis h that closely approximates c on
the basis of finite samples from X, drawn according to D, and
labeled according to c.
Pac-learning (continued)
More precisely, we are given an error parameter ε and a confidence
parameter δ.
The algorithm has to produce an hypothesis h ∈ C that is close to
the unknown concept c. Since the algorithm is probabilistic (as it
relies on D), we only require this to happen with high probability.
An algorithm pac-learns an unknown concept c ∈ C if for any D,
given ε and δ it produces, after sampling from X using D, with
probability at least 1 − δ an hypothesis h such that
Pr_D(c △ h) < ε, where △ denotes symmetric difference.
That is, the output of the algorithm is probably approximately
correct.
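As an illustration (not part of the talk), consider the toy concept class of
thresholds c_t = {x ∈ [0, 1] : x ≥ t}. The Python sketch below pac-learns it
with the classical sample bound m ≥ (1/ε) ln(1/δ) for this class; the
distribution D (here a Beta distribution) is never inspected by the learner,
only sampled from, and all names are chosen for the example.

import math
import random

def sample_oracle(m, t, draw):
    """Draw m examples according to the unknown D and label them by c_t."""
    xs = [draw() for _ in range(m)]
    return [(x, x >= t) for x in xs]

def learn_threshold(examples):
    """Most cautious consistent hypothesis: the smallest positively labeled
    point, or 1.0 if no positive example was seen."""
    positives = [x for x, label in examples if label]
    return min(positives) if positives else 1.0

def pac_learn(eps, delta, t, draw):
    m = math.ceil((1 / eps) * math.log(1 / delta))   # sample bound for thresholds
    return learn_threshold(sample_oracle(m, t, draw))

if __name__ == "__main__":
    random.seed(0)
    draw = lambda: random.betavariate(2, 5)          # the unknown distribution D
    t_true = 0.3
    h = pac_learn(eps=0.05, delta=0.05, t=t_true, draw=draw)
    fresh = [draw() for _ in range(100_000)]         # estimate Pr_D(c_t △ h)
    err = sum((x >= t_true) != (x >= h) for x in fresh) / len(fresh)
    print(f"hypothesis threshold {h:.4f}, estimated error {err:.4f}")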
Vapnik-Chervonenkis dimension
The VC-dimension of a concept class C ⊆ P(X) (Vapnik and
Chervonenkis 1972) is a purely combinatorial measure of the
“richness” of the class.
Given S ⊆ X, we can consider all possible “behaviours” c ∩ S of C
on the set S. If |S| = n, we view a behaviour as an element of
{0, 1}^n. We say that S is shattered by C if the number of
behaviours is maximal, i.e. 2^n. The VC-dimension of C is the
largest n such that there is a set S of cardinality n that is
shattered by C, and ∞ if there is no largest such n.
Invented in statistics, the notion of VC-dimension has become a
central notion in computational learning theory. It was discovered
independently in model theory.
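As an illustration (not part of the talk), for a finite sample space X the
definition can be checked by brute force. The sketch below (names invented for
the example) computes the VC-dimension of two toy classes over X = {0, . . . , 7}:
initial segments (dimension 1) and intervals (dimension 2).

from itertools import combinations

def behaviours(concepts, S):
    """All distinct behaviours c ∩ S of the class on the set S."""
    return {frozenset(c & S) for c in concepts}

def is_shattered(concepts, S):
    return len(behaviours(concepts, S)) == 2 ** len(S)

def vc_dimension(concepts, X):
    """Largest n such that some n-element subset of X is shattered."""
    d = 0
    for n in range(1, len(X) + 1):
        if any(is_shattered(concepts, set(S)) for S in combinations(X, n)):
            d = n
    return d

if __name__ == "__main__":
    X = set(range(8))
    segments = [frozenset(range(k)) for k in range(len(X) + 1)]    # initial segments
    intervals = [frozenset(range(i, j)) for i in range(len(X) + 1)
                                        for j in range(i, len(X) + 1)]
    print(vc_dimension(segments, X), vc_dimension(intervals, X))   # 1 2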
VC-dimension and learning theory
Let Π_C(n) be the maximal number of behaviours of C on sets S of
cardinality n.
Sauer-Shelah Lemma (1972) Π_C(n) is either bounded by a polynomial
of degree d = VC-dim(C) or equal to 2^n.
The link with learning theory is provided by the “4 Germans”
paper:
Theorem (Blumer, Ehrenfeucht, Haussler, and Warmuth 1989)
Suppose there exists a learning algorithm for C that produces
hypotheses that are consistent with given samples. Then
C is pac-learnable iff the VC-dimension of C is finite.
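As an illustration (not part of the talk), in the polynomial case the
Sauer-Shelah bound is Π_C(n) ≤ Σ_{i≤d} C(n, i). The short sketch below checks
this numerically for the class of intervals over a ten-point domain
(VC-dimension 2); the helper names are invented for the example.

from itertools import combinations
from math import comb

def growth_function(concepts, X, n):
    """Π_C(n): the maximal number of behaviours of C on n-element subsets of X."""
    return max(len({frozenset(c & set(S)) for c in concepts})
               for S in combinations(X, n))

if __name__ == "__main__":
    X = range(10)
    intervals = [frozenset(range(i, j)) for i in range(11) for j in range(i, 11)]
    d = 2                                    # VC-dimension of intervals
    for n in range(1, 8):
        bound = sum(comb(n, i) for i in range(d + 1))
        print(n, growth_function(intervals, X, n), "<=", bound)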
Sampling from models
M first-order model, D arbitrary probability distribution over M.
Want to decide with high probability the approximate truth of
sentences ϕ on the basis of samples of atomic data taken from M.
We have the following asymmetry between ∃ and ∀:
• On seeing an atomic truth R(a), we know with certainty that
∃xR(x),
• On the other hand, inducing a universal statement ∀xR(x)
can only be done probabilistically.
As in the pac-model, it is important that the distribution D is
unknown, which is counterbalanced by the fact that success of the
learning task is measured using the same distribution D.
ε-Logic
Fix error parameter ε.
Inductively define the relation (M, D) |=ε ϕ as follows.
• For every atomic formula ϕ, (M, D) |=ε ϕ if M |= ϕ.
• The logical connectives ∧ and ∨ are treated classically.
• The existential quantifier is also classical:
(M, D) |=ε ∃xϕ(x)
if there exists a ∈ M such that (M, D) |=ε ϕ(a).
ε-Logic
• The case of negation is split into subcases as follows:
  – For ϕ atomic, (M, D) |=ε ¬ϕ if (M, D) ⊭ε ϕ.
  – ¬ distributes in the classical way over ∧ and ∨, e.g.
    (M, D) |=ε ¬(ϕ ∧ ψ) if (M, D) |=ε ¬ϕ ∨ ¬ψ.
  – (M, D) |=ε ¬¬ϕ if (M, D) |=ε ϕ.
  – (M, D) |=ε ¬(ϕ → ψ) if (M, D) |=ε ϕ ∧ ¬ψ.
  – (M, D) |=ε ¬∃xϕ(x) if (M, D) |=ε ∀x¬ϕ(x).
  – (M, D) |=ε ¬∀xϕ(x) if (M, D) |=ε ∃x¬ϕ(x).
• (M, D) |=ε ϕ → ψ if (M, D) |=ε ¬ϕ ∨ ψ.
• Finally, we define (M, D) |=ε ∀xϕ(x) if
Pr_D{a ∈ M | (M, D) |=ε ϕ(a)} ≥ 1 − ε.
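As an illustration (not part of the talk), the truth definition can be run
directly on a finite model with an explicitly given distribution D. The Python
sketch below is a minimal, hypothetical evaluator: formulas are nested tuples,
∃ is classical, ∀ uses the threshold ≥ 1 − ε above, and negation is pushed
inward by the rules just listed.

def holds(model, D, eps, phi, env=None):
    """(model, D) |=ε phi, for a finite model given as relation tables and a
    distribution D given as a dict element -> probability.
    Formulas: ('R', name, v1, ...), ('not', f), ('and', f, g),
    ('or', f, g), ('exists', v, f), ('forall', v, f)."""
    env = env or {}
    op = phi[0]
    if op == 'R':                              # atomic formulas: classical
        _, name, *vs = phi
        return tuple(env[v] for v in vs) in model[name]
    if op == 'not':                            # negation: push inward
        return holds_neg(model, D, eps, phi[1], env)
    if op == 'and':
        return holds(model, D, eps, phi[1], env) and holds(model, D, eps, phi[2], env)
    if op == 'or':
        return holds(model, D, eps, phi[1], env) or holds(model, D, eps, phi[2], env)
    if op == 'exists':                         # classical ∃
        _, v, body = phi
        return any(holds(model, D, eps, body, {**env, v: a}) for a in D)
    if op == 'forall':                         # probabilistic ∀: D-mass ≥ 1 − ε
        _, v, body = phi
        mass = sum(D[a] for a in D if holds(model, D, eps, body, {**env, v: a}))
        return mass >= 1 - eps
    raise ValueError(op)

def holds_neg(model, D, eps, phi, env):
    """Evaluate ¬phi via the negation rules: dualize the outer connective."""
    op = phi[0]
    if op == 'R':
        return not holds(model, D, eps, phi, env)
    if op == 'not':
        return holds(model, D, eps, phi[1], env)
    if op == 'and':
        return holds(model, D, eps, ('or', ('not', phi[1]), ('not', phi[2])), env)
    if op == 'or':
        return holds(model, D, eps, ('and', ('not', phi[1]), ('not', phi[2])), env)
    if op == 'exists':
        return holds(model, D, eps, ('forall', phi[1], ('not', phi[2])), env)
    if op == 'forall':
        return holds(model, D, eps, ('exists', phi[1], ('not', phi[2])), env)
    raise ValueError(op)

if __name__ == "__main__":
    D = {'a': 1/3, 'b': 1/3, 'c': 1/3}
    model = {'R': {('a',)}}                    # R holds of 'a' only
    # At ε = 1/3, both ∃xR(x) and ∀x¬R(x) hold, illustrating paraconsistency.
    print(holds(model, D, 1/3, ('exists', 'x', ('R', 'R', 'x'))))           # True
    print(holds(model, D, 1/3, ('forall', 'x', ('not', ('R', 'R', 'x')))))  # True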
ε-Logic
Everything is “as classically as possible”, except the interpretation
of ∀.
The rules for negation allow us to put every formula in prenex
normal form.
Note that both (M, D) |=ε ∃xϕ(x) and (M, D) |=ε ∀x¬ϕ(x) may
hold, since the interpretation of the first is the classical one, but
the interpretation of the second is that most x’s satisfy ¬ϕ(x).
Since by definition ¬∃xϕ(x) ≡ ∀x¬ϕ(x), we see that ε-logic is
paraconsistent.
Probabilistic truth
ϕ is ε-valid if (M, D) |=ε ϕ for every M and D.
Note that unlike in classical logic, validity and satisfiability are not
complementary.
Theorem For all ε < ε′, the ε-validities are strictly included in the
ε′-validities.
Theorem Although for ε = 0, ε-truth is not the same as classical
truth, the 0-valid formulas coincide with the classical validities.
Measurability assumptions on models
In the truth definition we require in particular that the set
{x ∈ M : (M, D) |=ε ϕ(x)}
is D-measurable, where ϕ may contain parameters from M.
To avoid pathologies, in the following we will require a bit more,
namely:
for every k-ary predicate R occurring in ϕ, the set of
k-tuples satisfying R is D^k-measurable.
We give an example of an excluded model, based on Sierpiński’s
argument that under CH there are non-measurable subsets of the
real plane. Define D on ω1 by
D(A) = 1 if A contains all but at most countably many elements of ω1,
D(A) = 0 otherwise.
Then we have:
(ω1, D) |=0 ∀x∀y(x < y)
(ω1, D) ⊭0 ∀y∀x(x < y)
Note that the relation {(x, y) ∈ ω1^2 : x < y} is not
D^2-measurable: since all its vertical sections {y : x < y} have
D-measure 1, and all its horizontal sections {x : x < y} have
D-measure 0, measurability would contradict Fubini’s theorem.
Sampling from models
EX(D) sampling oracle
When called upon, the oracle randomly draws an element x from
the model M according to the distribution D.
Given a sample of elements, the oracle supplies us with all the
atomic truths these elements satisfy, for every relation of every
arity in the language L.
We assume that the language L is of finite signature, so that every
sample satisfies only finitely many atomic truths.
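As an illustration (not part of the talk), a toy rendering of such an oracle
for a finite relational model; the class EX and its interface are invented for
the example, and D is given as a dictionary of probabilities.

import random

class EX:
    """Toy sampling oracle EX(D): draws elements according to D and reports
    the atomic truths holding among the elements sampled so far."""

    def __init__(self, model, D):
        self.model = model                       # relation name -> set of tuples
        self.elements = list(D)
        self.weights = [D[a] for a in self.elements]
        self.sample = []

    def draw(self):
        a = random.choices(self.elements, weights=self.weights, k=1)[0]
        self.sample.append(a)
        seen = set(self.sample)
        facts = {(R, t) for R, tuples in self.model.items()
                        for t in tuples if set(t) <= seen}
        return a, facts

if __name__ == "__main__":
    oracle = EX({'R': {(0, 1), (1, 1)}}, {0: 0.5, 1: 0.5})
    print(oracle.draw())
    print(oracle.draw())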
Learning first-order formulas
Definition A (probabilistic) algorithm L pac-learns sentence ϕ if,
for any unknown M and D, given error parameter ε > 0 and
confidence parameter δ > 0, and with access to the sampling
oracle EX(D), L outputs one of the possibilities (M, D) |=ε ϕ,
(M, D) |=ε ¬ϕ, such that with probability at least 1 − δ the output is
correct.
Note that both possible outputs can be correct.
Theorem (T 2005) There exists an algorithm L that pac-learns
any sentence ϕ. For a fixed number of quantifiers n,
L is polynomial in 1/ε and 1/δ.
A sufficient sample size for this algorithm is (Kuyper 2011):
(2n)! · (1/ε^2 · 1/δ)^(n+1)
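As an illustration (not part of the talk, and much weaker than the theorem
above), the one-quantifier case ∀xR(x) already shows the idea: a sampled
counterexample makes the output ¬∀xR(x) (i.e. ∃x¬R(x)) certainly correct,
while outputting ∀xR(x) after m ≥ (1/ε) ln(1/δ) all-positive samples is wrong
with probability at most (1 − ε)^m ≤ δ. The names and sample bound below are
chosen for this toy case only.

import math
import random

def decide_forall_R(draw, R, eps, delta):
    """Output a (probably approximately correct) verdict on ∀x R(x)."""
    m = math.ceil((1 / eps) * math.log(1 / delta))
    for _ in range(m):
        a = draw()                      # one call to the sampling oracle EX(D)
        if not R(a):
            return "not forall x R(x)"  # witnessed classically by the element a
    # If Pr_D(¬R) > ε, all m samples avoid ¬R with probability < (1 − ε)^m ≤ δ.
    return "forall x R(x)"

if __name__ == "__main__":
    random.seed(1)
    # Hidden model: R(a) iff a != 0, and D puts weight 0.02 on the element 0,
    # so here both verdicts are in fact correct for ε = 0.05.
    draw = lambda: 0 if random.random() < 0.02 else random.randint(1, 99)
    print(decide_forall_R(draw, lambda a: a != 0, eps=0.05, delta=0.01))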
Valiant’s Robust Logic
Introduced by Valiant 2000.
Related to ε-logic in motivation of interpreting ∀ probabilistically,
and ∃ classically.
However, restricts to finite models.
More importantly, this logic falls under the “probabilities over
models” approach. Models are coded by vectors, which in this
context are called scenes, and arbitrary probability distributions
over the set of all scenes are considered.
Valiant showed that a certain class of formulas called rules are
pac-learnable from scenes in this context.
Approximate Measure Logic
Introduced by Goldbring and Towsner 2012.
AML is similar to ε-logic, but with a completely different motivation.
More specifically, it aims to provide a logical framework for the
connection between results in finitary combinatorics and results in
measure theory or ergodic theory. Typical example: Szemerédi’s
Theorem on arithmetic progressions in sufficiently dense sets of
natural numbers. The idea for this logic was based in part on work
by Hrushovski.
Classical truth is not a limit
Theorem There exist sentences ϕ that are not classically valid
but that are ε-valid for all ε > 0.
Proof. Consider
lin = ∀x∀y(x ≤ y ∨ y ≤ x)
ϕ = ¬lin ∨ ∃x∀y(y ≤ x)
When (M, D) ⊭ε ¬lin then we have classically M |= lin, so ≤
really is a linear order in M. But if this is the case, by the
measurability assumption that ≤ is D^2-measurable, there is a point
x such that most of the measure is to the left of x.
[Figure: the model M drawn as a line, with a point x having most of the measure to its left.]
Example
The sentence ∀x∀y R(x, y) ∧ ∀y∀x ¬R(x, y) is finitely 1/3-satisfiable:
[Figure: a three-element model, with R given as a 3 × 3 table of truth values.]
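As an illustration (not part of the talk), the 1/3-satisfiability claim above
can be confirmed by brute force. The hypothetical sketch below searches all
binary relations on a three-element universe, with D uniform and ∀ read with
the threshold ≥ 1 − ε as in the truth definition.

from itertools import product

U = range(3)
EPS = 1 / 3

def forall(pred):
    """Probabilistic ∀ over the uniform distribution on U: at least 1 − ε of U."""
    return sum(1 for a in U if pred(a)) / len(U) >= 1 - EPS

def satisfies(R):
    """(U, uniform) 1/3-satisfies ∀x∀y R(x,y) ∧ ∀y∀x ¬R(x,y)?"""
    first = forall(lambda x: forall(lambda y: (x, y) in R))
    second = forall(lambda y: forall(lambda x: (x, y) not in R))
    return first and second

witnesses = []
for bits in product([0, 1], repeat=9):
    R = {p for p, b in zip(product(U, U), bits) if b}
    if satisfies(R):
        witnesses.append(R)

print(len(witnesses), "relations work; for example R =", sorted(witnesses[0]))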
Example (continued)
The stronger sentence ∀x∀y(R(x, y) ∧ ¬R(y, x)) is also
1/3-satisfiable, but not by any finite model.
By approximating this one can obtain finite ε-models for any
ε > 1/3. One can also show that the sentence does not have any
countable 1/3-models.
Finite and countable models
Downward Löwenheim-Skolem fails for ε-logic: Not every
ε-satisfiable sentence has a countable model.
Theorem T.f.a.e.:
(i) ∀M finite M |= ϕ,
(ii) ∀M countable ∀D ∀ε > 0 (M, D) |=ε ϕ.
Notice that it is essential we exclude the case ε = 0, since
otherwise we would obtain all classical validities instead of only the
finitely valid sentences.
Failure of classical Löwenheim-Skolem
Example of a satisfiable sentence without any countable model.
Let ϕ = ∀x∀y(R(x, y) ∧ ¬R(x, x)). Then ϕ is 0-satisfiable; take
the unit interval [0, 1] with the Lebesgue measure and take R(x, y)
to be x ≠ y. However, ϕ does not have any countable 0-models: If
(M, D) |=0 ϕ then for almost every x ∈ M the set
Bx = {y ∈ M | (M, D) |=0 ¬R(x, y) ∨ R(x, x)}
has measure zero. Since x ∈ Bx, the set ⋃_{x∈M} Bx equals M, and
therefore has measure 1. But if M is countable it is also the union
of countably many sets of measure 0 and hence has measure 0, a
contradiction.
Note that ϕ is finitely ε-satisfiable for every ε > 0.
Downward Löwenheim-Skolem
Theorem (Kuyper–T 2013) Let (M, D) be an ε-model and let
X ⊆ M be of cardinality at most 2^ω. Then there exists an
elementary ε-submodel
(N , E) ≺ε (M, D)
such that X ⊆ N and N is of cardinality at most 2^ω.
Result is for countable languages, with equality, but no function
symbols. For languages without equality we have:
Theorem (Kuyper–T 2013) Let Γ be an ε-satisfiable set of
sentences. Then there exists an ε-model on [0, 1] with the
Lebesgue measure which ε-satisfies Γ. Furthermore, all relations in
the new ε-model can be chosen to be Borel.
The Löwenheim number
Let λε be the Löwenheim number of ε-logic, that is, the smallest
cardinal λ such that every ε-satisfiable sentence has a model of
cardinality at most λ.
Theorem (Kuyper–T 2013)
For every rational ε ∈ [0, 1) we have
(i) ℵ1 ≤ λε ≤ 2^ℵ0,
(ii) If Martin’s axiom MA holds then λε = 2^ℵ0.
Hence the statement λε = ℵ1 is independent of ZFC.
Question Is there a model of ZFC in which λε < 2^ℵ0 ?
For example λε = ℵ1 < 2^ℵ0 ?
Compactness
Theorem (Kuyper–T 2013) For every rational ε ∈ (0, 1), ε-logic
is not compact, i.e. there exists a countable set Γ of sentences
such that each finite subset of Γ is ε-satisfiable, but Γ itself is not
ε-satisfiable.
Idea similar to that for Keisler’s logic.
Theorem (Kuyper 2014) For languages without function symbols,
0-logic is compact.
Weak ε-models
In the definition of ε-model we required that all relations and
functions are D-measurable. If we drop this requirement, we obtain
the notion of weak ε-model.
Ultraproducts
Given an ultrafilter U on ω, and (finitely additive) measure spaces
(Xi , Di ), we can define the ultraproduct
∏_{i∈ω} (Xi, Di)/U
equipped with a uniquely defined (σ-additive) ultraproduct
measure. (Construction due to Hoover.)
Theorem (Kuyper–T 2013)
• A variant of Łoś’s theorem holds.
• The ultraproduct is a weak ε-model.
A probabilistic version of Łoś’s theorem
Theorem (Kuyper–T 2013) T.f.a.e.:
(i) ∏_{i∈ω} (Mi, Di)/U |=ε ϕ([a^1], . . . , [a^n]),
(ii) for all ε′ > ε, {i ∈ ω | (Mi, Di) |=ε′ ϕ(a^1_i, . . . , a^n_i)} ∈ U,
(iii) there exists a sequence ε0, ε1, . . . with U-limit ε such that
{i ∈ ω | (Mi, Di) |=εi ϕ(a^1_i, . . . , a^n_i)} ∈ U.
In particular, if {i ∈ ω | (Mi, Di) |=ε ϕ(a^1_i, . . . , a^n_i)} ∈ U, then we
have ∏_{i∈ω} (Mi, Di)/U |=ε ϕ([a^1], . . . , [a^n]).
Weak models and compactness
Theorem (Kuyper–T 2013) ε-logic becomes compact when
considering weak ε-models.
This is very similar to the compactness of Goldbring and Towsner’s
AML.
Satisfiability as a limit
Theorem (Kuyper 2014)
⋂_{ε>0} ε-satisfiable = 0-satisfiable.
Theorem (Kuyper 2014) For ε > 0 we have
⋂_{ε′>ε} ε′-satisfiable ≠ ε-satisfiable.
The latter result follows from considerations about computational
complexity of satisfiability.
The complexity of ε-satisfiability
Theorem Given ϕ we can decide whether there exists ε < 1 such
that ϕ is ε-satisfiable.
Theorem (Kuyper 2014) For ε = 0, ε-satisfiability is decidable.
Proof uses Skolemization + indiscernibles + Ramsey theory.
Theorem (Kuyper 2014) For ε ∈ (0, 1) ∩ Q, ε-satisfiability is
Σ^1_1-complete.
Proof of latter result uses ideas from Hoover.
Caveat: These results are only for relational languages.
The complexity of ε-validity
Terwijn (2009) proved that for every rational ε ∈ [0, 1), the set of
ε-valid formulas is Σ^0_1-hard.
(Recall that ε = 0 equals the classical case.)
Theorem (Kuyper 2014) For every rational ε ∈ (0, 1), the set of
ε-valid formulas is Π^1_1-hard.
Question: Is it Π^1_1-complete ?
Finite models of ε-logic
G. Yang considered the complexity of ε-satisfiability and ε-validity
over finite models.
Theorem (Yang 2014) Over finite models we have:
(i) for rational ε ∈ (0, 1),
ε-satisfiability is Σ^0_1-complete, and ε-validity is Π^0_1-complete.
(ii) for ε = 0,
ε-satisfiability is decidable, and ε-validity is Π^0_1-complete.
Some open problems about ε-logic
• For the Löwenheim number λε of ε-logic,
does ZFC prove that λε = 2^ℵ0 ?
• Does the Σ^1_1-completeness of ε-satisfiability hold for languages
with = and function symbols?
• Is ε-validity Π^1_1-complete for ε > 0 ?
• Craig interpolation / Beth definability / Robinson consistency ?
Note that classically the equivalence of these properties relies
on compactness. Potentially, some of these could hold for
ε-logic, while others do not.
• Proof theory.
• Study ε-logic as part of the larger area of model-theoretic
logics (cf. Barwise and Feferman 1985) and generalized
quantifiers (cf. Väänänen 2008).
References
I. Goldbring and H. Towsner, An approximate logic for measures,
Israel Journal of Mathematics 199(2) (2014) 867–913.
H. J. Keisler, Probability quantifiers, in: J. Barwise and S. Feferman
(eds.), Model-Theoretic Logics, Springer-Verlag 1985, 509–556.
R. Kuyper, Computability, probability and logic, PhD thesis,
Radboud University Nijmegen, 2015.
R. Kuyper and S. A. Terwijn, Model theory of measure spaces and
probability logic, Review of Symbolic Logic 6(3) (2013) 367–393.
H. Leitgeb, Probability in Logic, Oxford Handbook of Probability
and Philosophy, forthcoming, 2014.
S. A. Terwijn, Probabilistic logic and induction, Journal of Logic and
Computation 15(4) (2005) 507–515.
L. G. Valiant, Robust logics, Artificial Intelligence 117 (2000)
231–253.
G. Yang, Computabilities of Validity and Satisfiability in Probability
Logics over Finite and Countable Models, arXiv, 2014.