Probabilistically Reliable Conditional Reasoning:
From Psychology to Logic to Psychology
Gerhard Schurz, University of Düsseldorf
1. Psychology - the Starting Point:
Finding No. 1: Human conditional reasoning violates the deductive laws of conditional logic (e.g. Evans 1982, among others).
Finding No. 2: Humans understand conditionals as uncertain or normic (Scriven 1959), i.e. as admitting of exceptions:
If A, then normally B
Most As are Bs
P(B|A) = high
Default Modus Ponens:
All ravens are black.
This animal is a raven.
__________________________________
This animal is black.    Mean: 3.53

Add exception premises:
& This animal is a glacier raven,
and all glacier ravens are white.
__________________________________
This animal is black.    Mean: 0.54

→ Non-monotonicity at the level of conditionals

Finding No. 3: Humans obey the pattern of strict specificity to a high degree:
This animal is not black.    Mean: 0.67
This animal is white.    Mean: 3.46
[response-scale residue omitted: answer options +, ?, -, u, scored 4 to 0]
2. Logic: Probabilistic Semantics for Conditional Reasoning
Pelletier & Elio (2001): non-monotonic conditional reasoning doesn't have standards of correctness
(in contrast to strict truth preservation for deductive logic)
- mere intuitions of 'plausibility' → irrational?
No: the 'normative' standard is high probability preservation.
Different kinds of probabilistic semantics. Most important: System P
A ⇒ B means P(B|A) = high (≥ 1-u)
(This is not Adams' thesis, which is more problematic.)
P = statistical generic probability → applies to open formulas A(x), B(x)
2.1 System P - Monotonic Inferences |~P among Conditionals
(Language: monadic predicates, variable x (omitted), individual constant a; A ⇒ B simple conditionals)
Cautious Cut CC:
A ⇒ B, A∧B ⇒ C |~P A ⇒ C
Cautious Monotonicity CM:
A ⇒ B, A ⇒ C |~P A∧B ⇒ C
Cautious Disjunction CD:
A ⇒ C, B ⇒ C |~P A∨B ⇒ C
Supraclassicality SC:
If |- A → B, then |~P A ⇒ B.
Supra-Strictness:
A ⇒ ⊥ |~P A∧B ⇒ ⊥
Note: Given the classical P-definition (P(B|A) = 1 if P(A) = 0),
strict conditionals A → B are definable as A∧¬B ⇒ ⊥ (Hawthorne/Makinson 2007).
Not so for Popper functions.
Additional rule of P+ (Adams 1986, Schurz 1998):
Language: Boolean combinations of conditionals
Weak Rational Monotonicity WRM:
A ⇒ B, ¬(A ⇒ ¬C) |~P+ A∧C ⇒ B
Some derived rules:
And:
A ⇒ B, A ⇒ C |~P A ⇒ B∧C
Left Logical Equivalence LLE:
If |- A ↔ B: A ⇒ C |~P B ⇒ C
Right Weakening RW:
If |- B → C: A ⇒ B |~P A ⇒ C
Li abbreviates an uncertain conditional or 'normic law' Ai ⇒ Bi;
P(Li) := P(Bi|Ai) = the probability associated with Li;
U(Li) := 1 - P(Li)
2.2 Semantics and Completeness Theorems for P (P+) (Adams 1975, 1998)
L1,...,Ln |~P L
Normal-world semantics:
iff for all ranked models M = (W,R): if all Li (1 ≤ i ≤ n) hold in M, then L holds in M.
[Figure: ranked model with worlds w1-w6 ordered by normality]
Weak preservation of high probabilities - uncertainty-sum semantics:
iff for all probability functions P (models (W,A,P)): U(L) ≤ Σ(1≤i≤n) U(Li)
For P+: L1,...,Ln, ¬(L'1),...,¬(L'm) |~P+ L iff U(L) ≤ Σ(1≤i≤n) U(Li) / Π(1≤j≤m) U(L'j)
This allows computing a lower probability bound for the conclusion from given bounds for the premises, e.g. CC:
A ⇒0.95 B, A∧B ⇒0.96 C |~P A ⇒0.91 C
Other kinds of probabilistic semantics for P (P+):
Hawthorne (1996): strict preservation of 1-probabilities with Popper functions
→ For Popper functions, P(B|A) still behaves non-monotonically
iff for all Popper probability functions P: if P(Li) = 1 for all i ∈ {1,...,n}, then P(L) = 1
Leitgeb (2009): extends this semantics to a full ⇒-language with nested conditionals
Further variations:
convergence-to-1 semantics (with standard instead of Popper P's), and more
→ see Leitgeb (2004), ch. III.
2.3 Difference between P (for uncertain indicative conditionals) and V (Lewis 1973, for counterfactuals):
Instead of possible worlds being ordered according to closeness-to-actuality, they are ordered according to their
(probabilistic) normality.
→ The actual world is not assumed to be (most) normal.
Consequences for the relation between conditional and factual (non-conditional) formulas:
A and ⊤ ⇒ A must not be identified (Adams' 'mistake'; relevant to Pfeifer/Kleiter).
→ No monotonic relation holds between conditionals and factual formulas:
MP is not probabilistically valid, in the sense that
P(B|A) = 1, Aa (or: Aa is certain) does not entail P(Ba) = 1.
Generic probability P does not apply to factual formulas (a confusion - see below).
'Certainty' is a subjective probability or degree of belief DB: DB(Aa) = 1.
Instead of MP we have the principle of total evidence (Carnap, Reichenbach):
(TE) A ⇒ B, Aa |~DP Ba iff Aa comprises all (relevant) evidence (about a)
A ⇒r B, Aa |~DP,r Ba
("DP" for "default extension of P")
→ This is Reichenbach's coordination principle: how to transfer P to DB.
Note: L1,...,Ln, Aa |~DP Ba iff L1,...,Ln |~P A ⇒ B and Aa comprises all evidence
P + (TE) entail the rule of (strict/weak) specificity as follows:
normal-case law:               A ⇒ B
exception law:                 C ⇒ ¬B
strict/normic specificity law: Cx → Ax
____________________________________
|~P A∧C ⇒ ¬B
(Since C strictly implies A, A∧C is equivalent to C, so the exception law carries over to A∧C.)
Hence by TE: A ⇒ B, C ⇒ ¬B, C → A, Aa, Ca |~DP ¬Ba.
A stronger version of the specificity rule is provided by WRM:
here the specificity law Cx → Ax is replaced by ¬(Cx ⇒ ¬Ax).
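To make the interplay of TE and specificity concrete, here is a minimal toy sketch (my own formulation, not an algorithm from the talk): normic laws are stored with antecedent predicate sets, and the default conclusion is drawn from the most specific law whose antecedent is covered by the total evidence about the individual.

```python
# Toy default reasoner for the raven example: apply the MOST SPECIFIC
# normic law whose antecedent is contained in the total evidence (TE).

LAWS = [
    ({"raven"}, "black"),                   # ravens are normally black
    ({"raven", "glacier_raven"}, "white"),  # glacier ravens (a kind of raven) are normally white
]

def default_conclusion(evidence):
    """Return the consequent of the most specific applicable law."""
    applicable = [(ante, cons) for ante, cons in LAWS if ante <= evidence]
    if not applicable:
        return None
    # strict specificity: the larger antecedent set overrides the smaller one
    _, cons = max(applicable, key=lambda law: len(law[0]))
    return cons

print(default_conclusion({"raven"}))                   # black (default MP)
print(default_conclusion({"raven", "glacier_raven"}))  # white (specificity wins)
```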
2.4 Weaker systems than P - strict probability threshold semantics
Hawthorne and Makinson (2007):
System O: LLE, RW, Ref, plus weakened rules
(threshold-invalid: And, CC, CM, Or; instead:)
VCM: A ⇒ B∧C / A∧B ⇒ C
("V" for "very")
WOr: A∧B ⇒ C, A∧¬B ⇒ C / A ⇒ C
("W" for "weak")
WAnd: A ⇒ C, A∧¬B ⇒ ¬(A∧¬B) / A ⇒ B∧C
O is correct for strict threshold preservation: if L1,...,Ln |~O L, then
for all t and all P's: if P(Li) ≥ t for all 1 ≤ i ≤ n, then P(L) ≥ t.
Open problem: is O complete?
Note the probabilistic strength of And: O + And = P
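The watershed can be illustrated numerically. The sketch below (plain Python, my own construction; WOr is read as in the reconstruction above) exhibits a concrete counterexample to And at threshold t = 0.9, while a random search finds no violation of WOr, in line with its threshold-soundness (P(C|A) is a weighted average of P(C|A∧B) and P(C|A∧¬B)).

```python
import random

def cond(P, cons, ante):
    pa = sum(p for w, p in P.items() if ante(w))
    return 1.0 if pa == 0 else sum(p for w, p in P.items() if ante(w) and cons(w)) / pa

A = lambda w: w[0]; B = lambda w: w[1]; C = lambda w: w[2]
WORLDS = [(a, b, c) for a in (False, True) for b in (False, True) for c in (False, True)]

# And fails at t = 0.9: let A be certain, B and C independent with probability 0.9 each.
P = {w: 0.0 for w in WORLDS}
for b in (False, True):
    for c in (False, True):
        P[(True, b, c)] = (0.9 if b else 0.1) * (0.9 if c else 0.1)
print(cond(P, B, A), cond(P, C, A))         # both 0.9: the premises reach t
print(cond(P, lambda w: B(w) and C(w), A))  # 0.81 < t: the And-conclusion fails

# WOr never fails in random search:
t = 0.9
for _ in range(100_000):
    m = [random.random() for _ in WORLDS]
    s = sum(m)
    Q = {w: x / s for w, x in zip(WORLDS, m)}
    if cond(Q, C, lambda w: A(w) and B(w)) >= t and cond(Q, C, lambda w: A(w) and not B(w)) >= t:
        assert cond(Q, C, A) >= t - 1e-9    # the threshold is preserved
print("no WOr counterexample found")
```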
2.5 Systems stronger than P - non-monotonic at the level of |~:
2.5.1 Default assumptions of irrelevance (key idea of non-monotonic logic; overviews: Brewka 1991, Gabbay 1994)
A ⇒ B                                            (Birds can fly)
C not negatively relevant: P(B|A∧C) ≥ P(B|A)     (default assumption)
Intermediate step: A∧C ⇒ B
A(a) ∧ C(a)                                      (This animal is a female bird)
|~DP B(a)                                        (|~DP This animal can fly)
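A sketch of how this default could be checked against data follows (toy numbers of my own; 'female' is stipulated to be probabilistically irrelevant to flying among birds):

```python
WORLDS = [(b, f, s) for b in (0, 1) for f in (0, 1) for s in (0, 1)]  # (bird, flies, female)

def cond(P, cons, ante):
    pa = sum(P[w] for w in WORLDS if ante(w))
    return 1.0 if pa == 0 else sum(P[w] for w in WORLDS if ante(w) and cons(w)) / pa

def strengthening_licensed(P, B, A, C):
    """License A & C => B by default iff C is not negatively relevant to B
    given A, i.e. P(B | A & C) >= P(B | A)."""
    return cond(P, B, lambda w: A(w) and C(w)) >= cond(P, B, A)

# toy distribution: 90% of birds fly; sex is independent of flying
P = {w: 0.0 for w in WORLDS}
for s in (0, 1):
    P[(1, 1, s)] = 0.45   # flying birds, half of them female
    P[(1, 0, s)] = 0.05   # exceptional non-flying birds
bird, flies, female = (lambda w: w[0]), (lambda w: w[1]), (lambda w: w[2])
print(strengthening_licensed(P, flies, bird, female))  # True: female bird => can fly
```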
2.5.2 Default assumptions for contraposition
Joint project with Leitgeb: contraposition CP is very strong: Min + CP = Class(→).
Min = LLE, RW, Ref, And = the minimal normal conditional logic after Segerberg.
CP is justified by a natural default assumption for specific predicates: P(B|A) ≤ P(¬A|¬B) iff P(A) ≤ P(¬B)
Birds can fly
P(something is a bird) ≤ P(something cannot fly)
|~P If something cannot fly, it isn't a bird
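This quantitative condition can be verified on random probability models. The sketch below (plain Python) checks the identity U(¬A|¬B) = U(B|A)·P(A)/P(¬B) derived in the appendix, and with it the claim that contraposition cannot increase uncertainty exactly when P(A) ≤ P(¬B):

```python
import random

for _ in range(50_000):
    # random probability function over the four truth assignments to (A, B)
    m = [random.random() for _ in range(4)]
    s = sum(m)
    pAB, pAnB, pnAB, pnAnB = (x / s for x in m)
    pA, pnB = pAB + pAnB, pAnB + pnAnB
    u_BA  = pAnB / pA    # U(B|A)   = P(~B|A)
    u_cAB = pAnB / pnB   # U(~A|~B) = P(A|~B)
    # the identity behind the default assumption:
    assert abs(u_cAB - u_BA * pA / pnB) < 1e-9
    # contraposed conditional at least as certain iff P(A) <= P(~B):
    assert (u_cAB <= u_BA) == (pA <= pnB)
print("identity and default condition hold on all sampled models")
```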
Such default assumptions enable quasi-classical reasoning. First steps in Schurz (1997, 2005): for L a set of uncertain conditionals:
L ∪ {F(a)} |~classical Aa iff L_F |~P F ⇒ A,
where L_F, the 'F-update of L', is constructed from L by 'reasonable' irrelevance and contraposition assumptions.
3. Back to Psychology:
Schurz (2001) - evolution-theoretic foundation of normic laws:
→ normic laws describe the regularities of evolutionary systems;
→ therefore, human reasoning is expected to be well adapted to reasoning with normic/uncertain conditionals,
in accordance with probabilistic reliability.
Major results of an experimental study:
(1.) Humans follow the pattern of strict specificity to a high degree.
See the initial example; many further examples.
This result cannot be explained by the rival deductive-reasoning hypothesis,
according to which humans should tend to conclude nothing:
Inconsistent premise set (scored according to classical deductive logic):
Peter and Paul have the same nationality (are compatriots).
Peter is a Frenchman.
Paul is an Italian.
_______________________________________________
Peter is an Italian.    Mean: 3.11
Paul is a Frenchman.    Mean: 3.05
(scored in favor of skeptical reasoning; response-scale residue omitted)
(2.) Results on specificity are independent of the formulation of the conditional:
Birds can fly
All birds can fly
If something is a bird, it can fly
Normally, birds can fly
However, the addition of "normally" or "most" decreases the tendency towards default MP (explanation by Gricean relevance).
(3.) Weak specificity is followed in human reasoning to a lower but nevertheless significantly positive degree.
(4.) When the exception law is only implicit, the tendency to reason according to the specificity rule is weaker but still clearly
present.
(5.) Training in deductive logic slightly decreases the tendency to reason non-monotonically.
Concluding descriptive thesis (Schurz 2007):
→ Human conditional reasoning is based on some cognitively deep-rooted and
probabilistically sound rules for uncertain conditionals (system P),
together with plausible default assumptions.
→ These rules are applied unconsciously.
→ They explain the deviations from classical logic.
4. Alternative Approaches
4.1 Step-by-step propagation of tight upper and lower probability bounds through chains of rule applications:
Gilio (2000) (also Frisch/Haddawy 1994, Bourne/Parsons 1998, Lukasiewicz 1999):
The uncertainty-sum rule is not generally tight:
for SC, And, LLE, RW, and WRM: tight;
for CC and Or: only almost tight
(for CM even worse).
Is stepwise propagation of tight bounds better than uncertainty-sum semantics?
No, because tight bounds are not generally preserved along rule chainings. Example:
Premises: A ⇒ B∧D : u1 and A∧B∧D ⇒ C : u2
Stepwise tight propagation:
RW: A ⇒ B∧D : u1 yields A ⇒ D : u1
CC: A ⇒ B∧D : u1, A∧B∧D ⇒ C : u2 yields A ⇒ C : u1 + u2 - u1·u2
CM: A ⇒ D : u1, A ⇒ C : u1 + u2 - u1·u2 yields
A∧D ⇒ C : (u1 + u2 - u1·u2) / (1 - u1) = u1/(1-u1) + u2 > u1 + u2
The uncertainty-sum rule applied directly to the whole P-inference gives the better bound u1 + u2.
Hence: interval propagation is NOT an adequate (complete) semantics for system P.
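A quick computation makes the gap visible. The sketch below (plain Python) chains the tight bounds of the example step by step and compares the result with the direct uncertainty-sum bound:

```python
def chained_bound(u1, u2):
    """Stepwise tight propagation: RW keeps u1 for A => D; CC gives
    u1 + u2 - u1*u2 for A => C; CM then divides by (1 - u1)."""
    u_cc = u1 + u2 - u1 * u2
    return u_cc / (1 - u1)        # = u1/(1-u1) + u2

def sum_bound(u1, u2):
    """Direct uncertainty-sum bound for the whole P-inference."""
    return u1 + u2

u1, u2 = 0.05, 0.04
print(chained_bound(u1, u2))  # ~0.0926: the chained uncertainty bound is worse (larger)
print(sum_bound(u1, u2))      # 0.09: the global sum rule beats stepwise chaining
```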
New results of Gilio: tight bounds for iterations of CM and for iterations of Or.
→ The question of tightness for P-inferences is largely an open problem.
4.2 Subjective Probabilities for Conditional and Factual Premises
Pfeifer and Kleiter (2008):
75% certain: A ⇒ B (assumption: Adams' thesis; but cf. Douven/Verbrugge)
80% certain: A
How certain are you concerning B?
(specify a point or an interval in [0,1])
Assuming Adams' thesis, certain rules of P have been nicely confirmed by Pfeifer and Kleiter, e.g. LLE, RW, And, Or, CM.
Problem 1: Strict (or weak) specificity cannot be adequately represented if one assumes only one subjective probability
function.
Because the joint assumption
P(B|A) = high, P(¬B|C) = high, P(A∧C) = high
is impossible (incoherent):
P(B|A) = P(B|A∧C)·P(C|A) + P(B|A∧¬C)·P(¬C|A),
where P(B|A∧C) is low (since P(¬B|C) and P(A∧C) are high) and P(¬C|A) is low (since P(A∧C) is high);
so P(B|A) must be low - but it can't be, because it is assumed to be high.
→ One must distinguish generic conditional probabilities from degrees of belief in factual formulas.
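The incoherence can also be exhibited by brute force. The sketch below (plain Python, my own construction) searches random probability functions over the truth assignments to A, B, C for one that makes P(B|A), P(¬B|C) and P(A∧C) jointly high; the best achievable minimum stays far below 1 (around 0.6 in my runs), so all three cannot be, e.g., ≥ 0.9 under a single probability function:

```python
import random

WORLDS = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
A = lambda w: w[0]; B = lambda w: w[1]; C = lambda w: w[2]

def cond(P, cons, ante):
    pa = sum(P[w] for w in WORLDS if ante(w))
    return 1.0 if pa == 0 else sum(P[w] for w in WORLDS if ante(w) and cons(w)) / pa

best = 0.0
for _ in range(200_000):
    m = [random.random() for _ in WORLDS]
    s = sum(m)
    P = {w: x / s for w, x in zip(WORLDS, m)}
    score = min(
        cond(P, B, A),                                # P(B|A)
        cond(P, lambda w: not B(w), C),               # P(~B|C)
        sum(P[w] for w in WORLDS if A(w) and C(w)),   # P(A & C)
    )
    best = max(best, score)
print("best joint minimum found:", best)  # stays well below 1
```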
A way of handling this within subjective probabilities (Pearl 1988):
interpret P(Ba|Aa) as "the DB of Ba if Aa were all that I knew about a".
Note: this is an application of Reichenbach's coordination principle.
Connected point: Pfeifer & Kleiter's experiments (2008) don't consider exceptional information.
E.g., they find that MP is mastered with largely coherent probability estimates.
But if exceptional information were added, MP would break down.
Problem 2: Pfeifer & Kleiter (2008) don't consider the mistake of coherent but too narrow probability estimates (point values
or small intervals).
Problem 3: Pfeifer & Kleiter (2008) don't consider default assumptions of irrelevant premise strengthening or contraposition.
4.3 Premise strengthening - monotonicity (M)
A ⇒ B / A∧C ⇒ B
Without a default irrelevance assumption, (M) is probabilistically 'uninformative'.
Kleiter & Pfeifer argue that their findings support the uninformativeness assumption.
I would doubt this (the support is rather weak).
Connolly et al. (2007, 10ff): the tendency to draw "default inheritance to subclasses" is clearly present, but it sinks the
narrower the subclasses are:
"Baby / Peruvian / ducks have webbed feet"
This is more in accordance with probabilistic default assumptions.
Problem: irrelevance is in conflict with Gricean premise-relevance
(Pelletier and Elio 2003).
→ Project of a new experiment series
4.4 Contraposition CP
(CP) too is 'uninformative' without default assumptions.
In Pfeifer & Kleiter (2008, p. 22), only 40% responded with lower bounds ≤ 7% and upper bounds ≥ 93%; 25% reproduced the
given value.
→ More in favor of the default-assumption model.
The default-assumption model is corroborated by findings in Evans (1982, 128ff) (% endorsement):

Conditional              MP    MT
If p then q              100   75
If p then not-q          100   56
If not-p then q          100   12
If not-p then not-q      100   25

→ Planned: a new experiment series concerning MP, DA, AC and MT, with specific versus unspecific predicates and examples
from the natural versus the social domain.
References:
Adams, E.W. (1975): The Logic of Conditionals, Reidel, Dordrecht.
Adams, E.W. (1986): "On the Logic of High Probability", Journal of Philosophical Logic 15, 255-279.
Adams, E.W. (1998): A Primer of Probability Logic, CSLI Publications, Stanford.
Barkow, J., Cosmides, L., and Tooby, J. (eds., 1992): The Adapted Mind: Evolutionary Psychology and the Generation of Culture, Oxford University Press, New York.
Bourne, R., and Parsons, S. (1998): "Propagating Probabilities in System P", Proceedings of the 11th International FLAIRS
Conference, 440-445.
Brewka, G. (1991): Nonmonotonic Reasoning: Logical Foundations of Commonsense, Cambridge University Press, Cambridge.
Connolly, A.C., Fodor, J.A., Gleitman, L.R., and Gleitman, H. (2007): "Why Stereotypes Don't Even Make Good Defaults",
Cognition 103, 1-22.
Cosmides, L., and Tooby, J. (1992): "Cognitive Adaptations for Social Exchange", in: Barkow et al. (eds., 1992).
Evans, J.St. (1982): The Psychology of Deductive Reasoning, Routledge & Kegan Paul, London.
Ford, M., and Billington, D. (2000): "Strategies in Human Nonmonotonic Reasoning", Computational Intelligence 16(3),
446-467.
Frisch, A.M., and Haddawy, P. (1994): "Anytime Deduction for Probabilistic Logic", Artificial Intelligence 69, 93-122.
Gabbay, D.M. et al. (eds., 1994): Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 3: Nonmonotonic Reasoning and Uncertain Reasoning, Clarendon Press, Oxford.
Gilio, A. (2000): "Precise Propagation of Upper and Lower Probability Bounds in System P", Annals of Mathematics and
Artificial Intelligence.
Grice, P. (1975): "Logic and Conversation", in: P. Cole and J. Morgan (eds.), Syntax and Semantics, Vol. 3, Academic Press, New York.
Hawthorne, J. (1996): "On the Logic of Non-Monotonic Conditionals and Conditional Probabilities", Journal of Philosophical Logic 25, 185-218.
Hawthorne, J., and Makinson, D. (2007): "The Quantitative/Qualitative Watershed for Rules of Uncertain Inference",
Studia Logica 86, 247-297.
Kraus, S., Lehmann, D., and Magidor, M. (1990): "Nonmonotonic Reasoning, Preferential Models and Cumulative Logics",
Artificial Intelligence 44, 167-207.
Lehmann, D., and Magidor, M. (1992): "What Does a Conditional Knowledge Base Entail?", Artificial Intelligence 55, 1-60.
Leitgeb, H. (2004): Inference on the Low Level, Kluwer, Dordrecht.
Lewis, D. (1973): Counterfactuals, B. Blackwell, Oxford.
Lukasiewicz, T. (1999): "Probabilistic Deduction with Conditional Constraints over Basic Events", Journal of Artificial Intelligence Research 10, 199-241.
Pearl, J. (1988): Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, San Mateo, California.
Pelletier, F.J., and Elio, R. (2001): "Logic and Computation: Human Performance in Default Reasoning", in: P. Gärdenfors
et al. (eds.), Logic, Methodology and Philosophy of Science.
Pfeifer, N., and Kleiter, G. (2008): "The Conditional in Mental Probability Logic", in: M. Oaksford (ed.), The Psychology of
Conditionals, Oxford University Press, Oxford.
Schurz, G. (1997): "Probabilistic Default Reasoning Based on Relevance- and Irrelevance Assumptions", in: D. Gabbay et
al. (eds.), Qualitative and Quantitative Practical Reasoning, LNAI 1244, Springer, Berlin, 36-553.
Schurz, G. (1998): "Probabilistic Semantics for Delgrande's Conditional Logic and a Counterexample to His Default Logic",
Artificial Intelligence 102(1), 81-95.
Schurz, G. (2001): "What Is 'Normal'? An Evolution-Theoretic Foundation of Normic Laws and Their Relation to Statistical
Normality", Philosophy of Science 68, 476-497.
Schurz, G. (2005): "Non-monotonic Reasoning from an Evolutionary Viewpoint", Synthese 146(1-2), 37-51.
Schurz, G. (2007): "Human Conditional Reasoning Explained by Non-Monotonicity and Probability: An Evolutionary Account", in: S. Vosniadou et al. (eds.), Proceedings of EuroCogSci07: The European Cognitive Science Conference 2007,
Lawrence Erlbaum Assoc., New York, 628-633.
Scriven, M. (1959): "Truisms as Grounds for Historical Explanations", in: P. Gardiner (ed.), Theories of History, The Free
Press, New York.
Segerberg, K. (1989): "Notes on Conditional Logic", Studia Logica 48, 157-168.
Sperber, D., and Wilson, D. (1986): Relevance: Communication and Cognition, B. Blackwell, Oxford.
Appendix - further material:
Results of Kleiter/Pfeifer (2008) on premise strengthening:
Conclusion has negated predicates.
Kleiter/Pfeifer (2003), p. 11: mean interval sizes for M tasks are typically around 0.5 (often less) - but not 1!
Kleiter/Pfeifer (2008), p. 17: only 69% were interval responses;
more than 50% responded with lower bounds ≤ 1%;
27% responded with lower bounds ≤ 1% and upper bounds ≥ 91%.
Kleiter/Pfeifer (2006) on contraposition: presentation not informative.
New result for threshold semantics:
If O is complete, then O*[t] = O + Inc_t is complete for threshold t, where
(Inc_t) If L1,...,Ln |~P A ⇒ B and n(1-t) < t, then L1,...,Ln, A ⇒ ¬B |~O* A ⇒ ⊥
Strength of P-rules given O (Hawthorne/Makinson 2007):
O+And = O+{CM,Or} = O+{CM,CC} = P
above: O+CC, O+CM, O+Or
above: O
Descriptive thesis:
→ This is not in conflict with the fact that in the domain of social rules, humans tend to apply classical rules for strict conditionals (Cosmides & Tooby 1992): humans' ability to detect rule-breakers requires the application of rules for strict conditionals.
Contraposition:
U(¬A|¬B) = P(A|¬B) = P(¬B|A)·(P(A)/P(¬B)) = U(B|A)·(P(A)/P(¬B))
→ U(¬A|¬B) ≤ U(B|A) iff P(A) ≤ P(¬B)
For MT in the Pfeifer-Kleiter method one gets:
P(¬A) ≥ (1 - U(B|A)·(P(A)/P(¬B)))·P(¬B) = P(¬B) - U(B|A)·P(A) ≥ P(¬B) - U(B|A).
But this is the wrong representation. MT and contraposition should behave in the same way → test by experiment!
In general: a test is needed to discriminate between the one-probability-function and the two-probability-function hypothesis.
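As a consistency check, the MT bound above can be verified on random models (a small sketch under the same conventions as the code earlier in this transcript):

```python
import random

for _ in range(50_000):
    m = [random.random() for _ in range(4)]   # masses of (A,B), (A,~B), (~A,B), (~A,~B)
    s = sum(m)
    pAB, pAnB, pnAB, pnAnB = (x / s for x in m)
    pA, pnA, pnB = pAB + pAnB, pnAB + pnAnB, pAnB + pnAnB
    u_BA = pAnB / pA                          # U(B|A)
    # derived bound: P(~A) >= P(~B) - U(B|A)*P(A) >= P(~B) - U(B|A)
    assert pnA >= pnB - u_BA * pA - 1e-9
    assert pnA >= pnB - u_BA - 1e-9
print("MT lower bound confirmed on all sampled models")
```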