STA111 - Lecture 1
Welcome to STA111!
Some basic information:
• Instructor: Víctor Peña (email: [email protected])
• Course Website: http://stat.duke.edu/~vp58/sta111.
1 What is the difference between Probability and Statistics?
Unfortunately, I don’t think there is a simple answer to this question (some might even argue that there
isn’t one at all!). Justin Rising gives this answer on Quora:
In probability, we’re given a model, and asked what kind of data we’re likely to see. In statistics,
we’re given data, and asked what kind of model is likely to have generated it.
This definition is not completely satisfactory (as we will see as we learn more about probability and statistics), but it is a good start. Let’s illustrate this with a typical example (see for example the answer by John
D. Cook on StackExchange).
Suppose we have a bag with a total of 100 jelly beans. Some of them are red, and some of them are
green. The probabilist knows the proportion of red to green jelly beans, and wants to know, for example,
the probability of drawing 2 red jelly beans in a row. The statistician doesn’t know the proportion of red
to green jelly beans, and wants to estimate it after having drawn 2 red jelly beans in a row. Let’s stick with
this example for a bit. The statistician is also interested in:
1. Quantifying how precise the estimate is. Suppose that the statistician has drawn 98 jelly beans and
all of them are red. It seems clear that the estimate based on 98 beans will be more “precise”
(in some sense) than the original estimate based on a sample of 2 beans.
2. Deciding how many jelly beans to draw in order to achieve sufficient precision. Drawing
jelly beans out of a bag is boring, so the statistician might not want to draw all 100 beans and learn the
proportion with certainty; estimating the proportion sufficiently well might be enough.
3. Investigating whether the assumed probabilistic framework corresponds with reality. Imagine that
the statistician draws 30 red jelly beans in a row, but knows that the proportion of red/green jelly
beans should be roughly 50%. The statistician was planning on estimating the proportion under the
assumption that the jelly beans are mixed well. After seeing this, the statistician suspects that it might
not be a reasonable assumption – whoever put the jelly beans in that bag might have put all the green
beans first and didn’t mix the beans at all, so it is pretty likely that the first 30 beans are all red, even
if the true proportion is 50%.
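To make the contrast concrete, here is a short Python sketch of the jelly bean example (the 60 red / 40 green
split and drawing without replacement are assumptions made just for this illustration): the probabilist knows
the composition of the bag and computes the probability of 2 red beans in a row, while the statistician only
sees the draws and estimates the proportion of red beans from them.

import random

# Probabilist's side: the composition of the bag is KNOWN.
# (The 60 red / 40 green split is an assumption for this sketch.)
bag = ["red"] * 60 + ["green"] * 40

# Exact probability of drawing 2 red beans in a row (without replacement).
p_two_red = (60 / 100) * (59 / 99)
print("P(2 red in a row) =", round(p_two_red, 4))

# Statistician's side: the composition is UNKNOWN; only the draws are observed.
random.shuffle(bag)
sample = bag[:2]                                   # draw 2 beans
estimate = sample.count("red") / len(sample)
print("sample:", sample, "-> estimated proportion of red:", estimate)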
There are other types of questions statisticians are interested in. For example, some statisticians study
how, when, and under which assumptions we can infer causal relationships from data (e.g. infer a causal
relationship between smoking and cancer from data). We will talk about this later in the course.
Harvey Motulsky on StackExchange proposes the following diagram, which summarizes the section pretty well:
Probability:  General → Specific    Population → Sample    Model → Data
Statistics:   General ← Specific    Population ← Sample    Model ← Data
In this course we will cover probability first and then move on to statistics (labs will be an exception!).
2 Basic Probability
Here I will follow Chapter 1 of our textbook pretty closely.
2.1 Sets, Experiments, Sample Spaces, and Events
For us, a set is simply a collection of objects. We can define a set by listing its elements (for example,
A = {a, e, i, o, u} or B = {1, 2, 3, 4}) or by giving a complete description (for instance, A is the set of
vowels, B is the set of positive integers strictly less than 5). An experiment will be “anything” whose
outcome is not yet known to us but for which we know the set of possible outcomes in advance. The sample
space is the set of possible outcomes. An event is a subset of the sample space. For example:
• Tossing a coin twice is an experiment with sample space equal to
{heads/heads, heads/tails, tails/heads, tails/tails}.
An example of an event is “obtaining the same outcome twice”, which corresponds to the subset
{heads/heads, tails/tails}
of the sample space.
• Rolling a die is an experiment with sample space equal to {1, 2, 3, 4, 5, 6}. The event “obtaining an
odd number” corresponds to the subset {1, 3, 5} of the sample space.
• Asking ourselves whether Duke basketball will win the national championship in 2016 also counts as
an experiment, since the set of possible outcomes is known ({yes, no}) but the outcome is something
we don’t know yet.
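As a quick illustration of these definitions (a sketch, not part of the formal development), the two-coin sample
space and the event “obtaining the same outcome twice” can be written out explicitly, for example in Python:

from itertools import product

# Sample space of tossing a coin twice: all ordered pairs of outcomes.
sample_space = set(product(["heads", "tails"], repeat=2))

# The event "obtaining the same outcome twice" is a subset of the sample space.
same_twice = {outcome for outcome in sample_space if outcome[0] == outcome[1]}
print(sorted(same_twice))             # [('heads', 'heads'), ('tails', 'tails')]
print(same_twice <= sample_space)     # True: the event is a subset of the sample space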
Exercise 1. Come up with 4 examples of experiments. Specify their sample space and give an example of
an event for each of them.
2.2 Interpretations of Probability
There are many different interpretations of probability, and philosophers (and some statisticians) are still
debating the issue. Here are very rough explanations of three interpretations of probability we will use in this course:
• Principle of indifference: Break down the sample space until you can convince yourself (and others) that
there is no reason to consider one outcome more likely than another. Then assign equal probabilities
to all of them. For example, if we are rolling a die and it is “fair”, we can say that the outcomes
1,2,3,4,5,6 are equally likely because of “symmetry” or “physics”. Then, the probability of an event is
defined as (number of favorable outcomes)/(number of possible outcomes).
• Limiting frequencies: This one is easier to understand. The probability of an event can be interpreted
as the limiting frequency with which it would occur if we were to repeat the experiment ad infinitum. For
instance, we can interpret the probability of the event “rolling a die and getting a 6” as the long-run
proportion of times we get a 6 as we roll the die again and again (see the simulation sketch after this list).
Single events such as “the next time I roll a die I will get a 6” or “Duke basketball will win the national
championship in 2016” don’t fit very well here.
• Degree of belief: The probability of an event is your degree of belief that it will happen. Different
people have different opinions and, given an event, two agents can assign different probabilities. If my
beliefs about uncertain propositions are coherent and I want to update them in light of data, probability
calculus is the way to do it.
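To get a feel for the limiting-frequency interpretation, here is a small simulation sketch in Python (the seed
and the number of rolls are arbitrary choices): the running proportion of sixes settles near 1/6 as the number
of rolls grows.

import random

random.seed(111)                      # arbitrary seed, only for reproducibility
rolls = 100_000                       # "again and again", but finitely many times
sixes = 0
for i in range(1, rolls + 1):
    if random.randint(1, 6) == 6:     # roll a fair die
        sixes += 1
    if i in (10, 100, 1_000, 10_000, 100_000):
        print(f"after {i:>6} rolls, proportion of sixes = {sixes / i:.4f}")
# The printed proportions approach 1/6 ≈ 0.1667 as the number of rolls grows.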
We will not spend much time discussing the pros and cons of each of them, and our interpretation will
depend on the context. Maybe we could use different symbols for the different interpretations, but (almost)
nobody does that. If you are interested, you can take a look at this article or ask me.
It turns out that the mathematical definition of probability doesn’t depend on how we interpret it. A
probability measure will be defined as a function that maps events to numbers between 0 and 1 and satisfies
some properties. Before we introduce the mathematical definition of probability, though, we need to brush
up on some basic set theory.
2.3 Basic Set Theory
The empty set ∅ is the set containing no elements. The symbol ∈ denotes set membership and ∉ denotes
that an element is not a member of a set. If A and B are sets, A is a subset of B (A ⊂ B) if x ∈ A implies
x ∈ B. Two sets are equal if A ⊂ B and B ⊂ A. Now we define some operations on sets:
• Union: x ∈ A ∪ B if x ∈ A or x ∈ B (or both).
• Intersection: x ∈ A ∩ B if x ∈ A and x ∈ B.
• Complement: (with respect to a universal set Ω) Let A ⊂ Ω. Then x ∈ Aᶜ if x ∈ Ω but x ∉ A.
• Set difference: Let A and B be subsets of Ω. Then A \ B = A ∩ Bᶜ; that is, A \ B consists of the x ∈ Ω
such that x ∈ A and x ∉ B.
• Cardinality: |A| is the number of elements in A.
• Power set: P(A) is the collection of all subsets of A.
Two sets A and B are said to be disjoint if A ∩ B = ∅ (i.e. they have no elements in common).
Examples:
• Let Ω = {0, 1, 2, 3, 4, 5, 6, 7, 8} and A = {0, 1, 2, 3, 4}, B = {2, 3}, C = {3, 4, 5, 7}. Then A ∪ B = A,
A ∩ B = B, B ∪ C = {2, 3, 4, 5, 7}, B ∩ C = {3}, A ∪ C = {0, 1, 2, 3, 4, 5, 7}, A \ B = {0, 1, 4},
Aᶜ = {5, 6, 7, 8}, |A| = 5, |B| = 2, P(B) = {∅, {2}, {3}, {2, 3}}, etc.
• Let N₀ = {0, 1, 2, 3, ...}, O = {1, 3, 5, 7, ...}, E = {0, 2, 4, 6, ...}. Then O ⊂ N₀, E ⊂ N₀,
O ∪ E = N₀, O ∩ E = ∅, etc.
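These operations map directly onto Python's built-in set type; here is a sketch that checks the first example
above (the variable names simply mirror the notation):

Omega = {0, 1, 2, 3, 4, 5, 6, 7, 8}      # the universal set Ω
A = {0, 1, 2, 3, 4}
B = {2, 3}
C = {3, 4, 5, 7}

print(A | B == A)             # union: A ∪ B = A              -> True
print(A & B == B)             # intersection: A ∩ B = B       -> True
print(sorted(B | C))          # B ∪ C                         -> [2, 3, 4, 5, 7]
print(sorted(A - B))          # set difference A \ B          -> [0, 1, 4]
print(sorted(Omega - A))      # complement of A (w.r.t. Ω)    -> [5, 6, 7, 8]
print(len(A), len(B))         # cardinalities |A|, |B|        -> 5 2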
Exercise 2. Let Ω be the universal set and let A, B ⊂ Ω. Answer the following questions, justifying your
answers.
1. What is A ∪ Aᶜ?
2. What is A ∩ Aᶜ?
3. Assume A ⊂ B. What are A ∩ B and A ∪ B?
We finish this section with a couple of useful results:
• De Morgan’s laws: (A₁ ∪ A₂ ∪ · · · ∪ Aₙ)ᶜ = A₁ᶜ ∩ A₂ᶜ ∩ · · · ∩ Aₙᶜ and (A₁ ∩ A₂ ∩ · · · ∩ Aₙ)ᶜ =
A₁ᶜ ∪ A₂ᶜ ∪ · · · ∪ Aₙᶜ.
• Inclusion-Exclusion formula: |A ∪ B| = |A| + |B| − |A ∩ B|.
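Neither result is hard to check on a small example; here is a quick Python sanity check using the sets from
the example above (a check on one particular example, not a proof):

Omega = {0, 1, 2, 3, 4, 5, 6, 7, 8}
A = {0, 1, 2, 3, 4}
B = {2, 3}
C = {3, 4, 5, 7}

def complement(S):
    """Complement with respect to the universal set Omega."""
    return Omega - S

# De Morgan's laws for these particular sets
print(complement(A | B | C) == (complement(A) & complement(B) & complement(C)))   # True
print(complement(A & B & C) == (complement(A) | complement(B) | complement(C)))   # True

# Inclusion-exclusion: |A ∪ C| = |A| + |C| - |A ∩ C|
print(len(A | C) == len(A) + len(C) - len(A & C))                                 # True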
2.4 Mathematical Definition of Probability
Let Ω be the sample space of an experiment and let A be the collection of events, which is a suitable collection
of subsets of Ω. A probability measure P is a function that takes events in A as inputs, returns numbers as
outputs, and satisfies:
1. P(A) ≤ 1 for all A ∈ A.
2. P(Ω) = 1.
3. If A₁, A₂, ... are disjoint events, then P(∪ᵢ Aᵢ) = Σᵢ P(Aᵢ).
That is, it assigns a number between 0 and 1 to every event, the probability of the whole sample space is 1,
and the probability of a union of disjoint events equals the sum of the probabilities of those events. If A is an
event, the interpretation of P(A) is “the probability that A happens”.
Example:
• Suppose we toss a fair coin twice. Let H denote heads and T denote tails, so the sample space
is {HH, HT, TH, TT}. Since the coin is not loaded, the outcomes are all equally likely: P({HH}) = P({HT}) =
P({TH}) = P({TT}) = 1/4. The probability of obtaining the same outcome twice is P({TT} ∪ {HH}),
and since {TT} and {HH} are disjoint, we have P({TT} ∪ {HH}) = P({TT}) + P({HH}) = 1/4 + 1/4 = 1/2.
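Under the equally-likely assumption, this calculation is just counting; here is a small Python sketch of the same
computation:

from fractions import Fraction
from itertools import product

sample_space = set(product("HT", repeat=2))              # {HH, HT, TH, TT} as tuples
same_twice = {s for s in sample_space if s[0] == s[1]}   # {HH, TT}

# With equally likely outcomes, P(event) = |event| / |sample space|.
prob = Fraction(len(same_twice), len(sample_space))
print(prob)                                              # 1/2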
Properties: Let Ω be the sample space and let A, A₁, and A₂ be events:
1. P(∅) = 0.
2. 0 ≤ P(A) ≤ 1.
3. If A₁ ⊂ A₂, then P(A₁) ≤ P(A₂).
4. If Aᶜ = Ω \ A is the complement of A, then P(Aᶜ) = 1 − P(A).
5. P(A₁ ∪ A₂) = P(A₁) + P(A₂) − P(A₁ ∩ A₂).
6. P(A₁) = P(A₁ ∩ A₂) + P(A₁ ∩ A₂ᶜ).
Exercise 3. Show the properties above.
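Exercise 3 asks for proofs from the three defining conditions, but as a quick sanity check (not a proof) the
properties can also be verified numerically on a small finite sample space, for example a fair die in Python
(the specific events A1 and A2 below are arbitrary choices):

from fractions import Fraction

Omega = {1, 2, 3, 4, 5, 6}                    # sample space of rolling a fair die

def P(event):
    """Probability of an event (a subset of Omega) under equally likely outcomes."""
    return Fraction(len(event), len(Omega))

A1 = {1, 3, 5}                                # the event "odd number"
A2 = {1, 2, 3}                                # another event; A1 ∩ A2 = {1, 3}

print(P(set()) == 0)                                       # property 1
print(0 <= P(A1) <= 1)                                     # property 2
print(P({1, 3}) <= P(A1))                                  # property 3, since {1, 3} ⊂ A1
print(P(Omega - A1) == 1 - P(A1))                          # property 4
print(P(A1 | A2) == P(A1) + P(A2) - P(A1 & A2))            # property 5
print(P(A1) == P(A1 & A2) + P(A1 & (Omega - A2)))          # property 6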