Download Week 8 Feb. 28-29

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Statistics wikipedia , lookup

Ars Conjectandi wikipedia , lookup

Probability interpretations wikipedia , lookup

Probability wikipedia , lookup

Transcript
Inference: Probabilities and
Distributions
Feb 28 - 29, 2012
A funny little thing called probability
• As we noted earlier, when we take a sample and
conduct a study we want to generalize (or infer)
the results of our study to the wider population
our sample is drawn from.
• If 40 percent of our sample say they will vote
Conservative we would like to estimate that this
is the situation among the general population
• However, we know there is a chance that if we
had drawn two separate samples and done two
simultaneous studies, we might have gotten
different results for each sample
• If the variation of results between samples is too
great, then we cannot generalize our results to
the wider population.
• Therefore we need to know about probability to
estimate the chance that our results will vary
from the actual situation in the population at
large.
Because there is always a
probability we could be wrong…
• We always state our confidence interval and our margin
of error
• For example, a pollster might tell us there is a 95%
chance that our survey result is accurate plus or minus
3% (meaning there is a 95% chance that the real support
for the Conservatives among the general population is
between 37% and 43%)
• 95% is our confidence interval
• +/- 3% is our margin of error
• More will be said about these terms in later weeks
In the Long-Run
• The law of probability is based on a
regularly documented observation that
while chance can produce erratic results
over the short-term or when small
numbers are looked at, it generates
regular and predictable outcomes over the
long-term and as numbers increase
• Random
– Outcomes are uncertain but there is
nonetheless a regular distribution of outcomes
in a large number of repetitions (this is not a
pattern).
• Probability
– The probability of any outcome of a random
phenomenon is the proportion of times the
outcome would occur in a very long series of
repetitions.
Examples of Commonly Known Probable
Long-term Results
• Coin Tossing
– Fifty/Fifty
• Stockmarket returns
– Reversion to the mean
• The fall of cards in a game (if you are able
to count them properly and quickly
enough)
Randomness
• Perfect Randomness is very rare and the
ability to select numbers totally at random
(so that each and every other number had
just as good a chance of being selected
and no pattern can ever be predicted) is
valuable
Probability Models
• Sample Space “S” of a random phenomenon is
the set of all possible outcomes
• An Event is an outcome or a set of outcomes of
a random phenomenon (the roll of the dice, the
flip of the coin, etc.)
• A Probability Model is a mathematical
description of a random phenomenon consisting
of
– A sample space S
– A way of assigning probability to events
As in figure 10.2 in the book: There are 36 possible combinations if
you roll two standard dice. If we wanted to define a sample space S
for (5) it would be comprised of the four possible ways to roll 5 (i.e. the
four “events” that result in 5
A ={ roll 1 & 4, roll 2 & 3, roll 3 & 2, roll 4 &1}
Graphics: Moore 2009
Some Formal Probability Rules
• A probability is a number between 0 and 1
– An event with a probability of 0 ought never to occur,
An event with a probability of 1 out to always occur
• All possible outcomes together must equal 1
• If two events have no outcomes in common, the
probability that one or the other occurs is the
sum of their individual probabilities. Eg. If one
event occurs in 40% of cases, and the other in
25% and the two cannot occur together then the
probability of one or the other occurring is 65%
• The probability that an event does not occur is 1
minus the probability that it does occur
Discrete vs. Continuous Models
• Discrete Probability Models
– Assume that the sample space is finite
– To assign probabilities list the probabilities of all the
individual outcomes (must be between 1 and 0 and
add up to 1).The probability of an event is the sum of
the outcomes making up the event.
– Think of our dice example: What is the probability of
rolling a five?
•
•
•
•
•
roll 1 & 4 = 1/36
roll 2 & 3 = 1/36
roll 3 & 2 = 1/36
roll 4 &1 = 1/36
Total Probability = 4/36 = 1/9 = 0.111
• Continuous Probability Models
– Assign probabilities as areas under a density curve
(such as the normal curve)
– The area under the curve and above any range of
values is the probability of an outcome in that range
– This is what we did in chapter 3!
Break Time
Sampling Distributions
• Some Key Words
– Parameter: a number that describes the
population. We often can only speculate on
this as we only have data for a sample.
– Statistic: is a number that can be computed
from the sample data without making use of
any unknown parameters. We often use
statistics to estimate parameters.
• The Law of Large Numbers:
– Draw observations at random from any
population with finite mean µ .
– As the number of observations drawn

increases, the mean xof the observed values
gets closer and closer to the mean µ of the
population .
Two types of distribution of
variables (be careful)
• The Population Distribution of a variable is
the distribution of values of the variable in
the population
• The Sampling Distribution of a statistic is
the distribution of values taken by the
statistic in all possible samples of the
same size from the same population (in
other words, how a statistic varies in many
samples drawn from the same population)
For those who like math
Suppose that
X
is the mean of a SRS of size “n” drawn from a large population with
mean
and standard deviation


Then the sampling distribution of
and standard deviation

n
X
has mean

The Central Limit Theorem
• In fact life gets better still
• As we saw earlier, the mean of a sampling
distribution will approach the mean in the
population if you draw a big enough sample
often enough
• Something better happens with the shape of this
sampling distribution
– Even if the population distribution is not normal, when
the sample is large enough, the distribution of the
mean changes shape so as to approach normal
(provided the population has a finite standard
deviation).
Bottom Line and Caution
• If we can compute the average for a large
random sample we have a decent guess as to
what the average is in the population
• The average of a sample is generally a better
guess of the average of the population than any
one case in the population. In other words,
based on a survey of incomes, I can tell you with
a reasonable level of certainty what the average
income of Canadians is. I cannot tell you just
from that what the income of any specific
Canadian is.