Inference: Probabilities and
Distributions
Feb 28 - 29, 2012
A funny little thing called probability
• As we noted earlier, when we take a sample and
conduct a study we want to generalize (or infer)
the results of our study to the wider population
our sample is drawn from.
• If 40 percent of our sample say they will vote
Conservative, we would like to estimate that this is
also the situation among the general population
• However, we know there is a chance that if we
had drawn two separate samples and done two
simultaneous studies, we might have gotten
different results for each sample
• If the variation of results between samples is too
great, then we cannot generalize our results to
the wider population.
• Therefore we need to know about probability to
estimate the chance that our results will vary
from the actual situation in the population at
large.
Because there is always a
probability we could be wrong…
• We always state our confidence level and our margin
of error
• For example, a pollster might tell us there is a 95%
chance that our survey result is accurate to within plus
or minus 3 percentage points (meaning we can be 95%
confident that the real support for the Conservatives
among the general population is between 37% and 43%)
• 95% is our confidence level
• 37% to 43% is our confidence interval
• +/- 3% is our margin of error
• More will be said about these terms in later weeks
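A minimal Python sketch of the calculation behind such a statement, assuming a hypothetical poll of n = 1,067 respondents with 40% support (a sample size that gives roughly a plus or minus 3% margin of error at the 95% level under the usual normal approximation):

```python
import math

# Hypothetical poll: 40% support in a sample of n = 1,067 respondents.
p_hat = 0.40
n = 1067
z = 1.96  # z-value for a 95% confidence level (normal approximation)

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z * standard_error

lower, upper = p_hat - margin_of_error, p_hat + margin_of_error
print(f"Margin of error: +/- {margin_of_error:.1%}")
print(f"95% confidence interval: {lower:.1%} to {upper:.1%}")
```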
In the Long-Run
• The law of probability is based on the
regularly documented observation that
while chance can produce erratic results
over the short term or when small
numbers are involved, it generates
regular and predictable outcomes over the
long term and as numbers increase
• Random
– Outcomes are uncertain but there is
nonetheless a regular distribution of outcomes
in a large number of repetitions (this is not a
pattern).
• Probability
– The probability of any outcome of a random
phenomenon is the proportion of times the
outcome would occur in a very long series of
repetitions.
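A minimal Python sketch of this long-run idea, simulating coin tosses (the seed and the toss counts are arbitrary choices for illustration):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

# Toss a fair coin many times and watch the proportion of heads settle
# near 0.5 as the number of repetitions grows.
for n_tosses in (10, 100, 1_000, 10_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    print(f"{n_tosses:>7} tosses: proportion of heads = {heads / n_tosses:.3f}")
```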
Examples of Commonly Known Probable
Long-term Results
• Coin Tossing
– Fifty/Fifty
• Stockmarket returns
– Reversion to the mean
• The fall of cards in a game (if you are able
to count them properly and quickly
enough)
Randomness
• Perfect randomness is very rare, and the
ability to select numbers totally at random
(so that each and every number has
just as good a chance of being selected as
any other and no pattern can ever be
predicted) is valuable
Probability Models
• Sample Space “S” of a random phenomenon is
the set of all possible outcomes
• An Event is an outcome or a set of outcomes of
a random phenomenon (the roll of the dice, the
flip of the coin, etc.)
• A Probability Model is a mathematical
description of a random phenomenon consisting
of
– A sample space S
– A way of assigning probability to events
As in Figure 10.2 in the book: there are 36 possible outcomes if
you roll two standard dice. If we wanted to define the event of rolling a
total of 5, it would be made up of the four possible ways to roll 5 (i.e. the
four outcomes that result in 5):
A = {roll 1 & 4, roll 2 & 3, roll 3 & 2, roll 4 & 1}
Graphics: Moore 2009
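A short Python sketch that enumerates this sample space and the event A (the outcome pairs and counts follow directly from the slide above):

```python
from itertools import product

# Sample space S: all 36 equally likely outcomes of rolling two standard dice.
S = list(product(range(1, 7), repeat=2))
print(len(S))  # 36

# Event A: the outcomes whose faces sum to 5.
A = [outcome for outcome in S if sum(outcome) == 5]
print(A)                # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(len(A) / len(S))  # 4/36 = 1/9, about 0.111
```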
Some Formal Probability Rules
• A probability is a number between 0 and 1
– An event with a probability of 0 ought never to occur;
an event with a probability of 1 ought always to occur
• The probabilities of all possible outcomes together must
sum to 1
• If two events have no outcomes in common, the
probability that one or the other occurs is the
sum of their individual probabilities. E.g., if one
event occurs in 40% of cases and the other in
25%, and the two cannot occur together, then the
probability of one or the other occurring is 65%
• The probability that an event does not occur is 1
minus the probability that it does occur
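A small Python sketch of the addition and complement rules, using the hypothetical 40% / 25% figures from the slide:

```python
# Two hypothetical events with no outcomes in common, as on the slide above.
p_one = 0.40    # one event occurs in 40% of cases
p_other = 0.25  # the other occurs in 25% of cases

# Addition rule for disjoint events: P(one or the other) = P(one) + P(other).
print(f"one or the other occurs: {p_one + p_other:.0%}")   # 65%

# Complement rule: P(event does not occur) = 1 - P(event occurs).
print(f"first event does not occur: {1 - p_one:.0%}")      # 60%
```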
Discrete vs. Continuous Models
• Discrete Probability Models
– Assume that the sample space is finite
– To assign probabilities, list the probabilities of all the
individual outcomes (each must be between 0 and 1, and
together they must add up to 1). The probability of an event
is the sum of the probabilities of the outcomes making up
the event.
– Think of our dice example: What is the probability of
rolling a total of five?
• roll 1 & 4 = 1/36
• roll 2 & 3 = 1/36
• roll 3 & 2 = 1/36
• roll 4 & 1 = 1/36
• Total Probability = 4/36 = 1/9 ≈ 0.111
• Continuous Probability Models
– Assign probabilities as areas under a density curve
(such as the normal curve)
– The area under the curve and above any range of
values is the probability of an outcome in that range
– This is what we did in chapter 3!
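A brief Python sketch of the continuous case, computing an area under the standard normal curve; the helper normal_cdf is an illustrative function built from the standard library's erf, and the chosen range is just an example:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal density curve to the left of x."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# Probability of an outcome between -1 and +1 standard deviations:
# the familiar "about 68%" from the 68-95-99.7 rule in chapter 3.
print(normal_cdf(1) - normal_cdf(-1))  # roughly 0.683
```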
Break Time
Sampling Distributions
• Some Key Words
– Parameter: a number that describes the
population. We often can only speculate on
this as we only have data for a sample.
– Statistic: a number that can be computed
from the sample data without making use of
any unknown parameters. We often use
statistics to estimate parameters.
• The Law of Large Numbers:
– Draw observations at random from any
population with finite mean µ .
– As the number of observations drawn
increases, the mean x̄ of the observed values
gets closer and closer to the mean µ of the
population.
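A short Python sketch of the law of large numbers for means, using a made-up skewed "income" population with mean µ = 50 (the population model, seed, and sample sizes are arbitrary illustrations):

```python
import random
import statistics

random.seed(2)  # reproducible illustration

# Made-up skewed "income" population (in thousands) with finite mean mu = 50.
mu = 50
def draw_income():
    return random.expovariate(1 / mu)

# As the number of observations grows, the mean of the observed values
# gets closer and closer to the population mean mu.
for n in (10, 100, 1_000, 10_000, 100_000):
    observations = [draw_income() for _ in range(n)]
    print(f"n = {n:>7}: observed mean = {statistics.mean(observations):.2f} (mu = {mu})")
```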
Two types of distribution of
variables (be careful)
• The Population Distribution of a variable is
the distribution of values of the variable in
the population
• The Sampling Distribution of a statistic is
the distribution of values taken by the
statistic in all possible samples of the
same size from the same population (in
other words, how a statistic varies in many
samples drawn from the same population)
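A small Python simulation of this distinction, drawing many samples of the same size n from one hypothetical population and looking at how the sample mean varies (the population, n, and the number of repetitions are arbitrary choices):

```python
import random
import statistics

random.seed(3)  # reproducible illustration

# A hypothetical population of 100,000 values, uniform between 0 and 100.
population = [random.uniform(0, 100) for _ in range(100_000)]

# Population distribution: the values of the variable themselves.
print("population mean:", round(statistics.mean(population), 2))

# Sampling distribution of the mean: the value of the statistic (x-bar)
# across many samples of the same size n from the same population.
n = 50
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]
print("center of the sample means:", round(statistics.mean(sample_means), 2))
print("spread of the sample means:", round(statistics.stdev(sample_means), 2))
```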
For those who like math
Suppose that x̄ is the mean of an SRS of size "n" drawn from a large population with
mean µ and standard deviation σ.
Then the sampling distribution of x̄ has mean µ and standard deviation σ/√n.
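A quick numerical check of this formula under assumed values (µ = 100, σ = 15, n = 25 are made-up numbers): the simulated standard deviation of x̄ should come out close to σ/√n = 3.

```python
import math
import random
import statistics

random.seed(4)  # reproducible illustration

mu, sigma, n = 100, 15, 25  # made-up population mean, sd, and SRS size

# Simulate many SRS means and compare their spread with sigma / sqrt(n).
means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(5_000)]
print("simulated sd of x-bar:", round(statistics.stdev(means), 2))
print("sigma / sqrt(n):      ", round(sigma / math.sqrt(n), 2))  # 3.0
```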
The Central Limit Theorem
• In fact life gets better still
• As we saw earlier, the mean of the sampling
distribution will match the mean of the
population if you draw a big enough sample
often enough
• Something better happens with the shape of this
sampling distribution
– Even if the population distribution is not normal, when
the sample is large enough, the distribution of the
mean changes shape so as to approach normal
(provided the population has a finite standard
deviation).
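A small simulation sketch of this idea, drawing sample means from a strongly skewed (exponential) population; the sample size n = 40 and the repetition count are arbitrary illustrative choices. If the sampling distribution is roughly normal, about 68% and 95% of sample means should fall within one and two standard deviations of their center:

```python
import random
import statistics

random.seed(5)  # reproducible illustration

# A strongly skewed population: exponential with mean 1 (nothing like normal).
n = 40  # sample size; an arbitrary "large enough" choice for illustration
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(10_000)]

# For a roughly normal shape, about 68% / 95% of the sample means should
# fall within 1 / 2 standard deviations of their center.
center = statistics.mean(means)
spread = statistics.stdev(means)
within_1 = sum(abs(m - center) <= spread for m in means) / len(means)
within_2 = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
print(f"within 1 sd: {within_1:.1%}   within 2 sd: {within_2:.1%}")
```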
Bottom Line and Caution
• If we can compute the average for a large
random sample we have a decent guess as to
what the average is in the population
• The average of a sample is generally a better
guess of the average of the population than any
one case in the population. In other words,
based on a survey of incomes, I can tell you with
a reasonable level of certainty what the average
income of Canadians is. I cannot tell you just
from that what the income of any specific
Canadian is.