Sampling Distributions
Welcome to inference!!!!
Chapter 9.1/9.3
Parameter

A Parameter is a number that describes the population.


A parameter always exists, but in practice we rarely know
its value because of the difficulty of conducting a census.
We use Greek letters to describe them (like μ or σ). If we
are talking about a proportion parameter, we use rho (ρ).
Ex: If we wanted to compare the IQs of all American and
Asian males, a census would be impossible, but it's important to
realize that μAmerican and μAsian exist.
Ex: If we were interested in whether a greater
percentage of women than men eat broccoli, we
want to know whether ρwomen > ρmen.

Statistic



A statistic is a number that describes a sample. The value of a
statistic can always be found when we take a sample. It's
important to realize that a statistic can change from sample
to sample.
Statistics use symbols like x̄, s, and p̂ (not Greek).
We often use statistics to estimate an unknown parameter.



Ex: I take a random sample of 500 American males and find
their IQs. We find that x̄ = 103.2.
Ex: I take a random sample of 200 women and find that 40 like
broccoli. Then p̂w = 40/200 = 0.2.
IMPORTANT! A POPULATION NEEDS TO BE AT LEAST 10 TIMES AS
BIG AS A SAMPLE TAKEN FROM IT. IF NOT, YOU NEED A
SMALLER SAMPLE
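To see a statistic change from sample to sample, here is a small Python sketch. The population (10,000 women, 22% of whom like broccoli) is made up for illustration, and `sample_proportion` is a hypothetical helper, not something from the textbook. It also refuses any sample that breaks the 10-times rule above.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 women, 22% like broccoli (invented value).
population = [1] * 2200 + [0] * 7800
p = sum(population) / len(population)  # the parameter (normally unknown)

def sample_proportion(pop, n):
    """Take an SRS of size n and return the statistic p-hat."""
    # Rule from the slide: the population must be at least 10 times the sample.
    if len(pop) < 10 * n:
        raise ValueError("population is less than 10 times the sample size")
    return sum(random.sample(pop, n)) / n

# Five different samples of 200 give five (usually different) statistics.
p_hats = [sample_proportion(population, 200) for _ in range(5)]
print("parameter p =", p)
print("statistics p-hat:", p_hats)
```

The parameter stays fixed at 0.22 while the printed p̂ values move around it, which is the point of the slide.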
Bias
We say a statistic is biased if it is a poor predictor of the
parameter: its values are systematically too high or too low.
Variability
*The variability of the population doesn't change (scoop example);
the size of the scoop (the sample) is what matters.
How can we use samples to find parameters if
they give us different results?





Imagine an archer shooting many arrows at a target: 4
situations can occur
a) High bias, low variability
b) Low bias, high variability
c) High bias, high variability
d) Low bias, low variability--IDEAL
Here’s an example:



Suppose our goal is to estimate μAM, the mean IQ of American males.
We can't take a census, so we take many samples. We find the
average IQ of the American males in each sample (x̄).
Describe a situation that matches each of the 4 possibilities
described.

a) If our many sample means are consistent but higher than the
true average IQ of American males, then we have a situation with
high bias and low variability.

b) If our many sample means are inconsistent, some higher and some
lower than the true mean IQ of American males, we have low bias and
high variability.

c) If our sample means are not close to each other but are all
higher than μAM, then we have high bias and high variability.

d) Finally, if our sample means are all just slightly higher
or slightly lower than μAM, we have our desired situation:
low bias and low variability.
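The four situations can be imitated with a quick simulation. This is only a sketch: the true mean of 100 and the bias and spread numbers are invented, and each simulated "sample mean" is drawn straight from a Normal curve rather than from real IQ data.

```python
import random
import statistics

random.seed(2)
TRUE_MEAN = 100  # hypothetical value of the parameter

# (bias, spread) pairs are invented purely for illustration.
situations = {
    "a) high bias, low variability": (8, 1),
    "b) low bias, high variability": (0, 8),
    "c) high bias, high variability": (8, 8),
    "d) low bias, low variability": (0, 1),
}

def summarize(bias, spread, trials=2000):
    """Simulate many sample means; return (estimated bias, estimated spread)."""
    means = [random.gauss(TRUE_MEAN + bias, spread) for _ in range(trials)]
    return statistics.mean(means) - TRUE_MEAN, statistics.stdev(means)

results = {name: summarize(b, s) for name, (b, s) in situations.items()}
for name, (est_bias, est_spread) in results.items():
    print(f"{name}: bias ~ {est_bias:.2f}, spread ~ {est_spread:.2f}")
```

Bias shows up as how far the center of the arrows sits from the bullseye; variability shows up as how scattered the arrows are.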
But…

You aren't taking a bunch of samples…you're only going
to take 1! And we want it to predict μAM.

If we used the data from situation d), then any of the
samples would provide a good predictor for μAM.

We already know some ways to get a good sample- using
an SRS and being very sure to have no bias when choosing
our sample.

Inference is using our sample statistic (assuming it’s a good
sample) to predict our parameter with a certain degree of
confidence.
The Sampling Distribution
The sampling distribution of a statistic is the
distribution of the values the statistic takes (for example, the
sample means x̄) in all possible samples of
the same size from the population.
When we sample, we treat the draws as independent, as if
sampling with replacement.
A sampling distribution is a sample space: it
describes everything that can happen when we
sample.
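For a tiny population, "all possible samples" can actually be listed. The sketch below uses a made-up population of four values and lists every sample of size 2 drawn with replacement, then tallies the sample means. The population values are assumptions chosen to keep the arithmetic easy.

```python
from collections import Counter
from itertools import product
import statistics

population = [2, 4, 6, 8]  # tiny made-up population
n = 2                      # sample size

# Every possible ordered sample of size n, drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

# The sampling distribution: each possible mean and how often it occurs.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, distribution[value] / len(samples))

# The center of the sampling distribution matches the population mean.
mean_of_means = statistics.mean(sample_means)
print("population mean:", statistics.mean(population))
print("mean of all sample means:", mean_of_means)
```

There are 16 possible samples here, and the middle value 5 is the most common sample mean.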
Cool demo
Using Sampling Distributions
If the population is Normal, then the sampling
distribution of x̄ will also be Normal.
WE CAN ONLY DO CALCULATIONS BASED ON NORMAL
DISTRIBUTIONS!
The sampling distribution of means has a mean of
μ and a standard deviation of σ/√n, where n is the sample size:
N(μ, σ/√n)
THE NORMAL SHAPE IS ONLY GUARANTEED IF (a) the population is Normal
…or (b) what we will look at next class
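A simulation can check the σ/√n claim. The sketch assumes a Normal population with μ = 100 and σ = 15 (illustrative values, not from the slides), takes many samples of size 25, and compares the spread of the sample means with σ/√n.

```python
import math
import random
import statistics

random.seed(3)
MU, SIGMA = 100, 15  # assumed Normal population (illustrative values)
N = 25               # sample size
TRIALS = 5000        # number of samples to simulate

def one_sample_mean():
    """Draw N independent values from the population and return x-bar."""
    return statistics.mean([random.gauss(MU, SIGMA) for _ in range(N)])

sample_means = [one_sample_mean() for _ in range(TRIALS)]

simulated_mean = statistics.mean(sample_means)
simulated_sd = statistics.stdev(sample_means)
theoretical_sd = SIGMA / math.sqrt(N)  # sigma / sqrt(n)

print(f"mean of sample means: {simulated_mean:.2f} (theory: {MU})")
print(f"SD of sample means:   {simulated_sd:.2f} (theory: {theoretical_sd:.2f})")
```

The simulated values should land close to μ and σ/√n, though not exactly on them, since this is itself only a sample of all possible samples.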
Example
The true average study time for a final exam in
history is found to be 6 hours and 25 minutes with
a standard deviation of 1 hour and 45 minutes.
Assume the distribution is Normal: N(6.417, 1.75), in hours.



What is the probability that a student chosen at
random spends more than 7 hours studying?
Normalcdf(7,100,6.417,1.75) = 37%
What is the probability that an SRS of 4 students will
average more than 7 hours in studying?
Normalcdf(7,100,6.417,1.75/√4) = 25.3%.
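Why did the probability go down?
For one student to study more than 7 hours is not very probable…for a
group of 4 to average more than 7 hours is even less probable, because
averages vary less than individuals (1.75/√4 = 0.875 hours instead of 1.75).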
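Without a calculator, the two Normalcdf results can be reproduced in Python with `statistics.NormalDist` from the standard library. `prob_above` is a hypothetical helper name, and it uses the exact upper tail instead of the calculator's upper bound of 100.

```python
import math
from statistics import NormalDist

MU = 6 + 25 / 60  # 6 hours 25 minutes, in hours
SIGMA = 1.75      # 1 hour 45 minutes, in hours

def prob_above(x, mu, sigma):
    """P(X > x) for a Normal distribution with the given mean and SD."""
    return 1 - NormalDist(mu, sigma).cdf(x)

# One student chosen at random: use the population SD.
p_one = prob_above(7, MU, SIGMA)

# Mean of an SRS of 4 students: use sigma / sqrt(n).
p_four = prob_above(7, MU, SIGMA / math.sqrt(4))

print(f"one student:     {p_one:.3f}")
print(f"average of four: {p_four:.3f}")
```

The results come out at about 0.37 and 0.25, in line with the calculator values; small differences in the last digit come from rounding the mean to 6.417.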
Sampling Distributions
As the sample size increases, the standard
deviation of the sampling distribution
decreases (the variability decreases).
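A few values of σ/√n make the pattern concrete. The sketch reuses σ = 1.75 hours from the study-time example; the sample sizes are arbitrary picks, and `sd_of_sample_mean` is a hypothetical helper name.

```python
import math

SIGMA = 1.75  # population SD in hours, from the study-time example

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sampling distribution of x-bar."""
    return sigma / math.sqrt(n)

sample_sizes = (1, 4, 16, 64)  # arbitrary sizes for illustration
spreads = [sd_of_sample_mean(SIGMA, n) for n in sample_sizes]

for n, sd in zip(sample_sizes, spreads):
    print(f"n = {n:2d}: SD of x-bar = {sd}")
```

Each time the sample size is multiplied by 4, the standard deviation is cut in half, because n sits under a square root.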
Homework:
Study Guide 9.3
Exercises 9.11-9.13, 9.17, 9.31-9.34
(hint for 9.33…use algebra!)
Next Class: Bring 25 pennies per pair of
students.