Statistics and Quantitative Analysis U4320
Segment 5: Sampling and inference
Prof. Sharyn O’Halloran
Sampling

A. Basics

1. Ways to Describe Data
   Histograms
   Frequency Tables, etc.

2. Ways to Characterize Data
   Central Tendency
      Mode
      Median
      Mean
   Dispersion
      Variance
      Standard Deviation
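As a small illustration of these summary measures, the sketch below computes each one with Python's standard `statistics` module. The data values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import statistics

# Hypothetical data set, used only for illustration.
data = [2, 3, 3, 5, 7, 8, 8, 8, 10]

mode = statistics.mode(data)            # most frequent value
median = statistics.median(data)        # middle value of the sorted data
mean = statistics.mean(data)            # arithmetic average
variance = statistics.pvariance(data)   # population variance (divides by N)
std_dev = statistics.pstdev(data)       # square root of the population variance

print(mode, median, mean, round(variance, 2), round(std_dev, 2))
```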
Sampling(cont.)

3. Probability of Events
   If Discrete
      Rely on Relative Frequency
   If Continuous
      Rely on the distribution of events
      Example: Standard Normal Distribution

4. Samples
   We can take a sample of the population and make inferences about the population.

5. Central Question
   How well does the sample represent the underlying population?
Sampling

(cont.)
B. Random Sampling

1. Problems with Sample Bias

The way we collect our data may bias our results. That
is, the average response in our sample may not
represent the average response in the whole population.


Examples:
   Literary Digest Phone Book Poll
   Primaries
   Relation between economic growth and education, looking only at OECD countries

2. Solution
   Random Sampling
Sampling

(cont.)
C. Moments of the Sample

1. Characteristics of Sample Mean
   $\sigma^2$ = variance
   $\mu$ = mean
Sampling

(cont.)
Example

[Figure: draw a single observation X; draw two observations, mean = $\bar{X}$; draw 4 observations, mean = $\bar{X}$]
Sampling

(cont.)
2. Generalization




Every sample has an expected mean of $\mu$.
But as our sample size increases, we are more confident of our results.
That is, the standard deviation (or standard error, as we will call it) of our results is decreasing.
So as N increases, $\bar{X} \to \mu$.
Sampling

(cont.)
3. Hat Experiment


Mean = 10.5
Standard deviation  = 5.77



Now let's take a sample of size 1. (With replacement.)
Now one of size 2.
Now one of size 6.

[Figure: axis marked at $\mu$ = 10.5]
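The hat experiment can be sketched as a short simulation. The slides do not list what is in the hat, so the code assumes slips numbered 1 through 20, a choice that reproduces the stated mean of 10.5 and standard deviation of 5.77. The function name `sample_mean` and the repetition count are illustrative choices, not part of the lecture.

```python
import random
import statistics

# Assumption: the hat holds slips numbered 1 through 20. This is not stated
# in the slides; it is chosen because it gives mean 10.5 and sd 5.77.
hat = list(range(1, 21))
pop_mean = statistics.mean(hat)
pop_sd = statistics.pstdev(hat)

def sample_mean(n, rng):
    """Draw n slips with replacement and return their average."""
    return statistics.mean([rng.choice(hat) for _ in range(n)])

rng = random.Random(0)  # fixed seed so the run is repeatable
spread = {}
for n in (1, 2, 6):
    means = [sample_mean(n, rng) for _ in range(5000)]
    # The spread of the sample means shrinks as n grows.
    spread[n] = statistics.pstdev(means)
    print(n, round(statistics.mean(means), 2), round(spread[n], 2))
```

Each row prints the sample size, the average of the simulated sample means, and their standard deviation, which should fall as n grows.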
Sampling

(cont.)
4. Equations

For a sample of size n from a population of mean $\mu$ and standard deviation $\sigma$, the sample mean $\bar{X}$ has:

$$E(\bar{X}) = \mu$$

$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$

$SE(\bar{X})$ is called the standard error of the sampling process.
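One way to see where these two results come from is the short derivation below. It assumes the n observations $X_1, \dots, X_n$ are drawn independently from the same population, which is what random sampling with replacement provides.

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
            = \frac{1}{n}\, n\mu = \mu \\[4pt]
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n} \\[4pt]
SE(\bar{X}) &= \sqrt{\operatorname{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```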
Inference
We make inferences about a population from a given sample.

A. Population and Sampling Parameters
   We have a population with parameters $\mu$ and $\sigma$.
   We then take a sample with parameters $\bar{X}$ and s.
   We want to know how well the sample mean $\bar{X}$ approximates the population mean $\mu$.
Inference (cont.)
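[Figure: draw a sample from the population ($\mu$, $\sigma$) to obtain $\bar{X}$ and s; then use $SE(\bar{X})$ to make an inference about how good an estimate $\bar{X}$ is of $\mu$]

On average the sample mean equals the population mean.

$$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$$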
Inference (cont.)

B. Referring Back to the Hat Experiment


1. Standard error decreases as n increases
For instance, earlier we drew samples of sizes 1, 2, and 6 from the hat.

The first sample of size 1 had standard error $5.77/\sqrt{1} = 5.77$.
The second sample of size 2 had standard error $5.77/\sqrt{2} = 4.08$.
The third sample of size 6 had standard error $5.77/\sqrt{6} = 2.36$.
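The three standard errors can be checked with a few lines of Python. The helper name `standard_error` is an illustrative choice; the value 5.77 is the population standard deviation from the hat experiment.

```python
import math

sigma = 5.77  # population standard deviation from the hat experiment

def standard_error(sigma, n):
    """Standard error of the mean for a sample of size n."""
    return sigma / math.sqrt(n)

errors = {n: round(standard_error(sigma, n), 2) for n in (1, 2, 6)}
print(errors)
```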
Inference (cont.)

C. Shape of the Sampling Distribution

If you take a sample and find its mean, then take another sample and find its mean, and repeat this process a large number of times, then $\bar{X}$ is a random variable with its own mean and standard error.
Inference (cont.)

1. Central Limit Theorem

Take a large number of samples of size n. Then the sample mean $\bar{X}$ is approximately normally distributed with mean $\mu$ and standard error $\sigma/\sqrt{n}$, and the approximation improves as n grows.
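A rough simulation can make the theorem concrete. The sketch below assumes a strongly skewed population (exponential with mean 1 and standard deviation 1), which is not from the lecture; it is chosen only to show that the sample means still behave roughly normally. The sample size and repetition count are likewise arbitrary.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
n, reps = 30, 4000       # illustrative sample size and number of samples

# Assumption: exponential population with mu = 1 and sigma = 1 (skewed).
means = [statistics.mean([rng.expovariate(1.0) for _ in range(n)])
         for _ in range(reps)]

center = statistics.mean(means)     # should be close to mu = 1
spread = statistics.pstdev(means)   # should be close to sigma / sqrt(n)
se = 1.0 / n ** 0.5

# Share of sample means within 1.96 standard errors of mu; a normal
# sampling distribution would put roughly 95% of them there.
share = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps
print(round(center, 3), round(spread, 3), round(share, 3))
```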

Inference (cont.)

2. Example: 3 different distributions

Example 1:

A population of men on a small, Eastern campus has a mean height $\mu$ = 69" and a standard deviation $\sigma$ = 3.22". If a random sample of n = 10 men is drawn, what is the chance that the sample mean will be within 2" of the population mean?
Inference (cont.)

Answer:

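From the Central Limit Theorem, we know that $\bar{X}$ is normally distributed, with mean 69 and standard error:

$$\frac{\sigma}{\sqrt{n}} = \frac{3.22}{\sqrt{10}} = 1.02$$

[Figure: normal curve for $\bar{X}$ with standard error = 1.02, marked at $\bar{X}$ = 67 and $\bar{X}$ = 71]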
Inference (cont.)

Answer (cont.)


Find the z-score: z = (71 - 69)/1.02 = 1.96.
P(Z > 1.96) = 0.025. Since there are two tails, the area in the middle is:
1 - .025 - .025 = .95.
So there's a 95% probability that the sample mean falls between 67 and 71.
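The same answer can be reproduced without a Z-table. This sketch uses `statistics.NormalDist` from the Python standard library to represent the sampling distribution of the mean; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69, 3.22, 10
se = sigma / sqrt(n)

# Sampling distribution of the sample mean: normal with mean mu and sd se.
xbar = NormalDist(mu, se)
p_within = xbar.cdf(71) - xbar.cdf(67)   # P(67 < sample mean < 71)
print(round(se, 2), round(p_within, 3))
```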
Inference (cont.)

Example 2:


Suppose a large class in statistics has marks normally distributed around $\mu$ = 72 with $\sigma$ = 9. Find the probability that
a) An individual student drawn at random will have a mark over 80.
Inference (cont.)

Answer:


The Z-score is (80-72)/9 = .89.
Looking this up in the table gives P(Z > .89) = .187, or about 19%.

[Figure: normal curve marked at 80]

b) Now, what's the probability that a sample of size 10 has an average of over 80?
Inference (cont.)

Answer:




The standard error is $\sigma/\sqrt{n} = 9/\sqrt{10} = 2.85$.
So the Z-score becomes (80-72)/2.85 = 2.81.
P(Z > 2.81) = .002.

[Figure: normal curve with SE = 2.85, marked at 80, upper-tail area .002]
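Parts (a) and (b) differ only in which standard deviation is used: $\sigma$ for one student, $\sigma/\sqrt{n}$ for an average of ten. The sketch below computes both tail probabilities with `statistics.NormalDist`; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 72, 9

# (a) One student: use the population standard deviation.
p_individual = 1 - NormalDist(mu, sigma).cdf(80)

# (b) Mean of 10 students: use the standard error sigma / sqrt(n).
se = sigma / sqrt(10)
p_mean = 1 - NormalDist(mu, se).cdf(80)

print(round(p_individual, 3), round(se, 2), round(p_mean, 4))
```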
Inference (cont.)

Example 3:

If the number of miles per gallon achieved by all cars of a particular model has $\mu$ = 25 and $\sigma$ = 2, what is the probability that for a random sample of 20 such cars, average miles per gallon will be less than 24? (Assume that the population is normally distributed.)

Step 1: Standardize $\bar{X}$

$$P(\bar{X} < 24) = P\left[\frac{\bar{X} - \mu}{SE} < \frac{24 - 25}{SE}\right]$$

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{20}} = .4472$$

$$P(\bar{X} < 24) = P\left[\frac{\bar{X} - \mu}{SE} < \frac{24 - 25}{.4472}\right], \qquad \frac{24 - 25}{.4472} = -2.24$$
Inference (cont.)

Step 2: Then find the Z-scores (from the standard Normal tables)

$$P(\bar{X} < 24) = P[Z < -2.24] = P[Z > 2.24] = 0.013 \text{ (by symmetry)}$$

[Figure: normal curve with SE = 0.4472, marked at 24 and 26, tail area .013]

So there is about a 1.3 percent chance that from a sample of 20 the average will be less than 24.
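Both steps can be checked numerically. The sketch below standardizes the cutoff of 24 and looks up the lower-tail area with `statistics.NormalDist`; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 25, 2, 20
se = sigma / sqrt(n)

z = (24 - mu) / se           # Step 1: standardize the cutoff
p = NormalDist().cdf(z)      # Step 2: P(Z < z) under the standard normal
print(round(se, 4), round(z, 2), round(p, 3))
```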
Inference (cont.)

D. Proportions

1. Proportions as Means


A proportion (P) is just the mean of a dichotomous
variable.
Example

Ask 50 people what they think of Clinton, and code each answer as:
   0 if they think he's doing a poor job; and
   1 if they think he is doing a good job.
Suppose 30 of the 50 respondents say he's doing a good job.
Then the sample mean P is 30/50 = .60.
This is just another way of saying that 60% of those surveyed approved of his job performance.
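A quick check of the idea that a proportion is the mean of a 0/1 variable: the list below encodes the 50 responses from the example (30 ones, 20 zeros) and averages them.

```python
import statistics

# 1 = "doing a good job", 0 = "doing a poor job", as coded in the example.
responses = [1] * 30 + [0] * 20

p_hat = statistics.mean(responses)   # mean of the dichotomous variable
print(p_hat)
```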
Inference (cont.)

2. Formula for Standard Error

For a large enough sample of size n, P (the proportion) will be normally distributed with mean $\pi$ and standard error $\sqrt{\pi(1-\pi)/n}$.

Population Mean $\mu$ = Population Proportion $\pi$
Sample Mean $\bar{X}$ = Sample Proportion P
Population SD $\sigma = \sqrt{\pi(1-\pi)}$

$$SE = \sqrt{\frac{\pi(1-\pi)}{n}}$$
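The population standard deviation $\sqrt{\pi(1-\pi)}$ follows from the 0/1 coding. A brief derivation, writing X for one respondent's answer (a symbol introduced here for illustration) and using the fact that $X^2 = X$ when X is 0 or 1:

```latex
\begin{align*}
E(X) &= 1\cdot\pi + 0\cdot(1-\pi) = \pi \\
\operatorname{Var}(X) &= E(X^2) - [E(X)]^2 = \pi - \pi^2 = \pi(1-\pi) \\
\sigma &= \sqrt{\pi(1-\pi)}, \qquad
SE = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\pi(1-\pi)}{n}}
\end{align*}
```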
Inference (cont.)

3. Example: Polling


Suppose that the true approval rating for Clinton is .50. That is, 50 percent of the population believe he is doing a good job: $\pi$ = .5.
If we sample 50 people, what is the probability that we will observe an approval rating of 60 percent or above?
Inference (cont.)

We know that the true population mean is $\pi$ = .5.

$$\text{Standard Error} = \sqrt{\frac{.5(1-.5)}{50}} = 0.0707$$

Then the Z-score is (.6-.5) / 0.0707 = 1.414.
Looking this up in the Z-table, P(Z > 1.414) = .079, or about 8%.
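The polling calculation can be reproduced as follows, using the normal approximation from the example. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.5, 50                      # true proportion and sample size
se = sqrt(pi * (1 - pi) / n)         # standard error of the proportion

z = (0.6 - pi) / se
p = 1 - NormalDist().cdf(z)          # P(sample proportion >= .60)
print(round(se, 4), round(z, 3), round(p, 3))
```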
Inference (cont.)

4. Example

Of your first 15 grandchildren, what is the
chance that there will be more than 10 boys?
Inference (cont.)

Answer:


What is the probability that the proportion of boys is at least 10/15 = 2/3?
We know that the population mean is $\pi$ = 1/2.

$$\text{Standard Error} = \sqrt{\frac{.5(1-.5)}{15}} = 0.129$$

Then the Z-score is (.667-.5) / 0.129 = 1.29.
Looking this up in the table, P(Z > 1.29) = .099, or about 10%.
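The sketch below repeats the normal-approximation calculation and, as an added comparison that is not part of the lecture, also counts the exact binomial probability of 10 or more boys out of 15. With a sample this small the approximation is rough, so the two figures need not agree closely.

```python
from math import comb, sqrt
from statistics import NormalDist

pi, n = 0.5, 15
se = sqrt(pi * (1 - pi) / n)

# Normal approximation, as in the worked answer.
z = (10 / n - pi) / se
p_normal = 1 - NormalDist().cdf(z)

# Exact binomial probability of 10 or more boys (added for comparison).
p_exact = sum(comb(n, k) for k in range(10, n + 1)) / 2 ** n

print(round(se, 3), round(z, 2), round(p_normal, 3), round(p_exact, 3))
```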
Point Estimation:
Properties

A. Unbiased Estimators

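When an estimator has the property that it converges to the correct value, we say that it is unbiased.
Def of Unbiased: as N $\to \infty$, $\bar{X}$ converges towards $\mu$.
(Strictly, convergence as N grows is the property usually called consistency; unbiasedness means $E(\bar{X}) = \mu$ at every sample size. The sample mean has both properties.)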
Point Est. Properties

(cont.)
B. Efficient Estimators

Def of Efficient: One estimator is more
efficient than another if its standard error
is lower.
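To illustrate efficiency, the simulation below compares two estimators of the center of a normal population: the sample mean and the sample median. The population (standard normal), sample size, and repetition count are assumptions made for this sketch, not values from the lecture.

```python
import random
import statistics

rng = random.Random(2)   # fixed seed so the run is repeatable
n, reps = 25, 3000       # illustrative sample size and number of samples

means, medians = [], []
for _ in range(reps):
    # Assumption: standard normal population (mu = 0, sigma = 1).
    sample = [rng.gauss(0, 1) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

# The spread of each estimator across samples is its standard error.
se_mean = statistics.pstdev(means)
se_median = statistics.pstdev(medians)
print(round(se_mean, 3), round(se_median, 3))
```

In this setting the mean should show the smaller standard error, which is what it means for it to be the more efficient of the two.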
Point Est. Properties

(cont.)
C. N-1 Problem

1. $\mu$ Known

$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$$

When we take a sample of size n, if we had the real $\mu$ from the population, we could calculate

$$s^2 = \frac{\sum (X_i - \mu)^2}{n}$$

Then there wouldn't be a problem; $s^2$ would be a consistent estimator of $\sigma^2$, if we knew $\mu$.
Point Est. Properties



(cont.)
2. $\mu$ Unknown
But we usually don't have $\mu$, so we have to use the sample mean $\bar{X}$ instead. What's the difference? Why don't we just say that

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n}$$

It turns out that we can show that $\bar{X}$ minimizes the expression $\sum (X_i - a)^2$ over all choices of the value a.
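The minimizing property can be shown in a few lines. Here a stands for any candidate value (a symbol introduced for this sketch); the cross term vanishes because $\sum (X_i - \bar{X}) = 0$.

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - a)^2
  &= \sum_{i=1}^{n} \bigl[(X_i - \bar{X}) + (\bar{X} - a)\bigr]^2 \\
  &= \sum_{i=1}^{n} (X_i - \bar{X})^2
     + 2(\bar{X} - a)\sum_{i=1}^{n} (X_i - \bar{X})
     + n(\bar{X} - a)^2 \\
  &= \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - a)^2
  \;\ge\; \sum_{i=1}^{n} (X_i - \bar{X})^2
\end{align*}
```

Equality holds only when $a = \bar{X}$, so any other value, including $\mu$, gives a sum of squares at least as large.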
Point Est. Properties

(cont.)
2. $\mu$ Unknown (cont.)
So if we used $\mu$ instead, then the expression would be bigger. Put differently, the sum of squares around $\bar{X}$ is too small, and dividing it by n tends to underestimate $\sigma^2$.
The right way to correct for this is to multiply by $\frac{n}{n-1}$, so

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n} \cdot \frac{n}{n-1} = \frac{\sum (X_i - \bar{X})^2}{n-1}$$

The bottom line is that we use n-1 to make a consistent, unbiased estimate of the population variance.
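A simulation can show the effect of the correction. The sketch below assumes a population of the integers 1 through 20 (variance 33.25), an illustrative choice not taken from this part of the lecture, draws many samples of size 5 with replacement, and averages the two candidate estimates.

```python
import random
import statistics

rng = random.Random(3)                       # fixed seed for repeatability
population = list(range(1, 21))              # assumption: values 1..20
sigma2 = statistics.pvariance(population)    # true population variance

n, reps = 5, 20000                           # illustrative settings
div_n, div_n1 = [], []
for _ in range(reps):
    sample = [rng.choice(population) for _ in range(n)]
    xbar = sum(sample) / n
    ss = sum((x - xbar) ** 2 for x in sample)   # sum of squares around xbar
    div_n.append(ss / n)                        # divide by n
    div_n1.append(ss / (n - 1))                 # divide by n - 1

avg_n = statistics.mean(div_n)     # tends toward sigma2 * (n - 1) / n
avg_n1 = statistics.mean(div_n1)   # tends toward sigma2
print(sigma2, round(avg_n, 2), round(avg_n1, 2))
```

Under these assumptions the divide-by-n average should sit noticeably below the true variance, while the divide-by-(n-1) average should land close to it.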
IV. Review Homework
