Chapter 7: Sampling and Sampling distributions
Statistical inference means making decisions based on
data. We will study hypothesis testing, which is one of the
tools of statistical inference. To do hypothesis testing, you
need to know sampling distributions, which are the
distributions of statistics computed from random samples.
For example, suppose you want to estimate the mean of a distribution.
You estimate the mean from your sample. If you repeat this
with new samples, you get many different values of the sample mean.
The frequency distribution of these means
approximates a sampling distribution. The limiting form
of this distribution as the number of samples approaches
infinity is a probability distribution known as a sampling
distribution.
7.2 Sampling Distribution
Central limit theorem: The sampling distribution of
sample means approximates a normal distribution as
the sample size becomes large no matter what the
shape of the population distribution is.
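[Figure: samples X1, X2, ..., Xn drawn from a population with mean $\mu$ and variance $\sigma^2$; the sample means have $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.]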
As a rule of thumb, if the sample size is larger than about 30, the
distribution of the sample mean is approximately normal.
The mean of the sampling distribution of means is the population mean:
$\mu_{\bar{X}} = \mu$
The variance of the means is inversely proportional to the sample size:
$\sigma_{\bar{X}}^2 = \sigma^2 / n$ ($\sigma^2$ is the population variance) [7.2]
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$ (Standard error of the means) [7.3]
Standard error means the standard deviation of a sampling distribution.
The z-value for means is $z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$
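The 1/n factor in the variance of the means can be derived in a few lines. This sketch assumes the n observations are independent draws from the same population, which the slides leave implicit:

```latex
\begin{align*}
\operatorname{Var}(\bar{X})
  &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) \\
  &= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
     \quad\text{(independence)} \\
  &= \frac{1}{n^2}\, n\,\sigma^2 = \frac{\sigma^2}{n}
\end{align*}
```

Taking the square root gives the standard error $\sigma/\sqrt{n}$.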
Example
Assume that scores on a test are normally distributed with a mean of 800
and a standard deviation of 100.
a) What is the probability that one person selected at random
from the population will have a score at or above 850?
$z = (X - \mu)/\sigma = (850 - 800)/100 = 0.5$
$P(z > 0.5) = 0.3085$
b) What is the probability that a sample of 20 people selected
at random from the population will have a mean score above
850?
$z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) = (850 - 800)/(100/\sqrt{20}) = 2.24$
$P(z > 2.24) = 0.0125$
Part a requires a normal population to be valid; part b is approximately
valid even without a normal population because the central limit theorem
applies to the sample mean, although a sample size of 30 or
more would be better.
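The two parts of the example can be checked numerically. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist in place of a z table; the variable names are chosen here for illustration:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 800, 100        # population mean and s.d. from the example
cutoff = 850
std_normal = NormalDist()   # standard normal: mean 0, s.d. 1

# a) one person: z = (X - mu) / sigma
z_one = (cutoff - mu) / sigma
p_one = 1 - std_normal.cdf(z_one)

# b) mean of n = 20 people: z = (X-bar - mu) / (sigma / sqrt(n))
n = 20
z_mean = (cutoff - mu) / (sigma / sqrt(n))
p_mean = 1 - std_normal.cdf(z_mean)

print(f"a) z = {z_one:.2f}, P = {p_one:.4f}")
print(f"b) z = {z_mean:.2f}, P = {p_mean:.4f}")
```

The probability printed for part b comes out slightly larger than 0.0125 because the code does not round z to 2.24 before evaluating the normal distribution.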
Learning Activity 7.2-2 Sampling distribution calculations
Assume a machine produces parts with a mean diameter of
60.2 and a s.d. of 2.4.
What is the probability that a randomly selected part will have
a diameter greater than 62?
What is the probability that a sample of 17 parts will have a
mean diameter greater than 62?
See SamplDist.xls!Solution
Proportions (percentages)
A proportion is a special case of a mean where the data are
0 and 1. The central limit theorem applies to proportions.
(See 7.C-2 Proportion as a mean.)
The mean of the proportion is $\pi$ (the population proportion):
$\mu_p = \pi$
Standard error of the proportion (s.d. of the proportion):
$\sigma_p = \sqrt{\pi(1 - \pi)/n}$ [7.5]
where p represents a sample proportion, and $\pi$ a population
proportion.
The mean of the proportion is $\pi$ and its variance is $\pi(1 - \pi)/n$.
Let X be a random variable representing the number of 1s in
a sample of n 0s and 1s drawn from a population with proportion $\pi$. Then
X has a binomial distribution with $E(X) = n\pi$ and $V(X) = n\pi(1 - \pi)$.
Now what are the mean and variance of the sample
proportion p = X/n, which is itself a mean?
The central limit theorem applies.
That is, p is approximately normally distributed if n is large, and
$E(p) = E(X/n) = n\pi/n = \pi$,
$V(p) = (1/n)^2 V(X) = (1/n^2)\, n\pi(1 - \pi) = \pi(1 - \pi)/n$.
The s.d. of p (or standard error) is
$\sqrt{\pi(1 - \pi)/n}$
In practice, the population proportion $\pi$ is often unknown,
so we use the sample proportion p in its place in the above equation.
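The mean and standard error of a sample proportion can also be checked by simulation. This is a rough sketch only: the population proportion 0.3, the sample size 50, the number of samples, and the helper name sample_proportion are all assumptions made for illustration, not values from the course files.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable
pi_pop = 0.3             # hypothetical population proportion
n = 50                   # hypothetical sample size
num_samples = 5000       # how many samples to draw

def sample_proportion(n, pi_pop, rng):
    """Draw n 0/1 values with P(1) = pi_pop and return p = X/n."""
    x = sum(1 for _ in range(n) if rng.random() < pi_pop)
    return x / n

props = [sample_proportion(n, pi_pop, rng) for _ in range(num_samples)]

# Theoretical standard error: sqrt(pi * (1 - pi) / n)
theory_se = sqrt(pi_pop * (1 - pi_pop) / n)
print(f"mean of p: {mean(props):.4f} (theory {pi_pop})")
print(f"s.d. of p: {pstdev(props):.4f} (theory {theory_se:.4f})")
```

With this many samples, the simulated mean and s.d. should land close to the theoretical values, though they will not match exactly.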
Target                Sample size   Estimator $\hat{\theta}$        Mean $E(\hat{\theta})$   Standard error $\sigma_{\hat{\theta}}$
$\mu$ (mean)          n             $\bar{X}$ (sample mean)         $\mu$                    $\sigma/\sqrt{n}$
$\pi$ (proportion)    n             p = X/n (sample proportion)     $\pi$                    $\sqrt{\pi(1 - \pi)/n}$
$\mu$: population mean, $\sigma^2$: population variance
$\pi$: population proportion
7.A Generating Random Numbers
Selecting random samples requires generating
random numbers. Random numbers are also used for
simulations and can be used to create sample datasets.
In Excel you can generate random numbers as follows:
Random numbers                Excel function
Between 0 and 1               =RAND() [note: no 1s]
Between 0 and 100             =RAND()*100 [note: no 100s]
Integers between 0 and 99     =INT(RAND()*100)
Integers between 1 and 100    =INT(RAND()*100) + 1
Between a and b               =RAND()*(b - a) + a
You can also use MegaStat | Generate Random Numbers to
generate numbers with uniform, normal or exponential
distributions.
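For readers working outside Excel, the same recipes can be sketched in Python. This assumes the standard-library random module as a stand-in for RAND(); the helper names rand and rand_between and the values a = 5, b = 10 are made up for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def rand():
    """Counterpart of Excel's =RAND(): a float in [0, 1), never 1."""
    return rng.random()

def rand_between(a, b):
    """Counterpart of =RAND()*(b - a) + a."""
    return rand() * (b - a) + a

between_0_and_1 = rand()
between_0_and_100 = rand() * 100
int_0_to_99 = int(rand() * 100)         # int() truncates, like INT() here
int_1_to_100 = int(rand() * 100) + 1
between_a_and_b = rand_between(5, 10)   # a = 5, b = 10 for illustration

print(between_0_and_1, between_0_and_100)
print(int_0_to_99, int_1_to_100, between_a_and_b)
```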
Learning Activity 7.A-1 Generating Random Numbers
- Open RandomNumbers.xls!Start.
  Create a sample of 20 random numbers between 1 and 100
  by using the RAND() function.
- Use the ROUND function to round the values to 3 decimal places.
  See RandomNumbers.xls!RandInt.
- Use MegaStat | Generate Random Numbers to generate
  300 normal random numbers with a mean of 100 and an s.d. of 16,
  specifying 0 decimal places and live function.
- Look at how the normal random numbers are generated by
  =ROUND(NORMINV(RAND(),100,16),0).
  [The values from RAND() are random probabilities that are
  input into the NORMINV() function to create normally distributed
  random numbers.]
  Note: NORMINV(probability, mean, standard_dev)
- Check the histogram of the random numbers you created.
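Feeding a uniform random probability into an inverse normal function, as the NORMINV(RAND(), ...) formula does, is known as inverse transform sampling. Here is a minimal Python sketch of the same idea. It assumes statistics.NormalDist.inv_cdf as a counterpart of NORMINV, and the function name normal_random is made up for illustration.

```python
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable

def normal_random(mu, sigma, rng):
    """Sketch of =ROUND(NORMINV(RAND(), mu, sigma), 0)."""
    u = rng.random()
    while u == 0.0:          # inv_cdf needs a probability strictly above 0
        u = rng.random()
    # Map the uniform probability to a normal value, then round to 0 places.
    return round(NormalDist(mu, sigma).inv_cdf(u))

values = [normal_random(100, 16, rng) for _ in range(300)]
print(f"mean = {mean(values):.1f}, s.d. = {stdev(values):.1f}")
```

The printed mean and s.d. should be near 100 and 16 but will vary with the seed.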
Randomizing data (to select random samples)
You can rearrange (or shuffle) a column of existing values
randomly.
Learning Activity 7.A-2 Randomizing Data (take a random sample
of n values from a pool of data)
- Open RandomNumbers.xls!PriceData.
- Type =RAND() in cell C2 and copy it down through C125.
- Click anywhere in the random number range and then
  click Excel’s Sort Ascending or Sort Descending.
- Take the first n values; they are your random
  sample.
- If you want to put the numbers back in their original order,
  sort by the No. column.
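The sort-by-random-key procedure above can be sketched in Python as follows. The (No., price) rows here are invented stand-ins for the PriceData sheet, whose actual contents are not shown in these notes.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical stand-in for the price column: (No., price) pairs.
data = [(i + 1, 100 + 5 * i) for i in range(12)]

# Attach a random key to each row, like typing =RAND() beside it.
keyed = [(rng.random(), row) for row in data]

# Sorting by the random key shuffles the rows.
keyed.sort(key=lambda pair: pair[0])
shuffled = [row for _, row in keyed]

# The first n rows form the random sample.
n = 4
sample = shuffled[:n]

# Sorting on the No. column restores the original order.
restored = sorted(shuffled, key=lambda row: row[0])
print(sample)
```

Python's random.sample(data, n) draws such a sample in one call; the longer version is shown only because it mirrors the spreadsheet steps.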
7.B Central Limit Theorem Simulation
- Open CLT.xls.
  The Excel workbook contains 600 random samples
  (=RAND()*100,1), i.e. from a uniform population, and
  summarizes the distribution of the population and the
  distribution of the means, that is, the sampling distribution.
- See the distribution of the means.
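A comparable simulation can be sketched in Python. The slide does not state how many values go into each sample, so the sample size of 30 below is an assumption, as is the text histogram used in place of the workbook's charts.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(2024)  # fixed seed so the run is repeatable
num_samples = 600          # matches the count quoted on the slide
n = 30                     # hypothetical size of each sample

# Each sample holds n values from a uniform population on [0, 100).
sample_means = [
    mean(rng.random() * 100 for _ in range(n))
    for _ in range(num_samples)
]

pop_mean = 50.0                # mean of a uniform(0, 100) population
pop_sd = 100 / sqrt(12)        # s.d. of a uniform(0, 100) population
theory_se = pop_sd / sqrt(n)   # standard error of the mean

print(f"mean of sample means: {mean(sample_means):.2f} (theory {pop_mean})")
print(f"s.d. of sample means: {pstdev(sample_means):.2f} (theory {theory_se:.2f})")

# Crude text histogram of the means, in bins 5 units wide.
bins = {}
for m in sample_means:
    lo = int(m // 5) * 5
    bins[lo] = bins.get(lo, 0) + 1
for lo in sorted(bins):
    print(f"{lo:3d}-{lo + 5:<3d} {'*' * (bins[lo] // 5)}")
```

Although the population is flat, the histogram of the means should look roughly bell-shaped and centered near 50.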
7.C-2 Proportion as a mean
A professor asks each of his students whether he or she owns
or rents a home and codes the data 1 = own, 0 = rent.
The proportion of his students who own their home
is p = count/n = 11/25 = 0.44. You will get the same
answer if you calculate the mean of the data.
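A short Python check of this claim follows. Only the totals (11 owners out of 25 students) come from the example; the order of the 0s and 1s in the list is made up.

```python
from statistics import mean

# Hypothetical coding of the 25 answers: 1 = own, 0 = rent.
answers = [1] * 11 + [0] * 14

count = sum(answers)   # number of 1s (owners)
n = len(answers)       # number of students
p = count / n          # proportion computed as count/n

# The mean of the 0/1 data gives the same number as the proportion.
print(p, mean(answers))
```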
- Open Proportions.xls!Start.
- Check Solution1.
- Open Proportions.xls!Practice.
- Check Solution2.