Chapter 8 Goals
1. Explain why a sample is the only feasible way to learn about a population
2. Describe methods to select a sample:
   • Simple Random Sampling
   • Systematic Random Sampling
   • Stratified Random Sampling
3. Sampling Error
4. Sampling Distribution Of The Sample Mean
5. Central Limit Theorem
So Far, & The Future…
Chapters 2-4
• Descriptive statistics about something that has already happened:
  • Frequency distributions, charts, measures of central tendency, dispersion
Chapters 5, 6, 7
• Probability:
  • Probability Rules
  • Probability Distributions
    • Probability distributions encompass all possible outcomes of an experiment and the probability associated with each outcome
    • We use probability distributions to evaluate something that might occur in the future
    • Discrete Probability Distributions:
      • Binomial
    • Continuous Probability Distributions:
      • Standard Normal
So Far, & The Future…
Chapter 8
• Inferential statistics: determine something about a population based only on the sample
• Sampling
  • A tool used to infer something about the population
  • Talk about 3 probability sampling methods
• Construct a Distribution Of The Sample Mean
  • Sample means tend to cluster around the population mean
• Central Limit Theorem
  • Shape of the Distribution Of The Sample Mean tends to follow the normal probability distribution
A Sample Is The Only Feasible Way To Learn About A Population
1. The physical impossibility of checking all items in the population
   • Example:
     • Can’t count all the fish in the ocean
2. The cost of studying all the items in a population
   • Example:
     • General Mills hires a firm to test a new cereal:
       • Sample test: cost ≈ $40,000
       • Population test: cost ≈ $1,000,000,000
3. Contacting the whole population would often be time-consuming
   • Political polls can be completed in one or two days
   • Polling all the USA voters would take nearly 200 years!
4. The destructive nature of certain tests
   • Examples:
     • Film from Kodak
     • Seeds from Burpee
5. The sample results are usually adequate
   • Testing the whole population would be unlikely to improve significantly on the accuracy of the sample results
   • Example:
     • A consumer price index constructed from a sample is an excellent estimate of the consumer price index that could be constructed from the population
Probability Sampling
• A sample selected in such a way that each item or person in the population has a known (nonzero) likelihood of being included in the sample
• Known chance of being selected
Probability Sampling
• Some of the methods used to select a sample:
  1. Simple Random Sampling
  2. Systematic Random Sampling
  3. Stratified Random Sampling
• There is no “best” method of selecting a probability sample from a population of interest
• There are entire books devoted to sampling theory and design
Nonprobability Sample
• In nonprobability sampling, inclusion in the sample is based on the judgment of the person selecting the sample
• Nonprobability sampling can lead to biased results
Simple Random Sampling
• A sample selected so that each item or person in the population has the same chance of being included
• Example:
  1. Names of classmates in a hat, mix up names, select until the sample size “n” is reached
  2. Using a table of random numbers to select a sample from a population
     1. Appendix
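The "names in a hat" idea can be sketched in a few lines of Python. This is only an illustration: the roster `classmates`, the helper name `simple_random_sample`, and the seed are assumptions made for the example, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every item has the same chance of being included."""
    rng = random.Random(seed)          # a seed only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical class roster standing in for the names in the hat.
classmates = [f"Student {i}" for i in range(1, 31)]

hat_sample = simple_random_sample(classmates, n=5, seed=42)
print(hat_sample)
```

Drawing without replacement mirrors pulling names out of the hat one at a time until the sample size n is reached.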
Using A Table Of Random Numbers To Prevent Bias In Selecting A Sample To Represent A Population:
• Example:
  • Here at Highline, select 50 students at random to fill out a questionnaire about tenured faculty performance
• Steps:
  1. Use the last four numbers of the student ID
  2. Select a random method to select the starting point in the random number table
     • Close eyes and point
     • Month/day
  3. Use the first four numbers in the table and match to the last four in the student ID
     • If the first four numbers in the table do not match, move to the next
• This will give us a list of students that will constitute a sample with size 50
Select 50 Students At Random
0.636222  0.878903  0.29908   0.054894  0.296277  0.077443  0.969701  0.361119  0.156304  0.888244
0.518021  0.086118  0.492953  0.896894  0.269441  0.609022  0.554378  0.477563  0.307487  0.274336
0.069223  0.051821  0.571346  0.399572  0.059926  0.825418  0.556095  0.895135  0.646497  0.488557
0.686897  0.058912  0.893545  0.492101  0.420565  0.54019   0.37986   0.267845  0.74116   0.526314
0.701841  0.441475  0.100799  0.659015  0.434216  0.785874  0.508761  0.337937  0.815675  0.793805
0.370206  0.442265  0.883327  0.761954  0.884567  0.62314   0.554989  0.980878  0.227214  0.450962
0.169599  0.943098  0.440034  0.495535  0.142257  0.139922  0.928234  0.249725  0.519689  0.789146
0.87561   0.257012  0.655714  0.396071  0.426556  0.518622  0.679833  0.849866  0.200084  0.107449
0.330619  0.726719  0.140657  0.552255  0.99422   0.673014  0.972124  0.707176  0.934343  0.493393
0.001197  0.773857  0.120766  0.054817  0.45625   0.545753  0.872873  0.611231  0.903448  0.641384
0.412333  0.946429  0.668976  0.269024  0.959042  0.84274   0.256318  0.003595  0.043683  0.687807
0.443448  0.195945  0.327642  0.864876  0.723678  0.990872  0.129111  0.122207  0.763613  0.46489
0.513373  0.821483  0.245097  0.627691  0.477342  0.961672  0.970896  0.831304  0.200614  0.944349
0.593301  0.154681  0.904013  0.787415  0.10701   0.878112  0.990803  0.371747  0.717331  0.857068
0.594944  0.56642   0.601252  0.408146  0.536719  0.015435  0.286428  0.113549  0.624948  0.830702
0.259376  0.707408  0.363145  0.811295  0.686028  0.230584  0.179304  0.187189  0.136974  0.586457
0.925766  0.928971  0.224099  0.862589  0.337782  0.677981  0.127947  0.202852  0.551529  0.818226
0.973137  0.020498  0.965596  0.726818  0.305371  0.070409  0.639576  0.972905  0.644085  0.367109
Student Id #: 5180, 8611, 4929...
• If a four-digit number from the table has no corresponding student ID, skip it and move to the next number
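The matching procedure above can be sketched as follows. Everything concrete here is an assumption for illustration: the set `valid_suffixes` stands in for the last four digits of real student IDs, the table is generated with Python's `random` module rather than read from the table shown, and reading "the first four numbers" as the first four digits after the decimal point is one possible interpretation.

```python
import random

def first_four_digits(value):
    """First four digits after the decimal point, e.g. 0.518021 -> '5180'."""
    return f"{int(value * 10000):04d}"

def select_by_random_table(valid_suffixes, table, sample_size):
    """Walk the table in order, keeping codes that match a student ID suffix."""
    chosen = []
    for value in table:
        code = first_four_digits(value)
        if code in valid_suffixes and code not in chosen:
            chosen.append(code)        # a match: this student joins the sample
        # otherwise there is no such student (or a repeat), so skip the entry
        if len(chosen) == sample_size:
            break
    return chosen

rng = random.Random(8)
# Hypothetical enrollment: 3,000 distinct four-digit ID suffixes.
valid_suffixes = set(rng.sample([f"{i:04d}" for i in range(10000)], 3000))
# Hypothetical random number table with 1,000 entries.
table = [rng.random() for _ in range(1000)]

students = select_by_random_table(valid_suffixes, table, sample_size=50)
print(len(students), students[:5])
```

Skipping unmatched entries does not bias the result, because every valid suffix is still equally likely to appear at each position in the table.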
Systematic Random Sampling:
• The items or individuals of the population are arranged in some order
  • Invoice number
  • Date
  • Alphabetically
  • Social security number
• A random starting point is selected and then every kth member of the population is selected for the sample
  • By starting randomly, all items have the same likelihood of being selected for the sample
• Example: Audit invoices for accuracy: start with the 43rd invoice, then select every 20th invoice and check it for accuracy
• This method should not be used if there is a pattern to the population, or else you could get a biased sample
  • Example: the inventory count problem
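A minimal sketch of systematic sampling, under the common assumption that the random starting point is drawn from the first k items. The invoice list and the function name `systematic_sample` are hypothetical.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then take every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)     # random starting index in 0..k-1
    return items[start::k]       # every kth item from the starting point

# Hypothetical population: invoices numbered 1..2000, already in order.
invoices = list(range(1, 2001))

audit = systematic_sample(invoices, k=20, seed=1)
print(audit[:5], len(audit))
```

With 2,000 invoices and k = 20, any starting point gives a sample of 100 invoices spaced exactly 20 apart.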
Under Certain Conditions A Systematic Sample May Produce Biased Results
• Inventory Count Problem:
  • Stacked bins with faster moving parts at the bottom
  • Start with the 1st bin and count inventory for accuracy in every 3rd bin (may result in a biased sample)
  • Simple random sampling would be better for this situation

Bin layout (top row = slow-moving, bottom row = fast-moving):
Slow     6    7   18   19
Mod      5    8   17   20
Mod      4    9   16   21
Mod      3   10   15   22
Mod      2   11   14   23
Fast     1   12   13   24

Sample size = 8 (bins 1, 4, 7, 10, 13, 16, 19, 22)

Stratum                  Bins in population   % of total        # selected   % of sample
Slow-moving              4                    16.67% (4/24)     2            25.00% (2/8)
Moderately fast-moving   16                   66.67% (16/24)    4            50.00% (4/8)
Fast-moving              4                    16.67% (4/24)     2            25.00% (2/8)
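The bias in the inventory count problem can be checked with a short script. It assumes the bin layout shown on the slide (top row 6, 7, 18, 19 slow-moving; bottom row 1, 12, 13, 24 fast-moving; everything else moderate); the names `SLOW`, `FAST`, and `stratum` are made up for the sketch.

```python
from collections import Counter

# Assumed layout: top row of the stack is slow-moving, bottom row fast-moving.
SLOW = {6, 7, 18, 19}
FAST = {1, 12, 13, 24}

def stratum(bin_number):
    """Classify a bin by how fast its parts move."""
    if bin_number in SLOW:
        return "slow"
    if bin_number in FAST:
        return "fast"
    return "moderate"

all_bins = range(1, 25)
selected = list(range(1, 25, 3))     # start with bin 1, then every 3rd bin

population_counts = Counter(stratum(b) for b in all_bins)
sample_counts = Counter(stratum(b) for b in selected)

for name in ("slow", "moderate", "fast"):
    pop_share = population_counts[name] / len(all_bins)
    sample_share = sample_counts[name] / len(selected)
    print(f"{name:9s} population {pop_share:.2%}  sample {sample_share:.2%}")
```

Slow and fast bins each make up about 16.67% of the population but 25% of this sample, while moderate bins drop from 66.67% to 50%.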
Stratified Random Sampling
• A population is first divided into subgroups, called strata, and a sample is selected from each stratum
• Advantage of stratified random sampling:
  • Guarantees representation from each subgroup
Proportional Sample Is Selected

Bin layout (top row = slow-moving, bottom row = fast-moving):
Slow     6    7   18   19
Mod      5    8   17   20
Mod      4    9   16   21
Mod      3   10   15   22
Mod      2   11   14   23
Fast     1   12   13   24

Sample Size, n = 8

Stratum                  # of bins   % of total        Bins selected from stratum
Slow-moving              4           16.67% (4/24)     0.1667*8 = 1.3336 ► 1
Moderately fast-moving   16          66.67% (16/24)    0.6667*8 = 5.3336 ► 6
Fast-moving              4           16.67% (4/24)     0.1667*8 = 1.3336 ► 1
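The proportional allocation can be sketched as below. The exact shares come straight from the slide; the rounding rule is an assumption, since the slide does not say how 1.33, 5.33, and 1.33 become 1, 6, and 1. Here each share is rounded down and any leftover unit goes to the largest stratum, which reproduces those numbers.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a sample of size n across strata in proportion to stratum size."""
    total = sum(stratum_sizes.values())
    exact = {name: size * n / total for name, size in stratum_sizes.items()}
    alloc = {name: int(share) for name, share in exact.items()}   # round down
    leftover = n - sum(alloc.values())
    # Assumed tie-break: give leftover units to the largest strata first.
    largest_first = sorted(stratum_sizes, key=stratum_sizes.get, reverse=True)
    for name in largest_first[:leftover]:
        alloc[name] += 1
    return exact, alloc

bins = {"slow": 4, "moderate": 16, "fast": 4}
exact, alloc = proportional_allocation(bins, n=8)
print(exact)
print(alloc)
```

Other rounding rules are possible; whichever is used, the allocations must add up to the sample size n.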
Cluster Sampling
• First:
  • A population is divided into primary units
• Second:
  • Primary units are selected at random (not all primary units will be selected)
• Third:
  • Samples are selected from the chosen primary units
• Employed to reduce the cost of sampling a population scattered over a large geographic area
• Textbook shows geographic picture
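The three steps can be sketched like this. The counties and households are invented purely for illustration, as are the function name `cluster_sample` and its parameters.

```python
import random

def cluster_sample(primary_units, n_units, n_per_unit, seed=None):
    """Randomly choose some primary units, then sample items inside each one."""
    rng = random.Random(seed)
    chosen_units = rng.sample(sorted(primary_units), n_units)       # step two
    return {unit: rng.sample(primary_units[unit], n_per_unit)       # step three
            for unit in chosen_units}

# Step one (hypothetical): the population is divided into 12 counties,
# each with 40 households.
counties = {
    f"County {c}": [f"C{c}-H{h}" for h in range(1, 41)]
    for c in range(1, 13)
}

clusters = cluster_sample(counties, n_units=4, n_per_unit=5, seed=3)
for unit, households in clusters.items():
    print(unit, households)
```

Only 4 of the 12 counties need to be visited, which is where the cost saving comes from.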
Sampling Error
Will the mean of a sample always be equal to the population mean?
• No!
• There will usually be some error:
  • The difference between a sample statistic and its corresponding population parameter
• Examples:
  • Xbar - μ
  • s - σ
Sampling Error
1. These sampling errors are due to chance
2. The size of the error will vary from one sample to the next
• So how can we make accurate predictions based on samples?
• Answer: Sampling Distribution Of The Sample Mean and The Central Limit Theorem
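A small simulation shows the sampling errors Xbar - μ and s - σ changing from sample to sample. The population here is hypothetical (5,000 values drawn uniformly between 10 and 20) and the sample size of 36 is an arbitrary choice for the sketch.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population with known parameters mu and sigma.
population = [rng.uniform(10, 20) for _ in range(5000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)     # population standard deviation

errors = []
for i in range(5):
    sample = rng.sample(population, 36)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation
    errors.append((xbar - mu, s - sigma))
    print(f"sample {i + 1}: Xbar - mu = {xbar - mu:+.3f}, s - sigma = {s - sigma:+.3f}")
```

Each run of the loop gives a different error, sometimes positive and sometimes negative, purely by chance.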
Sampling Distribution Of The Sample Mean
• A probability distribution of all possible sample means of a given sample size
• Take a bunch of samples from the same population
• Calculate the mean for each and plot all the means
Construct Sampling Distribution Of The Sample Mean
[Figure: Sample 1, Sample 2, Sample 3, ..., each of size n = 36, give sample means Xbar 1, Xbar 2, Xbar 3, ...; plot all Xbar]
Construct Sampling Distribution Of Sample Mean
1. Take many random samples of size “n” from a large population
2. Calculate the mean for each sample
3. Plot all means on a graph (frequency polygon)
4. You would see that the curve looks approximately normal!
• Textbook has a good example
• In particular:
  • It shows how even if the population has a skewed probability distribution, the distribution of sample means will be approximately normal
  • Population mean = mean of the distribution of the sample mean
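The construction steps can be tried out with a simulation. This is a sketch under assumptions: the skewed population is generated from an exponential distribution, and the choices of 2,000 samples of size 36 and the helper `skewness` are illustrative only.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical right-skewed population (exponential with mean near 1).
population = [rng.expovariate(1.0) for _ in range(20000)]
mu = statistics.mean(population)

# Steps 1 and 2: take many random samples of size n and record each mean.
n = 36
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(2000)]

def skewness(data):
    """Average cubed z-score: near 0 for symmetric data, positive if right-skewed."""
    m = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

print("population mean:", round(mu, 3))
print("mean of sample means:", round(statistics.mean(sample_means), 3))
print("population skewness:", round(skewness(population), 2))
print("skewness of sample means:", round(skewness(sample_means), 2))
```

The sample means center on the population mean and are far less skewed than the population, which is the pattern a plotted frequency polygon would show.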
Plot Distribution Of The Sample Mean (Approximately Normal)
[Figure: Sampling Distribution Of The Sample Mean, an approximately normal curve; horizontal axes: z from -3 to 3, and Sample Mean Xbar]
• $\mu_{\bar{X}}$ = Mean of the Distribution of the Sample Means
• $\sigma_{\bar{X}}$ = SD of the Distribution of the Sample Means
• In class: construct the Distribution of Sample Means and prove that $\mu = \mu_{\bar{X}}$
Central Limit Theorem
• If all samples of a particular size are selected from any population, the sampling distribution of the sample mean is approximately a normal distribution. This approximation improves with larger samples
  • If the population distribution is symmetrical but not normal, the distribution of the sample mean will converge toward normal when n > 10
  • For skewed or thick-tailed population distributions, the distribution of the sample mean converges toward normal when n > 30
  • Look at picture on page 265
Central Limit Theorem
• We can reason about the distribution of the sample mean with no information about the shape of the original distribution from which the sample is taken
  • The central limit theorem holds for any population distribution with a finite mean and standard deviation
• Central Limit Theorem will help us with:
  • Chapter 9
    • Confidence intervals
  • Chapter 10
    • Tests of Hypothesis
Mean Of The Distribution Of The Sample Mean
\[ \mu_{\bar{X}} = \frac{\text{Sum of all sample means}}{\text{Total number of samples}} \]
• If we are able to select all possible samples of a particular size from a given population, then the mean of the distribution of the sample mean will exactly equal the population mean:
  \[ \mu_{\bar{X}} = \mu \]
• Even if we do not select all possible samples, they will be approximately equal:
  \[ \mu_{\bar{X}} \approx \mu \]
• $\mu_{\bar{X}}$ = Mean of the Distribution of the Sample Means
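The "all possible samples" claim can be verified by brute force on a tiny population. The five values below are hypothetical, chosen only so the arithmetic is easy to follow by hand.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five values.
population = [2, 4, 6, 8, 10]
mu = mean(population)

# Every possible sample of size 2 (without replacement) and its mean.
all_samples = list(combinations(population, 2))
all_means = [mean(sample) for sample in all_samples]

# Sum of all sample means divided by the total number of samples.
mu_xbar = sum(all_means) / len(all_means)

print("number of possible samples:", len(all_samples))
print("population mean:", mu)
print("mean of all sample means:", mu_xbar)
```

There are 10 possible samples, their means add up to 60, and 60 / 10 = 6 matches the population mean exactly.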
Standard Deviation Of The Sampling Distribution Of The Sample Mean (Standard Error Of The Mean)
• There is less dispersion in the sampling distribution of the sample mean than in the population (each value is an average!!)
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
• $\sigma_{\bar{X}}$ = SD of the Sampling Distribution of the Sample Mean
• σ = population standard deviation
• n = sample size
• When we increase “n”, the standard deviation of the sample mean (the standard error) will decrease
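A quick calculation shows how the standard error shrinks as n grows. The value σ = 0.4 and the sample sizes are picked for illustration; the function name `standard_error` is an assumption of the sketch.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma divided by sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 0.4    # illustrative population standard deviation
for n in (4, 16, 64):
    print(f"n = {n:2d}  standard error = {standard_error(sigma, n):.3f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.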
Central Limit Theorem
• Use the Central Limit Theorem to find probabilities of selecting possible sample means from a specified population
  • If the population is known to follow a normal distribution, or n > 30…
  • We need our z-scores…
Z-Scores
• To determine the probability a sample mean falls within a particular region, use:
\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
• Numerator: sampling error
• Denominator: standard error of the sampling distribution of the sample mean
• We are interested in the distribution of Xbar, the sample mean, instead of X
Business Decisions Example 1
• History for a food manufacturer shows the weight of a box of Chocolate Covered Sugar Bombs (popular breakfast cereal) is:
  • μ = 14 oz.
  • σ = .4 oz.
• If the morning shift sample shows:
  • Xbar = 14.14 oz.
  • n = 30
• Is this sampling error reasonable, or do we need to shut down the filling operations?
Business Decisions Example 1
Sugar Bombs Info.
• μ = 14 oz.
• σ = 0.4 oz.
• Xbar = 14.14 oz.
• n = 30
\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{14.14 - 14}{0.4 / \sqrt{30}} = 1.91703 \]
• Table shows an area of .4726 (for z = 1.92)
• .5 - .4726 = .0274
• In the distribution of sample means, a sample mean of 14.14 oz. or more is unlikely (probability about .0274)
• It is unlikely that we could sample and get this weight by chance, so we must investigate the box filling equipment
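The Sugar Bombs calculation can be reproduced without a printed z-table by using `statistics.NormalDist` from the Python standard library. The numbers are the ones from the example; the variable names are just for this sketch. The software works with the unrounded z, so the tail area differs slightly from the table value of .0274.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 14.0, 0.4      # historical mean and standard deviation (oz.)
xbar, n = 14.14, 30        # morning shift sample mean and sample size

standard_error = sigma / sqrt(n)
z = (xbar - mu) / standard_error

# Probability of a sample mean at least this far above mu, by chance alone.
upper_tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.3f}")
print(f"P(Xbar >= {xbar}) = {upper_tail:.4f}")
```

A tail probability under 3% supports the conclusion that the filling equipment should be investigated.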
Suppose the mean selling price of a gallon of gasoline in the United States is $1.30 (μ). Further, assume the distribution is positively skewed, with a standard deviation of $0.28 (σ). What is the probability of selecting a sample of 35 gasoline stations (n = 35) and finding the sample mean within $.08 of the population mean?
Because n = 35 is greater than 30, the Central Limit Theorem lets us treat the distribution of the sample mean as approximately normal even though the population is skewed.
Step One: Find the z-values corresponding to $1.22 and $1.38. These are the two points within $0.08 of the population mean.
\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\$1.38 - \$1.30}{\$0.28 / \sqrt{35}} = 1.69 \]
\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\$1.22 - \$1.30}{\$0.28 / \sqrt{35}} = -1.69 \]
Step Two: Determine the probability of a z-value between -1.69 and 1.69.
\[ P(-1.69 \le z \le 1.69) = 2(.4545) = .9090 \]
We would expect about 91 percent of the sample means to be within $0.08 of the population mean.
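The gasoline example can be checked the same way with `statistics.NormalDist`, this time modeling the sample mean directly as a normal distribution with mean μ and standard deviation σ/√n. The inputs come from the example; the variable names are assumptions of the sketch.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1.30, 0.28, 35
standard_error = sigma / sqrt(n)

# Approximate distribution of the sample mean (Central Limit Theorem, n > 30).
xbar_dist = NormalDist(mu, standard_error)

z_upper = (1.38 - mu) / standard_error
z_lower = (1.22 - mu) / standard_error
prob_within = xbar_dist.cdf(1.38) - xbar_dist.cdf(1.22)

print(f"z-values: {z_lower:.2f} and {z_upper:.2f}")
print(f"P(1.22 <= Xbar <= 1.38) = {prob_within:.4f}")
```

The result agrees with the table-based answer of about 91 percent.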