School of Psychology
Dept. of Experimental Psychology
Design and Data Analysis in Psychology I
English group (A)
Salvador Chacón Moscoso
Susana Sanduvete Chaves
Milagrosa Sánchez Martín
Lesson 5
Sampling and sampling distribution
1. Introduction

Statistical inference comprises two categories:
- Estimation theory (lesson 6):
  - Given an index in the sample, the aim is to infer the value of the index in the population.
  - Two kinds of estimation:
    - Point estimation: it provides a single value.
    - Interval estimation: it provides a range of values.
- Decision theory (lesson 8):
  - Procedure for making decisions in the field of statistical inference.
1. Introduction
[Diagram: ESTIMATION THEORY, linking STATISTICS (sample) and PARAMETERS (population).]
2. Phases of the inferential process
1. Obtain a sample randomly.
2. Calculate the statistics (indexes in the sample): $\bar{X}$, $S$, $p$.
3. Construct a sampling distribution (of means or proportions; the possible results that can be found by taking different samples).
4. Choose a probability model (e.g., if we throw a die, there are six possible results, and they are equiprobable). The most used in psychology is the normal distribution.
5. Estimate the corresponding parameters (indexes in the population) based on the statistics.
3. Sampling error
How close the value of the statistic is to the value of the parameter depends on how representative the sample is. This depends, for example, on:
- The sample size.
- The similarity or difference between participants.
- The sampling procedure.

Nevertheless, there will always be some discrepancy between statistic and parameter. This is the sampling error.

Solution:
- The precise value of the sampling error is unknown.
- Using inference, we can state with a certain confidence that this error does not exceed a given limit.
3. Sampling error. Calculation:
Index                 Sample statistic (Latin letters)   Population parameter (Greek letters)
Mean                  $\bar{X}$                          $\mu$
Proportion            $p$                                $\pi$
Standard deviation    $S$                                $\sigma$
3. Sampling error. Calculation:
The sampling error ($e$) is the difference between a statistic and its corresponding parameter:
$$e = \bar{X} - \mu \qquad e = p - \pi$$
3. Sampling error

There are two main concepts related to the sampling error:
1. Accuracy: the precision with which a statistic represents the parameter.
2. Reliability: the constancy of a statistic when it is calculated for several samples of the same type and size.
3. Sampling error

Accuracy: example.

Which estimator is more accurate?
$\bar{X}_1 = 47$, $\mu = 50$, $\bar{X}_2 = 54$
3. Sampling error
$\bar{X}_1 = 47$: $|e_1| = |47 - 50| = 3$
$\bar{X}_2 = 54$: $|e_2| = |54 - 50| = 4$
$|e_1| < |e_2|$, so $\bar{X}_1$ is more accurate.
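The accuracy comparison can be sketched in a few lines of Python. This is only an illustration using the numbers from the example; it assumes the error is compared in absolute value, and the variable names are hypothetical.

```python
# Accuracy example: population mean and two sample means from the slide.
mu = 50
sample_means = {"X1": 47, "X2": 54}

# Size of the sampling error, taken here as |statistic - parameter|.
errors = {name: abs(mean - mu) for name, mean in sample_means.items()}

# The most accurate estimator is the one with the smallest error.
most_accurate = min(errors, key=errors.get)

print(errors)         # {'X1': 3, 'X2': 4}
print(most_accurate)  # X1
```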
3. Sampling error

Reliability: example.
Group 1: $\bar{X}_1 = 76$, $\bar{X}_2 = 78$, $\bar{X}_3 = 75$, $\bar{X}_4 = 77$
Group 2: $\bar{X}_1 = 20$, $\bar{X}_2 = 40$, $\bar{X}_3 = 60$, $\bar{X}_4 = 80$
Which group of means is more reliable?
3. Sampling error

Reliability: example.
- The first group of means is more reliable because the variation between them is lower.
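One way to put a number on "variation between means" is the standard deviation of each group of means. The sketch below assumes that measure (the slides do not name one) and uses the two groups from the example; `group_1` and `group_2` are hypothetical names.

```python
from statistics import pstdev

# The two groups of sample means from the reliability example.
group_1 = [76, 78, 75, 77]
group_2 = [20, 40, 60, 80]

# Standard deviation of each group of means: lower spread, higher reliability.
spread_1 = pstdev(group_1)   # about 1.12
spread_2 = pstdev(group_2)   # about 22.36

more_reliable = "group 1" if spread_1 < spread_2 else "group 2"
print(more_reliable)
```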
3. Sampling error

The lower the sampling error, the more probable it is that the estimator in a sample takes the same value as the parameter.
4. Sampling distribution

Definition: a sampling distribution is a theoretical probability distribution that establishes a functional relation between the possible values of a statistic, based on a sample of size n, and the probability associated with each of these values, for all the possible samples of size n extracted from a particular population.

The construction of a sampling distribution has three phases:
4. Sampling distribution
PHASE 1. Collect all the samples of the same size n, extracted randomly from the population under study.
[Diagram: Population → samples $S_1$, $S_2$, $S_3$, ..., $S_k$]
4. Sampling distribution
PHASE 2. Calculate the same estimator in each sample.
[Diagram: $S_1 \to \bar{X}_1$, $S_2 \to \bar{X}_2$, $S_3 \to \bar{X}_3$, ..., $S_n \to \bar{X}_n$]
We will find different values of the estimator (e.g., the mean) in the different samples.
4. Sampling distribution
PHASE 3. Group these measures in a new distribution.
[Diagram: $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$, ..., $\bar{X}_n$ → mean of means]
4. Sampling distribution
- In general, the sampling distribution will differ from the distribution of the population.
- The variance of the statistic measures the dispersion of the particular sample values around the expected value of the statistic, considering all the possible samples of size n.
- The standard deviation of the sampling distribution is called the standard error of the estimator.
- We are only going to study the sampling distribution of two statistics:
  4.1. The mean.
  4.2. The proportion.
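The three phases can be followed by hand on a very small population. The sketch below is an illustration only: it assumes a hypothetical toy population of four scores and samples of size 2 drawn with replacement, enumerates every possible sample, and checks that the mean of means equals the population mean and that the standard deviation of the means (the standard error) equals $\sigma / \sqrt{n}$.

```python
from itertools import product
from math import isclose, sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # hypothetical toy population
n = 2                       # sample size

# Phase 1: all possible ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))

# Phase 2: the same estimator (the mean) in each sample.
sample_means = [mean(s) for s in samples]

# Phase 3: group the means in a new distribution and describe it.
mean_of_means = mean(sample_means)
standard_error = pstdev(sample_means)

print(len(samples))                                   # 16
print(isclose(mean_of_means, mean(population)))       # True
print(isclose(standard_error, pstdev(population) / sqrt(n)))  # True
```

With real populations the samples cannot be enumerated, which is why the standard error is obtained from formulas rather than by listing every sample.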
4.1. Sampling distribution of the mean
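Mean or expected value:
$$\mu_{\bar{X}} = \mu$$
Standard error:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \ \text{(based on } \sigma\text{)} \qquad \sigma_{\bar{X}} = \frac{S}{\sqrt{n-1}} \ \text{(based on } S\text{)}$$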
4.1. Sampling distribution of the mean
[Figure: the distribution of the population compared with the sampling distribution of $\bar{X}$.]
4.1. Sampling distribution of the mean.
Characteristics
1. The statistics obtained in the samples are grouped around the parameter of the population.
2. The bigger n is, the closer the statistics are to the parameter.
3. In large samples, the graphic representation presents the following characteristics:
4.1. Sampling distribution of the mean.
Characteristics
a) It is symmetric. The central vertical axis is the parameter $\mu$.
b) The bigger n is, the narrower the bell-shaped curve is.
c) It takes the form of the normal curve.
4.1. Sampling distribution of the mean.
Characteristics
4. Its mean equals the true mean of the population: $\mu_{\bar{X}} = \mu$.
5. It is more or less variable. If its variability is small (i.e., it has a small $\sigma_{\bar{X}}$), the means differ little from each other, and the estimator is very reliable.
4.1. Sampling distribution of the mean.
Standardization
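Sample:
$$Z = \frac{X - \bar{X}}{S}$$
Population:
$$Z = \frac{X - \mu}{\sigma}$$
Sampling distribution:
$$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{\bar{X} - \mu}{S / \sqrt{n-1}}$$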
4.1. Sampling distribution of the mean.
Standardization

Standardization allows us to calculate probabilities (provided we know the probability model that the distribution follows). We can assume a normal distribution when n ≥ 30.
4.1. Sampling distribution of the mean
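Standard error of the mean ($\sigma_{\bar{X}}$):

Population         Based on $\sigma$                                   Based on $S$
$N = \infty$       $\frac{\sigma}{\sqrt{n}}$                           $\frac{S}{\sqrt{n-1}}$
$N \neq \infty$    $\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$    $S \sqrt{\frac{N-n}{N(n-1)}}$

The extra factor in the $N \neq \infty$ row is the finite-population correction.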
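The four cells of the table can be collected in one small function. This is a sketch under the formulas above, not part of the original slides; the function name `standard_error_mean` and its parameters are hypothetical.

```python
from math import sqrt

def standard_error_mean(sd, n, N=None, from_sample=False):
    """Standard error of the mean.

    sd          -- sigma (population) or S (sample), see from_sample
    n           -- sample size
    N           -- population size; None means an infinite population
    from_sample -- True when sd is the sample standard deviation S
    """
    if from_sample:
        se = sd / sqrt(n - 1)              # S / sqrt(n - 1)
        if N is not None:
            se *= sqrt((N - n) / N)        # gives S * sqrt((N - n) / (N (n - 1)))
    else:
        se = sd / sqrt(n)                  # sigma / sqrt(n)
        if N is not None:
            se *= sqrt((N - n) / (N - 1))  # finite-population correction
    return se

print(standard_error_mean(3, 225))            # sigma = 3, n = 225, infinite population
print(standard_error_mean(3, 225, N=1000))    # same, hypothetical population of 1000
```

The correction factor is always below 1, so the standard error for a finite population is smaller than for an infinite one.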
4.1. Sampling distribution of the mean.
Example 1
We applied a test to a population and obtained a mean ($\mu$) of 18 points and a standard deviation ($\sigma$) of 3 points. Assuming that the variable is normally distributed in the population:
a) Which raw scores delimit the central 95% of the participants of that population?
b) Which raw scores delimit the central 99% of the average scores in samples of 225 participants, obtained randomly?
4.1. Sampling distribution of the mean.
Example 1
a) Which raw scores delimit the central 95% of the participants of that population?
[Figure: normal curve with the central 95% marked (0.475 on each side of the mean), delimited by $Z_1 = -1.96$ and $Z_2 = 1.96$.]
4.1. Sampling distribution of the mean.
Example 1
$$Z = \frac{X_i - \mu}{\sigma}$$
$$-1.96 = \frac{X_1 - 18}{3} \Rightarrow -1.96 \cdot 3 = X_1 - 18 \Rightarrow -5.88 = X_1 - 18 \Rightarrow X_1 = 18 - 5.88 = 12.12$$
$$1.96 = \frac{X_2 - 18}{3} \Rightarrow 1.96 \cdot 3 = X_2 - 18 \Rightarrow 5.88 = X_2 - 18 \Rightarrow X_2 = 18 + 5.88 = 23.88$$
4.1. Sampling distribution of the mean.
Example 1
[Figure: normal curve with the central 95% delimited by $X_1 = 12.12$ and $X_2 = 23.88$.]
The raw scores that delimit the central 95% of the participants are 12.12 and 23.88.
4.1. Sampling distribution of the mean.
Example 1
b) Which raw scores delimit the central 99% of the average scores in samples of 225 participants, obtained randomly?
[Figure: normal curve with the central 99% marked (0.495 on each side of the mean), delimited by $Z_1 = -2.58$ and $Z_2 = 2.58$.]
4.1. Sampling distribution of the mean.
Example 1
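$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{225}} = \frac{3}{15} = 0.2$$
$$Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}}$$
$$-2.58 = \frac{\bar{X}_1 - 18}{0.2} \Rightarrow -2.58 \cdot 0.2 = \bar{X}_1 - 18 \Rightarrow -0.516 = \bar{X}_1 - 18 \Rightarrow \bar{X}_1 = 18 - 0.516 = 17.484$$
$$2.58 = \frac{\bar{X}_2 - 18}{0.2} \Rightarrow 2.58 \cdot 0.2 = \bar{X}_2 - 18 \Rightarrow 0.516 = \bar{X}_2 - 18 \Rightarrow \bar{X}_2 = 18 + 0.516 = 18.516$$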
4.1. Sampling distribution of the mean.
Example 1
[Figure: normal curve with the central 99% delimited by $\bar{X}_1 = 17.484$ and $\bar{X}_2 = 18.516$.]
17.484 and 18.516 delimit the central 99% of the average scores in samples of 225 participants.
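Both parts of Example 1 can be checked with Python's standard library, which provides the normal distribution as `statistics.NormalDist`. The sketch below is an illustration; the helper `central_interval` is a hypothetical name. It uses exact normal quantiles instead of the rounded table values 1.96 and 2.58, so the last decimals can differ slightly from the worked solution.

```python
from math import sqrt
from statistics import NormalDist

def central_interval(mu, sd, coverage):
    """Limits that delimit the central `coverage` of a normal distribution."""
    tail = (1 - coverage) / 2
    dist = NormalDist(mu, sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

mu, sigma, n = 18, 3, 225

# a) Raw scores of participants: use the population standard deviation.
scores_95 = central_interval(mu, sigma, 0.95)           # about (12.12, 23.88)

# b) Means of samples of size n: use the standard error sigma / sqrt(n).
means_99 = central_interval(mu, sigma / sqrt(n), 0.99)  # about (17.48, 18.52)

print(scores_95)
print(means_99)
```

The only difference between the two parts is the spread: $\sigma$ for individual scores and $\sigma_{\bar{X}}$ for sample means.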
4.1. Sampling distribution of the mean.
Example 2
Calculate the probability of extracting a sample of 81 participants with a mean equal to or lower than 42, from a population whose mean ($\mu$) is 40 and standard deviation ($\sigma$) is 9.
4.1. Sampling distribution of the mean.
Example 2
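$$Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}} = \frac{\bar{X}_i - \mu}{\sigma / \sqrt{n}} = \frac{42 - 40}{9 / \sqrt{81}} = \frac{2}{9/9} = \frac{2}{1} = 2$$
$Z = 2 \Rightarrow p = 0.4772$ (area between the mean and $Z = 2$)
[Figure: sampling distribution centred on $\mu = 40$; the area below $\bar{X} = 42$ is 0.5 plus the area between 40 and 42.]
$$P(\bar{X} \le 42) = P(Z \le 2) = 0.5 + 0.4772 = 0.9772$$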
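The same probability can be obtained directly from the cumulative normal distribution. This is a minimal sketch with the numbers of Example 2; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 40, 9, 81

standard_error = sigma / sqrt(n)         # 9 / 9 = 1.0
z = (42 - mu) / standard_error           # 2.0

# P(mean <= 42) = P(Z <= 2), the area below z in the standard normal curve.
probability = NormalDist().cdf(z)        # about 0.9772

print(z, round(probability, 4))
```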
4.1. Sampling distribution of the mean.
Example 3
In a sampling distribution of means with samples
of 49 participants, the means of the central 90%
of the samples are between 47 and 53 points.
Calculate:
a) The raw scores that delimit the central 95%
of the means.
b) The standard deviation of the population (σ).
c) The raw scores that delimit the central 95% of
the means, when the sample size is 81.
4.1. Sampling distribution of the mean.
Example 3
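a) The raw scores that delimit the central 95% of the means.
[Figure: normal curve with the central 90% marked (0.45 on each side of the mean), delimited by $\bar{X}_1 = 47$ and $\bar{X}_2 = 53$; $Z_2 = 1.64$.]
$$\mu = \frac{53 + 47}{2} = 50$$
$$Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}} \Rightarrow 1.64 = \frac{53 - 50}{\sigma_{\bar{X}}} \Rightarrow 1.64\,\sigma_{\bar{X}} = 3 \Rightarrow \sigma_{\bar{X}} = \frac{3}{1.64} = 1.829$$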
4.1. Sampling distribution of the mean.
Example 3
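[Figure: normal curve with the central 95% marked (0.475 on each side of the mean), delimited by $Z_1 = -1.96$ and $Z_2 = 1.96$.]
$$Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}}$$
$$-1.96 = \frac{\bar{X}_1 - 50}{1.829} \Rightarrow -1.96 \cdot 1.829 = \bar{X}_1 - 50 \Rightarrow -3.585 = \bar{X}_1 - 50 \Rightarrow \bar{X}_1 = 50 - 3.585 = 46.415$$
$$1.96 = \frac{\bar{X}_2 - 50}{1.829} \Rightarrow 1.96 \cdot 1.829 = \bar{X}_2 - 50 \Rightarrow 3.585 = \bar{X}_2 - 50 \Rightarrow \bar{X}_2 = 50 + 3.585 = 53.585$$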
4.1. Sampling distribution of the mean.
Example 3
b) The standard deviation of the population ($\sigma$).
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \Rightarrow 1.829 = \frac{\sigma}{\sqrt{49}} \Rightarrow \sigma = 1.829 \cdot 7 = 12.803$$
4.1. Sampling distribution of the mean.
Example 3
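c) The raw scores that delimit the central 95% of the means, when the sample size is 81.
[Figure: normal curve with the central 95% delimited by $Z = -1.96$ and $Z = 1.96$.]
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12.803}{\sqrt{81}} = \frac{12.803}{9} = 1.423$$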
4.1. Sampling distribution of the mean.
Example 3
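$$Z = \frac{\bar{X}_i - \mu}{\sigma_{\bar{X}}}$$
$$-1.96 = \frac{\bar{X}_1 - 50}{1.423} \Rightarrow -1.96 \cdot 1.423 = \bar{X}_1 - 50 \Rightarrow -2.789 = \bar{X}_1 - 50 \Rightarrow \bar{X}_1 = 50 - 2.789 = 47.211$$
$$1.96 = \frac{\bar{X}_2 - 50}{1.423} \Rightarrow 1.96 \cdot 1.423 = \bar{X}_2 - 50 \Rightarrow 2.789 = \bar{X}_2 - 50 \Rightarrow \bar{X}_2 = 50 + 2.789 = 52.789$$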
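The chain of steps in Example 3 (recover the standard error from a known interval, then $\sigma$, then a new interval for another sample size) can be sketched as follows. The variable names are hypothetical, and the code uses exact normal quantiles (about 1.645 instead of the table value 1.64), so its results differ from the worked solution in the second decimal.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
lower_90, upper_90, n = 47, 53, 49

# The distribution is symmetric, so its centre is the midpoint of the limits.
mu = (lower_90 + upper_90) / 2                   # 50.0

# a) Standard error from the 90% limits, then the 95% limits.
z_90 = std_normal.inv_cdf(0.95)                  # about 1.645
se_49 = (upper_90 - mu) / z_90                   # about 1.82
z_95 = std_normal.inv_cdf(0.975)                 # about 1.96
interval_49 = (mu - z_95 * se_49, mu + z_95 * se_49)

# b) Population standard deviation: sigma = standard error * sqrt(n).
sigma = se_49 * sqrt(n)                          # about 12.8

# c) 95% limits when the sample size is 81.
se_81 = sigma / sqrt(81)
interval_81 = (mu - z_95 * se_81, mu + z_95 * se_81)

print(interval_49, sigma, interval_81)
```

The interval for n = 81 is narrower than the one for n = 49 because the standard error shrinks as the sample size grows.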
4.2. Sampling distribution of proportions

- $p = x/n$, where $x$ is the number of participants that present a characteristic and $n$ is the sample size.
- We can assume a normal distribution when $n\pi \ge 5$ and $n(1 - \pi) \ge 5$.
4.2. Sampling distribution of proportions
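Mean or expected value:
$$\mu_p = \pi$$
Standard error:
$$\sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} \ \text{(based on } \pi\text{)} \qquad \sigma_p = \sqrt{\frac{p(1 - p)}{n}} \ \text{(based on } p\text{)}$$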
4.2. Sampling distribution of proportions.
Standardization
$$Z = \frac{p_i - \pi}{\sigma_p} = \frac{p_i - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
4.2. Sampling distribution of proportions
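Standard error of the proportion ($\sigma_p$):

Population         Based on the parameter ($\pi$)                          Based on the statistic ($p$)
$N = \infty$       $\sqrt{\frac{\pi(1-\pi)}{n}}$                           $\sqrt{\frac{p(1-p)}{n}}$
$N \neq \infty$    $\sqrt{\frac{\pi(1-\pi)}{n}} \sqrt{\frac{N-n}{N-1}}$    $\sqrt{p(1-p)\,\frac{N-n}{N(n-1)}}$

The extra factor in the $N \neq \infty$ row is the finite-population correction.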
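The normality condition and the parameter-based standard error of a proportion can be sketched as two small functions. The function names are hypothetical, and only the "based on $\pi$" column of the table is covered.

```python
from math import sqrt

def normal_approximation_ok(pi, n, minimum=5):
    """Check the rule n*pi >= 5 and n*(1 - pi) >= 5."""
    return n * pi >= minimum and n * (1 - pi) >= minimum

def standard_error_proportion(pi, n, N=None):
    """Standard error of a proportion based on the parameter pi.

    N is the population size; None means an infinite population.
    """
    se = sqrt(pi * (1 - pi) / n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))   # finite-population correction
    return se

print(normal_approximation_ok(0.60, 200))            # True
print(standard_error_proportion(0.3, 100))           # about 0.0458
print(standard_error_proportion(0.3, 100, N=1000))   # smaller, hypothetical N
```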
4.2. Sampling distribution of proportions.
Example 1
In a population, the proportion of smokers was 0.60. If we chose a sample of n = 200 from this population, what is the probability of finding 130 or fewer smokers in that sample?
4.2. Sampling distribution of proportions.
Example 1
$\pi = 0.60$, $(1 - \pi) = 0.40$
Can we treat these data as normally distributed?
$n\pi \ge 5$: $200 \cdot 0.60 = 120$
$n(1 - \pi) \ge 5$: $200 \cdot 0.40 = 80$
Both conditions hold, so the normal approximation can be used.
4.2. Sampling distribution of proportions.
Example 1
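$$p = \frac{130}{200} = 0.65$$
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.65 - 0.60}{\sqrt{\dfrac{0.60 \cdot 0.40}{200}}} = \frac{0.05}{0.035} = 1.43$$
(taking $\sqrt{0.0012} \approx 0.035$)
$Z = 1.43 \Rightarrow P = 0.4236$
$$P(p \le 0.65) = P(Z \le 1.43) = 0.5 + 0.4236 = 0.9236$$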
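The smokers example can be reproduced with the standard library as a rough check. This sketch keeps full precision for the standard error, so it gives $Z$ of about 1.44 and a probability of about 0.925 rather than the rounded values in the worked solution; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

pi, n, smokers = 0.60, 200, 130

p = smokers / n                          # 0.65
se = sqrt(pi * (1 - pi) / n)             # about 0.0346
z = (p - pi) / se                        # about 1.44

# P(p <= 0.65) under the normal approximation.
probability = NormalDist().cdf(z)        # about 0.925

print(round(z, 2), round(probability, 4))
```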
4.2. Sampling distribution of proportions.
Example 2
In a presidential election, a candidate obtained 45% of the votes. If you chose a sample of 100 voters randomly and independently, what is the probability that the candidate received more than 50% of the votes in that sample?
4.2. Sampling distribution of proportions.
Example 2
$$Z = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}} = \frac{0.50 - 0.45}{\sqrt{\dfrac{0.45 \cdot 0.55}{100}}} \approx 1$$
$$P(p > 0.50) = P(Z > 1) = 0.50 - 0.3413 = 0.1587$$
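An upper-tail probability such as this one is 1 minus the cumulative probability. The sketch below repeats the election example without rounding $Z$ to 1, so its result (about 0.157) is slightly below the table-based 0.1587; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

pi, n = 0.45, 100

se = sqrt(pi * (1 - pi) / n)             # about 0.0497
z = (0.50 - pi) / se                     # about 1.005

# P(p > 0.50) = P(Z > z) = 1 - P(Z <= z)
probability = 1 - NormalDist().cdf(z)    # about 0.157

print(round(z, 3), round(probability, 4))
```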
4.2. Sampling distribution of proportions.
Example 3
30% of the students in Seville passed a particular test. Extracting samples of 100 students from this population, calculate:
a) The values that delimit the central 99% of the proportions of these samples.
b) The percentage of samples in which the proportion of students that passed the test is equal to or higher than 0.35.
4.2. Sampling distribution of proportions.
Example 3
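$n\pi \ge 5$: $100 \cdot 0.3 = 30$
$n(1 - \pi) \ge 5$: $100 \cdot 0.7 = 70$
a) Calculate the values that delimit the central 99% of the proportions of these samples.
[Figure: normal curve with the central 99% marked (0.495 on each side of the mean), delimited by $Z = -2.58$ and $Z = 2.58$.]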
4.2. Sampling distribution of proportions.
Example 3
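$$Z = \frac{p - \pi}{\sigma_p} = \frac{p - \pi}{\sqrt{\dfrac{\pi(1 - \pi)}{n}}}$$
$$-2.58 = \frac{p_1 - 0.3}{\sqrt{\dfrac{0.3(1 - 0.3)}{100}}} = \frac{p_1 - 0.3}{0.045} \Rightarrow -2.58 \cdot 0.045 = p_1 - 0.3 \Rightarrow -0.116 = p_1 - 0.3 \Rightarrow p_1 = -0.116 + 0.3 = 0.184$$
$$2.58 = \frac{p_2 - 0.3}{\sqrt{\dfrac{0.3(1 - 0.3)}{100}}} = \frac{p_2 - 0.3}{0.045} \Rightarrow 2.58 \cdot 0.045 = p_2 - 0.3 \Rightarrow 0.116 = p_2 - 0.3 \Rightarrow p_2 = 0.116 + 0.3 = 0.416$$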
4.2. Sampling distribution of proportions.
Example 3
b) The percentage of samples in which the proportion of students that passed the test is equal to or higher than 0.35.
[Figure: sampling distribution centred on $\pi = 0.3$; the area above $p = 0.35$ is the unknown.]
$$Z = \frac{p - \pi}{\sigma_p} = \frac{0.35 - 0.3}{0.045} = \frac{0.05}{0.045} = 1.11 \Rightarrow P = 0.3665$$
$$P(p \ge 0.35) = 0.5 - 0.3665 = 0.1335$$
13.35% of the samples have a proportion equal to or higher than 0.35.
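Both parts of this example can be checked in a few lines. The sketch keeps the unrounded standard error (about 0.0458 instead of 0.045) and exact normal quantiles, so its results differ from the worked solution in the third decimal; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()
pi, n = 0.30, 100

se = sqrt(pi * (1 - pi) / n)                 # about 0.0458

# a) Limits of the central 99% of the sample proportions.
z_99 = std_normal.inv_cdf(0.995)             # about 2.58
interval_99 = (pi - z_99 * se, pi + z_99 * se)   # about (0.18, 0.42)

# b) Share of samples with a proportion of 0.35 or more.
z = (0.35 - pi) / se                         # about 1.09
share_above = 1 - std_normal.cdf(z)          # about 0.14

print(interval_99, round(share_above, 4))
```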