 
Sampling Survey
SAMPLING SURVEY TECHNIQUE
© aSup-2011
 
POPULATIONS and SAMPLES
THE POPULATION
is the set of all the individuals of interest in a particular study.
THE SAMPLE
is a set of individuals selected from a population, usually intended to represent the population in a research study.
The sample is selected from the population.
The results from the sample are generalized to the population.
 
DATA COLLECTION TECHNIQUES
Data collection
• Census (population)
• Sampling (sample)
○ Probability
○ Non-probability
 
PARAMETER and STATISTIC
• A parameter is a value, usually a numerical value, that describes a population. A parameter may be obtained from a single measurement, or it may be derived from a set of measurements from the population.
• A statistic is a value, usually a numerical value, that describes a sample. A statistic may be obtained from a single measurement, or it may be derived from a set of measurements from the sample.
 
SAMPLING ERROR
• Although samples are generally representative of their populations, a sample is not expected to give a perfectly accurate picture of the whole population.
• There is usually some discrepancy between a sample statistic and the corresponding population parameter. This discrepancy is called sampling error.
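A minimal sketch of sampling error, assuming a made-up population of 100 scores and the mean as the statistic of interest. The function name `sampling_error`, the population values, and the fixed seed are illustrative assumptions, not part of the original slides.

```python
import random
import statistics

def sampling_error(population, n, seed=0):
    """Return (sample mean, population mean, sampling error) for one random sample."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    sample = rng.sample(population, n)   # simple random sample without replacement
    statistic = statistics.mean(sample)      # describes the sample
    parameter = statistics.mean(population)  # describes the population
    return statistic, parameter, statistic - parameter

population = list(range(1, 101))         # hypothetical population of 100 scores
stat, param, error = sampling_error(population, n=10)
print(f"sample mean = {stat}, population mean = {param}, sampling error = {error}")
```

Drawing the sample again with a different seed would usually give a different statistic, and therefore a different sampling error, while the parameter stays fixed.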
 
TWO KINDS OF NUMERICAL DATA
Numerical data generally fall into two major categories:
1. Counted → frequencies → enumeration data
2. Measured → metric or scale values → measurement or metric data
Statistical procedures deal with both kinds of data.
 
DATUM and DATA
• The measurement or observation obtained for each individual is called a datum or, more commonly, a score or raw score.
• The complete set of scores or measurements is called the data set or simply the data.
• After data are obtained, statistical methods are used to organize and interpret the data.
 
VARIABLE
• A variable is a characteristic or condition that changes or has different values for different individuals.
• A constant is a characteristic or condition that does not vary but is the same for every individual.
• Example: in a research study comparing vocabulary skills for 12-year-old boys, vocabulary skill is a variable, while age and sex are constants.
 
QUALITATIVE and QUANTITATIVE Categories
• Qualitative: the classes of objects are different in kind. There is no reason for saying that one is greater or less, higher or lower, better or worse than another.
• Quantitative: the groups can be ordered according to quantity or amount. The cases may vary continuously along a continuum that we recognize.
 
DISCRETE and CONTINUOUS Variables
• A discrete variable consists of separate categories. No values can exist between two neighboring categories.
• A continuous variable is divisible into an infinite number of fractional parts.
○ It should be very rare to obtain identical measurements for two different individuals.
○ Each measurement category is actually an interval that must be defined by boundaries called real limits.
 
CONTINUOUS Variables
• Most interval-scale measurements are taken to the nearest unit (foot, inch, cm, mm), depending on the fineness of the measuring instrument and the accuracy we demand for the purposes at hand.
• And so it is with most psychological and educational measurements. A score of 48 means from 47.5 to 48.5.
• We assume that a score is never a point on the scale, but occupies an interval from a half unit below to a half unit above the given number.
 
FREQUENCIES, PERCENTAGES, PROPORTIONS, and RATIOS
• Frequency is defined as the number of objects or events in a category.
• Percentage (P) is defined as the number of objects or events in a category divided by the total number of objects or events, multiplied by 100.
• Proportion (p). Whereas with percentages the base is 100, with proportions the base or total is 1.0.
• A ratio is a fraction. The ratio of a to b is the fraction a/b. A proportion is a special ratio, the ratio of a part to a total.
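The definitions above can be sketched in a few lines. The function name `summarize` and the yes/no answers are hypothetical and only illustrate how frequency, proportion, and percentage relate to one another.

```python
from collections import Counter

def summarize(categories):
    """Return {category: (frequency, proportion, percentage)}."""
    counts = Counter(categories)          # frequency of each category
    total = sum(counts.values())          # base for proportions and percentages
    return {c: (f, f / total, 100 * f / total) for c, f in counts.items()}

# Hypothetical answers from 10 respondents
answers = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no", "yes", "yes"]
for category, (f, p, pct) in summarize(answers).items():
    print(f"{category}: f = {f}, p = {p:.2f}, P = {pct:.0f}%")
```

The proportions sum to 1.0 and the percentages sum to 100, which is the difference in base that the slide describes.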
 
MEASUREMENTS and SCALES (Stevens, 1946)
Ratio
Interval
Ordinal
Nominal
 
FREQUENCY DISTRIBUTION, GRAPH, and PERCENTILE

• A class of 40 students has just received its Perceptual Speed test scores. Aside from the primary question about your grade, you'd like to know more about how you stand in the class.
• How does your score compare with others in the class? What was the range of performance?
• What more can you learn by studying the scores?
 
Score of PERCEPTUAL SPEED Test
29 47 45 40 48 48 49 45
40 49 48 37 48 46 55 67
36 53 25 58 33 33 43 42
32 51 48 54 47 40 38 44
46 50 28 44 52 49 56 48
Taken from Guilford p.55
 
OVERVIEW
• When a researcher finishes the data collection phase of an experiment, the results usually consist of pages of numbers.
• The immediate problem for the researcher is to organize the scores into some comprehensible form so that any trend in the data can be seen easily and communicated to others.
• This is the job of descriptive statistics: to simplify the organization and presentation of data.
• One of the most common procedures for organizing a set of data is to place the scores in a FREQUENCY DISTRIBUTION.
 
GROUPED SCORES
• After we obtain a set of measurements (data), a common next step is to put them in a systematic order by grouping them in classes.
• With numerical data, combining individual scores often makes it easier to display the data and to grasp their meaning. This is especially true when there is a wide range of values.
 
TWO GENERAL CUSTOMS IN THE SIZE OF CLASS INTERVAL
1. We should prefer not fewer than 10 and not more than 20 class intervals.
○ More commonly, the number of class intervals used is 10 to 15.
○ An advantage of a small number of class intervals is that we have fewer frequencies to deal with.
○ An advantage of a larger number of class intervals is higher accuracy of computation.
2. Certain ranges of units (scores) are preferred as the size of the class interval. Those ranges are 2, 3, 5, 10, and 20. These five interval sizes will take care of almost all sets of data.
 
 
HOW TO CONSTRUCT A GROUPED FREQUENCY DISTRIBUTION
Step 1 : find the lowest score and the highest score
Step 2 : find the range by subtracting the lowest score from the highest
Step 3 : divide the range by 10 and by 20 to determine the largest and the smallest acceptable interval widths. Choose a convenient width (i) within these limits
 
Score of PERCEPTUAL SPEED Test (the same 40 scores)
X max = 67, X min = 25
Range = 67 - 25 = 42; 42 / 10 = 4.2 and 42 / 20 = 2.1, so a convenient interval width (i) lies between these limits, for example 3 or 4
 
WHERE TO START CLASS INTERVAL
• It is natural to start the intervals with their lowest scores at multiples of the size of the interval.
• When the interval is 3, start with 24, 27, 30, 33, etc.; when the interval is 4, start with 24, 28, 32, 36, etc.
 
HOW TO CONSTRUCT A GROUPED FREQUENCY DISTRIBUTION
Step 4 : determine the score at which the lowest interval should begin. It should ordinarily be a multiple of the interval width.
Step 5 : record the limits of all class intervals, placing the interval containing the highest score value at the top. Make the intervals continuous and of the same width
Step 6 : using the tally system, enter the raw scores in the appropriate class intervals
Step 7 : convert each tally to a frequency
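Steps 1 to 7 can be sketched as follows, using the 40 Perceptual Speed scores. The function name `grouped_frequency` is an illustrative assumption, and counting replaces the manual tally of Step 6.

```python
scores = [29, 40, 36, 32, 46, 47, 49, 53, 51, 50,
          45, 48, 25, 48, 28, 40, 37, 58, 54, 44,
          48, 48, 33, 47, 52, 48, 46, 33, 40, 49,
          49, 55, 43, 38, 56, 45, 67, 42, 44, 48]

def grouped_frequency(scores, width):
    """Return [(lower, upper, f)] with the highest interval first."""
    low, high = min(scores), max(scores)      # Step 1
    start = (low // width) * width            # Step 4: begin at a multiple of the width
    table = []
    lower = start
    while lower <= high:                      # Step 5: continuous, equal-width intervals
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in scores)   # Steps 6 and 7
        table.append((lower, upper, f))
        lower += width
    return table[::-1]                        # highest interval at the top

print("range =", max(scores) - min(scores))   # Step 2
for lower, upper, f in grouped_frequency(scores, width=4):
    print(f"{lower} - {upper}: {f}")
```

With a width of 4 this yields 11 class intervals from 64 - 67 down to 24 - 27; with a width of 3 it yields 15 intervals from 66 - 68 down to 24 - 26.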
 
FREQUENCY DISTRIBUTION TABLE
X max = 67, X min = 25, RANGE = 42
Interval = 3 (15 class intervals)    Interval = 4 (11 class intervals)
SCORE      SCORE
66 - 68    64 - 67
63 - 65    60 - 63
60 - 62    56 - 59
57 - 59    52 - 55
54 - 56    48 - 51
51 - 53    44 - 47
48 - 50    40 - 43
45 - 47    36 - 39
42 - 44    32 - 35
39 - 41    28 - 31
36 - 38    24 - 27
33 - 35
30 - 32
27 - 29
24 - 26
 
PERCEPTUAL SPEED
SCORE     f    Xc     Lower Exact Limit   Upper Exact Limit
64 - 67   1    65.5   63.5                67.5
60 - 63   0    61.5   59.5                63.5
56 - 59   2    57.5   55.5                59.5
52 - 55   4    53.5   51.5                55.5
48 - 51   11   49.5   47.5                51.5
44 - 47   8    45.5   43.5                47.5
40 - 43   5    41.5   39.5                43.5
36 - 39   3    37.5   35.5                39.5
32 - 35   3    33.5   31.5                35.5
28 - 31   2    29.5   27.5                31.5
24 - 27   1    25.5   23.5                27.5
 
WARNING!!
• Although a grouped frequency distribution can make data easier to interpret, some information is lost.
• In the table, we can see that more people scored in the interval 48 - 51 than in any other interval.
• However, unless we have all the original scores to look at, we would not know whether the 11 scores in this interval were all 48s, all 49s, all 50s, or all 51s, or were spread throughout the interval in some way.
• This problem is referred to as GROUPING ERROR.
• The wider the class interval, the greater the potential for grouping error.
 
STEM and LEAF DISPLAY
• In 1977, J.W. Tukey presented a technique for organizing data that provides a simple alternative to a frequency distribution table or graph.
• This technique, called a stem and leaf display, requires that each score be separated into two parts.
• The first digit (or digits) is called the stem, and the last digit (or digits) is called the leaf.
 
Data
83 62 71 76 85 32 56 74
82 93 68 52 42 57 73 81
63 78 33 97 46 59 74 76
Stem & Leaf Display
3 | 2 3
4 | 2 6
5 | 6 2 7 9
6 | 2 8 3
7 | 1 6 4 3 8 4 6
8 | 3 5 2 1
9 | 3 7
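A short sketch of how such a display can be built for two-digit scores, taking the tens digit as the stem and the units digit as the leaf. The function name `stem_and_leaf` is an illustrative assumption; the data are the 24 scores from the slide.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (tens digit) to its leaves (units digits), in data order."""
    display = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)   # e.g. 83 -> stem 8, leaf 3
        display[stem].append(leaf)
    return dict(sorted(display.items()))

data = [83, 62, 71, 76, 85, 32, 56, 74, 82, 93, 68, 52,
        42, 57, 73, 81, 63, 78, 33, 97, 46, 59, 74, 76]

for stem, leaves in stem_and_leaf(data).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Unlike a grouped frequency table, the display keeps every original score, so no grouping error is introduced.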
 
GROUPED FREQUENCY DISTRIBUTION HISTOGRAM AND A STEM AND LEAF DISPLAY
[Figure: histogram of the same data (X axis: 30 to 90 in steps of 10; vertical axis: frequency) shown beside its stem and leaf display]
 
MAKING GRAPH
POLYGON and HISTOGRAM

MAKING GRAPH
POLYGON
 
 
POLYGON
[Figure: frequency polygon of the PERCEPTUAL SPEED scores. Vertical axis: f (0 to 12); X axis: class-interval MIDPOINTS 21.5, 25.5, 29.5, ..., 65.5, 69.5]
 
MAKING GRAPH
HISTOGRAM
 
 
HISTOGRAM
[Figure: histogram of the PERCEPTUAL SPEED scores. Vertical axis: f (0 to 12); X axis: class-interval EXACT LIMITS 23.5, 27.5, 31.5, ..., 63.5, 67.5]

POLYGON and HISTOGRAM
[Figure: frequency polygon drawn over the histogram, same axes]
 
THE SHAPE OF A FREQUENCY DISTRIBUTION
• Symmetrical: it is possible to draw a vertical line through the middle so that one side of the distribution is a mirror image of the other.
• Skewed (positive or negative): the scores tend to pile up toward one end of the scale and taper off gradually at the other end.
 
LEARNING CHECK
• Describe the shape of the distribution for the data in the following table.
X   f
5   4
4   6
3   3
2   1
1   1
The distribution is negatively skewed.
 
PERCENTILES and PERCENTILE RANKS
• The percentile system is widely used in educational measurement to report the standing of an individual relative to the performance of a known group. It is based on the cumulative percentage distribution.
• A percentile is a point on the measurement scale below which a specified percentage of the cases in the distribution falls.
• The rank or percentile rank of a particular score is defined as the percentage of individuals in the distribution with scores at or below the particular value.
• When a score is identified by its percentile rank, the score is called a percentile.

• Suppose, for example, that A has a score of X = 78 on an exam and we know that exactly 60% of the class had scores of 78 or lower.
• Then A's score X = 78 has a percentile rank of 60%, and A's score would be called the 60th percentile.
Percentile rank refers to a percentage.
Percentile refers to a score.
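The "at or below" definition of percentile rank can be sketched directly. The class of 10 exam scores and the function name `percentile_rank` are hypothetical, chosen so that six of the ten scores are at or below 78.

```python
def percentile_rank(scores, x):
    """Percentage of scores in the distribution that are at or below x."""
    at_or_below = sum(s <= x for s in scores)
    return 100 * at_or_below / len(scores)

exam = [55, 60, 62, 70, 75, 78, 80, 85, 90, 95]   # hypothetical class of 10
print(percentile_rank(exam, 78))   # 6 of the 10 scores are at or below 78
```

Here the percentile rank of X = 78 is 60, so 78 is the 60th percentile of this hypothetical class.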
 
CENTRAL TENDENCY
Mean, Median, and Mode
 
OVERVIEW
• The general purpose of descriptive statistical methods is to organize and summarize a set of scores.
• Perhaps the most common method for summarizing and describing a distribution is to find a single value that defines the average score and can serve as a representative for the entire distribution.
• In statistics, the concept of an average or representative score is called central tendency.
 
OVERVIEW
• The purpose of central tendency is to provide a single summary figure that best describes the central location of an entire distribution of observations.
• It also helps simplify the comparison of two or more groups tested under different conditions.
• Three measures are most commonly used in education and the behavioral sciences: the mode, the median, and the arithmetic mean.
 
The MODE
• A common meaning of mode is 'fashionable', and it has much the same implication in statistics.
• In an ungrouped distribution, the mode is the score that occurs with the greatest frequency.
• In grouped data, it is taken as the midpoint of the class interval that contains the greatest number of scores.
• The symbol for the mode is Mo.
 
The MEDIAN
• The median of a distribution is the point along the scale of possible scores below which 50% of the scores fall; it is therefore another name for P50.
• Thus, the median is the value that divides the distribution into halves.
• Its symbol is Mdn.
 
The ARITHMETIC MEAN
• The arithmetic mean is the sum of all the scores in the distribution divided by the total number of scores.
• Many people call this measure the average, but we will avoid this term because it is sometimes used indiscriminately for any measure of central tendency.
• For brevity, the arithmetic mean is usually called the mean.
 
The ARITHMETIC MEAN
• Some symbolism is needed to express the mean mathematically. We will use the capital letter X as a collective term to specify a particular set of scores (be sure to use capital letters; lower-case letters are used in a different way).
• We identify an individual score in the distribution by a subscript, such as X1 (the first score), X8 (the eighth score), and so forth.
• Remember that n stands for the number of scores in a sample and N for the number in a population.
 
Properties of the Mode
• The mode is easy to obtain, but it is not very stable from sample to sample.
• Further, when quantitative data are grouped, the mode may be strongly affected by the width and location of the class intervals.
• There may be more than one mode for a particular set of scores. In a rectangular distribution the ultimate is reached: every score shares the honor! For these reasons, the mean or the median is often preferred with numerical data.
• However, the mode is the only measure that can be used for data that have the character of a nominal scale.
 
Properties of the Mean
• Unlike the other measures of central tendency, the mean is responsive to the exact position of each score in the distribution.
• Inspect the basic formula ΣX/n. Increasing or decreasing the value of any score changes ΣX and thus also changes the value of the mean.
• The mean may be thought of as the balance point of the distribution, to use a mechanical analogy. There is an algebraic way of stating that the mean is the balance point:
Σ(X − X̄) = 0
 
Properties of the Mean
• The sum of the negative deviations from the mean exactly equals, in absolute value, the sum of the positive deviations.
• The mean is more sensitive to the presence (or absence) of scores at the extremes of the distribution than are the median or (ordinarily) the mode.
• When a measure of central tendency should reflect the total of the scores, the mean is the best choice because it is the only measure based on this quantity.
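The balance-point property can be checked numerically. This is a minimal sketch with five hypothetical scores; `Fraction` is used so that the sum of deviations is exactly zero rather than zero up to rounding.

```python
from fractions import Fraction

def deviations(scores):
    """Deviation of each score from the arithmetic mean, as exact fractions."""
    mean = Fraction(sum(scores), len(scores))
    return [x - mean for x in scores]

scores = [1, 2, 6, 6, 10]                 # hypothetical scores, mean = 5
devs = deviations(scores)

positive = sum(d for d in devs if d > 0)  # total distance above the mean
negative = sum(d for d in devs if d < 0)  # total distance below the mean
print(sum(devs), positive, negative)
```

The positive and negative deviations cancel, which is what the mechanical balance analogy expresses.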
 
The MEAN of Ungrouped Data
• The mean (M), commonly known as the arithmetic average, is computed by adding all the scores in the distribution and dividing by the number of scores or cases:
M = ΣX / N
 
The MEAN of Grouped Data
• When data come to us grouped, or
• when they are too lengthy for comfortable addition without the aid of a calculating machine, or
• when we are going to group them for other purposes anyway,
• we find it more convenient to apply another formula for the mean:
M = Σ f.Xc / N
X         Xc   f   f.Xc
20 - 24   22   1   22
15 - 19   17   4   68
10 - 14   12   7   84
5 - 9     7    5   35
0 - 4     2    3   6
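Applying the grouped-data formula to the table above can be sketched as follows. The function name `grouped_mean` is an illustrative assumption; the midpoints Xc and frequencies f are those shown on the slide.

```python
# (class interval, midpoint Xc, frequency f) taken from the slide's table
table = [
    ("20-24", 22, 1),
    ("15-19", 17, 4),
    ("10-14", 12, 7),
    ("5-9", 7, 5),
    ("0-4", 2, 3),
]

def grouped_mean(rows):
    """M = sum(f * Xc) / N, where N is the total frequency."""
    n = sum(f for _, _, f in rows)
    total = sum(f * xc for _, xc, f in rows)
    return total / n

print(grouped_mean(table))   # sum of f.Xc = 215, N = 20
```

Because every score in an interval is treated as if it sat at the midpoint, the result is an approximation of the mean of the raw scores.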
 
The MEDIAN of Ungrouped Data
• Method 1: When N is an odd number
○ list the scores in order (lowest to highest); the median is the middle score in the list
• Method 2: When N is an even number
○ list the scores in order (lowest to highest), and then locate the median by finding the point halfway between the middle two scores
 
The MEDIAN of Ungrouped Data
• Method 3: When there are several scores with the same value in the middle of the distribution
• 1, 2, 2, 3, 4, 4, 4, 4, 4, 5
• There are 10 scores (an even number), so you would normally use Method 2 and average the middle pair to determine the median.
• By this method, the median would be 4.
 
[Figure: two frequency distribution graphs; vertical axis f (0 to 5), horizontal axis X (0 to 5)]
 
The MEDIAN of Grouped Data
• There are 10 scores (an even number), so you would normally use Method 2 and average the middle pair to determine the median. By this method the median would be 4.
• In many ways, this is a perfectly legitimate value for the median. However, when you look closely at the distribution of scores, you probably get the clear impression that X = 4 is not in the middle.
• The problem comes from the tendency to interpret the score of 4 as meaning exactly 4.00 instead of meaning an interval from 3.5 to 4.5.
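One common way to act on the interval interpretation is linear interpolation within the interval that contains the middle of the distribution. The sketch below assumes the scores in an interval are spread evenly across it; the slides do not state this procedure explicitly, and the function name `interpolated_median` is hypothetical.

```python
from collections import Counter

def interpolated_median(scores):
    """Median when each integer score X stands for the interval X - 0.5 to X + 0.5."""
    half = len(scores) / 2                 # number of scores below the median
    counts = Counter(scores)
    below = 0                              # scores counted so far
    for x in sorted(counts):
        f = counts[x]
        if below + f >= half:
            # take the needed fraction of this interval, starting at its lower real limit
            return (x - 0.5) + (half - below) / f
        below += f

scores = [1, 2, 2, 3, 4, 4, 4, 4, 4, 5]
print(round(interpolated_median(scores), 2))
```

For these ten scores, four fall below 3.5, so only one of the five scores in the interval 3.5 to 4.5 is needed to reach the halfway point; the interpolated median is therefore 3.5 + 1/5, which lies below 4.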
 
THE MODE
• The word MODE means the most common observation among a group of scores.
• In a frequency distribution, the mode is the score or category that has the greatest frequency.
 
SELECTING A MEASURE OF CENTRAL TENDENCY
• How do you decide which measure of central tendency to use? The answer depends on several factors.
• Note that the mean is usually the preferred measure of central tendency. Because the mean uses every score in the distribution, it typically produces a good representative value.
• The goal of central tendency is to find the single value that best represents the entire distribution.
 
SELECTING A MEASURE OF CENTRAL TENDENCY
• Besides being a good representative, the mean has the added advantage of being closely related to variance and standard deviation, the most common measures of variability.
• This relationship makes the mean a valuable measure for purposes of inferential statistics.
• For these reasons, and others, the mean generally is considered to be the best of the three measures of central tendency.
 
SELECTING A MEASURE OF CENTRAL TENDENCY
• But there are specific situations in which it is impossible to compute a mean or in which the mean is not particularly representative.
• It is in these conditions that the mode and the median are used.
 
WHEN TO USE THE MEDIAN
1. Extreme scores or skewed distribution
When a distribution has a few extreme scores, scores that are very different in value from most of the others, the mean may not be a good representative of the majority of the distribution.
The problem comes from the fact that one or two extreme values can have a large influence and cause the mean to be displaced.
 
WHEN TO USE THE MEDIAN
2. Undetermined values
Occasionally, we will encounter a situation in which an individual has an unknown or undetermined score.
Person   Time (min.)
1        8
2        11
3        12
4        13
5        17
6        Never finished
Notice that person 6 never completed the puzzle. After one hour, this person still showed no sign of solving the puzzle, so the experimenter stopped him or her.
 
WHEN TO USE THE MEDIAN
2. Undetermined values
There are two important points to be noted:
• The experimenter should not throw out this individual's score. The whole purpose of using a sample is to gain a picture of the population, and this individual tells us that part of the population cannot solve this puzzle.
• This person should not be given a score of X = 60 minutes. Even though the experimenter stopped the individual after 1 hour, the person did not finish the puzzle. The score that is recorded is the amount of time needed to finish. For this individual, we do not know how long this is.
 
WHEN TO USE THE MEDIAN
3. Open-ended distribution
A distribution is said to be open-ended when there is no upper limit (or lower limit) for one of the categories.
Number of children (X)   f
5 or more                3
4                        2
3                        2
2                        3
1                        6
0                        4
Notice that it is impossible to compute a mean for these data because you cannot find ΣX.
 
WHEN TO USE THE MEDIAN
4. Ordinal scale
When scores are measured on an ordinal scale, the median is always appropriate and is usually the preferred measure of central tendency.
 
WHEN TO USE THE MODE
• Nominal scales
Because nominal scales do not measure quantity, it is impossible to compute a mean or a median for data from a nominal scale.
• Discrete variables → indivisible categories
• Describing shape
The mode identifies the location of the peak(s). If you are told that a set of exam scores has a mean of 72 and a mode of 80, you should have a better picture of the distribution than would be available from the mean alone.
 
CENTRAL TENDENCY AND THE SHAPE OF THE DISTRIBUTION
• Because the mean, the median, and the mode are all trying to measure the same thing (central tendency), it is reasonable to expect that these three values should be related.
• There are situations in which all three measures will have exactly the same value, and situations in which they will have different values.
• The relationship among the mean, median, and mode is determined by the shape of the distribution.
 
SYMMETRICAL DISTRIBUTION SHAPE
• For a symmetrical distribution, the right-hand side will be a mirror image of the left-hand side.
• By definition, the mean and the median will be exactly at the center because exactly half of the area in the graph will be on either side of the center.
• Thus, for any symmetrical distribution, the mean and the median will be the same.
 
SYMMETRICAL DISTRIBUTION SHAPE
• If a symmetrical distribution has only one mode, it will also be exactly in the center of the distribution. All three measures of central tendency will have the same value.
• A symmetrical bimodal distribution will have the mean and the median together in the center, with one mode on each side.
• A rectangular distribution has no mode because all X values occur with the same frequency. Still, the mean and the median will be in the center and equivalent in value.
 
MEASURES OF VARIABILITY

• Knowing the central value of a set of measurements tells us much, but it does not by any means give us the total picture of the sample we have measured.
• Two groups of six-year-old children may have the same average IQ of 105. One group contains no individuals with IQs below 95 or above 115, while the other includes individuals with IQs ranging from 75 to 135.
• We recognize immediately that there is a decided difference between the two groups in variability or dispersion.
 
[Figure: IQ distributions of the two groups on a scale marked 75, 85, 95, 105, 115, 125, 135]
The BLUE group is decidedly more homogeneous than the RED group with respect to IQ.
 
Purpose of Measures of Variability
• To explain and to illustrate the methods of indicating the degree of variability or dispersion by the use of single numbers.
• The three customary values to indicate variability are
○ The total range
○ The semi-interquartile range Q, and
○ The standard deviation S
 
The TOTAL RANGE
• The total range is the easiest and most quickly ascertained value, but it is also the most unreliable.
• The range is given by the highest score minus the lowest score.
• The range of the BLUE group (from an IQ of 95 to one of 115) is 20 points. The range of the RED group, from 75 to 135, is 60 points.
• The RED group has three times the range of the BLUE group.
 
The SEMI-INTERQUARTILE RANGE Q
• Q is one-half the range of the middle 50 percent of the cases.
• First we find by interpolation the range of the middle 50 percent, or interquartile range, then divide this range by 2.
 
[Figure: a distribution divided at Q1, Q2, and Q3 into the Lowest Quarter, Low Middle Quarter, High Middle Quarter, and Highest Quarter]
(Q2 – Q1) + (Q3 – Q2) = Q3 – Q1 = 2Q
Q = (Q3 – Q1) / 2
 
The STANDARD DEVIATION S
• The standard deviation is by far the most commonly used indicator of the degree of dispersion and is the most dependable estimate of the variability in the population from which the sample came.
• S is a kind of average of all deviations from the mean:
S = √( Σx² / (n - 1) )
where x is the deviation of a score from the mean.

• As a general concept, the standard deviation is often symbolized by SD, but much more often simply by S.
• In verbal terms, S is the square root of the arithmetic mean of the squared deviations of measurements from their mean.
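A minimal sketch of the formula S = √(Σx² / (n − 1)), with x taken as the deviation of each score from the mean. The scores are hypothetical and the function name `sample_sd` is an assumption; the standard library's `statistics.stdev`, which also divides by n − 1, is used only as a cross-check.

```python
import math
import statistics

def sample_sd(scores):
    """S = sqrt(sum of squared deviations / (n - 1))."""
    n = len(scores)
    mean = sum(scores) / n
    sum_sq = sum((x - mean) ** 2 for x in scores)   # Σx², where x = X - mean
    return math.sqrt(sum_sq / (n - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical sample, mean = 5
print(round(sample_sd(scores), 3))
print(round(statistics.stdev(scores), 3))  # library cross-check
```

Dividing by n instead of n − 1 would give the literal "mean of the squared deviations"; the n − 1 form shown on the slide is the one used when the sample is meant to estimate population variability.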
 
Interpretation of a Standard Deviation
• The usual and most accepted interpretation of S is in terms of the percentage of cases included within the range from one S below the mean to one S above the mean.
• In a normal distribution the range from -1σ to +1σ contains 68.27 percent of the cases.
• If the mean = 29.6 and S = 10.45, we say that about two-thirds of the cases lie between 19.15 and 40.05.
 
Interpretation of a Standard Deviation
• One of the most common sources of variance in statistical data is individual differences, where each measurement comes from a different person.
 
Interpretation of a Standard Deviation
• Consider giving a test of n items to a group of persons. Before the first item is given to the group, as far as any information from this test is concerned, the individuals are all alike. There is no variance.
• Now administer the first item to the group. Some pass it and some fail. Some now have scores of 1, and some have scores of zero.
• There are now two groups of individuals. There is this much variation, this much variance.
 
Interpretation of a Standard Deviation
• Give a second item. Of those who passed the first, some will pass the second and some will fail it, and so on.
• There are now three possible scores: 0, 1, and 2.
• More variance has been introduced.
• Carry the illustration further, adding item by item.
• The differences between scores will keep increasing, and so, by computation, will the variance and variability.

• Another rough check is to compare the S obtained with the total range of measurement.
• In very large samples (N = 500 or more) S is about one-sixth of the total range.
• In other words, the total range is about six S.
• In smaller samples the ratio of range to S can be expected to be smaller (see Guilford & Fruchter p.71).
 
Ratios of the Total Range to the Standard Deviation in a Distribution for Different Values of N
• Rough checks for a computed SD
○ The actual percentage of cases between -1 SD and +1 SD should be about 68 percent
○ In very large samples (N = 500 or more) the SD is about one-sixth of the total range
N     Range/S    N     Range/S    N      Range/S
5     2.3        40    4.3        400    5.9
10    3.1        50    4.5        500    6.1
15    3.5        100   5.0        700    6.3
20    3.7        200   5.5        1000   6.5
 
z-Score:
Location of Scores and
Standardized Distribution
 
PREVIEW
• In particular, we will convert each individual score into a new, standardized score, so that the standardized score provides a meaningful description of its exact location within the distribution.
• We will use the mean as a reference point to determine whether the individual is above or below average.
• The standard deviation will serve as a yardstick for measuring how much an individual differs from the group average.
 
EXAMPLE
• Suppose you received a score of X = 76 on a statistics exam. How did you do?
• It should be clear that you need more information to predict your grade.
• Your score could be one of the best scores in the class, or it might be the lowest score in the distribution.
 
X = 76, the best score or the lowest score?
• To find the location of your score, you must have information about the other scores in the distribution.
• If the mean were μ = 70, you would be in a better position than if the mean were μ = 86.
• Obviously, your position relative to the rest of the class depends on the mean.
 
X = 76 and μ = 70
• However, the mean by itself is not sufficient to tell you the exact location of your score.
• At this point, you know that your score is six points above the mean.
• Six points may be a relatively big distance and you may have one of the highest scores in the class, or
• Six points may be a relatively small distance and you are only slightly above the average.
 
THE z-SCORE FORMULA
z = (X − μ) / σ
 
z-Score and Location In a Distribution
• One of the primary purposes of a z-score is to describe the exact location of a score within a distribution.
• The z-score accomplishes this goal by transforming each X value into a signed number (+ or -), so that:
○ The sign tells whether the score is located above (+) or below (-) the mean, and
○ The number tells the distance between the score and the mean in terms of the number of standard deviations
 
If every X value is transformed into a z-score, then
the distribution of z-scores will have the following
properties:
 Shape: the z-score distribution will have the same shape as
the original distribution of raw scores. Each
individual has exactly the same relative position in
the X distribution and the z-score distribution
 Mean: the z-score distribution will always have a mean of
zero. A subject whose score equals the mean is transformed
into z = 0
 Standard deviation: the z-score distribution will always have
a standard deviation of 1. A subject whose score is one
standard deviation above the mean is transformed into z = +1
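These properties can be checked numerically. The sketch below standardizes a small made-up set of scores (an assumption for illustration) using only Python's standard library, treating the scores as a whole population.

```python
from statistics import mean, pstdev

scores = [3, 4, 5, 6, 7]      # hypothetical raw scores, treated as a population
mu = mean(scores)             # population mean
sigma = pstdev(scores)        # population standard deviation

# Transform every X into z = (X - mu) / sigma.
z_scores = [(x - mu) / sigma for x in scores]

# The mean of the z-scores is (numerically) 0 and their standard deviation is 1.
print(round(mean(z_scores), 10), round(pstdev(z_scores), 10))

# Relative positions are unchanged: the ordering of individuals is preserved.
print(z_scores == sorted(z_scores))
```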
 
PROBABILITY and NORMAL DISTRIBUTION
[Figure: normal distribution curve, labeled with μ and σ]
In simpler terms, the normal distribution is
symmetrical with a single mode in the middle.
The frequency tapers off as you move farther
from the middle in either direction
 
THE DISTRIBUTION
OF SAMPLE MEANS
 
OVERVIEW
 Whenever a score is selected from a
population, you should be able to compute a z-score
 And, if the population is normal, you should
be able to determine the probability value for
obtaining any individual score
 In a normal distribution, a z-score of +2.00
corresponds to an extreme score out in the tail
of the distribution, and a score at least this large
has a probability of only p = .0228
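The tail probability quoted for z = +2.00 can be reproduced with Python's standard library. This is only a sketch; it uses `statistics.NormalDist` in place of a printed unit normal table.

```python
from statistics import NormalDist

z = 2.00
# Proportion of the standard normal distribution at or above z (the upper tail).
p_tail = 1 - NormalDist(mu=0, sigma=1).cdf(z)
print(round(p_tail, 4))   # 0.0228
```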
 
THE DISTRIBUTION OF SAMPLE MEANS
 Two separate samples probably will be
different even though they are taken from the
same population
 The samples will have different individuals,
different scores, different means, and so on
 The distribution of sample means is the
collection of sample means for all the possible
random samples of a particular size (n) that
can be obtained from a population
 
COMBINATION
nCr = n! / (r! (n - r)!)
 Consider a population that consists of 5
scores: 3, 4, 5, 6, and 7
 Population mean = ?
 Construct the distribution of sample
means for n = 1, n = 2, n = 3, n = 4, n = 5
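One way to work through this exercise is to enumerate every possible sample by computer. The sketch below assumes samples are drawn without replacement and that order does not matter, which matches the combination formula; the function name `distribution_of_sample_means` is made up for this illustration.

```python
from itertools import combinations
from math import comb
from statistics import mean

population = [3, 4, 5, 6, 7]
print("population mean:", mean(population))

def distribution_of_sample_means(population, n):
    """Means of all samples of size n (assumed: without replacement, order ignored)."""
    return [mean(sample) for sample in combinations(population, n)]

for n in range(1, len(population) + 1):
    sample_means = distribution_of_sample_means(population, n)
    # The number of possible samples is nCr with n = 5 scores and r = sample size.
    assert len(sample_means) == comb(len(population), n)
    print(n, len(sample_means), sorted(sample_means))
```

For every sample size, the sample means are centered on the population mean, and they cluster more tightly around it as the sample size grows.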
 
SAMPLING DISTRIBUTION
… is a distribution of statistics obtained by
selecting all the possible samples of a specific
size from a population
CENTRAL LIMIT THEOREM
For any population with mean μ and standard
deviation σ, the distribution of sample means for
sample size n will have a mean of μ and a standard
deviation of σ/√n and will approach a normal
distribution as n approaches infinity
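The mean and standard deviation stated in the theorem can be checked exactly on a tiny population. The sketch below borrows the scores 3, 4, 5, 6, 7 and assumes sampling with replacement (independent draws), so that σ/√n holds exactly; when sampling without replacement from a finite population, the spread of the sample means is smaller. It does not demonstrate the approach to normality.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [3, 4, 5, 6, 7]
mu, sigma = mean(population), pstdev(population)

def all_sample_means(population, n):
    """Means of every ordered sample of size n drawn with replacement."""
    return [mean(sample) for sample in product(population, repeat=n)]

for n in (1, 2, 3, 4):
    means = all_sample_means(population, n)
    # Columns: n, mean of the sample means, their standard deviation, sigma / sqrt(n).
    print(n, round(mean(means), 6), round(pstdev(means), 6), round(sigma / sqrt(n), 6))
```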
 
THE STANDARD ERROR OF THE MEAN
 The value we will be working with is the
standard deviation of the distribution of
sample means, and it is called the standard error
of the mean, σM
 Remember the sampling error
 There typically will be some error between the
sample and the population
 The σM measures how much difference
should be expected on average between a
sample mean M and the population mean μ
 
The MAGNITUDE of THE σM
 Determined by two factors:
○ The size of the sample, and
○ The standard deviation of the population
from which the sample is selected
σM = σ / √n
 
PROBABILITY AND THE DISTRIBUTION
OF SAMPLE MEANS
 The primary use of the distribution
of sample means is to find the probability
associated with any specific sample
 Because the distribution of sample means
presents the entire set of all possible Ms, we can
use proportions of this distribution to
determine probabilities
 
EXAMPLE
 The population of scores on the SAT forms a
normal distribution with μ = 500 and σ = 100.
If you take a random sample of n = 16
students, what is the probability that the sample
mean will be greater than M = 540?
σM = σ / √n = 100 / √16 = 25
z = (M - μ) / σM = (540 - 500) / 25 = 1.6
z = 1.6 → Area C → p = .0548
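The same calculation can be sketched in Python. The numbers are those of the SAT example; `statistics.NormalDist` stands in for the unit normal table, and the variable names are chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n, M = 500, 100, 16, 540

standard_error = sigma / sqrt(n)       # sigma_M = 100 / 4 = 25.0
z = (M - mu) / standard_error          # (540 - 500) / 25 = 1.6
p = 1 - NormalDist().cdf(z)            # proportion of sample means above M

print(standard_error, z, round(p, 4))  # 25.0 1.6 0.0548
```

So only about 5.5% of all possible samples of 16 students would be expected to have a mean above 540.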
 
Types of Sampling
Random/probability sampling designs
 For a random or probability sampling
design, every element in the population
must have an equal and independent chance
of being selected for the sample.
 
There are two advantages of random/probability
samples:
1. Because they represent the total sampling
population, inferences drawn from such
samples can be generalized to the
total sampling population.
2. Statistical tests based on
probability theory can be applied only
to data collected from random
samples.
 
Methods of drawing a random sample
 The fishbowl draw: if the total population is
small, an easy procedure is to
write each element on its own slip of paper,
put the slips in a box,
and draw them one at a time without looking, until
the number of slips drawn equals the
predetermined sample size
 
Methods of drawing a random sample
 A computer program
 A table of random numbers: most research
methodology and statistics books include a table
of random numbers in their appendices. A sample can be
selected using the table according to the
prescribed procedure
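As one possible illustration of the computer-program method, the sketch below draws a simple random sample with Python's standard `random` module. The sampling frame of 50 numbered elements, the sample size of 10, and the seed value are all assumptions made for this example.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)   # fixing the seed makes the draw repeatable
    return rng.sample(population, sample_size)

# Hypothetical sampling frame: 50 numbered elements, like 50 slips in a fishbowl.
frame = list(range(1, 51))
sample = simple_random_sample(frame, 10, seed=2011)
print(sorted(sample))
```

Like the fishbowl draw, `random.sample` selects without replacement, so no element can appear in the sample twice.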