Central Tendency
CENTRAL TENDENCY:
• A statistical measure that identifies a single
score that is most typical or representative of
the entire group
• Usually, a value that reflects the middle of
the distribution is used, because this is where
most of the scores pile up
• No single measure of central tendency works
best in all circumstances, so there are 3
different measures -- mean, median, and
mode. Each works best in a specific
situation
Central Tendency
Examples:
• The average height of men in Montgomery County is 5’ 6”.
• The average salary in ACME Inc is $57,000.
[Figure: ACME Inc salary chart. Salaries shown: $450,000; $150,000; $100,000; $57,000 (arithmetical average); $50,000; $37,000; $30,000 (median: the one in the middle, 12 above him, 12 below); $20,000 (mode: occurs most frequently)]
Central Tendency (Mode)
MODE:
• The score or category that has the greatest frequency; the
most common score
• To find the mode, simply locate the score that appears most
often
– In a frequency distribution table, it will be the score with the
largest frequency value
– In a frequency graph, it will be the tallest bar or point
Example: A sample of class ages is given. . .
Ages    f
23      1
22      0
21      2
20      0
19      3
18      2
* The age with the highest frequency is 19, with a frequency
of 3; therefore, the mode is 19.
Central Tendency (Mode)
• A distribution may have more than one mode,
or peak: A distribution with 2 modes is said
to be bimodal; A distribution with more than
2 modes is said to be multimodal
Example: A sample of class ages. . .
Age     f
23      1
22      3
21      1
20      1
19      3
18      2
* Age 22 and age 19 both have a frequency of 3; if this
distribution were graphed, there would be 2 peaks; therefore
this distribution is bimodal -- both 22 and 19 are modes
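A minimal sketch of finding the mode(s) from raw scores, assuming the ages are expanded from the two frequency tables above. The function name `modes` is illustrative, not part of the original slides.

```python
from collections import Counter

def modes(scores):
    """Return every score tied for the highest frequency, sorted."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(score for score, f in counts.items() if f == top)

# Ages expanded from the two frequency tables (score repeated f times).
unimodal_ages = [23] + [21] * 2 + [19] * 3 + [18] * 2
bimodal_ages = [23] + [22] * 3 + [21] + [20] + [19] * 3 + [18] * 2

print(modes(unimodal_ages))  # [19]
print(modes(bimodal_ages))   # [19, 22]
```

Returning a list rather than a single value makes the bimodal and multimodal cases explicit.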
Central Tendency (Mode)
Advantages:
• Easiest to determine
• The only measure of central tendency that
can be used with nominal (categorical) data
Disadvantages:
• Sometimes is not a unique point in the
distribution (bimodal or multimodal)
• Not sensitive to the location of scores in a
distribution
• Not often used beyond the descriptive level
Central Tendency (Median)
MEDIAN:
• The score that divides the distribution exactly
in half; 50% of the individuals in a
distribution have scores at or below the
median
Central Tendency (Median)
Method when N is an odd number:
List the scores from lowest to highest; the middle score on the list
is the median
Example: The ages of a sample of class members are 24, 18, 19,
22, and 20. What is the median value?
• List the scores from lowest to highest:
18, 19, 20, 22, 24
• The middle score is 20 - therefore, that is the median
Method when N is an even number:
List scores in order from lowest to highest and locate the point
halfway between the middle two scores
Example: The ages of a sample of class members are 18, 19, 20,
22, 24 and 30. What is the median age?
The scores are already listed from lowest to highest; select the
middle two scores (20, 22) and find the middle point:
$\text{median} = \frac{20 + 22}{2} = 21$
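The odd-N and even-N methods above can be sketched as one function. This is an illustrative version (the name `median` is chosen here, and Python's standard `statistics.median` behaves the same way for these inputs).

```python
def median(scores):
    """Middle score for odd N; midpoint of the two middle scores for even N."""
    ordered = sorted(scores)          # list the scores from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd N: a single middle score exists
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: halfway point

print(median([24, 18, 19, 22, 20]))      # 20
print(median([18, 19, 20, 22, 24, 30]))  # 21.0
```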
Central Tendency (Median)
Advantages:
• Is less affected by extreme scores than the mean is; is
better for skewed distributions
Example: compare 2 samples of class ages
1) 18, 19, 20, 22, 24
2) 18, 19, 20, 22, 47
• The median in both cases is 20 - it’s not thrown off by the
extreme score of 47
• Can be used with ordinal data
• The only index that can be used with open-ended
distributions (distributions without a lower or upper limit
for one of the categories)
Disadvantages:
• Can’t be used with nominal level data
• Not sensitive to the location of all scores within a
distribution
Central Tendency (Mean)
MEAN (µ, x̄):
• The arithmetical average of the scores
• The amount that each individual would receive if
the total (Σx) were divided up equally between
everyone in the distribution
• Computed by adding all of the scores in the
distribution and dividing that sum by the total
number of scores
• Population mean: $\mu = \frac{\sum x}{N}$
• Sample mean: $\bar{x} = \frac{\sum x}{n}$
Central Tendency (Mean)
• Note that, while the computations would yield
the same answer, the symbols differ for a
population (µ, N) and a sample (x̄, n)
Example:
$\bar{x} = \frac{\sum x}{n} = \frac{18 + 19 + 19 + 21 + 23}{5} = \frac{100}{5} = 20$
Central Tendency (Mean)
Advantages
• Sensitive to the location of every score in a
distribution
• Least sensitive to sample fluctuation (if we were to
take several samples, these sample means would
differ less than if we compared the medians or the
modes from the samples)
Disadvantages
• May only be used with interval or ratio level data
• Sensitive to extreme scores, and therefore may not
be desirable when working with highly skewed
distributions
Example: compare 2 samples of class ages
1) 18, 19, 20, 22, 24: $\bar{x}_1 = 20.6$
2) 18, 19, 20, 22, 47: $\bar{x}_2 = 25.2$
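As a quick check of how one extreme score affects each measure, the sketch below compares the mean and the median of the same two samples, using Python's standard `statistics` module. The variable names are illustrative.

```python
from statistics import mean, median

sample_1 = [18, 19, 20, 22, 24]
sample_2 = [18, 19, 20, 22, 47]  # same scores, but one extreme value

print(mean(sample_1))    # 20.6
print(mean(sample_2))    # 25.2
print(median(sample_1))  # 20
print(median(sample_2))  # 20
```

The mean moves by 4.6 points while the median stays at 20, which is why the median is usually preferred for highly skewed data.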
Central Tendency
• Select the method of central tendency that
gives you the most information, yet is
appropriate for the type of data you have. In
general, use the mean if it’s appropriate. If
your data are skewed, or if they are not
measured on an interval or ratio scale, use
the median. If your data are measured on a
nominal scale, the mode is the only
appropriate measure of central tendency
Selecting the best measure
of central tendency
• The mean, median, and mode for a ‘normal’ distribution are the same
[Figure: symmetrical, single-peaked distribution with the mean, median, and mode at the same point]
• A distribution can be symmetrical and “bi-modal”: the mean and the median are the same, but it has two modes
[Figure: symmetrical bimodal distribution with the mean and median marked, and a mode at each peak]
• It is also possible to have a distribution with the same mean and median, but no mode, as in the case of a rectangular distribution
[Figure: rectangular distribution with the mean and median marked]
Selecting the best measure
of central tendency
• For positively skewed distributions the mode would be the lowest, followed by the median, then the mean
[Figure: positively skewed distribution with the mode, median, and mean marked]
• For negatively skewed distributions the mean would be the lowest, followed by the median, then the mode
[Figure: negatively skewed distribution with the mean, median, and mode marked]
Measures of Variability
VARIABILITY (a.k.a. “Spread”):
• The degree to which scores in a distribution
are spread out or clustered together
• This is important because we need to not
only know what the average score is in a
distribution, we also need to know how near
or far the majority of the scores are in
relation to this central value
• Measures of variability include: range,
sums of squares (including deviation
and mean deviation scores), variance,
and standard deviation
Measures of Variability (Range)
RANGE: The distance between the largest score and the
smallest score in the distribution
There are 2 methods for computing the range:
1) Subtract the lower real limit (LRL) for the lowest score
in the distribution (x_min) from the upper real limit (URL)
for the highest score (x_max)
$\text{Range} = \text{URL}_{x_{max}} - \text{LRL}_{x_{min}}$
2) Subtract the minimum score from the maximum score
and add 1 to the difference
$\text{Range} = x_{max} - x_{min} + 1$
Example: find the range for the following set of scores: 6, 6, 8, 9, 10
$x_{max} = 10$, $x_{min} = 6$, $\text{URL}_{x_{max}} = 10.5$, $\text{LRL}_{x_{min}} = 5.5$
Range = 10 - 6 + 1 = 5 or Range = 10.5 - 5.5 = 5
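The two methods can be sketched side by side. The function name and the `unit` parameter are assumptions added for illustration: `unit` is the measurement precision, taken to be 1 for whole-number scores, so each real limit lies half a unit beyond the score.

```python
def range_from_real_limits(scores, unit=1):
    """Method 1: upper real limit of x_max minus lower real limit of x_min."""
    upper_real_limit = max(scores) + unit / 2
    lower_real_limit = min(scores) - unit / 2
    return upper_real_limit - lower_real_limit

scores = [6, 6, 8, 9, 10]
print(range_from_real_limits(scores))  # 5.0
print(max(scores) - min(scores) + 1)   # 5  (method 2, whole-number scores)
```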
Measures of Variability (Range)
Advantages:
• Easy to obtain
• Gives a quick approximation of variability
Disadvantages:
• Only sensitive to the 2 extreme scores -- insensitive to all intermediate scores
– For 2 sets of scores, 1) 1, 8, 9, 9, 10, and 2) 1, 3, 5, 7, 10,
the range is identical although the scores are distributed
very differently
• Substantial sample fluctuation -- can easily
change from sample to sample
• Little used beyond the descriptive level
Measures of Variability
• Deviation: score - mean
$(X_i - \bar{X})$
• Mean deviation: the average absolute deviation score
$\frac{\sum |X_i - \bar{X}|}{n}$
• Sums of Squares: The sum of the squared deviations
around the mean
$SS = \sum (x - \mu)^2$
Measures of Var. (Sums of Squares)
Example: Two sets of quiz scores:
Set 1
x       x-µ     (x-µ)²
1       -4      16
3       -2      4
4       -1      1
7       2       4
7       2       4
8       3       9
∑x=30   0       SS=38
µ=5.00
Set 2
x       x-µ     (x-µ)²
4       -1      1
5       0       0
5       0       0
5       0       0
5       0       0
6       1       1
∑x=30   0       SS=2
µ=5.00
Both distributions have the same mean, but the actual
scores are dispersed differently
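A short sketch of the sums-of-squares computation for the two sets of quiz scores. The names `sum_of_squares`, `quiz_1`, and `quiz_2` are illustrative.

```python
def sum_of_squares(scores):
    """SS: the sum of squared deviations around the mean."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores)

quiz_1 = [1, 3, 4, 7, 7, 8]
quiz_2 = [4, 5, 5, 5, 5, 6]

print(sum_of_squares(quiz_1))  # 38.0
print(sum_of_squares(quiz_2))  # 2.0
```

Both sets have a mean of 5, yet SS differs by a factor of 19, which reflects how much more widely the first set is dispersed.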
Measures of Variability (Variance)
VARIANCE:
• The average squared deviation from the
mean
• Provides a control for sample size (as N
increases, SS will naturally increase)
• σ² denotes population variance
Formulae:
$\sigma^2 = \frac{SS}{N}$ or $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
From the previous example (N = 6 in both sets),
$\sigma^2 = \frac{38}{6} = 6.333$ and $\sigma^2 = \frac{2}{6} = 0.333$
Measures of Var. (Standard Deviation)
STANDARD DEVIATION:
• Measure of variability that approximates the
average deviation (distance from the mean)
for a given set of scores
• σ denotes population standard deviation
Definitional formula:
$\sigma = \sqrt{\sigma^2}$
For the first problem in the previous example,
$\sigma = \sqrt{6.333} = 2.517$
• Our scores differ from the mean by an average
of about 2.517 points
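A minimal sketch of the population formulas, applied to the two sets of six quiz scores from the sums-of-squares example. The function names are illustrative. Both sets have N = 6, so the divisor is 6 in each case.

```python
from math import sqrt

def population_variance(scores):
    """sigma squared = SS / N."""
    n = len(scores)
    mu = sum(scores) / n
    return sum((x - mu) ** 2 for x in scores) / n

def population_sd(scores):
    """sigma = square root of the population variance."""
    return sqrt(population_variance(scores))

quiz_1 = [1, 3, 4, 7, 7, 8]
quiz_2 = [4, 5, 5, 5, 5, 6]

print(round(population_variance(quiz_1), 3))  # 6.333
print(round(population_sd(quiz_1), 3))        # 2.517
print(round(population_variance(quiz_2), 3))  # 0.333
print(round(population_sd(quiz_2), 3))        # 0.577
```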
Measures of Var. (Standard Deviation)
Properties of the standard deviation:
• Standard deviation provides a measure of the
average distance from the mean
• When the standard deviation is small, the
scores are close to the mean (the curve is
narrow), and when the standard deviation is
large, scores are typically spread out farther
from the mean (the curve is wide)
• Standard deviation is a very important
component of inferential statistics
Measures of Var. (Standard Deviation)
• If a constant is added to or subtracted from each
score, the standard deviation does not
change
• For example, if an instructor chooses to
“curve” a set of test scores by adding 10
points to each score, the distance between
individual scores doesn’t change. All of the
scores are just shifted up 10 points.
• The mean would increase by 10 points.
However, the standard deviation remains the
same
Measures of Var. (Standard Deviation)
• If each score in a distribution is multiplied or
divided by a constant, the standard deviation of
that distribution would also be multiplied or
divided by the same constant
• Thus, if an instructor changes a 50-point exam
into a 100-point exam by multiplying everyone’s
score by 2, the “spread” of the scores also is
multiplied by 2. For example, on the old scale,
the scores could range from 27-50 (a difference
of 23 points), while on the new scale the scores
range from 54-100 (a difference of 46 points);
note that this difference is the range, not the standard deviation
• In this example, the standard deviation would be
multiplied by 2.
Measures of Var. (Standard Deviation)
Advantages:
• Sensitive to the location of all scores
• Less sample fluctuation -- changes less from
sample to sample
• Widely used in both descriptive and advanced
statistical procedures
Disadvantages:
• Sensitive to extreme scores -- highly skewed
distributions can have a negative impact
• Both the range and standard deviation can
only be applied to interval or ratio level scales
of measurement
Measures of Variability
Population vs. Sample variability
• The variance and standard deviation formulas
we have examined so far are population
formulas. These tend to underestimate the
population variability when used on a sample;
in other words, these are biased statistics
• Thus, when we are computing variances and
standard deviations on samples, we correct
for this bias by altering the formula; the
corrected formula provides a more accurate
estimate of the population values
Measures of Variability
Variance formula for a sample estimating a
population:
$s^2 = \frac{SS}{n - 1}$
Standard deviation formula for a sample
estimating a population:
$s = \sqrt{\frac{SS}{n - 1}}$ or $s = \sqrt{s^2}$
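The sample formulas differ from the population formulas only in the divisor. The sketch below treats six earlier quiz scores as a sample purely for illustration. The function names are assumptions, and Python's `statistics.variance` and `statistics.stdev` use the same n − 1 divisor.

```python
from math import sqrt

def sample_variance(scores):
    """s squared = SS / (n - 1), the corrected estimate of population variance."""
    n = len(scores)
    x_bar = sum(scores) / n
    ss = sum((x - x_bar) ** 2 for x in scores)
    return ss / (n - 1)

def sample_sd(scores):
    """s = square root of the sample variance."""
    return sqrt(sample_variance(scores))

sample = [1, 3, 4, 7, 7, 8]  # illustrative: SS = 38, n = 6

print(round(sample_variance(sample), 3))  # 7.6
print(round(sample_sd(sample), 3))        # 2.757
```

Dividing SS = 38 by n − 1 = 5 gives 7.6, larger than the 6.333 obtained with the population divisor of 6, which is the direction of the bias correction.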
Degrees of Freedom
• we know that in order to calculate variance we
must know the mean ($\bar{X}$)
• this limits the number of scores that are free to
vary
• degrees of freedom (df) are defined as the
number of scores in a sample that are free to
vary
• $df = n - 1$, where n is the number of
scores in the sample
Degrees of Freedom Cont.
Picture Example
•There are five balloons:
one blue, one red, one
yellow, one pink, & one
green.
•If 5 students (n=5) are
each to select one
balloon only 4 will have
a choice of color (df=4).
The last person will get
whatever color is left.
Degrees of Freedom Cont.
Statistical Example
• Given that there are 5 students (n = 5) with
a mean score of 10 ($\bar{X} = 10$)
• There are four degrees of freedom
df = n − 1 = 5 − 1 = 4
• In other words, four of the scores are free to
vary, but the fifth is determined by the mean
• If we make the first four scores 9, 10, 11, &
12, then the fifth score must be 8.
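The constraint can be sketched directly: once the mean and n − 1 scores are fixed, the total n × mean determines the remaining score. The function name `last_score` is illustrative.

```python
def last_score(free_scores, n, mean):
    """Return the one score that is fixed once n - 1 scores and the mean are known."""
    if len(free_scores) != n - 1:
        raise ValueError("expected exactly n - 1 freely chosen scores")
    return n * mean - sum(free_scores)

print(last_score([9, 10, 11, 12], n=5, mean=10))  # 8
```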
Measures of Variability
Statistical term      population value              sample value
Mean                  $\mu = \frac{\sum x}{N}$      $\bar{x} = \frac{\sum x}{n}$
Variance              $\sigma^2 = \frac{SS}{N}$     $s^2 = \frac{SS}{n - 1}$
Standard deviation    $\sigma = \sqrt{\sigma^2}$    $s = \sqrt{s^2}$
Measures of Variability
Choosing a Measure of Variability:
• When selecting a measure of variability,
choose the one that gives you the most
information, but is appropriate for your data
situation. In general, use the standard
deviation. If your data are skewed, or if they
are not measured on an interval or ratio
scale, use the range
• (Sums of squares and variance are intermediate
steps toward the standard deviation - they’re
used to compute the standard deviation, but
provide little useful information on their own.)
Standard Scores and Distributions
Standard scores:
• Transform individual scores (raw scores) into
standard (transformed) scores that give a precise
description of where the scores fall within a
distribution
• Use standard deviation units to describe the location
of a score within a distribution
• when tests are said to be ‘curved’, the scores are
transformed
• one way of transforming scores is to add (or
subtract) a constant to each score
• when a constant is added or subtracted the mean will
also change the same amount as the rest of the
scores, but the standard deviation will be unaffected
Transformations of scores
X       X+3
3       6
4       7
5       8
6       9
• the mean of the X distribution is 4.5
and the standard deviation is 1.29
• the mean of the X+3 distribution is 7.5
and the standard deviation is still 1.29
• When a constant is added to or subtracted from
every score in a distribution, the shape
of the distribution does not change; it simply
shifts along the x-axis.
Transformations of scores
• a second way of transforming scores is to multiply (or
divide) every score in the distribution by a constant
• this changes both the mean and the standard
deviation by the same factor
X       X(3)
3       9
4       12
5       15
6       18
• the mean of the X’s is 4.5 and the
standard deviation is 1.29
• the mean of the X’s multiplied by 3
is 13.5 and the standard deviation is 3.87
• When every score in a distribution is
multiplied or divided by a constant, the
distribution is stretched or compressed
along the x-axis, so its spread changes.
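Both transformations can be checked numerically with the four scores 3, 4, 5, 6. This sketch uses `statistics.stdev`, which divides by n − 1. That choice is an assumption made because it reproduces the 1.29 and 3.87 quoted on the slides.

```python
from statistics import mean, stdev

x = [3, 4, 5, 6]
shifted = [v + 3 for v in x]  # add a constant to every score
scaled = [v * 3 for v in x]   # multiply every score by a constant

print(mean(x), round(stdev(x), 2))              # 4.5 1.29
print(mean(shifted), round(stdev(shifted), 2))  # 7.5 1.29
print(mean(scaled), round(stdev(scaled), 2))    # 13.5 3.87
```

Adding 3 moves the mean by 3 and leaves the standard deviation alone, while multiplying by 3 multiplies both by 3.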
Standard Scores and Distributions
STANDARDIZED DISTRIBUTIONS: are
composed of transformed scores with
predetermined values for µ and σ (regardless
of the values in the raw score distribution)
Examples:
• IQ scores are standardized with a µ=100 and
σ = 15
• SAT scores are standardized with a µ=500
and a σ =100
Standard Scores and Distributions
Z-scores:
• Standard scores that specify the precise
location of each raw score in a normal
distribution in terms of standard deviation
units
• Consist of 2 parts:
– The sign (+ or -) indicates whether the
score is located above or below the mean
– The magnitude of the actual number
indicates how far the score is from the
mean in terms of standard deviations
Standard Scores and
Distributions
Examples:
• In a distribution of test scores with µ=100, σ =15,
what is the z-score for a score of 130? For a score of
85?
– With a score of 130, z=2; it is a positive z-score
because 130 is above the mean -- it’s higher than
100; the magnitude is 2 because we can add
exactly 2 standard deviations to the mean
(100+15+15) and obtain our score (130)
– With a score of 85, z=-1; it is below the mean
(making it a negative z-score) and it is exactly 1
standard deviation from the mean (100-15=85)
Standard Scores and Distributions
Formula: $z = \frac{x - \mu}{\sigma}$
• The numerator is a deviation (distance) score
that indicates how far away from the mean
your score of interest is, thus providing the
sign (+ or -) of the z-score
• Dividing by σ expresses the “distance” score
in standard deviation units (a z-score) -- this
works the same way as if you knew the
number of quarts a bucket held, but you
wanted to express that amount in terms of
gallons -- you just divide the number of
quarts by four to get the number of gallons
Standard Scores and
Distributions
Examples:
• A distribution of exam scores has µ=25 and σ =3.6.
You scored 29. What is your z-score?
$z = \frac{x - \mu}{\sigma} = \frac{29 - 25}{3.6} = 1.11$
– Thus, you scored 1.11 standard deviations above
the mean
• What is your z-score if you scored 22 on the exam?
$z = \frac{x - \mu}{\sigma} = \frac{22 - 25}{3.6} = -0.83$
– This time, you fell 0.83 standard deviations below the
mean
Standard Scores and Distributions
You can also calculate a person’s raw score (x) when
you are provided with a z-score
Formula:
x = µ + zσ
Example: A person has a z-score of 1.5 for the SAT
math test (µ=500, σ =100). What is his raw score?
X = 500+1.5(100)
= 500 + 150 = 650
Thus, his z-score indicates that his SAT math score
was 650
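The two conversions are inverses of each other and can be sketched as a pair of small functions. The names `z_score` and `raw_score` are illustrative.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, in standard deviation units."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Inverse conversion: x = mu + z * sigma."""
    return mu + z * sigma

print(round(z_score(29, 25, 3.6), 2))  # 1.11
print(round(z_score(22, 25, 3.6), 2))  # -0.83
print(raw_score(1.5, 500, 100))        # 650.0
```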
Standard Scores and Distributions
Characteristics of a z-score distribution:
• Shape: The distribution will be exactly the
same shape as the distribution of raw scores
• Mean: The mean is always 0, regardless of
the raw score distribution
• Standard deviation: The standard deviation
of a z-score distribution will always be 1,
regardless of the raw score distribution
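These properties can be verified on any set of raw scores. The sketch below uses six arbitrary scores treated as a population, with `statistics.pstdev` (the population standard deviation). The data are illustrative only.

```python
from statistics import mean, pstdev

raw = [1, 3, 4, 7, 7, 8]             # illustrative raw scores
mu, sigma = mean(raw), pstdev(raw)   # population mean and standard deviation

# Convert every raw score to a z-score.
z = [(x - mu) / sigma for x in raw]

# Up to floating-point rounding, the z-scores have mean 0 and SD 1.
print(abs(mean(z)) < 1e-9)
print(abs(pstdev(z) - 1) < 1e-9)
```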
Standard Scores and Distributions
Using z-scores for making comparisons:
• One benefit to using z-scores is that they allow comparisons
between distributions with different characteristics by
providing a standard metric or scale (standard deviation units)
EXAMPLE: A student scores 29 on a statistics exam (µ=24, σ =3),
and a 50 on a biology exam (µ=50, σ =5). On which
exam did the student perform better?
• By standardizing scores using standard deviation units, we
can compare scores in 2 completely different distributions
(compare apples and oranges); Simply convert both raw
scores into z-scores, then compare the z-scores to each other
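Statistics: z = (29-24)/3 = 1.67
Biology: z = (50-50)/5 = 0
• The statistics z-score is higher, so relative to each class the student
performed better on the statistics exam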
Standard Scores and Distributions
Other standardized distributions:
• Many people don’t like the fact that z-score
distributions have negative scores and
decimal places, so they use other, similar
standardized distributions which avoid the
negative connotations of a “-” sign
• For example:
– IQ scores are viewed in terms of a standardized distribution with µ=100, σ=15
– the T-score distribution (which we’ll learn more about
later) has µ=50, σ=10
Standard Scores and Distributions
How do we standardize raw scores into a distribution we want?
EXAMPLE: A set of exam scores has µ=43, σ =4. We would
like to create a new standard distribution with µ=60, σ =20.
What would the new standardized value be for a score of 41 on
the exam?
First, change the raw score into a z-score (using the
procedure we just learned about)
$z = \frac{x - \mu}{\sigma} = \frac{41 - 43}{4} = -0.50$
Second, change the z-score into the new standardized score
$x = \mu + z\sigma$
$\text{Std}_{new} = \mu_{new} + z\sigma_{new}$
$\text{Std}_{new} = 60 + (-0.50)(20) = 50$
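The two-step procedure can be sketched as a single function. The name `restandardize` and its parameter names are illustrative.

```python
def restandardize(x, mu_old, sigma_old, mu_new, sigma_new):
    """Map a raw score onto a distribution with a chosen mean and SD."""
    z = (x - mu_old) / sigma_old   # step 1: raw score -> z-score
    return mu_new + z * sigma_new  # step 2: z-score -> new standardized score

print(restandardize(41, 43, 4, 60, 20))  # 50.0
```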
The Normal Distribution
• the normal distribution is not a single
distribution, rather it is an infinite set of
distributions that can be described
using the mean ($\bar{X}$) and standard
deviation (s).
• the shape of the distribution describes
many existing variables, e.g., weight
The Normal Distribution
• by definition the area under a normal
distribution = 1.0
• the normal shape can also be used to
determine the proportion of an area in the
distribution
• for example, the area between the mean and
one standard deviation is about 34%
[Figure: normal curve with the x-axis marked at −2s, −1s, µ, 1s, 2s; the area between µ and 1s is labeled 34%]
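The Normal Distribution
[Figure: normal curve divided at each standard deviation. Areas: 34% between the mean and 1 standard deviation on each side; 13.5% between 1 and 2 standard deviations on each side; 2.5% beyond 2 standard deviations in each tail; 50% in each half; 68% within 1 standard deviation of the mean; 95% within 2 standard deviations]
• each line represents 1 standard deviation; the
percentages refer to the entire shaded area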
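The rounded percentages for the normal curve can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper name `area_between` is illustrative.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0, sigma=1)

def area_between(z_low, z_high):
    """Proportion of a normal distribution lying between two z-scores."""
    return unit_normal.cdf(z_high) - unit_normal.cdf(z_low)

print(round(area_between(-1, 1), 4))  # 0.6827  (about 68%)
print(round(area_between(-2, 2), 4))  # 0.9545  (about 95%)
print(round(area_between(0, 1), 4))   # 0.3413  (about 34%)
print(round(area_between(1, 2), 4))   # 0.1359  (about 13.5%)
```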
The Normal Distribution
• the areas have been calculated for all z-scores
• remember that z-scores transform raw
scores into the number of standard
deviations they are away from the mean
• using z-scores allows us to determine
proportions or probabilities for normal
distributions
The Unit Normal Table
• the unit normal table is a table that contains
proportions in a normal distribution
associated with z-score values
• See Table A from Pagano
• there are two important things to note
– the table includes the area in the “body” and in
the “tail”
– there are no negative z-values
The Unit Normal Table
• remember that the normal distribution
is symmetrical, because of this the
proportion will be the same whether the
score is positive or negative
[Figure: normal curve with a tail area of .0062 marked below z = -2.5 and above z = 2.5]
• whether the z-score is 2.5 or -2.5, the area beyond, or
the area in the tail, is still .0062
The Unit Normal Table
• when you are dealing with the unit
normal table it is sometimes confusing
whether you are looking at the tail or
the body, especially when you have a
negative z-score
• perhaps the best way to deal with this
confusion is to draw a picture and shade
the area you are looking for
The Unit Normal Table
What proportion of people had a score higher than
z=1.5?
1. draw a picture of the area you are looking for
2. draw the line where the z score is located (z=1.5)
3. shade the area you are asked for (people who scored
higher, so shade right of the line)
4. then look up in the table the area in the tail for z=1.5
ANSWER: the proportion of people with a z-score higher
than 1.5 is .0668
[Figure: normal curve with the area to the right of z=1.5 shaded; proportion=.0668]
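In place of a printed table, the tail and body areas can be computed with `statistics.NormalDist` from the Python standard library. This is a sketch, and the function names `body` and `tail` are illustrative labels borrowed from the table's column names.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

def body(z):
    """Larger portion of the curve cut off by a z-score of this magnitude."""
    return unit_normal.cdf(abs(z))

def tail(z):
    """Smaller portion of the curve beyond a z-score of this magnitude."""
    return 1 - body(z)

print(round(tail(1.5), 4))   # 0.0668
print(round(tail(2.5), 4))   # 0.0062
print(round(tail(-2.5), 4))  # 0.0062  (symmetry: the sign does not matter)
```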
The Unit Normal Table
You can also use the table to determine areas between scores,
e.g.
What proportion of people are between z=-.5 & z=.5?
1. Draw the picture of the area you are looking for (We know that the
whole area under the curve equals 1)
2. Draw the lines where the z-scores are located
3. Shade the area you are asked for
4. The unshaded areas are the tails provided in the table: Area A= .31, Area
B= .31.
So the shaded area equals 1 - Area A - Area B = 1 - .31 - .31 = .38
ANSWER: the proportion of people between z=-.5 & z=.5 is 0.38
[Figure: normal curve with lines at z= -.5 and z= .5; unshaded tails Area A= .31 and Area B= .31; shaded middle proportion= .38]
The Unit Normal Table
Another method would be
1. Draw the picture of the area you are looking for
2. Draw the lines where the z-scores are located
3. Shade the area for the body of z= .5 (Area C= .69)
4. Shade the area for the tail of z= -.5 (Area D= .31)
5. Subtract the tail of z=-.5 from the body of z=.5
Area C - Area D = .69 - .31 = .38
ANSWER: the proportion of people between z=-.5 & z=.5 is 0.38
[Figure: normal curves showing Area C= .69 (body of z= .5), Area D= .31 (tail of z= -.5), and their difference, proportion= .38]
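Both methods for the area between z = -.5 and z = .5 can be checked with `statistics.NormalDist`. This is a sketch with illustrative variable names. It uses unrounded areas, so the result is .3829 rather than the .38 obtained from the two-decimal table values.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1

tail_upper = 1 - unit_normal.cdf(0.5)  # tail beyond z = .5
tail_lower = unit_normal.cdf(-0.5)     # tail beyond z = -.5
body_upper = unit_normal.cdf(0.5)      # body of z = .5

method_1 = 1 - tail_upper - tail_lower  # whole area minus both tails
method_2 = body_upper - tail_lower      # body of z = .5 minus tail of z = -.5

print(round(method_1, 4))  # 0.3829
print(round(method_2, 4))  # 0.3829
```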