Statistics for the Social Sciences
Psychology 340
Spring 2010
Describing Distributions
PSY 340
Statistics for the
Social Sciences
Announcements
• Homework #1: will accept these on Th (Jan
21) without penalty
• Quiz problems
– Quiz 1 is now posted, due date extended to Tu,
Jan 26th (by 11:00)
• Don’t forget Homework 2 is due Tu (Jan 26)
Outline (for week)
• Characteristics of Distributions
– Finishing up using graphs
– Using numbers (center and variability)
• Descriptive statistics decision tree
• Locating scores: z-scores and other transformations
Distributions
• Three basic characteristics are used to describe distributions
– Shape
• Many different ways to display distribution: frequency distribution table, graphs
– Center
– Variability
Shapes of Frequency Distributions
• Unimodal, bimodal, and rectangular
• Symmetrical and skewed distributions (positively or negatively skewed)
• Normal and kurtotic distributions
Frequency Graphs
• Histogram
– Plot the different values against the frequency of each value
Frequency Graphs
• Histogram by hand
– Step 1: make a frequency distribution table (may use grouped frequency tables)
– Step 2: put the values along the bottom, left to right, lowest to highest
– Step 3: make a scale of frequencies along left edge
– Step 4: make a bar above each value with a height for the frequency of that value
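The four steps above can be sketched in a few lines of Python. This is only an illustration: the function names and the example scores are made up for this sketch (they are not from the course materials), and the bars are drawn sideways as text rather than upward.

```python
from collections import Counter

def frequency_table(scores):
    """Step 1: return (value, frequency) pairs, sorted lowest to highest."""
    counts = Counter(scores)
    return sorted(counts.items())

def text_histogram(scores):
    """Steps 2-4: one line per value; the bar length is that value's frequency."""
    return [f"{value:>3} | {'#' * freq}" for value, freq in frequency_table(scores)]

# Hypothetical scores, chosen only for illustration
example_scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]

for line in text_histogram(example_scores):
    print(line)
```

For grouped frequency tables, the same idea would apply after first mapping each score to its interval.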
Frequency Graphs
• Histogram using SPSS (create one for class height)
– Graphs -> Legacy -> histogram
– Enter your variable into ‘variable’
– To change interval width, double click the graph to get into the chart editor, and then double click the bottom axis. Click on ‘scale’ and change the intervals to desired widths
– Note: you can also get one from the descriptive statistics frequency menu under the ‘charts’ option
Frequency Graphs
• Frequency polygon - essentially the same, but uses lines instead of bars
Displaying two variables
• Bar graphs
– Can be used in a number of ways (including displaying one or more variables)
– Best used for categorical variables
• Scatterplots
– Best used for continuous variables
Bar graphs
• Plot a bar graph of men and women in the class
– Graphs -> bar
– Simple, click define
– N-cases (the default)
– Enter Gender into Category axis, click ‘okay’
Bar graphs
• Plot a bar graph of shoes in closet crossed with men and women
– What should we plot? (and why?)
• Average number of shoes for each group?
– Graphs -> bar
– Simple, click define
– Other statistic (default is ‘mean’): enter pairs of shoes
– Enter Gender into Category axis, click ‘okay’
Scatterplot
• Useful for seeing the relationship between the variables
– Graphs -> Legacy Dialogs
– Scatter/Dot
– Simple Scatter, click ‘define’
– Enter your X & Y variables, click ‘okay’
• Can add a ‘fit line’ in the chart editor
• Plot a scatterplot of soda and bottled water drinking
Describing distributions
• Distributions are typically described with three
properties:
– Shape: unimodal, symmetric, skewed, etc.
– Center: mean, median, mode
– Spread (variability): standard deviation, variance
Which center when?
• Depends on a number of factors, like scale of
measurement and shape.
– The mean is the most preferred measure and it is closely
related to measures of variability
– However, there are times when the mean isn’t the
appropriate measure.
Which center when?
• Use the median if:
– The distribution is skewed
– The distribution is ‘open-ended’ (e.g. your top answer on your questionnaire is ‘5 or more’)
– Data are on an ordinal scale (rankings)
• Use the mode if:
– The data are on a nominal scale
– The distribution is multi-modal
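To see why the median is preferred for skewed distributions, here is a small sketch using Python's standard `statistics` module. The scores are hypothetical (not from the slides) and were picked so that one extreme value sits far above the rest.

```python
import statistics

# Hypothetical, positively skewed data: most scores are small, one is very large.
skewed_scores = [1, 2, 2, 3, 3, 3, 50]

mean_value = statistics.mean(skewed_scores)      # pulled toward the extreme score
median_value = statistics.median(skewed_scores)  # stays with the bulk of the scores

print(f"mean = {mean_value:.2f}, median = {median_value}")
```

In this made-up example the single extreme score drags the mean well above where most of the scores fall, while the median is unaffected by how extreme that score is.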
The Mean
• The most commonly used measure of center
• The arithmetic average
– Computing the mean
– The formula for the population mean is (a parameter): $\mu = \frac{\sum X}{N}$ (add up all of the X’s, then divide by the total number in the population)
– The formula for the sample mean is (a statistic): $\bar{X} = \frac{\sum X}{n}$ (add up all of the X’s, then divide by the total number in the sample)
• Note: your book uses ‘M’ to denote the mean in formulas
The Mean
• Number of shoes:
– 5, 7, 5, 5, 5
$\bar{X} = \frac{\sum X}{n} = \frac{5 + 7 + 5 + 5 + 5}{5} = \frac{27}{5} = 5.4$
– 30, 11, 12, 20, 14, 12, 15, 8, 6, 8, 10, 15, 25, 6, 35, 20, 20, 20, 25, 15
$\bar{X} = \frac{\sum X}{n} = \frac{327}{20} = 16.35$
• Suppose we want the mean of the entire group? Can we simply add the two means together and divide by 2?
• NO. Why not?
The Weighted Mean
• Number of shoes:
– 5, 7, 5, 5, 5, 30, 11, 12, 20, 14, 12, 15, 8, 6, 8, 10, 15, 25, 6, 35, 20, 20, 20, 25, 15
• Group means: $\bar{X}_1 = 5.4$ ($n_1 = 5$) and $\bar{X}_2 = 16.35$ ($n_2 = 20$)
• Suppose we want the mean of the entire group? Can we simply add the two means together and divide by 2?
• NO. Why not?
– Need to take into account the number of scores in each mean
$\bar{X}_N = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{(5.4 \times 5) + (16.35 \times 20)}{5 + 20} = 14.16$
• Let’s check:
$\bar{X} = \frac{\sum X}{n} = \frac{354}{25} = 14.16$
• Both ways give the same answer
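The check can also be run as a short Python sketch using the shoe counts from the slides. The function names `mean` and `weighted_mean` are illustrative choices for this sketch, not part of the course materials.

```python
def mean(scores):
    """Arithmetic average: add up all the scores, divide by how many there are."""
    return sum(scores) / len(scores)

def weighted_mean(mean1, n1, mean2, n2):
    """Combine two group means, weighting each by its number of scores."""
    return (mean1 * n1 + mean2 * n2) / (n1 + n2)

# Number of shoes, split into the two groups used on the slides
group1 = [5, 7, 5, 5, 5]
group2 = [30, 11, 12, 20, 14, 12, 15, 8, 6, 8, 10, 15, 25, 6, 35, 20, 20, 20, 25, 15]

m1, m2 = mean(group1), mean(group2)
combined = weighted_mean(m1, len(group1), m2, len(group2))  # weighted by group size
direct = mean(group1 + group2)                              # mean of all 25 scores
naive = (m1 + m2) / 2                                       # ignores group sizes

print(round(combined, 2), round(direct, 2), round(naive, 3))
```

The naive average of the two means treats the group of 5 as if it counted as much as the group of 20, which is why it disagrees with the other two values.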


The median
• The median is the score that divides a distribution exactly
in half. Exactly 50% of the individuals in a distribution
have scores at or below the median.
– Case 1: Odd number of scores in the distribution
Step 1: put the scores in order
Step 2: find the middle score
– Case 2: Even number of scores in the distribution
Step 1: put the scores in order
Step 2: find the middle two scores
Step 3: find the arithmetic average of the two middle scores
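The two cases can be written out as a small Python function. This is a sketch of the procedure described above; the function name and the even-length example list are assumptions made for illustration.

```python
def median(scores):
    """Middle score of an ordered list; average of the two middle scores if even."""
    ordered = sorted(scores)              # Step 1: put the scores in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                        # Case 1: odd number of scores
        return ordered[middle]            # Step 2: the single middle score
    # Case 2: even number of scores, so average the two middle scores
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_example = [5, 7, 5, 5, 5]     # five scores
even_example = [2, 4, 6, 8]       # hypothetical four scores

print(median(odd_example), median(even_example))
```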
The mode
• The mode is the score or category that has the greatest frequency.
– So look at your frequency table or graph and pick the value that has the highest frequency.
[Figure: three frequency graphs over the values 1 to 9. In the first, the mode is 5. In the second, the modes are 2 and 8. The third shows a major mode and a minor mode.]
• Note: if one were bigger than the other it would be called the major mode and the other would be the minor mode
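Finding the mode amounts to building a frequency table and picking the value or values with the highest count. A minimal Python sketch follows; the function name and both example lists are hypothetical.

```python
from collections import Counter

def modes(scores):
    """Return every value tied for the highest frequency, sorted low to high."""
    counts = Counter(scores)                  # frequency table: value -> count
    highest = max(counts.values())            # the greatest frequency
    return sorted(value for value, freq in counts.items() if freq == highest)

one_mode = [4, 5, 5, 5, 6]                    # hypothetical unimodal data
two_modes = [2, 2, 2, 5, 8, 8, 8]             # hypothetical bimodal data

print(modes(one_mode), modes(two_modes))
```

Returning a list rather than a single number keeps the bimodal case visible instead of silently reporting only one of the modes.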
Describing distributions
• Distributions are typically described with three
properties:
– Shape: unimodal, symmetric, skewed, etc.
– Center: mean, median, mode
– Spread (variability): standard deviation, variance
Variability of a distribution
• Variability provides a quantitative measure of the degree to
which scores in a distribution are spread out or clustered
together.
– In other words variability refers to the degree of “differentness” of
the scores in the distribution.
• High variability means that
the scores differ by a lot
• Low variability means that the scores
are all similar
Standard deviation
• The standard deviation is the most commonly
used measure of variability.
– The standard deviation measures how far off all of the
scores in the distribution are from the mean of the
distribution.
– Essentially, the typical (average) distance of the scores from the mean.


Computing standard deviation (population)
• Step 1: Compute the deviation scores: Subtract the population mean from every score in the distribution.
– Our population: 2, 4, 6, 8
$\mu = \frac{\sum X}{N} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5.0$
– X - μ = deviation scores
2 - 5 = -3
4 - 5 = -1
6 - 5 = +1
8 - 5 = +3
[Figure: number line from 1 to 10 showing each score’s deviation from the mean]
• Notice that if you add up all of the deviations they must equal 0.
Computing standard deviation (population)
• Step 2: Get rid of the negative signs. Square the deviations and add them together to compute the sum of the squared deviations (SS).
X - μ = deviation scores
2 - 5 = -3
4 - 5 = -1
6 - 5 = +1
8 - 5 = +3
$SS = \sum (X - \mu)^2 = (-3)^2 + (-1)^2 + (+1)^2 + (+3)^2 = 9 + 1 + 1 + 9 = 20$
Computing standard deviation (population)
• Step 3: Compute the Variance (the average of the squared deviations)
• Divide by the number of individuals in the population.
variance = $\sigma^2 = SS/N$
• Note: your book uses ‘SD²’ to denote the variance in formulas
Computing standard deviation (population)
• Step 4: Compute the standard deviation. Take the square root of the population variance.
standard deviation = $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• Note: your book uses ‘SD’ to denote the standard deviation in formulas
Computing standard deviation (population)
• To review:
– Step 1: compute deviation scores
– Step 2: compute the SS
• $SS = \sum (X - \mu)^2$
– Step 3: determine the variance
• take the average of the squared deviations
• divide the SS by the N
– Step 4: determine the standard deviation
• take the square root of the variance
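The four review steps map directly onto a few lines of Python. The sketch below uses the population 2, 4, 6, 8 from the slides; the function name and the choice to return all three intermediate values are assumptions made for illustration.

```python
import math

def population_sd(scores):
    """Return (SS, variance, standard deviation) treating scores as a population."""
    n = len(scores)
    mu = sum(scores) / n                       # population mean
    deviations = [x - mu for x in scores]      # Step 1: deviation scores
    ss = sum(d ** 2 for d in deviations)       # Step 2: sum of squared deviations
    variance = ss / n                          # Step 3: divide SS by N
    sd = math.sqrt(variance)                   # Step 4: square root of the variance
    return ss, variance, sd

population = [2, 4, 6, 8]
ss_pop, var_pop, sd_pop = population_sd(population)
print(ss_pop, var_pop, round(sd_pop, 2))
```

Python's standard library offers `statistics.pstdev` for the same population calculation, which can serve as a cross-check.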
Computing standard deviation (sample)
• The basic procedure is the same.
– Step 1: compute deviation scores
– Step 2: compute the SS
– Step 3: determine the variance
• This step is different
– Step 4: determine the standard deviation
Computing standard deviation (sample)
• Step 1: Compute the deviation scores
– subtract the sample mean from every individual in our distribution.
– Our sample: 2, 4, 6, 8
$\bar{X} = \frac{\sum X}{n} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5.0$
– $X - \bar{X}$ = deviation scores
2 - 5 = -3
4 - 5 = -1
6 - 5 = +1
8 - 5 = +3
[Figure: number line from 1 to 10 with the sample mean $\bar{X}$ marked]
Computing standard deviation (sample)
• Step 2: Determine the sum of the squared deviations (SS).
$X - \bar{X}$ = deviation scores
2 - 5 = -3
4 - 5 = -1
6 - 5 = +1
8 - 5 = +3
$SS = \sum (X - \bar{X})^2 = (-3)^2 + (-1)^2 + (+1)^2 + (+3)^2 = 9 + 1 + 1 + 9 = 20$
Apart from notational differences the procedure is the same as before
Computing standard deviation (sample)
• Step 3: Determine the variance
– Recall: Population variance = $\sigma^2 = SS/N$
– The variability of the samples is typically smaller than the population’s variability
[Figure: sample scores X1, X2, X3, X4 marked on the population distribution]
– To correct for this we divide by (n-1) instead of just n
– Sample variance = $s^2 = \frac{SS}{n - 1}$
Computing standard deviation (sample)
• Step 4: Determine the standard deviation
standard deviation = $s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
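The sample version differs from the population version only in Step 3. Here is a minimal Python sketch using the sample 2, 4, 6, 8 from the slides; the function name is illustrative, and `statistics.stdev` (which also divides by n - 1) is used only as a cross-check.

```python
import math
import statistics

def sample_sd(scores):
    """Return (SS, variance, standard deviation) treating scores as a sample."""
    n = len(scores)
    x_bar = sum(scores) / n                        # sample mean
    ss = sum((x - x_bar) ** 2 for x in scores)     # Steps 1-2: deviations, then SS
    variance = ss / (n - 1)                        # Step 3: divide by n - 1, not n
    return ss, variance, math.sqrt(variance)       # Step 4: square root

sample = [2, 4, 6, 8]
ss, s_squared, s = sample_sd(sample)

print(ss, round(s_squared, 2), round(s, 2))
print(round(statistics.stdev(sample), 2))          # library cross-check
```

Because the divisor is n - 1 rather than n, the sample standard deviation for these four scores comes out larger than the population standard deviation computed from the same numbers.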
Properties of means and standard deviations
• Change/add/delete a given score: mean changes, standard deviation changes
– Changes the total and/or the number of scores, which will change the mean and the standard deviation
$\mu = \frac{\sum X}{N}$
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Properties of means and standard deviations
• Change/add/delete a given score: mean changes, standard deviation changes
• Add/subtract a constant to each score: mean changes, standard deviation does not change
– All of the scores change by the same constant.
– But so does the mean
– It is as if you just pick up the distribution and move it over, but the spread (variability) stays the same
[Figure: the distribution shifted from Xold to Xnew with the same spread]
Properties of means and standard deviations
• Change/add/delete a given score: mean changes, standard deviation changes
• Add/subtract a constant to each score: mean changes, standard deviation does not change
• Multiply/divide each score by a constant: mean changes, standard deviation changes
– Original scores 21 and 23 (mean = 22):
21 - 22 = -1
23 - 22 = +1
$s_{old} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(-1)^2 + (+1)^2}{2 - 1}} = \sqrt{2} \approx 1.41$
– Multiply scores by 2, giving 42 and 46 (mean = 44):
42 - 44 = -2
46 - 44 = +2
$s_{new} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(-2)^2 + (+2)^2}{2 - 1}} = \sqrt{8} \approx 2.83$, twice $s_{old} \approx 1.41$
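These properties are easy to confirm numerically. The sketch below starts from the two scores 21 and 23 used on the slide; the constant 10 for the shift and the helper name `describe` are arbitrary choices made for this illustration.

```python
import statistics

scores = [21, 23]                       # the two scores used on the slide
shifted = [x + 10 for x in scores]      # add a constant (10 is an arbitrary choice)
doubled = [x * 2 for x in scores]       # multiply each score by 2

def describe(values):
    """Return (mean, sample standard deviation) for a list of scores."""
    return statistics.mean(values), statistics.stdev(values)

mean_old, sd_old = describe(scores)
mean_shifted, sd_shifted = describe(shifted)
mean_doubled, sd_doubled = describe(doubled)

print(mean_old, round(sd_old, 2))
print(mean_shifted, round(sd_shifted, 2))   # mean moves, spread stays the same
print(mean_doubled, round(sd_doubled, 2))   # mean and spread both scale
```

Adding a constant moves every score and the mean by the same amount, so the deviations are untouched; multiplying stretches the deviations themselves, so the standard deviation is multiplied too.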
Locating a score
• Where is our raw score within the distribution?
– The natural choice of reference is the mean (since it is usually easy to find).
• So we’ll subtract the mean from the score (find the deviation score): $X - \mu$
– The direction will be given to us by the negative or positive sign on the deviation score
– The distance is the value of the deviation score
Locating a score
• Reference point: $\mu = 100$
• Deviation score: $X - \mu$
– X1 = 162: X1 - 100 = +62 (above the mean)
– X2 = 57: X2 - 100 = -43 (below the mean)
• Direction: the sign shows whether the score is above (+) or below (-) the reference point
Transforming a score
– The distance is the value of the deviation score
• However, this distance is measured with the units of measurement of the score.
• Convert the score to a standard (neutral) score. In this case a z-score.
$z = \frac{X - \mu}{\sigma}$, where X is the raw score, μ is the population mean, and σ is the population standard deviation
Transforming scores
$\mu = 100$, $\sigma = 50$
$z = \frac{X - \mu}{\sigma}$
– X1 = 162: $z = \frac{162 - 100}{50} = +1.24$
– X2 = 57: $z = \frac{57 - 100}{50} = -0.86$
A z-score specifies the precise location of each X value within a distribution.
• Direction: The sign of the z-score (+ or -) signifies whether the score is above the mean or below the mean.
• Distance: The numerical value of the z-score specifies the distance from the mean by counting the number of standard deviations between X and μ.
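The z-score formula is a one-line function in Python. The sketch below applies it to the two raw scores from the slide (162 and 57, with a mean of 100 and a standard deviation of 50); the function and variable names are illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

mu, sigma = 100, 50          # population mean and standard deviation from the slide

z1 = z_score(162, mu, sigma)
z2 = z_score(57, mu, sigma)

for raw, z in [(162, z1), (57, z2)]:
    direction = "above" if z > 0 else "below"
    print(f"X = {raw}: z = {z:+.2f} ({direction} the mean)")
```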
Transforming a distribution
• We can transform all of the scores in a distribution
– We can transform any & all observations to z-scores if we know the distribution mean and standard deviation.
– We call this transformed distribution a standardized distribution.
• Standardized distributions are used to make dissimilar distributions comparable.
– e.g., your height and weight
• One of the most common standardized distributions is the Z-distribution.
Properties of the z-score distribution
• Original distribution: $\mu = 100$, $\sigma = 50$
• Transformation: $z = \frac{X - \mu}{\sigma}$
– Xmean = 100: $z_{mean} = \frac{100 - 100}{50} = 0$
– X+1std = 150: $z_{+1std} = \frac{150 - 100}{50} = +1$
– X-1std = 50: $z_{-1std} = \frac{50 - 100}{50} = -1$
• Transformed (z-score) distribution: $\mu = 0$, $\sigma = 1$
Properties of the z-score distribution
• Shape - the shape of the z-score distribution will be exactly
the same as the original distribution of raw scores. Every
score stays in the exact same position relative to every other
score in the distribution.
• Mean - when raw scores are transformed into z-scores, the
mean will always = 0.
• The standard deviation - when any distribution of raw
scores is transformed into z-scores the standard deviation
will always = 1.
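These three properties can be checked on any small set of scores. The sketch below standardizes a hypothetical population (the scores 2, 4, 6, 8, chosen only for illustration) using the population standard deviation from Python's `statistics` module; the function name is an assumption of this sketch.

```python
import statistics

def standardize(scores):
    """Convert every raw score to a z-score using the population mean and SD."""
    mu = statistics.mean(scores)
    sigma = statistics.pstdev(scores)      # population SD (divides by N)
    return [(x - mu) / sigma for x in scores]

raw = [2, 4, 6, 8]                         # hypothetical population of raw scores
z = standardize(raw)

z_mean = statistics.mean(z)                # expected to be 0, up to rounding
z_sd = statistics.pstdev(z)                # expected to be 1, up to rounding

print([round(value, 2) for value in z], round(z_mean, 2), round(z_sd, 2))
```

The z-scores keep the same order and relative spacing as the raw scores, which is the sense in which the shape of the distribution is unchanged.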
From z to raw score
• We can also transform a z-score back into a raw score if we know the mean and standard deviation information of the original distribution.
$Z = \frac{X - \mu}{\sigma}$
$Z\sigma = X - \mu$
$X = Z\sigma + \mu$
• Original distribution: $\mu = 100$, $\sigma = 50$; z-score distribution: $\mu = 0$, $\sigma = 1$
• Example: Z = -0.60
X = (-0.60)(50) + 100
X = 70
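The reverse transformation is just the z-score formula solved for X. A minimal Python sketch using the slide's values (a mean of 100, a standard deviation of 50, and Z = -0.60) follows; the function names are illustrative.

```python
def z_score(x, mu, sigma):
    """Raw score -> z-score."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """z-score -> raw score: X = Z * sigma + mu."""
    return z * sigma + mu

mu, sigma = 100, 50                        # original distribution from the slide

x = raw_score(-0.60, mu, sigma)            # the slide's example
round_trip = z_score(x, mu, sigma)         # converting back should recover Z

print(x, round(round_trip, 2))
```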
Why transform distributions?
• Known properties
– Shape - the shape of the z-score distribution will be exactly the
same as the original distribution of raw scores. Every score stays in
the exact same position relative to every other score in the
distribution.
– Mean - when raw scores are transformed into z-scores, the mean
will always = 0.
– The standard deviation - when any distribution of raw scores is
transformed into z-scores the standard deviation will always = 1.
• Can use these known properties to locate scores relative to
the entire distribution
– Area under the curve corresponds to proportions (or probabilities)