Describing distributions with numbers
William P. Wattles
Psychology 302
Measuring the Center of a distribution
Mean
– The arithmetic average
– Requires measurement data
Median
– The middle value
Mode
– The most common value
Measuring the center with the Mean
$\bar{X} = \frac{\sum X}{n}$
Our first formula
$\bar{X}$ = the mean
$X$ = the individual score
$n$ = the number of individuals
$\sum$ = sum of
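As a rough illustration of the formula, the Python sketch below adds up a handful of scores and divides by n. The score values and variable names are made up for illustration; they are not the flower trivia data.

```python
from statistics import mean

# Hypothetical scores (an assumption, not taken from the course data file)
scores = [4, 7, 9, 6, 4]

n = len(scores)        # n: the number of individuals
total = sum(scores)    # sum of the individual scores X
x_bar = total / n      # X-bar: the mean

# Cross-check with the standard library, much like checking with AVERAGE in Excel
check = mean(scores)
print(x_bar, check)
```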
The Mean
One number that tells us about the middle using all the data.
The group, not the individual, has a mean.
Population
Sample
Sample mean: $\bar{X}$
Mu ($\mu$), the population mean
Population: $\mu$
Sample: $\bar{X}$
Calculate the mean with Excel
Save the file psy302 to your hard drive
– right click on the file
– save to desktop or temp
Open file psy302
Move flower trivia score to new sheet

Calculate the mean with Excel
Rename Sheet
– double click sheet tab, type flower
Calculate the sum
– type label: total
Calculate the mean
– type label: mean
Check with average function
Measuring the center with the Median
Rank order the values
If the number of observations is odd, the median is the center observation
If the number of observations is even, the median is the mean of the middle two observations (halfway between them)

Measuring the center with the Median
Location of the median in the ranked list: $\frac{n + 1}{2}$
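The rule for odd and even numbers of observations can be sketched in Python as follows. The function name `median_by_rank` and both score lists are hypothetical; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_rank(values):
    """Rank order the values, then take the center (or the mean of the middle two)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the center observation
        return ranked[middle]
    # Even n: halfway between the two middle observations
    return (ranked[middle - 1] + ranked[middle]) / 2

odd_scores = [3, 9, 5, 1, 7]        # hypothetical, n = 5
even_scores = [3, 9, 5, 1, 7, 11]   # hypothetical, n = 6

print(median_by_rank(odd_scores), median(odd_scores))
print(median_by_rank(even_scores), median(even_scores))
```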
The mean versus the median
The Mean
– uses all the data
– has arithmetic properties
The Median
– less influenced by outliers and extreme values
Mean vs. Median
Name       Salary
Betty      $28,514.00
Mike       $22,316.00
Tom        $30,112.00
Miriam     $29,521.00
Stacy      $21,555.00
David      $125,366.00
Mary Lou   $22,132.00
John       $27,561.00
Gail       $24,635.00
Arthur     $30,125.00
mean       $36,183.70
median     $28,037.50
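To see how much one extreme salary pulls the mean, the sketch below recomputes both summaries from the table and then again with David's salary left out. Dropping David is only an illustration added here, not a step from the slides.

```python
from statistics import mean, median

# Salaries from the table above
salaries = {
    "Betty": 28514, "Mike": 22316, "Tom": 30112, "Miriam": 29521,
    "Stacy": 21555, "David": 125366, "Mary Lou": 22132, "John": 27561,
    "Gail": 24635, "Arthur": 30125,
}

all_values = list(salaries.values())
mean_all = mean(all_values)
median_all = median(all_values)

# Same summaries without the one extreme value (illustrative assumption)
without_david = [pay for name, pay in salaries.items() if name != "David"]
mean_trimmed = mean(without_david)
median_trimmed = median(without_david)

print(f"all ten:       mean {mean_all:,.2f}  median {median_all:,.2f}")
print(f"without David: mean {mean_trimmed:,.2f}  median {median_trimmed:,.2f}")
```

The mean moves by thousands of dollars when the outlier is removed, while the median barely changes.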
The Mean: Three things to remember
The mean uses all the data.
The group, not the individual, has a mean.
We calculate the mean on quantitative data.

The mean tells us where the middle of the data lies.
We also need to know how spread out the data are.

Measuring Spread
Knowing about the middle only tells us part of the story of the data.
We need to know how spread out the data are.

Variability
Variety is the spice of life
Without variability things are just boring
exam3 Psy314 Health Psychology
69%, 61%, 79%, 100%, 54%, 60%, 85%,
83%, 58%, 75%, 85%, 73%, 87%, 57%,
80%, 83%, 65%, 68%, 58%, 50%, 83%,
55%, 59%, 79%, 89%, 74%, 85%, 63%
Why is the mean alone not enough to describe a distribution?
"Outliers" is NOT the answer!
The mean tells us the middle but not how spread out the scores are.
Example of Spread
New York: mean annual high temperature 62

Example of Spread
San Francisco: mean annual high temperature 65

Example of Spread
City            mean   max   min   range   sd
New York        62     84    39    45      17.1
San Francisco   65     73    55    18      6.4
Example of Variability
[Chart: Psy 302 Spring 2003. Vertical axis: Grade (0% to 120%); horizontal axis: Student (1 to 23); series: Final and Quiz 7]
Measuring Spread
Range
Quartiles
Five-number summary
– Minimum
– first quartile
– median
– third quartile
– Maximum
Standard Deviation

Mean 50.63%, Std Dev 21.4%
Mean 33.19%, Std Dev 13.2%
Deviation score
Each individual has a deviation score. It measures how far that individual deviates from the mean.
Deviation scores always sum to zero.
Deviation scores contain information.
– How far and in which direction the individual lies from the mean
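A quick way to convince yourself that deviation scores sum to zero is to compute them for a small set of numbers. The scores below are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
# Hypothetical scores (an assumption for illustration)
scores = [4, 7, 9, 6, 4]

x_bar = sum(scores) / len(scores)           # the mean, 6.0 here
deviations = [x - x_bar for x in scores]    # deviation score = X minus the mean

# The sign shows the direction, the size shows the distance from the mean
for x, d in zip(scores, deviations):
    print(x, d)

print("sum of deviations:", sum(deviations))
```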
Measuring spread with the standard deviation
Measures spread by looking at how far the observations are from their mean.
The standard deviation is the square root of the variance.
The variance is also a measure of spread.
Individual deviation scores
Example                                                 Mean      X         Deviation score
The average teacher earns $28,756; John earns $32,092   $28,756   $32,092   $3,336
The average woman is 5'4 1/2"; Mary is 5'8"             64.5      68        3.5 inches
The average IQ is 110 and Bubba has a 90                110       90        -20 points
Standard deviation
One number that tells us about the spread using all the data.
Note!! The group, not the individual, has a standard deviation.
Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Properties of the standard deviation
s measures the spread about the mean
s = 0 only when there is no spread. This happens when all the observations have the same value.
s is strongly influenced by extreme values
Calculate Standard Deviation with Excel
New column headed deviation
Deviation score = X – the mean

Calculate Standard Deviation with Excel
In new column type heading: dev2
Enter formula to square deviation
Total squared deviations
– type label: sum of squares
Divide sum of squares by n-1
– type label: variance
Moore page 50
Example 2.7 page 50
subject    MetabolicRate
subject1   1792
subject2   1666
subject3   1362
subject4   1614
subject5   1460
subject6   1867
subject7   1439

Example 2.6 page 42
subject    MetabolicRate   dev    dev2
subject1   1792            192    36864
subject2   1666            66     4356
subject3   1362            -238   56644
subject4   1614            14     196
subject5   1460            -140   19600
subject6   1867            267    71289
subject7   1439            -161   25921
total      11200           0      214870   ss
mean       1600
var           35811.6667
stdev         189.239707
stdev check   189.239707
To Calculate Standard Deviation:
Total raw scores
Divide by n to get mean
Calculate deviation score for each subject (X minus the mean)
Square each deviation score
Sum the squared deviation scores to obtain sum of squares
Divide by n-1 to obtain variance
Take square root of variance to get standard deviation.
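The same steps can be followed in Python using the seven metabolic rates from the worked example. This is a sketch that mirrors the spreadsheet columns; the variable names are chosen here, and `statistics.stdev` plays the role of the "stdev check" row.

```python
from math import sqrt
from statistics import stdev

# Metabolic rates for subject1 through subject7, from the example above
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(rates)
total = sum(rates)                          # total raw scores
x_bar = total / n                           # divide by n to get the mean
deviations = [x - x_bar for x in rates]     # X minus the mean
squared = [d ** 2 for d in deviations]      # square each deviation score
ss = sum(squared)                           # sum of squares
variance = ss / (n - 1)                     # divide by n-1
sd = sqrt(variance)                         # square root of the variance

check = stdev(rates)                        # library cross-check (also divides by n-1)
print(total, x_bar, ss, variance, sd, check)
```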








Population
Sample
Sample variance: $s^2$
Population variance: $\sigma^2$
Population variance: $\sigma^2$; sample variance: $s^2$
Little sigma ($\sigma$), the population standard deviation
Sample standard deviation: $s$
Population standard deviation: $\sigma$; sample standard deviation: $s$
To analyze data
1. Make a frequency distribution and plot the data
2. Look for overall pattern and outliers or skewness
3. Create a numerical summary: mean and standard deviation.
Start with a list of scores
Name     Score
Cathy    400
Paula    300
Sandy    500
Lois     400
Anne     500
Miriam   600
June     400
David    500
Alice    300
Mitzi    200
Jack     700
Mike     500
Dawn     600
Vicki    400
George   500
Ashley   800
Make a frequency distribution
200 xxxxxxxxxxx
300 xxxxxxxxxxxxxxxxxxxxxxxx
400 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
500 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
600 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
700 xxxxxxxxxxxxxxxxxxxxxxxxx
800 xxxxxxxxxxxxx
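A frequency distribution like this one can be tallied in a few lines of Python. The sketch below counts only the sixteen scores listed on the earlier slide, so its row lengths will not match the rows of x's above, which appear to come from a larger data set.

```python
from collections import Counter

# The sixteen scores from the "Start with a list of scores" slide
scores = {
    "Cathy": 400, "Paula": 300, "Sandy": 500, "Lois": 400,
    "Anne": 500, "Miriam": 600, "June": 400, "David": 500,
    "Alice": 300, "Mitzi": 200, "Jack": 700, "Mike": 500,
    "Dawn": 600, "Vicki": 400, "George": 500, "Ashley": 800,
}

# Count how many individuals earned each score
frequency = Counter(scores.values())

# Print one row per score, lowest first, with one x per individual
for score in sorted(frequency):
    print(score, "x" * frequency[score])
```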
Frequency distribution
score   frequency
200     50
300     100
400     150
500     250
600     150
700     100
800     50
Represent with a chart (histogram)
[Histogram: SAT Scores. Horizontal axis: Score (200 to 800); vertical axis: Frequency (0 to 250)]
Represent with line chart
[Line chart: SAT Scores. Horizontal axis: Score (200 to 800); vertical axis: Frequency (0 to 250)]
Density Curve
Replaces the histogram when we have many observations.
Transform a score
Hotel Atlantico
200 pesos
Peso: a unit of measure

Transform a score
1 dollar = 28.38 pesos
200/28.38 = $7.05
Dollar: a unit of measure
Standardized observations or values
To standardize is to transform a score into standard deviation units.
Frequently referred to as z-scores
A z-score tells how many standard deviations the score or observation falls from the mean and in which direction
Standard Scores (Z-scores)
Individual scores expressed in terms of the mean and standard deviation of the sample or population.
Z = (X minus the mean) / standard deviation
Z-score
$z = \frac{x - \mu}{\sigma}$
new symbols
$\mu$ = the population mean
$\sigma$ = the population standard deviation
$z$ = the standardized value
Calculate Z-scores for trivia data
Label column E as Z-score
Type formula deviation score/std dev
Make std dev reference absolute (use F4 to insert dollar signs)
Copy formula down.
Check: should sum to zero
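The "should sum to zero" check can also be tried outside Excel. The trivia scores are not included in this transcript, so the sketch below reuses the seven metabolic rates from the standard deviation example as stand-in data. It divides by the sample standard deviation, which is an assumption about what the std dev cell holds.

```python
from statistics import mean, stdev

# Stand-in data: metabolic rates from the earlier example, not the trivia scores
rates = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

x_bar = mean(rates)
s = stdev(rates)   # sample standard deviation (n-1 in the denominator)

# z-score = deviation score / standard deviation
z_scores = [(x - x_bar) / s for x in rates]

for x, z in zip(rates, z_scores):
    print(x, round(z, 2))

# The total is zero apart from floating-point rounding
print("sum of z-scores:", round(sum(z_scores), 10))
```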
File extensions
Word: .doc
Excel: .xls
Text files: .txt

To view File extensions
Open Windows Explorer
Choose Tools/Folder Options/View
Uncheck "hide extensions for known file types."
Z Scores
Height of young women
– Mean = 64
– Standard deviation = 2.7
How tall in standard deviations is a woman 70 inches?
A woman 5 feet tall (60 inches) is how tall in standard deviations?
Z scores
Height of young women
– Mean = 64
– Standard deviation = 2.7
How tall in standard deviations is a woman 70 inches? z = 2.22
A woman 5 feet tall (60 inches) is how tall in standard deviations? z = -1.48
Calculating Z scores
$z = \frac{\text{height} - \mu}{\sigma} = \frac{70 - 64}{2.7} = 2.22$
$z = \frac{\text{height} - \mu}{\sigma} = \frac{60 - 64}{2.7} = -1.48$
Calculating X from Z scores
$X = z \cdot \sigma + \mu$
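Both directions of the conversion can be sketched with two small Python functions, using the height example (mean 64, standard deviation 2.7). The function names `z_score` and `raw_score` are chosen here for illustration.

```python
MU = 64.0      # mean height of young women, in inches
SIGMA = 2.7    # standard deviation, in inches

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean (sign gives direction)."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Convert a z-score back to the original units: X = z * sigma + mu."""
    return z * sigma + mu

z_tall = z_score(70, MU, SIGMA)
z_short = z_score(60, MU, SIGMA)
print(round(z_tall, 2), round(z_short, 2))

# Converting back recovers the original height (up to floating-point rounding)
print(raw_score(z_tall, MU, SIGMA))
```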
Types of data
Categorical or Qualitative data
– Nominal: Assign individuals to mutually exclusive and exhaustive categories (exhaustive: everyone is in one category).
– Ordinal: Involves putting individuals in rank order. Categories are still mutually exclusive and exhaustive, but the order cannot be changed.
Types of data
Measurement or Quantitative Data
– Interval data: There is a consistent interval or difference between the numbers. The zero point is arbitrary.
– Ratio data: Interval scale plus a meaningful zero. Zero means none. Weight, money and the Kelvin temperature scale exemplify ratio data; the Celsius scale has an arbitrary zero, so it is interval data.
– Measurement data allows for arithmetic operations.
Review

Video2
The End