Statistics 203
Solutions to Assignment #2
1. In a psychological experiment, the time on task was recorded for ten subjects
under a 5-minute time constraint. The measurements are in seconds:
175 190 250 230 240 200 185 190 225 265
(a) Find the mean and range.
(b) Find the mean deviation without using a computer.
(c) Find the standard deviation and variance without using a computer.
(d) Check all your results using a computer. Comment on any differences you may
see and why they occur. To check your value of MD in SPSS use Transform >
compute. Create a new variable (target variable) called MD whose values are
abs(time-215). Then use Analyze > Descriptive Statistics > Descriptives to find the
mean deviation.
1.
a)
Mean = \( \bar{x} = \frac{\sum x_i}{10} = \frac{2150}{10} = 215 \)
Range = H - L = Max - Min = 265 - 175 = 90
b)
MD = (|175-215| + |190-215| + ... + |265-215|)/10 = 27
c)
Standard deviation = s = \( \sqrt{\frac{\sum_{i=1}^{10} (x_i - \bar{x})^2}{10}} \approx 29.41 \)
Variance = s^2 = 865
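The hand calculations in parts (a) to (c) can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic, not part of the SPSS procedure the assignment asks for; the variable names are chosen here for illustration. Like the hand calculation, it divides by N.

```python
# Times on task (seconds) for the ten subjects in question 1.
times = [175, 190, 250, 230, 240, 200, 185, 190, 225, 265]

n = len(times)
mean = sum(times) / n                                    # sum of scores / N
data_range = max(times) - min(times)                     # H - L
mean_deviation = sum(abs(x - mean) for x in times) / n   # average absolute deviation
variance = sum((x - mean) ** 2 for x in times) / n       # divides by N, as done by hand
std_dev = variance ** 0.5

print(mean, data_range, mean_deviation, variance, round(std_dev, 2))
# prints: 215.0 90 27.0 865.0 29.41
```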
d)
Refer to the SPSS instructions given in the question.

Statistics
N Valid           10
N Missing         0
Mean              215.0000
Median            212.5000
Std. Deviation    31.00179
Variance          961.111
Range             90.00

MD
N Valid           10
N Missing         0
Mean              27.0000
NOTE: If you multiply the variance you got by hand by 10/9 you will get the one from SPSS. This
is because SPSS divides by N-1 instead of N.
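The N versus N-1 point in the note can be seen directly with Python's standard `statistics` module, where `pvariance` divides by N and `variance` divides by N-1. This is an illustrative check only; the variable names are assumptions made for this sketch.

```python
import statistics

times = [175, 190, 250, 230, 240, 200, 185, 190, 225, 265]
n = len(times)

pop_var = statistics.pvariance(times)   # divides by N (the hand calculation)
samp_var = statistics.variance(times)   # divides by N - 1 (what SPSS reports)
samp_sd = statistics.stdev(times)       # square root of the N - 1 variance

# Rescaling the hand-calculated variance by N / (N - 1) gives the SPSS value.
rescaled = pop_var * n / (n - 1)

print(pop_var, round(samp_var, 3), round(rescaled, 3), round(samp_sd, 5))
```

The rescaled value and the N-1 variance both round to 961.111, matching the SPSS output above.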
2. Consider the EmployeeData.
(a) Determine the value of the most appropriate measure of central tendency and
spread for variables Salary and Jobcat.
(b) What are the mean and median of the data?
(c) Hand construct a frequency and relative frequency histogram for the Salary
data using classes with width 10,000 starting with class [15,000-25,000).
(d) Using the grouped histograms constructed above, estimate the mean and
median of the salary data. How close are these to the true values? What are the
variance and standard deviation of the grouped data?
a)
Salary: Since the level of Salary is interval and the histogram of Salary is skewed to
the right, the median is the appropriate measure of centre to use.
median = $28,875.00
(note: (N+1)/2 is the position of the median, not the value of the median)
Jobcat: Since the level of Jobcat is nominal, the best measure to use is the mode.
Note: Salary is interval data and its histogram is strongly skewed to the right, so
the mean is heavily influenced by the extreme values and does not represent most of
the data well. The median is much less sensitive to the extreme values, so it
represents most of the data well.
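The distinction between the position (N+1)/2 and the value of the median can be illustrated with the ten times from question 1, since the EmployeeData file itself is not reproduced here. The sketch below is a minimal illustration with names chosen for this example: it finds the position first and then reads off the value at that position in the ordered scores.

```python
import math

times = [175, 190, 250, 230, 240, 200, 185, 190, 225, 265]

ordered = sorted(times)
n = len(ordered)
position = (n + 1) / 2      # 5.5: a position in the ordered list, not a score

# A fractional position means the median lies halfway between two neighbours.
lower = ordered[math.floor(position) - 1]   # 5th ordered score (1-based position)
upper = ordered[math.ceil(position) - 1]    # 6th ordered score
median = (lower + upper) / 2

print(position, median)
# prints: 5.5 212.5
```

The value 212.5 agrees with the median SPSS reported for question 1, while 5.5 is only where to look for it.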
b)
Mean = $34,419.57
Mdn = $28,875.00
c)
class                  f     Cum f    rf      %       Cum rf   Cum %
( 15,000- 25,000]      143   143      .302    30.2    .302     30.2
( 25,000- 35,000]      195   338      .411    41.1    .713     71.3
( 35,000- 45,000]      53    391      .112    11.2    .825     82.5
( 45,000- 55,000]      26    417      .055    5.5     .880     88.0
( 55,000- 65,000]      21    438      .044    4.4     .924     92.4
( 65,000- 75,000]      19    457      .040    4.0     .964     96.4
( 75,000- 85,000]      7     464      .015    1.5     .979     97.9
( 85,000- 95,000]      4     468      .008    0.8     .987     98.7
( 95,000-105,000]      4     472      .008    0.8     .996     99.6
(105,000-115,000]      1     473      .002    0.2     .998     99.8
(115,000-125,000]      0     473      .000    0.0     .998     99.8
(125,000-135,000]      1     474      .002    0.2     1.000    100.0
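The cumulative and relative columns of the table follow mechanically from the class frequencies. As a rough sketch (the frequencies are taken from the table; the variable names are illustrative), they can be regenerated like this:

```python
from itertools import accumulate

# Class frequencies from the table, for classes of width 10,000 starting at 15,000.
freqs = [143, 195, 53, 26, 21, 19, 7, 4, 4, 1, 0, 1]
width = 10_000
lower_limits = [15_000 + width * k for k in range(len(freqs))]

n = sum(freqs)                       # total number of cases
cum_f = list(accumulate(freqs))      # running total of frequencies
rf = [f / n for f in freqs]          # relative frequency of each class
cum_rf = [c / n for c in cum_f]      # cumulative relative frequency

for lo, f, cf, r, cr in zip(lower_limits, freqs, cum_f, rf, cum_rf):
    print(f"({lo:>7,}-{lo + width:>7,}]  {f:>4} {cf:>4}  "
          f"{r:.3f} {100 * r:5.1f}  {cr:.3f} {100 * cr:5.1f}")
```

The last cumulative frequency should equal N = 474 and the last cumulative percentage should be 100, which is a quick check that no class was missed.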
d)
Mean = \( \bar{X} = \frac{\sum f m}{N} \) = (143*20000 + 195*30000 + ... + 1*130000)/474 = $34,345.99
m = midpoint of the class interval
f = frequency of a class interval
N = total number of scores
Median = \( L + \left( \frac{N/2 - cf_b}{f} \right) i \) = 25,000 + ((474/2 - 143)/195)*10000 = $29,820.51
N = number of cases in the distribution (474)
cf_b = cumulative frequency below the lower limit of the critical interval
L = lower limit of the critical interval
f = frequency within the critical interval
i = class-interval size
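The two grouped-data formulas can be checked numerically. The sketch below assumes the class frequencies from part (c) and uses names chosen for this example; it locates the critical interval as the first class whose cumulative frequency reaches N/2, then compares the estimates with the true values reported in part (b).

```python
freqs = [143, 195, 53, 26, 21, 19, 7, 4, 4, 1, 0, 1]
width = 10_000                                    # i, the class-interval size
lower_limits = [15_000 + width * k for k in range(len(freqs))]
midpoints = [lo + width / 2 for lo in lower_limits]

n = sum(freqs)

# Grouped mean: sum of f * m over N.
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Grouped median: L + ((N/2 - cf_b) / f) * i within the critical interval.
cf_below = 0
for lo, f in zip(lower_limits, freqs):
    if cf_below + f >= n / 2:                     # this class contains the N/2-th case
        grouped_median = lo + (n / 2 - cf_below) / f * width
        break
    cf_below += f

# True values from the raw data, as reported in part (b).
true_mean, true_median = 34_419.57, 28_875.00

print(round(grouped_mean, 2), round(grouped_median, 2))
print(round(true_mean - grouped_mean, 2), round(grouped_median - true_median, 2))
```

With these frequencies the grouped mean comes out about $74 below the true mean and the grouped median about $946 above the true median.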
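Variance:
\[ S^2 = \frac{\sum f (m - \bar{X})^2}{N} = \frac{\sum f m^2}{N} - \bar{X}^2 \]
\[ = \frac{143 \cdot 20{,}000^2 + 195 \cdot 30{,}000^2 + \dots + 1 \cdot 130{,}000^2}{474} - (34{,}345.99)^2 \]
\[ = \frac{6.974 \times 10^{11}}{474} - 34{,}345.99^2 \]
\[ \approx 2.9166 \times 10^{8} \]
Standard deviation:
\[ S = \sqrt{\frac{\sum f (m - \bar{X})^2}{N}} = \sqrt{\frac{\sum f m^2}{N} - \bar{X}^2} \approx 17{,}078.09 \]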
Note: the formulas we used to compute the mean and median from the grouped histogram (refer to
page 93 of the textbook) are different from the formulas we used to calculate the true mean and
median from the raw data (refer to page 111 of the textbook), even though the grouped values are
fairly close to the ungrouped (true) values. Make sure you understand the differences between the formulas.
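The grouped variance can be computed with either the definitional form (squared deviations of the midpoints from the grouped mean) or the computational form (mean of squared midpoints minus the squared mean). A short sketch, using the class frequencies from part (c) and illustrative variable names, confirms that the two agree:

```python
freqs = [143, 195, 53, 26, 21, 19, 7, 4, 4, 1, 0, 1]
midpoints = [20_000 + 10_000 * k for k in range(len(freqs))]

n = sum(freqs)
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

# Definitional form: sum of f * (m - mean)^2 over N.
var_definitional = sum(f * (m - grouped_mean) ** 2
                       for f, m in zip(freqs, midpoints)) / n

# Computational form: (sum of f * m^2) / N minus the squared mean.
sum_fm2 = sum(f * m ** 2 for f, m in zip(freqs, midpoints))
var_computational = sum_fm2 / n - grouped_mean ** 2

grouped_sd = var_computational ** 0.5

print(f"{sum_fm2:.4e}", f"{var_computational:.4e}", round(grouped_sd, 1))
```

Both forms divide by N, consistent with the hand calculation in question 1. Small differences in the last decimal place of the standard deviation can arise depending on how far the mean is rounded before squaring.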
[Figure: frequency histogram. x-axis: Current Salary ($25,000 to $125,000); y-axis: Count (0 to 150).]
[Figure: % relative frequency histogram. x-axis: Current Salary ($25,000 to $125,000); y-axis: Percent (10% to 40%).]
Note: a relative frequency histogram is different from a frequency histogram and from a cumulative
frequency histogram or cumulative frequency polygon. Take care that its y-axis shows the
percentage of the total frequency.
3. Open the 1991 U.S. General Social Survey. (It is found in the same place that
EmployeeData is found in SPSS. A grad student is getting it for me to place on my
webpage.)
(a) Construct a percentage bar chart for the variable Life. How do most people in
this survey view life?
(b) Determine the range, mean deviation and standard deviation for the age
variable. Note: You will have to use Transform > Compute again and the formula
for MD to calculate the MD value.
(c) What are the most appropriate measures of central tendency and spread for
the variables Happy and Age?
a)
[Figure: percentage bar chart, "Is Life Exciting or Dull". Categories: Missing, Exciting, Routine, Dull; y-axis: Percent (0 to 40).]
b)
Age of Respondent
N Valid           1514
N Missing         3
Mean              45.63
Median            41.00
Std. Deviation    17.808
Variance          317.140
Range             71
From the Transform > Compute menu we can calculate the Mean Deviation to be:
MD = 15.0554
c)
Since Happy is ordinal level, the most appropriate measures of centre and spread are
the median and the range.
Since Age is interval level and skewed to the right, the mean is seriously affected by the
extreme values and does not represent most of the data properly. The median is therefore
the appropriate measure of central tendency, and the standard deviation or variance is an
appropriate measure of spread.
Note: a measure of central tendency is different from a measure of spread. The mode, mean and median
are used to measure central tendency, whereas the MD, standard deviation and variance are used to
measure spread. Refer to p. 117 of the textbook, "Compare Measures of Variability": the standard
deviation (like the mean deviation) is an interval-level measure and therefore cannot be used
with nominal or ordinal data.
The range is generally regarded as a preliminary or rough index of the variability of a distribution. It is
quick and simple to obtain but not very reliable, and it can be applied to interval or ordinal data.