Health Data Analysis
Spring 2008
Exam#1B “Brief” Answer Key
Dr. Robert Jantzen
Department of Economics
1. We’d want to draw a random sample from each group of infants. We’d have to assign a
unique ID # to each infant, from 1 to 1489 for normal weights and from 1 to 184 for low
weights. Then we’d use either a random # table or a random number generator to pick
4-digit #s to select from the first group and 3-digit #s to select from the second group, e.g.,
Population Size:  1489
Random Numbers:   1079   246   1286   268   198   1148
Population Size:  184
Random Numbers:   70   143   34   113   22   169
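The random number generator route in #1 can be sketched in Python. This is only an illustration: the sample size of 6 per group (matching the six #s listed above), the helper name draw_ids and the seeds are assumptions, not part of the exam.

```python
import random

def draw_ids(population_size, sample_size, seed=None):
    """Return sample_size distinct IDs drawn at random from 1..population_size."""
    rng = random.Random(seed)  # a seed (assumed here) makes the draw repeatable
    return rng.sample(range(1, population_size + 1), sample_size)

# Assumed sample size of 6 per group; IDs run 1..1489 and 1..184 as in the answer.
normal_ids = draw_ids(1489, 6, seed=1)
low_ids = draw_ids(184, 6, seed=2)

print("Normal weight IDs:", normal_ids)
print("Low weight IDs:", low_ids)
```

Sampling without replacement keeps any infant from being picked twice, which is what we want when each ID # is unique.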
2. The summary table (see below) shows that Medicare is the biggest insurer (36%),
followed by Empire Blue (28%) & Medicaid (27%). Together these comprise 90% of the total.
3. A Bar chart, Pie chart or Pareto diagram could be used to show that Medicare is
the biggest insurer (36%), followed by Empire Blue (28%) & Medicaid (27%). These
comprise 90% of the total. See Bar chart below.
                      Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Empire Blue          25      27.8            27.8                 27.8
        HIP                   4       4.4             4.4                 32.2
        Medicaid             24      26.7            26.7                 58.9
        Medicare             32      35.6            35.6                 94.4
        Oxford                1       1.1             1.1                 95.6
        United                4       4.4             4.4                100.0
        Total                90     100.0           100.0
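The Percent and Cumulative Percent columns follow directly from the frequencies. A short Python sketch using the counts from the table (the variable names are illustrative assumptions):

```python
# Insurer frequencies as listed in the summary table.
counts = {"Empire Blue": 25, "HIP": 4, "Medicaid": 24,
          "Medicare": 32, "Oxford": 1, "United": 4}

total = sum(counts.values())
rows = []
running = 0
for insurer, n in counts.items():
    running += n
    # percent = share of all cases; cumulative percent = running share so far
    rows.append((insurer, n, round(100 * n / total, 1), round(100 * running / total, 1)))

for insurer, n, pct, cum in rows:
    print(f"{insurer:<12}{n:>4}{pct:>7.1f}{cum:>7.1f}")

# Combined share of the three biggest insurers.
top_three = round(100 * (counts["Medicare"] + counts["Empire Blue"] + counts["Medicaid"]) / total, 1)
print("Top three share:", top_three)
```

The last line is where the "90% of the total" in #2 and #3 comes from: 81 of the 90 cases.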
4. The range of values is 82 (85-3), so starting from 0 we can divide the #s into 9
intervals of width = 10 (>0-10 through >80-90). We can find out how many #s are in each
interval by using SPSS to generate “raw” frequencies and then filling in the table (see
below raw frequencies). The frequency table shows that the #s are centered around 20
months and aren’t symmetrically distributed.
Distribution of Survival Months
Value        #        %    Cum.%
>0-10       17    18.9%    18.9%
>10-20      27    30.0%    48.9%
>20-30      15    16.7%    65.6%
>30-40      12    13.3%    78.9%
>40-50      10    11.1%    90.0%
>50-60       4     4.4%    94.4%
>60-70       2     2.2%    96.7%
>70-80       2     2.2%    98.9%
>80-90       1     1.1%   100.0%
Total       90   100.0%
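Filling in the interval table from the raw frequencies can also be done in code. A minimal Python sketch, assuming the raw counts are typed in from the SPSS "Months of Survival" frequency table at the end of this key (the names raw, WIDTH and binned are illustrative):

```python
from collections import Counter

# Raw frequencies: survival months -> number of cases (from the raw frequency table).
raw = {3: 3, 4: 3, 5: 3, 6: 1, 7: 2, 8: 1, 9: 2, 10: 2, 11: 3, 12: 3,
       13: 4, 14: 6, 15: 2, 16: 1, 17: 1, 18: 4, 19: 1, 20: 2, 21: 3, 22: 1,
       23: 1, 24: 1, 25: 1, 26: 2, 27: 2, 28: 2, 29: 1, 30: 1, 31: 1, 32: 1,
       33: 2, 34: 2, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1,
       43: 1, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1, 50: 1, 52: 1, 55: 1,
       58: 1, 60: 1, 61: 1, 69: 1, 71: 1, 75: 1, 85: 1}

WIDTH = 10
binned = Counter()
for months, n in raw.items():
    # Round up to the interval's upper limit, so 10 falls in >0-10 and 11 in >10-20.
    upper = ((months + WIDTH - 1) // WIDTH) * WIDTH
    binned[upper] += n

total = sum(binned.values())
cumulative = 0
for upper in sorted(binned):
    cumulative += binned[upper]
    print(f">{upper - WIDTH}-{upper}: {binned[upper]:3d} "
          f"{100 * binned[upper] / total:5.1f}% {100 * cumulative / total:5.1f}%")
```

Integer arithmetic is used for the interval limits so that values sitting exactly on a boundary (10, 20, ...) land in the lower interval, matching the ">0-10" style labels.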
5. We could construct either a histogram or a boxplot. The histogram (by SPSS) is skewed
toward the high numbers and centered around 20 months (see below).
6. a. The mean is 26 which is the arithmetic average. The median is 21 which is the #
which separates the lowest 50% of the #s from the highest 50%.
b. The Range is 82 (85-3) which is the interval that contains all of the #s. The
Interquartile Range is 25.25 (37.25-12) which is the interval that contains the middle 50%
of the #s. The Std. Deviation is 18.5 which is a measure of how much the #s differ from
the mean.
c. The Pearson measure of skewness is .27 [= (26.02 – 21)/18.452] which is > .1 in
absolute value so the #s are skewed to the high numbers.
N                Valid        90
                 Missing       0
Mean                       26.02
Median                     21.00
Std. Deviation            18.452
Range                         82
Minimum                        3
Maximum                       85
Percentiles      25        12.00
                 50        21.00
                 75        37.25
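The arithmetic in #6 b and c can be checked with a few lines of Python, using the summary statistics from the SPSS table. The skewness formula is the one this key uses, (mean - median)/std. deviation; some textbooks multiply it by 3, so treat the .1 cutoff as specific to this course.

```python
# Summary statistics taken from the SPSS output for months of survival.
mean, median, sd = 26.02, 21.00, 18.452
minimum, maximum = 3, 85
q1, q3 = 12.00, 37.25

value_range = maximum - minimum   # interval that contains all of the #s
iqr = q3 - q1                     # interval that contains the middle 50%
skew = (mean - median) / sd       # Pearson measure as written in this key

print("Range:", value_range)
print("Interquartile range:", iqr)
print("Pearson skewness:", round(skew, 2))
```

A positive value means the mean is pulled above the median, i.e. the #s are skewed to the high numbers.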
7. This is a Binomial problem: probability that more than 0 are defective is .651, so
65.1% chance that shipment will be rejected.
Binomial Probabilities: This spreadsheet (adapted from PHSTAT2) makes some
common calculations for outcomes that can be modeled with the binomial distribution.
NOTES: Edit the values in BLUE to reflect the sample size and probability of success in
your sample. (Note: the sample size must be <= 20). The spreadsheet will then calculate
the probability of observing exactly X successes in the sample, as well as <= X successes,
> X successes, etc.
Data
Sample size                10
Probability of success     0.1
Statistics
Mean                       1
Variance                   0.9
Standard deviation         0.948683
Binomial Probabilities Table
X     P(X)        P(<=X)      P(<X)       P(>X)       P(>=X)
0     0.348678    0.348678    0           0.651322    1
1     0.38742     0.736099    0.348678    0.263901    0.651322
2     0.19371     0.929809    0.736099    0.070191    0.263901
3     0.057396    0.987205    0.929809    0.012795    0.070191
4     0.01116     0.998365    0.987205    0.001635    0.012795
5     0.001488    0.999853    0.998365    0.000147    0.001635
6     0.000138    0.999991    0.999853    9.12E-06    0.000147
7     8.75E-06    1           0.999991    3.74E-07    9.12E-06
8     3.65E-07    1           1           9.1E-09     3.74E-07
9     9E-09       1           1           1E-10       9.1E-09
10    1E-10       1           1           0           1E-10
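The .651 in #7 is just 1 minus the probability of zero defectives. A minimal Python sketch of the same calculation, with n = 10 and p = 0.1 as in the spreadsheet (the function name binom_pmf is an illustrative assumption):

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.1
p_none = binom_pmf(0, n, p)   # P(X = 0), no defectives in the sample
p_reject = 1 - p_none         # P(X > 0), at least one defective

mean = n * p
variance = n * p * (1 - p)
sd = sqrt(variance)

print(f"P(X = 0) = {p_none:.6f}")
print(f"P(X > 0) = {p_reject:.6f}")
print(f"mean = {mean:.1f}, variance = {variance:.1f}, sd = {sd:.6f}")
```

Working from the complement avoids adding up the ten separate probabilities for X = 1 through X = 10.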
8. The 95% Confidence Interval for the population mean survival time is 22.2 to 29.9
months. We’re 95% sure that the true mean is in that interval. We need a random sample w/
#s that aren’t highly skewed. The values could be generated by SPSS’ Explore or from
the Excel calculator.
Descriptives
                                                           Statistic   Std. Error
Months of Survival   Mean                                      26.02        1.945
                     95% Confidence       Lower Bound          22.16
                     Interval for Mean    Upper Bound          29.89
                     5% Trimmed Mean                           24.65
                     Median                                    21.00
                     Variance                                340.471
                     Std. Deviation                           18.452
                     Minimum                                       3
                     Maximum                                      85
                     Range                                        82
                     Interquartile Range                          25
                     Skewness                                  1.010         .254
                     Kurtosis                                   .597         .503
Confidence Interval Estimate for the Mean (sigma unknown)
Data
Sample Standard Deviation      18.452
Sample Mean                    26.02
Sample Size                    90
Confidence Level               95%
Intermediate Calculations
Standard Error of the Mean     1.94501158
Degrees of Freedom             89
t Value                        1.986978657
Interval Half Width            3.864696496
Confidence Interval
Interval Lower Limit           22.16
Interval Upper Limit           29.88
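The calculator's steps for #8 can be reproduced in Python. A sketch under one assumption: the t value for 89 degrees of freedom is copied from the calculator output rather than computed, since Python's standard library has no t distribution.

```python
from math import sqrt

# Inputs from the Excel calculator.
sample_sd = 18.452
sample_mean = 26.02
n = 90
t_value = 1.986978657   # copied from the calculator (95% level, 89 degrees of freedom)

std_error = sample_sd / sqrt(n)      # standard error of the mean
half_width = t_value * std_error     # interval half width
lower = sample_mean - half_width
upper = sample_mean + half_width

print(f"Standard error: {std_error:.8f}")
print(f"Half width:     {half_width:.6f}")
print(f"95% CI:         {lower:.2f} to {upper:.2f}")
```

SPSS reports the upper bound as 29.89 rather than 29.88 because it works from the unrounded mean instead of 26.02.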
9. Using the normal probability calculator:
a. the probability of surviving >24 months is .543 or 54.3%
b. the maximum number of months that 90% will survive is actually the 10th
percentile which is 2.29 months (tricky question).
Normal Probabilities: This spreadsheet (adapted from PHSTAT2) makes some
common calculations for normal distributions.
NOTES: Edit the values in BLUE to reflect the Mean and Standard Deviation values in your
data. Also edit the other values in BLUE to calculate the probability of an individual
score (X) being <= or > a particular value, or to find a particular percentile score.
Common Data
Mean                       26
Standard Deviation         18.5
Probability for X <=
X Value                    9
Z Value                    -0.918919
P(X<=9)                    0.179069
Probability for X >
X Value                    24
Z Value                    -0.108108
P(X>24)                    0.5430
Probability for X<9 or X >24
P(X<9 or X >24)            0.7221
Probability for a Range
From X Value               8
To X Value                 12
Z Value for 8              -0.972973
Z Value for 12             -0.756757
P(X<=8)                    0.1653
P(X<=12)                   0.2246
P(8<=X<=12)                0.0593
Find X and Z Given Cum. Pctage.
Cumulative Percentage      10.00%
Z Value                    -1.281552
X Value                    2.291296
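Both parts of #9 can be checked with Python's statistics.NormalDist, using the mean of 26 and standard deviation of 18.5 entered in the calculator (the variable names are illustrative):

```python
from statistics import NormalDist

# Survival months modeled as normal with the calculator's mean and std. deviation.
survival = NormalDist(mu=26, sigma=18.5)

# a. probability of surviving more than 24 months
z_24 = (24 - survival.mean) / survival.stdev
p_over_24 = 1 - survival.cdf(24)

# b. 90% survive at least this long, so we want the 10th percentile
tenth_percentile = survival.inv_cdf(0.10)

print(f"Z for 24 months: {z_24:.6f}")
print(f"P(X > 24):       {p_over_24:.4f}")
print(f"10th percentile: {tenth_percentile:.2f} months")
```

Part b is the tricky one: "90% will survive" refers to the upper 90% of the distribution, so the cutoff sits at the 10th percentile, not the 90th.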
10. The normal probability plot is not a straight line, so the #s aren’t normally distributed
(see below plot).
11. This is a Bayes problem:
a. 40% of employees received internal recommendations
b. w/ internal recommendation, prob of being prized is .175, ok is .75 and only .075 for
fired.
Employee Type:       Prized    OK      Fired    Total
Simples:             0.1       0.6     0.3
Conditionals:        0.7       0.5     0.1
Joints:              0.07      0.3     0.03     0.4
New conditionals:    0.175     0.75    0.075
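The Bayes table in #11 multiplies each simple probability by its conditional to get the joints, sums the joints, and then divides each joint by that sum. A minimal Python sketch of those steps (the dictionary names are illustrative assumptions):

```python
# Simple (prior) probabilities of each employee type.
simples = {"Prized": 0.1, "OK": 0.6, "Fired": 0.3}
# Conditional probability of an internal recommendation, given the type.
conditionals = {"Prized": 0.7, "OK": 0.5, "Fired": 0.1}

# Joint probability: type AND internal recommendation.
joints = {k: simples[k] * conditionals[k] for k in simples}

# a. overall probability of an internal recommendation
p_recommended = sum(joints.values())

# b. new conditionals: probability of each type, given an internal recommendation
new_conditionals = {k: joints[k] / p_recommended for k in joints}

print(f"P(internal recommendation) = {p_recommended:.2f}")
for k, v in new_conditionals.items():
    print(f"P({k} | recommendation) = {v:.3f}")
```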
12. Given prob of distress = .3, reading = .1 and joint = .05
a. prob of distress given reading = joint/simple = .05/ .1 = .5
b. prob of reading given distress = joint/simple = .05/.3 = .17
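The two conditional probabilities in #12 use the same joint but divide by different simple probabilities. A short Python check with the given values (the variable names are illustrative):

```python
p_distress = 0.3   # simple probability of distress
p_reading = 0.1    # simple probability of a reading
p_both = 0.05      # joint probability of distress and reading

# conditional = joint / simple probability of the event we condition on
p_distress_given_reading = p_both / p_reading
p_reading_given_distress = p_both / p_distress

print(f"P(distress | reading) = {p_distress_given_reading:.2f}")
print(f"P(reading | distress) = {p_reading_given_distress:.2f}")
```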
Months of Survival
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   3                3       3.3             3.3                  3.3
        4                3       3.3             3.3                  6.7
        5                3       3.3             3.3                 10.0
        6                1       1.1             1.1                 11.1
        7                2       2.2             2.2                 13.3
        8                1       1.1             1.1                 14.4
        9                2       2.2             2.2                 16.7
        10               2       2.2             2.2                 18.9
        11               3       3.3             3.3                 22.2
        12               3       3.3             3.3                 25.6
        13               4       4.4             4.4                 30.0
        14               6       6.7             6.7                 36.7
        15               2       2.2             2.2                 38.9
        16               1       1.1             1.1                 40.0
        17               1       1.1             1.1                 41.1
        18               4       4.4             4.4                 45.6
        19               1       1.1             1.1                 46.7
        20               2       2.2             2.2                 48.9
        21               3       3.3             3.3                 52.2
        22               1       1.1             1.1                 53.3
        23               1       1.1             1.1                 54.4
        24               1       1.1             1.1                 55.6
        25               1       1.1             1.1                 56.7
        26               2       2.2             2.2                 58.9
        27               2       2.2             2.2                 61.1
        28               2       2.2             2.2                 63.3
        29               1       1.1             1.1                 64.4
        30               1       1.1             1.1                 65.6
        31               1       1.1             1.1                 66.7
        32               1       1.1             1.1                 67.8
        33               2       2.2             2.2                 70.0
        34               2       2.2             2.2                 72.2
        35               1       1.1             1.1                 73.3
        36               1       1.1             1.1                 74.4
        37               1       1.1             1.1                 75.6
        38               1       1.1             1.1                 76.7
        39               1       1.1             1.1                 77.8
        40               1       1.1             1.1                 78.9
        41               1       1.1             1.1                 80.0
        42               1       1.1             1.1                 81.1
        43               1       1.1             1.1                 82.2
        44               1       1.1             1.1                 83.3
        45               1       1.1             1.1                 84.4
        46               1       1.1             1.1                 85.6
        47               1       1.1             1.1                 86.7
        48               1       1.1             1.1                 87.8
        49               1       1.1             1.1                 88.9
        50               1       1.1             1.1                 90.0
        52               1       1.1             1.1                 91.1
        55               1       1.1             1.1                 92.2
        58               1       1.1             1.1                 93.3
        60               1       1.1             1.1                 94.4
        61               1       1.1             1.1                 95.6
        69               1       1.1             1.1                 96.7
        71               1       1.1             1.1                 97.8
        75               1       1.1             1.1                 98.9
        85               1       1.1             1.1                100.0
        Total           90     100.0           100.0