Health Data Analysis, Spring 2008, Exam #1B “Brief” Answer Key
Dr. Robert Jantzen, Department of Economics

1. We’d want to draw a separate random sample from each group of infants. First we’d assign a unique ID # to each infant, from 1 to 1489 for normal weights and from 1 to 184 for low weights. Then we’d use either a random number table or a random number generator to produce 4-digit numbers for selecting from the first group and 3-digit numbers for selecting from the second group, skipping any number that falls outside the group’s ID range. For example:

   Population Size: 1489   Random Numbers: 1079, 246, 1286, 268, 198, 1148
   Population Size: 184    Random Numbers: 70, 143, 34, 113, 22, 169

2. The summary table below shows that Medicare is the biggest insurer (36%), followed by Empire Blue (28%) and Medicaid (27%). Together these three account for 90% of the total.

3. A bar chart, pie chart, or Pareto diagram could be used to show that Medicare is the biggest insurer (36%), followed by Empire Blue (28%) and Medicaid (27%), which together account for 90% of the total. See bar chart below.

   Insurer       Frequency   Percent   Valid Percent   Cumulative Percent
   Empire Blue   25          27.8      27.8            27.8
   HIP           4           4.4       4.4             32.2
   Medicaid      24          26.7      26.7            58.9
   Medicare      32          35.6      35.6            94.4
   Oxford        1           1.1       1.1             95.6
   United        4           4.4       4.4             100.0
   Total         90          100.0     100.0

4. The range of values is 82 (85 - 3), so we can divide it into 9 intervals of width 10 (the values run from 3 to 85, so the intervals >0-10 through >80-90 are all needed). We can find how many values fall in each interval by using SPSS to generate “raw” frequencies and then filling in the table (see the raw frequencies at the end of this key). The frequency table shows that the values are centered around 20 months and aren’t symmetrically distributed.

   Distribution of Survival Months
   Months    #     %         Cum. %
   >0-10     17    18.9%     18.9%
   >10-20    27    30.0%     48.9%
   >20-30    15    16.7%     65.6%
   >30-40    12    13.3%     78.9%
   >40-50    10    11.1%     90.0%
   >50-60    4     4.4%      94.4%
   >60-70    2     2.2%      96.7%
   >70-80    2     2.2%      98.9%
   >80-90    1     1.1%      100.0%
   Total     90    100.0%

5. We could construct either a histogram or a boxplot. The SPSS histogram (see below) is centered around 20 months and skewed toward the high values.

6. a. The mean is 26, the arithmetic average.
The median is 21, the value that separates the lowest 50% of the values from the highest 50%.
b. The range is 82 (85 - 3), the width of the interval that contains all of the values. The interquartile range is 25.25 (37.25 - 12), the width of the interval that contains the middle 50% of the values. The standard deviation is 18.5, a measure of how far the values typically lie from the mean.
c. The Pearson measure of skewness is .27 [= (26.02 - 21)/18.452], which is greater than .1 in absolute value, so the values are skewed toward the high numbers.

   Statistics: Months of Survival
   N Valid           90
   N Missing         0
   Mean              26.02
   Median            21.00
   Std. Deviation    18.452
   Range             82
   Minimum           3
   Maximum           85
   Percentile 25     12.00
   Percentile 50     21.00
   Percentile 75     37.25

7. This is a binomial problem with a sample size of 10 and a probability of .1 that any one item is defective. The probability that more than 0 items are defective is P(X > 0) = 1 - P(X = 0) = 1 - .9^10 = .651, so there is a 65.1% chance that the shipment will be rejected.

   Binomial Probabilities: This spreadsheet (adapted from PHSTAT2) makes some common calculations for outcomes that can be modeled with the binomial distribution. NOTES: Edit the values in BLUE to reflect the sample size and probability of success in your sample (the sample size must be <= 20). The spreadsheet will then calculate the probability of observing exactly X successes in the sample, as well as <= X successes, > X successes, etc.

   Data: Sample size 10; Probability of success 0.1
   Statistics: Mean 1; Variance 0.9; Standard deviation 0.948683

   Binomial Probabilities Table
   X     P(X)        P(<=X)      P(<X)       P(>X)       P(>=X)
   0     0.348678    0.348678    0           0.651322    1
   1     0.38742     0.736099    0.348678    0.263901    0.651322
   2     0.19371     0.929809    0.736099    0.070191    0.263901
   3     0.057396    0.987205    0.929809    0.012795    0.070191
   4     0.01116     0.998365    0.987205    0.001635    0.012795
   5     0.001488    0.999853    0.998365    0.000147    0.001635
   6     0.000138    0.999991    0.999853    9.12E-06    0.000147
   7     8.75E-06    1           0.999991    3.74E-07    9.12E-06
   8     3.65E-07    1           1           9.1E-09     3.74E-07
   9     9E-09       1           1           1E-10       9.1E-09
   10    1E-10       1           1           0           1E-10

8. The 95% confidence interval for the population mean survival time is 22.2 to 29.9 months. We’re 95% confident that the true mean lies in that interval.
The interval requires a random sample whose values aren’t highly skewed. The values could be generated by SPSS’s Explore procedure or by the Excel calculator.

   Descriptives: Months of Survival                  Statistic   Std. Error
   Mean                                              26.02       1.945
   95% Confidence Interval for Mean: Lower Bound     22.16
   95% Confidence Interval for Mean: Upper Bound     29.89
   5% Trimmed Mean                                   24.65
   Median                                            21.00
   Variance                                          340.471
   Std. Deviation                                    18.452
   Minimum                                           3
   Maximum                                           85
   Range                                             82
   Interquartile Range                               25
   Skewness                                          1.010       .254
   Kurtosis                                          .597        .503

   Confidence Interval Estimate for the Mean (sigma unknown)
   Data
     Sample Standard Deviation      18.452
     Sample Mean                    26.02
     Sample Size                    90
     Confidence Level               95%
   Intermediate Calculations
     Standard Error of the Mean     1.94501158
     Degrees of Freedom             89
     t Value                        1.986978657
     Interval Half Width            3.864696496
   Confidence Interval
     Interval Lower Limit           22.16
     Interval Upper Limit           29.88

9. Using the normal probability calculator with mean 26 and standard deviation 18.5:
a. The probability of surviving more than 24 months is .543, or 54.3%.
b. The maximum number of months that 90% will survive is actually the 10th percentile, which is 2.29 months (tricky question): only 10% of survival times fall below it.

   Normal Probabilities: This spreadsheet (adapted from PHSTAT2) makes some common calculations for normal distributions. NOTES: Edit the values in BLUE to reflect the mean and standard deviation of your data. Also edit the other values in BLUE to calculate the probability of an individual score (X) being <= or > a particular value, or to find a particular percentile score.

   Common Data: Mean 26; Standard Deviation 18.5

   Probability for X <=
     X Value 9             Z Value -0.918919       P(X<=9) 0.179069
   Probability for a Range
     From X Value 8        To X Value 12
     Z Value for 8 -0.972973                       Z Value for 12 -0.756757
     P(X<=8) 0.1653        P(X<=12) 0.2246         P(8<=X<=12) 0.0593
   Probability for X >
     X Value 24            Z Value -0.108108       P(X>24) 0.5430
   Probability for X < 9 or X > 24
     P(X<9 or X>24) 0.7221
   Find X and Z Given Cum. Pctage.
     Cumulative Percentage 10.00%   Z Value -1.281552   X Value 2.291296

10. The normal probability plot is not a straight line, so the values aren’t normally distributed (see plot below).
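The calculations behind answers 1 and 4 through 10 can be cross-checked with a short Python sketch. It is not the course’s method (the key uses SPSS and the PHSTAT2-based Excel calculators). It rebuilds the 90 survival times from the raw frequency table in this key, assumes SPSS’s default weighted-average percentile rule at position (n + 1)p, and copies the t value for df = 89 from the Excel calculator output instead of computing it. The helper names (draw_ids, spss_percentile, bin_counts, t_interval, binom_pmf) and the random seeds are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, median, stdev

# Raw "Months of Survival" frequencies (value: count), copied from the SPSS
# frequency table in this answer key; the counts sum to n = 90.
FREQ = {
    3: 3, 4: 3, 5: 3, 6: 1, 7: 2, 8: 1, 9: 2, 10: 2, 11: 3, 12: 3,
    13: 4, 14: 6, 15: 2, 16: 1, 17: 1, 18: 4, 19: 1, 20: 2, 21: 3, 22: 1,
    23: 1, 24: 1, 25: 1, 26: 2, 27: 2, 28: 2, 29: 1, 30: 1, 31: 1, 32: 1,
    33: 2, 34: 2, 35: 1, 36: 1, 37: 1, 38: 1, 39: 1, 40: 1, 41: 1, 42: 1,
    43: 1, 44: 1, 45: 1, 46: 1, 47: 1, 48: 1, 49: 1, 50: 1, 52: 1, 55: 1,
    58: 1, 60: 1, 61: 1, 69: 1, 71: 1, 75: 1, 85: 1,
}
DATA = sorted(value for value, count in FREQ.items() for _ in range(count))


def draw_ids(population_size, k, seed=None):
    """Answer 1: simple random sample of k distinct IDs from 1..population_size."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), k))


def spss_percentile(xs, p):
    """Weighted-average percentile at position (n + 1) * p of sorted xs
    (assumed here to match SPSS's default rule)."""
    pos = (len(xs) + 1) * p
    k = int(pos)              # 1-based rank of the lower neighbour
    frac = pos - k
    if k < 1:
        return xs[0]
    if k >= len(xs):
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


def bin_counts(xs, width=10, upper=90):
    """Answer 4: counts for the intervals >0-10, >10-20, ..., i.e. (a, a + width]."""
    return [sum(1 for x in xs if a < x <= a + width) for a in range(0, upper, width)]


def t_interval(m, s, n, t_crit):
    """Answer 8: mean +/- t * s / sqrt(n), the sigma-unknown interval."""
    half_width = t_crit * s / math.sqrt(n)
    return m - half_width, m + half_width


def binom_pmf(k, n, p):
    """Answer 7: P(X = k) for a binomial(n, p) count."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


normal_ids = draw_ids(1489, 6, seed=1)   # hypothetical seeds, for repeatability
low_ids = draw_ids(184, 6, seed=2)

n = len(DATA)
m, med, s = mean(DATA), median(DATA), stdev(DATA)   # stdev divides by n - 1
q1, q3 = spss_percentile(DATA, 0.25), spss_percentile(DATA, 0.75)
counts = bin_counts(DATA)
skew = (m - med) / s                     # the key's (mean - median) / SD measure

# t for 95% confidence with df = 89, copied from the Excel calculator output;
# with SciPy installed, scipy.stats.t.ppf(0.975, 89) should give the same value.
lo_ci, hi_ci = t_interval(m, s, n, t_crit=1.986978657)

model = NormalDist(mu=26, sigma=18.5)    # rounded mean and SD used in answer 9
p_over_24 = 1 - model.cdf(24)
p10 = model.inv_cdf(0.10)                # 10th percentile

p_reject = 1 - binom_pmf(0, 10, 0.1)     # reject if at least one is defective

print("Answer 1 IDs:", normal_ids, low_ids)
print("Answer 4 interval counts:", counts)
print(f"Answer 6: mean={m:.2f} median={med:.2f} sd={s:.3f} "
      f"IQR={q3 - q1:.2f} skewness={skew:.2f}")
print(f"Answer 7: P(reject)={p_reject:.3f}")
print(f"Answer 8: 95% CI {lo_ci:.2f} to {hi_ci:.2f}")
print(f"Answer 9: P(X > 24)={p_over_24:.4f}, 10th percentile={p10:.2f}")
```

Because the sketch uses the unrounded mean (2342/90 = 26.022...), its upper confidence limit rounds to 29.89, matching SPSS Explore; the Excel calculator, fed the rounded mean 26.02, reports 29.88. Pearson’s coefficient is also often written as 3(mean - median)/SD; the sketch follows the key’s version.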
11. This is a Bayes problem:
a. 40% of employees received internal recommendations.
b. With an internal recommendation, the probability of being prized is .175, of being OK is .75, and of being fired only .075.

   Employee Type                                Prized    OK      Fired    Total
   Simples: P(type)                             0.1       0.6     0.3
   Conditionals: P(internal rec. | type)        0.7       0.5     0.1
   Joints: simple x conditional                 0.07      0.3     0.03     0.4
   New conditionals: joint / total              0.175     0.75    0.075

12. Given P(distress) = .3, P(reading) = .1, and joint P(distress and reading) = .05:
a. P(distress | reading) = joint/simple = .05/.1 = .5
b. P(reading | distress) = joint/simple = .05/.3 = .17

   Months of Survival (raw frequencies)
   Value  Frequency  Percent  Valid Percent  Cumulative Percent
   3      3          3.3      3.3            3.3
   4      3          3.3      3.3            6.7
   5      3          3.3      3.3            10.0
   6      1          1.1      1.1            11.1
   7      2          2.2      2.2            13.3
   8      1          1.1      1.1            14.4
   9      2          2.2      2.2            16.7
   10     2          2.2      2.2            18.9
   11     3          3.3      3.3            22.2
   12     3          3.3      3.3            25.6
   13     4          4.4      4.4            30.0
   14     6          6.7      6.7            36.7
   15     2          2.2      2.2            38.9
   16     1          1.1      1.1            40.0
   17     1          1.1      1.1            41.1
   18     4          4.4      4.4            45.6
   19     1          1.1      1.1            46.7
   20     2          2.2      2.2            48.9
   21     3          3.3      3.3            52.2
   22     1          1.1      1.1            53.3
   23     1          1.1      1.1            54.4
   24     1          1.1      1.1            55.6
   25     1          1.1      1.1            56.7
   26     2          2.2      2.2            58.9
   27     2          2.2      2.2            61.1
   28     2          2.2      2.2            63.3
   29     1          1.1      1.1            64.4
   30     1          1.1      1.1            65.6
   31     1          1.1      1.1            66.7
   32     1          1.1      1.1            67.8
   33     2          2.2      2.2            70.0
   34     2          2.2      2.2            72.2
   35     1          1.1      1.1            73.3
   36     1          1.1      1.1            74.4
   37     1          1.1      1.1            75.6
   38     1          1.1      1.1            76.7
   39     1          1.1      1.1            77.8
   40     1          1.1      1.1            78.9
   41     1          1.1      1.1            80.0
   42     1          1.1      1.1            81.1
   43     1          1.1      1.1            82.2
   44     1          1.1      1.1            83.3
   45     1          1.1      1.1            84.4
   46     1          1.1      1.1            85.6
   47     1          1.1      1.1            86.7
   48     1          1.1      1.1            87.8
   49     1          1.1      1.1            88.9
   50     1          1.1      1.1            90.0
   52     1          1.1      1.1            91.1
   55     1          1.1      1.1            92.2
   58     1          1.1      1.1            93.3
   60     1          1.1      1.1            94.4
   61     1          1.1      1.1            95.6
   69     1          1.1      1.1            96.7
   71     1          1.1      1.1            97.8
   75     1          1.1      1.1            98.9
   85     1          1.1      1.1            100.0
   Total  90         100.0    100.0
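The Bayes table in answer 11 and the conditional probabilities in answer 12 rest on two rules: joint = simple x conditional, and new conditional = joint / (sum of the joints). A minimal Python sketch, assuming the “Conditionals” row is P(internal recommendation | type); the function names bayes_table and conditional are hypothetical helpers, not part of any course tool.

```python
def bayes_table(priors, likelihoods):
    """Joint probabilities, their total, and revised ("new") conditionals.

    priors:      simple probabilities P(type), which should sum to 1
    likelihoods: conditional probabilities P(evidence | type)
    """
    joints = {t: priors[t] * likelihoods[t] for t in priors}
    total = sum(joints.values())                               # P(evidence)
    posteriors = {t: j / total for t, j in joints.items()}     # P(type | evidence)
    return joints, total, posteriors


def conditional(p_joint, p_given):
    """P(A | B) = P(A and B) / P(B)."""
    return p_joint / p_given


# Answer 11: the evidence is receiving an internal recommendation.
priors = {"Prized": 0.1, "OK": 0.6, "Fired": 0.3}
likelihoods = {"Prized": 0.7, "OK": 0.5, "Fired": 0.1}
joints, p_rec, posteriors = bayes_table(priors, likelihoods)
print(f"P(internal recommendation) = {p_rec:.2f}")
for t in priors:
    print(f"{t:>6}: joint = {joints[t]:.3f}, new conditional = {posteriors[t]:.3f}")

# Answer 12: P(distress) = .3, P(reading) = .1, P(distress and reading) = .05.
p_distress_given_reading = conditional(0.05, 0.1)
p_reading_given_distress = conditional(0.05, 0.3)
print(f"P(distress | reading) = {p_distress_given_reading:.2f}")
print(f"P(reading | distress) = {p_reading_given_distress:.2f}")
```

Dividing by the same joint (.05) but a different simple probability is what makes the two answers in 12 differ: the “given” event always supplies the denominator.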