HOMEWORK #3

1) Consider the number of red petals for bluebonnets that are near the highway. Assume that the population standard deviation is $\sigma = 2$.

a) Using the actual data, what is the sample mean?
- The sample mean is defined as (the sum of all data) divided by (sample size). You can see the result from SPSS. The sample mean is 3.08.

b) Construct by hand the 95% confidence interval for the population mean using formulas from class.
- The 95% confidence interval for the population mean is
  $\bar{X} \pm 1.96\,\sigma/\sqrt{n} = 3.08 \pm 1.96 \cdot 2/\sqrt{100} = 3.08 \pm 1.96 \cdot 2/10 = 3.08 \pm 0.392 = (2.688,\ 3.472)$

c) Now use SPSS to get the same confidence interval. SPSS should give you a slightly different answer, because SPSS does not assume that the population standard deviation is known, and because the sample standard deviation is not the same as the population standard deviation.
- From SPSS, the 95% confidence interval is (2.73, 3.43).

2) Repeat the previous exercise using 99% confidence.
- The 99% confidence interval for the population mean is
  $\bar{X} \pm 2.58\,\sigma/\sqrt{n} = 3.08 \pm 2.58 \cdot 2/\sqrt{100} = 3.08 \pm 2.58 \cdot 2/10 = 3.08 \pm 0.516 = (2.564,\ 3.596)$

3) What sample size would I need for bluebonnets near the highway to have a 95% confidence interval of length 2E = 0.20?
  $2 \cdot 1.96\,\sigma/\sqrt{n} = 0.20$
  $1.96 \cdot 2/\sqrt{n} = 0.10$
  $\sqrt{n} = 1.96 \cdot 2/0.10 = 39.2$
  $n = (39.2)^2 = 1536.64$, which rounds up to n = 1537.

4) In problem #5, would the sample size increase or decrease if you wanted the 99% CI to have the same length?
  $\sqrt{n} = 2.58 \cdot 2/0.10 = 51.6$
  $n = (51.6)^2 = 2662.56$
- The sample size increases.

5) In problem #5, would the sample size increase or decrease if you wanted the 95% CI to have the length 0.05 (= 2E)?
- The sample size increases. If you want more precision, you have to pay the price.

6) In problem #5, would the sample size increase or decrease if the population standard deviation were $\sigma = 4$?
- The sample size increases.
If your data are more variable, you need a larger sample size to get any given amount of precision.

PROBLEM #7
In the armspan data for males, the sample mean difference of height - armspan is -0.26, with a sample size of n = 50. Suppose that the population standard deviation is $\sigma = 1.7$. Compute by hand a 99% confidence interval for $\mu$.
(Solution) The 99% confidence interval for $\mu$ is $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, i.e.
  $-0.26 \pm z_{0.005} \cdot 1.7/\sqrt{50} = -0.26 \pm 2.58 \cdot 1.7/\sqrt{50} = -0.26 \pm 0.62 = (-0.88,\ 0.36)$

PROBLEM #8
In the armspan data for males as above, use hypothesis testing terminology to test the null hypothesis of no difference with a type I error of $\alpha = 0.01$. State the null hypothesis, the alternative hypothesis, and your decision.
(Solution)
a) The null hypothesis: There is no difference between the population mean armspan and height for males, i.e., the population mean of their difference equals zero.
b) The alternative hypothesis: The null hypothesis is false.
c) The decision: 0 is in the 99% confidence interval, so we cannot reject the null hypothesis.

PROBLEM #9
In problem #4, I claim that the p-value is p < 0.01. Is my claim correct?
(Solution) No, your claim is not correct. In problem #4, we cannot reject the null hypothesis at $\alpha = 0.01$. That means the p-value is greater than 0.01.

PROBLEM #10
What happens to confidence intervals as sample sizes increase? Choose one. (a) Nothing; (b) They get shorter; (c) They get longer.
(Solution) (b) The confidence intervals get shorter as sample sizes increase.

PROBLEM #11
Look up the Framingham data on the web for this homework assignment. I have put together an SPSS data set, but I have not labeled the variables at all.
a. Give "age" the label "Patient Age"
b. Give "sbpexam1" the label "Systolic Blood Pressure, Exam 1"
c. Give "sbpexam2" the label "Systolic Blood Pressure, Exam 2"
d. Give "smoke" the label "Smoking Status"
e. Give "chol1" the label "Cholesterol, Exam 1"
f. Give "chol2" the label "Cholesterol, Exam 2"
g. Give "chd" the label "Coronary Heart Disease Status"
h. Set up the values for Smoking Status: 0 = "Nonsmoker", 1 = "Smoker"
i. Set up the values for Coronary Heart Disease Status: 0 = "No Heart Disease", 1 = "Heart Disease"
j. Now save your data set.

PROBLEM #12
Consider the Framingham data. The major variable for you to analyze in this assignment is the change in Systolic Blood Pressure (SBP) over the two-year period from Exam #1 to Exam #2.
a. Create a variable sbdiff which is the difference in systolic blood pressure: Exam #1 - Exam #2.
b. Create a simple box plot for this variable.
   [Box plot of SBDIFF, N = 1615]
   a. What is the 25th percentile, approximately? -7
   b. What is the 75th percentile, approximately? 9.5
   c. Is there any evidence of a massive outlier? No
c. Create a histogram of the data. Do the data look reasonably bell-shaped?
   - Doesn't look too bad.
   [Histogram of SBDIFF (frequency): Mean = 1.5, Std. Dev = 12.94, N = 1615]
d. Create a Q-Q plot of the data. Do the data look reasonably bell-shaped?
   - It's clearly not exactly normally distributed, but still not as bad as other examples we've seen.
   [Normal Q-Q plot of SBDIFF: expected normal value against observed value]
e. Obtain summary statistics for this variable. What is the sample size?
   - The sample size is 1615.

   Descriptives: SBDIFF
                                                   Statistic   Std. Error
   Mean                                               1.4954       .32193
   95% Confidence Interval for Mean, Lower Bound       .8639
   95% Confidence Interval for Mean, Upper Bound      2.1268
   5% Trimmed Mean                                    1.4333
   Median                                             1.5000
   Variance                                          167.377
   Std. Deviation                                   12.93742
   Minimum                                            -61.50
   Maximum                                             90.00
   Range                                              151.50
   Interquartile Range                               16.5000
   Skewness                                             .170         .061
   Kurtosis                                            2.296         .122

f. What is the confidence interval for the population mean change as output by SPSS?
   - The 95% confidence interval for the population mean change is (0.8639, 2.1268).
g. Interpret this confidence interval.
   - We are 95% confident that the population mean change is between 0.8639 and 2.1268. Also, since this interval doesn't include 0 (no mean change), we can say that there is a mean change in blood pressure over two years (with 95% confidence, of course).
h. Is the p-value p < 0.05?
   - Yes, because 0 (no mean change) is not included in the 95% confidence interval. It means we can reject the null hypothesis, and the p-value is less than 0.05.
i. What is the p-value above a test of? State the null hypothesis, the alternative hypothesis, and the Type I error.
   (Solution)
   The null hypothesis: There is no population mean change.
   The alternative hypothesis: The null hypothesis is false.
   Type I error: $\alpha = 0.05$
j. Do you think there is any practically significant mean change in blood pressure over two years?
   - "I have no clue" is probably not a bad answer. It's a change of about 1 unit in the log scale, which is a change of about 3 units in the original systolic blood pressure scale. My guess is that such a change is not clinically significant, but I do not know.
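
The hand calculations in this homework (the known-sigma confidence intervals, the sample-size questions, and the p-value question for the Framingham change score) can be checked with a short script. The following is only a minimal sketch, not part of the original solutions: the function names are hypothetical, and it uses exact normal critical values from Python's standard library rather than the rounded 1.96 and 2.58 used by hand, so some results differ slightly from the hand answers. SPSS bases its interval on the t distribution, so the normal approximation used here for the Framingham data will agree with SPSS only to about three decimal places.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided standard normal critical value (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def z_interval(mean, sigma, n, confidence):
    """Confidence interval for a mean when the standard deviation is treated as known."""
    half_width = z_critical(confidence) * sigma / sqrt(n)
    return mean - half_width, mean + half_width


def required_n(sigma, total_length, confidence):
    """Smallest n whose z-interval has total length (2E) at most total_length."""
    return ceil((2 * z_critical(confidence) * sigma / total_length) ** 2)


def two_sided_p(mean, std_error):
    """Large-sample two-sided p-value for H0: population mean = 0."""
    z = abs(mean / std_error)
    return 2 * (1 - NormalDist().cdf(z))


if __name__ == "__main__":
    # Bluebonnets near the highway: sample mean 3.08, sigma = 2, n = 100
    print("95% CI:", z_interval(3.08, 2, 100, 0.95))
    print("99% CI:", z_interval(3.08, 2, 100, 0.99))

    # Sample sizes for a total interval length of 0.20
    print("n for 95%:", required_n(2, 0.20, 0.95))
    print("n for 99%:", required_n(2, 0.20, 0.99))
    print("n for 95%, length 0.05:", required_n(2, 0.05, 0.95))
    print("n for 95%, sigma = 4:", required_n(4, 0.20, 0.95))

    # Armspan data for males: mean difference -0.26, sigma = 1.7, n = 50
    print("99% CI:", z_interval(-0.26, 1.7, 50, 0.99))

    # Framingham change score, from the SPSS Descriptives output
    print("95% CI:", z_interval(1.4954, 12.93742, 1615, 0.95))
    print("p-value:", two_sided_p(1.4954, 0.32193))
```

With the exact critical value, the 99% sample size comes out a little smaller than the hand answer based on 2.58, but the qualitative conclusions are the same: higher confidence, a shorter interval, or a larger standard deviation all require a larger sample.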