HOMEWORK #5, due Lecture #7

HOMEWORK #3
1) Consider the number of red petals for bluebonnets that are near the highway. Assume
that the population standard deviation is σ = 2.
a) Using the actual data, what is the sample mean?
- The sample mean is defined as (the sum of all data) divided by (sample size).
You can see the result from SPSS. The sample mean is 3.08.
b) Construct by hand the 95% confidence interval for the population mean μ using
formulas from class.
- the 95% confidence interval for the population mean is
X̄ ± 1.96 * σ / (square root of n)
= 3.08 ± 1.96 * 2 / (square root of 100)
= 3.08 ± 1.96 * 2 / 10
= 3.08 ± 0.392
= (2.688, 3.472)
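As a cross-check on the hand calculation, here is a minimal Python sketch of the known-σ interval. The function name z_interval is made up for this example; it is not part of SPSS or the class materials.

```python
import math

def z_interval(mean, sigma, n, z):
    """Return (lower, upper) for mean ± z * sigma / sqrt(n), with sigma known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# Values from Problem 1: sample mean 3.08, sigma = 2, n = 100, z = 1.96
lower, upper = z_interval(3.08, 2.0, 100, 1.96)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # prints 95% CI: (2.688, 3.472)
```

Changing z to 2.58 in the call gives the 99% version of the same interval.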
c) Now use SPSS to get the same confidence interval. SPSS should give you a
slightly different answer, because SPSS does not assume that the population
standard deviation is known, and because the sample standard deviation is not the
same as the population standard deviation.
- From SPSS, the 95% confidence interval is (2.73, 3.43)
2) Repeat the previous exercise using 99% confidence.
- the 99% confidence interval for the population mean is
X̄ ± 2.58 * σ / (square root of n)
= 3.08 ± 2.58 * 2 / (square root of 100)
= 3.08 ± 2.58 * 2 / 10
= 3.08 ± 0.516
= (2.564, 3.596)
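The multipliers 1.96 and 2.58 are rounded standard normal critical values. The sketch below shows where they come from, using Python's standard library; the helper name z_critical is an assumption made for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value for a given confidence level."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z95 = z_critical(0.95)  # about 1.96
z99 = z_critical(0.99)  # about 2.576, which the hand calculation rounds to 2.58

# Half-width of the 99% interval with sigma = 2 and n = 100, unrounded z
half_width_99 = z99 * 2.0 / 100 ** 0.5
print(round(z95, 3), round(z99, 3), round(half_width_99, 3))
```

Because 2.58 is rounded up from about 2.576, the unrounded half-width is slightly smaller than the 0.516 used by hand.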
3) What sample size would I need for bluebonnets near the highway to have a 95%
confidence interval of length 2 * E = 0.20?
2 * 1.96 * σ / (square root of n) = 0.20
1.96 * 2 / (square root of n) = 0.10
(square root of n) = 1.96 * 2 / 0.10 = 39.2
n = (39.2)^2
n = 1536.64 ≈ 1537 (rounding up)
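The same algebra can be wrapped in a small function. This is a sketch under the assumptions used above (known σ, normal critical value); the name sample_size is hypothetical.

```python
import math

def sample_size(z, sigma, half_width):
    """Smallest whole n with z * sigma / sqrt(n) <= half_width."""
    return math.ceil((z * sigma / half_width) ** 2)

# 95% CI of total length 0.20, so E = 0.10, with sigma = 2
n_needed = sample_size(1.96, 2.0, 0.10)
print(n_needed)  # prints 1537
```

The result is rounded up, not to the nearest integer, because a smaller n would give an interval longer than requested.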
4) In problem #3, would the sample size increase or decrease if you wanted the 99% CI
to have the same length?
(square root of n) = 2.58 * 2 / 0.10 = 51.6
n = (51.6)^2
n = 2662.56 ≈ 2663: the sample size increases
5) In problem #3, would the sample size increase or decrease if you wanted the 95% CI
to have the length 0.05 (= 2 * E)?
- The sample size increases. If you want more precision, you have to pay the price:
n is proportional to 1 / E^2, so cutting the length from 0.20 to 0.05 multiplies the
required n by 16 (before rounding).
6) In problem #3, would the sample size increase or decrease if the population standard
deviation were σ = 4?
- The sample size increases. If your data are more variable, you need a larger sample
size to get any given amount of precision. Here n is proportional to σ^2, so doubling
σ from 2 to 4 quadruples n.
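To put numbers on these three comparisons, the sketch below reruns the sample-size formula n = (z * σ / E)^2 with one input changed at a time. The function name and variable names are made up for illustration.

```python
import math

def sample_size(z, sigma, half_width):
    """Smallest whole n with z * sigma / sqrt(n) <= half_width."""
    return math.ceil((z * sigma / half_width) ** 2)

baseline = sample_size(1.96, 2.0, 0.10)           # 95%, sigma = 2, length 0.20
higher_confidence = sample_size(2.58, 2.0, 0.10)  # 99% instead of 95%
narrower = sample_size(1.96, 2.0, 0.025)          # length 0.05, so E = 0.025
more_variable = sample_size(1.96, 4.0, 0.10)      # sigma = 4 instead of 2

for name, n in [("baseline", baseline),
                ("99% confidence", higher_confidence),
                ("length 0.05", narrower),
                ("sigma = 4", more_variable)]:
    print(f"{name}: n = {n}")
```

All three changes push n above the baseline of 1537; the shorter interval is by far the most expensive, because n grows with 1 / E^2.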
PROBLEM #7
In the armspan data for males, the sample mean difference of height - armspan is -0.26,
with a sample size of n = 50. Suppose that the population standard deviation is σ = 1.7.
Compute by hand a 99% confidence interval for μ.
(Solution)
The 99% confidence interval for μ is X̄ ± z(α/2) * σ / (square root of n)
i.e. -0.26 ± z(0.005) * 1.7 / (square root of 50)
= -0.26 ± 2.58 * 1.7 / (square root of 50)
= -0.26 ± 0.62
= (-0.88, 0.36)
PROBLEM #8
In the armspan data for males as above, use hypothesis testing terminology to test the null
hypothesis of no difference with a type I error of α = 0.01. State the null hypothesis, the
alternative hypothesis, and your decision.
(Solution)
a) The null hypothesis: There is no difference between the population
mean armspan and height for males, i.e., the population mean of their
difference equals zero.
b) The alternative hypothesis: The null hypothesis is false, i.e., the
population mean of their difference is not zero.
c) The decision: 0 is in the 99% confidence interval, so we cannot reject
the null hypothesis at α = 0.01.
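The same decision can be reached by comparing a z statistic with the critical value, which is equivalent to checking whether 0 lies in the interval. The sketch below assumes the known-σ setup of the armspan problem; the variable names are illustrative only.

```python
import math

mean_diff, sigma, n = -0.26, 1.7, 50  # values given in the armspan problem
z_crit = 2.58                         # two-sided critical value for alpha = 0.01

# Standardized distance of the sample mean from the null value 0
z_stat = (mean_diff - 0.0) / (sigma / math.sqrt(n))

reject_null = abs(z_stat) > z_crit
print(f"z = {z_stat:.2f}, reject H0: {reject_null}")
```

Here |z| is about 1.08, well below 2.58, so the null hypothesis is not rejected.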
PROBLEM #9
In problem #8, I claim that the p-value is p < 0.01. Is my claim correct?
(Solution)
No, your claim is not correct. In problem #8, we cannot reject the null
hypothesis at α = 0.01, which means the p-value is greater than 0.01.
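For reference, the two-sided p-value itself can be computed from the armspan numbers. This is a sketch using the normal distribution from Python's standard library, under the same known-σ assumption as the hand calculation.

```python
import math
from statistics import NormalDist

# z statistic for the armspan data: mean -0.26, sigma = 1.7, n = 50
z_stat = -0.26 / (1.7 / math.sqrt(50))

# Two-sided p-value: probability of a |Z| at least this large under H0
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_stat)))
print(f"p-value = {p_value:.3f}")
```

The p-value comes out near 0.28, far above 0.01, which agrees with the decision not to reject.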
PROBLEM #10
What happens to confidence intervals as sample sizes increase? Choose one. (a) Nothing;
(b) They get shorter; (c) They get longer.
(Solution)
(b) The confidence intervals get shorter as sample sizes increase, because the length
2 * z * σ / (square root of n) shrinks as n grows.
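A quick numerical illustration, assuming the known-σ interval with z = 1.96 and σ = 2 as in the bluebonnet problems; the sample sizes are arbitrary choices for the example.

```python
import math

def ci_length(z, sigma, n):
    """Total length of the interval mean ± z * sigma / sqrt(n)."""
    return 2.0 * z * sigma / math.sqrt(n)

# Each step multiplies n by 4, which halves the length
lengths = {n: ci_length(1.96, 2.0, n) for n in (25, 100, 400, 1600)}
for n, length in lengths.items():
    print(f"n = {n:5d}: length = {length:.3f}")
```

Because the length depends on 1 / sqrt(n), quadrupling the sample size only halves the interval.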
PROBLEM #11
Look up the Framingham data on the web, for this homework assignment.
I have put together an SPSS data set, but I have not labeled the variables at all.
a. Give “age” the label “Patient Age”
b. Give “sbpexam1” the label “Systolic Blood Pressure, Exam 1”
c. Give “sbpexam2” the label “Systolic Blood Pressure, Exam 2”
d. Give “smoke” the label “Smoking Status”
e. Give “chol1” the label “Cholesterol, Exam 1”
f. Give “chol2” the label “Cholesterol, Exam 2”
g. Give “chd” the label “Coronary Heart Disease Status”
h. Set up the values for Smoking Status: 0 = “Nonsmoker”, 1 = “Smoker”
i. Set up the values for Coronary Heart Disease Status: 0 = “No Heart Disease”, 1 =
“Heart Disease”
j. Now save your data set.
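The assignment itself is done in SPSS. Purely as a way of writing the labelling scheme down outside SPSS, the same variable and value labels can be kept as plain Python dictionaries. The describe helper and the example row are hypothetical, made up only to show the decoding.

```python
# Variable labels from items a-g
variable_labels = {
    "age": "Patient Age",
    "sbpexam1": "Systolic Blood Pressure, Exam 1",
    "sbpexam2": "Systolic Blood Pressure, Exam 2",
    "smoke": "Smoking Status",
    "chol1": "Cholesterol, Exam 1",
    "chol2": "Cholesterol, Exam 2",
    "chd": "Coronary Heart Disease Status",
}

# Value labels from items h-i
value_labels = {
    "smoke": {0: "Nonsmoker", 1: "Smoker"},
    "chd": {0: "No Heart Disease", 1: "Heart Disease"},
}

def describe(record):
    """Return {variable label: decoded value} for one row of coded data."""
    out = {}
    for name, value in record.items():
        label = variable_labels.get(name, name)
        out[label] = value_labels.get(name, {}).get(value, value)
    return out

# Hypothetical row, not taken from the Framingham data
example = describe({"age": 52, "smoke": 1, "chd": 0})
print(example)
```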
PROBLEM #12
Consider the Framingham data.
The major variable for you to analyze in this assignment is the change in Systolic Blood
Pressure (SBP) over the two-year period from Exam #1 to Exam #2.
a. Create a variable sbdiff which is the difference in systolic blood pressure: Exam
#1 – Exam #2.
b. Create a simple box plot for this variable.
[Box plot of SBDIFF, N = 1615. Vertical axis runs from -100 to 200; points beyond the whiskers are labeled with their case numbers.]
a. What is the 25th percentile, approximately? -7
b. What is the 75th percentile, approximately? 9.5
c. Is there any evidence of a massive outlier? No
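The individually labeled points on a box plot are usually those beyond the whisker limits. As a sketch, assuming the common 1.5 × IQR convention (an assumption; the plot settings are not stated here), the limits implied by the approximate quartiles are:

```python
q1, q3 = -7.0, 9.5            # approximate quartiles read off the box plot
iqr = q3 - q1                 # interquartile range

# Values outside these fences would be drawn as individual points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_flagged(x):
    """True if x falls outside the 1.5 * IQR fences."""
    return x < lower_fence or x > upper_fence

print(iqr, lower_fence, upper_fence)  # prints 16.5 -31.75 34.25
```

With 1615 observations, a fair number of points beyond such fences is expected even when no single point stands far apart from the rest.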
c. Create a histogram of the data. Do the data look reasonably bell-shaped?
- Doesn’t look too bad
[Histogram of SBDIFF. Horizontal axis: SBDIFF, from -60 to 90; vertical axis: Frequency, from 0 to 600. Std. Dev = 12.94, Mean = 1.5, N = 1615.]
d. Create a Q-Q plot of the data. Do the data look reasonably bell-shaped?
- It’s clearly not exactly normally distributed, but still not as bad as other
examples we’ve seen
[Normal Q-Q Plot of SBDIFF. Horizontal axis: Observed Value, from -80 to 100; vertical axis: Expected Normal, from -4 to 4.]
e. Obtain summary statistics for this variable. What is the sample size?
- The sample size is 1615
Descriptives: SBDIFF

                                       Statistic   Std. Error
Mean                                      1.4954       .32193
95% Confidence      Lower Bound            .8639
Interval for Mean   Upper Bound           2.1268
5% Trimmed Mean                           1.4333
Median                                    1.5000
Variance                                 167.377
Std. Deviation                          12.93742
Minimum                                   -61.50
Maximum                                    90.00
Range                                     151.50
Interquartile Range                      16.5000
Skewness                                    .170         .061
Kurtosis                                   2.296         .122
f. What is the confidence interval for the population mean change as output by
SPSS?
- the 95% confidence interval for the population mean change is
(0.8639, 2.1268)
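The SPSS interval can be roughly reproduced from the summary statistics. SPSS uses a t critical value with the sample standard deviation; the sketch below substitutes the normal critical value, which is an approximation that works here only because n = 1615 is large.

```python
import math
from statistics import NormalDist

mean, sd, n = 1.4954, 12.93742, 1615  # from the SPSS Descriptives output

std_error = sd / math.sqrt(n)         # matches the reported Std. Error
z = NormalDist().inv_cdf(0.975)       # normal stand-in for the t critical value

lower = mean - z * std_error
upper = mean + z * std_error
print(f"SE = {std_error:.5f}, approx. 95% CI = ({lower:.4f}, {upper:.4f})")
```

The result agrees with the SPSS bounds to about three decimal places; the small remaining gap comes from using z instead of t.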
g. Interpret this confidence interval.
- We are 95% confident that the population mean change is between 0.8639 and
2.1268. Also, since this interval doesn’t include 0 (no mean change), we can say
that there is a mean change in blood pressure over two years (with 95% confidence,
of course).
h. Is the p-value p < 0.05? - Yes, because 0 (no mean change) is not included in
the 95% confidence interval. It means we can reject the null hypothesis, and the
p-value is less than 0.05.
i. What is the p-value above a test of? State the null hypothesis, the alternative
hypothesis, and the Type I error.
(Solution)
The null hypothesis: There is no population mean change
The alternative hypothesis: The null hypothesis is false
Type I error: α = 0.05
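To see how far below 0.05 the p-value is, the test statistic can be computed from the reported mean and standard error. This sketch uses a normal approximation to the t distribution, which is an assumption justified only by the large sample size.

```python
from statistics import NormalDist

mean, std_error = 1.4954, 0.32193  # from the SPSS Descriptives output

# Test statistic for H0: population mean change = 0
test_stat = (mean - 0.0) / std_error

# Two-sided p-value under the normal approximation
p_value = 2.0 * (1.0 - NormalDist().cdf(abs(test_stat)))
reject_at_05 = p_value < 0.05
print(f"statistic = {test_stat:.2f}, p = {p_value:.2e}, reject: {reject_at_05}")
```

The statistic is about 4.6 standard errors from zero, so the p-value is far smaller than 0.05. That says nothing by itself about whether a change of this size matters in practice.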
j. Do you think there is any practically significant mean change in blood pressure
over two years?
- “I have no clue” is probably not a bad answer. It’s a change of about 1 unit in the
log scale, which is a change of about 3 units in the original systolic blood pressure
scale. My guess is that such a change is not clinically significant, but I do not
know.