STAT 2507 Solutions for Assignment # 4
Fall 2008
Note: 1. Some answers to lab part may vary from one student to another. For such cases, the
answers given here should serve as guidelines only as they correspond to one replication that
one instructor made.
2. A typo was made in lab question 5. In step 3, we should read
let c1=(c2 <= 4 and c3 >= 4)
instead of
let c1=(c2 >= 4 and c3 <= 4).
Consequently, the TAs are asked to give each student they marked the full 6 marks for that
question.
Part I. Lab questions. Use only the blanks left to answer lab questions. Provide all histograms and boxplots you are asked to print, but DO NOT print the data
you are asked to generate.
1. Continuous distributions:
Generate and store in column c1 10,000 values from the uniform distribution on the interval [3,7] as follows:
random 10000 c1;
uniform 3 7.
[3] a. Use the mean command to find the sample mean x̄ of these data. 5.002
Note: The mean µ of a uniform distribution over an interval [a, b] is simply the middle of
this interval, i.e. µ = (a + b)/2.
[3] b. What is the mean µ of the uniform distribution on the interval [3,7]? 5. Compare
µ to the value x̄ you found in part a). Both are very close.
Generate and store in column c2 1,000 values from exponential distribution with parameter λ = .125 as follows:
random 1000 c2;
exponential 8.
Note: The mean µ and the standard deviation σ of such distribution are both equal to
1/λ = 8 and this is the value you are asked to enter in the command above.
[3] c. Use the desc command to find the sample mean x̄ and sample standard deviation s for
these 1,000 data. x̄ = 8.170 and s = 8.319
Are x̄ and s close to the value 1/λ = 8? Yes. Fairly close.
[3] d. Print (and include in your assignment) the histogram of the 1,000 values you generated from this exponential distribution. What is the shape of this distribution? Skewed
to the right.
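The two simulations in question 1 can also be reproduced outside Minitab. The following is a minimal sketch in Python (standard library only), which is an assumption of this example and not part of the assignment. The seed is arbitrary, so the numbers will differ from the replication reported above.

```python
import random
import statistics

rng = random.Random(2507)  # arbitrary seed, used only for reproducibility

# Uniform on [3, 7]: the theoretical mean is (3 + 7) / 2 = 5.
uniform_values = [rng.uniform(3, 7) for _ in range(10_000)]
uniform_mean = statistics.fmean(uniform_values)

# Exponential with rate lambda = 0.125: mean and standard deviation are both 8.
# random.expovariate takes the rate lambda, whereas the Minitab subcommand
# "exponential 8." takes the mean 1/lambda.
exp_values = [rng.expovariate(0.125) for _ in range(1_000)]
exp_mean = statistics.fmean(exp_values)
exp_sd = statistics.stdev(exp_values)

print(f"uniform: sample mean = {uniform_mean:.3f} (theory: 5)")
print(f"exponential: sample mean = {exp_mean:.3f}, s = {exp_sd:.3f} (theory: 8 and 8)")
```

With 1,000 exponential values the sample mean and standard deviation typically land within a few tenths of 8, which is consistent with the "fairly close" answer in part c.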
2. Normal distribution: Generate and store in column c3 10,000 values from the standard
normal distribution as follows:
random 10000 c3;
normal.
[3] a. Print (and include in your assignment) the histogram for these data. What is the
shape of this histogram? Bell-shaped (symmetric)
[3] b. What is the value on the horizontal axis around which the histogram seems to be
symmetric? x=0
[3] c. Use Minitab to find the sample mean x̄ and the standard deviation s for these data.
X̄ = 0.00615
s = 0.98863
[3] d. What are the mean µ and the standard deviation σ of the standard normal distribution? 0 and 1, respectively
3. Standardization procedure: Generate and store in column c4 10,000 values from the
normal distribution with µ = 6.5 and σ = 3 as follows:
random 10000 c4;
normal 6.5 3.
a. Print (and include in your assignment) the histogram for these data [1]. What is the
value on the horizontal axis around which the histogram seems to be symmetric? x = 6.5 [2]
Construct and store in column c5 the data set z_i (i = 1, ..., 10,000) obtained from the
previously generated data set x_i by the standardization procedure z_i = (x_i − µ)/σ by
typing:
let c5=(c4-6.5)/3
b. Print (and include in your assignment) the histogram for the z_i's [1]. Around which
value does it seem to be symmetric? x = 0 [1] What are the sample mean z̄ and standard
deviation s for this new data set? −0.0018 and 1.0021, respectively [2] Why are
they close to 0 and 1? Because if X has a normal (µ, σ) distribution, then Z := (X − µ)/σ has a
normal (0,1) distribution. [2]
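The standardization step in question 3 can be checked with a short simulation. This is a sketch in Python (an assumption of this example; the assignment itself uses Minitab) with an arbitrary seed, so the values will not match the replication above exactly.

```python
import random
import statistics

rng = random.Random(3)  # arbitrary seed
mu, sigma = 6.5, 3.0

# Counterpart of "random 10000 c4; normal 6.5 3."
x = [rng.gauss(mu, sigma) for _ in range(10_000)]

# Counterpart of "let c5=(c4-6.5)/3"
z = [(xi - mu) / sigma for xi in x]

z_mean = statistics.fmean(z)
z_sd = statistics.stdev(z)
print(f"z-bar = {z_mean:.4f}, s = {z_sd:.4f} (theory: 0 and 1)")
```

Because standardization is a linear transformation, z̄ equals (x̄ − µ)/σ exactly, and the standard deviation of the z values equals that of the x values divided by σ.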
4. Central limit theorem (CLT) at work (You can use a new Minitab worksheet). Generate and store in columns c3-c902 100 samples, of size n = 900 each, from Poisson
distribution with parameter µ = 9 as follows:
random 100 c3-c902;
poisson 9.
Note: This may take a few moments, as you are generating 900 × 100 = 90,000 values.
Create and store in column c1 the 100 values of x̄ based on the 100 samples of the same size
n = 900 as follows:
rmean c3-c902 c1
a. Print (and include in your assignment) the boxplot of c3 [1]. According to the position
of the median, what can you conclude about the shape of this data set? Rather symmetric, as the median seems to be in the middle of the box. Note: Your graph
might also be skewed to the right. [2]
[3] b. Use the desc command to find the sample mean and sample standard deviation of c3. 8.890
and 3.203, respectively.
[3] c. Print (and include in your assignment) the boxplot for the data in column c1. What
can you conclude about the shape of data in c1? Fairly symmetric as its median is
very close to the mean=9
[3] d. Use desc to find the sample mean and sample standard deviation of c1. 9.016 and .095,
respectively. Are they close to 9 and 3/30? Yes: 9.016 is close to 9 and .095 is close to .1. Why? According
to the CLT, X̄ will be approximately normally distributed with mean µ and standard
deviation σ/√n = 3/30, since for Poisson (9), σ² = µ = 9, and the sample
sizes are all equal to 900.
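The CLT experiment in question 4 can be sketched in Python as follows. The Python standard library has no Poisson generator, so the helper `poisson` below is a hypothetical function written for this example using Knuth's multiplication method. The seed is arbitrary and the results will vary from the replication above.

```python
import math
import random
import statistics


def poisson(rng, mu):
    """Draw one Poisson(mu) value by Knuth's multiplication method."""
    limit = math.exp(-mu)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(4)  # arbitrary seed
mu, n, replications = 9, 900, 100

# One sample mean per replication, like "rmean c3-c902 c1".
sample_means = [
    statistics.fmean([poisson(rng, mu) for _ in range(n)])
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(f"mean of x-bar = {mean_of_means:.3f} (theory: {mu})")
print(f"sd of x-bar   = {sd_of_means:.3f} (theory: {math.sqrt(mu / n):.3f})")
```

The theoretical standard deviation printed in the last line is σ/√n = √(9/900) = 3/30 = 0.1.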
5. Confidence interval (CI) for a mean: We want to build 100 confidence intervals (CIs)
with confidence level (1 − α)100% = 95% for the mean µ of a Poisson distribution via the
following steps:
Step 1. Generate and store in columns c6-c405 100 samples of size 400 each from Poisson
with parameter µ = 4 as follows:
random 100 c6-c405;
poisson 4.
Step 2. Use columns c4 and c5 to store respectively the means and the standard deviations of the 100 samples you generated in step 1, as follows:
rmean c6-c405 c4
rstd c6-c405 c5
Step 3. Store the lower bound and the upper bound of your 95% CIs in c2 and c3 respectively by typing successively:
let c2=c4-1.96*c5/20
let c3=c4+1.96*c5/20
Then create a column c1 containing 1 or 0 according to whether the corresponding interval
[c2 , c3] covers µ or not, by typing:
let c1=(c2 <= 4 and c3 >= 4)
Finally sum up the entries of column c1 to find how many CIs cover the value µ = 4 by
typing:
tally c1
[3] a. What is the percentage of confidence intervals that contain the true value µ = 4? 96%
[3] b. How do you compare this percentage to the confidence level 95%? The two percentages are very close.
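The coverage experiment in question 5 can be sketched in Python as below. It uses the corrected coverage condition from the note at the top (lower bound ≤ 4 and upper bound ≥ 4). The helper `poisson` is a hypothetical function written for this example, the seed is arbitrary, and the count of covering intervals will vary from run to run.

```python
import math
import random
import statistics


def poisson(rng, mu):
    """Draw one Poisson(mu) value by Knuth's multiplication method."""
    limit = math.exp(-mu)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(5)  # arbitrary seed
mu, n, replications, z = 4, 400, 100, 1.96

covered = 0
for _ in range(replications):
    sample = [poisson(rng, mu) for _ in range(n)]
    centre = statistics.fmean(sample)                          # role of c4
    half_width = z * statistics.stdev(sample) / math.sqrt(n)   # 1.96*c5/20
    lower, upper = centre - half_width, centre + half_width    # c2 and c3
    covered += lower <= mu <= upper                            # adds 1 when mu is covered

print(f"{covered} of {replications} intervals cover mu = {mu}")
```

Since each interval covers µ with probability close to .95, the count should usually fall in the low-to-high 90s.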
Part II. Long-answer questions
1. The variable X has binomial distribution with parameters n = 50,000 and p = 1/1,000.
Hence we can approximate it by a normal distribution with µ = np = 50 and σ = √(np(1 − p)) = √49.95 = 7.07. We get
a.
\[ P(X \ge 60) = P\left(\frac{X - 50}{7.07} \ge \frac{60 - 50}{7.07}\right) = P(Z \ge 1.41) = 1 - P(Z < 1.41) = 1 - .9207 = .0793 \]
b. As the probability of observing 60 or more is very small, we would say that observing
60 children with a genetic defect is rather unusual. If we observe 60, we might cast doubt
on the ratio of children affected. Maybe it is higher than 1 per 1,000.
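The normal approximation in question 1 can be verified numerically. This sketch uses Python's `statistics.NormalDist` in place of the normal table (an assumption of this example). Without rounding z to two decimals the tail probability comes out slightly below the table-based value.

```python
import math
from statistics import NormalDist

n, p = 50_000, 1 / 1_000
mu = n * p                           # 50
sigma = math.sqrt(n * p * (1 - p))   # sqrt(49.95)

z = (60 - mu) / sigma
tail_exact_z = 1 - NormalDist().cdf(z)              # z left unrounded
tail_table_z = 1 - NormalDist().cdf(round(z, 2))    # z rounded as in a table lookup

print(f"mu = {mu:.0f}, sigma = {sigma:.2f}, z = {z:.4f}")
print(f"P(X >= 60) with unrounded z:   {tail_exact_z:.4f}")
print(f"P(X >= 60) with z = {round(z, 2):.2f}:      {tail_table_z:.4f}")
```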
2. a. X̄ has mean µ = 106 and standard deviation σ/√n = 12/6 = 2.
b. Since n = 36 > 30, we can use the CLT (central limit theorem) and consider X̄ as
normally distributed.
\[ P(\bar{X} > 110) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{110 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z > \frac{110 - 106}{2}\right) = 1 - P(Z \le 2) = 1 - .9772 = .0228 \]
c.
P (|X̄ − µ| ≤ 4) = P (|Z| ≤ 2) = P (Z ≤ 2) − P (Z ≤ −2) = .9772 − .0228 = .9544.
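Both probabilities in question 2 follow from treating X̄ as normal with mean 106 and standard deviation 2. A minimal check in Python, using `statistics.NormalDist` in place of the normal table (an assumption of this example), is sketched below. Small differences in the last digit come from table rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 106, 12, 36
xbar = NormalDist(mu, sigma / math.sqrt(n))  # sampling distribution of X-bar, sd = 2

p_above_110 = 1 - xbar.cdf(110)                       # part b
p_within_4 = xbar.cdf(mu + 4) - xbar.cdf(mu - 4)      # part c

print(f"P(X-bar > 110)      = {p_above_110:.4f}")
print(f"P(|X-bar - mu| <= 4) = {p_within_4:.4f}")
```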
3. a. As n = 50 > 30, then by the CLT, X̄ can be considered as having normal distribution
with mean µ = 68,500 and standard deviation σ/√n = 3500/√50 = 495 dollars.
b. Since Z = (X̄ − µ)/(σ/√n) has approximately standard normal distribution, with
probability .95, we would expect X̄ to fall in µ ± 1.96σ/√n = 68,500 ± 1.96(495) = [67,530 , 69,470].
c.
\[ P(\bar{X} \ge 70{,}000) = P\left(Z \ge \frac{70{,}000 - 68{,}500}{495}\right) = 1 - P(Z < 3.03) = 1 - .9988 = .0012. \]
d. Yes, this would be rather unusual, since, according to part c), the odds for that
happening are very slim. If we happen to observe a sample mean of $70,000, we might
conclude that the true average salary is above $68,500.
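The numbers in question 3 can be reproduced with the inverse normal CDF instead of the tabulated value 1.96. The sketch below uses Python's `statistics.NormalDist` (an assumption of this example). The results agree with the values above up to rounding.

```python
import math
from statistics import NormalDist

mu, sigma, n = 68_500, 3_500, 50
se = sigma / math.sqrt(n)            # standard deviation of X-bar, about 495
xbar = NormalDist(mu, se)

# Central 95% range for X-bar (part b).
lower, upper = xbar.inv_cdf(0.025), xbar.inv_cdf(0.975)

# Probability of a sample mean of at least 70,000 (part c).
p_at_least_70k = 1 - xbar.cdf(70_000)

print(f"se = {se:.1f}")
print(f"95% range for X-bar: [{lower:.0f}, {upper:.0f}]")
print(f"P(X-bar >= 70,000) = {p_at_least_70k:.4f}")
```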
4. The sample mean and sample standard deviation are respectively X̄ = 4.3 and s = 2.6.
The sample size is n = 40 > 30. We want an 80% confidence interval (CI). Hence α = .2
and z_{α/2} = z_{.1} = 1.28 (from normal table). Hence the desired CI is
\[ \bar{X} \pm 1.28 \frac{s}{\sqrt{n}} = 4.3 \pm 1.28 \cdot \frac{2.6}{6.320} = [3.78 , 4.82] \]
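The large-sample interval in question 4 can be computed as sketched below, with the critical value taken from the inverse normal CDF instead of the table. Python and `statistics.NormalDist` are assumptions of this example. The endpoints may differ from those above in the last digit because of rounding in the hand calculation.

```python
import math
from statistics import NormalDist

xbar, s, n, level = 4.3, 2.6, 40, 0.80

alpha = 1 - level
z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point, about 1.28
half_width = z * s / math.sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"z = {z:.4f}")
print(f"80% CI: [{lower:.2f}, {upper:.2f}]")
```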
5. We have n = 600, p̂ = 250/600 = .42 and 1 − p̂ = .58. Here α = .05 and therefore
z_{α/2} = 1.96. The 95% CI for p is then given by
\[ \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = .42 \pm 1.96 \sqrt{\frac{.42 \times .58}{600}} = [.38 , .46]. \]
6. For 90% CI, z_{α/2} = 1.64. Note to TAs: please also accept either of the two other approximations, z_{α/2} = 1.645 or z_{α/2} = 1.65. For 98% CI, z_{α/2} = 2.33. p̂ = 25/200 = .125 and
1 − p̂ = .875. We get the desired CIs as follows:
a.
\[ .125 \pm 1.64 \sqrt{\frac{.125 \times .875}{200}} = [.086 , .163] \]
b.
\[ .125 \pm 2.33 \sqrt{\frac{.125 \times .875}{200}} = [.071 , .179] \]
The last CI is wider since it has a larger confidence level: 98% instead of 90%.
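The proportion intervals in questions 5 and 6 share one formula, which can be sketched as a small Python function. The function name `proportion_ci` is hypothetical and introduced only for this example. It takes the critical value from the inverse normal CDF and does not round p̂, so the endpoints can differ from the hand-computed ones in the third decimal.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


ci_q5 = proportion_ci(250, 600, 0.95)   # question 5
ci_90 = proportion_ci(25, 200, 0.90)    # question 6a
ci_98 = proportion_ci(25, 200, 0.98)    # question 6b

for label, (lower, upper) in [("Q5 95%", ci_q5), ("Q6 90%", ci_90), ("Q6 98%", ci_98)]:
    print(f"{label}: [{lower:.3f}, {upper:.3f}], width = {upper - lower:.3f}")
```

The printed widths make the final remark concrete: for the same data, the 98% interval is wider than the 90% interval.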