Homework #1
Name: _
Date: _
1.
An investigator decides to characterize the average lifetime of light bulbs for the
company. To carry out this characterization, s/he installs 10 light bulbs into an
apparatus that connects each bulb to a timer. When the bulb burns out, the timer
stops. Unknown to the investigator, the distribution of lifetimes for the
population is exponential with mean 30 days and standard deviation (SD) 30 days.
Assuming the investigator carries out the experiment until all of the bulbs burn
out, s/he obtains the following data (in days): T={41, 17, 42, 21, 13, 11, 17, 47,
71, 19}.
(A) Calculate the mean and SD for this dataset. Do they seem close to the population
values (no need to use a statistical analysis here; intuition will suffice)?
(B) Suppose the investigator is told by management that s/he has three weeks to
characterize the lifetimes, so the observed lifetimes can only be seen up to 21
days. The data observed in this case are T={21*, 17, 21*, 21, 13, 11, 17, 21*,
21*, 19}. A star indicates that the bulb was still burning at the end of the study.
Calculate the mean and SD for this dataset. How do these values compare to
those in part (A)? Would you consider these estimates to be reliable for the
population parameters?
(C) After consulting results of past experiments that estimated the lifetimes of light
bulbs, s/he decided to focus on those light bulbs that had observed failure times
while ignoring the censored observations. The reduced dataset now being used is
T={17, 21, 13, 11, 17, 19}. Calculate the mean and SD for this dataset. The
investigator believed that using the observed failure times would provide a more
accurate estimate of the parameters of interest. What are your thoughts about this
method?
(D) After reading more carefully into the methods used to estimate the mean and SD
from the previous results, s/he discovers an alternative approach:
\[
\hat{\mu} = \hat{\sigma} = \frac{\sum_{i=1}^{n} t_i}{\text{events}}
\]
where all of the times are summed, regardless of whether they are censored, but only
the number of events is used in the denominator. The SD estimate is the same as the
mean estimate. Try using this method and state your thoughts.
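One way to set up this calculation in SAS is sketched below, using the censored data from part (B). The dataset name censored, the indicator variable event, and the output column mu_hat are assumptions introduced only for illustration; they do not come from the assignment.

```sas
/* Sketch: total observed time divided by the number of events.        */
/* Names "censored", "event", and "mu_hat" are illustrative only.      */
data censored;
    input t event;   /* event = 1 if the bulb burned out, 0 if starred (censored) */
    datalines;
21 0
17 1
21 0
21 1
13 1
11 1
17 1
21 0
21 0
19 1
;
run;

proc sql;
    /* All times go into the numerator; only events count in the denominator. */
    select sum(t) / sum(event) as mu_hat
    from censored;
quit;
```

Keeping the censoring indicator as its own variable means the same dataset can also be reused for parts (B) and (C), for example by subsetting on event = 1.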
2.
Use SAS to do the following:
(A) Create a dataset used in part (A) of 1 and use “Proc Means” to obtain the mean
and SD. Print the results and attach to HW.
(B) Create a dataset used in part (B) of 1 and use “Proc Means” to obtain the mean
and SD. Print the results and attach to HW.
(C) Create a dataset used in part (C) of 1 and use “Proc Means” to obtain the mean
and SD. Print the results and attach to HW.
(D) Create a dataset from part (A) of 1 that has the variables t2 = t + 5 and t3 = t 2.
Print this dataset.
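As a starting point for part (A), a minimal sketch of the data step and the Proc Means call might look like the following. The dataset name bulbs is an assumption; the variable name t follows part (D).

```sas
/* Sketch for 2(A): read the ten lifetimes and summarize them.   */
/* The dataset name "bulbs" is illustrative only.                */
data bulbs;
    input t @@;      /* @@ lets several values be read from one data line */
    datalines;
41 17 42 21 13 11 17 47 71 19
;
run;

proc means data=bulbs n mean std;
    var t;           /* reports the sample size, mean, and SD of t */
run;
```

Parts (B) and (C) should only need a different list of values on the datalines line.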
3.
Many inferential techniques in survival analysis use theoretical asymptotic results.
In many cases, the distribution of a statistic approaches a theoretical distribution
regardless of the underlying population distribution when the sample size is
large. One example is the Central Limit Theorem (CLT). Given a population
mean and SD equal to μ and σ respectively, the CLT states (in simplified terms)
that the sample mean of n individuals is approximately normally distributed with
mean μ and variance σ²/n, i.e. N(μ, σ²/n), when n is sufficiently large. Use
simulation to estimate the true distribution of the sample mean and
compare it to the theoretical distribution under the CLT. Then state whether you
believe the distribution of the sample mean is approximately normal:
Distribution    Population Mean, μ    Population SD, σ    Sample Size, n
Exp(λ=1)        1                     1                   1, 2, 10, 20, 50
Uniform(0,1)    0.5                   0.288675            1, 2, 3, 4, 5
Gamma(2,1)      2                     1.414213            1, 5, 10, 20, 50
Hint: Use the following SAS program and modify as needed, using n for the sample size,
x for the population distribution, and normal(mu= sigma=) for the theoretical
distribution of the sample mean when n is sufficiently large. If the normal density is
close to the histogram, n is sufficiently large for the CLT to apply.
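data three;
    n=1;                          /* sample size */
    do i=1 to 5000;               /* 5000 simulated samples */
        sm=0;
        do j=1 to n;
            x=ranexp(-1);         /* one draw from the population distribution */
            sm+x;                 /* running sum of the n draws */
        end;
        xbar=sm/n;                /* sample mean of this sample */
        output;
    end;
run;

/* Overlay the CLT normal density; for general n, sigma is the population SD divided by sqrt(n). */
proc univariate noprint data=three;
    histogram xbar / normal(mu=1 sigma=1);
run;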
Exp(=1) is provided with x=ranexp(-1)
Uniform(0,1) is provided with x=ranuni(-1)
Gamma(2,1) is provided with x=rangam(-1,2)
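As an illustration of how the hint program might be adapted to another row of the table, the sketch below uses Uniform(0,1) with n = 5. The macro variables n, mu, and sigma are assumptions added for convenience and are not part of the original program; sigma is set to the population SD divided by the square root of n, as the CLT prescribes.

```sas
/* Sketch: Uniform(0,1) with n = 5; macro variable names are illustrative only. */
%let n = 5;                                              /* sample size */
%let mu = 0.5;                                           /* population mean */
%let sigma = %sysevalf(0.288675 / %sysfunc(sqrt(&n)));   /* population SD / sqrt(n) */

data three;
    n = &n;
    do i = 1 to 5000;
        sm = 0;
        do j = 1 to n;
            x = ranuni(-1);       /* Uniform(0,1) draw */
            sm + x;
        end;
        xbar = sm / n;
        output;
    end;
run;

proc univariate noprint data=three;
    histogram xbar / normal(mu=&mu sigma=&sigma);
run;
```

Changing the three %let lines and the line that generates x should be enough to cover the remaining combinations in the table.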