Your first and last name:___________________________________ SSN:________________________
CODE:______________________ if you want me to post your grade on the web. Write 6 numeric or
alphanumeric characters.
STAT 211
SUMMER 2002
You have 90 minutes to complete this exam. You may use only your own calculator and the t, z, and chi-square
tables. There is a penalty of 5 pts. if you separate your exam for any reason.
There are 40 questions (100 pts. total), and there is no partial credit on this exam.
If you do not mark your final answer on your scantron, your answer will be counted as incorrect.
If you do not mark your exam form on the scantron, the lower grade of the two forms will be assigned by the
computer and your grade will not be corrected later.
If you are caught cheating, you will get a grade of zero. Good luck.
EXAM 2 - FORM A
1. How would you determine if the given data come from an exponential distribution?
(a) I would graph the data, x versus any f(x), to see if they make a symmetric graph.
(b) I would graph the data, x versus any f(x), to see if they make a 45° line.
(c) I would graph the ordered data with their expected normality values to see if they make a
symmetric graph.
(d) I would graph the ordered data with their expected normality values to see if they make a 45° line.
(e) I would graph the ordered data versus exponential data values computed using cumulative percentiles
to see if they make a 45° line.
2. If the times between failures are exponentially distributed with λ = 2, what is the median time between
failures?
(a) -0.6932
(b) -0.3466
(c) 0.5
(d) 0.3466
(e) 0.6932
3. If the times between failures are exponentially distributed with λ = 2, what is the probability that the next
failure occurs in less than 1 minute?
(a) 0.0183
(b) 0.1353
(c) 0.5144
(d) 0.8647
(e) 0.9817
4. If E(X) = 5 and Var(X) = 10, find the variance of Y = 5X − 2.
(a) 25
(b) 48
(c) 125
(d) 248
(e) 250
5. If X1 and X2 are independent, normally distributed random variables, each with mean 3 and
variance 9, what is the probability that (X1 − X2)/3 is between −√2 and √2?
(a) 0.1587
(b) 0.6826
(c) 0.8413
(d) 0.9544
(e) 0.9772
Let X be a continuous random variable with the legitimate probability density function (pdf)
$f(x) = \begin{cases} kx, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases}$
Answer questions 6 to 8 using the information above.
6. Which of the following makes f(x) a legitimate pdf?
(a) k=0.25
(b) k=0.5
(c) k=1
(d) k=1.41
(e) k=2
7. Which of the following is F(x), the cumulative distribution function, when 0 ≤ x ≤ 1, using the
legitimate pdf?
(a) k
(b) kx
(c) kx²
(d) kx²/2
(e) 1
8. Which of the following is the expected value of Y = X + 2 using the legitimate pdf?
(a) kx
(b) kx + 2
(c) (k + 4)/2
(d) (k + 3)/3
(e) (k + 6)/3
9. Suppose that you are interested in monitoring air pollution in Los Angeles, California, over a one-week
period. Let X be a random variable that represents the number of days out of seven on which the
concentration of carbon monoxide surpasses a specified level. Which of the following distributions
best describes this random variable?
(a) Binomial Distribution
(b) Hypergeometric Distribution
(c) Poisson distribution
(d) Exponential Distribution
(e) Geometric distribution
10. The number of cases of tetanus reported in the United States during a single month in 1989 has an
expected value of 4.5 cases. Let X be a random variable that represents the number of cases of
tetanus that will be reported in 3 months. Which of the following distributions best describes this
random variable?
(a) Binomial Distribution
(b) Hypergeometric Distribution
(c) Poisson distribution
(d) Exponential Distribution
(e) Geometric distribution
11. Which of the following is correct?
(a) Prediction intervals are narrower than confidence intervals.
(b) Prediction intervals demonstrate the confidence interval for the true mean.
(c) Prediction intervals are wider than confidence intervals.
(d) Prediction intervals demonstrate the confidence interval for the true proportion.
(e) The tolerance interval and the prediction interval are the same.
Among females in the United States between 18 and 74 years of age, diastolic blood pressure is normally
distributed with mean μ = 77 mm Hg and standard deviation σ = 11.6 mm Hg. Use this information to answer
questions 12 to 17.
12. What is the probability that a randomly selected woman has a diastolic blood pressure less than 60 mm
Hg?
(a) 0.0708
(b) 0.1470
(c) 0.5241
(d) 0.8530
(e) 0.9292
13. What is the probability that a randomly selected woman has a diastolic blood pressure more than the
mean?
(a) 0
(b) 0.25
(c) 0.50
(d) 0.75
(e) 1
14. What is the probability that a randomly selected woman has a diastolic blood pressure between 60 and
90 mm Hg?
(a) 0.1314
(b) 0.1470
(c) 0.7216
(d) 0.7978
(e) 0.8686
15. Below what value do the lowest 2.5% of diastolic blood pressures fall?
(a) 54.264
(b) 57.918
(c) 96.082
(d) 99.736
(e) 106
16. If you choose a random sample of 100 women, what is the probability that their total diastolic blood
pressure is less than 7500 mm Hg?
(a) 0.0172
(b) 0.0427
(c) 0.0953
(d) 0.9528
(e) 0.9828
17. If you choose a random sample of 100 women, what is the probability that their average diastolic blood
pressure is less than 75 mm Hg?
(a) 0.0172
(b) 0.0427
(c) 0.0953
(d) 0.9528
(e) 0.9828
18. Which of the following is incorrect?
(a) The chi-square distribution is used to construct the confidence interval for the variance or the standard
deviation.
(b) The t and z distributions are used to construct the confidence interval for the mean values.
(c) No assumptions are necessary for us to use the chi-square distribution.
(d) Large sample rules should be checked before we use the z distribution to construct the confidence
interval for the population proportion.
(e) Exactly one of the above is incorrect.
19. What is the confidence level for the interval $\bar{x} \pm 2.33\,\sigma/\sqrt{n}$?
(a) 0.90
(b) 0.95
(c) 0.98
(d) 0.99
(e) 0.999
20. Find the value of c for P(Z ≥ c) = 0.3669 where Z is the standard normal random variable.
(a) -0.34
(b) -0.36
(c) 0.34
(d) 0.36
(e) 0.37
21. Find the value of c for P(−c ≤ Z ≤ c) = 0.7498 where Z is the standard normal random variable.
(a) –1.15
(b) –0.67
(c) 0.49
(d) 0.67
(e) 1.15
22. Which of the following can be the critical values for constructing the confidence interval for the true
variance with the confidence level 95% and the sample size 19?
(a) 8.231 and 31.526
(b) 8.906 and 32.852
(c) 9.39 and 28.869
(d) 9.591 and 34.170
(e) 10.117 and 30.143
23. If the lower confidence bound for testing the true mean is $\bar{x} - 2.65\,s/\sqrt{14}$, which of the following is
the confidence level?
(a) More than 0.005 and less than 0.01
(b) 0.01
(c) 0.99
(d) More than 0.99 and less than 0.995
(e) 0.995
24. If the 95% confidence interval for testing the true mean is (128, 150), which of the following sample
means might have been used to construct this interval?
(a) 11
(b) 22
(c) 130
(d) 139
(e) 278
25. If the 95% confidence interval for testing the true mean is (128, 150), which of the following values
might be the bound on the error estimation used to construct this interval?
(a) 11
(b) 22
(c) 130
(d) 139
(e) 278
A joint pdf for x and y is defined by
$f(x, y) = \begin{cases} \frac{1}{3}\,x(1 + y), & \text{if } 0 \le x \le 2 \text{ and } 0 \le y \le 1 \\ 0, & \text{otherwise.} \end{cases}$
Answer questions 26 to 29 using the information above.
26. Which of the following is the marginal pdf for X?
(a) x/2 for all x
(b) x/2 when 0<x<1
(c) x/2 when 0<x<2
(d) 2x when 0<x<2
(e) y+x when 0<y<1
27. Which of the following is the f( y | x)?
(a) 2(1+y)/3 when 0<y<1
(b) 2(1+y)/3 when 0<x<2
(c) x(1+y)/3 when 0<y<1
(d) x(1+y)/3 when 0<x<2
(e) none of the above
28. Are X and Y independent?
(a) Yes because f(x,y)=f(x)f(y) for all x and y
(b) No because f(x,y)≠f(x)f(y) for all x and y
(c) Yes because f(x,y)≠f(x)f(y) for all x and y
(d) No because f(x,y)=f(x)f(y) for all x and y
(e) It is not possible to determine
29. Which of the following is the covariance between X and Y?
(a) –0.504
(b) 0
(c) 0.504
(d) 0.910
(e) 1
30. Let X1 and X2 be a random sample of size 2 with the mean 5 and the variance 4, which of the following
is the variance of Y=2X1-X2 where Corr(X1,X2)=0.5?
(a) 2
(b) 4
(c) 12
(d) 16
(e) 20
31. In estimating the mean time taken by a sheetrocker to nail one 8-foot sheet already in place, a sample
of 25 observations were taken at random times. The worker was found to be nailing during just 5 of
these observations. During the 200 minute span of time covering the observations, the sheetrocker
hung 20 sheets. Which of the following is the point estimate of the mean time per sheet to do nailing?
(a) 2
(b) 4
(c) 8
(d) 20
(e) 40
32. In estimating the mean time taken by a sheetrocker to nail one 8-foot sheet already in place, a sample
of 25 observations were taken at random times. The worker was found to be nailing during just 5 of
these observations. Which of the following is the point estimate of the proportion of all working time
spent nailing?
(a) 0.05
(b) 0.20
(c) 0.25
(d) 5
(e) 25
33. We would like to estimate the true proportion using confidence intervals. In which case is the
procedure valid using the normal distribution?
(a) n=200, $\hat{p}$ = 0.03
(b) n=200, $\hat{p}$ = 0.97
(c) n=1000, $\hat{p}$ = 0.02
(d) n=1000, $\hat{p}$ = 0.0025
(e) n=100, $\hat{p}$ = 0.012
34. Which of the following does change the width of a large sample confidence interval for μ?
(a) $\bar{x}$.
(b) The standard deviation of the population.
(c) The confidence level.
(d) The sample size.
(e) All of the above except (a)
35. You wish to estimate the mean contents in a shipment of 1000 cans of asparagus. After weighing a
sample of 200 cans, you compute the sample mean 15.9 ounces and the standard deviation 0.3 ounces.
Which of the following is the corresponding 95% confidence interval for the mean contents in a
shipment of 1000 cans of asparagus?
(a) (15.8584 , 15.9416)
(b) (15.8651 , 15.9349)
(c) (15.8814 , 15.9186)
(d) (15.8844 , 15.9156)
(e) need more information to answer this question
36. Which of the following is the conservative sample size to estimate the mean contents in a shipment of
cans of asparagus to within 0.1 ounces with 95% confidence where the population standard deviation is
0.4?
(a) 43
(b) 44
(c) 60
(d) 61
(e) 62
37. The proportion of voters in a large state approving a referendum is to be estimated. On July 1, 75 out
of 130 persons sampled approved of the referendum. Which of the following is the point estimate for
the true proportion of all voters not approving on July 1?
(a) 0.25
(b) 0.42
(c) 0.58
(d) 0.75
(e) 0.95
38. The proportion of voters in a large state approving a referendum is to be estimated. On July 1, 75 out
of 130 persons sampled approved of the referendum. Which of the following is the 95% confidence
interval for the true proportion of all voters approving?
(a) (0.352 , 0.494)
(b) (0.492 , 0.662)
(c) (0.506 , 0.648)
(d) (0.576 , 0.703)
(e) (0.665 , 0.814)
39. If the 95% confidence interval for the true average change of temperature is computed as (79°F, 99°F)
and I claim that the true average change of temperature is 85°F, am I right based on the given
confidence interval?
(a) Yes
(b) No
40. Suppose I am interested in the number of children who go to school (X) in families of size 20 and the
number of children who go to work in the same families (Y). Which of the following may be the values
max(X,Y) can take?
(a) 0 to 20 (integers)
(b) 1 to 20 (integers)
(c) 0 to infinity (integers)
(d) Any number from 0 to 20
(e) Any number from 1 to 20
Answer Key:
1. e    2. d    3. d    4. e    5. b    6. e    7. d    8. e    9. a    10. c
11. c   12. a   13. c   14. d   15. a   16. b   17. b   18. c   19. c   20. c
21. e   22. a   23. c   24. d   25. a   26. c   27. a   28. a   29. b   30. c
31. a   32. b   33. c   34. e   35. a   36. e   37. b   38. b   39. a   40. a
FORMULAS
Binomial Distribution: Approximate probability model for sampling without replacement from a finite
dichotomous population. X ~ Binomial(n, p).
• n fixed trials
• each trial is identical and results in success or failure
• independent trials
• the probability of success (p) is constant from trial to trial
• X is the number of successes among n trials
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n$
E(X) = np and Var(X) = np(1 − p)
Hypergeometric Distribution: Exact probability model for the number of successes in the sample.
X ~ Hyper(M, N, n)
$P(X = x) = \dfrac{\binom{M}{x}\binom{N - M}{n - x}}{\binom{N}{n}}, \quad \max(0, n - N + M) \le x \le \min(n, M)$
Let X be the number of successes in the sample, n be the sample size, N be the population size, and M be
the number of successes in the population.
$E(X) = n \cdot \frac{M}{N}$, where $\frac{M}{N}$ is the proportion of successes in the population.
$Var(X) = \frac{N - n}{N - 1} \cdot n \cdot \frac{M}{N}\left(1 - \frac{M}{N}\right)$, where $\frac{N - n}{N - 1}$ is the finite population correction factor.
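The hypergeometric formulas can be checked the same way. This is a minimal sketch with assumed values M = 5, N = 20, n = 4 (not from the exam); `hypergeom_pmf` is a hypothetical helper name. It sums over the support max(0, n − N + M) ≤ x ≤ min(n, M) and compares the computed mean and variance with the closed forms, including the finite population correction factor.

```python
from math import comb


def hypergeom_pmf(x: int, M: int, N: int, n: int) -> float:
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N that contains M successes."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


# Illustrative values (assumptions, not taken from the exam).
M, N, n = 5, 20, 4

lo, hi = max(0, n - N + M), min(n, M)
pmf = {x: hypergeom_pmf(x, M, N, n) for x in range(lo, hi + 1)}

mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean) ** 2 * px for x, px in pmf.items())

# Closed forms from the formula sheet.
expected_mean = n * M / N
expected_var = (N - n) / (N - 1) * n * (M / N) * (1 - M / N)

print(f"mean: {mean:.6f} vs {expected_mean:.6f}")
print(f"variance: {var:.6f} vs {expected_var:.6f}")
```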
Poisson Distribution: The probability of an arrival is proportional to the length of waiting time.
$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, 3, \ldots, \quad \lambda > 0$
λ : intensity parameter (mean rate, expected number of occurrences).
X : number of occurrences per given period
Note that $e^{\lambda} = \sum_{i=0}^{\infty} \frac{\lambda^i}{i!}$ and E(X) = Var(X) = λ.
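A short sketch of the Poisson pmf follows. It uses λ = 4.5 purely as an example rate and truncates the infinite support at 60 terms, which is an assumption that works here because the tail beyond that point is negligible for this λ. The name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)


lam = 4.5            # example rate
support = range(60)  # truncation point is an assumption; tail is negligible here

total = sum(poisson_pmf(x, lam) for x in support)
mean = sum(x * poisson_pmf(x, lam) for x in support)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in support)

# Both the mean and the variance should come out close to lam.
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {var:.6f}")
```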
Continuous probability distribution: f(x) is legitimate if
(i) $f(x) \ge 0$ for all x
(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ = area under the entire graph of f(x).
Cumulative Distribution Function for continuous X (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} f(y)\,dy$.
$F(-\infty) = 0$, $F(\infty) = 1$, $P(a \le X \le b) = P(a < X < b) = F(b) - F(a)$, $P(X > a) = P(X \ge a) = 1 - F(a)$
Obtaining f(x) from F(x): If X is a continuous r.v. with pdf f(x) and cdf F(x), then at every x at which the
derivative exists, F′(x) = f(x).
Percentile of a continuous distribution: Let p be a number between 0 and 1. The (100p)th percentile of
the distribution of a continuous r.v. X, denoted by r(p), is defined by
$p = F(r(p)) = P(X \le r(p)) = \int_{-\infty}^{r(p)} f(y)\,dy$,
where $P(X \le \text{median}) = P(X > \text{median}) = 0.50$.
Expected value for the continuous random variable, X:
$\mu = E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx$.
Variance for the continuous random variable, X:
$\sigma^2 = Var(X) = E(X^2) - \mu^2$, where $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$.
Uniform Distribution: X ~ U[a, b] then $f(x) = \frac{1}{b - a}, \quad a \le x \le b$
Normal Distribution: $X \sim N(\mu, \sigma^2)$ and $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$
Normal Approximation to the Binomial Distribution:
Let X be a binomial r.v. based on n trials with success probability p. Then if the binomial probability
histogram is not too skewed, X has approximately a normal distribution with $\mu = np$ and
$\sigma = \sqrt{np(1 - p)}$; then
$P(X \le x) \approx \Phi\left(\dfrac{x + 0.5 - np}{\sqrt{np(1 - p)}}\right)$
(check if np ≥ 10 and n(1 − p) ≥ 10 to use the formula).
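To see how close the continuity-corrected approximation is, the sketch below compares it with the exact binomial cdf. The values n = 50, p = 0.4 and x = 18 are assumptions chosen so that both np and n(1 − p) are at least 10; Φ is evaluated with `statistics.NormalDist` from the Python standard library, and the helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def normal_approx_cdf(x: int, n: int, p: float) -> float:
    """Continuity-corrected normal approximation to P(X <= x)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist().cdf((x + 0.5 - mu) / sigma)


# Illustrative values (assumptions, not taken from the exam).
n, p, x = 50, 0.4, 18

# Rule of thumb from the formula sheet: np >= 10 and n(1 - p) >= 10.
rule_ok = n * p >= 10 and n * (1 - p) >= 10

exact = binomial_cdf(x, n, p)
approx = normal_approx_cdf(x, n, p)
print(f"rule satisfied: {rule_ok}, exact = {exact:.4f}, approx = {approx:.4f}")
```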
The Gamma Distribution: X ~ Gamma(α, β) then $E(X) = \alpha\beta$, $Var(X) = \alpha\beta^2$, where
$\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx$ for α > 0,
$\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1)$ for α > 1,
$\Gamma(\alpha) = (\alpha - 1)!$ when α is any positive integer,
$\Gamma(1/2) = \sqrt{\pi}$.
If β = 1 then it is called the standard gamma distribution.
When the random variable is a standard gamma r.v., the cdf is called the incomplete gamma
function (Appendix Table A.4). $F(x; \alpha, \beta) = F(x/\beta; \alpha)$
If α = 1 then it is Exponential(λ = 1/β):
$f(x) = \lambda e^{-\lambda x}$, x > 0 and $P(X \le x^*) = 1 - e^{-\lambda x^*}$.
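The exponential cdf can be inverted in closed form, which gives percentiles directly: solving p = 1 − e^(−λx) yields x = −ln(1 − p)/λ. The sketch below applies this with λ = 2 to get the median and the probability of a value below 1; the function names are hypothetical.

```python
from math import exp, log


def exponential_cdf(x: float, lam: float) -> float:
    """P(X <= x) for X ~ Exponential(lam)."""
    return 1 - exp(-lam * x) if x > 0 else 0.0


def exponential_percentile(p: float, lam: float) -> float:
    """Solve p = 1 - exp(-lam * x) for x, i.e. the (100p)th percentile."""
    return -log(1 - p) / lam


lam = 2.0
median = exponential_percentile(0.5, lam)  # ln(2) / lam
p_less_than_1 = exponential_cdf(1.0, lam)  # 1 - e^(-2)

print(f"median = {median:.4f}, P(X < 1) = {p_less_than_1:.4f}")
```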
Continuous Data: If f(x,y) is the joint pdf for x and y, the marginal pdf for x can be computed as
$f(x) = \int_y f(x, y)\,dy$ and the marginal pdf for y can be computed as $f(y) = \int_x f(x, y)\,dx$.
$E(X) = \int_x \int_y x\, f(x, y)\,dy\,dx = \int_x x\, f(x)\,dx$ and $E(X^2) = \int_x \int_y x^2 f(x, y)\,dy\,dx = \int_x x^2 f(x)\,dx$
$E(XY) = \int_x \int_y x\,y\, f(x, y)\,dy\,dx$ and $E(h(X, Y)) = \int_x \int_y h(x, y)\, f(x, y)\,dy\,dx$
f(x | y) = f(x,y) / f(y) where f(y) > 0 and $E(X \mid Y) = \int_x x\, f(x \mid y)\,dx$
Cov(X,Y) = E(XY) − E(X)E(Y) and $Corr(X, Y) = \dfrac{Cov(X, Y)}{\sqrt{Var(X)\,Var(Y)}}$
$Var(aX \pm bY) = a^2\,Var(X) + b^2\,Var(Y) \pm 2ab\,Cov(X, Y)$
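The variance rule for a linear combination is easy to misapply when the variables are correlated, so here is a small worked sketch. It recovers the covariance from a correlation via Cov(X,Y) = Corr(X,Y)·√(Var(X)Var(Y)) and then evaluates Var(2X − Y) for two variables with variance 4 and correlation 0.5. The function name `var_linear_combination` and its `sign` parameter are hypothetical.

```python
from math import sqrt


def var_linear_combination(a: float, b: float, var_x: float, var_y: float,
                           cov_xy: float, sign: int = 1) -> float:
    """Var(aX + bY) when sign = +1, Var(aX - bY) when sign = -1."""
    return a**2 * var_x + b**2 * var_y + sign * 2 * a * b * cov_xy


var_x = var_y = 4.0
corr = 0.5
cov_xy = corr * sqrt(var_x * var_y)  # Cov = Corr * sd_x * sd_y

var_diff = var_linear_combination(2, 1, var_x, var_y, cov_xy, sign=-1)
var_sum = var_linear_combination(2, 1, var_x, var_y, cov_xy, sign=+1)

print(f"Cov = {cov_xy}, Var(2X - Y) = {var_diff}, Var(2X + Y) = {var_sum}")
```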
Random Sample: The random variables X1, X2, ..., Xn are said to form a random sample of size n if
(i) The Xi's are independent random variables.
(ii) Every Xi has the same probability distribution.
If X1, X2, ..., Xn form a random sample of size n with mean μ and variance σ², the sampling
distribution of $\bar{x}$ has mean μ and variance σ²/n, the sampling distribution of $\sum_{i=1}^{n} x_i$ has mean nμ and
variance nσ², and so on.
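For a normal population these results give the exact distributions of the sample mean and the sample total. The sketch below uses the blood pressure setting from the exam (μ = 77, σ = 11.6, n = 100) and shows that a statement about the mean (below 75) and the equivalent statement about the total (below 7500) have the same probability. It relies on `statistics.NormalDist` from the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 77.0, 11.6, 100

# Sample mean: mean mu, variance sigma^2 / n.
mean_dist = NormalDist(mu, sigma / sqrt(n))
# Sample total: mean n * mu, variance n * sigma^2.
total_dist = NormalDist(n * mu, sigma * sqrt(n))

p_mean = mean_dist.cdf(75)       # P(sample mean < 75)
p_total = total_dist.cdf(7500)   # P(sample total < 7500), the same event

print(f"P(mean < 75) = {p_mean:.4f}, P(total < 7500) = {p_total:.4f}")
```

The value differs slightly in the fourth decimal from a table lookup because the z-score is not rounded to two decimals here.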
Point estimate of a parameter θ: single number that can be regarded as the most plausible value of θ. A point
estimator: $\hat{\theta} = \theta + \text{error of estimation}$.
$\hat{\theta}$ is an unbiased estimator of θ if $E(\hat{\theta}) = \theta$ for every possible value of θ.
Otherwise, it is biased and Bias = $E(\hat{\theta}) - \theta$.
Minimum Variance Unbiased Estimator (MVUE): Among all estimators of θ that are unbiased, choose the one that has
minimum variance. The resulting $\hat{\theta}$ is the MVUE.
The Invariance Principle: Let $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m$ be the MLE's of the parameters $\theta_1, \theta_2, \ldots, \theta_m$. Then the MLE of any
function $h(\theta_1, \theta_2, \ldots, \theta_m)$ of these parameters is the function $h(\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_m)$ of the MLE's.
Confidence Interval for a Population Mean, μ
Suppose that the parameter of interest is the population mean, μ and that
a. the population distribution is normal
b. the value of the population standard deviation σ is known
Let X1, X2, ..., Xn be a random sample. Then the 100(1−α)% confidence interval for μ is
$\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$ where $P\left(-z_{\alpha/2} \le \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$
Choosing the sample size: The bound on the error of estimation is $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. That is, $\bar{x}$ will be within
$z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ of μ. The sample size required to estimate a population mean μ to within an amount $B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
with 100(1−α)% confidence is $n = \left(\dfrac{z_{\alpha/2}\,\sigma}{B}\right)^2$.
The same formula can be written using the interval width, $w = 2 z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$; then $n = \left(\dfrac{2 z_{\alpha/2}\,\sigma}{w}\right)^2$.
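The interval and the sample-size formula can be sketched in a few lines. The critical value z_{α/2} is obtained from `statistics.NormalDist().inv_cdf`, so it is slightly more precise than the tabulated 1.96. The inputs reuse numbers from the asparagus questions (mean 15.9, standard deviation 0.3, n = 200; then σ = 0.4 and B = 0.1); treating the sample standard deviation as σ is an assumption that is reasonable only because n is large. Function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(conf: float) -> float:
    """z_{alpha/2} for a two-sided interval with confidence level conf."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def z_interval(xbar: float, sd: float, n: int, conf: float = 0.95):
    """Two-sided z confidence interval for the mean."""
    half_width = z_critical(conf) * sd / sqrt(n)
    return xbar - half_width, xbar + half_width


def sample_size_for_mean(sd: float, bound: float, conf: float = 0.95) -> int:
    """Smallest n with z_{alpha/2} * sd / sqrt(n) <= bound (always round up)."""
    return ceil((z_critical(conf) * sd / bound) ** 2)


lo, hi = z_interval(xbar=15.9, sd=0.3, n=200)
n_needed = sample_size_for_mean(sd=0.4, bound=0.1)

print(f"95% CI: ({lo:.4f}, {hi:.4f}); required n = {n_needed}")
```

Rounding up rather than to the nearest integer matters: rounding down would give an interval slightly wider than the requested bound.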
Large Sample Confidence Interval for μ
Suppose that the parameter of interest is the population mean, μ and that
a. X1, X2, ..., Xn is a random sample from a population distribution with mean μ and standard deviation σ.
b. For large sample size n, the CLT implies that $\bar{x}$ has approximately a normal distribution for any population
distribution.
c. The value of the population standard deviation σ may not be known. Instead, the value of the sample standard
deviation s may be known.
If n is sufficiently large (n > 40), the 100(1−α)% large sample confidence interval for μ is
$\left(\bar{x} - z_{\alpha/2}\dfrac{s}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\dfrac{s}{\sqrt{n}}\right)$ where $P\left(\bar{x} - z_{\alpha/2}\dfrac{s}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{s}{\sqrt{n}}\right) \approx 1 - \alpha$
A General Large Sample Confidence Interval
When the estimator satisfies the following properties, the confidence interval can be constructed.
a. The estimator has approximately a normal population distribution
b. It is (at least approximately) unbiased
c. The standard deviation of the estimator is known
Large Sample Confidence Interval for a population proportion, p
If n is sufficiently large ($n\hat{p} \ge 10$ and $n(1 - \hat{p}) \ge 10$), the 100(1−α)% large sample confidence interval for p is
$\left(\hat{p} - z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}\right)$
Choosing the sample size: The bound on the error of estimation is $z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. That is, $\hat{p}$ will be within
$z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ of p. The sample size required to estimate a population proportion p to within an amount
$B = z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$ with 100(1−α)% confidence is $n = \dfrac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{B^2}$. The same formula can be written using
the interval width, $w = 2 z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$; then $n = \dfrac{4 z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{w^2}$. The conservative sample size can be found
when $\hat{p} = 1 - \hat{p} = 0.5$.
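A minimal sketch of the proportion interval and the conservative sample size follows. It uses the referendum counts from the exam (75 approvals out of 130) for the interval and an assumed bound B = 0.05 for the sample size. The function names are hypothetical, and z_{α/2} comes from `statistics.NormalDist` rather than a table.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(conf: float) -> float:
    """z_{alpha/2} for a two-sided interval with confidence level conf."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def proportion_interval(successes: int, n: int, conf: float = 0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # Large-sample rule from the formula sheet.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("large-sample conditions are not met")
    half_width = z_critical(conf) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def conservative_sample_size(bound: float, conf: float = 0.95) -> int:
    """Sample size using p_hat = 0.5, which maximizes p_hat * (1 - p_hat)."""
    return ceil(z_critical(conf) ** 2 * 0.25 / bound**2)


lo, hi = proportion_interval(75, 130)
n_conservative = conservative_sample_size(0.05)  # B = 0.05 is an assumed bound

print(f"95% CI: ({lo:.3f}, {hi:.3f}); conservative n = {n_conservative}")
```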
Intervals based on a Normal Population Distribution:
When the sample size is small and the population of interest is normal, so that X1, X2, ..., Xn constitutes a random
sample from a normal distribution with both μ and σ unknown, the 100(1−α)% confidence interval for μ is
$\left(\bar{x} - t_{\alpha/2;\,n-1}\dfrac{s}{\sqrt{n}},\; \bar{x} + t_{\alpha/2;\,n-1}\dfrac{s}{\sqrt{n}}\right)$.
Prediction Interval for a Single Future Value:
Let X1, X2, ..., Xn be a random sample from a normal population distribution and we wish to predict the value of Xn+1, a
single future observation. The 100(1−α)% prediction interval for Xn+1 is
$\left(\bar{x} - t_{\alpha/2;\,n-1}\, s\sqrt{1 + \dfrac{1}{n}},\; \bar{x} + t_{\alpha/2;\,n-1}\, s\sqrt{1 + \dfrac{1}{n}}\right)$
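The only difference between the t confidence interval for μ and the prediction interval is the factor √(1 + 1/n) in place of 1/√n, which is why the prediction interval is always wider. The sketch below makes the comparison concrete. The sample summary (mean 50, s = 4, n = 14) is hypothetical, and the critical value t_{0.025;13} = 2.160 is read from a t table and passed in as a parameter because the Python standard library has no t quantile function.

```python
from math import sqrt


def t_confidence_interval(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval for the mean of a normal population."""
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


def t_prediction_interval(xbar: float, s: float, n: int, t_crit: float):
    """Prediction interval for a single future observation."""
    half_width = t_crit * s * sqrt(1 + 1 / n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample summary; t_crit is t_{0.025;13} from a t table.
xbar, s, n, t_crit = 50.0, 4.0, 14, 2.160

ci = t_confidence_interval(xbar, s, n, t_crit)
pi = t_prediction_interval(xbar, s, n, t_crit)

print(f"95% CI for the mean:      ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% prediction interval:  ({pi[0]:.3f}, {pi[1]:.3f})")
```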
Confidence Intervals for the Variance, σ², and Standard Deviation, σ, of a Normal Population:
The population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with
parameters μ and σ². Then the 100(1−α)% confidence interval for σ² is
$\left(\dfrac{(n - 1)\,s^2}{\chi^2_{\alpha/2;\,n-1}},\; \dfrac{(n - 1)\,s^2}{\chi^2_{1-\alpha/2;\,n-1}}\right)$.