ASSIGNMENTS IE504 – FALL 2007
PS 1: Probability (due: Monday 1st of October)
1) For married couples living in a certain suburb the probability that the husband will vote on a bond referendum is 0.25, the
probability that his wife will vote in the referendum is 0.32, and the probability that both the husband and wife will vote is
0.15. What is the probability that
a) At least one member of a married couple will vote?
b) A wife will vote, given that her husband will vote?
c) A husband will vote, given that his wife does not vote?
d) Are the events “the wife will vote” and “the husband will vote” independent?
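The rules needed for problem 1 are inclusion-exclusion and the definition of conditional probability. The following is a minimal Python sketch using exact fractions; the variable names are illustrative and not part of the assignment.

```python
from fractions import Fraction

# Probabilities stated in problem 1
p_h = Fraction(25, 100)     # husband votes
p_w = Fraction(32, 100)     # wife votes
p_both = Fraction(15, 100)  # both vote

# a) P(H or W) by inclusion-exclusion
p_at_least_one = p_h + p_w - p_both
# b) P(W | H) = P(H and W) / P(H)
p_w_given_h = p_both / p_h
# c) P(H | not W) = P(H and not W) / P(not W)
p_h_given_not_w = (p_h - p_both) / (1 - p_w)
# d) independence would require P(H and W) == P(H) * P(W)
independent = p_both == p_h * p_w

print(p_at_least_one, p_w_given_h, p_h_given_not_w, independent)
```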
2) Suppose that the four inspectors at a film factory are supposed to stamp the expiration date on each package of film at the
end of the assembly line. John, who stamps 25% of the packages, fails to stamp the expiration date once in every 250
packages; Tom, who stamps 40% of the packages, fails to stamp the expiration date once in every 100 packages; Jeff, who
stamps 25% of the packages, fails to stamp the expiration date once in every 90 packages; and Pat, who stamps 10% of the
packages, fails to stamp the expiration date once in every 200 packages. If a customer complains that her package of film
does not show the expiration date, what is the probability that it was inspected by John?
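Problem 2 is an application of Bayes' rule. A hedged sketch of the calculation follows; the helper `posterior` is a hypothetical name introduced here for illustration only.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes' rule: P(cause | event) for every cause."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    total = sum(joint.values())  # law of total probability
    return {k: v / total for k, v in joint.items()}

# Share of packages stamped by each inspector (from problem 2)
priors = {"John": Fraction(25, 100), "Tom": Fraction(40, 100),
          "Jeff": Fraction(25, 100), "Pat": Fraction(10, 100)}
# Probability that each inspector fails to stamp a package
miss = {"John": Fraction(1, 250), "Tom": Fraction(1, 100),
        "Jeff": Fraction(1, 90), "Pat": Fraction(1, 200)}

post = posterior(priors, miss)
print(post["John"], float(post["John"]))
```

The same function applies to problem 3 if the two "causes" are guilty and innocent and the likelihoods are the probabilities that the serum indicates guilt.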
3) A truth serum has the property that 90% of the guilty suspects are properly judged while, of course, 10% of guilty suspects
are improperly found innocent. On the other hand, innocent suspects are misjudged 2% of the time. If the suspect was
selected from a group of suspects of which only 5% have ever committed a crime, and the serum indicates that he is guilty,
what is the probability that he is innocent?
4) The probability that a patient recovers from a delicate heart operation is 0.85. What is the probability that
a) Exactly 2 of the next 3 patients who have this operation survive?
b) All of the next 3 patients who have this operation survive?
c) Which assumption is necessary to solve parts a) and b) of this question?
5) In a certain federal prison it is known that 2/3 of the inmates are under 25 years of age. It is also known that 3/5 of the
inmates are male and that 5/8 of the inmates are female or 25 years of age or older. What is the probability that a prisoner
selected at random from this prison is female and at least 25 years old?
6) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn in succession, each ball being
replaced in the box before the next draw is made. What is the probability that
a) All 3 are the same colour?
b) Each colour is represented?
7) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn in succession without replacement.
What is the probability that the third ball drawn is black?
8) If the events A and B are independent and the events A and C are independent what can we say about the events B and C?
Prove that your assertion is correct.
9) (*) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn with replacement.
a) Find the probability function f(x) of the random variate X = “Sum of red balls drawn”.
b) Calculate F(2) and F(0.17).
10) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn without replacement.
a) Find the probability function f(x) of the random variate X = “Sum of red balls drawn”.
b) Calculate F(2) and F(0.17).
11) For the discrete R.V. X with f(0) = 0.4, f(1) = 0.2, f(2) = f(3) = f(4) = f(5) = 0.1 find
a) P(1 ≤ X < 6) b) the expectation, c) the variance .
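For a discrete random variable such as the one in problem 11, all three quantities are sums over the pmf. A small sketch with exact fractions (variable names are illustrative):

```python
from fractions import Fraction

# pmf from problem 11
pmf = {0: Fraction(4, 10), 1: Fraction(2, 10), 2: Fraction(1, 10),
       3: Fraction(1, 10), 4: Fraction(1, 10), 5: Fraction(1, 10)}
assert sum(pmf.values()) == 1  # a valid pmf sums to one

prob = sum(p for x, p in pmf.items() if 1 <= x < 6)      # P(1 <= X < 6)
mean = sum(x * p for x, p in pmf.items())                # E(X)
second_moment = sum(x * x * p for x, p in pmf.items())   # E(X^2)
variance = second_moment - mean ** 2                     # V(X) = E(X^2) - E(X)^2

print(prob, mean, variance)
```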
12) The waiting time, in hours, between successive speeders spotted by a radar unit is a continuous random variable with
cumulative distribution
\[ F(x) = \begin{cases} 0, & x \le 0 \\ 1 - e^{-3x}, & x > 0 \end{cases} \]
Find the probability of waiting less than 12 minutes between successive speeders
a) using the cumulative distribution of X;
b) using the probability density function of X.
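Both routes in problem 12 should give the same number: evaluating F at 12 minutes (0.2 hours), or integrating the density f(x) = F'(x) from 0 to 0.2. The sketch below checks this numerically; the midpoint-rule integrator is an illustrative choice, not a method prescribed by the assignment.

```python
import math

RATE = 3.0  # per hour, read off from F(x) = 1 - exp(-3x)

def cdf(x):
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

def pdf(x):
    # derivative of the CDF
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0

def integrate(f, a, b, steps=10_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

t = 12 / 60                     # 12 minutes expressed in hours
p_cdf = cdf(t)                  # a) via the CDF
p_pdf = integrate(pdf, 0.0, t)  # b) via the density
print(p_cdf, p_pdf)
```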
13) Consider the density function
kx
0<x<4
f(x)=
0
elsewhere
a) Evaluate k,
b) Find F(x) and use it to evaluate P(0.3 < X < 0.6).
c) Find the expectation and the variance of the random variable.
14) A continuous random variable has density
\[ f(x) = \begin{cases} 1 + x, & -1 < x \le 0 \\ 0.5, & 0 < x \le 1 \\ 0, & \text{else} \end{cases} \]
a) Check that f is a density.
b) Find the CDF F(x).
c) Compute the probability P(-0.2 < X ≤ 0.5).
d) Compute the expectation.
15) Consider the (continuous) random variate
X = “time in minutes that it takes a randomly selected student of our class to solve question 10”.
Assume that the time is always between 0.5 and 10 minutes and most students work about 3 minutes.
a) Construct a function that can be used as density for the random variate X.
b) Calculate its CDF.
16) 3 balls are selected at random from an urn with 3 blue, 2 red and 1 green ball. Let X be the total number of blue balls
selected and let Y be the number of red balls selected.
a) Find the joint probability mass function f(x,y).
b) Find the marginal distribution of X.
c) Find the conditional distribution of Y given that X is equal to 1.
d) Are X and Y independent?
e) Find: E(X), E(Y), V(Y), E(XY+X^2)
f) Find: E(Y | X=1), V(Y | X=1) and E(X | Y=0)
g) Find Cov(X,Y)
17) A continuous random variable has density
\[ f(x) = \begin{cases} 1 + x, & -1 < x \le 0 \\ 1, & 0 < x \le 0.5 \\ 0, & \text{else} \end{cases} \]
a) Check that f is a density.
b) Find the CDF F(x).
c) Compute the expectation and variance of X.
d) Find the conditional density of X: f(x|X < -0.5)
e) Find the conditional expectation of X: E(X| X < -0.5)
18) Let X be a random variable with the following probability distribution:
x       -3     6      9
f(x)    1/6    1/3    1/2
Find $E(X)$ and $E(X^2)$ and then, using these values, evaluate $E[(2X+1)^2]$.
19) If X and Y are independent random variables with expectations $\mu_X = 1$ and $\mu_Y = 2$ and variances $\sigma_X^2 = 5$ and $\sigma_Y^2 = 3$, find the
expectation and variance of the random variable Z = -2X + 4Y - 3.
20) Repeat question 19 for the case that X and Y are not independent and $\sigma_{XY} = 1$.
21) Suppose that X and Y have the following joint probability function:
f(x,y)    x = 2    x = 6
y = 1     0.10     0.15
y = 3     0.25     0.25
y = 5     0.10     0.15
a) Find the covariance of X and Y
b) Find the expected value of $g(X,Y) = XY^2$.
c) Find $\sigma_X$ and $\sigma_Y$.
d) Find the correlation of X and Y
e) Find the conditional distribution of Y: f ( y |X=6)
f) Compute E( Y | X = 6) and E( X | Y = 3)
g) Are X and Y independent?
22) Out of an urn with 5 blue, 3 red and 2 white balls you are sampling two balls with replacement. Let X denote the number
of blue, Y the number of red and Z the number of white balls.
a) Find the marginal distribution of Y.
b) Find the joint distribution of X and Y.
c) Find the conditional distribution: f(X|Y=1)
d) Find the conditional expectation E(X|Y=1)
Due date for questions 23 to 42: PS Friday 19.10.
23) It is known that 30 % of mice inoculated with a serum are protected from a certain disease. If 4 mice are inoculated find
the probability that
a) none contracts the disease,
b) fewer than two contract the disease
c) more than 3 contract the disease.
24) According to a genetics theory, a certain cross of guinea pigs will result in red, black and white off-spring in the ratio
8:4:4. Find the probability that among 8 offspring 4 will be red, 2 black, and 2 white.
25) From a lot of 10 missiles, 4 are selected at random and fired. If the lot contains 4 defective missiles that will not fire, what
is the probability that among the 4 selected
a) all 4 will fire? b) at most 2 will not fire?
26) A scientist inoculates several mice, one at a time, with a disease germ until he finds 2 that have contracted the disease.
If the probability of contracting is 1/6,
a) what is the probability that 8 mice are required?
b) What are the expected value and the standard deviation of the number of required mice.
27) Service calls come to a maintenance center according to a Poisson process and on the average 180 calls per hour.
Find the probability that
a) no more than 4 calls come in any minute.
b) fewer than 2 calls come in any minute.
c) Fewer than 4 calls come in a 5-minute period.
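In problem 27 the rate has to be rescaled to the length of the interval in question (180 per hour is 3 per minute and 15 per 5 minutes). A hedged sketch with hand-written Poisson helpers (`poisson_pmf` and `poisson_cdf` are names introduced here for illustration):

```python
import math

def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    return sum(poisson_pmf(i, mu) for i in range(k + 1))

calls_per_minute = 180 / 60  # 3 calls per minute on average

p_a = poisson_cdf(4, calls_per_minute)      # a) no more than 4 in one minute
p_b = poisson_cdf(1, calls_per_minute)      # b) fewer than 2 in one minute
p_c = poisson_cdf(3, 5 * calls_per_minute)  # c) fewer than 4 in five minutes
print(p_a, p_b, p_c)
```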
28) In the November 1990 issue of Chemical Engineering Progress a study discussed the percent purity of oxygen from a
certain supplier. Assume that the mean was 99.63 with a standard deviation of 0.08. Assume that the distribution of percent
purity was approximately normal.
a) What percentage of the purity values would you expect to be between 99.6 and 99.7?
b) What percentage of the purity values is more than 0.1 away from the mean?
c) What purity value would you expect to exceed exactly 7% of the population?
d) What purity value is exceeded by exactly 3% of the population?
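Normal probabilities and quantiles like those in problem 28 can be checked without tables. The sketch below uses `statistics.NormalDist` from the Python standard library; reading part c) as the 7th percentile and part d) as the 97th percentile is an interpretation of the wording.

```python
from statistics import NormalDist

purity = NormalDist(mu=99.63, sigma=0.08)

# a) P(99.6 < X < 99.7)
p_between = purity.cdf(99.7) - purity.cdf(99.6)
# b) P(|X - mu| > 0.1), using symmetry of the normal density
p_far = 2 * (1 - purity.cdf(99.63 + 0.1))
# c) value that exceeds 7% of the population: 7th percentile
x_low7 = purity.inv_cdf(0.07)
# d) value exceeded by 3% of the population: 97th percentile
x_top3 = purity.inv_cdf(0.97)

print(p_between, p_far, x_low7, x_top3)
```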
29) The weights of a large number of miniature poodles are approximately normally distributed with a mean of 9 kilograms
and a standard deviation of 0.7 kilogram. Find the fraction of these poodles with weights
a) over 9.5 kilograms; b) at most 8.6 kilograms; c) between 7.3 and 9.1 kilograms inclusive; d) of 9 kg.
30) A bus arrives every 13 minutes at a bus stop. It is assumed that the waiting time for a particular individual is a random
variable with a uniform distribution. a) What is the probability that the individual waits more than 7 minutes?
b) What is the probability that the individual waits between 2 and 7 minutes?
31) Statistics released by the National Highway Traffic Safety Administration and the National Safety Council show that on
an average weekend night, 1 out of every 20 drivers on the road is drunk. If 1500 drivers are randomly checked next Saturday
night, what is the probability that the number of drunk drivers will be at least 70 but less than 94?
32) In a biomedical research activity it was determined that the survival time in weeks of an animal when subjected to a certain
exposure of gamma radiation has a gamma distribution with α = 5 and β = 10.
a) What is the mean survival time of a randomly selected animal of the type used in the experiment?
b) What is the standard deviation of survival time?
c) What is the probability that an animal survives more than 30 weeks?
33) We assume that the lifetime of an electric bulb follows an exponential distribution with mean value 5000 hours. Find the
lifetimes that are exceeded by the probabilities 50% and 5%.
34) In one hour 5000 cars are passing a certain filling station on a high-way. We assume that a single driver decides
(independently of the others) to enter the station with a probability of 0.02.
a) Compute the probability that less than 77 cars enter the station.
b) Compute the probability that between 90 and 110 cars are entering the station.
35) Among 2000 electric devices 200 are defective. For quality control reasons 100 randomly chosen pieces are tested.
a) Compute the probability that less than 4 of the selected 100 pieces are defective.
b) Compute the probability that between 15 and 20 pieces are defective.
36) A bookshop is selling a certain monthly journal for $5 per piece and buys it from a publishing house at $3 per piece. Let us
assume that the number of sold journals per month is Poisson distributed with expectation 120. As the bookshop cannot hand
back journals that were not sold, it has to decide about the number of journals that are ordered every month.
a) Compute the expectation and the variance of the money the bookshop gains when ordering 100 journals.
b) Compute the expectation and the variance of the money the bookshop gains when ordering 120 journals.
c) Try to find the number of ordered journals that maximises the expected gained money. Compute the expectation and
variance of the gained money for that number of orders.
d) Comment on the interpretation of the variance in this example. What will the manager of the bookshop try to do if he does
not want to take too much risk.
37) The lifetime in weeks of a certain type of transistor is known to follow a gamma distribution with mean 10 weeks and
standard deviation $\sqrt{50}$ weeks.
a) What is the probability that the transistor will last at most 50 weeks?
b) What is the probability that the transistor will not survive the first 10 weeks?
38) The life of a certain type of device has an advertised failure rate of 0.01 per hour. The failure rate is constant and the
exponential distribution applies.
a) What is the mean time to failure?
b) What is the probability that 200 hours will pass before a failure is observed?
39) We are given a Poisson process with rate λ = 0.1 per hour.
a) What are the distribution and the density of the waiting time until the first event occurs?
b) What are the distribution and the density of the waiting time until the 3rd event occurs?
40) Prove that the sum of three independent exponential random variates (all have mean 1) is gamma-distributed with
parameters α = 3 and β = 1.
41) a) Compute the mean and the variance of the exponential distribution with λ = 1.
b) Use the result of a) and the result of exercise 40) to calculate the mean and the variance of the Gamma distribution with
α = 3 and β = 1.
42) We consider a “discrete random walk with drift” defined by $X_0 = 0$ and $X_{i+1} = X_i + B_{i+1}$,
where all $B_i$ are independent and have the pmf f(1) = p; f(0) = (1-p).
a) Find the expectation and the variance of $X_1$, $X_2$ and $X_3$.
b) Find $E(X_2 | X_1 = 1)$ and $V(X_5 | X_4 = 1)$.
c) Find a general formula for $E(X_{i+2} | X_i = 2)$.
For Questions 43 to 58: Due Friday, 2nd of November
43) Let X be a binomial random variable with n=3 and p=1/2. Find the probability distribution of the random variable $Y = X^2$.
44) Let X have a continuous uniform distribution between 0 and 1.
Show that the random variable Y= – 2 lnX has a Gamma distribution.
Find the parameters of the Gamma distribution.
45) A dealer’s profit, in units of $1000, on a new automobile is given by $Y = X^{1/2}$, where X is a random variable having the
density function
\[ f(x) = \begin{cases} 2(1 - x), & 0 < x < 1 \\ 0, & \text{elsewhere.} \end{cases} \]
a) Find the probability density function of Y.
b) Using the density of Y, find the probability that the profit will be more than $500 on the next new automobile sold.
46) Let X be a random variable with probability distribution
\[ f(x) = \begin{cases} (1 + x)/2, & -1 < x < 1 \\ 0, & \text{elsewhere.} \end{cases} \]
Find the probability distribution of the random variable $Y = X^2$.
47) The random variable X has density
\[ f(x) = \begin{cases} 2 - 2x, & 0 < x < 1 \\ 0, & \text{else} \end{cases} \]
a) Compute the density of Y = α + β X for arbitrary α and β > 0.
b) Compute mean and variance for Y.
c) Make plots of the density for different choices of α and β. Explain why α is called a location parameter and β a scale parameter.
48) A random variable X has the discrete uniform distribution f(x)= 1/k for x=1,2,3,...,k and 0, elsewhere.
Show that the moment-generating function of X is $M_X(t) = \frac{e^t (1 - e^{kt})}{k (1 - e^t)}$.
49) A random variable X has the geometric distribution $g(x; p) = p q^{x-1}$ for x = 1, 2, 3, ...
a) Show that the moment-generating function of X is $M_X(t) = \frac{p e^t}{1 - q e^t}$.
b) Use $M_X(t)$ to find the mean and variance of the geometric distribution.
50) A random variable X has the Poisson distribution $p(x; \mu) = e^{-\mu} \mu^x / x!$ for x = 0, 1, 2, ...
a) Show that the moment-generating function of X is $M_X(t) = e^{\mu (e^t - 1)}$.
b) Using $M_X(t)$, find the mean and variance of the Poisson distribution.
51) Use the result of 50) to prove that the sum of two independent Poisson random variables is again Poisson distributed.
52) X ~ N( 20; σ = 3 ) and Y ~ N( 15; σ = 2 ), X and Y independent: Compute the probability P(Y>X).
53) X ~ N( 12; σ = 3 ); Compute the probability that the sum of 10 independent realisations of X is bigger than 150.
54) X ~ N( 20; σ = 3 ) and Y ~ N( 15; σ = 2 ), X and Y independent:
a) Compute the probability P(2X + 3Y > 80).
b) Compare the probabilities of $2X > 50$ and $X_1 + X_2 > 50$, where $X_1$ and $X_2$ are independent realisations of X.
55) Assume that the random variables X and Y describe the distribution of the price of two different stocks a month in the
future. We assume that X ~ N( 17; σ = 3 ) and Y ~ N( 12; σ = 2 ).
a) If X and Y are independent compute the probability that X+Y is smaller than 25.
b) If X and Y are joint normal and have ρXY=0.5 compute the probability that X+Y < 25.
Hint: Remember that Cov(X, Y) = ρXY σX σY.
c) For X, Y joint normal and ρXY=0.5 we consider the random variate S=2X+3Y.
(S is the value of a portfolio with two stocks of company X and three of company Y.)
Compute the value that S exceeds with probability 99%.
Remark: c) could be seen as a “worst case analysis” for the value of the portfolio and is linked to the “value at risk” concept.
56) If X1, X2,..., Xn are independent random variables having identical exponential distributions with parameter θ, show that
the density function of the random variable Y= X1+ X2+...+ Xn is that of a gamma distribution with parameters α = n and β = θ.
57) A continuous random variate has the CDF F(x). Find the distribution of the random variate Y = F( X ).
58) Find the moment generating function of the Gamma distribution.
Due date: Friday 9th of November:
59) E(X) = 50; Var(X)=108; You take a sample of size 50.
a) Find the probability X-bar >52 using the assumption that X is normal.
b) Use simulation to find the result of a).
60) X~U(30,70) (uniform distribution between 30 and 70); You take a sample of size 50.
a) Find the probability X-bar >52 using the assumption that X is normal.
b) Use simulation to find the exact probability for X-bar > 52 . Compare the results of a) and b).
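A simulation for problem 60 can be sketched as follows. The code is written in Python with a fixed seed; the function name `simulate_tail`, the number of replications and the seed are illustrative assumptions. Note that a simulation yields an estimate with Monte Carlo error rather than an exact value.

```python
import math
import random
from statistics import NormalDist

def simulate_tail(n=50, threshold=52.0, reps=20_000, seed=1):
    """Estimate P(X-bar > threshold) for samples of size n from U(30, 70)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.uniform(30, 70) for _ in range(n)) / n
        if xbar > threshold:
            hits += 1
    return hits / reps

# Normal approximation: X-bar ~ N(50, sigma^2 / n) with sigma^2 = (70 - 30)^2 / 12
sigma2 = (70 - 30) ** 2 / 12
approx = 1 - NormalDist(50, math.sqrt(sigma2 / 50)).cdf(52)
estimate = simulate_tail()
print(approx, estimate)
```

For problem 59 the same loop applies once the uniform draws are replaced by draws from the distribution assumed there.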
Due date: Friday 16.11.
61) A soft-drink machine is being regulated so that the amount of drink dispensed averages 240 milliliters with a standard
deviation of 16 milliliters. Periodically, the machine is checked by taking a sample of 144 drinks and computing the average
content. The company official found the mean of 144 drinks to be 236 milliliters and concluded that the machine needed no
adjustment. Was this a reasonable decision?
62) The amount of time that a drive-through bank teller spends on a customer is a random variable with a mean μ=3.2 minutes
and a standard deviation σ = 3 minutes. If a random sample of 81 customers is observed, find the probability that their mean
time at the teller’s counter is more than 3.5 minutes.
63) The mean score for freshmen on an aptitude test, at a certain college, is 540, with a standard deviation of 50. What is the
probability that two groups of students selected at random, consisting of 32 and 50 students, respectively, will differ in their
mean scores by an amount between 5 and 10 points? Assume the means to be measured to any degree of accuracy.
64) Find the probability that a random sample of 25 observations, from a normal population with variance $\sigma^2 = 6$, will have a
variance $s^2$ a) greater than 9.1 b) between 3.462 and 10.745.
65) Check your result of 64 a) and 64 b) using simulation.
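One possible way to set up the simulation for problem 65 is sketched below. It draws repeated normal samples of size 25 with variance 6 and counts how often the sample variance falls in each region. The population mean of 0, the number of replications and the seed are assumptions; the mean does not affect the distribution of the sample variance.

```python
import random

def sample_variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def simulate(n=25, sigma2=6.0, reps=20_000, seed=1):
    """Estimate P(s^2 > 9.1) and P(3.462 < s^2 < 10.745) by simulation."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    above = between = 0
    for _ in range(reps):
        s2 = sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        above += s2 > 9.1
        between += 3.462 < s2 < 10.745
    return above / reps, between / reps

p_above, p_between = simulate()
print(p_above, p_between)
```

The estimates can be compared with the chi-square calculation based on $(n-1)s^2/\sigma^2$ with 24 degrees of freedom.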
66) A normal population with unknown variance has a mean of 21.
Is one likely to obtain a random sample of size 16 from this population with a mean of 24 and a standard deviation of 4.1? If
not, what conclusion would you draw?
67) Two normal variates have the same variance.
Calculate the probability that for two independent samples of size 100 the ratio of the two variances is
a) smaller than 0.9. b) larger than 1.2 ?
68) Check your result of 67 a) and 67 b) using simulation.
Due Friday 23.11.
69) From a random sample of size n=120 we calculate an x-bar value of 27 and a sample variance of 3.
a) Find a 95% confidence interval for the unknown mean of the parent population.
b) What assumptions are necessary to obtain the above result?
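For a sample as large as the one in problem 69, the interval is commonly based on the normal quantile. A minimal sketch follows; `z_interval` is a hypothetical helper name. For a small sample such as n = 12 in problem 70, a t quantile would be needed instead, which the Python standard library does not provide.

```python
import math
from statistics import NormalDist

def z_interval(xbar, s2, n, level=0.95):
    """Large-sample confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half = z * math.sqrt(s2 / n)
    return xbar - half, xbar + half

lo, hi = z_interval(xbar=27, s2=3, n=120)
print(round(lo, 3), round(hi, 3))
```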
70) From a random sample of size n=12 we calculate an x-bar value of 27 and a sample variance of 3.
a) Find a 95% confidence interval for the unknown mean of the parent population.
b) What assumptions are necessary to calculate the CI?
71) A random sample of 100 car owners shows, that in Virginia, a car is driven on the average 20,500 km per year, with a
standard deviation of 4000 km.
(a) Construct a 99% confidence interval.
(b) Explain the result of a) in a sentence.
(c) What can we say about the possible size of error, if we estimate the mean kilometers per year as 20,500?
(d) A friend of yours, who lives in Virginia, says: “I drive more than 30,000 km every year.” Is this statement a contradiction to
your result of a)? Why?
72) a) Referring to exercise 71), construct a 99% tolerance interval for the kilometers travelled annually by cars in Virginia.
b) Explain the result of a) in a sentence.
73) A Taxi company is trying to decide whether to buy brand A or brand B tires for its cars. A random experiment is conducted
using 12 of each brand. The number of kilometers is recorded till the tires wear out.
For brand A we obtain: Sample mean 36,300. sample standard deviation 3500.
For brand B we obtain: Sample mean 38,100. sample standard deviation 5000.
a) Compute a 99% confidence interval for the difference of the two means. (Do not assume equal variances.)
b) Which assumptions are necessary?
74) In a different experiment, for 8 taxis a brand A and a brand B tire are randomly assigned to the rear wheels.
The results are:
Taxi    Brand A    Brand B
1       36,000     36,200
2       45,500     46,800
3       36,700     37,700
4       32,000     31,100
5       48,400     47,800
6       32,800     36,400
7       38,100     38,900
8       30,100     31,500
a) Find a 95% confidence interval for the difference of the two means.
b) Which assumptions are necessary?
c) What can the company learn from the result? What should the manager do?
d) In general: Which form of the experiment (that of number 73 or 74) do you think is better? Why?
75) In a random sample of 1000 homes in a certain city, it is found that 228 are heated by oil.
a) Find the 95% confidence interval for the proportion of homes in this city heated by oil.
b) What sample size is necessary to obtain a CI that is not longer than 0.01 if we assume that the true proportion is about 0.23?
c) What sample size is necessary to obtain a CI that is not longer than 0.01 if we make no assumptions about the proportion.
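The calculations in problem 75 follow from the large-sample interval for a proportion. The sketch below assumes that “not longer than 0.01” refers to the total length of the interval (half-width 0.005); `sample_size` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # quantile for a 95% two-sided interval

# a) Large-sample interval for the proportion
p_hat, n = 228 / 1000, 1000
half = z * math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half, p_hat + half)

def sample_size(total_length, p):
    """Smallest n with 2 * z * sqrt(p * (1 - p) / n) <= total_length."""
    return math.ceil((2 * z / total_length) ** 2 * p * (1 - p))

n_b = sample_size(0.01, 0.23)  # b) assuming p is about 0.23
n_c = sample_size(0.01, 0.5)   # c) no assumption: worst case p = 0.5
print(ci, n_b, n_c)
```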
76) The paragraph entitled “data76” below contains a sample from a normal population.
a) Use these observations to compute the 95% Confidence Intervals for the mean of the population
b) Compute a 99 % CI for the variance.
77) The paragraphs entitled “data77a” and “data77b” below contain two independent samples of the same size of two
different populations A and B.
a) Compute a 99% CI for the difference between the means of the two populations.
b) Compute the differences between the two observations and compute a 99% CI for the mean of the differences.
c) Compare the result of a) and b).
data76:
7.419259
7.056495
9.375264
7.760131
7.599742
6.747239
7.295754
8.240287
7.926233
8.211881
7.637546
8.284557
8.091479
8.671331
6.952709
8.463596
7.488284
7.967883
8.540629
6.879072
8.138085
8.044871
7.895711
8.145631
7.744304
6.848380
8.365549
7.457207
8.143013
8.687392
7.205051
9.165117
7.916291
8.166786
7.767019
8.062623
7.205041
7.970955
7.938611
7.362170
8.665030
7.640752
8.736192
8.688080
6.115616
6.962965
7.805392
8.712742
6.736503
8.334070
8.197774
8.184240
8.428495
8.432457
8.116991
8.892680
9.610202
8.505759
8.479083
8.052199
6.965921
5.797876
8.527289
8.254530
8.446963
8.365858
7.341783
7.263865
6.854382
8.681943
8.253781
7.560372
7.367192
7.615177
8.363107
7.889474
7.721277
8.298355
8.137394
7.662689
7.564097
7.880886
8.160999
8.567554
7.001595
8.466932
5.681707
8.151236
8.817823
8.856159
8.560697
6.819934
7.596765
8.372532
8.062102
9.045914
7.763282
5.941443
8.508007
9.438167
7.877058
6.771858
8.131142
8.279876
8.025672
7.973712
8.689226
8.369996
6.662587
8.107453
8.304959
9.573606
6.701205
6.763532
7.609786
8.288504
8.285351
8.708177
8.072470
7.466076
7.334970
9.636361
8.485808
8.721086
8.483315
7.551206
8.063334
7.183934
7.893252
7.929290
8.334306
6.588512
7.800965
8.765859
7.806810
8.928453
7.566532
6.974015
7.168236
7.836283
8.919116
8.427228
8.385250
7.509284
7.121280
7.945396
6.782600
8.473952
9.092518
8.948584
7.402012
8.706531
8.569651
8.267758
8.728151
7.133510
8.413820
9.107859
6.637942
6.382142
8.068400
8.610322
8.460647
7.101550
9.099680
8.333331
8.213117
8.537296
7.837991
6.976908
5.802054
8.415503
8.809155
8.203243
6.919044
8.056109
7.108550
8.283236
8.185295
7.208631
7.590133
8.546314
7.751276
7.978944
8.169909
8.486126
8.654768
7.963543
7.841224
6.944446
7.486700
8.129097
data77a:
52.333690
55.166465
51.594258
50.419944
53.461261
54.904844
52.508731
49.911816
53.056366
53.873748
52.353986
53.083078
52.474261
52.759394
50.619011
53.010069
50.315980
52.566540
52.070468
53.585422
51.987218
51.849122
49.660424
53.245706
51.733168
52.106701
53.442978
54.220383
50.709237
52.036490
51.590022
53.818854
52.045158
52.421500
53.577453
53.830395
52.957456
51.632753
50.317643
52.994342
53.078327
52.127840
50.402423
53.380535
52.755852
53.264364
52.485122
53.239884
54.086508
50.807777
53.768739
52.009798
53.861452
53.730694
53.978723
51.542171
49.903385
51.338110
53.398497
51.788162
52.365106
55.222912
52.105629
53.427481
51.721977
52.440816
52.955336
50.435484
54.222694
53.564081
51.775821
51.117915
53.133485
data77b:
55.418937
54.036567
56.321394
57.087059
57.053921
56.690640
55.521681
53.998709
51.782508
55.944760
55.075649
54.881766
54.143390
55.621585
55.796658
53.625294
55.136540
52.590661
53.206185
53.922468
54.458077
56.050492
52.958139
52.316697
54.724639
56.323157
54.983971
53.656482
54.615007
56.106793
55.138095
54.393010
54.773609
54.686687
56.138678
55.206364
55.618597
54.228966
54.343576
53.622869
56.872121
55.432419
54.744826
52.100718
54.166574
53.099402
55.248392
55.206078
54.580442
54.384710
54.784792
54.773504
55.055130
54.267374
52.283858
52.361292
55.215105
54.997249
55.189243
55.997997
56.002856
56.257825
57.991198
54.309545
54.971946
53.393864
54.759862
55.943181
55.011442
53.659240
53.825036
55.794084
54.401110
Due Mo 26.11.
78) Consider the following estimate for the mean value: mu-hat = “average of the first n-5 observations”.
Is the estimate unbiased? Is it asymptotically unbiased? Is it consistent?
Prove your assertions.
79) Consider the following estimate for the mean value: mu-hat = “average of the first 5 observations”.
Is the estimate unbiased? Is it asymptotically unbiased? Is it consistent?
Prove your assertions.
80) Which estimate is better, 78) or 79)? Justify your answer.
81) Consider the following estimate for the mean value: mu-hat = xbar * (n-1)/n.
Is the estimate unbiased? Is it asymptotically unbiased? Is it consistent?
Prove your assertions.
82) Show that the sample mean is an unbiased estimator for the mean μ of the parent population.
83) Suppose that there are n trials from a Bernoulli process with parameter p, the probability of success. Work out the
maximum likelihood estimators for the parameter p.
84) Consider the log-normal distribution.
Develop the maximum likelihood estimators for μ and $\sigma^2$.
85) Consider observations from the gamma distribution.
Write out the likelihood function and the set of equations, which when numerically solved, give the maximum likelihood
estimators for α and β.
86) Repeat 83) using the moment estimate.
87) Repeat 84) using the moment estimate.
88) a) Repeat 85) using the moment estimate.
c) For the Gamma distribution: What is the advantage of the moment estimator? What is the advantage of the maximum
likelihood estimator?
89) a) Find the maximum likelihood estimate for the parameter a of a uniform distribution on (0,a).
b) Find the moment estimate.
c) Which one is better? Use simulation with R to answer that question. Try a=1 and n=5, 10, 100, 1000.
d) Think of a sample of size n=3 where it is clear that the moment estimate is not good.
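The comparison in problem 89 c) can be sketched as a small simulation. The assignment asks for R; the version below is written in Python, and translating it to R is straightforward. It assumes the usual results that the maximum likelihood estimate is the sample maximum and the moment estimate is twice the sample mean, and it uses mean squared error as the quality criterion, which is one possible choice.

```python
import random

def compare(a=1.0, n=10, reps=2_000, seed=1):
    """Mean squared error of the MLE and of the moment estimate of a."""
    rng = random.Random(seed)
    se_mle = se_mom = 0.0
    for _ in range(reps):
        xs = [rng.uniform(0, a) for _ in range(n)]
        mle = max(xs)          # maximum likelihood estimate
        mom = 2 * sum(xs) / n  # moment estimate, from E(X) = a/2
        se_mle += (mle - a) ** 2
        se_mom += (mom - a) ** 2
    return se_mle / reps, se_mom / reps

for n in (5, 10, 100, 1000):
    print(n, compare(n=n))
```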
Final Assignments: Due 28.12.
1) Suppose that a scientist wishes to test the hypothesis that at least 20% of the public is allergic to a certain cheese product.
Explain how the scientist could commit the a) type I error. b) type II error.
2) The proportion of adults living in a small town who are college graduates is estimated to be p = 0.4. To test this hypothesis,
a random sample of 20 adults is selected. If the number of college graduates is anywhere from 4 to 12 we shall accept the null
hypothesis that p=0.4; otherwise we shall conclude that p is not equal to 0.4
a) Evaluate α (the probability for the type I error) assuming that p=0.4. Use the binomial distribution.
b) Evaluate β (the probability for the type II error) for the alternatives p=0.3 and p=0.5.
c) Is this a good test procedure?
d)Repeat a),b), c) when n=200 and the acceptance region is defined to be 70 ≤ x ≤ 90. Use the normal approximation.
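For parts a) and b) of problem 2, the error probabilities are binomial sums over the acceptance region. A hedged sketch (`accept_prob` is a name introduced here for illustration):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def accept_prob(n, p, low, high):
    """P(low <= X <= high) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(low, high + 1))

alpha = 1 - accept_prob(20, 0.4, 4, 12)  # type I: reject although p = 0.4
beta_03 = accept_prob(20, 0.3, 4, 12)    # type II: accept although p = 0.3
beta_05 = accept_prob(20, 0.5, 4, 12)    # type II: accept although p = 0.5
print(alpha, beta_03, beta_05)
```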
3) For a new fishing line test the hypothesis that the mean breaking strength is μ = 15 kg against the alternative μ < 15.
n = 49 and assume σ = 2.1 is correct.
The critical region is defined as sample mean < 14.7.
a) Find α.
b) Find β for the alternatives μ = 14.8 and μ = 14.9.
c) Find the critical region for the two-sided alternative HA: μ ≠ 15 for α = 0.1.
d) Compute the P-Value of the test in a) when the sample mean is 14.8.
e) Compute the 95% confidence interval for the mean for the given data.
f) Interpret all results.
4) Assume that the lifetime of light bulbs is approximately normally distributed with a mean of 1800 hours and a standard
deviation of 50 hours. a) Test the hypothesis μ=1800 against the two-sided alternative. A random sample with n=30 had an
average life of 1760 hours. Use a 0.04 level of significance.
b) Compute the power of the test for the case that the true μ = 1760.
5) It is claimed that in the USA an automobile is driven on the average more than 20,000 km per year. In a random sample of
size n=100 we obtained a sample mean of 22,100 km and a sample standard deviation of 4000 km.
a) Use a P-value to make this test. What is your conclusion?
b) Replace the test above by a two-sided one. How does the P-value change? Why?
c) Is it possible to compute the power of this test if you assume that the true μ = 23,000 and the true σ = 5000?
Why?
6) Test the hypothesis that the average weight of boxes is 10 kg if we obtain the following random sample:
9.2 9.7 10.1 10.3 10.1 9.8 9.9 10.4 10.3 9.8
Decide yourself about a sensible level of significance.
What assumption is necessary to make the test?
7) A coin is tossed 40 times resulting in 10 heads. Is this enough evidence to conclude that head occurs less than 50 % of the
time? a) Compute a P-value using the normal approximation.
b) Compute α if you decide to reject when the number of heads is less than or equal to 5.
c) Compute β if the true probability for a head is 0.4.
8) It is assumed that more than 60% of the residents of a certain area favour an annexation suit by a neighbouring city.
To prove this 320 voters were asked and 195 said that they favour the annexation suit.
a) Use a 0.05 level of significance to test if this supports the assumption.
b) Compute the power of the test if the true proportion is 0.65.
9) For a normal population with σ = 20 we test H0 : μ = 200 against HA: μ ≠ 200.
Draw the OC-Curve (operating characteristic: this is a plot of the power of the test against the true value of μ) for n = 10, 100
and 10000 and α = 0.1 and 0.01. To do this compute the power for all three n and both α for all integers μ between 20 and 400.
a) What can we say about the performance of that test (and the difference between statistical and engineering significance) if
we assume that a deviation of μ by less than two is of no practical importance?
b) Is it possible that a sample size is too large?
Prepare a sheet with a print-out of the six curves (in two separate plots, one for α = 0.1 and 0.01), the formula you used for
computing the power (written by hand), and the answers to questions a and b.
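For a two-sided z-test with known σ, the power at a true mean μ can be written as Φ(-z + δ) + Φ(-z - δ), where z is the upper α/2 quantile and δ = (μ - μ0)√n/σ. The sketch below evaluates this formula; the function name `power` and the range of μ values shown are illustrative choices, and plotting is left to the tool of your choice.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def power(mu, mu0=200.0, sigma=20.0, n=10, alpha=0.1):
    """Power of the two-sided z-test of H0: mu = mu0 when the true mean is mu."""
    z = STD.inv_cdf(1 - alpha / 2)
    delta = (mu - mu0) * math.sqrt(n) / sigma
    return STD.cdf(-z + delta) + STD.cdf(-z - delta)

# Points of one curve (n = 100, alpha = 0.1) near mu0
curve = [(mu, power(mu, n=100, alpha=0.1)) for mu in range(190, 211)]
for mu, pw in curve:
    print(mu, round(pw, 4))
```

At μ = μ0 the formula returns α, which is a quick sanity check for any implementation.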
10) A manufacturer claims that the average tensile strength of thread A exceeds the average tensile strength of thread B by at
least 10 kg. To test his claim, 64 pieces of each type of thread are tested under similar conditions. Type A had an average of
86.7 with s = 5.28 while type B had an average of 77.8 with s = 5.61. a) Test the manufacturer's claim using α = 0.05.
b) Are there assumptions we have to make for this test?
11) We want to test H0: μ = 14 against the two-sided alternative with α = 0.05. What sample size is necessary so that the
probability of a type II error is guaranteed to be less than 0.2 when the true population mean differs from 14 by at least 1.5?
(From a preliminary sample we estimate σ = 2.)
Hint: You can use simulation with R or the tables of the book to answer this question.
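Besides simulation or tables, the sample size in question 11 can be approximated directly in R. The sketch below shows two options under stated assumptions: the normal-approximation formula that treats σ = 2 as known, and power.t.test from the stats package for the one-sample t-test. The variable names are illustrative.

```r
# Question 11: beta < 0.2 when |mu - 14| >= 1.5, sigma estimated as 2, alpha = 0.05
delta <- 1.5; sigma <- 2; alpha <- 0.05; beta <- 0.2

# Normal-approximation formula (sigma treated as known)
n_z <- ceiling(((qnorm(1 - alpha / 2) + qnorm(1 - beta)) * sigma / delta)^2)

# Calculation for the one-sample, two-sided t-test (noncentral t distribution)
res <- power.t.test(delta = delta, sd = sigma, sig.level = alpha,
                    power = 1 - beta, type = "one.sample",
                    alternative = "two.sided")
n_t <- ceiling(res$n)

print(c(n_z = n_z, n_t = n_t))
```

The t-based value is slightly larger because σ has to be estimated from the sample.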
12) A taxi company tries to decide if the use of radial tires improves fuel economy. Twelve cars equipped with radial tires were
driven over a test course. Without changing drivers the same cars were equipped with regular belted tires and driven over the
test course again. The gasoline consumption, in km per liter, was recorded as follows:
Car            1    2    3    4    5    6    7    8    9    10   11   12
Radial tires   4.2  4.7  6.6  7.0  6.7  4.5  5.7  6.0  7.4  4.9  6.1  5.2
Belted tires   4.1  4.9  6.2  6.9  6.8  4.4  5.7  5.8  6.9  4.7  6.0  4.9
a) Can we conclude that cars equipped with radial tires give better fuel economy? Use a P-Value.
b) Which assumptions are necessary to make this test?
c) Why is it sensible to consider also the result of the confidence interval?
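Because the same twelve cars are driven with both tire types, question 12 is naturally treated as a paired comparison. A minimal R sketch, assuming a paired one-sided t-test is the intended method:

```r
# Fuel economy in km per liter for cars 1..12 (data from the table of question 12)
radial <- c(4.2, 4.7, 6.6, 7.0, 6.7, 4.5, 5.7, 6.0, 7.4, 4.9, 6.1, 5.2)
belted <- c(4.1, 4.9, 6.2, 6.9, 6.8, 4.4, 5.7, 5.8, 6.9, 4.7, 6.0, 4.9)

# a) paired one-sided t-test; H1: radial tires give more km per liter
res <- t.test(radial, belted, paired = TRUE, alternative = "greater")
print(res$p.value)

# c) two-sided 95% confidence interval for the mean difference
ci <- t.test(radial, belted, paired = TRUE)$conf.int
print(ci)
```

The confidence interval shows how large the improvement could plausibly be, which the P-value alone does not tell us.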
13) To test a new medicine 120 people with disease A were given the medicine. Among them 34 were cured within two days.
Among 280 people who had the same disease but were not given any medicine 56 were cured within two days.
Is there any significant indication that supports the claim of the effectiveness of the medicine?
14) A soft drink dispensing machine is said to be out of control if the variance of the contents exceeds 1.15 deciliters. If a
random sample of 25 drinks from this machine has a variance of 2.03 deciliters, does this indicate at the 0.05 level that the
machine is out of control? Assume that the contents are approximately normally distributed.
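A short R sketch of the chi-square test for a single variance in question 14, assuming the one-sided alternative σ² > 1.15 and normally distributed contents as stated in the exercise. The variable names are illustrative.

```r
# Question 14: H0: sigma^2 = 1.15 against H1: sigma^2 > 1.15, n = 25, s^2 = 2.03
n <- 25; s2 <- 2.03; sigma0_sq <- 1.15; alpha <- 0.05

stat <- (n - 1) * s2 / sigma0_sq             # chi-square statistic with n - 1 df
crit <- qchisq(1 - alpha, df = n - 1)        # upper critical value
p_value <- pchisq(stat, df = n - 1, lower.tail = FALSE)

print(c(stat = stat, crit = crit, p_value = p_value))
```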
15) A study is conducted to compare the time that men and women need to assemble a certain product. Past
experience indicates that the distribution of times for both men and women is approximately normal but the variance of the
times for women is less than that for men. A random sample of 11 men and 14 women produced the following results:
men: n = 11, s = 6.5; women: n = 14, s = 5.5. Test the hypothesis of equal variances against the alternative that the variance of
the times for women is less than that for men. Choose a (sensible) level of significance yourself.
16) A study compares the average time per day used for watching TV in the USA and in Austria.
a) Use the histogram and the normal quantile-quantile plot (R command: qqnorm(x)) to get an impression of whether the data
are normally distributed.
b) Make two box-whisker plots to compare the mean values.
c) Compare the means using the t-test.
d) Calculate the confidence interval for the difference of the means.
Give a short interpretation of your results for steps a), b), c) and d).
TV data: Austria (192 values)
2.2  3.1  2.1  1.4  2.6  1.8  1.6  2.5  1.6  2.2  2.5  1.1
2.2  2.5  1.9  2.1  3.2  1.6  2.3  2.9  2.0  2.0  2.8  2.4
2.2  2.0  2.7  2.0  1.4  2.7  2.9  2.0  2.8  1.7  2.1  2.7
2.8  2.8  2.5  2.5  2.0  1.9  1.8  1.9  1.7  2.9  3.0  3.2
1.8  1.7  2.6  2.3  2.9  3.0  2.6  1.7  1.8  2.8  2.0  1.7
2.3  2.8  1.3  3.0  2.7  2.9  2.2  0.7  2.4  2.7  3.0  1.9
1.7  1.7  2.3  2.9  1.8  1.3  2.8  2.8  2.0  2.4  2.8  2.2
3.2  3.3  3.8  1.9  1.4  2.5  1.5  2.8  2.5  3.3  3.0  2.5
2.3  1.7  2.2  3.0  2.2  2.4  2.3  2.4  1.7  2.3  2.0  2.5
2.4  1.9  3.0  2.1  1.0  2.2  2.5  2.8  2.0  2.5  2.2  2.2
2.2  2.3  1.9  2.7  2.3  2.7  2.4  1.6  2.5  2.9  2.1  3.0
1.5  3.2  1.9  2.2  1.8  1.7  1.9  2.3  1.2  2.4  1.0  2.5
3.2  3.2  2.9  2.1  1.9  3.3  3.6  1.8  2.4  2.7  1.5  1.3
2.2  2.5  2.5  2.2  3.1  0.2  1.5  2.4  2.2  1.9  2.3  2.8
2.4  2.0  1.9  1.7  1.9  2.4  3.2  2.6  2.2  2.3  2.5  1.5
2.8  1.9  2.5  2.5  2.9  1.9  2.6  2.2  1.8  2.4  2.0  2.4
TV data: America (232 values)
3.4  2.3  3.7  1.0  4.0  3.3  2.6  4.9  2.4  3.9  3.9  3.5
1    4.8  2.6  1.7  0.5  1.6  1.3  3.7  2.9  0.1  1.2  1.1
2.7  3.6  3.8  1    4.0  1.5  3.6  3.8  2.5  2.8  2.2  3.9
0.4  3.0  2.3  3.0  0.6  2.4  1.9  0.1  1.1  4.4  4.2  3.2
3.4  3.2  0.4  0.1  0.8  1.2  1.9  2.6  1.5  0.2  0.7  1.9
1.1  3.1  1    0.9  1.1  1.4  2.3  3.4  0.8  2.7  1.4  1.9
0.4  4.5  4.5  2.8  1.7  1.7  4.5  4.9  3.4  1.2  4.1  1.1
0.4  0.1  2.4  4.8  3.1  4.1  2.7  4.2  4.0  1.4  1.9  4.3
2.9  4.4  0.1  1.9  0.2  4.1  1.1  3.2  0.1  3.5  4.5  2.3
1.4  4.5  1.7  3.9  4.3  0.8  2.8  1.2  3.0  2.6  2.4  2.8
0.7  4.8  4.3  2.2  2.7  2.5  3.5  2.9  0.1  2.0  1.9  0.1
0.5  3.1  4.5  3.9  1.2  3.9  4.9  3.4  4.1  1    0.5  2.2
2.0  4.5  3.0  0.5  2.4  2.4  1.9  0.0  4.3  4.3  4.1  0.8
3.9  1.4  2.5  3.6  1.5  0.6  2.4  4.4  0.8  3.0  4.2  4.7
0.4  1.1  3.9  1.3  2.3  1    4.4  0.7  0.5  1.0  2.5  1.5
4.9  3.4  1    1.9  4.3  4.6  3.1  3.2  3.4  1.9  2.4  3.8
3.2  0.3  3.4  0.6  1.2  1.5  5.8  1.9  0.1  0.9  0.8  0.8
0.3  3.4  2.9  4.0  0.4  2.6  3.8  4.3  1.5  3.5  4.5  0.9
4.9  2.2  2.0  3.5  3.8  1.9  2.8  2.3  1.9  2.3  1.5  3.4
1.7  2.8  0.9  1
17) A random sample of 200 married men, all retired, was classified according to education and number of children. Test the
hypothesis that the size of a family is independent of the level of education attained by the father (assume α = 0.05).
              Number of children
Education     0-1    2-3    over 3
Elementary    14     37     32
Secondary     19     42     17
College       12     17     13
18) In a study to estimate the proportion of grown-ups who regularly watch soap operas, it is found that 52 of 200 persons in
Denver, 31 of 150 persons in Phoenix and 41 of 150 in Rochester watch at least one soap opera. Use a 0.05 level of
significance to test the hypothesis that there is no difference among the true proportions of persons watching soap operas.
19) The following numbers of car accidents per day were observed during a period of 106 days in a big city:
0 for 33 days; 1 for 45 days; 2 for 20 days; 3 for 5 days; 4 for 2 days; 7 for 1 day.
Test whether the number of car accidents follows a Poisson distribution (assume α = 0.1).
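A possible R sketch of the chi-square goodness-of-fit test in question 19. It assumes that λ is estimated by the sample mean and that the classes 3, 4 and 7 are pooled into "3 or more" so that every expected count is at least 5. The pooling is a choice made for this sketch, not something prescribed by the exercise.

```r
# Question 19: observed numbers of days with 0, 1, 2, 3, 4 and 7 accidents
accidents <- c(0, 1, 2, 3, 4, 7)
days      <- c(33, 45, 20, 5, 2, 1)

n <- sum(days)                          # 106 days
lambda <- sum(accidents * days) / n     # estimate of the Poisson mean

# pool the classes 3, 4, 7 into "3 or more" (assumption of this sketch)
obs <- c(days[1:3], sum(days[4:6]))
p <- c(dpois(0:2, lambda), ppois(2, lambda, lower.tail = FALSE))
expected <- n * p

stat <- sum((obs - expected)^2 / expected)
df <- length(obs) - 1 - 1               # one parameter (lambda) was estimated
p_value <- pchisq(stat, df, lower.tail = FALSE)

print(c(lambda = lambda, stat = stat, df = df, p_value = p_value))
```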
20) To build a simulation model for the customer flow in a bank office we have to model the service times at counter 1 and
counter 2. We have collected 100 service times of counter 1 (generate them using rst1(100) for the R function
rst1<-function(n){p<-runif(n); 5.2*((p<0.5)*(16+3*rnorm(n))+(p>0.5)*rgamma(n,13))} )
and 1000 service times of counter 2 (generate them using rst2(1000) for the R function
rst2<-function(n){p<-runif(n); 2.9*((p<0.9)*(18+3*rnorm(n))+(p>0.9)*rgamma(n,16))} )
We assume that both service times are gamma distributed. We then have to find estimates for the parameters of the gamma
distribution so that we can start our simulation.
a) Compute the moment estimates for the parameters of the service time distributions at counter 1 and counter 2.
b) Perform the chi-square goodness-of-fit test for the gamma distribution to check whether the gamma assumption is correct.
Use 5 intervals for the smaller sample and 25 intervals for the larger sample; choose the intervals such that they have (at least
approximately) the same probability. (Hint: Don't forget to subtract the number of estimated parameters from the number of
degrees of freedom you use for the test statistic!)
c) Interpret your result of b). What is the problem of the tests, especially for the first sample?
Hint: It is easiest to make the chi-square test by transforming the data with the assumed CDF of the Gamma distribution. Then
you can make the chi-square test for the U(0,1) uniform distribution and can easily divide into the sub-intervals.
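The transformation described in the hint can be sketched in R as follows. The helper gamma_gof is a hypothetical name introduced for this illustration. It assumes the moment estimates shape = mean²/variance and rate = mean/variance, and it uses equal-width intervals on (0,1) after transforming the data with the fitted gamma CDF.

```r
# Generators as given in the exercise
rst1 <- function(n){p <- runif(n); 5.2*((p<0.5)*(16+3*rnorm(n))+(p>0.5)*rgamma(n,13))}
rst2 <- function(n){p <- runif(n); 2.9*((p<0.9)*(18+3*rnorm(n))+(p>0.9)*rgamma(n,16))}

# Hypothetical helper: moment fit plus chi-square test with k equiprobable classes
gamma_gof <- function(x, k) {
  m <- mean(x); v <- var(x)
  shape <- m^2 / v; rate <- m / v               # moment estimates
  u <- pgamma(x, shape = shape, rate = rate)    # roughly U(0,1) if the model fits
  obs <- as.vector(table(cut(u, breaks = seq(0, 1, length.out = k + 1),
                             include.lowest = TRUE)))
  expected <- length(x) / k                     # same expected count in every class
  stat <- sum((obs - expected)^2 / expected)
  df <- k - 1 - 2                               # two estimated parameters
  list(shape = shape, rate = rate, obs = obs, stat = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

set.seed(1)                                     # only for reproducibility
fit1 <- gamma_gof(rst1(100), k = 5)
fit2 <- gamma_gof(rst2(1000), k = 25)
print(c(fit1$p.value, fit2$p.value))
```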
21) We have two independent samples of size n=50 and n=500 from the same distribution (both generated by the R function
ran<-function(n){p<-runif(n); 12+3.3*((p<0.5)*(0.5+rnorm(n)/sqrt(12))+(p>0.5)*runif(n))}).
We want to test whether the unknown distribution could be the normal distribution.
a) Make a histogram and the normal quantile-quantile plot for both samples and interpret the results.
b) Perform the Shapiro-Wilk test of normality for both samples (shapiro.test). Can we reject the hypothesis that the samples
come from a normal distribution? (Assume α = 0.1)
c) Why is it easily possible that we get different results for the two samples, although they come from the same distribution?
d) Use simulation to find the power of the Shapiro-Wilk test of normality for the “ran()” distribution for
n = 20, 50, 100, 200, 500.
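The power simulation in part d) of question 21 could look like the following R sketch. The helper shapiro_power and the number of replications (500) are assumptions of this illustration. The power is estimated as the fraction of simulated samples for which shapiro.test rejects at α = 0.1.

```r
# Generator as given in the exercise
ran <- function(n){p <- runif(n); 12+3.3*((p<0.5)*(0.5+rnorm(n)/sqrt(12))+(p>0.5)*runif(n))}

# Hypothetical helper: estimated rejection rate of the Shapiro-Wilk test
shapiro_power <- function(n, alpha = 0.1, reps = 500) {
  mean(replicate(reps, shapiro.test(ran(n))$p.value < alpha))
}

set.seed(1)                              # only for reproducibility
sizes <- c(20, 50, 100, 200, 500)
pw <- sapply(sizes, shapiro_power)
names(pw) <- paste0("n=", sizes)
print(pw)
```

More replications give a more precise estimate at the cost of a longer run time.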
22) We want to test the hypothesis H0: μ = 1 against the alternative H1: μ < 1.
We know that the parent population is exponentially distributed and the sample size is n = 10. Clearly we should not use the
standard t-test as the normal assumption is not fulfilled. So we can try to find the critical value of the sample mean by simulation:
if the sample mean of a sample is smaller than or equal to that value, H0 is rejected.
a) Compare your result for the critical sample mean with the result that we would obtain when using the t-test (with σ = 1).
b) Is it different? Why?
c) Repeat the experiment with sample size n = 100.
d) Is the critical sample mean closer to the result of the t-test? Why?
e) Use the critical region of a) to estimate the power of this test for μ = 0.5.
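A minimal R sketch of the simulation for question 22. The exercise does not state a significance level, so α = 0.05 is an assumption here. The comparison value crit_t is one possible reading of "the t-test (with σ = 1)", namely the t quantile with the standard deviation set to 1. All variable names are illustrative.

```r
set.seed(1)                        # only for reproducibility
alpha <- 0.05                      # assumed level; not specified in the exercise
n <- 10; reps <- 1e5

# sample means of `reps` exponential samples under H0 (mean 1, i.e. rate 1)
means_h0 <- rowMeans(matrix(rexp(n * reps, rate = 1), ncol = n))
crit_sim <- unname(quantile(means_h0, alpha))   # reject H0 if mean <= crit_sim

# a) critical sample mean from the t quantile with sigma = 1 (one possible reading)
crit_t <- 1 - qt(1 - alpha, df = n - 1) / sqrt(n)

# e) estimated power when the true mean is 0.5 (rate 2)
means_h1 <- rowMeans(matrix(rexp(n * reps, rate = 2), ncol = n))
power_sim <- mean(means_h1 <= crit_sim)

print(c(crit_sim = crit_sim, crit_t = crit_t, power_sim = power_sim))
```

For part c) the same code can be rerun with n <- 100.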
For the PS on 28.12. All students: 3, 16, 20, 21, 22
Other exercises:
Students with an odd student number should especially prepare the odd questions,
students with an even student number especially the even questions.
Of course, for the final you are responsible for all questions.