ASSIGNMENTS IE504 – FALL 2007

PS 1: Probability (due: Monday 1st of October)

1) For married couples living in a certain suburb, the probability that the husband will vote on a bond referendum is 0.25, the probability that his wife will vote in the referendum is 0.32, and the probability that both the husband and wife will vote is 0.15. What is the probability that
   a) at least one member of a married couple will vote?
   b) a wife will vote, given that her husband will vote?
   c) a husband will vote, given that his wife does not vote?
   d) Are the events “the wife will vote” and “the husband will vote” independent?

2) Suppose that the four inspectors at a film factory are supposed to stamp the expiration date on each package of film at the end of the assembly line. John, who stamps 25% of the packages, fails to stamp the expiration date once in every 250 packages; Tom, who stamps 40% of the packages, fails to stamp the expiration date once in every 100 packages; Jeff, who stamps 25% of the packages, fails to stamp the expiration date once in every 90 packages; and Pat, who stamps 10% of the packages, fails to stamp the expiration date once in every 200 packages. If a customer complains that her package of film does not show the expiration date, what is the probability that it was inspected by John?

3) A truth serum has the property that 90% of the guilty suspects are properly judged while, of course, 10% of guilty suspects are improperly found innocent. On the other hand, innocent suspects are misjudged 2% of the time. If the suspect was selected from a group of suspects of which only 5% have ever committed a crime, and the serum indicates that he is guilty, what is the probability that he is innocent?

4) The probability that a patient recovers from a delicate heart operation is 0.85. What is the probability that
   a) exactly 2 of the next 3 patients who have this operation survive?
   b) all of the next 3 patients who have this operation survive?
   c) Which assumption is necessary to solve parts a) and b) of this question?

5) In a certain federal prison it is known that 2/3 of the inmates are under 25 years of age. It is also known that 3/5 of the inmates are male and that 5/8 of the inmates are female or 25 years of age or older. What is the probability that a prisoner selected at random from this prison is female and at least 25 years old?

6) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn in succession, each ball being replaced in the box before the next draw is made. What is the probability that
   a) all 3 are the same colour?
   b) each colour is represented?

7) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn in succession without replacement. What is the probability that the third ball drawn is black?

8) If the events A and B are independent and the events A and C are independent, what can we say about the events B and C? Prove that your assertion is correct.

9) (*) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn with replacement.
   a) Find the probability function f(x) of the random variate X = “number of red balls drawn”.
   b) Calculate F(2) and F(0.17).

10) From a box containing 4 black balls, 3 red balls and 3 green balls, 3 balls are drawn without replacement.
   a) Find the probability function f(x) of the random variate X = “number of red balls drawn”.
   b) Calculate F(2) and F(0.17).

11) For the discrete R.V. X with f(0) = 0.4, f(1) = 0.2, f(2) = f(3) = f(4) = f(5) = 0.1 find
   a) P(1 ≤ X < 6),
   b) the expectation,
   c) the variance.

12) The waiting time, in hours, between successive speeders spotted by a radar unit is a continuous random variable with cumulative distribution
      F(x) = 0               for x ≤ 0
      F(x) = 1 − e^(−3x)     for x > 0
   Find the probability of waiting less than 12 minutes between successive speeders
   a) using the cumulative distribution of X;
   b) using the probability density function of X.
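The two routes asked for in problem 12 can be compared numerically. The following is a minimal sketch, not a model solution: it assumes the rate 3 per hour read off the CDF in the problem, and the helper names (`cdf`, `pdf`, `integrate_midpoint`) are illustrative choices only. The density is integrated with a simple midpoint rule rather than in closed form.

```python
import math

RATE = 3.0  # rate per hour, read off the CDF F(x) = 1 - e^(-3x) in problem 12


def cdf(x: float) -> float:
    """Cumulative distribution F(x) of the waiting time in hours."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0


def pdf(x: float) -> float:
    """Density f(x) = F'(x) = 3 e^(-3x) for x > 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x > 0 else 0.0


def integrate_midpoint(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


twelve_minutes = 12 / 60  # the waiting time is measured in hours
p_from_cdf = cdf(twelve_minutes)
p_from_pdf = integrate_midpoint(pdf, 0.0, twelve_minutes)
print(p_from_cdf, p_from_pdf)
```

Both numbers should agree up to the error of the numerical integration, which is a useful sanity check that the density was differentiated correctly from the CDF.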
13) Consider the density function
      f(x) = kx    for 0 < x < 4
      f(x) = 0     elsewhere
   a) Evaluate k.
   b) Find F(x) and use it to evaluate P(0.3 < X < 0.6).
   c) Find the expectation and the variance of the random variable.

14) A continuous random variable has density
      f(x) = 1 + x   for -1 < x ≤ 0
      f(x) = 0.5     for 0 < x ≤ 1
      f(x) = 0       else
   a) Check that f is a density.
   b) Find the CDF F(x).
   c) Compute the probability P(-0.2 < X ≤ 0.5).
   d) Compute the expectation.

15) Consider the (continuous) random variate X = “time in minutes that it takes a randomly selected student of our class to solve question 10”. Assume that the time is always between 0.5 and 10 minutes and most students work about 3 minutes.
   a) Construct a function that can be used as density for the random variate X.
   b) Calculate its CDF.

16) 3 balls are selected at random from an urn with 3 blue, 2 red and 1 green ball. Let X be the total number of blue balls selected and let Y be the number of red balls selected.
   a) Find the joint probability mass function f(x,y).
   b) Find the marginal distribution of X.
   c) Find the conditional distribution of Y given that X is equal to 1.
   d) Are X and Y independent?
   e) Find: E(X), E(Y), V(Y), E(XY+X^2)
   f) Find: E(Y | X=1), V(Y | X=1) and E(X | Y=0)
   g) Find Cov(X,Y)

17) A continuous random variable has density
      f(x) = 1 + x   for -1 < x ≤ 0
      f(x) = 1       for 0 < x ≤ 0.5
      f(x) = 0       else
   a) Check that f is a density.
   b) Find the CDF F(x).
   c) Compute the expectation and variance of X.
   d) Find the conditional density of X: f(x | X < -0.5)
   e) Find the conditional expectation of X: E(X | X < -0.5)

18) Let X be a random variable with the following probability distribution:
      x      -3     6     9
      f(x)   1/6   1/3   1/2
   Find E(X) and E(X^2) and then, using these values, evaluate E[(2X+1)^2].

19) If X and Y are independent random variables with expectations μx = 1 and μy = 2 and variances σx² = 5 and σy² = 3, find the expectation and variance of the random variable Z = -2X + 4Y - 3.

20) Repeat question 19 for the case that X and Y are not independent and σxy = 1.
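For problems 19 and 20, the rules E(aX + bY + c) = a·E(X) + b·E(Y) + c and Var(aX + bY + c) = a²·Var(X) + b²·Var(Y) + 2ab·Cov(X,Y) can be written as a small helper. This is only a sketch under two stated assumptions: that the last problem above refers back to problem 19, and that the garbled “xy = 1” denotes the covariance σxy = 1. The function name `linear_combination_moments` is hypothetical.

```python
def linear_combination_moments(a, b, c, mu_x, mu_y, var_x, var_y, cov_xy=0.0):
    """Mean and variance of Z = a*X + b*Y + c.

    cov_xy defaults to 0, which covers the independent case.
    """
    mean = a * mu_x + b * mu_y + c
    variance = a * a * var_x + b * b * var_y + 2 * a * b * cov_xy
    return mean, variance


# Z = -2X + 4Y - 3 with the moments given in problem 19
independent = linear_combination_moments(-2, 4, -3, 1, 2, 5, 3)
# Same Z, assuming the dependent case means Cov(X, Y) = 1
correlated = linear_combination_moments(-2, 4, -3, 1, 2, 5, 3, cov_xy=1)
print(independent, correlated)
```

The additive constant c shifts the mean but never enters the variance, and the covariance term carries the sign of a·b, so a positive covariance can lower the variance when the coefficients have opposite signs.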
21) Suppose that X and Y have the following joint probability function:
      f(x,y)    x = 2    x = 6
      y = 1     0.1      0.15
      y = 3     0.25     0.25
      y = 5     0.1      0.15
   a) Find the covariance of X and Y.
   b) Find the expected value of g(X,Y) = XY^2.
   c) Find μx and μy.
   d) Find the correlation of X and Y.
   e) Find the conditional distribution of Y: f(y | X=6)
   f) Compute E(Y | X = 6) and E(X | Y = 3).
   g) Are X and Y independent?

22) Out of an urn with 5 blue, 3 red and 2 white balls you are sampling two balls with replacement. Let X denote the number of blue, Y the number of red and Z the number of white balls.
   a) Find the marginal distribution of Y.
   b) Find the joint distribution of X and Y.
   c) Find the conditional distribution: f(X | Y=1)
   d) Find the conditional expectation E(X | Y=1)

Due date for questions 23 to 42: PS Friday 19.10.

23) It is known that 30% of mice inoculated with a serum are protected from a certain disease. If 4 mice are inoculated, find the probability that
   a) none contracts the disease,
   b) fewer than two contract the disease,
   c) more than 3 contract the disease.

24) According to a genetics theory, a certain cross of guinea pigs will result in red, black and white offspring in the ratio 8:4:4. Find the probability that among 8 offspring 4 will be red, 2 black, and 2 white.

25) From a lot of 10 missiles, 4 are selected at random and fired. If the lot contains 4 defective missiles that will not fire, what is the probability that among the 4 selected
   a) all 4 will fire?
   b) at most 2 will not fire?

26) A scientist inoculates several mice, one at a time, with a disease germ until he finds 2 that have contracted the disease. If the probability of contracting is 1/6,
   a) what is the probability that 8 mice are required?
   b) what are the expected value and the standard deviation of the number of required mice?

27) Service calls come to a maintenance center according to a Poisson process, on the average 180 calls per hour. Find the probability that
   a) no more than 4 calls come in any minute.
   b) fewer than 2 calls come in any minute.
   c) fewer than 4 calls come in a 5-minute period.

28) In the November 1990 issue of Chemical Engineering Progress a study discussed the percent purity of oxygen from a certain supplier. Assume that the mean was 99.63 with a standard deviation of 0.08. Assume that the distribution of percent purity was approximately normal.
   a) What percentage of the purity values would you expect to be between 99.6 and 99.7?
   b) What percentage of the purity values is more than 0.1 away from the mean?
   c) What purity value would you expect to exceed exactly 7% of the population?
   d) What purity value is exceeded by exactly 3% of the population?

29) The weights of a large number of miniature poodles are approximately normally distributed with a mean of 9 kilograms and a standard deviation of 0.7 kilogram. Find the fraction of these poodles with weights
   a) over 9.5 kilograms;
   b) at most 8.6 kilograms;
   c) between 7.3 and 9.1 kilograms inclusive;
   d) of 9 kg.

30) A bus arrives every 13 minutes at a bus stop. It is assumed that the waiting time for a particular individual is a random variable with a uniform distribution.
   a) What is the probability that the individual waits more than 7 minutes?
   b) What is the probability that the individual waits between 2 and 7 minutes?

31) Statistics released by the National Highway Traffic Safety Administration and the National Safety Council show that on an average weekend night, 1 out of every 20 drivers on the road is drunk. If 1500 drivers are randomly checked next Saturday night, what is the probability that the number of drunk drivers will be at least 70 but less than 94?

32) In a biomedical research activity it was determined that the survival time in weeks of an animal when subjected to a certain exposure of gamma radiation has a gamma distribution with α = 5 and β = 10.
   a) What is the mean survival time of a randomly selected animal of the type used in the experiment?
   b) What is the standard deviation of survival time?
   c) What is the probability that an animal survives more than 30 weeks?

33) We assume that the lifetime of an electric bulb follows an exponential distribution with mean value 5000 hours. Find the lifetimes that are exceeded with probabilities 50% and 5%.

34) In one hour 5000 cars pass a certain filling station on a highway. We assume that a single driver decides (independently of the others) to enter the station with a probability of 0.02.
   a) Compute the probability that fewer than 77 cars enter the station.
   b) Compute the probability that between 90 and 110 cars enter the station.

35) Among 2000 electric devices 200 are defective. For quality control reasons 100 randomly chosen pieces are tested.
   a) Compute the probability that fewer than 4 of the selected 100 pieces are defective.
   b) Compute the probability that between 15 and 20 pieces are defective.

36) A bookshop is selling a certain monthly journal for $5 per piece and buys it from a publishing house at $3 per piece. Let us assume that the number of sold journals per month is Poisson distributed with expectation 120. As the bookshop cannot hand back journals that were not sold, it has to decide about the number of journals that are ordered every month.
   a) Compute the expectation and the variance of the money the bookshop gains when ordering 100 journals.
   b) Compute the expectation and the variance of the money the bookshop gains when ordering 120 journals.
   c) Try to find the number of ordered journals that maximises the expected gain. Compute the expectation and variance of the gain for that number of orders.
   d) Comment on the interpretation of the variance in this example. What will the manager of the bookshop try to do if he does not want to take too much risk?

37) The lifetime in weeks of a certain type of transistor is known to follow a gamma distribution with mean 10 weeks and standard deviation √50 weeks.
   a) What is the probability that the transistor will last at most 50 weeks?
   b) What is the probability that the transistor will not survive the first 10 weeks?

38) The life of a certain type of device has an advertised failure rate of 0.01 per hour. The failure rate is constant and the exponential distribution applies.
   a) What is the mean time to failure?
   b) What is the probability that 200 hours will pass before a failure is observed?

39) We are given a Poisson process with rate λ = 0.1 per hour.
   a) What is the distribution and the density of the waiting time till the first event occurs?
   b) What is the distribution and the density of the waiting time till the 3rd event occurs?

40) Prove that the sum of three independent exponential random variates (all have mean 1) is gamma-distributed with parameters α = 3 and β = 1.

41) a) Compute the mean and the variance of the exponential distribution with λ = 1.
    b) Use the result of a) and the result of exercise 40) to calculate the mean and the variance of the Gamma distribution with α = 3 and β = 1.

42) We consider a “discrete random walk with drift” defined by X_0 = 0 and X_(i+1) = X_i + B_(i+1), where all B_i are independent and have the pmf f(1) = p; f(0) = (1-p).
   a) Find the expectation and the variance of X_1, X_2 and X_3.
   b) Find E(X_2 | X_1 = 1) and V(X_5 | X_4 = 1).
   c) Find a general formula for E(X_(i+2) | X_i = 2).

For Questions 43 to 58: Due Friday, 2nd of November

43) Let X be a binomial random variable with n = 3 and p = 1/2. Find the probability distribution of the random variable Y = X^2.

44) Let X have a continuous uniform distribution between 0 and 1. Show that the random variable Y = -2 ln X has a Gamma distribution. Find the parameters of the Gamma distribution.

45) A dealer’s profit, in units of $1000, on a new automobile is given by Y = X^(1/2), where X is a random variable having the density function
      f(x) = 2(1-x)   for 0 < x < 1
      f(x) = 0        elsewhere.
   a) Find the probability density function of Y.
   b) Using the density of Y, find the probability that the profit will be more than $500 on the next new automobile sold.

46) Let X be a random variable with probability distribution
      f(x) = (1+x)/2   for -1 < x < 1
      f(x) = 0         elsewhere.
   Find the probability distribution of the random variable Y = X^2.

47) The random variable X has density
      f(x) = 2 – 2x   for 0 < x < 1
      f(x) = 0        else
   a) Compute the density of Y = α + β X for arbitrary α and β > 0.
   b) Compute mean and variance for Y.
   c) Make plots of the density for different choices of α and β. Explain why α is called the location parameter and β the scale parameter.

48) A random variable X has the discrete uniform distribution f(x) = 1/k for x = 1,2,3,...,k and 0 elsewhere. Show that the moment-generating function of X is
      Mx(t) = e^t (1 - e^(kt)) / (k (1 - e^t)).

49) A random variable X has the geometric distribution g(x;p) = p q^(x-1) for x = 1,2,3,...
   a) Show that the moment-generating function of X is Mx(t) = p e^t / (1 - q e^t).
   b) Use Mx(t) to find the mean and variance of the geometric distribution.

50) A random variable X has the Poisson distribution p(x;μ) = e^(-μ) μ^x / x! for x = 0,1,2,...
   a) Show that the moment-generating function of X is Mx(t) = e^(μ(e^t - 1)).
   b) Using Mx(t), find the mean and variance of the Poisson distribution.

51) Use the result of 50) to prove that the sum of two independent Poisson random variables is again Poisson distributed.

52) X ~ N(20; σ = 3) and Y ~ N(15; σ = 2), X and Y independent: Compute the probability P(Y > X).

53) X ~ N(12; σ = 3): Compute the probability that the sum of 10 independent realisations of X is bigger than 150.

54) X ~ N(20; σ = 3) and Y ~ N(15; σ = 2), X and Y independent:
   a) Compute the probability P(2X + 3Y > 80).
   b) Compare the probabilities of 2X > 50 and X1 + X2 > 50, where X1 and X2 are two independent realisations of X.

55) Assume that the random variables X and Y describe the distribution of the price of two different stocks a month in the future. We assume that X ~ N(17; σ = 3) and Y ~ N(12; σ = 2).
   a) If X and Y are independent, compute the probability that X+Y is smaller than 25.
   b) If X and Y are joint normal and have ρXY = 0.5, compute the probability that X+Y < 25. Hint: Remember that Cov(X,Y) = ρXY σX σY.
   c) For X, Y joint normal and ρXY = 0.5 we consider the random variate S = 2X + 3Y. (S is the value of a portfolio with two stocks of company X and three of company Y.) Compute the value that S exceeds with probability 99%.
   Remark: c) could be seen as a “worst case analysis” for the value of the portfolio and is linked to the “value at risk” concept.

56) If X1, X2,..., Xn are independent random variables having identical exponential distributions with parameter θ, show that the density function of the random variable Y = X1 + X2 + ... + Xn is that of a gamma distribution with parameters α = n and β = θ.

57) A continuous random variate has the CDF F(x). Find the distribution of the random variate Y = F(X).

58) Find the moment generating function of the Gamma distribution.

Due date: Friday 9th of November:

59) E(X) = 50; Var(X) = 108; you take a sample of size 50.
   a) Find the probability X-bar > 52 using the assumption that X is normal.
   b) Use simulation to find the result of a).

60) X ~ U(30,70) (uniform distribution between 30 and 70); you take a sample of size 50.
   a) Find the probability X-bar > 52 using the assumption that X is normal.
   b) Use simulation to find the exact probability for X-bar > 52. Compare the results of a) and b).

Due date: Friday 16.11.

61) A soft-drink machine is being regulated so that the amount of drink dispensed averages 240 milliliters with a standard deviation of 16 milliliters. Periodically, the machine is checked by taking a sample of 144 drinks and computing the average content. The company official found the mean of 144 drinks to be 236 milliliters and concluded that the machine needed no adjustment. Was this a reasonable decision?
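One possible way to set up the simulation asked for in problem 60 is sketched below. The sheet suggests R for simulation work elsewhere; this sketch uses Python's standard library instead. The function names, the number of replications and the seed are illustrative assumptions, not part of the assignment.

```python
import random
from statistics import NormalDist


def normal_approx_tail(mean, variance, n, threshold):
    """P(X-bar > threshold), treating X-bar as normal with sd sqrt(variance / n)."""
    standard_error = (variance / n) ** 0.5
    return 1.0 - NormalDist(mean, standard_error).cdf(threshold)


def simulated_tail(n, threshold, reps=20_000, seed=1):
    """Monte Carlo estimate of P(X-bar > threshold) for X ~ U(30, 70)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.uniform(30, 70) for _ in range(n)) / n
        if xbar > threshold:
            hits += 1
    return hits / reps


uniform_mean = (30 + 70) / 2          # mean of U(30, 70)
uniform_var = (70 - 30) ** 2 / 12     # variance of U(30, 70)
approx = normal_approx_tail(uniform_mean, uniform_var, 50, 52)
estimate = simulated_tail(50, 52)
print(approx, estimate)
```

The simulated value is itself random, so it only matches the normal approximation up to Monte Carlo error; increasing `reps` narrows that gap. Problem 59 can reuse `normal_approx_tail` with mean 50 and variance 108.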
62) The amount of time that a drive-through bank teller spends on a customer is a random variable with a mean μ = 3.2 minutes and a standard deviation σ = 3 minutes. If a random sample of 81 customers is observed, find the probability that their mean time at the teller’s counter is more than 3.5 minutes.

63) The mean score for freshmen on an aptitude test, at a certain college, is 540, with a standard deviation of 50. What is the probability that two groups of students selected at random, consisting of 32 and 50 students, respectively, will differ in their mean scores by an amount between 5 and 10 points? Assume the means to be measured to any degree of accuracy.

64) Find the probability that a random sample of 25 observations, from a normal population with variance σ² = 6, will have a variance s²
   a) greater than 9.1,
   b) between 3.462 and 10.745.

65) Check your result of 64 a) and 64 b) using simulation.

66) A normal population with unknown variance has a mean of 21. Is one likely to obtain a random sample of size 16 from this population with a mean of 24 and a standard deviation of 4.1? If not, what conclusion would you draw?

67) Two normal variates have the same variance. Calculate the probability that for two independent samples of size 100 the ratio of the two variances is
   a) smaller than 0.9,
   b) larger than 1.2.

68) Check your result of 64 a) and 64 b) using simulation.

Due Friday 23.11.

69) From a random sample of size n = 120 we calculate an x-bar value of 27 and a sample variance of 3.
   a) Find a 95% confidence interval for the unknown mean of the parent population.
   b) What assumptions are necessary to obtain the above result?

70) From a random sample of size n = 12 we calculate an x-bar value of 27 and a sample variance of 3.
   a) Find a 95% confidence interval for the unknown mean of the parent population.
   b) What assumptions are necessary to calculate the CI?
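The simulation check requested for problem 64 could be organised along the following lines. This is a rough sketch using only Python's standard library; the replication count and seed are arbitrary assumptions. Since (n-1)s²/σ² follows a chi-square distribution with 24 degrees of freedom here, the chi-square table values 36.415, 13.848 and 42.980 suggest the two estimates should land near 0.05 and 0.94.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance s^2 with divisor n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate_variance_probs(n=25, sigma2=6.0, reps=20_000, seed=1):
    """Estimate P(s^2 > 9.1) and P(3.462 < s^2 < 10.745) by simulation."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    above = 0
    between = 0
    for _ in range(reps):
        # The population mean does not affect s^2, so 0 is used for convenience.
        s2 = sample_variance([rng.gauss(0.0, sigma) for _ in range(n)])
        above += s2 > 9.1
        between += 3.462 < s2 < 10.745
    return above / reps, between / reps


p_above, p_between = simulate_variance_probs()
print(p_above, p_between)
```

Because the estimates are random, a rerun with a different seed will move them slightly; the comparison with the table-based answer is only meaningful within that Monte Carlo error.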
71) A random sample of 100 car owners shows that in Virginia a car is driven on the average 20,500 km per year, with a standard deviation of 4000 km.
   a) Construct a 99% confidence interval.
   b) Explain the result of a) in a sentence.
   c) What can we say about the possible size of error, if we estimate the mean kilometers per year as 20,500?
   d) A friend of yours, who lives in Virginia, says: “I drive more than 30,000 km every year.” Is this statement a contradiction to your result of a)? Why?

72) a) Referring to exercise 71), construct a 99% tolerance interval for the kilometers travelled annually by cars in Virginia.
    b) Explain the result of a) in a sentence.

73) A taxi company is trying to decide whether to buy brand A or brand B tires for its cars. A random experiment is conducted using 12 of each brand. The number of kilometers is recorded till the tires wear out. For brand A we obtain: sample mean 36,300, sample standard deviation 3500. For brand B we obtain: sample mean 38,100, sample standard deviation 5000.
   a) Compute a 99% confidence interval for the difference of the two means. (Do not assume equal variance.)
   b) Which assumptions are necessary?

74) In a different experiment, for 8 taxis a brand A and a brand B tire are randomly assigned to the rear wheels. The results are
      Taxi   Brand A   Brand B
      1      36,000    36,200
      2      45,500    46,800
      3      36,700    37,700
      4      32,000    31,100
      5      48,400    47,800
      6      32,800    36,400
      7      38,100    38,900
      8      30,100    31,500
   a) Find a 95% confidence interval for the difference of the two means.
   b) Which assumptions are necessary?
   c) What can the company learn from the result? What should the manager do?
   d) In general: Which form of the experiment (that of number 73 or 74) do you think is better? Why?

75) In a random sample of 1000 homes in a certain city, it is found that 228 are heated by oil.
   a) Find the 95% confidence interval for the proportion of homes in this city heated by oil.
b) What sample size is necessary to obtain a CI that is not longer than 0.01 if we assume that the true porportion is about 0.23. c) What sample size is necessary to obtain a CI that is not longer than 0.01 if we make no assumptions about the proportion. 76) The paragraph entitled “data76” below contains a sample from a normal population. a) Use these observations to compute the 95% Confidence Intervals for the mean of the population b) Compute a 99 % CI for the variance. 77) The paragraphs entitled “data77a” and “data77b” below contain two independent samples of the same size of two different populations A and B. a) Compute a 99% CI for the difference between the means of the two populations. b) Compute the differences between the two observations and compute a 99% CI for the mean of the differences. c) Compare the result of a) and b). data76: 7.419259 7.056495 9.375264 7.760131 7.599742 6.747239 7.295754 8.240287 7.926233 8.211881 7.637546 8.284557 8.091479 8.671331 6.952709 8.463596 7.488284 7.967883 8.540629 6.879072 8.138085 8.044871 7.895711 8.145631 7.744304 6.848380 8.365549 7.457207 8.143013 8.687392 7.205051 9.165117 7.916291 8.166786 7.767019 8.062623 7.205041 7.970955 7.938611 7.362170 8.665030 7.640752 8.736192 8.688080 6.115616 6.962965 7.805392 8.712742 6.736503 8.334070 8.197774 8.184240 8.428495 8.432457 8.116991 8.892680 9.610202 8.505759 8.479083 8.052199 6.965921 5.797876 8.527289 8.254530 8.446963 8.365858 7.341783 7.263865 6.854382 8.681943 8.253781 7.560372 7.367192 7.615177 8.363107 7.889474 7.721277 8.298355 8.137394 7.662689 7.564097 7.880886 8.160999 8.567554 7.001595 8.466932 5.681707 8.151236 8.817823 8.856159 8.560697 6.819934 7.596765 8.372532 8.062102 9.045914 7.763282 5.941443 8.508007 9.438167 7.877058 6.771858 8.131142 8.279876 8.025672 7.973712 8.689226 8.369996 6.662587 8.107453 8.304959 9.573606 6.701205 6.763532 7.609786 8.288504 8.285351 8.708177 8.072470 7.466076 7.334970 9.636361 8.485808 8.721086 8.483315 7.551206 8.063334 
7.183934 7.893252 7.929290 8.334306 6.588512 7.800965 8.765859 7.806810 8.928453 7.566532 6.974015 7.168236 7.836283 8.919116 8.427228 8.385250 7.509284 7.121280 7.945396 6.782600 8.473952 9.092518 8.948584 7.402012 8.706531 8.569651 8.267758 8.728151 7.133510 8.413820 9.107859 6.637942 6.382142 8.068400 8.610322 8.460647 7.101550 9.099680 8.333331 8.213117 8.537296 7.837991 6.976908 5.802054 8.415503 8.809155 8.203243 6.919044 8.056109 7.108550 8.283236 8.185295 7.208631 7.590133 8.546314 7.751276 7.978944 8.169909 8.486126 8.654768 7.963543 7.841224 6.944446 7.486700 8.129097 Data77a 52.333690 55.166465 51.594258 50.419944 53.461261 54.904844 52.508731 49.911816 53.056366 53.873748 52.353986 53.083078 52.474261 52.759394 50.619011 53.010069 50.315980 52.566540 52.070468 53.585422 51.987218 51.849122 49.660424 53.245706 51.733168 52.106701 53.442978 54.220383 50.709237 52.036490 51.590022 53.818854 52.045158 52.421500 53.577453 53.830395 52.957456 51.632753 50.317643 52.994342 53.078327 52.127840 50.402423 53.380535 52.755852 53.264364 52.485122 53.239884 54.086508 50.807777 53.768739 52.009798 53.861452 53.730694 53.978723 51.542171 49.903385 51.338110 53.398497 51.788162 52.365106 55.222912 52.105629 53.427481 51.721977 52.440816 52.955336 50.435484 54.222694 53.564081 51.775821 51.117915 53.133485 Data77b 55.418937 54.036567 56.321394 57.087059 57.053921 56.690640 55.521681 53.998709 51.782508 55.944760 55.075649 54.881766 54.143390 55.621585 55.796658 53.625294 55.136540 52.590661 53.206185 53.922468 54.458077 56.050492 52.958139 52.316697 54.724639 56.323157 54.983971 53.656482 54.615007 56.106793 55.138095 54.393010 54.773609 54.686687 56.138678 55.206364 55.618597 54.228966 54.343576 53.622869 56.872121 55.432419 54.744826 52.100718 54.166574 53.099402 55.248392 55.206078 54.580442 54.384710 54.784792 54.773504 55.055130 54.267374 52.283858 52.361292 55.215105 54.997249 55.189243 55.997997 56.002856 56.257825 57.991198 54.309545 54.971946 53.393864 
54.759862 55.943181 55.011442 53.659240 53.825036 55.794084 54.401110

Due Mo 26.11.

78) For the estimate for the mean value: mu-hat = “average of the first n-5 observations”. Is the estimate unbiased? Is it asymptotically unbiased? Is it consistent? Prove your assertions.

79) For the estimate for the mean value: mu-hat = “average of the first 5 observations”. Is the estimate unbiased? Is it asymptotically unbiased? Is it consistent? Prove your assertions.

80) Which estimate is better, 78) or 79)? Prove why.

81) For the estimate for the mean value: mu-hat = xbar * (n-1)/n. Is the estimate unbiased? Is it asymptotically unbiased? Is it consistent? Prove your assertions.

82) Show that the sample mean is an unbiased estimator for the mean μ of the parent population.

83) Suppose that there are n trials from a Bernoulli process with parameter p, the probability of success. Work out the maximum likelihood estimator for the parameter p.

84) Consider the log-normal distribution. Develop the maximum likelihood estimators for μ and σ².

85) Consider observations from the gamma distribution. Write out the likelihood function and the set of equations which, when numerically solved, give the maximum likelihood estimators for α and β.

86) Repeat 83) with the moment estimate.

87) Repeat 84) with the moment estimate.

88) a) Repeat 85) with the moment estimate.
    c) For the Gamma distribution: What is the advantage of the moment estimator? What is the advantage of the maximum likelihood estimator?

89) a) Find the MLE estimate for the parameter a for a uniform distribution on (0,a).
    b) Find the moment estimate.
    c) Which one is better? Use simulation with R to answer that question. Try a = 1 and n = 5, 10, 100, 1000.
    d) Think of a sample of size n = 3 where it is clear that the moment estimate is not good.

Final Assignments: Due 28.12.

1) Suppose that a scientist wishes to test the hypothesis that at least 20% of the public is allergic to a certain cheese product. Explain how the scientist could commit the
   a) type I error.
   b) type II error.

2) The proportion of adults living in a small town who are college graduates is estimated to be p = 0.4. To test this hypothesis, a random sample of 20 adults is selected. If the number of college graduates is anywhere from 4 to 12 we shall accept the null hypothesis that p = 0.4; otherwise we shall conclude that p is not equal to 0.4.
   a) Evaluate α (the probability for the type I error) assuming that p = 0.4. Use the binomial distribution.
   b) Evaluate β (the probability for the type II error) for the alternatives p = 0.3 and p = 0.5.
   c) Is this a good test procedure?
   d) Repeat a), b), c) when n = 200 and the acceptance region is defined to be 70 ≤ x ≤ 90. Use the normal approximation.

3) For a new fishing line test the hypothesis that the mean breaking strength is μ = 15 kg against the alternative μ < 15. n = 49 and assume σ = 2.1 is correct. The critical region is defined as sample mean < 14.7.
   a) Find α.
   b) Find β for the alternatives μ = 14.8 and μ = 14.9.
   c) Find the critical region for the two-sided alternative HA: μ ≠ 15 for α = 0.1.
   d) Compute the P-value of the test in a) when the sample mean is 14.8.
   e) Compute the 95% confidence interval for the mean for the given data.
   f) Interpret all results.

4) Assume that the lifetime of light bulbs is approximately normally distributed with a mean of 1800 hours and a standard deviation of 50 hours.
   a) Test the hypothesis μ = 1800 against the two-sided alternative. A random sample with n = 30 had an average life of 1760 hours. Use a 0.04 level of significance.
   b) Compute the power of the test for the case that the true μ = 1760.

5) It is claimed that in the USA an automobile is driven on the average more than 20,000 km per year. In a random sample of size n = 100 we obtained a sample mean of 22,100 km and a sample standard deviation of 4000 km.
   a) Use a P-value to make this test. What is your conclusion?
   b) Replace the test above by a two-sided one. How is the P-value changed? Why?
   c) Is it possible to compute the power of this test if you make the assumption that the true μ = 23,000 and the true σ = 5000? Why?

6) Test the hypothesis that the average weight of boxes is 10 kg if we obtain the following random sample:
      9.2 9.7 10.1 10.3 10.1 9.8 9.9 10.4 10.3 9.8
   Decide yourself about a sensible level of significance. What assumption is necessary to make the test?

7) A coin is tossed 40 times resulting in 10 heads. Is this enough evidence to conclude that heads occurs less than 50% of the time?
   a) Compute a P-value using the normal approximation.
   b) Compute α if you decide to reject when the number of heads is smaller than or equal to 5.
   c) Compute β if the true probability for a head is 0.4.

8) It is assumed that more than 60% of the residents of a certain area favour an annexation suit by a neighbouring city. To prove this, 320 voters were asked and 195 said that they favour the annexation suit.
   a) Use a 0.05 level of significance to test if this supports the assumption.
   b) Compute the power of the test if the true proportion is 0.65.

9) For a normal population with σ = 20 we test H0: μ = 200 against HA: μ ≠ 200. Draw the OC-Curve (operating characteristic: this is a plot of the power of the test against the true value of μ) for n = 10, 100 and 10000 and α = 0.1 and 0.01. To do this, compute the power for all three n and both α for all integers μ between 20 and 400.
   a) What can we say about the performance of that test (and the difference between statistical and engineering significance) if we assume that a deviation of μ by less than two is of no practical importance?
   b) Is it possible that a sample size is too large?
   Prepare a sheet with a print-out of the six curves (in two separate plots, one for α = 0.1 and one for α = 0.01), the formula you used for computing the power (written by hand), and the answers to questions a and b.
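The power values needed for the curves in problem 9 could be tabulated as sketched here. The sketch assumes the usual two-sided z-test with known σ, for which power(μ) = Φ(-z + δ) + Φ(-z - δ) with z the upper α/2 quantile and δ = (μ - μ0)·√n/σ. The function names are illustrative, and the plotting itself is left out.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided_z(mu, mu0=200.0, sigma=20.0, n=10, alpha=0.1):
    """Power of the two-sided z-test of H0: mu = mu0 when the true mean is mu."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = (mu - mu0) * n ** 0.5 / sigma
    return STD_NORMAL.cdf(-z_crit + shift) + STD_NORMAL.cdf(-z_crit - shift)


def power_table(n, alpha, mus=range(20, 401)):
    """List of (mu, power) pairs for all integer mu from 20 to 400."""
    return [(mu, power_two_sided_z(mu, n=n, alpha=alpha)) for mu in mus]


for n in (10, 100, 10000):
    for alpha in (0.1, 0.01):
        curve = dict(power_table(n, alpha))
        # Power at a deviation of two units on either side of mu0 = 200
        print(n, alpha, round(curve[198], 4), round(curve[202], 4))
```

The printed values at μ = 198 and μ = 202 are the ones relevant to part a): they show how quickly a deviation of two units becomes detectable as n grows.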
10) A manufacturer claims that the average tensile strength of thread A exceeds the average tensile strength of thread B by at least 10 kg. To test his claim, 64 pieces of each type of thread are tested under similar conditions. Type A had an average of 86.7 with s = 5.28 while type B had an average of 77.8 with s = 5.61.
   a) Test the manufacturer's claim using α = 0.05.
   b) Are there assumptions we have to make for this test?

11) We want to test H0: μ = 14 against the two-sided alternative with α = 0.05. What sample size is necessary so that the probability of a type II error is guaranteed to be less than 0.2 when the true population mean differs from 14 by at least 1.5? (From a preliminary sample we estimate σ = 2.) Hint: You can use simulation with R or the tables of the book to answer this question.

12) A taxi company tries to decide if the use of radial tires improves fuel economy. Twelve cars equipped with radial tires were driven over a test course. Without changing drivers, the same cars were equipped with regular belted tires and driven over the test course again. The gasoline consumption, in km per liter, was recorded as follows:
      Car            1    2    3    4    5    6    7    8    9    10   11   12
      Radial tires   4.2  4.7  6.6  7.0  6.7  4.5  5.7  6.0  7.4  4.9  6.1  5.2
      Belted tires   4.1  4.9  6.2  6.9  6.8  4.4  5.7  5.8  6.9  4.7  6.0  4.9
   a) Can we conclude that cars equipped with radial tires give better fuel economy? Use a P-value.
   b) Which assumptions are necessary to make this test?
   c) Why is it sensible to consider also the result of the confidence interval?

13) To test a new medicine, 120 people with disease A were given the medicine. Among them 34 were cured within two days. Among 280 people who had the same disease but were not given any medicine, 56 were cured within two days. Is there any significant indication that supports the claim of the effectiveness of the medicine?

14) A soft drink dispensing machine is said to be out of control if the variance of the contents exceeds 1.15 deciliters.
If a random sample of 25 drinks from this machine has a variance of 2.03 deciliters, does this indicate at the 0.05 level that the machine is out of control? Assume that the contents are approximately normally distributed.
15) A study is conducted to compare the length of time it takes men and women to assemble a certain product. Past experience indicates that the distribution of times for both men and women is approximately normal, but the variance of the times for women is less than that for men. Random samples of 11 men and 14 women produced the following results:
men: n = 11, s = 6.5
women: n = 14, s = 5.5
Test the hypothesis of equal variances against the alternative that the variance of the times for women is less than that for men. Decide yourself about a (sensible) level of significance.
16) A study compares the average time per day spent watching TV in the USA and in Austria.
a) Use the histogram and the normal quantile-quantile plot (R command: qqnorm(x)) to get an impression of whether the data are normally distributed.
b) Make two box-whisker plots to compare the mean values.
c) Compare the means using the t-test.
d) Calculate the confidence interval for the difference of the means.
Give a short interpretation of your result for steps a), b), c) and d).
TV data, Austria:
2.2 3.1 2.1 1.4 2.6 1.8 1.6 2.5 1.6 2.2 2.5 1.1 2.2 2.5 1.9 2.1 3.2 1.6 2.3 2.9 2.0 2.0 2.8 2.4 2.2 2.0 2.7 2.0 1.4 2.7 2.9 2.0 2.8 1.7 2.1 2.7 2.8 2.8 2.5 2.5 2.0 1.9 1.8 1.9 1.7 2.9 3.0 3.2 1.8 1.7 2.6 2.3 2.9 3.0 2.6 1.7 1.8 2.8 2.0 1.7 2.3 2.8 1.3 3.0 2.7 2.9 2.2 0.7 2.4 2.7 3.0 1.9 1.7 1.7 2.3 2.9 1.8 1.3 2.8 2.8 2.0 2.4 2.8 2.2 3.2 3.3 3.8 1.9 1.4 2.5 1.5 2.8 2.5 3.3 3.0 2.5 2.3 1.7 2.2 3.0 2.2 2.4 2.3 2.4 1.7 2.3 2.0 2.5 2.4 1.9 3.0 2.1 1.0 2.2 2.5 2.8 2.0 2.5 2.2 2.2 2.2 2.3 1.9 2.7 2.3 2.7 2.4 1.6 2.5 2.9 2.1 3.0 1.5 3.2 1.9 2.2 1.8 1.7 1.9 2.3 1.2 2.4 1.0 2.5 3.2 3.2 2.9 2.1 1.9 3.3 3.6 1.8 2.4 2.7 1.5 1.3 2.2 2.5 2.5 2.2 3.1 0.2 1.5 2.4 2.2 1.9 2.3 2.8 2.4 2.0 1.9 1.7 1.9 2.4 3.2 2.6 2.2 2.3 2.5 1.5 2.8 1.9 2.5 2.5 2.9 1.9 2.6 2.2 1.8 2.4 2.0 2.4
TV data, America:
3.4 2.3 3.7 1.0 4.0 3.3 2.6 4.9 2.4 3.9 3.9 3.5 1 4.8 2.6 1.7 0.5 1.6 1.3 3.7 2.9 0.1 1.2 1.1 2.7 3.6 3.8 1 4.0 1.5 3.6 3.8 2.5 2.8 2.2 3.9 0.4 3.0 2.3 3.0 0.6 2.4 1.9 0.1 1.1 4.4 4.2 3.2 3.4 3.2 0.4 0.1 0.8 1.2 1.9 2.6 1.5 0.2 0.7 1.9 1.1 3.1 1 0.9 1.1 1.4 2.3 3.4 0.8 2.7 1.4 1.9 0.4 4.5 4.5 2.8 1.7 1.7 4.5 4.9 3.4 1.2 4.1 1.1 0.4 0.1 2.4 4.8 3.1 4.1 2.7 4.2 4.0 1.4 1.9 4.3 2.9 4.4 0.1 1.9 0.2 4.1 1.1 3.2 0.1 3.5 4.5 2.3 1.4 4.5 1.7 3.9 4.3 0.8 2.8 1.2 3.0 2.6 2.4 2.8 0.7 4.8 4.3 2.2 2.7 2.5 3.5 2.9 0.1 2.0 1.9 0.1 0.5 3.1 4.5 3.9 1.2 3.9 4.9 3.4 4.1 1 0.5 2.2 2.0 4.5 3.0 0.5 2.4 2.4 1.9 0.0 4.3 4.3 4.1 0.8 3.9 1.4 2.5 3.6 1.5 0.6 2.4 4.4 0.8 3.0 4.2 4.7 0.4 1.1 3.9 1.3 2.3 1 4.4 0.7 0.5 1.0 2.5 1.5 4.9 3.4 1 1.9 4.3 4.6 3.1 3.2 3.4 1.9 2.4 3.8 3.2 0.3 3.4 0.6 1.2 1.5 5.8 1.9 0.1 0.9 0.8 0.8 0.3 3.4 2.9 4.0 0.4 2.6 3.8 4.3 1.5 3.5 4.5 0.9 4.9 2.2 2.0 3.5 3.8 1.9 2.8 2.3 1.9 2.3 1.5 3.4 1.7 2.8 0.9 1
17) A random sample of 200 married men, all retired, was classified according to education and number of children. Test the hypothesis that the size of a family is independent of the level of education attained by the father. (Assume α = 0.05.)
              Number of children
Education     0-1   2-3   over 3
Elementary    14    37    32
Secondary     19    42    17
College       12    17    13
18) In a study to estimate the proportion of grown-ups who regularly watch soap operas, it is found that 52 of 200 persons in Denver, 31 of 150 persons in Phoenix and 41 of 150 in Rochester watch at least one soap opera. Use a 0.05 level of significance to test the hypothesis that there is no difference among the true proportions of persons watching soap operas.
19) The following numbers of car accidents per day were observed during a period of 106 days in a big city: 0 for 33 days; 1 for 45 days; 2 for 20 days; 3 for 5 days; 4 for 2 days; 7 for 1 day. Test whether the number of car accidents follows a Poisson distribution. (Assume α = 0.1.)
20) To build a simulation model for the customer flow in a bank office, we have to model the service times at counter 1 and counter 2. We have collected 100 service times of counter 1 (generate them using rst1(100) for the R function
rst1<-function(n){p<-runif(n); 5.2*((p<0.5)*(16+3*rnorm(n))+(p>0.5)*rgamma(n,13))}
) and 1000 service times of counter 2 (generate them using rst2(1000) for the R function
rst2<-function(n){p<-runif(n); 2.9*((p<0.9)*(18+3*rnorm(n))+(p>0.9)*rgamma(n,16))}
). We assume that both service times are gamma distributed. We then have to find estimates for the parameters of the gamma distribution so that we can start our simulation.
a) Compute the moment estimates for the parameters of the service times at counter 1 and counter 2.
b) Carry out the chi-square goodness-of-fit test for the gamma distribution to check whether the gamma assumption is correct. Use 5 intervals for the smaller sample and 25 intervals for the larger sample; choose the intervals such that they have (at least approximately) the same probability. (Hint: Don't forget to subtract the number of estimated parameters from the number of degrees of freedom you use for the test statistic.)
c) Interpret your result of b).
What is the problem of the tests, especially for the first sample?
Hint: It is easiest to make the chi-square test by transforming the data with the assumed CDF of the gamma distribution. Then you can make the chi-square test for the U(0,1) uniform distribution and can easily divide into the sub-intervals.
21) We have two independent samples of size n = 50 and n = 500 from the same distribution (both generated by the R function
ran<-function(n){p<-runif(n); 12+3.3*((p<0.5)*(0.5+rnorm(n)/sqrt(12))+(p>0.5)*runif(n))}
). We want to test if the unknown distribution could be the normal distribution.
a) Make a histogram and the normal quantile-quantile plot for both samples and interpret the results.
b) Make the Shapiro-Wilk test of normality for both samples (shapiro.test). Can we reject the hypothesis that the samples come from a normal distribution? (Assume α = 0.1.)
c) Why is it easily possible that we get different results for the two samples, although they come from the same distribution?
d) Use simulation to find the power of the Shapiro-Wilk test of normality for the "ran()" distribution for n = 20, 50, 100, 200, 500.
22) We want to test the hypothesis H0: μ = 1 against the alternative H1: μ < 1. We know that the parent population is exponentially distributed and the sample size is n = 10. Clearly we should not use the standard t-test, as the normal assumption is not fulfilled. So we can try to find the critical region by simulation, that is, a critical value of the sample mean: if the sample mean of a sample is less than or equal to that value, H0 is rejected.
a) Compare your result for the critical sample mean with the result that we would obtain when using the t-test (with σ = 1).
b) Is it different? Why?
c) Make the same experiment with sample size n = 100.
d) Is the critical value of the sample mean closer to the result of the t-test? Why?
e) Use the critical region of a) to estimate the power of this test for μ = 0.5.
For the PS on 28.12.
All students: 3, 16, 20, 21, 22
Other exercises: Students with an odd student number should especially prepare the odd-numbered questions, students with an even student number especially the even-numbered questions. Of course, for the final you are responsible for all questions.
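The hint in exercise 20 (transform the data with the fitted gamma CDF, then test for the U(0,1) distribution on equal-probability intervals) could be sketched in R roughly as follows. This is a sketch, not a model solution: the function name gamma_chisq_gof and its arguments are illustrative assumptions, and the moment estimates use var(), i.e. the n - 1 denominator.

```r
# Chi-square goodness-of-fit test for a gamma model via the CDF transform
# (sketch; function name and arguments are illustrative assumptions).
gamma_chisq_gof <- function(x, k) {
  # Moment estimates: mean = shape / rate, variance = shape / rate^2
  m <- mean(x)
  v <- var(x)                 # assumption: n - 1 denominator
  rate  <- m / v
  shape <- m^2 / v

  # Transform with the fitted CDF; under H0 the values are roughly U(0,1)
  u <- pgamma(x, shape = shape, rate = rate)

  # k equal-probability intervals on [0,1]; expected count is n / k in each
  observed <- table(cut(u, breaks = seq(0, 1, length.out = k + 1),
                        include.lowest = TRUE))
  expected <- length(x) / k
  stat <- sum((observed - expected)^2 / expected)

  # Two estimated parameters reduce the degrees of freedom to k - 1 - 2
  df <- k - 1 - 2
  list(shape = shape, rate = rate, statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}
```

With the functions from exercise 20 defined, gamma_chisq_gof(rst1(100), 5) and gamma_chisq_gof(rst2(1000), 25) would run the two tests with 2 and 22 degrees of freedom respectively.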