* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Homework Problems 1. (5pts) What is the level of measurement
History of statistics wikipedia , lookup
Confidence interval wikipedia , lookup
Bootstrapping (statistics) wikipedia , lookup
Central limit theorem wikipedia , lookup
Misuse of statistics wikipedia , lookup
Taylor's law wikipedia , lookup
German tank problem wikipedia , lookup
Homework Problems 1. (5pts) What is the level of measurement (nominal, ordinal, interval or ratio) for each of the following variables? (a) The number of cups of coffee sold at Starbucks each Sunday during 2008. (b) The courses offered by the department of statistics at NCCU (National Chengchi University) during 2006, such as statistics, calculus, probability, etc. (c) The monthly average temperatures at Taipei over the past 21 years. (d) The ranking of NCCU in 2009 according to the QS World University Rankings. 2. (5pts) Suppose that we have a sample of scores with sample mean 80 and sample standard deviation 1. At least what percentage of the scores are between 76 and 84? 3. (5pts) Suppose that we have a sample of scores with sample mean 80 and sample standard deviation 1. Find a range that covers at least 70% of the scores using Chebyshev’s theorem. 4. (5pts) Suppose that we made a list of 8 singers and asked a group of 6000 people about their favorite singers in our list. As a result, we have a sample of 6000 numbers, where each number is one of 1, 2, ..., 8, corresponding to one of the 8 singers. Which of the following ways for summarizing this sample are meaningful? (a) Reporting the sample mean. (b) Reporting the sample variance. (c) Reporting the sample median. (d) Reporting the mode of the sample. (e) Reporting the frequencies of 1’s, 2’s, ..., and 8’s in the sample. 5. (5pts) For a sample of size 1000 with minimum 0, maximum 1 and sample standard deviation 0.3, determine the number of classes for drawing a histogram using Scott’s rule. 6. (5pts) During the past 34 years, Virgin Atlantic Airways experienced no air accident in 23 years. Estimate the probability that Virgin Atlantic Airways experiences no air accident in one year. Which concept of probability (classical or empirical probability) is used to make the estimate? 7. (5pts) Suppose that we have a box of 1000 items, and 5 of them are defective. Suppose that we randomly select two items one at a time without replacement (選取二個物件時, 一次取一個, 取出不放回). Let A be the event that the first item is defective and B be the event that the second item is defective. Find P (A|B) and P (A). Are A and B independent? Hint: use Bayes theorem to find P (A|B). 8. (5pts) Determine whether X is a discrete random variable in each of the following cases. (a) X is the total number of cars sold in Taiwan in the next month. 1 (b) X is the average weight of all the babies born in Taiwan next year. (c) X is the amount of time you need to wait for service when you visit the post office the next time. 9. (5pts) Suppose that we have a box of 1000 items, and 5 of them are defective. Suppose that we randomly select two items one at a time without replacement . For i = 1, 2, let 1 if the i-th selected item is defective; Xi = 0 if the i-th selected item is not defective. (a) Are X1 and X2 independent? (b) Do X1 and X2 have the same PMF? 10. (5pts) Suppose that we have a box of 1000 items, and 5 of them are defective. Suppose that we randomly select two items one at a time with replacement. For i = 1, 2, let 1 if the i-th selected item is defective; Xi = 0 if the i-th selected item is not defective. Find the PMF for X1 and the PMF for X2 . 11. (5pts) Suppose that X is a discrete random variable with PMF pX , where 0.2 if x = 0; 0.5 if x = 2; pX (x) = 0.3 if x = 4; 0 otherwise. Find P (1.5 < X < 4.5). 12. (5pts) Suppose that X is a discrete random variable with PMF pX , where 0.1 if x = −1; 0.3 if x = 1; pX (x) = 0.6 if x = 2; 0 otherwise. Find the mean and variance of X. 13. (5pts) Consider the X in Problem 12. Let Y = X 2 . (a) Find the PMF of Y . (b) Find the mean of Y . Remark. Compare the E(X 2 ) obtained from Part (b) with the V ar(X) from Problem 12. You should be able to see that E(X 2 ) = V ar(X) + (E(X))2 . 14. (5pts) For a random variable X whose E(X) exists, we have E(a + bX) = a + bE(X) for all constants a, b. (1) Using (1) to show that V ar(X + a) = V ar(X) and V ar(bX) = b2 V ar(X) for all constants a, b, assuming E(X 2 ) and E(X) exist. 2 • Remark. This result can be expressed more briefly as V ar(a + bX) = b2 V ar(X). 15. (5pts) Suppose that X and Y are two random variables with the same PMF, and P (X = 1) = p = 1 − P (X = 0), where 0 < p < 1. Show that if P (Y = 1|X = 0) 6= P (Y = 1|X = 1), then X and Y are not independent. 16. (5pts) Suppose that X is a random variable with mean 80 and variance 1. Find a range A such that P (X ∈ A) ≥ 0.7 using Chebyshev’s theorem. 17. (5pts) Suppose that X1 , . . ., Xn are IID random variables with E(X1 ) = µ and V ar(X1 ) = σ 2 . Let X̄ = (X1 + · · · + Xn )/n. Show that ! n X E (Xi − X̄)2 = (n − 1)σ 2 . i=1 Pn Pn • Hint: express i=1 (Xi − X̄)2 using i=1 Xi2 and (X̄)2 , and then compute the expectations of the two terms using the fact that E(X 2 ) = V ar(X) + (EX)2 and Equations (6) and (7) in the handout “Mean, variance and standard deviation”. • Remark. This result shows that the expectation of the sample variance of a random sample is the variance of the population distribution. 18. (5pts) Ms Li is the manager of the human resource department of a company. Based on her experience, she estimates that the probability that a new employee will quit in one year is 0.025. The company hired 40 people last month. What is the probability that 2 of the 40 new employees will quit in one year? 19. (5pts) Suppose that a class of 15 students are given 4 movie tickets. To be fair, the students would like to choose 4 ticket winners among themselves randomly. Suppose that 10 of the 15 students are males and the rest are females. Find the probability that exactly 3 of 4 ticket winners are males. 20. (5pts) Suppose that Ms Yu goes fishing every weekend, and she catches 3 fishes in 2 hours on average. Suppose that she plans to spend 2 hours on fishing this weekend. Let X be the number of fishes that she will catch this weekend. Propose a distribution for X and find the probability that X ≥ 1 using the proposed distribution. 21. (5pts) In Problem 18, find an approximate value for the probability that 2 of the 40 new employees will quit in one year using a Poisson distribution. You may use R or other devices such as a calculator to evaluate e−µ for a given µ. 22. (5pts) Show that ∞ X x(x − 1)e−µ x=0 using the fact that ∞ X e−µ x=0 µx = µ2 x! µx = 1. x! Remark. This result shows that if X ∼ P oisson(µ), then E(X(X − 1)) = µ2 . 3 23. (5pts) Suppose that X has a PDF fX , where x if 0 ≤ x < 1; 1 if 1 ≤ x ≤ 1.5; fX (x) = 0 otherwise. Find the following probabilities. (a) P (0 < X < 1). (b) P (0 < X < 1.1). (c) P (X > 1.5). (d) P (X < 0). (e) P (X = 0.5). 24. (5pts) In Problem 23, we have fX (1) > fX (0.5). Does this imply that P (X = 1) > P (X = 0.5)? Justify your answer. 25. (5pts) For the X and fX in Problem 23, derive a formula for P (1 − h < X < 1 + h) P (0.5 − h < X < 0.5 + h) for 0 < h < 0.5. Find an h in (0, 0.5) such that P (1 − h < X < 1 + h) fX (1) − P (0.5 − h < X < 0.5 + h) fX (0.5) < 0.01. Hint: choose a small h. 26. (5pts) Suppose that X ∼ U (1, 3). Find the following quantities. (a) P (X < 1). (b) P (X = 2). (c) P (2 < X < 4). (d) E(X). (e) V ar(X). You may use the following R outputs to solve this problem. • Running h <- function(x){ x/2 } integrate(h, 1, 3) gives the R output 2 with absolute error < 2.2e-14 • Running h <- function(x){ x^2/2 } integrate(h, 1, 3) gives the R output 4.333333 with absolute error < 4.8e-14 27. (5pts) Suppose that X ∼ N (0, 1). Find the following probabilities using the table “Normal probabilities” (available at the course web site). 4 (a) P (−0.5 < X < 1.51). (b) P (0.5 < X < 1.51). (c) P (X > 1.51). (d) P (X > −0.5). (e) P (X < 0.5). 28. (5pts) Suppose that X ∼ N (1, 32 ). Find P (2.5 < X < 5.53). 29. (5pts) Suppose that X is random variable with mean 1 and standard deviation 2, and Y = 1 − 2X. (a) Find the mean and variance of Y . (b) Give two values a and b such that P (a ≤ Y ≤ b) ≥ 0.75 using Chebyshev’s Theorem. (c) Suppose that the distribution of X is normal. For the a and b found in Part (b), what is P (a ≤ Y ≤ b)? 30. (5pts) Suppose that Ms Yu goes fishing every weekend, and she catches 4 fishes in 2 hours on average. Suppose that the number of fishes she catches in the first t hours forms a Poisson process. Find the probability that the next time when Ms Yu goes fishing, she does not catch any fish in the first 0.5 hour and catches one fish in the first hour. 31. (5pts) In Problem 30, let X be the waiting time till the first fish caught and let Y be the numbers of fishes caught in the first 0.5 hour. (a) What is the distribution of X? Find P (X > 0.5). (b) What is the distribution of Y ? Find P (Y = 0). 32. (5pts) Suppose that a productive line produces pens with defective rate 0.02. Let X be the number of defective pens in the next 1000 pens made from the productive line. Approximate P (X ≤ 22) using a normal probability. 33. (5pts) Consider the population of rents for one-bedroom apartments near NCCU. Suppose that the population distribution can be approximated well by a normal distribution with standard deviation of NT$4,000 per month. Suppose that we have a random sample of 30 rents from the population, and the sample mean is NT$1,1000 per month. Find an approximate 95% confidence interval for the population mean rent. 34. (5pts) Consider the population of rents for one-bedroom apartments near NCCU. Suppose that the population distribution can be approximated well by a normal distribution with unknown mean and variance. Suppose that we have a random sample of 30 rents from the population, and the sample mean and sample standard deviation are NT$11,000 per month and NT$4,000 per month. Find an approximate 90% confidence interval for the population mean rent. 35. (5pts) A mobile phone manufacturer would like to estimate the defective rate for one of their new products. Suppose that a random sample of 40 items are taken and 1 of them are defective. Construct an approximate 95% confidence interval for the defective rate. 5 36. (5pts) Consider the population of rents for one-bedroom apartments near NCCU. Suppose that the population distribution can be approximated well by a normal distribution with standard deviation of NT$4,000 per month. Suppose that we want to construct an approximate 90% confidence interval for the population mean rent with maximum allowable margin of error NT$4,000 per month. What is the required sample size? 37. (5pts) A mobile phone manufacturer would like to construct a 90% approximate confidence interval for the defective rate for one of their new products with maximum allowable margin of error 0.005. What is the required sample size? 38. (5pts) A new drug has been developed to lower systolic blood pressure. Suppose that after using the new drug, a typical patient’s systolic blood pressure is lowered by µ (mmHg) on average. Suppose that after using a conventional drug, a typical patient’s systolic blood pressure is lowered by µ0 (mmHg) on average. We would like to know whether the new drug is more effective than the conventional drug. Suppose that we want to control the probability of falsely claiming the new drug is more effective than the conventional drug. Which statement should be the null hypothesis, µ ≥ µ0 or µ < µ0 ? 39. (5pts) Consider Problem 36. Suppose that we have a random sample of 30 rents from the population, and the sample mean is NT$11,000 per month. (a) Can we conclude that the population mean rent is more than NT$10,000 per month at the 0.01 significance level? (b) Can we conclude that the population mean rent is less than NT$18,000 per month at the 0.05 significance level? (c) Can we conclude that the population mean rent is different from NT$14,000 per month at the 0.01 significance level? 40. (5pts) Consider Problem 34. (a) Can we conclude that the population mean rent is more than NT$10,000 per month at the 0.01 significance level? (b) Can we conclude that the population mean rent is less than NT$18,000 per month at the 0.05 significance level? (c) Can we conclude that the population mean rent is different from NT$14,000 per month at the 0.01 significance level? 41. (5pts) Consider Problem 36. Suppose that we have a random sample of 30 rents from the population, and the sample mean is NT$11,000 per month. For each a in {0.02, 0.04, 0.06, 0.08, 0.1}, can we conclude that the population mean rent is more than NT$10,000 per month at level a? Note that you only need to compute one number to solve this problem. 42. (5pts) Consider Problem 36. Suppose that we have a random sample of 30 rents from the population, and the sample mean is NT$11,000 per month. (a) How confident can we be to conclude that the population mean rent is less than NT$18,000 per month? (b) How confident can we be to conclude that the population mean rent is different from NT$14,000 per month? 6 Note: we can be very confident to conclude H1 if we have very strong evidence against H0 . 43. (5pts) Suppose that (X1 , . . . , Xn ) is a random sample from N (µ, σ 2 ), where σ is known. Suppose that 0 < p < a < 1. Show that σ σ X̄ − za−p √ , X̄ + zp √ n n is a (1 − a) confidence interval for µ. 44. The R output after running qnorm(0.915);qnorm(0.925);qnorm(0.935);qnorm(0.945);qnorm(0.950) is [1] [1] [1] [1] [1] 1.372204 1.439531 1.514102 1.598193 1.644854 and the R output after running qnorm(0.955);qnorm(0.965);qnorm(0.975);qnorm(0.985) is [1] [1] [1] [1] 1.695398 1.811911 1.959964 2.170090 Recall that running qnorm(1-p) in R gives zp , so the above output gives zp ’s for different p’s. Suppose that (X1 , . . . , Xn ) is a random sample from N (µ, σ 2 ), where σ is known. Give five different 90% confidence intervals for µ using the above zp ’s and the result in Problem 43. Which one of the five confidence intervals has the shortest length? 45. (5pts) A mobile phone manufacturer would like to learn about the defective rate for one of their new products. Suppose that a sample of 1000 is taken and 5 out of the 1000 items are defective. (a) Can we conclude that the defective rate is more than 0.01 at the 0.05 significance level? (b) Can we conclude that the defective rate is less than 0.01 at the 0.05 significance level? (c) Can we conclude that the defective rate is different from 0.006 at the 0.01 significance level? 46. (5pts) Suppose that (X1 , . . . , Xn ) is a random sample from N (µ, 22 ), where n = 100. Consider the testing problem H0 : µ ≤ 0 v.s. H1 : µ > 0 7 and a test that rejects H0 at level 0.05 if √ nX̄/2 > z0.05 . Let p(µ) be the probability that the test rejects H0 . Determine the order of p(−0.2), p(−0.1), p(0), p(0.1) and p(0.2) and express your answer in the form p(a1 ) < p(a2 ) < p(a3 ) < p(a4 ) < p(a5 ), where {a1 , . . . , a5 } = {−0.2, −0.1, 0, 0.1, 0.2}. You can justify your conclusion by computing the probabilities directly or provide a proof without computing the probabilities (請將p(−0.2), p(−0.1), p(0), p(0.1) and p(0.2)依大小排列, 由小排到大. 你可以直接算出這五個值再排序, 也可以不算 出值, 但提出一個證明來推導這五個值的順序). 47. (5pts) Consider Problem 34. Can we conclude that the standard deviation for the rent population distribution is greater than NT$3,200 at levels 0.01, 0.05 and 0.1 respectively? Use the table “Quantiles for χ2 distributions” on the course web site to find the ka,df values. 8