
STATISTICS 7 – SPRING 2008
Homework 4

Handed out: Friday April 25, 2008
Due: Friday May 2, 2008 (by 5pm to Bren Hall 2216)

Reading:
  Apr 25 – Apr 28   Probability and random variables (Ch 10, 12, 13, 3 - only pieces of 12)
  Apr 30 – May 2    Sampling distributions (Ch 11)
  May 5 – May 9     Introduction to inference (Ch 14-16)

1. Probability - blood types. (Based on 10.21-10.24 on page 264.) The table below lists the possible blood types and the proportion of the U.S. population with each blood type.

  Blood type   Probability
  O            0.45
  A            0.40
  B            0.11
  AB           ?

(a) Suppose we pick a person at random from the U.S. population and determine their blood type. Describe the sample space for this random phenomenon.
(b) For this to be a valid probability assignment, what value must the missing probability take? Explain.
(c) If you have type B blood then you can safely receive blood from people with blood types O or B. What is the probability that a randomly chosen person will be an acceptable donor?
(d) What is the probability that a randomly chosen person does not have type O blood?

2. Probability and random variables - car ownership. (Based on 10.27 on page 264.)

(a) Choose an American household at random and ask them whether they own any cars or not. Identify the sample space for this random phenomenon.
(b) Choose an American household at random and ask them to list the types of cars (i.e., manufacturers) they own. In this case the sample space is really quite large because both the number of cars and the types of cars can vary; give two possible elements of the sample space.
(c) Choose an American household at random and let the random variable X be the number of cars they own. Identify the sample space for X.
(d) The table below gives the probability distribution for X, the random variable described in part (c) (ignoring households with more than 5 cars). Identify the missing probability.

  Number of cars X   Probability
  0                  0.09
  1                  0.36
  2                  ?
  3                  0.13
  4                  0.05
  5                  0.02

(e) What is the probability that a randomly chosen household has two or more cars?
(f) What is the probability that a randomly chosen household has one or two cars?

3. Independence - telemarketing. Do problem 12.29 on page 319, but first answer the following two questions. They will help with 12.29(a) and 12.29(b).

(i) A telemarketer places 2 calls. What is the probability that both reach a person?
(ii) A telemarketer places 1 call. What is the probability that it does not reach a person?

4. Binomial random variables - stock market. Do problem 13.26 on page 340. (See problem 13.25 if you are having trouble.)

5. Conditional probability - drug testing and false positives. Conditional probability is covered in Chapter 12 but we did not discuss it in detail during lecture. This homework problem provides a quick and hopefully interesting demonstration of the topic.

Conditional probability is relevant when the probability of an event depends on the amount of information we have (and it almost always does!). For one quick example, notice that our current guess for the probability of rain on June 1 would be quite low (about .01), but this would change if we learned that it rained on May 31 (since storms tend to stick around).

Now for our problem. Suppose that we decide to carry out drug tests on all employees of a major corporation. It seems reasonable to assume that, without any other information, the probability that an employee is a drug user is small, say .005 (half of one percent). We assume our drug test is very effective: if you are taking drugs then the test will correctly detect the drugs with probability .99 (no test is perfect!). If you are not taking drugs then the test will correctly come back negative with probability .99 (again, no test is perfect). Fred works at the company and his test comes back positive. Oh no! Fred continues to proclaim his innocence. What is the probability that Fred is a drug user?
Let’s answer this question as follows:

(a) The company has 20,000 employees. If our assumptions are correct, how many of these are drug users? How many are not drug users?
(b) How many of the drug users do we expect to return a positive test? How many do we expect to return a negative test? (Use the information given about the accuracy of the test.)
(c) Repeat part (b) but now for the non-drug users.
(d) In total, how many positive tests do we expect at this company?
(e) What fraction of the positive tests correspond to drug users? This last fraction is known as the conditional probability of being a drug user given that the test is positive.

NOTE: The original probability (with no additional information) that an employee is a drug user is .005 (by our assumption). You may be surprised that the conditional probability in (e) is so low!! Didn’t you think it would be close to .99? This is the danger with widespread screening when a condition is rare. The false positives can outnumber the true positives.

6. Stata analysis – simple random samples

This quick Stata assignment demonstrates how random sampling works and also serves to introduce the idea of a sampling distribution. The data set iowafarm.dta contains the mean value of farmland in each of Iowa’s 99 counties. In this problem we will take a random sample of counties and use the mean from the sample to estimate the overall state-wide mean value of farmland. The variables are the CountyID (a number), the County name, and the mean value of farmland in the county (called Farmval).

Stata hints: This assignment requires only a few Stata commands. When you sample in Stata it actually deletes the “unsampled” cases (and therefore leaves you with ONLY the sample). After studying the sample we want to go back to the full data set. This is possible by typing preserve before sampling (this saves a copy of the original data set) and then typing restore after we are done with the sample (this calls back the original data set). (Interesting anecdote: While preparing this HW I typed in all the data and then took a sample without preserving the data first. You guessed it, I lost all the data and had to type it all in again!)

(a) Download the data from the class homepage. Open the data in Stata. Carry out the following commands:

    preserve
    sample 5, count
    summarize Farmval
    restore

This takes a sample of 5 observations (note that without count it would take a 5% sample) and then finds the mean and s.d. of the sample. Record the mean of the sample. (An alternative to the sample command is to use the Statistics→Resampling→Draw random sample menu item.)

(b) Repeat part (a) four additional times. When done you should have five means, with each coming from a different sample of five counties. (HINT: In Stata you don’t have to keep typing in the commands; you can just click on an old command in the “command” window and it will appear as a new command. Also, it is possible to create a “do” file (or macro) that contains these 4 lines and then run the “do” file over and over. If you are interested in this approach, type help doedit.)

(c) Now repeat part (a) to draw a sample of size 20:

    preserve
    sample 20, count
    summarize Farmval
    restore

Record the mean of the sample.

(d) Repeat part (c) four additional times. When done you should have five means, with each coming from a different sample of twenty counties.

(e) The actual mean of the 99 counties is 1212. Examine the means from the samples of size five. Are they close to the true population mean (1212)? Now examine the means from the samples of size twenty. Are they close to the true population mean (1212)? Comment on the difference between the samples of size 5 and those of size 20.
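The repeated-sampling idea in problem 6 can also be tried outside Stata. The following is a minimal sketch in Python, offered only as an illustration and not as part of the assignment. It assumes a made-up population of 99 values (it does not use the real iowafarm.dta data), and the function name draw_sample_means is hypothetical. It draws many simple random samples without replacement (many more than the five asked for above, so the pattern is easier to see) and compares how much the sample means vary for samples of size 5 and size 20.

```python
import random
import statistics


def draw_sample_means(population, sample_size, num_samples, rng):
    """Return the mean of each of num_samples simple random samples.

    Each sample is drawn without replacement, like Stata's `sample n, count`.
    """
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]


# ASSUMPTION: a synthetic stand-in for the 99 county values.
# These are NOT the real Farmval numbers from iowafarm.dta.
rng = random.Random(7)
population = [rng.gauss(1200, 300) for _ in range(99)]
population_mean = statistics.mean(population)

# Draw 1000 samples of each size and record the sample means.
means_n5 = draw_sample_means(population, 5, 1000, rng)
means_n20 = draw_sample_means(population, 20, 1000, rng)

# The s.d. of the sample means measures how far a typical
# sample mean lands from the population mean.
spread_n5 = statistics.stdev(means_n5)
spread_n20 = statistics.stdev(means_n20)

print(f"population mean: {population_mean:.1f}")
print(f"n = 5:  average sample mean {statistics.mean(means_n5):.1f}, s.d. {spread_n5:.1f}")
print(f"n = 20: average sample mean {statistics.mean(means_n20):.1f}, s.d. {spread_n20:.1f}")
```

The collection of sample means for a given sample size is an approximation to the sampling distribution mentioned at the start of problem 6. Changing the seed or the synthetic population changes the printed numbers, so treat the output as a demonstration of the pattern rather than as an answer to part (e).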
