STATISTICS 7 – SPRING 2008
Homework 4
Handed out: Friday April 25, 2008
Due: Friday May 2, 2008 (by 5pm to Bren Hall 2216)
Reading:
  Apr 25 – Apr 28   Probability and random variables (Ch 10, 12, 13, 3 - only pieces of 12)
  Apr 30 – May 2    Sampling distributions (Ch 11)
  May 5 – May 9     Introduction to inference (Ch 14-16)
1. Probability - blood types. (Based on 10.21-10.24 on page 264.) The table below lists the possible blood
types and the proportion of the U.S. population with each blood type.
Blood type     O      A      B      AB
Probability    0.45   0.40   0.11   ?
(a) Suppose we pick a person at random from the U.S. population and determine their blood type. Describe
the sample space for this random phenomenon.
(b) For this to be a valid probability assignment what value must the missing probability take? Explain.
(c) If you have type B blood then you can safely receive blood from people with blood types O or B. What
is the probability that a randomly chosen person will be an acceptable donor?
(d) What is the probability that a randomly chosen person does not have type O blood?
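The rules needed for parts (b) through (d) can be sketched in a few lines of Python. This is only an illustration and not part of the assignment (which uses Stata); the variable names are my own. It assumes the four blood types are the entire sample space, so their probabilities must sum to 1.

```python
# Probabilities given in the table; AB is the missing entry.
known = {"O": 0.45, "A": 0.40, "B": 0.11}

# A valid probability assignment sums to 1 over the whole sample space,
# so the missing value is whatever is left over.
p_ab = round(1 - sum(known.values()), 2)
probs = {**known, "AB": p_ab}

# Addition rule for disjoint outcomes: P(O or B) = P(O) + P(B).
p_donor = round(probs["O"] + probs["B"], 2)

# Complement rule: P(not O) = 1 - P(O).
p_not_o = round(1 - probs["O"], 2)

print("P(AB) =", p_ab)
print("P(O or B) =", p_donor)
print("P(not O) =", p_not_o)
```

The rounding to two decimals only hides floating-point noise; the table values themselves are given to two decimals.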
2. Probability and random variables - car ownership. (Based on 10.27 on page 264.)
(a) Choose an American household at random and ask them whether they own any cars or not. Identify the
sample space for this random phenomenon.
(b) Choose an American household at random and ask them to list the types of cars (i.e., manufacturers) they
own. In this case the sample space is really quite large because both the number of cars and the types of
cars can vary; give two possible elements of the sample space.
(c) Choose an American household at random and let the random variable X be the number of cars they own.
Identify the sample space for X.
(d) The table below gives the probability distribution for X, the random variable described in part (c) (ignoring
households with more than 5 cars). Identify the missing probability.
Number of cars X   0      1      2      3      4      5
Probability        0.09   0.36   ?      0.13   0.05   0.02
(e) What is the probability that a randomly chosen household has two or more cars?
(f) What is the probability that a randomly chosen household has one or two cars?
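For a discrete random variable like X, the probability of an event is the sum of the probabilities of the values that make up the event. A possible Python sketch of that idea follows; the helper names missing_probability and prob are hypothetical, and the sketch assumes (as the problem does) that households with more than 5 cars are ignored.

```python
def missing_probability(dist):
    """Return the single unknown probability so that the total is 1."""
    return round(1 - sum(p for p in dist.values() if p is not None), 2)


def prob(dist, event):
    """Sum the probabilities of all values x for which event(x) is true."""
    return round(sum(p for x, p in dist.items() if event(x)), 2)


# Distribution of X from the table; None marks the missing entry.
cars = {0: 0.09, 1: 0.36, 2: None, 3: 0.13, 4: 0.05, 5: 0.02}
cars[2] = missing_probability(cars)

p_two_or_more = prob(cars, lambda x: x >= 2)      # event {2, 3, 4, 5}
p_one_or_two = prob(cars, lambda x: x in (1, 2))  # event {1, 2}

print("P(X = 2) =", cars[2])
print("P(X >= 2) =", p_two_or_more)
print("P(X = 1 or 2) =", p_one_or_two)
```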
3. Independence - telemarketing. Do problem 12.29 on page 319 but first answer the following two questions.
They will help with 12.29(a) and 12.29(b).
(i) A telemarketer places 2 calls. What is the probability that both reach a person?
(ii) A telemarketer places 1 call. What is the probability that it does not reach a person?
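The two warm-up questions use the multiplication rule for independent events and the complement rule. A rough Python sketch is below. The value of p_reach is a HYPOTHETICAL placeholder, since this handout does not state the probability that a call reaches a person; substitute the figure given in problem 12.29. The function names are also my own.

```python
def both_reach(p):
    """P(two independent calls both reach a person) = p * p."""
    return p * p


def does_not_reach(p):
    """P(a single call does not reach a person) = 1 - p."""
    return 1 - p


# HYPOTHETICAL placeholder value; replace with the probability from 12.29.
p_reach = 0.2

print("P(both calls reach a person) =", round(both_reach(p_reach), 4))
print("P(one call reaches no one)   =", round(does_not_reach(p_reach), 4))
```

The multiplication step is only valid if the calls are independent, which is the assumption the problem asks you to make.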
4. Binomial random variables - stock market. Do problem 13.26 on Page 340. (See problem 13.25 if you
are having trouble.)
5. Conditional probability - drug testing and false positives. Conditional probability is covered in Chapter
12 but we did not discuss it in detail during lecture. This homework problem provides a quick and hopefully
interesting demonstration of the topic. Conditional probability is relevant when the probability of an event
depends on the amount of information we have (and it almost always does!). For one quick example, notice
that our current guess for the probability of rain on June 1 would be quite low (about .01), but this would
change if we learn that it rains on May 31 (since storms tend to stick around).
Now for our problem. Suppose that we decide to carry out drug tests on all employees of a major corporation.
It seems reasonable to assume that without any other information the probability that an employee is a drug
user is small, say .005 (half of one percent). We assume our drug test is very effective - if you are taking drugs
then the test will correctly detect the drugs with probability .99 (no test is perfect!). If you are not taking the
drugs then the test will correctly come back negative with probability .99 (again no test is perfect). Fred works
at the company and his test comes back positive. Oh no! Fred continues to proclaim his innocence. What is
the probability that Fred is a drug user? Let’s answer this question as follows:
(a) The company has 20,000 employees. If our assumptions are correct how many of these are drug users?
How many are not drug users?
(b) How many of the drug users do we expect to return a positive test? How many do we expect to return a
negative test? (Use the information given about the accuracy of the test.)
(c) Repeat part (b) but now for the non-drug users.
(d) In total how many positive tests do we expect at this company?
(e) What fraction of the positive tests correspond to drug users? This last fraction is known as the conditional
probability of being a drug user given that the test is positive.
NOTE: The original probability (with no additional information) that an employee is a drug user is .005 (by our
assumption). You may be surprised that the conditional probability in (e) is so low! Didn’t you think it
would be close to .99? This is the danger with widespread screening when a condition is rare. The false
positives can outnumber the true positives.
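The counting argument in parts (a) through (e) can be written out as a short Python sketch. It uses only the numbers stated in the problem (20,000 employees, a .005 chance of being a user, and .99 accuracy in both directions). The variable names, including p_detect_user and p_clear_nonuser, are my own labels and do not come from the textbook.

```python
employees = 20_000
p_user = 0.005           # P(employee is a drug user), by assumption
p_detect_user = 0.99     # P(test positive | user)
p_clear_nonuser = 0.99   # P(test negative | non-user)

# (a) Expected numbers of users and non-users.
users = employees * p_user
non_users = employees - users

# (b) Expected test results among users.
true_pos = users * p_detect_user
false_neg = users - true_pos

# (c) Expected test results among non-users.
true_neg = non_users * p_clear_nonuser
false_pos = non_users - true_neg

# (d) All expected positive tests, correct or not.
total_pos = true_pos + false_pos

# (e) Fraction of positive tests that belong to users:
#     the conditional probability P(user | positive test).
p_user_given_pos = true_pos / total_pos

print(f"users={users:.0f}, non-users={non_users:.0f}")
print(f"true positives={true_pos:.0f}, false positives={false_pos:.0f}")
print(f"P(user | positive) = {p_user_given_pos:.3f}")
```

The same fraction comes out of Bayes' rule, (.005)(.99) / [(.005)(.99) + (.995)(.01)], which is a useful check on the counts.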
6. Stata analysis – simple random samples. This quick Stata assignment demonstrates how random sampling
works and also serves to introduce the idea of a sampling distribution. The data set iowafarm.dta contains the
mean value of farmland in each of Iowa’s 99 counties. In this problem we will take a random sample of counties
and use the mean from the sample to estimate the overall state-wide mean value of farmland. The variables are
the CountyID (a number), the County name, and the mean value of farmland in the county (called Farmval).
Stata hints: This assignment requires only a few Stata commands. When you sample in Stata it actually deletes
the “unsampled” cases (and therefore leaves you with ONLY the sample). After studying the sample we want
to go back to the full data set. This is possible by typing preserve before sampling (this saves a copy of the
original data set) and then typing restore after we are done with the sample (this calls back the original data
set). (Interesting anecdote: While preparing this HW I typed in all the data and then took a sample without
preserving the data first. You guessed it, I lost all the data and had to type it all in again!)
(a) Download the data from the class homepage. Open the data in Stata. Carry out the following commands:
preserve
sample 5, count
summarize Farmval
restore
This takes a sample of 5 observations (note that without count it would take a 5% sample) and then finds
the mean and s.d. of the sample. Record the mean of the sample. (An alternative to the sample command
is to use the Statistics→Resampling→Draw random sample menu item.)
(b) Repeat part (a) four additional times. When done you should have five means, with each coming from a
different sample of five counties. (HINT: In Stata you don’t have to keep typing in the commands; you
can just click on an old command in the “command” window and it will appear as a new command. Also,
it is possible to create a “do” file (or macro) that contains these 4 lines and then run the “do” file over
and over. If you are interested in this approach type help doedit.)
(c) Now repeat part (a) to draw a sample of size 20:
preserve
sample 20, count
summarize Farmval
restore
Record the mean of the sample.
(d) Repeat part (c) four additional times. When done you should have five means, with each coming from a
different sample of twenty counties.
(e) The actual mean of the 99 counties is 1212. Examine the means from the samples of size five. Are they
close to the true population mean (1212)? Now examine the means from the samples of size twenty. Are
they close to the true population mean (1212)? Comment on the difference between the samples of size 5
and those of size 20.
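To see what part (e) is getting at without repeating the Stata commands many times, here is a small Python simulation. It is a sketch only: the 99 "county values" are HYPOTHETICAL numbers generated around 1212 with an arbitrary spread of 300, not the real iowafarm.dta data, and the seed, repetition count, and function name sample_means are my own choices.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is reproducible

# HYPOTHETICAL stand-in for the 99 county means; NOT the real data set.
population = [rng.gauss(1212, 300) for _ in range(99)]
pop_mean = statistics.mean(population)


def sample_means(n, reps):
    """Draw `reps` simple random samples of size n (without replacement)
    and return the mean of each sample."""
    return [statistics.mean(rng.sample(population, n)) for _ in range(reps)]


means_5 = sample_means(5, 1000)
means_20 = sample_means(20, 1000)

# The standard deviation of the sample means measures how far a typical
# sample mean falls from the population mean.
spread_5 = statistics.stdev(means_5)
spread_20 = statistics.stdev(means_20)

print(f"population mean: {pop_mean:.1f}")
print(f"n=5:  average of sample means {statistics.mean(means_5):.1f}, spread {spread_5:.1f}")
print(f"n=20: average of sample means {statistics.mean(means_20):.1f}, spread {spread_20:.1f}")
```

In both cases the sample means center on the population mean, but the means from samples of size 20 cluster more tightly around it than the means from samples of size 5. The five means you record in parts (b) and (d) are a small glimpse of these two sampling distributions.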