Math 3307
Lecture Notes
Perkowsky text
May’13
Chapters 7 – 8
Homework Assignments
10 points per problem (140 points total)
Chapter 7
2, 4, 6, 12, 14, 16, 20
Chapter 8
2, 4, 6, 8, 10, 14, 16
Homework style sheet and rules:
Work on one side only; pdf it and upload it before the deadline on the calendar.
Work that is poorly scanned or illegible will be given a zero.
This includes sideways or upside down scans!
Do NOT crowd the work, leave at least 3” between problems.
Label the answers carefully so the grader can grade efficiently.
Chapter 7 – Random Variables and Probability Distributions
7.1
What is a Random Variable?
Technically:
A quantitative variable whose value is determined by the outcome of a chance
experiment.
The book’s example of a free throw is excellent! It starts on page 183…
Let’s discuss the Classroom Exploration on page 185 together
Why 6 blue/4 red…why not 5/5?
And let’s check out the probability table! Would you have thought of a grid?
Which type of learner is likely to appreciate which presentation?
Can we turn it into a tree diagram?
NOTE: TI simulation, page 185. This is VERY useful for making up
worksheets, quizzes, and tests.
Now let’s turn the information into the most abstract representation of all:
Discrete Random Variable: a finite or countably infinite number of possible values.
Continuous Random Variable: infinitely many possible values, situated on a
number line with no gaps or interruptions.
Which of the following are discrete? Continuous?
The number of eggs received by the shipping department at the local Krogers
on a given day.
The number of people marching in the Fourth of July parade downtown
Houston.
The measure of voltage for a smoke detector in your kitchen.
The temperature in Houston.
The exact playing time for a given baseball game.
The number of actors in a randomly selected movie.
The weight of a randomly selected human.
Probability Distribution Table
page 186
Probability Density Table?
Review the rules, page 130!
Problem 1
In a drug study, there is a control group (people not taking the drug) and a group of people taking the drug.
The drug is to help you have girls for children. These are VERY large groups.
Here is a table for the control group. Is it a Probability Distribution Table?
X     P(X)
0     .125
1     .375
2     .375
3     .125
What is X? Are all possibilities covered? How did they get those numbers? Do
they add up right?
Check out the example on your own (top page 187)
Let’s do the Focus on Understanding together. Read page 188 and we’ll discuss the
3 questions in the box on page 189
Fair or Unfair?
Page 188
Let’s do this problem and save the notes. We’ll come back to it in a bit.
Get in pairs and play the game using your TI as your simulator. Get two random
numbers per turn and follow the rules.
Now let’s make a probability distribution table for the game!
7.2
The Mean of a Random Variable
When we discuss the measure of center for a random variable, we’ll call it the
expected value, E(X). It’s like a mean but in a different context. It is a long-term
average value.
Let’s review the grid on page 186:
100 outcomes: 40 zeros, 24 ones, 36 twos
If you’re Nicky’s coach you’ll have an expectation of how she’ll do in the free
throw situation…this expectation is the “mean”.
Of course there’s a formula!
(see the box, page 190)
Multiply each outcome value by the probability that it may happen.
Add.
0(0.40) + 1(0.24) + 2(0.36) = 0.96
This is the expected value, in points, for a trip to the line.
Now let’s review:
What if Nicky were an 80% shooter?
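Here is a minimal sketch of the "multiply and add" recipe in Python. The table for Nicky comes straight from the grid above. The helper one_and_one is an assumption, not something stated in these notes: it supposes the free-throw situation is a one-and-one with independent shots, which happens to reproduce the 40/24/36 split for a 60% shooter and lets us try the 80% question.

```python
def expected_value(table):
    """Multiply each outcome by its probability, then add."""
    return sum(x * p for x, p in table.items())


def one_and_one(p):
    """Points on one trip to the line, ASSUMING a one-and-one rule and
    independent shots that each go in with probability p."""
    return {0: 1 - p, 1: p * (1 - p), 2: p * p}


nicky = {0: 0.40, 1: 0.24, 2: 0.36}        # table from the notes
ev_nicky = expected_value(nicky)
ev_80 = expected_value(one_and_one(0.80))  # the "80% shooter" variation

print(round(ev_nicky, 2))  # 0.96
print(round(ev_80, 2))     # 1.44
```

If the one-and-one assumption does not match the book's setup, only one_and_one needs to change; expected_value works for any probability table.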
Focus on Understanding
page 191
EV problem 1
Years ago, members of organized crime groups ran numbers games. Now such
games are legalized. New Jersey’s Pick 3 game works this way:
Bet 50 cents and select a three-digit number between 000 and 999. If your 3 digits
match the numbers drawn, you win $275.
Prob (winning):
Amount won:
Prob (losing):
Amount lost:
Is this a fair game?
EV Problem 2
The CAN Insurance Company charges Mike $250 for a one-year $100,000 life
insurance policy. Because Mike is a 21-year-old male, there is a 0.9985 probability
that he’ll live for that year.
What are the outcomes and their probabilities?
What are the financial outcomes for each probability?
What is the expected value?
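One way to organize the three questions above is as a list of (financial outcome, probability) pairs from Mike's point of view. This is only a sketch: it assumes the $250 premium is paid in both outcomes and is not refunded, which the problem does not say explicitly.

```python
premium = 250
payout = 100_000
p_live = 0.9985
p_die = 1 - p_live

# (net dollars for Mike, probability); premium assumed paid either way
outcomes = [
    (-premium, p_live),          # Mike lives: he is out the premium
    (payout - premium, p_die),   # Mike dies: payout minus the premium
]

ev_mike = sum(value * prob for value, prob in outcomes)
print(round(ev_mike, 2))  # -100.0
```

The sign flips if you take the company's point of view, so the same numbers say the company expects to gain about $100 per policy of this kind.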
The Prime Number Multiplication Game
page 191
Teams!
Hard choices!
Page 192
Teams!
Try setting it up as a 6 x 6 table or grid!
HINT:
7.3
Variance and Standard Deviation
Measures of spread and variability – we have them in this context, too!
Let’s look at this from a vocabulary standpoint:
Deviation (from the mean): $(x - \mu)$
Squared deviation: $(x - \mu)^2$
Let’s sketch this:
Note that the further from the mean a point is, the bigger the squared deviation!
Now let’s look again at Tasa’s possible earnings for mowing the grass.
See page 193
See page 195 for the formula for variance (remember standard deviation is the
square root of variance!) Let’s decode it!
Calculate each squared deviation, multiply it by its probability, and add them up.
See page 194, bottom, for Tasa’s variance.
See page 195 for an alternate version of the formula.
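To decode both versions of the formula, here is a short sketch. Tasa's table is in the textbook and not in these notes, so the example reuses Nicky's free-throw table from 7.2. The "alternate version" is assumed to be the usual shortcut, the expected value of X squared minus the square of the mean; check it against the box on page 195.

```python
from math import sqrt

nicky = {0: 0.40, 1: 0.24, 2: 0.36}  # outcome: probability


def mean(table):
    return sum(x * p for x, p in table.items())


def variance(table):
    """Squared deviation times probability, added up."""
    mu = mean(table)
    return sum((x - mu) ** 2 * p for x, p in table.items())


def variance_shortcut(table):
    """ASSUMED alternate form: E(X^2) minus the mean squared."""
    return sum(x * x * p for x, p in table.items()) - mean(table) ** 2


var_x = variance(nicky)
sd_x = sqrt(var_x)  # standard deviation is the square root of variance

print(round(var_x, 4))  # 0.7584
print(round(sd_x, 4))   # 0.8709
```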
For Option 2B, let’s check out the top of page 196 and do some comparisons.
Now let’s move into Standard Deviation.
What is standard deviation? Let’s go around the room and discuss what it is!
What is the z-score? Again, around the room…
Let’s check out using the TI for finding these numbers: page 198 – 199
We’ll do this with the lawn mowing data!
And for the sum of two cubes data from the 7.1 experiment.
7.4
Binomial Random Variables
Often we have a situation with repeated identical trials: shooting a free throw (it
goes in or it doesn’t), tossing a coin (heads/tails), landing a plane (ok/crash), having
a baby (boy/girl), answering a question on a T/F test.
These trials need to be independent of one another! If they are, then we may
multiply the individual probabilities for the outcomes.
Let’s analyze the standard 2 child family:
The 3 child family:
Note that we will be using COMBINATIONS when we count outcomes:
Let’s look at the 3 child family again:
BBB
GGG
2B1G
1B2G
The combination of 3 kids taken 2 at a time:
${}_3C_2 = \binom{3}{2} = \dfrac{3!}{2!\,1!} = 3$
P(2 girls) = 3/8
Summary of Binomial Experiments/Probability: page 202 and 203
Let’s review it carefully!
The complement rule:
Application of it: page 204 middle, gray box
Are there mean, variance, and standard deviation? Bet your grade on it.
Summary page 205, bottom, box
TI – let’s learn how to do this efficiently: page 206
Now we know it’s binomial: check those possibilities again with the formula.
In a drug study, there is a control group (people not taking the drug) and a group of people taking the drug.
The drug is to help you have girls for children. These are VERY large groups.
Here is a table for the control group. Is it a Probability Distribution Table?
X     P(X)
0     .125
1     .375
2     .375
3     .125
What is the Expected Value? Mean? Standard Deviation?
What does the histogram look like?
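As a check with the formula, the sketch below rebuilds the table from the binomial probability formula. It assumes X counts girls in a 3-child family with p = 0.5 per birth, which is our reading of the table and is not spelled out in the problem. The mean and standard deviation use the standard binomial shortcuts np and the square root of npq.

```python
from math import comb, sqrt


def binomial_pmf(n, x, p):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 3, 0.5  # ASSUMED: 3 children, each a girl with probability 0.5
table = {x: binomial_pmf(n, x, p) for x in range(n + 1)}
print(table)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

mean_x = n * p                # expected value of a binomial
sd_x = sqrt(n * p * (1 - p))  # standard deviation of a binomial
print(mean_x)                 # 1.5
print(round(sd_x, 3))         # 0.866
```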
Which of the following are binomial experiments?
Surveying 1000 people and asking them to rate the president on a scale of 1 – 5
Rolling a fair die 50 times
Having kids
Determining whether 12,000 pacemakers are defective or not, one by one
Guessing on a T/F test
Guessing on a test with 5 answer choices per question
Compute the following binomial probabilities:
A. n = 2, x = 0, p = .01
B. n = 10, x = 4, p = .95
C. n = 7, x = 2, p = .35
D. n = 6, x = 4, p = .16
BP Problem 1
Bob is a self-proclaimed mentalist who claims he can read minds. To test this, he is
given 14 T/F questions.
A. He gets 8 of them right. What is the expected value and is this unusual?
B. He gets 11 of them right. What is the expected value and is this unusual?
C. He gets 2 of them right. What is the expected value and is this unusual?
BP Problem 2
There is a 0.723 probability that an airplane will land on time at Hobby. Discuss
whether that result would be considered unusual or normal.
A. Find the probability that at least 5 out of 6 airplanes arrive on time in a given
period of time.
B. Find the probability that at most 2 airplanes arrive on time in a given period
of time.
C. Find the probability that exactly 3 airplanes land on time in a given period of
time.
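"At least" and "at most" questions are sums of individual binomial probabilities, and the complement rule gives a second route to the same number. The sketch below sets that up for this problem, assuming n = 6 flights in every part (parts B and C do not restate n) and independent arrivals. The helper names are ours.

```python
from math import comb


def binomial_pmf(n, x, p):
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_between(n, low, high, p):
    """P(low <= X <= high), adding one pmf term per value."""
    return sum(binomial_pmf(n, x, p) for x in range(low, high + 1))


n, p = 6, 0.723  # n = 6 ASSUMED for all three parts

at_least_5 = binomial_between(n, 5, n, p)
at_most_2 = binomial_between(n, 0, 2, p)
exactly_3 = binomial_pmf(n, 3, p)

# Complement rule: P(at least 5) = 1 - P(at most 4)
via_complement = 1 - binomial_between(n, 0, 4, p)

for label, value in [("at least 5", at_least_5),
                     ("at most 2", at_most_2),
                     ("exactly 3", exactly_3)]:
    print(label, round(value, 4))
```

Compare the printed values with what the TI gives before deciding which results count as unusual.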
BP Problem 3
Internal surveys show that directory assistance providers give the wrong number
15% of the time. Assume you are testing a provider by making 10 requests.
Assume further that this is a very average company and gives wrong answers 15%
of the time.
Find the probability of getting one wrong answer. Is this unusual?
Find the probability of getting at most one wrong answer. Is this unusual?
Is the probability really 15% for this company?
BP Problem 4
A study was conducted to determine whether there were significant differences
between medical students admitted through special programs and medical students
admitted through the regular admissions criteria. It is claimed that the graduation
rate for the students admitted through the special programs is 94%.
If 10 students from the special programs are randomly selected, find the probability
that at least 9 of them graduated.
Would it be unusual to randomly select 10 and find that 7 graduated? Why or why
not?
7.5
The Normal Curve
The standard normal curve
the bell curve
symmetric, mound shaped,
continuous
Let’s discuss continuous versus discrete
For the standard normal curve the mean is zero and the standard deviation is 1.
It is symmetric about z = 0…not x? why not?
Probabilities correspond to area under the curve.
Let’s review the Empirical Rule (p. 71) right now with a picture:
Now let’s look at the standard normal probability table.
Given a z-score of 1.28, what is the probability that a measurement is at or below
this value?
page 210
Now for using the chart with “greater than or equal to”…a version of the
complement rule!
Or between two measurements!
Using the table in reverse: from a probability to a z-score:
Page 212
In reality, MOST normal curves are NOT standard! How do we rescale to make
use of our standard normal chart? With z-scores! All normal curves are
proportional and we use the z-score calculation to make them “fit” the table.
Page 213
Focus on Understanding:
page 215
Using the TI to do this, chart-free! Pages 216 – 218
From another source – TI-83 instructions for areas between two bounds:
2nd VARS
[2: normalcdf(left z-score, right z-score)]
Normal Distributions:
The Precision Scientific Instrument Company manufactures thermometers. To
check the accuracy, they test the thermometers in freezing water and make sure each
registers 0 degrees C. Of course some read high and some read low. Assume there is
a standard deviation of 1 degree C. Find the area and show it on a standard normal
curve!
What is the probability that the reading is less than 1.58°?
You should get 94.29%
What is the probability that the reading is above −1.23°?
You should get .8907
What is the probability that the reading is between −2° and 1.5°?
You should get 91.04%
Working backwards in the chart:
Find the temperature associated with the 95th percentile. z = 1.645
How does this work?
Find the temperatures separating the bottom 2.5% and the top 2.5%
These are called tolerances. (−1.96 and 1.96 for z’s).
How does this work?
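For anyone who wants to check the thermometer answers off the calculator, Python's standard library has a normal distribution object. This sketch uses statistics.NormalDist with its default mean 0 and standard deviation 1; cdf gives the area to the left, and inv_cdf works "backwards" from an area to a z-score.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

below_158 = z.cdf(1.58)              # area to the left of 1.58
above_m123 = 1 - z.cdf(-1.23)        # complement rule for "above"
between = z.cdf(1.5) - z.cdf(-2.0)   # area between two bounds

print(round(below_158, 4))   # 0.9429
print(round(above_m123, 4))  # 0.8907
print(round(between, 4))     # 0.9104

# Working backwards: from an area (percentile) to a z-score
print(round(z.inv_cdf(0.95), 3))   # 1.645
print(round(z.inv_cdf(0.025), 2))  # -1.96
print(round(z.inv_cdf(0.975), 2))  # 1.96
```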
Fill in the blanks:
About _________% of the area is within 1 standard deviation of the mean
About _________% of the area is within 2 standard deviations of the mean
About _________% of the area is within 3 standard deviations of the mean
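If you want to confirm your blanks numerically rather than from memory, the areas can be computed directly. This is a small sketch using the standard normal from Python's statistics module; the function name area_within is ours.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal


def area_within(k):
    """Area within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 1), "%")
```

The Empirical Rule rounds these to the familiar whole-number percentages.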
Find the probabilities:
P ( z  1.645) 
P ( z  2.575) 
P (1.96  z  2.33) 
Find the following percentiles:
P95
P75
P50
P35
Enrichment:
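[Figure: graph of a probability distribution, with c marked on the vertical axis and 5 and 10 marked on the horizontal axis]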
Here is a probability distribution. Find the value of c.
Find the probability that x is between 0 and 3.
Find the probability that x is between 2 and 9.
ND problem 1
Air Force ejection seats are designed for people weighing between 140 lb and 211
lb. Women’s weights are normally distributed with a mean of 143 lb and a standard
deviation of 29 lb. What percentage of women have weights in those limits?
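This is the rescaling idea from earlier in the section: convert each limit to a z-score, then use the standard normal. The sketch below does it both ways, once through z-scores and once by handing the mean and standard deviation directly to NormalDist, so you can see that they agree.

```python
from statistics import NormalDist

mu, sigma = 143, 29   # women's weights, from the problem
low, high = 140, 211  # ejection seat design limits


def z_score(x):
    """Rescale x to the standard normal curve."""
    return (x - mu) / sigma


std = NormalDist()
via_z = std.cdf(z_score(high)) - std.cdf(z_score(low))

weights = NormalDist(mu, sigma)
direct = weights.cdf(high) - weights.cdf(low)

print(round(z_score(low), 2), round(z_score(high), 2))
print(round(100 * via_z, 1), "% of women fall within the limits")
```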
ND problem 2
The airline industry wants the passenger seats to fit 98% of all males flying. Men
have hip widths that are normally distributed with a mean of 14.4 inches and a
standard deviation of 1 inch. Find P98 and the associated seat width.
What is the formula for standard deviation?
$\sigma = \dfrac{x - \mu}{z}$
ND problem 3
The lengths of pregnancies are normally distributed with a mean of 268 days and a
standard deviation of 15 days.
A woman wrote to Dear Abby claiming that she gave birth 308 days after a brief
visit with her husband, who was in the Navy and otherwise away at sea. Is this credible?
A baby is considered premature if the length of the pregnancy is in the lowest 4%…what length of time is this?
Can you figure out how we could use this fact to help hospital administrators?
7.6
Normal Approximations
Sometimes, when everything is right, you may approximate a binomial
distribution as a normal distribution and use the far easier calculations for the
normal distribution.
What is “everything is right”?
1. The binomial distribution is “smooth”, not “chunky”.
2. The binomial distribution is symmetric, not skewed.
3. The number of trials n times the smaller of p and q is greater than 5: n · min(p, q) > 5.
If you’ve got these three things you are good to approximate.
Now, we can look at the TI way to do this (page 221, bottom). Let’s compare with
another way on page 222. We’ll use the “continuity correction” (page 223) with
abandon! It’s a sort of “split the difference” way to manage the discrete nature of
real binomial data!
Let’s go through the calculations on pages 224 – 226
Now let’s look at “tossing tacks” page 227
Approximating normal
When an airliner is loaded with passengers, baggage, and cargo plus fuel, the pilot
must verify that the gross weight is below a maximum and that the weight is
properly distributed for safety.
An airline has established a procedure in which extra cargo must be eliminated
whenever a 200 person plane has at least 120 men.
Assume that the population is 50/50 men and women.
Check to make sure we can approximate:
Get the mean and standard deviation:
Mean:
Sigma: did you get 7.0710678?
Continuity Correction: 119.5 to 120.5
We want “at least 120 men”…120 and to the RIGHT…sketch this!
Now find the area that is shaded.
z = 2.76
What is the probability? Do we need to worry much about this?
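The whole airline calculation can be sketched in a few lines, which also lets us compare the normal approximation with the exact binomial sum. The numbers n = 200 and p = 0.5 come from the problem; the exact sum is included only as a comparison and is not part of the procedure in the notes.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.5
q = 1 - p

# Check that we can approximate: n * min(p, q) > 5
ok_to_approximate = n * min(p, q) > 5

mean = n * p
sigma = sqrt(n * p * q)

# "At least 120" with the continuity correction: right of 119.5
z = (119.5 - mean) / sigma
approx = 1 - NormalDist().cdf(z)

# Exact binomial P(X >= 120), for comparison only
exact = sum(comb(n, k) for k in range(120, n + 1)) / 2**n

print(ok_to_approximate)  # True
print(round(sigma, 7))    # 7.0710678
print(round(z, 2))        # 2.76
print(round(approx, 4), round(exact, 4))
```

The shaded area is small, so the two methods should land close together; compare the last line of output with what you get from Table 1.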
Using continuity corrections:
Wording            Region
At least 120       to the right of 119.5
More than 120      to the right of 120.5
At most 120        to the left of 120.5
Fewer than 120     to the left of 119.5
Exactly 120        between 119.5 and 120.5
AN Problem 1
In a study of 420,000 cell phone users in Denmark, it was found that 135 developed
brain cancer. Assuming cell phones have no effect, there is a 0.000340 probability
of a person developing brain cancer. We would, then, expect 143 cases among
420,000 randomly chosen people.
Estimate the probability of 135 or fewer cases of such cancer in the randomly
chosen population.
What do these results suggest about media reports that cell phones cause brain
cancer?
AN Problem 2
After being rejected for employment, Ms. Kim learns that this company has hired
only 21 women applicants among its 62 new employees. She also learns that the
pool of applicants is very large with equal numbers of qualified men and women.
The company claims no unfair discrimination in hiring. Kim feels differently.
Run the numbers and decide how you feel.
AN Problem 3
45% of humans have Type O blood. A hospital is running low on Type O blood
and runs a blood drive…it needs 177 units of this type of blood. Assume 1 unit per
donor. If 400 volunteers show up, what is the probability that at least 177 of them
will have Type O blood? Are the 400 volunteers enough?
Chapter 8 – Distributions from Random Samples
8.1
Random Sampling
Let’s go with the book’s comment about defining “random” by what it’s NOT:
systematic, logical, having a clear pattern or order.
In statistics, random has to do with the process of picking a sample – each element
in the population has an equal chance of being chosen.
Let’s look at Classroom Exploration 8.1 page 235
Let’s read it – will there be repetitions in the scenario?
Plan A: 24 cards, one name per card
Plan B: roll a die – the number on top is the row number
Plan A: how many possible samples? Equally likely?
Plan B: how many possible samples? Equally likely?
Question 3 and Question 4
Picking Amy?
Let’s now read page 237 at the top: an excerpt…
Note that in this part of the class we are doing inferential statistics – we want to
infer some conclusion about the population from our work…and we want to
quantify how reliable this conclusion is.
Now let’s read the Focus on Understanding project that starts on page 237…and
check out the results from doing it on page 240.
What do you notice about the dot plots?
What can you conclude about small samples vs bigger samples?
Note that we look at a range of values for the mean – why do we do this? What are
we trying to ensure by doing this? See the discussion in the middle of page 241
for more on these ideas.
8.2
The Distribution of Sample Means
The mean of a random sample is an estimator of the true population mean. It can
be a good estimate or a poor estimate. We want to ensure that it’s a good one!
How can we do this?
A. We want the mean to be unbiased.
We can check this by finding the expected value of the SAMPLE means.
If the expected value of the sample means is the true mean, then the sample mean is an unbiased estimator.
Operationally, the more perfectly random your samples, the less biased
your sample means are.
B. We want a large sample size, not a small one.
Operationally, n = 30 is a commonly used minimum sample size, but more is better if
you can afford it!
When we have these, then the sample means are approximately normally distributed
about the true mean, $\mu$.
This is so important! And took so long to discover!
Page 249
The Central Limit Theorem:
Regardless of the distribution of the population being sampled, the distribution of
sample means taken from random samples of size n is approximately normally
distributed when n is large.
See the caution on page 240 at the bottom of the last paragraph.
The mean of the sample means is the true population mean and the standard
deviation is the population standard deviation divided by the square root of n.
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
Let’s discuss that standard deviation:
Suppose n is small
Suppose n is large
Now compare the two dot diagrams on page 240 again.
So now, suppose we have 50 samples (random!) and we calculate the mean of each.
We then have a list of sample means as our data.
We find the mean of these sample means and the standard deviation of these sample
means. What do we know about the original population?
We know the means are the same, and we can multiply our standard deviation by $\sqrt{n}$
(where n is the size of each sample) to estimate the original population standard deviation. Do you see how?
What DON’T we know?
The shape of the original distribution!
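A quick simulation makes this concrete. The sketch below draws many random samples from Nicky's free-throw table, records each sample mean, and then works back toward the population values. The sample size of 50, the 2000 repetitions, and the seed are our choices for illustration, picked so the output is stable; they are not from the textbook.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

values = [0, 1, 2]
weights = [0.40, 0.24, 0.36]  # Nicky's table: mean 0.96
n = 50                        # size of each sample (our choice)
num_samples = 2000            # number of samples (our choice)

sample_means = [
    fmean(random.choices(values, weights, k=n))
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)
sd_of_means = pstdev(sample_means)

print(round(mean_of_means, 3))         # close to the population mean
print(round(sd_of_means, 3))           # close to sigma / sqrt(n)
print(round(sd_of_means * sqrt(n), 3)) # estimate of population sigma
```

Notice what the list of sample means cannot tell us: nothing in it reveals that the original distribution had only three possible values.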
Let’s look at the example on page 250:
Back to Nicky’s free throws!
Recall her distribution (page 250 – mean is .96). Now we’ll look at a simulation of
size 50.
Let’s go through the calculations to find the mean and standard deviation for the
distribution of the sample means.
How do you find the mean and the standard deviation? What are the formulas?
WHERE are the formulas in the textbook?
Now let’s walk through Lauren’s simulation of doing 50 free throws and calculate
the probability that Lauren’s sample mean will be within .1 of the actual mean.
See page 251
Suppose we do this 4 times and take the AVERAGE mean from those 4
attempts…will this be more accurate than doing it just once? Why or why not?
What we are doing here with that “0.1” is finding an error bound or margin of
error. The probability that our estimate is within the given error bound is what we
calculated in this example. The probability is called the “confidence level” of our
estimate. For a fixed error bound, the confidence level of an estimate goes up as n increases.
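Here is a sketch of that calculation, using the normal model for the distribution of sample means. The population mean and standard deviation are computed from Nicky's table. The comparison with n = 200 is our addition: averaging the means of 4 runs of 50 shots is the same as taking the mean of all 200 shots.

```python
from math import sqrt
from statistics import NormalDist

table = {0: 0.40, 1: 0.24, 2: 0.36}  # Nicky's distribution
mu = sum(x * p for x, p in table.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in table.items()))


def confidence_level(error_bound, n):
    """P(sample mean is within error_bound of mu), normal model."""
    xbar = NormalDist(mu, sigma / sqrt(n))  # sample means, NOT originals
    return xbar.cdf(mu + error_bound) - xbar.cdf(mu - error_bound)


level_50 = confidence_level(0.1, 50)
level_200 = confidence_level(0.1, 200)  # 4 runs of 50, pooled

print(round(level_50, 3))
print(round(level_200, 3))
```

The second number is larger than the first, so with the error bound held at 0.1 the pooled estimate earns a higher confidence level.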
Let’s review our procedure from a Big Picture viewpoint.
We got our sample and calculated the mean
We then went to z-scores* to find the probability “between”
We used Table 1 or our calculators to get the probability
We described our confidence level in our estimate
*and we used the distribution of the SAMPLE MEANS not the original distribution
in our calculations!
SD – Problem 1
A company that specializes in data analysis tests all its applicants for employment
by having them solve three short problems that are indicative of the type of work
they will be required to perform. An applicant is given a score from 0 to 10 for
each problem. From the performances of previous applicants, the sampling
distribution of mean scores has been found to be as shown in the table below.
Sketch this distribution on the right:
Mean    Prob
0       .001
1       .005
2       .010
3       .045
4       .060
5       .100
6       .150
7       .350
8       .200
9       .070
10      .009
Use your calculator to find the mean (6.570) and standard deviation (1.63)
Page numbers for formulas:
Check the Empirical Rule on your distribution. What is the z-score for 8?
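If you'd like to confirm the calculator values, the same table can be run through the expected value and variance formulas from Chapter 7. This is a sketch of that check, plus the z-score for 8 and the area within one standard deviation.

```python
from math import sqrt

table = {0: .001, 1: .005, 2: .010, 3: .045, 4: .060, 5: .100,
         6: .150, 7: .350, 8: .200, 9: .070, 10: .009}

total_prob = sum(table.values())
mu = sum(x * p for x, p in table.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in table.items()))

z_for_8 = (8 - mu) / sigma

# Probability within one standard deviation of the mean
within_1 = sum(p for x, p in table.items() if abs(x - mu) <= sigma)

print(round(mu, 3))      # 6.57
print(round(sigma, 2))   # 1.63
print(round(z_for_8, 2))
print(round(within_1, 3))
```

Compare within_1 with the Empirical Rule's figure for one standard deviation and decide whether this skewed distribution follows the rule well.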
SD Problem 2
The number of patients admitted per day to a medium-sized regional hospital averages 35
with a standard deviation of 10. If, on a given day, there are 60 beds available for
new patients, do you think the hospital will have to divert emergency vehicles to
another hospital?
SD Problem 3
The sampling distribution of X, the number of people who arrive at a cashier’s
counter in a bank per minute is given below:
X     P(X)
0     .36
1     .38
2     .18
3     .06
4     .02
Verify the Empirical Rule.
8.3
The Distribution of Sample Proportions
Proportions have a place in statistics. We often use a sample proportion from a
random sample to estimate the true proportion of a population that has a specific
property.
Let’s look at Classroom Exploration 8.3 on page 253…
Let’s look at “drawing more blocks” page 254…
And look at the proportion on the bottom of page 254 to see how this differs a bit
from a sample mean.
Class discussion: What are the differences?
“Hat” or caret notation is discussed on page 255 at the top…we have special
notation to use when we are talking about a sample proportion: $\hat{p}$
The expected value of “p-hat”: page 256
The standard deviation of “p-hat”: page 258
The distribution – no surprises here!
SP Problem 1
Suppose a warship takes 6 shots at a target, and it takes at least 4 hits to sink the
target. If the warship has a record of hitting with 20% of its shots, in the long run,
what is the probability of sinking the target?
Is this binomial?
Sketch the distribution … make a table first.
Answer the question.
SP Problem 2
Let’s consider the 107th Congress: There are 100 senators (2 per state). At that
time, there were 87 males and 13 females.
What is the population proportion of each type of senator?
Suppose we take random samples of size 10.
S1    MFMMFMMMMM
S2    MFMMMMMMMM
S3    MMMMMMFMMM
S4    MMMMMMMMMM
S5    MMMMMMMMFM
Calculate the sample proportions.
Now suppose we go on and do 95 more samples, resulting in the following table.
Sketch the frequency table:
Prop F    Freq
0.0       26
0.1       41
0.2       24
0.3       7
0.4       1
0.5       1
Check that the mean is 0.119 and the standard deviation is 0.100
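The check can be sketched as follows, together with a comparison against theory. The theoretical values use the standard results that the expected value of p-hat is p and its standard deviation is the square root of p(1 − p)/n; that formula assumes independent draws, so treat it as an approximation here and compare it with the boxes on pages 256 and 258.

```python
from math import sqrt

# Proportion of females in each sample of 10 : number of samples
freq = {0.0: 26, 0.1: 41, 0.2: 24, 0.3: 7, 0.4: 1, 0.5: 1}

total = sum(freq.values())
mean_phat = sum(v * f for v, f in freq.items()) / total
sd_phat = sqrt(
    sum((v - mean_phat) ** 2 * f for v, f in freq.items()) / total
)

p, n = 13 / 100, 10              # population proportion, sample size
theory_mean = p
theory_sd = sqrt(p * (1 - p) / n)  # ASSUMES independent draws

print(total)                # 100
print(round(mean_phat, 3))  # 0.119
print(round(sd_phat, 3))    # 0.1
print(round(theory_mean, 3), round(theory_sd, 3))
```

With only 100 samples the observed values sit near the theoretical ones without matching them exactly.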
What would the frequency table look like if we did this 10,000 times?
SD Problem 3
Here is the population of all 5 US Presidents who had professions in the military,
along with their ages at inauguration:
President     Age
Eisenhower    62
Grant         46
Harrison      68
Taylor        64
Washington    57
Assume that samples of size 2 are randomly selected WITH REPLACEMENT.
How many samples are possible?
What is the mean of each sample?
Make a frequency table for these means…is this a sampling distribution?
What is the distribution for these means?
What is the mean of the table? How does this compare with the actual mean of the
presidents?