Ch. 7- Random Variables – Stahler - Statistics - Page 1
Essential Question(s): How do we know the difference between a discrete and continuous random variable?
RV: Use your calculator to find the following probabilities. Draw a picture.
1) P(z>1.56)
2) P(z<1.99)
3) P(-3.44<z<4.51)
Suppose you are counting the number of tails when flipping a coin 4 times. We could list the sample space S =
{HHHH, HHHT, HHTH, HTHH, . . . , TTTT}, but all we are interested in is the number of tails. Let X represent the
number of tails flipped. We call X a random variable because its value (0, 1, 2, 3, or 4) is a number
determined by the outcome of a random process.
Random Variable: a variable whose value is a numerical outcome of a random phenomenon
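The coin-flip example can be sketched in Python. This is only an illustration: it assumes a fair coin, and the names `outcomes` and `tails_dist` are made up for this sketch. It lists all 16 equally likely outcomes and tallies X, the number of tails, for each.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely outcomes of flipping a fair coin 4 times.
outcomes = ["".join(flips) for flips in product("HT", repeat=4)]

# X = number of tails in each outcome; count how often each value occurs.
counts = Counter(outcome.count("T") for outcome in outcomes)

# Probability distribution of X as exact fractions.
tails_dist = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in tails_dist.items():
    print(f"P(X = {x}) = {p}")
```

Many different outcomes in S collapse to the same value of X, which is why the distribution of X is much shorter than the sample space.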
Two Types of Random Variables
1) Discrete Random Variable (count)
2) Continuous Random Variable (measure)
Discrete Random Variable: has a countable number of possible values
Ex: Roll two dice and find the sum
Ex: The number of students in each class
Ex: The number of cars in the parking lot
Ex:
Ex: Suppose you are working for a contractor who is building 500 single-family houses. They want to know
how many parking spaces will be needed per house. You gather data from a sample of 500 single-family
houses that closely resemble the houses in the community you are building. Assume there are at most 4
vehicles per house.
# of Vehicles:          0      1      2      3      4
Proportion of Houses:   0.088  0.332  0.385  0.137  0.058
Create a histogram representing this data.
How many houses have more than 2 cars?
How many houses have 2 or more cars?
The contractor now wants to build 500 duplexes (shared driveways). Is there a way to use our previously found
data to find the distribution of the number of vehicles per duplex?
What is the minimum and maximum number of vehicles that could be in a duplex driveway?
How could we randomly combine two houses' worth of driveways from the previously collected data? Describe
2 different methods.
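Two possible approaches can be sketched in Python. Both assume the two halves of a duplex behave like two independent houses drawn from the single-family data, which is a modeling assumption and not something the data guarantees. The function names are hypothetical. The first method computes the exact distribution of the total; the second simulates 500 duplexes.

```python
import random

# Proportion of houses with 0, 1, 2, 3, 4 vehicles (single-family data).
house_probs = [0.088, 0.332, 0.385, 0.137, 0.058]

def exact_duplex_dist(probs):
    """Method 1: combine every pair of house counts, assuming independence."""
    totals = [0.0] * (2 * len(probs) - 1)
    for i, p_i in enumerate(probs):
        for j, p_j in enumerate(probs):
            totals[i + j] += p_i * p_j
    return totals

def simulate_duplex_dist(probs, n_duplexes=500, seed=0):
    """Method 2: draw two houses at random per duplex and tally the totals."""
    rng = random.Random(seed)
    counts = list(range(len(probs)))
    tallies = [0] * (2 * len(probs) - 1)
    for _ in range(n_duplexes):
        a, b = rng.choices(counts, weights=probs, k=2)
        tallies[a + b] += 1
    return [t / n_duplexes for t in tallies]

exact = exact_duplex_dist(house_probs)
simulated = simulate_duplex_dist(house_probs)
for total, (e, s) in enumerate(zip(exact, simulated)):
    print(f"{total} vehicles: exact {e:.3f}, simulated {s:.3f}")
```

A simulation of only 500 duplexes will wander around the exact values, so two runs with different seeds will not match each other exactly.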
Suppose your simulation for 500 duplexes is shown below. What can we conclude?
# of Vehicles:      0      1      2      3      4      5      6      7      8
Prop. of Duplexes:  0.008  0.058  0.142  0.306  0.250  0.160  0.064  0.010  0.002
Construct a histogram representing this data.
Compare the shape, center, and spread of both histograms.
Discrete Probability Distribution (we saw this last chapter)
_______________________________
Value of X:   x_1  x_2  x_3  ...  x_k
Probability:  p_1  p_2  p_3  ...  p_k
_______________________________
The probabilities p_i must satisfy two requirements:
1) Every probability p_i is between 0 and 1.
2) The sum of the probabilities must equal 1.
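The two requirements can be checked mechanically. The helper below is a minimal sketch; the name `is_valid_distribution` and the rounding tolerance are assumptions made for illustration.

```python
import math

def is_valid_distribution(probs, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    all_in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return all_in_range and sums_to_one

# Proportions of houses with 0, 1, 2, 3, 4 vehicles (from the earlier example).
print(is_valid_distribution([0.088, 0.332, 0.385, 0.137, 0.058]))

# A hypothetical list that fails requirement 2 because it sums to 0.9.
print(is_valid_distribution([0.5, 0.4]))
```

The tolerance is there because decimal probabilities are stored as floating-point numbers, so their sum may differ from 1 by a tiny rounding error.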
Ex: In years, how old are you? How do we know this is a DRV? What values can we assign to the random
variable X? Create a probability distribution (model).
HW: 1) Create probability histograms to compare the probability distribution of student ages (previous page)
with the probability distribution of the ages of teachers who were hired last year (let’s assume that only
22-26 year-olds were hired). Make sure to compare shape, center, and spread.
Age (X):      22    23    24    25    26
Probability:  0.45  0.25  0.1   0.15  0.05
2) A couple plans to have three children. There are 8 possible arrangements of girls and boys (Fundamental
Counting Principle). For example GGB means the first two children are girls and the third is a boy. All 8
arrangements are (approximately) equally likely.
Create the probability distribution that represents the number of girls the couple can have.
3) A study of social mobility in England looked at the social class reached by the sons of lower-class fathers.
Social classes are numbered from 1 (low) to 5 (high). Take the random variable X to be the class of a randomly
chosen son of a father in Class 1. The study found that the distribution of X is...
Son’s Class:  1     2     3     4     5
Probability:  0.48  0.38  0.08  0.05  0.01
a) What percent of the sons of lower-class fathers reached at least Class 3? Write this in probability notation.
b) What percent of the sons of lower-class fathers reached at most Class 3? Write this in probability notation.
c) What percent of the sons of lower-class fathers reached less than Class 3? Write this in probability notation.
d) Describe how you could simulate the percent of sons in Class 1 or 2 of fathers in Class 1.
Essential Question(s): How do you calculate the expected value and standard deviation of a discrete probability distribution?
An insurance company offers a “death and disability” policy that pays $10,000 when you die or $5,000 if you
are permanently disabled. It charges a premium of $50/year for this benefit. Will the insurance company make
a profit selling this policy?
What are the possible outcomes for the policy holder?
Let the random variable X represent the amount the insurance company will pay out. What values can X
assume? How do we know this is a discrete probability distribution?
Suppose that the death rate for any year is 1 out of every 1000 people, and that another 2 out of 1000 suffer
some kind of disability. Represent this data in a probability model (distribution).
Policyholder Outcome    Payout X    Probability P(X)
What can we expect (not predict) if the company insures exactly 1000 people for one year? Total payout?
How much will the company pay out per customer? How much does the company profit from each customer?
We will call the average payout per policy the expected value (denoted \(\mu_X\) or E(X)), which is a parameter of this model.
Parameter: a measurable characteristic of a population
Statistic: a measurable characteristic of a sample
How did we find the expected value, or payout per policy?
\[ \mu = E(X) = \frac{10{,}000(1) + 5{,}000(2) + 0(997)}{1000} = 10{,}000\left(\frac{1}{1000}\right) + 5{,}000\left(\frac{2}{1000}\right) + 0\left(\frac{997}{1000}\right) = \$20 \]
Expected Value (discrete random variable): the average value of the outcomes:
Multiply each probability by the possible value and find the sum.
\[ \mu_X = E(X) = \sum x_i \, P(x_i) \]
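The expected-value formula can be applied to the insurance policy in a few lines of Python. This is a sketch using the payouts and rates stated in the example; the function name `expected_value` is an illustrative choice, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_value(values, probs):
    """E(X) = sum of x_i * P(x_i) over all possible values."""
    return sum(x * p for x, p in zip(values, probs))

# Payouts for death, disability, and neither, with their yearly probabilities.
payouts = [10_000, 5_000, 0]
probs = [Fraction(1, 1000), Fraction(2, 1000), Fraction(997, 1000)]

mean_payout = expected_value(payouts, probs)   # expected payout per policy
profit_per_policy = 50 - mean_payout           # $50 premium minus expected payout
print(mean_payout, profit_per_policy)
```

No single policyholder ever receives the expected value; it is the long-run average payout per policy across many policyholders.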
Ex: Suppose you and your lover go out for breakfast in Las Vegas. If you order pancakes, you are eligible for
the “flapJACK” discount. As the waiter brings the check, he also brings 4 Jacks from a standard deck of playing
cards. The cards are face down and you get to turn over one card. If it is a black jack, you owe the full amount
for your meal; if it is the jack of hearts, you get a $10 discount. If the first card you turn over is the jack of
diamonds, you get another turn with the three remaining cards but only earn a $5 discount if you flip the jack of
hearts (you get nothing if you flip a black jack).
Create a probability distribution (model) for the discount awarded (tree diagram might help). Find the expected
savings per meal (expected value). Explain what this means in the context of the problem.
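One way to organize the tree diagram is to multiply along each branch. The sketch below assumes each face-down card is equally likely to be chosen; the variable names are illustrative only.

```python
from fractions import Fraction

# First flip: 4 jacks, so each card has probability 1/4.
p_hearts_first = Fraction(1, 4)                            # $10 discount

# Diamonds first (1/4), then hearts among the 3 remaining cards (1/3).
p_diamonds_then_hearts = Fraction(1, 4) * Fraction(1, 3)   # $5 discount

# Every other path (a black jack on either flip) earns no discount.
p_no_discount = 1 - p_hearts_first - p_diamonds_then_hearts

discount_dist = {10: p_hearts_first, 5: p_diamonds_then_hearts, 0: p_no_discount}
expected_discount = sum(d * p for d, p in discount_dist.items())
print(discount_dist)
print(float(expected_discount))
```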
Ex: You take your car to the shop. The mechanic identifies the air conditioner problem as dirt in a control unit.
She said that in about 75% of such cases, recharging the coolant multiple times cleans up the problem (at a
cost of $60). If that fails, then the control unit must be replaced at an additional cost of $140.
a) Define the random variable and construct the probability model.
b) What is the expected value of the repair? Explain what this means in the context of the problem.
RV: What is standard deviation? What variables do we use to represent the parameter and statistic?
RV: What is the difference between standard deviation and variance?
\[ \sigma^2 = Var(X) = \sum (x - \mu)^2 \, P(x) \]
\[ \sigma = SD(X) = \sqrt{\sum (x - \mu)^2 \, P(x)} \]
Find and interpret the standard deviation for the car repair problem above.
Verify the results. Enter the values of the random variable in L1 and the probabilities in L2. Use 1-Var Stats L1,L2.
If you are using a newer calculator, use L2 as the FreqList.
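The same check can be done in Python. This sketch reads the car repair problem as a total cost of $60 with probability 0.75 and $60 + $140 = $200 with probability 0.25; that reading, and the function name `mean_var_sd`, are assumptions of this example.

```python
import math

def mean_var_sd(values, probs):
    """Mean, variance, and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, var, math.sqrt(var)

costs = [60, 200]       # total repair cost in dollars
probs = [0.75, 0.25]    # recharge works / control unit must be replaced

mu, var, sd = mean_var_sd(costs, probs)
print(f"mean = {mu:.2f}, variance = {var:.2f}, SD = {sd:.2f}")
```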
HW:
1) As the head of inventory for Knowway computer company, you were thrilled that you had managed to ship 2
computers to your biggest client the day the order arrived. You are horrified, though, to find out that someone
had restocked refurbished computers in with the new computers in the storeroom. The shipped computers
were selected randomly from 15 computers in stock, but 4 of those were actually refurbished.
If your client gets 2 new computers, things are fine. If the client gets one refurbished computer, it will be sent
back at your expense ($100) and you can replace it. However, if both computers are refurbished, the client will
cancel the order this month and you’ll lose a total of $1000.
What is the expected value and the standard deviation of the company’s loss? *Hint: Draw a tree diagram.
2) A couple plans to have children until they get a girl, but they agree that they will not have more than three
children even if all are boys (assume boys and girls are equally likely).
a) Create a probability distribution for the number of children the couple might have.
b) Find the expected number of children and standard deviation.
c) Find the expected number of girls.
3) Let’s play a game. You pay me $25 to play. I take a standard deck of cards. I shuffle the deck well. You pick
1 card at random. The suit of that card becomes your winning suit. You do not put the card back. You pick 2
more cards at random. If those 2 cards are both the winning suit, I will pay you $500 (plus give you back the
initial bet). If not, I pay you nothing. Do you want to play? Justify your reasoning.
4) A nonprofit plans to hold a raffle to raise funds for its operations. A total of 1,000 raffle tickets will be sold for
$1.00 each. After all the tickets are sold, one ticket will be selected at random and its owner will receive
$50.00. The expected value for the net gain for each ticket is -$0.95. What is the meaning of the expected
value in this context (2012)?
a) The ticket owners lose an average of $0.05 per raffle ticket.
b) The ticket owners lose an average of $0.95 per raffle ticket.
c) Each ticket owner will lose $0.95 per raffle ticket.
d) A ticket owner would have to purchase 19 more tickets for the expected value of his or her net gain to
increase to $0.00.
e) A ticket owner has a 95 percent chance of having a ticket that is not selected.
Essential Question(s): How do means and standard deviations of random variables change?
RV: The following were exam grades for the last chapter (probability) test: 22, 24, 18, 17, 19, 20. Find the
mean, standard deviation, and variance of the sample.
Suppose I deducted 3 points from each student. How does the mean change? Standard deviation? Variance?
Suppose I gave a 10% bonus to each student. How does that change the mean? Standard deviation?
Variance?
Suppose I gave an 8% bonus then 2 points to each student. Find the new mean, StDev, and variance.
Typically during the holiday season you spend $250 on gifts with a standard deviation of $23. Your hours were
reduced at work, so this year you are looking to spend 10% less than in previous years. How are the mean,
standard deviation, and variance going to change?
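Rules for adding a constant c to a random variable:
\(E(X \pm c) = E(X) \pm c\)
\(Var(X \pm c) = Var(X)\)
\(SD(X \pm c) = SD(X)\)
\(\mu_{X \pm c} = \mu_X \pm c\)
\(\sigma^2_{X \pm c} = \sigma^2_X\)
\(\sigma_{X \pm c} = \sigma_X\)
Rules for multiplying a random variable by a constant a:
\(E(aX) = a \, E(X)\)
\(Var(aX) = a^2 \, Var(X)\)
\(SD(aX) = |a| \, SD(X)\)
\(\mu_{aX} = a \, \mu_X\)
\(\sigma^2_{aX} = a^2 \, \sigma^2_X\)
\(\sigma_{aX} = |a| \, \sigma_X\)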
Rules for means and variances for TWO random variables.
1) If X and Y are any two random variables, then:
\(\mu_{X+Y} = \mu_X + \mu_Y\)
\(\mu_{X-Y} = \mu_X - \mu_Y\)
2) If X and Y are independent random variables, then:
\(\sigma^2_{X+Y} = \sigma^2_X + \sigma^2_Y\)
\(\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y\) *Note: variances always ADD
Why Independent?
*If you want to add/subtract standard deviations, you must first square the values (find the variance),
then add, and then take the square root. \(a + b \neq \sqrt{a^2 + b^2}\)
Ex: You are placing your books in your backpack. The mean book weight is 1.3 pounds with a standard
deviation of 0.4 pounds. You put 5 books in your backpack. What is the mean weight and standard deviation of
the backpack with the books in it (ignore the weight of the backpack)? Assume book weights are independent.
What if you removed the books from the backpack?
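The backpack calculation can be sketched as follows, treating the total as the sum of 5 independent book weights. The variable names are illustrative. The last line shows the common mistake of adding standard deviations directly.

```python
import math

n_books = 5
book_mean, book_sd = 1.3, 0.4   # pounds, from the example

total_mean = n_books * book_mean      # means add
total_var = n_books * book_sd ** 2    # variances add for independent weights
total_sd = math.sqrt(total_var)       # take the square root at the end

naive_sd = n_books * book_sd          # adding SDs directly overstates the spread

print(f"mean = {total_mean:.2f} lb, SD = {total_sd:.3f} lb (not {naive_sd:.1f} lb)")
```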
Why do we add variances even when subtracting? \(\sigma^2_{X-Y} = \sigma^2_X + \sigma^2_Y\)
Article
Suppose you are digging a hole to plant a new tree you purchased. You remove a wheelbarrow full of dirt from
the yard. Now, did you get an EXACT wheelbarrow full? No, there is some variance there.
As you try to plant the tree, you realize that you did not dig the hole deep enough. You take a bucket and
scoop out a bucket full of dirt. Did you have an EXACT bucket full? No, there was some variance.
We subtracted dirt from the hole. Although we subtracted, the variance increased (added) because we never
subtracted an exact bucket full.
The variability of the dirt removed has increased even though we dug the hole deeper (subtracted more dirt).
Moral: Every time something happens at random, whether it adds to the pile or subtracts from it, uncertainty
(read "variance") increases.
Ex: Suppose you started your own business on the weekends. The business can bring you a profit on
Saturday and Sunday based on various (independent) scenarios. The profits and their respective probabilities
for Saturday and Sunday are listed below. Calculate the following.
Sat. Profit (X):  $10   $50   $100
P(X):             0.1   0.5   0.4
Sun. Profit (Y):  $20   $60   $80
P(Y):             0.2   0.3   0.5
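\(\mu_X\)
\(\mu_Y\)
\(\mu_X + \mu_Y\)
\(\mu_X - \mu_Y\)
\(\sigma^2_X\)
\(\sigma^2_Y\)
\(\sigma^2_X + \sigma^2_Y\)
\(\sigma_X\)
\(\sigma_Y\)
\(\sigma_X + \sigma_Y\)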
Ex: The distribution of SAT scores of college-bound male seniors has a mean of 1532 and a standard
deviation of 312. The distribution of SAT scores for college-bound female seniors has a mean of 1506 and a
standard deviation of 304. One male and one female are randomly selected. Assume their scores are
independent.
a) What is the average sum of their scores?
b) What is the average difference of their scores?
c) What is the standard deviation of the difference in their scores?
HW:
1) The following table gives the ages of students enrolled at the University of Texas, San Antonio.
Age:         17-22  23-29  30+
Percentage:  44.2   31.4   24.2
a) Make a reasonable estimate of the center of each age group, and compute the mean and standard deviation
of the ages of these students.
b) Describe how to use this list of random digits to take a sample of six ages.
30558 45957 36911 97199 08432
2) The passenger vehicle with the highest theft loss is the Cadillac Escalade EXT, with 20.2 claims for theft per
1000 insured vehicles per year and an average payment of $14,939 per claim. How much would you charge an
owner of a Cadillac Escalade EXT for theft insurance per year if you simply expect to break even?
3) Explain why you can calculate the mean for combined Verbal and Math SAT scores but you cannot
calculate the variance and standard deviation.
4) You are selling handmade keychains (much like Deb from Napoleon
Dynamite). The weights of the keychains are independent with a mean
of 3.1 grams and a standard deviation of 0.25 grams. You can store 10
keychains in a fanny pack. The weights of empty fanny packs have a
mean of 340.5 grams and a standard deviation of 4.98 grams. What is
the mean and standard deviation of a fanny pack filled with 10
keychains?
Essential Question(s): How do we describe continuous random variables?
RV: Use your calculator to find the following probabilities. Draw a picture.
The class average for the last test was an 89% with a standard deviation of 4%. If you scored a 92%, in which
percentile would your score lie? What would the cutoff score be for the lower 10%? “normalcdf and invNorm”
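For readers without a graphing calculator, Python's `statistics.NormalDist` offers rough equivalents of normalcdf and invNorm. The sketch below assumes the test scores are approximately normal, which the problem implies but does not state.

```python
from statistics import NormalDist

# Test scores modeled as normal with mean 89 and standard deviation 4 (assumed).
scores = NormalDist(mu=89, sigma=4)

# Like normalcdf(lower = very small, upper = 92): proportion scoring below 92.
percentile = scores.cdf(92)

# Like invNorm(0.10): the score with 10% of the class below it.
cutoff = scores.inv_cdf(0.10)

print(f"A 92 is at about the {100 * percentile:.0f}th percentile")
print(f"Lower 10% cutoff is about {cutoff:.1f}")
```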
Ex: Suppose 2 people weighed themselves and both weigh 178.3 lbs. Do they weigh the same? Explain.
Continuous Random Variables: variables which can take any value in an interval, so there are infinitely many
possible values (usually measurements)
Ex: The weight of your brain
Ex: The distance to the person across from you
Ex: My age
Ex:
Continuous Probability Distribution: described by the area under a density curve
A continuous probability distribution differs from a discrete probability distribution in several ways.
● The probability that a continuous random variable will assume an exact value is zero.
● As a result, a continuous probability distribution cannot be expressed in tabular form.
● Instead, an equation or formula is used to describe a continuous probability distribution.
● We assign probabilities to intervals of outcomes rather than to individual outcomes.
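A uniform density is the simplest case: the curve is a flat line, so the probability of an interval is the area of a rectangle. The sketch below uses a hypothetical uniform distribution on [0, 2], and the function name `uniform_prob` is made up for this example.

```python
def uniform_prob(lo, hi, a, b):
    """P(a <= X <= b) for X uniform on [lo, hi]: area = width * height."""
    height = 1 / (hi - lo)                 # total area under the curve must be 1
    left, right = max(a, lo), min(b, hi)   # keep only the part inside [lo, hi]
    width = max(0.0, right - left)
    return width * height

print(uniform_prob(0, 2, 0.5, 1.5))   # probability of an interval
print(uniform_prob(0, 2, 1.0, 1.0))   # probability of one exact value
```

An exact value corresponds to a rectangle of zero width, which is why its probability is zero.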
Find the areas (shaded region) for the following density curves (uniform distribution).
Ex: Every summer you serve tables at the Wyomissing Diner. On average you earn $31.56 in tips with a
standard deviation of $4.74 for each day you work. The distribution of money earned is approximately normal
and we will assume independence. What is the probability that over the next 30 days you work, you will earn
more than $1,000?
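One way to set up the tips problem is to treat the 30-day total as a sum of 30 independent normal variables, so the means add and the variances add. The sketch below follows that approach; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n_days = 30
daily = NormalDist(mu=31.56, sigma=4.74)   # tips for one day, in dollars

# Sum of 30 independent days: mean = 30 * mu, SD = sqrt(30) * sigma.
total = NormalDist(mu=n_days * daily.mean, sigma=math.sqrt(n_days) * daily.stdev)

p_over_1000 = 1 - total.cdf(1000)
print(f"total ~ N({total.mean:.2f}, {total.stdev:.2f})")
print(f"P(total > 1000) = {p_over_1000:.4f}")
```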
Ex: Accurate labeling of packaged meat is difficult because of weight decrease due to moisture loss (defined
as a percentage of the package’s original net weight). Suppose that moisture loss for a package of chicken
breasts is N(4%, 1.1%) as suggested in the paper “Drained Weight Labeling for Meat and Poultry: An
Economic Analysis of a Regulatory Proposal”. Let X denote the moisture loss for a randomly selected package.
a) What is the probability X is at most 4.2%?
b) What is the probability X is at least 6.9%?
c) What is the probability moisture loss differs from the mean value by at least 1.3%?
d) Find a moisture loss value such that 90% of packages have a loss below that value.
Ex: Leona and Fred are friendly competitors in high school. Both are about to take the ACT college entrance
examination. They agree that if one of them scores 5 or more points better than the other, the loser will buy the
winner a pizza. Suppose that in fact Fred and Leona have equal ability, so that each score varies normally with
mean 24 and standard deviation 2. (The variation is due to luck in guessing and the accident of the specific
questions being familiar to the student.) The two scores are independent. What is the probability that the
scores differ by 5 or more points in either direction?
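A possible setup for this problem treats the difference of the two scores as a normal variable with mean 0 and variance equal to the sum of the two variances. The sketch below treats scores as continuous, ignoring that real ACT scores are whole numbers; the variable names are illustrative.

```python
import math
from statistics import NormalDist

leona = NormalDist(mu=24, sigma=2)
fred = NormalDist(mu=24, sigma=2)

# Difference of independent normals: means subtract, variances add.
diff = NormalDist(
    mu=leona.mean - fred.mean,
    sigma=math.sqrt(leona.variance + fred.variance),
)

# "Differ by 5 or more in either direction" covers both tails.
p_pizza = diff.cdf(-5) + (1 - diff.cdf(5))
print(f"SD of difference = {diff.stdev:.3f}, P(|difference| >= 5) = {p_pizza:.4f}")
```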
HW:
1) You have two balanced, six-sided dice. The first has 1, 3, 4, 5, 6, and 8 spots on its six faces. The second
die has 1, 2, 2, 3, 3, and 4 spots on its faces.
a) What is the mean number of spots on the up-face when you roll each of these dice?
Die 1:
Die 2:
b) Write the probability model for the outcomes when you roll both dice independently. From this, find the
probability distribution of the sum of the spots on the up-faces of the two dice.
c) Find the mean number of spots on the two up-faces in two ways: from the distribution you found in (b) and
by applying the addition rule to your results in (a). You should of course get the same answer.
2) A study of working couples measures the income X of the husband and the income Y of the wife in a large
number of couples in which both the partners are employed. Suppose that you knew the means \(\mu_X\) and \(\mu_Y\) and
the variances \(\sigma^2_X\) and \(\sigma^2_Y\) of both variables in the population.
a) Is it reasonable to take the mean of the total income X + Y to be \(\mu_X + \mu_Y\)? Explain your answer.
b) Is it reasonable to take the variance of the total income to be \(\sigma^2_X + \sigma^2_Y\)? Explain your answer.
3) Rotter Partners is planning a major investment. The amount of profit X is uncertain but a probabilistic
estimate gives the following distribution (in millions of dollars):
Profit:       1    1.5  2    4    10
Probability:  0.1  0.2  0.4  0.2  0.1
a) Find the mean profit and the standard deviation of the profit.
b) Rotter Partners owes its source of capital a fee of $200,000 plus 10% of the profits X. So the firm actually
sees Y = 0.9X - 0.2 from the investment. Find the mean and standard deviation of Y.
4) You have two scales for measuring weights in a chemistry lab. Both scales give answers that vary a bit in
repeated weighings of the same item. If the true weight of a compound is 2.00 grams (g), the first scale
produces readings X that have a mean of 2.000 g and standard deviation 0.002 g. The second scale’s
readings Y have a mean of 2.001 g and standard deviation 0.001 g.
a) What are the mean and standard deviation of the difference Y – X between the readings? (The readings X
and Y are independent.)
b) You measure once with each scale and average the readings. Your result is Z = (X + Y)/2. What
are \(\mu_Z\) and \(\sigma_Z\)? Is the average Z more or less variable than the reading Y of the less variable scale?