Discrete Populations and Probability Distributions
Models for Discrete Variables
Our study of probability begins much as any data analysis does: What is the distribution of the
data? Histograms, boxplots, percentiles, means, standard deviations - they all apply with at most
minor adjustments. We begin our study by considering discrete quantitative data. Remember: In
discrete data, a lot of the data are “ties.”
We’ll start with a simple random variable: The number x of credit cards a randomly selected
person has. A probability table and histogram for this data are shown.
x = # of credit cards a person has    p(x) = probability a person has x credit cards
0                                     0.20
1                                     0.30
2                                     0.20
3                                     0.15
4                                     0.10
5                                     0.05
[Figure: histogram of p(x). Horizontal axis: x = # of credit cards (0 to 5). Vertical axis: Probability (0.00 to 0.30).]
Because this is highly discrete data, percentiles are not very useful. Because of the large numbers
of ties, we use relative frequencies to summarize the data. The letter p is used for probability
(which is synonymous in a sense with proportion and relative frequency.)
Here’s how to obtain the value of the mean for the random variable x:
$$\text{Mean} = \mu = \sum x\,p(x)$$
Compute the product of the values with their relative frequencies. Sum these.
(This formula works even when there are no ties: p(x) = 1/N for each value.)
For the example, the mean computation is in the third column of the table.
x       p(x)    Mean: x p(x)       Variance/SD: (x – μ)² p(x)
0       0.20    0(0.20) = 0.00     (0 – 1.80)²(0.20) = 0.648
1       0.30    1(0.30) = 0.30     (1 – 1.80)²(0.30) = 0.192
2       0.20    2(0.20) = 0.40     (2 – 1.80)²(0.20) = 0.008
3       0.15    3(0.15) = 0.45     (3 – 1.80)²(0.15) = 0.216
4       0.10    4(0.10) = 0.40     (4 – 1.80)²(0.10) = 0.484
5       0.05    5(0.05) = 0.25     (5 – 1.80)²(0.05) = 0.512
Total   1.00    μ = 1.80           σ² = 2.060
                                   σ = √2.060 = 1.44
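As a cross-check on the table, here is a minimal Python sketch of the same arithmetic. The names `pmf`, `mean`, `variance` and `sd` are illustrative choices, not part of the handout; the probabilities are copied from the credit card table.

```python
import math

# Probability table for x = number of credit cards (copied from the table).
pmf = {0: 0.20, 1: 0.30, 2: 0.20, 3: 0.15, 4: 0.10, 5: 0.05}

# Mean: multiply each value by its probability, then add (third column).
mean = sum(x * p for x, p in pmf.items())

# Variance: squared deviation from the mean, weighted by probability (fourth column).
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance.
sd = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.3f}, sd = {sd:.2f}")
# prints: mean = 1.80, variance = 2.060, sd = 1.44
```

No division by a sample size appears anywhere in the sketch; the probabilities already carry it.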
Suppose you had the “ideal” sample of 100 data values from this situation. Then you would have
the following (these have been sorted):
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5
The mean of these is 1.80. (The standard deviation is 1.44.) You might notice that the mean
could be computed as follows:
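$$
\begin{aligned}
\text{MEAN} = \frac{\sum x}{n}
&= \frac{0 + 0 + \cdots + 1 + 1 + \cdots + 2 + 2 + \cdots + 3 + 3 + \cdots + 4 + 4 + \cdots + 5 + 5}{100} \\
&= \frac{0 \cdot 20 + 1 \cdot 30 + 2 \cdot 20 + 3 \cdot 15 + 4 \cdot 10 + 5 \cdot 5}{100} \\
&= 0 \cdot \frac{20}{100} + 1 \cdot \frac{30}{100} + 2 \cdot \frac{20}{100} + 3 \cdot \frac{15}{100} + 4 \cdot \frac{10}{100} + 5 \cdot \frac{5}{100} \\
&= 0(0.20) + 1(0.30) + 2(0.20) + 3(0.15) + 4(0.10) + 5(0.05) \\
&= 1.800
\end{aligned}
$$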
The two computations are identical. The μ = Σ x p(x) computation takes advantage of the
discreteness of the data – the large number of ties.
What happened to the “divide by the number of observations” in the formula for mean?
Reexamine the computations detailed above: This division is incorporated into the probabilities.
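The equivalence can also be checked numerically. The sketch below is illustrative only (the names `counts`, `sample` and the rest are not from the handout): it rebuilds the ideal sample of 100 values from its counts and compares the ordinary "add up and divide by n" mean with the relative-frequency form.

```python
import statistics

# Counts in the "ideal" sample of 100: 20 zeros, 30 ones, 20 twos, 15 threes, 10 fours, 5 fives.
counts = {0: 20, 1: 30, 2: 20, 3: 15, 4: 10, 5: 5}

# Expand the counts into the full sorted list of 100 values.
sample = [x for x, c in counts.items() for _ in range(c)]
n = len(sample)

# Ordinary mean: add every value, divide by n.
mean_from_list = sum(sample) / n

# Weighted form: the division by n is folded into the relative frequencies c / n.
mean_from_rf = sum(x * (c / n) for x, c in counts.items())

# Standard deviation of the list, dividing by n (statistics.pstdev).
sd_from_list = statistics.pstdev(sample)

print(n, mean_from_list, round(mean_from_rf, 10), round(sd_from_list, 2))
```

Both means come out to 1.8, and the standard deviation of the list rounds to 1.44.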
Place your finger under the horizontal axis of the histogram, at the position of the mean: The
histogram will balance.
The mean does not have to be one of the possible values. No one has 1.8 credit cards.[1]
Variance and Standard Deviation
Let’s talk standard deviation. We want to be able - as we did for the mean – to obtain this value
without worrying about actual lists of data. We determine the variance, and then standard
deviation, for x as follows:
$$\text{(Discrete) Variance} = \sigma^2 = \sum (x - \mu)^2\,p(x)$$
$$\text{Standard Deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2\,p(x)}$$
For each value we determine the squared deviation from the mean. We multiply these squared
deviations by the probabilities. Sum the results to get the variance. The standard deviation is the
square root of the variance. Again, the “divide by how many” is embedded into the relative
frequencies.
For the credit card example, the details of this computation are shown in the fourth column of the
table above. The variance is σ² = 2.060 and the standard deviation is σ = 1.44. This
computation could be replaced by simply constructing a data set having the proper relative
frequencies, then inputting the values into a computer or calculator and having the technology
determine the value of the standard deviation.[2]
[1] This is a right skewed distribution, yet the mean is below the mode. This is an exception to the general rule. Such
exceptions are usually found when data are highly discrete (here there are only 6 possible values of x).
The mean and standard deviation are measures of the center and spread of a distribution. They
are not systematically dependent on the size of the data set.
Illustrating Probability and the Mean and Standard Deviation of a Random Variable as Long
Term Behavior
Consider the probability distribution for a random variable x shown below.
x       p(x)    x p(x)    (x – μ)² p(x)
1       0.4     0.4       (1 – 2)²(0.4) = 0.4
2       0.3     0.6       (2 – 2)²(0.3) = 0.0
3       0.2     0.6       (3 – 2)²(0.2) = 0.2
4       0.1     0.4       (4 – 2)²(0.1) = 0.4
Mean μ = 2.0; Variance σ² = 1.0; SD σ = 1.0
Here are results of 10 observations of this variable: 1 4 1 3 4 2 1 3 1 2
The (sample) mean for these 10 observations is 2.20; the standard deviation is 1.23.
Relative frequencies (RF) and the sample mean and standard deviation are shown in the second
(10) column below:
x        RF10    RF100   RF1000   RF10000   RF100000
1        0.4     0.44    0.424    0.4068    0.39837
2        0.2     0.31    0.298    0.2984    0.29899
3        0.2     0.12    0.185    0.2004    0.20067
4        0.2     0.13    0.093    0.0944    0.10197
Mean     2.20    1.94    1.947    1.982     2.006
SD       1.23    1.04    0.999    0.992     1.003
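The RF10 column and the sample mean and standard deviation for the first 10 observations can be checked with a short Python sketch. The variable names are illustrative. `statistics.stdev` divides by n – 1; the sketch also shows, as a standard fact rather than something taken from the table, that applying the random-variable formula to the relative frequencies (which divides by n) and rescaling by √(n/(n – 1)) gives the same value.

```python
import math
import statistics
from collections import Counter

observations = [1, 4, 1, 3, 4, 2, 1, 3, 1, 2]
n = len(observations)

# Relative frequency of each value among the 10 observations (the RF10 column).
rf10 = {x: c / n for x, c in sorted(Counter(observations).items())}

sample_mean = statistics.mean(observations)
sample_sd = statistics.stdev(observations)  # sample SD: divides by n - 1

# Random-variable formula applied to the relative frequencies: effectively divides by n.
sd_divide_by_n = math.sqrt(sum((x - sample_mean) ** 2 * p for x, p in rf10.items()))

# Rescaling by sqrt(n / (n - 1)) recovers the sample standard deviation.
adjusted = sd_divide_by_n * math.sqrt(n / (n - 1))

print(rf10, sample_mean, round(sample_sd, 2), round(adjusted, 2))
```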
Add to these another 90 observations, for 100 total:
2 1 1 1 2 1 2 3 2 2 1 1 2 1 1
4 2 2 1 1 2 2 1 3 2 3 3 1 1 2
1 2 1 1 3 1 2 2 1 1 4 3 1 2 2
4 1 2 1 2 1 4 1 2 2 2 2 4 3 1
2 1 1 2 2 2 3 4 3 1 4 1 1 2 1
1 1 1 4 1 1 2 3 4 4 1 1 1 4 1
Results are shown in the third (100) column of the table above.
Add another 900 (not shown due to space considerations) for 1000; then another 9,000 for
10,000; then another 90,000 for 100,000. See the fourth through sixth columns of the above
table.
Is that a fluke? Try it again…
x        RF10    RF100   RF1000   RF10000   RF100000
1        0.5     0.41    0.397    0.3969    0.39753
2        0.4     0.37    0.296    0.3056    0.30135
3        0.1     0.13    0.193    0.1949    0.19932
4        0.0     0.09    0.114    0.1026    0.10180
Mean     1.60    1.90    2.024    2.003     2.005
SD       0.70    0.95    1.023    1.001     1.002
[2] If you are treating a sample of data of this type in tabular form, you should multiply the value you get for
σ by √(n/(n – 1)) to obtain the sample standard deviation S. If you have a sample on hand, there really is no probability
– you aren’t going to randomly select items from the sample (you already have them). So the formula for the
standard deviation of a random variable is not quite correct when applied to a sample. The adjustment is necessary
for technical reasons – the adjusted value is, in one technical sense, a better estimate of the standard deviation for the
probability distribution.
Probabilities are long term relative frequencies. The mean and standard deviation of a random
variable also reflect what happens in the long term: “The mean and standard deviation for all
possible units.” If the probability distribution mimics a population distribution, then probabilities
are population relative frequencies, and the mean and standard deviation for the probability
distribution are the mean and standard deviation for the population.
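The long-run behavior can be explored with a small simulation. This is only a sketch under stated assumptions: it uses the distribution p(1) = 0.4, p(2) = 0.3, p(3) = 0.2, p(4) = 0.1 from the illustration above, Python's `random.choices` for the draws, and an arbitrary seed. Unlike the handout's tables, it draws a fresh sample for each size rather than adding to the previous one, and its numbers will not match the tables.

```python
import random
import statistics
from collections import Counter

values = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]

rng = random.Random(2024)  # arbitrary seed, chosen only so that reruns agree


def summarize(n):
    """Draw n independent observations; return relative frequencies, mean and SD."""
    draws = rng.choices(values, weights=probs, k=n)
    counts = Counter(draws)
    rf = {x: counts[x] / n for x in values}
    return rf, statistics.fmean(draws), statistics.stdev(draws)


for n in (10, 100, 1000, 10000, 100000):
    rf, mean, sd = summarize(n)
    print(n, rf, round(mean, 3), round(sd, 3))
```

As n grows, the relative frequencies should settle near 0.4, 0.3, 0.2 and 0.1, and the mean and standard deviation near 2.0 and 1.0.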
Probability Talk
Suppose we select a person at random from the general population. Go back to the credit card
example: The probability a randomly selected person carries (exactly) 2 credit cards is 0.20. So: If
the population has 20 people, it must be the case that 0.20(20) = 4 of the people have 2 credit
cards. How else could the probability be 0.20? If the population consists of 8000 people, then
1600 (which is 0.20 of 8000) have 2 credit cards. A probability of 0.20 goes hand in hand with a
population for which 0.20 of all people have 2 credit cards – no matter the size of the population.
The population distribution is identical to the probability distribution for the outcome if a single
value is randomly selected from the population.
Thinking about things a slightly different way:
Consider p(3) = 0.15. It means that the probability of selecting a single person with 3
credit cards is 0.15. This implies that if the experiment of selecting a single person at
random is performed repeatedly, in the very long run, 0.15 of the time the person will
have 3 credit cards. Probability refers to the relative frequency of occurrences in a huge
(infinite) set of identical repeats of the sampling. This also means that 0.15 of all people
(the entire population) have 3 credit cards.
Consider the mean of μ = 1.800. (We use the Greek letter μ to represent the mean of a
probability distribution or population.) As the mean for the probability distribution it also
implies that if the experiment of selecting a single person at random is performed
repeatedly, in the very long (infinite) run the average result will be 1.800. This also
means that the mean for the entire population is 1.800.
(That was probably obvious to you. However, a population is a real thing while a
probability distribution is a mathematical object. It is likely not the case that only one
single value would ever be randomly selected from the population.)
When we talk about probability distributions we are talking about all possible ways that an
experiment might occur.
Consider looking at all possible ways of selecting a person. The mean number of credit
cards is 1.8, and the standard deviation is 1.435. In 0.30 = 30% of those ways, the
selected person has exactly 1 credit card.
Example
Choose a college student at random. Count x = the number of siblings in the family. (Subtract
one from each x to arrive at “# of brothers and sisters a student has.”)
# of children x    p(x)
1                  0.2194
2                  0.2806
3                  0.2329
4                  0.1442
5                  0.0736
6                  0.0317
7                  0.0124
8                  0.0043
9                  0.0005
10                 0.0003
[Figure: histogram of p(x). Horizontal axis: # of Children (1 to 10). Vertical axis: Probability (0.00 to 0.30).]
Here’s the interpretation of the probability p(2) = 0.2806: Consider all possible ways of selecting
a student. In 0.2806 = 28.06% of those ways, the student is from a 2-child family.
The computations of the mean, variance and standard deviation follow:
x       p(x)      x p(x)                  (x – μ)² p(x)
1       0.2194    1(0.2194) = 0.2194      (1 – 2.743)²(0.2194) = 0.6665
2       0.2806    2(0.2806) = 0.5612      (2 – 2.743)²(0.2806) = 0.1549
3       0.2329    3(0.2329) = 0.6987      (3 – 2.743)²(0.2329) = 0.0154
4       0.1442    0.5768                  0.2278
5       0.0736    0.3680                  0.3749
6       0.0317    0.1902                  0.3363
7       0.0124    0.0868                  0.2247
8       0.0043    0.0344                  0.1188
9       0.0005    0.0045                  0.0196
10      0.0003    10(0.0003) = 0.0030     (10 – 2.743)²(0.0003) = 0.0158
55      1.0000    Mean: μ = 2.7430        Variance: σ² = 2.1548
St Dev: σ = √2.1548 = 1.468
It is not correct to compute 55/10 = 5.5 to get the mean. (Nor is 1/10 = 0.1 correct.) While 1, …, 10
are the 10 possible values, they do not occur with equal frequency. Each possible value must be
weighted by its probability of occurrence.
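The contrast between the unweighted and weighted computations can be seen in a short Python sketch. The names are illustrative; the probabilities are copied from the table.

```python
import math

# p(x) for x = number of children in the family (copied from the table).
pmf = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0736,
       6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}

# Incorrect: averaging the possible values treats every family size as equally likely.
unweighted = sum(pmf) / len(pmf)  # sum(pmf) adds the keys 1 + 2 + ... + 10 = 55

# Correct: weight each value by its probability of occurrence.
weighted = sum(x * p for x, p in pmf.items())

variance = sum((x - weighted) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(variance)

print(unweighted, round(weighted, 4), round(variance, 4), round(sd, 3))
```

The unweighted figure is 5.5, far from the weighted mean of about 2.743, because large families are rare.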
Here’s an interpretation of the mean and standard deviation: Consider all possible ways of
selecting a student. The mean number of siblings is 2.7430 with standard deviation 1.468.
In other words: The mean number of siblings for the population of all students is 2.7430,
with standard deviation of 1.468.
Probability calculations
What is the probability a randomly chosen student comes from a family with…
…more than 5 siblings?
P(x > 5) = 0.0317 + 0.0124 + 0.0043 + 0.0005 + 0.0003 = 0.0492
…at most 3 siblings?
P(x ≤ 3) = 0.2194 + 0.2806 + 0.2329 = 0.7329
…at least 7 siblings?
P(x ≥ 7) = 0.0124 + 0.0043 + 0.0005 + 0.0003 = 0.0175
…fewer[3] than 5 siblings?
P(x < 5) = 0.2194 + 0.2806 + 0.2329 + 0.1442 = 0.8771
These computations illustrate how to perform probability computations for events that consist of
a number of outcomes. (For example: The event “more than 5” is formed by all outcomes more
than 5: 6, 7, 8, 9, 10. Since we are talking about family size, what’s being said is that a family
with more than 5 siblings is a family with 6, 7, 8, 9, or 10 siblings, ignoring really large families,
as they have sufficiently small probabilities that ignoring them has no impact on fundamental
analyses.)
[3] Technically, the phrase “less than” is not appropriate for a discrete variable, while “fewer than” is grammatically
proper. However, in common speech very few people make this distinction.
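The event probabilities for the family-size example all follow one pattern: add p(x) over the outcomes that make up the event. A minimal Python sketch, with an illustrative helper named `prob` that is not part of the handout:

```python
# p(x) for x = number of children in the family (copied from the example's table).
pmf = {1: 0.2194, 2: 0.2806, 3: 0.2329, 4: 0.1442, 5: 0.0736,
       6: 0.0317, 7: 0.0124, 8: 0.0043, 9: 0.0005, 10: 0.0003}


def prob(event):
    """P(event): add p(x) over every outcome x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))


more_than_5 = prob(lambda x: x > 5)   # outcomes 6, 7, 8, 9, 10
at_most_3 = prob(lambda x: x <= 3)    # outcomes 1, 2, 3
at_least_7 = prob(lambda x: x >= 7)   # outcomes 7, 8, 9, 10
fewer_than_5 = prob(lambda x: x < 5)  # outcomes 1, 2, 3, 4

print(round(more_than_5, 4), round(at_most_3, 4),
      round(at_least_7, 4), round(fewer_than_5, 4))
# prints: 0.0492 0.7329 0.0175 0.8771
```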
Application
Approximating the mean of continuous data from a histogram.
If you are given a histogram for continuous data, but not the data itself, you can approximate the
mean and standard deviation:
Step 1) pretend that all the data in a given bin are at the midpoint
Step 2) determine relative frequencies for each bin
Step 3) use the relative frequency approach to computing mean and standard deviation
Example
Consider the failure times (in hours) of 19 industrial machines.
[Figure: histogram of failure times. Horizontal axis: Failure Time (Hours), 30 to 90. Vertical axis: Frequency, 0 to 9.]
Step 1) We pretend all data are at the midpoints: 35, 45, …, 85.
Step 2) Here are relative frequencies. (For best accuracy use many digits or exact fractions.)
midpoint              35       45       55       65       75       85
frequency             1        2        2        9        3        2
relative frequency    0.0526   0.1053   0.1053   0.4737   0.1579   0.1053
Step 3) The mean is then approximately
35(0.0526) + 45(0.1053) + 55(0.1053) + 65(0.4737) + 75(0.1579) + 85(0.1053) = 63.947
Use the mean in the variance computation
(35 – 63.947)²(0.0526) + (45 – 63.947)²(0.1053) + (55 – 63.947)²(0.1053) +
(65 – 63.947)²(0.4737) + (75 – 63.947)²(0.1579) + (85 – 63.947)²(0.1053) = 156.787
Then SD ≈ √156.787 = 12.521
These are only approximate values. To obtain exact values you must input the raw data and do
the computations.
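The midpoint approximation can be sketched in a few lines of Python. The names are illustrative; the midpoints and bin frequencies are those read from the histogram in the example, and exact fractions f/n are used instead of four-digit rounded relative frequencies.

```python
import math

# Bin midpoints and bin frequencies for the 19 failure times.
midpoints = [35, 45, 55, 65, 75, 85]
frequencies = [1, 2, 2, 9, 3, 2]

n = sum(frequencies)

# Relative frequency of each bin.
rel_freq = [f / n for f in frequencies]

# Relative-frequency formulas, treating every observation as sitting at its bin midpoint.
mean = sum(m * r for m, r in zip(midpoints, rel_freq))
variance = sum((m - mean) ** 2 * r for m, r in zip(midpoints, rel_freq))
sd = math.sqrt(variance)

print(n, round(mean, 3), round(variance, 3), round(sd, 3))
# prints: 19 63.947 156.787 12.521
```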
(You might also approximate the standard deviation using Range/4. For the histogram, the range
would be expected to be Max – Min ≈ 97 – 35 = 62. Then the standard deviation should be
around 15.5. This approach is considerably quicker.)
Exercises
1. Consider the set of 25 observations below - each is the number of bags of recycling brought
by a family to the recycling center (there is a 7 bag limit).
2 5 4 6 3 6 3 6 6 7 2 5 5 4 5 7 4 3 1 7 7 6 4 6 5
a) Use your calculator’s (or software’s) statistics functions to determine the mean and
standard deviation for the data (nearest 0.1 for each).
b) Complete the table below.
# of bags             1    2    3    4    5    6    7
Frequency
Relative Frequency
c) Sketch a histogram. What shape is this distribution?
d) Suppose the table from part b describes a population much larger than 25. Determine the
mean, variance and standard deviation for this distribution. Use the formulas for population
mean, variance and standard deviation of a probability distribution (again to the nearest 0.1):
$$\mu = \sum x\,p(x)$$
$$\sigma^2 = \sum (x - \mu)^2\,p(x)$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum (x - \mu)^2\,p(x)}$$
Associate each answer with the correct symbol. (If your work is correct, the means and
standard deviations from a and d will match.)
2. For the population described below…
x       1     2     3     4     5     6
p(x)    1/6   1/6   1/6   1/6   1/6   1/6
a) Visualize a histogram.
b) Determine the mean and standard deviation (nearest 0.01).
This is the distribution of results when one tosses a fair six sided die.
c) Suppose you started tossing a die many many times, recording each result (and entering
them into your calculator). After a huge number of tosses:
• what is the proportion of 3s?
• what is the mean, as computed by your calculator?
• what is the value of the standard deviation, as computed by your calculator?
3. The distribution of lengths of students’ last names is tabulated below.
x       2      3      4      5       6       7       8       9      10     11
p(x)    2.4%   0.0%   9.4%   14.2%   22.0%   22.8%   16.5%   9.4%   2.4%   0.9%
a) Draw a histogram. What shape is this distribution?
b) Find the mean length.
c) Does this distribution describe a large population? Why (not)?
d) Using the range, guess the standard deviation. How does this compare to the actual
standard deviation of 1.71 for this data?
4. The table below is the probability distribution of the number of pets in a household from a
survey given by the Humane Society:
x       0       1       2       3       4       5       6       7
p(x)    0.165   0.298   0.268   0.161   0.072   0.026   0.008   0.002
a) Find the probability that a household picked at random would have four pets.
b) Find the probability that a household has a pet.
c) Find the probability that a household has more than 3 pets.
d) Find the probability that a household has at least 4 pets.
e) Find the probability that a household has less than 2 pets.
f) Find the probability that a household has no more than 5 pets.
g) Find the probability that a household has at most 6 pets.
h) Find the mean and standard deviation.
i) Which is more likely to occur: that a household has 4 or more pets, or at most 2?
j) According to the table, which number of pets is most likely to occur? Is the mean number
of pets the same as the number of pets most likely to occur and what does this indicate?
k) What is the probability that a randomly selected household has a number of pets within
one standard deviation of the mean? (Begin by computing μ – σ and μ + σ. What is the
probability of an outcome between these two values?)
l) What is the probability that a randomly selected household has a number of pets within
two standard deviations of the mean? (Begin by computing μ – 2σ and μ + 2σ.)
m) According to this table, what is the probability that someone has 9 pets? Is this
necessarily representative for every household anywhere?
n) Explain what it means to say “the probability of having 3 pets is 0.161.” Use either
relative frequency or percent in your explanation, as well as the phrase “all possible.”
o) Explain what the mean and standard deviation represent.
5. Here is a discrete probability table and chart showing the number of songs downloaded off of
iTunes in a week by college students who own Apple computers.
x       0       1       2       3       4       5       6       7       8       9       10
p(x)    0.045   0.140   0.216   0.224   0.173   0.107   0.056   0.025   0.010   0.003   0.001
a) Identify the units and the variable.
What is the probability that:
b) A college student will download more than 6 songs a week?
c) A college student will download at most 6 songs a week?
d) A college student will download less than 3 songs a week?
e) A college student will download at least 3 songs a week?
f) A college student will download no more than 5 songs a week?
g) A college student will download no less than 5 songs a week?
h) A college student will not download 4 songs?
i) Find values of the mean and standard deviation.
j) If we polled all college students that owned an Apple computer, what percent of them
would download more than 6 songs in a week?
k) What is the probability a student’s number of downloads is within one standard deviation
of the mean? Within two?
6. A casino offers a game of chance. It costs $5 to play. The profit (loss if negative) x has the
probability distribution shown below.
x       –5      0       5       10      100
p(x)    0.540   0.060   0.370   0.025   0.005
a) Sketch a histogram. (This is bimodal with an outlier. Gambling uses strange
distributions.)
b) Determine the mean and standard deviation (nearest 0.01 = one penny). (Be careful when
subtracting a negative.) Identify each by symbol.
c) What is the probability you lose money when you play this game? What’s the probability
you win money?
d) What would happen to a player who plays this game a very large number of times?
e) Convince yourself that the probability of a result within one standard deviation of the
mean is 0.970, and that the probability of a result within two standard deviations is 0.995.
The “rule of thumb” stating 68/95 can be really misleading when data are from a strange
distribution like this. This distribution is strange in two ways: Bimodal; Huge outlier.
7. A carnival game costs $2.00 to play. The probability of winning x dollars is shown.
x       0       1       2       3       4       5
p(x)    0.449   0.359   0.144   0.038   0.008   0.001
a) What is the probability of at least getting your money back?
b) What is the probability of losing money playing this game?
c) What is the probability of breaking even?
d) What is the mean payout for this game?
e) What is the standard deviation?
f) Does this game make a profit for the carnival? If so, how much? Explain.
8. Suppose p(y) is defined for y = 1, 2, …, 9 as follows:
$$p(y) = \log\left(1 + \frac{1}{y}\right)$$
For example: p(5) = log(1 + 1/5) = log(6/5) = log 1.2 (≈ 0.079181).
a) Construct a probability (relative frequency) table. Draw a histogram.
b) Show that the probabilities sum to exactly 1.
c) Determine the mean, variance and standard deviation to the nearest 0.001.
d) Determine the exact value of the mean.
This distribution is known as Benford’s Law. It is often used to model the leading digits of
collections of numbers (not all collections of numbers – just those with certain properties). So:
e) Find the leading digit of the population of each of the 50 states (for example, if the
population is 23,483,399 then the leading digit is 2). Construct a relative frequency table for
this data.
Solutions
1. a) 4.8 and 1.7. b) The relative frequencies are (in order) 0.04, 0.08, 0.12, 0.16, 0.20, 0.24, 0.16.
c) It’s a bit left skewed. d) μ = 4.8, σ = 1.7. (The mean is left of the mode, which hints at left
skew.)
2. a) The histogram has a flat (uniform) shape. b) 3.50, 1.71. c) Enter the data from each toss in
the calculator. Once you have many many tosses (a large set of data) the proportion of 3s will be
very close to 1/6 = 0.1666… The mean and standard deviation for the data will closely match
3.50 and 1.71.
3. a) The distribution is fairly symmetric (an outlier at 2?). b) The mean is 6.555 letters. c) This
probably is not a population; in any large population some people would have three-letter last
names, others would have names longer than 11 letters. d) 9/4 = 2.25. This isn’t too far from the
actual 1.71.
4. a) p(x = 4) = 0.072. b) p(x > 0) = 0.835. c) p(x > 3) = 0.108. d) p(x ≥ 4) = 0.108. e)
p(x < 2) = 0.463. f) p(x ≤ 5) = 0.99. g) p(x ≤ 6) = 0.998. h) The mean is μ = Σ x p(x) = 1.80.
The variance is σ² = Σ (x – μ)² p(x) = Σ (x – 1.797)² p(x) = 1.78. The standard
deviation is the square root of this: σ = √σ² = √1.78 = 1.33.
i) The probability that a household has 4 or more pets is 0.108, the probability that there are at
most 2 pets is 0.731, so since the probability is higher for at most 2 pets, it is more likely to
occur. j) According to the table it is most likely to occur that a given household has 1 pet.
However the mean number of pets is 1.8. This indicates that the mean doesn’t always represent
what is likely to occur or what has occurred the most. The mean is not the mode – and the mean
being greater suggests right skew. k) μ – σ = 1.80 – 1.33 = 0.47; μ + σ = 1.80 + 1.33 = 3.13.
Results between these two values (between 0.47 and 3.13) are 1, 2, and 3. The probability of
having 1, 2 or 3 pets is 0.298 + 0.268 + 0.161 = 0.727 (not that far from 0.68). l) μ – 2σ = 1.80
– 2(1.33) = –0.86; μ + 2σ = 1.80 + 2(1.33) = 4.46. Results between these two values (between
–0.86 and 4.46) are 0, 1, 2, 3, and 4. The probability of having 0 – 4 pets (inclusive) is 0.165 +
0.298 + 0.268 + 0.161 + 0.072 = 0.964 (not that far from 0.95). m) According to this table the
probability is 0, meaning it cannot happen. However this does not mean that no household
anywhere has 9 pets. The model given here is stated more concisely by truncating values from 9
and up, and the lack of this detail has no real impact on the accuracy of the description. n) If we
repeatedly sample households, then in the long run 16.1% of the time the household will have
exactly 3 pets. Or…16.1% of all households have 3 pets. o) Technically it means this: If we
repeatedly sample one household randomly, then in the long run the mean number of pets is 1.80
with standard deviation 1.33. Here’s the better way: Examining all possible households, the
mean number of pets is 1.80 with standard deviation 1.33.
5. a) The units are students with Apple computers; the variable is weekly number of downloads.
b) 0.039. c) 0.961. d) 0.401. e) 0.599. f) 0.905. g) 0.202. h) 0.827. i) Mean: 3.099; SD: 1.756. j)
3.9%. k) 0.613; 0.961.
6. a) See the histogram. Notice the “outlier” at 100. b) μ =
–0.10 (a loss of 10 cents on average), σ = 8.67. c) 0.54,
0.40. d) The player will go broke. On average in the long
run the player loses 10 cents per game. So in the very
long run this will lose the player huge amounts of money.
(However: If you had a million dollars to lose, it would
take you around 10 million plays to lose it. So you’d keep
entertained for quite some time.)
[Figure: histogram of the profit distribution. Horizontal axis: Profit ($), with bars at –5, 0, 5, 10 and 100. Vertical axis: Percent, 0 to 60.]
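The figures in solution 6, including the "within one or two standard deviations" probabilities that exercise 6 e) asks about, can be checked with a short Python sketch. The names `pmf` and `within` are illustrative, not from the handout; the probabilities come from the table in exercise 6.

```python
import math

# Profit x (in dollars) and its probability, from the exercise 6 table.
pmf = {-5: 0.540, 0: 0.060, 5: 0.370, 10: 0.025, 100: 0.005}

mean = sum(x * p for x, p in pmf.items())
sd = math.sqrt(sum((x - mean) ** 2 * p for x, p in pmf.items()))


def within(k):
    """Probability of a result no more than k standard deviations from the mean."""
    return sum(p for x, p in pmf.items() if abs(x - mean) <= k * sd)


p_lose = sum(p for x, p in pmf.items() if x < 0)
p_win = sum(p for x, p in pmf.items() if x > 0)

print(round(mean, 2), round(sd, 2), round(p_lose, 2), round(p_win, 2),
      round(within(1), 3), round(within(2), 3))
# prints: -0.1 8.67 0.54 0.4 0.97 0.995
```

The 0.970 and 0.995 values are well away from the 68/95 rule of thumb, as the exercise warns.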
7. a) 0.191. b) 0.808. c) 0.144. d) 0.798. e) 0.8902. f)
Yes. The mean payout of only $0.80 is easily offset by the $2 cost to play. In the long run the
carnival makes a mean of $1.20 per play.
8. Partial solutions.
y        p(y)
1        log 2 – log 1 = 0.3010300
2        log 3 – log 2 = 0.1760913
3        log 4 – log 3 = 0.1249387
4        log 5 – log 4 = 0.0969100
5        log 6 – log 5 = 0.0791812
6        log 7 – log 6 = 0.0669468
7        log 8 – log 7 = 0.0579919
8        log 9 – log 8 = 0.0511525
9        log 10 – log 9 = 0.0457575
Total    log 10 – log 1 = 1 – 0 = 1 (the rounded values sum to 0.9999999)
Mean = 3.440237; Variance = 6.05651; St Dev = 2.460998
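The partial solutions for exercise 8 can be reproduced with a brief Python sketch. The name `benford` is an illustrative choice, and log is taken to be the base-10 logarithm, which is what the tabulated values (log 2 = 0.3010300) imply.

```python
import math

# p(y) = log10(1 + 1/y) for y = 1, ..., 9.
benford = {y: math.log10(1 + 1 / y) for y in range(1, 10)}

total = sum(benford.values())
mean = sum(y * p for y, p in benford.items())
variance = sum((y - mean) ** 2 * p for y, p in benford.items())
sd = math.sqrt(variance)

for y, p in benford.items():
    print(y, f"{p:.7f}")
print(f"total = {total:.7f}, mean = {mean:.6f}, variance = {variance:.5f}, sd = {sd:.6f}")
```

Up to floating-point rounding the total is 1, since the sum telescopes to log 10 – log 1.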