Advanced Probability and Statistics Module 4
Aloha, stat people. This problem set focuses primarily on probability and its application to statistics. This is the really cool stuff. It
corresponds to chapters 7 and 8 in the book. There are a few topics from these chapters that I’ll cover in the next module. I’ll supplement
the book with some additional material and examples. I’ll also try to lay a more rigorous foundation for probability than the book does by
incorporating some set theory, much of which you already know. Here we go.
Here’s an example of a random experiment: roll a die (singular of dice) and see how many spots are on top of it when the die stops rolling.
I agree, it’s a boring example, but it’s simple and familiar. Anyway, it’s a random experiment since there is no way to determine the
outcome before the roll. One outcome is, of course, 5 spots. The sample space, S, of the experiment is the set of all possible outcomes.
Here, S = {1, 2, 3, 4, 5, 6}. An event is any subset of S (any collection of outcomes). The event E1 = {1, 3, 5} is the event that an odd
number of spots faces up on the die. Note that E1 ⊆ S, as required by the definition. The event E2 = {6} is called a simple event since it
consists of just one outcome.
1. How many simple events are there for this situation?
2. There are $\binom{6}{2} = 15$ different events with exactly two outcomes since this is the number of ways two outcomes can be
chosen from six. List a few of them. Pick one and state exactly what it means, even though it may seem obvious.
3. Calculate, separately, the number of different events with 3, 4, 5, and 6 outcomes.
4. If asked to list all the events that contain no outcomes at all, you’d have to find a subset of S with zero elements.
This would be the empty set, { }. (The empty set is a subset of every set.) Counting the empty set, how many total
different events are there? Hint: the answer should equal $2^6$.
5. It is no coincidence that the last answer was two to the power of the number of elements in S. In general, the number of
subsets of a set with n elements is $2^n$. Curious, huh? An explanation is forthcoming. For now, how many subsets are there
of the set containing the letters of the alphabet?
6. Demonstrate that this trick works with the set L = {a, b, c} by listing and counting all subsets. Do it systematically by
listing and counting all the zero-element, one-element, two-element, and three-element subsets.
7. When counting the subsets in the last question, you should notice that those four numbers appear in the third row of a
special triangle (where the pinnacle is row zero). Dag gone, is that cool, or what?! This is no coincidence either. Use the
fifth row of the triangle to state how many zero-, one-, … , five-element subsets there are for {Moe, Larry, Curley,
Shemp, Joe}. Then add these up and make sure you get $2^5$.
8. Experiment: The number of cups of green tea drunk by someone during one day’s time is counted. (Green tea has lots of
antioxidants, which help fight damaging free-radicals produced in your cells due to metabolism.) Assume that to make
and drink a cup of tea takes at least 10 minutes. How many different “tea events” can be defined?
9. Let’s see why a set with n elements has $2^n$ subsets. Here’s the simple Beuschlein proof.
Forming a subset is like shopping. You walk past each of the n elements in the set and decide whether or not to put it in
your shopping cart. With each element you have a decision to make: to put it in the cart or to leave it on the shelf. You
have n decisions to make, each of which can be made in two different ways. If you’re a finicky shopper and say no to
each element, your cart will be empty; this is the empty set (the only zero-element subset), and it’s one possible
“fill-state” your cart can be in when the shopping is done. If you’re not finicky at all, you’d say yes to each element and
have the only n-element subset, the set itself. All other subsets have at least one element but fewer than n. Since there are two
options for you upon approaching the first element, two at the second, and on down the line, there are 2 × 2 × ··· × 2 =
$2^n$ ways to fill your cart! Hence, the number of subsets is $2^n$. For what values of n does this formula hold?
10. Ok, but you might be wondering why combinations add up to $2^n$. In problem #4 you’ve already demonstrated that
$$\sum_{k=0}^{n} \binom{n}{k} = 2^n$$
works for n = 6. It works in general, and there’s a highly cool, easy proof. It’s simply a matter of counting
the same thing in two different ways. I think you’d agree that if you count correctly the number of anything in two
different ways, those numbers have to be equal. That is, the “count” of anything countable can be at most one natural
number. Simple idea ... let’s use it. A set with n elements has got exactly one number of subsets, and that number is $2^n$
(shopping cart method). But we can count the subsets differently by counting all the zero-, one-, two-, … , n-element
subsets separately and adding them up. This would be $\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}$, which is the summation above. Proof done! Use
the formula to explain why elements in an arbitrary row n of Pascal’s Triangle always add up to $2^n$. Hint: think about
coefficients.
11. In thermodynamics you learned about macrostates and microstates (as related to entropy). Which is analogous to
outcomes and which to events?
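The counting claims in problems 6 through 10 can be checked by brute force. The sketch below uses Python, which is an assumption on my part (the module itself only uses Excel), and the helper name subsets_by_size is made up for illustration. It lists every subset of L = {a, b, c} grouped by size, then confirms for a few values of n that the binomial coefficients in row n add up to 2^n.

```python
from itertools import combinations
from math import comb

def subsets_by_size(elements):
    """Return a dict mapping k to the list of all k-element subsets, k = 0..n."""
    items = list(elements)
    return {k: list(combinations(items, k)) for k in range(len(items) + 1)}

L = ["a", "b", "c"]
by_size = subsets_by_size(L)

# Number of zero-, one-, two-, and three-element subsets, in that order.
counts = [len(by_size[k]) for k in range(len(L) + 1)]
print(counts)
print(sum(counts), 2 ** len(L))   # total subsets versus 2^n

# The same identity for larger n, using binomial coefficients directly
# instead of listing the subsets (26 letters would give far too many to list).
for n in (5, 6, 26):
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n
```

This is a check, not a proof: it only confirms the identity for the values of n you try, which is exactly why the shopping-cart argument is still needed.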
Back to the boring die example. A random variable is defined as a function that assigns a real number to every element in S. We typically
use capital letters for random variables. We could define a random variable X like so: let X(s) = s ∀ s ∈ S. The “∀” symbol means “for all,”
or “for each.” Thus, X simply corresponds to the number of spots face up. Event E2 was defined above as {6}, so the probability that a 6
comes up is written P(X ∈ E2) or, more simply, P(X = 6). Why bother with the fancy-schmancy-upside-down-A set stuff? Well, for the same
experiment we could have a different random variable Y defined like this: Y(s) = 10s. Then, P(6 lands up) = P(Y = 60). The sample space of
the experiment hasn’t changed, but each element of the sample space has been mapped to a number ten times bigger than itself. I know, I
know, it sounds like a bunch of math-babble nonsense for no good reason, but bear with me, and you’ll see that sometimes the sample
space does not contain the type of numbers we want to work with or it is not made up of numbers at all.
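If the "function on the sample space" idea still feels like math-babble, it may help to see it as literal functions. Here is a minimal Python sketch (Python and the helper name prob are my own assumptions, and the die is assumed fair) in which X and Y are ordinary functions on S, and a probability is found by adding up the probabilities of the outcomes that the random variable sends to a given value.

```python
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]                       # sample space of the die experiment
P_outcome = {s: Fraction(1, 6) for s in S}   # assumption: the die is fair

def X(s):
    """X(s) = s: the number of spots face up."""
    return s

def Y(s):
    """Y(s) = 10s: same experiment, each outcome mapped to ten times itself."""
    return 10 * s

def prob(rv, value):
    """P(rv = value): total probability of the outcomes s with rv(s) == value."""
    return sum(P_outcome[s] for s in S if rv(s) == value)

print(prob(X, 6), prob(Y, 60))   # both describe "6 lands up"
print(prob(Y, 6))                # no outcome is mapped to 6 by Y
```

Notice that S never changes; only the labels attached to its elements do.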
12. Assume the die is fair. True or false: P(X  5) = 3 P(X = 4) + 2 P(X > 5)
13. Let Z be a random variable for our die situation defined as
$$Z(s) = \begin{cases} 1, & s \le 4 \\ 0, & s \ge 5 \end{cases}$$
Note that Z is indeed a function mapping each element of S to exactly one real number. Find P(Z = 1) and find P(Z = 0).
14. Every time our experiment is carried out, X is 1 through 6, while Z is only 0 or 1, for s = 1 through 6. Interpret Z in a
dice game in which a winning throw is a 5 or 6, and a losing throw is a 1, 2, 3, or 4.
15. Suppose you conduct an experiment consisting of tossing a fair tetrahedral die and noting what number lands face
down. In order for a tetrahedral die to be fair, it must be a regular tetrahedron, just as a cube is a regular hexahedron.
Why did I say face down?
16. There are many different shapes that can be used as fair dice, but only 5 of them have congruent faces that are
regular polygons. What are the names of these shapes, and how many sides does each have?
17. What is the sample space for the tetrahedral die experiment?
18. How many two-outcome events are there?
19. The rules of a game state that you move forward 16 spaces if you roll a four but only 9 spaces if you roll a three. You
move backwards 4 spaces if you roll a two but only 1 space if you roll a one. Define a random variable that describes
how you’ll move with each possible outcome. Use function notation. Note that the sample space of the experiment is
different than the range of your random variable (which is called the “space”, rather than “sample space”.)
20. New experiment: you toss your tetrahedral die along with a quarter, noting both the number face down on the die and
whether the quarter comes up heads or tails. Write the sample space as a set of ordered pairs. Hint: see the next
question if you’re dumbfounded.
21. ∀ s ∈ S, s = (x, y) where x ∈ A = {1, 2, 3, 4} and y ∈ B = {H, T}. Thus the sample space consists of every possible
combination of an element of A paired with an element of B, where the element from A is always listed first. One
element of S is (3, H), for example. This set of ordered pairs, S, is an example of a Cartesian product. How many
elements are in S and how do you get that number without writing them all out?
22. Clearly define a random variable, W, for the die-coin situation. Hint: there are an infinite number of right answers; try
to find one of them.
23. Let set Q1 = {Xerxes, Darius, Cyrus} and Q2 = {sword, axe, knife, laser pistol}. The Cartesian product is written Q1 × Q2
and pronounced “Q1 cross Q2.” It might represent all possible ways in which any of these three men could have killed
off a rival with any of these weapons. Write out this product. You may use abbreviations if you like.
24. Q1 ⊆ K, where K is the set of all kings from what kingdom? Note: I left out the numbers after their names. (Since we’re
talking random experiments and random variables, I figured it would be appropriate to throw in a random question. I
hope you don’t mind.)
25. Define two sets of your own like I did above. Then define your own reasonable and interesting random experiment
along with a random variable that pertains to two sets. Make the sample space of your experiment the Cartesian
product of the two sets.
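The Cartesian product in problems 20 and 21 can be built mechanically. This is a small Python sketch (Python is an assumption here, not something the module uses) that forms A × B for the die-and-quarter experiment with itertools.product and compares the size of the result with the product of the sizes of A and B.

```python
from itertools import product

A = [1, 2, 3, 4]    # number face down on the tetrahedral die
B = ["H", "T"]      # side of the quarter facing up

# Every ordered pair (x, y) with x from A and y from B; the die is listed first.
S = list(product(A, B))

print(S)
print(len(S), len(A) * len(B))   # size of S versus |A| times |B|
```

The same call works for Q1 × Q2 in problem 23 if you swap in those two sets.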
Our boring random experiment with one normal die could be performed a dozen times with the results expressed as a sequence:
{3, 4, 1, 1, 3, 6, 1, 2, 6, 4, 3, 1}. Note that sequence notation and set notation both make use of braces, {}, but sets and sequences differ in
that a sequence can have repeated elements and the order of those elements matters. We could call our sequence of outcomes {ak} where a1
= 3 was the first outcome, etc. The eighth roll yielded a 2, so a8 = 2. So the outcomes of a random experiment can be expressed as a
sequence {ak} where ak = kth outcome. Each outcome is an observation or measurement performed repeatedly after each experiment (trial).
The observations/measurements must all pertain to the same attribute of the experimental outcomes. For example, our attribute of interest
is the number of spots facing up. This is what we observe and record (in order) every time we conduct the experiment. We don’t record, for
example, the number of spots facing up on the first roll, the amount of time in seconds it takes the die to stop rolling on the second roll, and
the x-coordinate in centimeters of the resting position of the center of the die on the third. If we did, our sequence would not be well-suited to
analysis, since we’re mixing attributes. We could, however, record each of these pieces of information after every single roll and create a
sequence of “three-dimensional” outcomes such as {(3, 1.05, 2.4), (4, 0.892, -5.7), … , (1, 9.4, -2.6)}. Here, each of the 12 elements in the
sequence is an ordered triple giving three separate pieces of information. Each piece of info is an observation or measurement regarding
specific attributes of the outcome. S would be a set of ordered triples as well. To be specific, S is the set of all ordered triples (m, t, x) such
that m is any possible number of spots facing up, t is the time in seconds of the roll, and x is the x-coordinate of the center of the die when it
stops relative to some fixed reference point. Therefore, S = {(m, t, x): m ∈ {1, 2, … , 6}, t ∈ (0, ∞), x ∈ (-∞, ∞)}. The colon in the preceding
equation means “such that.”
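The difference between a sequence and a set is easy to see in code. In the Python sketch below (Python and the helper name a are assumptions made for illustration), the dozen rolls are stored in a list, which keeps order and repeats, and then converted to a set, which keeps neither.

```python
# The dozen rolls from the text, in the order they occurred.
rolls = [3, 4, 1, 1, 3, 6, 1, 2, 6, 4, 3, 1]

def a(k):
    """Return the k-th outcome, counting from 1 as the text does.

    Python lists count from 0, hence the k - 1.
    """
    return rolls[k - 1]

print(a(1), a(8))   # first and eighth outcomes
print(len(rolls))   # a sequence counts every roll, repeats included

# Converting to a set throws away order and repeated values.
distinct = set(rolls)
print(sorted(distinct))
```

Note that 5 never came up in these twelve rolls, so the set of observed values is smaller than S.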
26. In Excel use the Int and Rand functions to simulate the roll of a fair die. Either by filling down or using F9 on a PC, find
how many rolls are required to get back-to-back identical rolls. The random experiment here is not the roll of a die but
the number of rolls required to observe back-to-back identical rolls. For example, if in a particular trial you successively roll
2, 5, 6, 2, 3, 2, 1, 4, 3, 3, 1, 2, 5, 4, 4, 1, … , the outcome of this trial is 10. What is the sample space?
27. Perform this experiment 50 times and create a sequence of outcomes. Hint: I found a very easy way to do this. Put the
random formula in cell A1 and fill down column A (about 40 rows should be sufficient). In cell B2 enter
=IF(A1=A2,"repeat","") and fill it down column B. Record the position of the second “repeat” in the first back-to-back
pair. How cool is that? Record the position right in Excel, and when you hit enter the random values will automatically
change. Do this until you’ve recorded 50 observations.
28. How many arguments does the If function take in Excel? Explain how it’s working in the formula I provided for you.
29. Let E be the event “four rolls or fewer.” How many times did E occur? Hint: You’ll like this method. Here’s what I did. I
listed all fifty outcomes in the range D2:E26. (It’s fine if you use a different range.) Then in a different cell I entered
=COUNTIF(D2:E26,"<=4") which gave me 15. (Your answer will likely differ.)
30. Just for the fun of it, let’s find all the cells with 4 or less that Excel counted up for you. You can do this quickly by
selecting your data (the outcomes), then go to Format menu, Conditional Formatting, Cell value is … less than or equal
to … 4. Then click the Format button and change the font to red and bold. Finally, on the Patterns tab change the
background to yellow. This should turn all cells with 4 or less a bold red on top of yellow. Hot digity!
31. If our random variable here is X, estimate the probability P(X ≤ 4).
32. Estimating probabilities with a computer simulation, by flipping a coin, or whatever is called the Monte Carlo method.
Its drawback is that it is empirical (based on experimentation) rather than theoretical, so we technically can’t use it to
do formal mathematical proofs. On the other hand, it is sometimes vastly simpler to create a simulation rather than
solve the problem analytically. It’s a very powerful tool, especially in the computer age. How could you make the
probability you estimated in the last question more accurate?
33. Let n be the number of trial runs of our simulation (the number of experiments we do). n = 50 in your spreadsheet. The
exact probability can be expressed as
$$\lim_{n \to \infty} \frac{\#(E)}{n}$$
If this limit does not exist then the probability of the event is not
defined. What does the # sign mean, what’s the point of the limit, and why does this statement need no proof?
34. Let’s now find the relative frequency of each simple event. Make a three-column table in Excel: the first column for X,
the second for the frequency of X, and the third for the relative frequency. In the X column enter the values '2, '3, '4,
up to the max value you had for X. For graphing purposes later you’ll want these numbers to be entered as text, which
is why you’ll need the single quote before each. Enter and fill down a short formula for frequency making use of the
Countif function. There’s an easy formula for relative frequency. Check your formulae by summing (separately) the
second and third columns.
35. Make a relative frequency histogram by graphing the first and third columns. To select the first and third columns
without the second, press the Control key after selecting the first column and hold it while selecting the third. This is
how noncontiguous blocks are selected. In the histogram set the Gap Width to zero and delete the stupid legend.
36. What does the area of any one column represent?
37. What does the total area equal and why?
38. Repeat the last two questions for the case where n is much, much greater than 50 (n >> 50).
39. Suppose your relative frequency histogram is updated after each trial, beginning with the first. Describe what you would
see as you perform more and more trials. What law can you cite to explain what you see?
40. The distribution of X should be right-tailed. What is the simple interpretation of this?
41. X is a discrete random variable because it can only have certain values, such as 17 and 18, with nothing in between. If W is a
random variable for the weight of Urbana wrestlers, on what grounds would one argue that W is continuous rather than
discrete?
42. On what grounds could someone else argue that W is discrete?
43. The graph on page 203 is not a histogram; rather it is a graph of rel. freq. vs. n. A histogram has all possible values of
the random variable on the x-axis rather than number of trials. Experiment: Draw a card from a shuffled deck
containing 54 cards (a normal deck with two jokers) and note the card. Let X be the random variable defined by X(s) =
s and let Ered be the event of “any red card drawn.” Assume jokers in this deck are neither black nor red. Describe a
graph of rel. freq. of Ered vs. n.
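The Excel simulation in problems 26 through 39 can also be sketched in a few lines of code, which makes it painless to run far more than 50 trials. The Python below is an assumption on my part (the module uses Excel), and the function names are made up for illustration. It simulates the back-to-back-rolls experiment, tabulates relative frequencies, and estimates P(X ≤ 4). For comparison it also computes the exact value, using the fact that each roll after the first matches the one before it with probability 1/6, so P(X = k) = (5/6)^(k-2) · (1/6) for k ≥ 2.

```python
import random
from collections import Counter
from fractions import Fraction

def rolls_until_repeat(rng):
    """Roll a fair die until two consecutive rolls match; return the roll count."""
    previous = rng.randint(1, 6)
    count = 1
    while True:
        current = rng.randint(1, 6)
        count += 1
        if current == previous:
            return count
        previous = current

def simulate(n, seed=0):
    """Run the experiment n times and return the sequence of outcomes."""
    rng = random.Random(seed)   # seeded so a run can be reproduced
    return [rolls_until_repeat(rng) for _ in range(n)]

def relative_frequencies(outcomes):
    """Map each observed value of X to its relative frequency."""
    freq = Counter(outcomes)
    return {x: freq[x] / len(outcomes) for x in sorted(freq)}

def estimate_at_most(outcomes, limit=4):
    """Monte Carlo estimate of P(X <= limit)."""
    return sum(1 for x in outcomes if x <= limit) / len(outcomes)

# Exact P(X <= 4) = P(X = 2) + P(X = 3) + P(X = 4).
exact = sum(Fraction(5, 6) ** (k - 2) * Fraction(1, 6) for k in range(2, 5))

for n in (50, 5000, 500000):
    print(n, estimate_at_most(simulate(n, seed=n)))
print("exact:", exact, float(exact))
```

With only 50 trials the estimate can land a fair distance from the exact value; watching it settle down as n grows is the law of large numbers at work.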
All right, back once again to the boring, 6-sided die example. If the die is fair, we say its probability distribution is uniform—graphically, a
horizontal line. If the center of mass of the die were not quite at its geometric center due to, say, more weight loss from the depressions on
the 6 side than on the 1 side, the distribution would no longer be exactly uniform. Since the 1 side has fewer holes, it’s just a tad bit heavier
and, thus, more likely to land face down. So, the probability histogram would dip a bit below 1/6 at X = 1 and it would be a half a smidgeon
above 1/6 at X = 6. All probabilities would still add up to one, of course.
Many distributions are nowhere near uniform. The earlier problem of repeat rolls of a normal die certainly was not uniform. A symmetric
distribution does not imply a uniform one. The sum of two dice, for instance, has a symmetric distribution, but it’s not uniform. In this case
S = {2, 3, … , 12}. If the random variable is X(s) = s, then P(X = 2) = P(X = 12) = 1/36, since each simple event can only occur in one way
out of 36 equally likely ways. Similarly, P(X = 3) = P(X = 11) = 2/36 and P(X = 4) = P(X = 10) = 3/36. Note the symmetry with respect to
the middle and that the distribution rises and then falls, much like a normal distribution. In fact, if n is large enough, and if the distribution
is approximately normal, we often pretend the distribution is normal, which makes computing probabilities much simpler, as you shall see.
Make sure you realize that the formula P(event) = (# of outcomes in event) ÷ (# of possible outcomes) is only valid when all outcomes are
equally likely. The sum of dice above has a sample space of 11 possible outcomes. If we applied this formula, the probability of any sum
would be one out of 11. This is wrong because the outcomes aren’t equally likely. However, if we redefine the experiment slightly so that we
now note not the sum but the number of spots on each die as ordered pairs (die 1, die 2), our sample space becomes a set of 36 outcomes,
all of which are equally likely. Now the formula applies, and the probability of getting a particular sum is the probability of an event that
could contain many outcomes. For example, a sum of 7 is the event {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}. The 6 outcomes in this event
divided by the 36 possible outcomes gives the correct probability of 1/6. Experiments don’t have to have outcomes that are equally likely,
but only use the formula when it applies!
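The two-dice argument above can be carried out by brute force. The Python sketch below (Python is an assumption, not something the module uses) lists the 36 equally likely ordered pairs, groups them by sum, and applies the counting formula to each group, which reproduces the probabilities quoted in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Sample space of ordered pairs (die 1, die 2): 36 equally likely outcomes.
pairs = list(product(faces, repeat=2))

# Number of ordered pairs that produce each possible sum.
ways = Counter(d1 + d2 for d1, d2 in pairs)

# The counting formula is valid here because the pairs are equally likely.
pmf = {total: Fraction(ways[total], len(pairs)) for total in sorted(ways)}

for total, p in pmf.items():
    print(total, p)

# The event "sum of 7", written out as a set of outcomes.
print([pair for pair in pairs if sum(pair) == 7])
```

The same approach extends to three dice in problem 44 by changing repeat=2 to repeat=3.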
44. What is the probability of rolling three fair dice and getting a sum of 10? Hint: List all outcomes in the event
systematically. Clever use of a spreadsheet will be helpful but is not necessary. The answer is a little over 10%.
45. Make a probability table for the random experiment of counting the number of heads that come up when flipping 10
fair coins simultaneously. (A table or formula that lists probabilities for all outcomes is called a probability density
function.) Find the probabilities theoretically. Hint: to find, say, the probability of 7 heads, you have to figure out the
number of ways 7 heads can appear on 10 different coins and divide that by the number of total possible head-tail
outcomes. One number requires a combination; the other does not. The answer for 7 heads is about 0.117. You could
also use the general formula in the book. To find these probabilities quickly in Excel, use the Combin function.
46. You don’t have to create a histogram, but look at your table and describe what it would look like.
47. Write the probability density function, p.d.f., as a formula using combinatorics symbols. The p.d.f. can be written as
f(x) = … , where x ∈ X.
48. If X is the number of heads, what is P(X ≤ 4)?
49. This type of distribution is called b________, and it results from repeated trials of b________ data, like heads/tails. It
works even if the probability, p, of getting heads is not 0.5. If you’re a 61% free throw shooter, what is the probability
that you can step up to the line and hit exactly 7 out of 10?
50. As the book explains, this type of distribution has a mean of np. This is also called the “expected value,” and its symbol
is μ. In the free throw example, your expected value is 6.1, even though the closest you can get to hitting 61% in ten
shots is six. In other words, if you were to shoot thousands of free throws (without getting tired or improving with
practice) we would expect you to make 61% of them. The more you shoot, the more likely your percentage will be
about 61%. It might seem intuitively obvious that μ = np, but there is a rigorous proof for it. Proofs aside, there is also a
formula for the standard deviation of this type of distribution: $\sigma = [np(1-p)]^{1/2}$. Consider dropping 100 fair coins. For n
this large the distribution is approximately normal. That stated, between what two values of X (the number of heads)
would you expect about 68% of the drops to fall? Hint: think back to Module 3.
51. As n, the # of coins, increases so do μ and σ, but the “spreadoutness-to-range ratio” is a constant. Show that the
variance of X divided by the range of X is always 0.25 no matter what n is.
52. To find the probability of getting at most 4 heads out of 10 flips, you’d have to compute and add up five combinations.
Imagine if, for 100 coins, you wanted to know the probability of getting at most 43 heads. How many combinations
would you have to add up?
53. As fun as computing and adding combinations is, after a while it starts to get old. Fortunately, there is another
approach. Once again, since this distribution is approximately normal, and since n = 100 is large, we’ll use a normal
distribution to solve the problem. Find μ and σ. Then find the Z score by transforming X as was done last module.
Finally, use the standard normal table in the front and back covers of the book to approximate the probability of
getting at most 43 heads, that is, zero through 43 heads, inclusive.
54. Check your last answer by comparing it to the exact answer that can be found fairly quickly in Excel. What is your
percent error? Hint: In Excel put the # of heads in column A, beginning with zero, and the probabilities of getting
exactly that many heads in B using the Combin function and division. (The probabilities are extremely tiny at first!)
Then sum the probabilities for zero through 43 heads, which should be between 9% and 10%.
55. Estimate and find the exact probability of getting between 35 and 56 tails out of 100 coin flips.
56. Schmedrick is playing with his piggy bank one day when he accidentally drops it. It shatters and all the coins spill out.
He notices that 75% of the coins show tails. What can you safely (but not definitively) conclude about Schmedrick’s
saving ability?
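The coin and free-throw calculations in problems 45 through 55 all follow one pattern: a combination times powers of p and 1 − p, summed over a range, or else a normal curve with μ = np and σ = [np(1 − p)]^(1/2) standing in for that sum. Here is a hedged Python sketch of both routes (Python and the function names are assumptions; the module uses Excel's Combin function and the book's normal table). The continuity flag, which evaluates the normal curve at k + 0.5 rather than k, is a standard refinement that the module does not mention, so it is left as an option.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """Probability of at most k successes: add up k + 1 separate terms."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

def normal_approx_cdf(k, n, p, continuity=True):
    """Approximate P(X <= k) with a normal curve, mu = np, sigma = sqrt(np(1-p)).

    continuity=True evaluates the curve at k + 0.5 (an added assumption,
    not part of the module); continuity=False uses k itself.
    """
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else k
    return NormalDist(mu, sigma).cdf(x)

print(binom_pmf(7, 10, 0.5))    # exactly 7 heads on 10 fair coins
print(binom_pmf(7, 10, 0.61))   # exactly 7 of 10 for a 61% shooter
print(binom_cdf(43, 100, 0.5))  # at most 43 heads, exact sum
print(normal_approx_cdf(43, 100, 0.5, continuity=False))
print(normal_approx_cdf(43, 100, 0.5, continuity=True))
```

Comparing the last three printed values gives the percent error asked for in problem 54, with and without the half-unit adjustment.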