MATH 2441
Probability and Statistics for Biological Sciences
Introduction to Probability
The mathematical theory of probability provides concepts and methods for measuring and taking into
account uncertainty and randomness (really aspects of the same thing). In a statistical experiment or study,
the only information we have about a population is the information obtained from a random sample of that
population. To connect what we observe for a random sample and the properties of the corresponding
population, we cannot avoid looking at least briefly at some basic notions of probability.
By a probability, we mean a numerical (quantitative) measure of likelihood. To make this term more
precise, we need a bit of jargon.
Random Experiments and Events
First, we will use the phrase random experiment or chance experiment to refer to the act of making an
observation or taking a measurement where we cannot predict with certainty what the outcome of that
observation or measurement will be (even though we can list or describe all possible outcomes that may
occur).
Thus, flipping a coin is a random experiment. We know that the coin will fall with either the heads or the tails
face up, but we can't predict in advance which of those faces will be exposed upwards before the coin is
flipped. Similarly, to shuffle a deck of standard playing cards, and then select one card from the middle of
the deck, is a random experiment. We know that the selected card will be one of the 52 cards normally
found in such a deck of cards, but we can't predict with certainty which card will be selected. Another
example of a random experiment might be to have a technologist toss a dart over his shoulder into a bin of
apples, then select the apple impaled by the dart, and weigh it. Although we can probably be fairly certain
that the outcome of that experiment will be a number between 0 grams and, say, 10,000 grams (just to be
on the safe side -- surely there are no apples heavier than 10 kg), we cannot predict in advance precisely
what weight in grams will be obtained when the experiment is performed.
Notice that each of these random experiments results in one outcome from among a set of several or many
possible outcomes. Such outcomes of random experiments are called events. Events are classified as
being either simple events (which cannot be expressed in terms of combinations of even simpler events) or
as being compound events (which correspond to a collection of simpler events). Typically, events are
denoted by upper case characters taken from the beginning of the alphabet (A, B, C, etc.). Subscripts are
sometimes used to distinguish between elements of a set of related events (as in A1, A2, A3, etc.).
In the experiment of flipping a coin, there are only two possible outcomes, which we can write H (for heads
up) and T (for tails up). Neither of these can be written as a collection of even simpler outcomes, so these
are simple events. On the other hand, in drawing a card from a deck of playing cards, the event that a
diamond is drawn corresponds to 13 simpler outcomes (that the ace of diamonds, or the 2 of diamonds or
the 3 of diamonds, etc. is drawn). Thus, the event that a diamond is drawn is a compound event,
expressible as a set of 13 simple events.
The set of all possible outcomes of a random experiment (which is the same thing as the set of all simple
events for a random experiment) is called the sample space, often denoted by the symbol S.
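The card-drawing sample space is small enough to enumerate directly. As a rough sketch in Python (the deck representation here is our own choice, not part of the handout):

```python
import itertools

# Build the sample space S for drawing one card: 52 simple events.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
S = [(rank, suit) for rank, suit in itertools.product(ranks, suits)]

# The compound event "a diamond is drawn" is the collection of the
# 13 simple events whose suit is diamonds.
diamond = [card for card in S if card[1] == "diamonds"]

print(len(S))        # 52 simple events in the sample space
print(len(diamond))  # 13 simple events make up the compound event
```

Listing the simple events this way makes the distinction concrete: each tuple is a simple event, and the compound event "diamond" is just a subset of them.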
Two events are said to be mutually exclusive if they cannot both occur at the same time. In flipping a coin,
the events H and T are mutually exclusive. If the "heads" face is up, the "tails" face cannot be up. An
example of two events which are not mutually exclusive are the events A = a '2' is selected from a deck of
playing cards and B = a 'diamond' is selected from a deck of playing cards. If the card selected is the 2 of
diamonds, both events A and B have occurred. (There is also the quite different concept of independent
events, but we don't deal with them until much later in the course.)
With every event A we can associate the complementary event Ac, the event that A does not occur (some
writers call this the event 'not A'). An event and its complement form a very special pair: clearly they are
mutually exclusive, but also, they cover all possible outcomes of an experiment. Thus, if A is the event that
a flipped coin falls heads up, then Ac is the event that it falls in such a way that the "heads" side is not up.
David W. Sabo (1999)
Basic Probability
Page 1 of 5
We could call that outcome "not heads", though in this case it is simpler to just say "tails". But note: a coin
can either fall "heads" up or "not heads" up. There is no other possibility.
Similarly, if we have a room filled with fruit flies having eyes of various colors as a result of being exposed to
some chemical that causes genetic mutation, we may define the event A to be that a fly selected at random
has red eyes. The event Ac then is the event that the fly selected has "non-red" eyes or eyes that are not
red. Clearly, any fly you select will either have red eyes or will have eyes that are not red. There are no
other possibilities.
Random Variables
Often, the outcome of a random experiment is the numerical value of some property. For example, if we
define
let x = number of heads showing when one coin is flipped
then the value of x that results from the experiment of flipping the coin is either x = 0, or x = 1. However, we
can't predict the value of x in advance, and hence we would refer to x as a random variable. For the apple-sticking experiment described above, we could define:
let w = the weight of the apple selected (in grams)
Then w would also be a random variable. Its value is determined by the outcome of a random experiment.
We cannot predict the value of w with certainty in advance. Random variables are often denoted by lower
case characters from near the end of the alphabet (for example: x, y, and z).
Since most of the properties we deal with in scientific and technical applications are quantitative (numerical),
the notion of a random variable comes up frequently. From a mathematical point of view, it is necessary to
distinguish between two broad classes of random variables. Random variables which can take on values
from a set of isolated numbers are called discrete random variables. The variable x = number of heads
observed when a coin is flipped, is a discrete random variable because the list of possible values x can have
are the two distinct values {0, 1}. Similarly, the variable y = the number of apples that must be weighed until
an apple weighing more than 300 g is obtained, is a discrete random variable. The values that y can have
are 1, 2, 3, … . Only whole number values make sense here, so the variable y is discrete. Note that
'discrete' doesn't necessarily mean there are only a few possible outcomes (or even a finite number of
possible outcomes). In this case, if apples weighing more than 300 g were rare, an actual observation of y
might result in a rather large number.
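A quick way to get a feel for the discrete variable y is to simulate it. The sketch below assumes, purely for illustration, that 10% of apples weigh more than 300 g; the handout gives no such figure.

```python
import random

def apples_until_heavy(p_heavy=0.10, rng=random.Random(1)):
    """Simulate y = number of apples weighed until one exceeds 300 g.

    p_heavy is an assumed (hypothetical) fraction of apples over 300 g.
    """
    y = 1
    while rng.random() >= p_heavy:  # this apple was not heavy; weigh another
        y += 1
    return y

# y takes only the whole-number values 1, 2, 3, ... -- a discrete variable,
# even though there is no upper limit on the values it might produce.
observations = [apples_until_heavy() for _ in range(5)]
print(observations)
```

Note that each observation is a whole number at least 1, and if heavy apples were rarer (smaller p_heavy), large values of y would become more common.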
When a random variable can take on any value from a continuous or unbroken range of possible values, we
refer to it as a continuous random variable. For instance, the random variable w = the weight of the
selected apple in grams, can in principle be any real (decimal) number between 0 g and some upper limit. If
we ignore some of the fundamental limitations of physics, the possible weights of an apple are not restricted
to whole numbers or even certain parts of whole numbers. Generally, properties that are measured with
respect to a continuous scale (mass, length, area, volume, time, etc.) result in continuous random variables,
whereas properties that result from counting and related operations usually result in discrete random
variables.
Most problems involving discrete random variables can be solved using arithmetic and algebra. Problems
involving continuous random variables are based on methods of calculus though in this course the use of
statistical tables will make it unnecessary for us to employ methods from calculus directly.
When working with a discrete random variable, it often makes sense to talk about the probability of one
particular outcome occurring. For continuous random variables, it only makes sense to talk about the
occurrence of a value of the random variable falling in an interval of some finite length. So, for example, we
could ask for the probability that exactly 5 apples would need to be weighed before we find one that weighs
more than 300 g. On the other hand, it would not make sense to ask for the probability that the next apple
selected has a mass of exactly 300 g (meaning 300 to infinity decimal places), because we have no way of
determining when a mass is exactly 300 g, and clearly, even if the typical apple had a mass in the
neighborhood of 300 g, the more precisely we were able to measure those masses, the less likely we would
be to find an apple with mass 300.00000000000000000000000000000000000000000 g. However, it
makes logical sense to ask: what is the probability that the mass of the next apple will be between 299.0 g
and 301.0 g?
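The contrast between an exact value and an interval can be seen by simulation. The weight model below (normal, mean 300 g, standard deviation 20 g) is a hypothetical assumption for illustration, not something stated in the text:

```python
import random

# Hypothetical model: apple weights roughly normal, mean 300 g, sd 20 g.
rng = random.Random(0)
N = 100_000
weights = [rng.gauss(300.0, 20.0) for _ in range(N)]

# Pr(w is exactly 300 g) is effectively zero for a continuous variable...
exactly_300 = sum(w == 300.0 for w in weights)

# ...but Pr(299.0 g <= w <= 301.0 g) is a sensible, nonzero quantity.
in_interval = sum(299.0 <= w <= 301.0 for w in weights) / N
print(exactly_300)            # almost certainly 0
print(round(in_interval, 3))  # a small but clearly nonzero fraction
```

Under this assumed model, roughly 4% of simulated apples land in the two-gram interval, while none land on exactly 300 g.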
Probability
Now, we can define what is meant by a probability, and write down some of its basic properties.
Suppose that we repeat a random experiment many times: for instance, we flip a coin many times. Perhaps
we notice that after a large number of flips, the coin falls heads up about 50% of the time. Indeed, as
we continue to flip the coin, we are well aware that we cannot predict the outcome of any one flip with any
certainty at all, but over the long run, it is more and more obvious that 50% of the coin flips result in heads
up.
In a case such as this, we say that the probability that any one flip of the coin will result in the outcome
"heads" is 50% or 0.50. This means that over the long run, 50% of the coin flips results in heads. It means
that for any one coin flip, the likelihood of getting "heads" is 0.50 (and, in fact, in this case, we are just as
likely to get "not heads" or "tails" -- after all, if we get "heads" 50% of the time, then we must also be getting
"tails" 50% of the time). This definition of probability is called the long-run relative frequency interpretation
of probability. We can formalize it as follows:
Suppose a random experiment is performed N times, where N is a very large number, and suppose that in
those N repetitions of the experiment, event A occurs m times. Then the probability of event A, denoted
Pr(A), is given by

Pr(A) = m / N        (PROB-1)

Remember, the implication here is that N is a very large number, and that as N gets very large this ratio
settles down to some constant value.
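The relative frequency in (PROB-1) can be watched settling down in a short simulation. The fair-coin model is the handout's example; the seed and the number of flips are arbitrary choices:

```python
import random

# Estimate Pr(heads) as a long-run relative frequency m / N (PROB-1).
rng = random.Random(42)
N = 100_000
m = sum(rng.random() < 0.5 for _ in range(N))  # count of heads in N flips

print(m / N)  # settles near 0.50 as N grows
```

Rerunning with larger N (or different seeds) shows the ratio m / N hovering ever closer to 0.50, which is exactly the "settling down" the definition relies on.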
The fundamental properties of a probability are clear immediately from this definition. First, since m must
obviously be a number between 0 and N, we must be able to say that
Property 1:    0 ≤ Pr(A) ≤ 1

meaning that a probability is a number between 0 and 1 inclusive. If

Pr(A) = 0, then event A will never happen (or never be observed). An event A with a probability of
zero is called an impossible event.

Pr(A) = 1, then event A occurs whenever the experiment is performed (since m = N must be true).
An event A with a probability of one is called a certain event.

Pr(A) is near zero, then event A is relatively unlikely to occur. Small probabilities mean low
likelihood of occurrence.

Pr(A) is near one, then event A is quite likely to occur. Large probabilities (near 1) mean a high
likelihood of occurrence.
In statistical inference, we will make statements about a population or estimate the value of a population
parameter using information from a random sample. For those results, we will be able to calculate or select
a probability of being wrong (error probability) or a probability of being right. We will never be able to assert
with certainty that our claim about a population is correct. However, if we can make a statement about a
population which has a low probability of being wrong (or a high probability of being correct), to that extent
we can be confident of our claim or justified in making the claim or the estimate.
Secondly:
Property 2:    Pr(S) = 1

meaning that when the experiment is performed, it is certain that the event S will occur. Remember, S is the
sample space, the set of all possible outcomes. This may seem like a rather obvious thing to say, but this
property has a number of very useful practical consequences.
Thirdly:

Property 3:    The probability of an event A is the sum of the probabilities of the simple events making up A.

This is quite a powerful and subtle property which we will exploit from time to time throughout the course.
Among other things, it means that if A and B are two mutually exclusive events, and if C is the event that
either A or B occurs, then

Pr(C) = Pr(A or B) = Pr(A) + Pr(B)        (PROB-2)
Since A and B are mutually exclusive, they share no simple events in common. Further, since C occurs
whenever A or B occurs, every simple event in either A or B must be a simple event in C. Thus, C
corresponds to the aggregate of simple events in A and B. Thus, Pr(C) is the sum of the probabilities of the
simple events in A and B, and hence is the sum of Pr(A) (which is the sum of the probabilities of the simple
events in A) and Pr(B) (which is the sum of the probabilities of the simple events in B).
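A small worked instance of (PROB-2), assuming all 52 cards are equally likely so that each simple event has probability 1/52 (an assumption the handout makes only implicitly): take A = a diamond is drawn and B = a heart is drawn, which are mutually exclusive.

```python
from fractions import Fraction

# Assume all 52 cards are equally likely: each simple event has
# probability 1/52.
p_simple = Fraction(1, 52)

pr_A = 13 * p_simple  # A: a diamond is drawn (13 simple events)
pr_B = 13 * p_simple  # B: a heart is drawn (13 simple events)

# A and B are mutually exclusive (one card cannot be both suits), so
# Pr(C) = Pr(A or B) = Pr(A) + Pr(B)   -- equation (PROB-2)
pr_C = pr_A + pr_B
print(pr_C)  # 1/2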
Finally,

Property 4:    Pr(A) = 1 - Pr(Ac)

This is one of the most useful of all four properties from a practical point of view. It means that if we know
the probability of some event A, then it is very easy to compute the probability of its complement Ac. Notice
that this property follows from the previous two. We know that between them, A and Ac cover all possible
outcomes, and so together must be equivalent to the entire sample space S. Thus, Pr(A) + Pr(Ac) = 1,
which on rearrangement gives the result above.
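As a small worked instance of Property 4 (our own example, not from the handout): the probability of at least one head in three coin flips is most easily found through its complement, "no heads at all".

```python
import itertools

# Enumerate the 8 equally likely outcomes of three fair coin flips.
outcomes = list(itertools.product("HT", repeat=3))

# Ac: no heads at all, i.e. the single outcome (T, T, T).
pr_no_heads = sum(o == ("T", "T", "T") for o in outcomes) / len(outcomes)

# Property 4: Pr(at least one head) = 1 - Pr(no heads) = 1 - 1/8 = 7/8.
pr_at_least_one_head = 1 - pr_no_heads
print(pr_at_least_one_head)  # 0.875
```

Counting the seven favourable outcomes directly gives the same 7/8, but the complement route needs only one outcome to be counted.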
The Subjective Interpretation of Probability
The long-run relative frequency interpretation of a probability is intuitive and also gives us an experimental
and mathematical recipe for determining probabilities. However, it really applies only to situations in which it
is meaningful to think of the same experiment being repeated an indefinitely large number of times. This is
fine when we talk about flipping coins or dealing cards, but in many scientific and technical applications of
probability, it doesn't really make sense to consider an experiment repeatable.
For example, according to one table produced by a national bureau of statistics, there is a 57% probability
that a male who has just turned 50 years old will live another 25 years. We understand that what this means
is that of all the males who have just turned 50 years old right now, 57% of them will still be alive 25 years
from now (assuming current conditions continue). However, what does this number mean for an individual
male who has just turned 50 years old? After all, he will either be alive 25 years from now or he will not. For
an individual, there is no such thing as being 57% alive, at least as the word is used here.
In this sort of situation, and others where it really doesn't make sense to think of repeating some experiment
many times, probabilities are taken to be something like a degree of belief in the likelihood of an outcome.
In one sense, the 57% probability of a 50 year old male living another 25 years makes perfect sense as a
long run relative frequency, when applied to a large group of 50 year olds. For the individual male in this
group, it really is a subjective sense that he has a slightly better than even chance of living 25 more years.
He is not certain to live another 25 years -- that would require a probability of 100%. Nor is his life certain to
end before 25 years are up -- that would require a probability of 0%. The probability of 57% means that it is
slightly more likely than not that he will live at least another 25 years.
You see from this that it is often not too useful to attempt to determine probabilities accurate to a large
number of significant figures. If this fictitious 50 year old man were told that the probability of living another
25 years was really only 56.5%, he would probably not consider his future prospects seriously diminished.
On the other hand, it could be quite sobering to be told that rather than a 57% chance of surviving another
25 years, you had only a 5% chance of doing so.
Although we attach probabilities to many results and conclusions about populations, and the meaning of
those probabilities grows out of the long run relative frequency interpretation, keep in mind that rarely do we
picture the outcome to which the probability is applied as being one of a large set of repetitions of some
experiment. I may tell you that there is a 95% probability that the Rhinoceros Party will win the next
Canadian federal election. This means that if I had performed the same experiment many, many times, and
carried out the projection of a winner in the same way each time, then my methodology would correctly
predict the winner in 95% of the elections so analyzed. However, since the next Canadian federal election
can only be done once, and I will either be right or wrong about the Rhinoceros Party winning, we must
interpret this 95% value as simply a statement of very strong confidence in my prediction.