Download Notes - Voyager2.DVC.edu

Chapter 4: ORGANIZING POPULATIONS
There are two reasons for studying samples rather than whole populations when applying
statistical methods: (1) convenience and (2) the Law of Large Numbers. (1) Samples are smaller than
populations and thus easier to manage and analyze; also, we may not be able to observe an entire
population when organizing and analyzing information. (2) Large enough samples, "representative"
samples, are sufficient to answer questions dealing with the relative frequency (or probability) of
unobserved elements in populations. In order to accumulate a large enough (representative) sample,
we must first be able to characterize the entire population of a data set. Chapter 4 is the
first step in the process of describing a population.
Probability Theory (chapter 4) and Probability Distributions (chapter 5) will enable us to organize
information about a population in a way that we can easily compare to sample data. This lets us
figure out whether a sample "represents" a population in the range and frequency of the values it
contains before we reach any conclusions from that sample.
Following our coverage of Probability Distributions (chapters 5 and 6), we will be able to ascertain
when a sample "represents" a population in this sense (chapter 7).
BASIC TERMINOLOGY
A random event is one which has not one but many different possible outcomes. The collection
of all outcomes or elements, observed or unobserved, constitutes the sample space (or probability
space) for the random event under study. When the sample space is unknown, the outcomes and their
relative frequencies are established by performing many repeated "statistical experiments", or by
collecting observations in a systematic fashion, and listing all the different outcomes together with
their frequencies. According to the Law of Large Numbers, the relative frequency table built from many
repetitions of these experiments yields an estimate of the true probabilities of all outcomes in
the sample space.
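As a rough illustration of the relative-frequency idea above, the sketch below repeats a simple experiment many times and tabulates the relative frequencies. The experiment (one toss of a fair coin), the function name estimate_probabilities and the fixed seed are assumptions made for this example only; they are not part of the course material.

```python
import random


def estimate_probabilities(outcomes, trials, seed=0):
    """Estimate outcome probabilities by relative frequency over many trials."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    counts = {outcome: 0 for outcome in outcomes}
    for _ in range(trials):
        counts[rng.choice(outcomes)] += 1  # every outcome is equally likely
    return {outcome: count / trials for outcome, count in counts.items()}


# Hypothetical experiment: one toss of a fair coin, repeated many times.
for trials in (10, 1_000, 100_000):
    print(trials, estimate_probabilities(["H", "T"], trials))

estimates = estimate_probabilities(["H", "T"], 100_000)
```

With few repetitions the relative frequencies can sit far from 1/2; with many repetitions they should settle close to it, which is the behavior the Law of Large Numbers describes.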
Probabilities of outcomes may also be defined by the so-called classical approach: this is the
approach we employed in the first two lectures on probability in class and which is covered in your
textbook in great detail.
Chapter 4. PROBABILITY THEORY
Probability theory is the study of uncertainty. Its purpose is to enable the understanding of events
and/or outcomes which are not [definitively singular or known in advance]. OUR purpose in studying
Probability Theory is to continue the process of learning about organizing and analyzing samples from
populations. Probability theory will enable us to organize populations so that we can then proceed with
our course objectives (Analysis and Inference from sample data).
RULES OF PROBABILITIES
A. Since true probabilities for events in a sample space are the relative frequencies of occurrence of
events, they must at least obey the rules of relative frequencies:
1. Each probability is at least 0 and at most 1 in value.
2. The probabilities of all the simple outcomes in the sample space add up to 1.
3. An event with probability zero (0) cannot happen, and an event with probability 1 will
happen with certainty.
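The first two rules can be checked mechanically. The sketch below is only an illustration: the helper name is_valid_distribution and the two probability assignments are made up for this example.

```python
from fractions import Fraction


def is_valid_distribution(probabilities):
    """Check rule 1 (each value between 0 and 1) and rule 2 (values add up to 1)."""
    values = list(probabilities.values())
    return all(0 <= p <= 1 for p in values) and sum(values) == 1


# Hypothetical assignments for the two outcomes of a coin toss.
fair_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
broken = {"H": Fraction(1, 2), "T": Fraction(3, 4)}  # adds up to 5/4, breaks rule 2

print(is_valid_distribution(fair_coin))  # True
print(is_valid_distribution(broken))     # False
```

Exact fractions are used instead of decimals so that the "adds up to 1" check is not disturbed by rounding.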
B. When simple outcomes combine to form other, more complex events, called compound events,
the probability of the combined outcomes will involve operations on the "unconditional"
probabilities of each simpler event. Two factors to assess before knowing how to calculate these
more complex probabilities are the independence [1] of events and the mutual exclusiveness [2] of
events.
C. Compound probabilities are of three different kinds: CONDITIONAL (GIVEN THAT, |), AND-type (the
intersection of sets of outcomes, ∩) and OR-type (the union of sets of outcomes, ∪)
combinations.
1) The CONDITIONAL type of compound-event probability is evaluated by counting the frequency of
occurrence of the event whose probability we are evaluating, in relation to the event we are using
as a "conditioning agent" in the probability statement.
2) The latter two cases of probabilities for compound events, namely the probability of the
intersection and the union of events, are measured using the MULTIPLICATION RULE of
probability, and the ADDITION RULE of probability, respectively.
[1] Two events are Independent when the probability of one event is not changed by the occurrence of the
other, and vice-versa. Detailed examples will be presented in class.
[2] Two events are Mutually Exclusive when the probability of one event changes to zero (0) given the
occurrence of the other, and vice-versa. Detailed examples will be presented in class.
PROBABILITY THEORY
A horse race [EVENT] is going to take place. It is going to involve 8 horses, numbered 1 through 8,
that are identical in ability, jockey chemistry, track record, etc.
Assume that only and exactly one horse can win the race.
At the end of the race an announcement is made that sounds like "...and horse '#' wins the race"
[OUTCOMES].
[Figure: the eight horses, numbered 1 through 8]
Simple outcomes of this event are descriptions of the individual horses that may win the race.
For example, "horse 3 wins the race" is a simple outcome.
Compound outcomes of this event are descriptions of groups of individual horses that include the
winner among their group.
For example, if Allan bet on horses {2, 5, 7} to win, then "{2, 5, 7} wins the race" is a compound
outcome.
The classical approach to calculate probabilities of simple and compound events/outcomes that are
UNCONDITIONAL (as in the two above examples):
P(outcome) = (count of # of ways outcome occurs) / (size of the probability space)
Examples:
P(horse 3 wins the race) = 1/8
P(A) = P(Allan wins the bet) = P({2, 5, 7} wins the race) = 3/8
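The two examples above can be reproduced by direct counting. This is a minimal sketch, assuming the eight horses are equally likely to win; the names SPACE and prob are chosen for this illustration.

```python
from fractions import Fraction

SPACE = set(range(1, 9))  # horses 1 through 8, assumed equally likely to win


def prob(event):
    """Classical probability: ways the outcome occurs over the size of the space."""
    return Fraction(len(event & SPACE), len(SPACE))


p_horse3 = prob({3})        # simple outcome: horse 3 wins
p_allan = prob({2, 5, 7})   # compound outcome: Allan wins the bet
print(p_horse3, p_allan)    # 1/8 3/8
```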
Given (unconditional compound) outcomes:
A = {2,5,7}; B = {2,4,8}; C = {1,3,6}; D = {1,3,5,7}; E = {2,4,6,8}; F = {1,2,3,4}
Other types of compound outcomes:
1. Conditional, |, outcomes.
2. Intersecting (the intersection, ∩, of 2 unconditional outcomes).
3. The union, ∪, of unconditional outcomes.
Conditional (restricted) Probabilities:
P(outcome | given occurrence) = (restricted count of # of ways outcome occurs) / (restricted size of the probability space)
For example:
P(A|E) = The probability of A winning the bet given that you know E was a winner
= 1/4 = 25%
(E = {2,4,6,8} restricts the probability space to 4 horses, and only horse 2 also belongs to A.)
Suggestion: always begin by calculating the denominator (bottom of the fraction).
Try: P(B|A) "dependence"; P(D|E) "mutually exclusive"; P(F|E) "independence".
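Restricted counting is easy to check with sets. The sketch below assumes the eight equally likely horses and the sets A through F given above; the function name cond_prob is made up for this illustration.

```python
from fractions import Fraction

A, B, C = {2, 5, 7}, {2, 4, 8}, {1, 3, 6}
D, E, F = {1, 3, 5, 7}, {2, 4, 6, 8}, {1, 2, 3, 4}


def cond_prob(event, given):
    """P(event | given): count only inside the restricted space `given`."""
    return Fraction(len(event & given), len(given))


print(cond_prob(A, E))  # 1/4, the worked example
print(cond_prob(B, A))  # 1/3, differs from P(B) = 3/8 (dependence)
print(cond_prob(F, E))  # 1/2, same as P(F) = 4/8 (independence)
print(cond_prob(D, E))  # 0, D and E share no horses
```

The denominator is always the size of the conditioning set, which is why the suggestion above is to start with the bottom of the fraction.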
The intersection of events and the Multiplication Rule of Probability:
P(outcome1 ∩ outcome2) = P(outcome1) * P(outcome2 | outcome1)
                       = P(outcome2) * P(outcome1 | outcome2)
For example:
P(C and D) = P(C ∩ D) = P(C)*P(D|C) = (3/8) * (2/3) = 2/8 = 25%
Suggestion: calculate each part separately and then multiply them.
Try: P(A and B); P(E and F); P(C and E).
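As a check on the worked example, the multiplication rule can be compared against a direct count of the intersection. This is a sketch under the assumption of eight equally likely horses; the variable names are illustrative only.

```python
from fractions import Fraction

N = 8  # size of the probability space (eight horses)
C, D = {1, 3, 6}, {1, 3, 5, 7}

p_c = Fraction(len(C), N)                    # P(C) = 3/8
p_d_given_c = Fraction(len(C & D), len(C))   # P(D|C) = 2/3
p_c_and_d = p_c * p_d_given_c                # multiplication rule

# The same rule in the other order: P(D) * P(C|D).
other_order = Fraction(len(D), N) * Fraction(len(C & D), len(D))

direct_count = Fraction(len(C & D), N)       # count the intersection directly
print(p_c_and_d, other_order, direct_count)  # 1/4 1/4 1/4
```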
The union of events and the Addition Rule of Probability:
P(outcome1 ∪ outcome2) = P(outcome1) + P(outcome2) - P(outcome1 ∩ outcome2)
For example:
P(C or D) = P(C ∪ D)
= P(C) + P(D) - P(C and D)
= (3/8) + (4/8) - (2/8)
= 5/8 = 62.5%
Suggestion: calculate each part separately, then add the first two and subtract the intersection.
Try: P(A or B); P(E or F); P(C or E).
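The addition rule can be checked the same way, by comparing it with a direct count of the union. Again this is only a sketch, assuming eight equally likely horses; the variable names are illustrative.

```python
from fractions import Fraction

N = 8  # size of the probability space (eight horses)
C, D = {1, 3, 6}, {1, 3, 5, 7}

p_c = Fraction(len(C), N)            # 3/8
p_d = Fraction(len(D), N)            # 4/8
p_c_and_d = Fraction(len(C & D), N)  # 2/8, horses 1 and 3 are in both sets

p_c_or_d = p_c + p_d - p_c_and_d     # addition rule
direct_count = Fraction(len(C | D), N)
print(p_c_or_d, direct_count)        # 5/8 5/8
```

Subtracting the intersection keeps horses 1 and 3 from being counted twice.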
COUNTING RULES
Counting Rules make the calculation of odds or probabilities a lot easier. They help us organize
events by informing us of the number of possible ways in which we can arrange them. In class, we will
cover the following:
• Fundamental Counting Rule: counts the number of sequences that are possible when matching
the outcomes of one event with the outcomes of any number of other events. For
example, if you own 3 slacks, 4 shirts, 2 pairs of shoes, 2 pairs of socks, 1 undergarment and 2
coats, then you can combine your dresswear into 3x4x2x2x1x2 = 96 distinct outfits.
• The Factorial Rule (n!): counts the number of sequences that are possible when arranging
n distinct items in arrays of length equal to the number of items. For
example, 5! = 5x4x3x2x1 = 120: five numbers can be arranged in 120 different 5-tuples.
• The Permutations Rule (nPr): counts the number of distinct sequences of length r that can be
formed from n distinct items, where r is smaller than n and order matters.
For example, 5P2 = 5x4 = 20: there are 20 ways of
arranging numbers in pairs (sequences of length 2) when one can choose from five distinct
numbers, in which order matters.
• The Combinations Rule (nCr): counts the number of distinct selections of r items that can be
made from n distinct items, where r is smaller than n and order does not matter.
For example, 5C2 = (5x4)/2 = 10: there are
10 ways of selecting numbers in pairs (groups of 2) when one can choose from five
distinct numbers, in which order does not matter.
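The four examples above can be verified with Python's standard math module (math.prod, math.perm and math.comb require Python 3.8 or later). The variable names are chosen for this illustration.

```python
import math

# Fundamental counting rule: multiply the number of choices for each item.
outfits = math.prod([3, 4, 2, 2, 1, 2])

arrangements = math.factorial(5)   # 5!  : all orderings of five numbers
ordered_pairs = math.perm(5, 2)    # 5P2 : pairs where order matters
unordered_pairs = math.comb(5, 2)  # 5C2 : pairs where order does not matter

print(outfits, arrangements, ordered_pairs, unordered_pairs)  # 96 120 20 10
```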
Counting Rules
I. Fundamental counting rule: the number of possible sequence-arrangements of joint
compound events equals the product (multiplication) of the number of arrangements of each
component/part.
For example, if a car model can be offered to customers in 4 interior colors and 8
exterior colors, then the total number of car arrangements (by interior-exterior) is
4*8 = 32.
II. Factorial counting rule: the number of possible arrangements of n distinct
objects into n-tuples is equal to n! (which reads "n factorial").
For example, how many quartets of four distinct objects can be formed so that each object
appears only once in the 4-tuple? Answer: 4! = 4*3*2*1 = 24 ways.
The first object can be placed in any of the four (4) spots in the sequence; once a spot is selected for it,
the object cannot be used again in the sequence (because of the distinct nature of the sequences), and
the spot reserved for it cannot be taken by another object.
The second object can be placed in any of the three (3) remaining spots in the sequence, under the same
restrictions.
The third object can be placed in either of the two (2) remaining spots in the sequence, under the same
restrictions.
The fourth object can be placed in the only (1) remaining spot in the sequence.
Thus, applying the fundamental counting rule, we get that the total number of arrangements is
4x3x2x1 = 24. This special pattern of counting is called factorial because we are "factoring out" one
possibility after every consecutive selection in the sequence, until we exhaust the list of items being
counted. This accomplishes our objective of having every item of the sequence be different from every
other item (distinct), while accounting for all items on the list.
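To see the 24 arrangements rather than just count them, they can be listed with itertools.permutations. The four labels below are hypothetical stand-ins for the objects in the example.

```python
import math
from itertools import permutations

objects = ["a", "b", "c", "d"]  # hypothetical labels for the four objects

quartets = list(permutations(objects))  # every ordering of all four objects
print(len(quartets), math.factorial(4))  # 24 24
print(quartets[:3])  # the first few 4-tuples
```

Each 4-tuple uses every object exactly once, which is the "distinct" requirement described above.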
III. Permutations counting rule (shuffles): counts the number of distinct sequences of items
picked from a larger list of possibilities. A permutations count is essentially a factorial count
cut short because not all possible items get selected.
For example, "count the number of ways in which 2 items can be picked out of a list of four
items in such a way that the order of the sequence is important." Answer: 4P2 = 4 x 3 = 12
ways.
The first object can be any of the four (4) available; once a spot is occupied by one of the items,
that spot cannot be used again, and the object cannot repeat itself in the sequence (because of the
distinct nature of the sequences).
The second picked object can be any of the three (3) remaining choices.
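The 12 ordered pairs can be listed explicitly. The sketch below uses itertools.permutations with length 2; the four item labels are hypothetical.

```python
from itertools import permutations

items = ["a", "b", "c", "d"]  # hypothetical list of four items

ordered_pairs = list(permutations(items, 2))  # sequences of length 2, order matters
print(len(ordered_pairs))  # 12
print(ordered_pairs)
```

Both ('a', 'b') and ('b', 'a') appear in the list, because they are different sequences when order matters.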
IV. Combinations counting rule: counts the number of distinct selections of items picked from a larger list
of possibilities in such a way that the order of appearance of the elements does not "matter" (does not
affect the total count).
For example, "count the number of ways in which 2 items can be selected from a list of four
items in such a way that the order of appearance of items in a sequence does not matter."
Answer: 4C2 = (4P2)/2! = (4 x 3)/2 = 6 ways.
3 ways + 2 ways + 1 way adds up to 6 ways.
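One way to arrive at 3 + 2 + 1 = 6 is to list the unordered pairs and group them by their first item. This grouping is an assumption about how the count is organized; the four item labels are hypothetical.

```python
from collections import Counter
from itertools import combinations

items = ["a", "b", "c", "d"]  # hypothetical list of four items

pairs = list(combinations(items, 2))             # selections of 2, order ignored
by_first = Counter(first for first, _ in pairs)  # group pairs by their first item

print(pairs)
print(len(pairs))      # 6
print(dict(by_first))  # {'a': 3, 'b': 2, 'c': 1}
```

Each unordered pair corresponds to 2! = 2 ordered pairs, which is why the permutations count is divided by 2!.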
Probability Distributions
Probability Distributions are complete descriptions of populations that tell us what occurs in the
population (probability space) and how often we observe all different occurrences (probabilities of all
outcomes in the probability space).
Example #1:
EVENT: # of tails in two (2) tosses of a fair coin
OUTCOMES: H, T
PROB. SPACE: {0, 1, 2}
SIZE OF PROB. SPACE = 4
Tree Diagram of all Outcomes of the above EVENT
First toss   Second toss   Sequence   # of tails
H            H             HH         0
H            T             HT         1
T            H             TH         1
T            T             TT         2

X               P(X)
0               1/4
1               2/4
2               1/4
All (0≤X≤2)     4/4 = 1 = 100%

A probability distribution is the equivalent of a relative frequency table for a population of data.
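The distribution in Example #1 can be rebuilt by listing the four equally likely toss sequences and counting tails in each. This is a sketch; the variable names are chosen for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

sequences = list(product("HT", repeat=2))  # HH, HT, TH, TT

# X = number of tails in each sequence; count how often each value occurs.
counts = Counter(seq.count("T") for seq in sequences)
distribution = {x: Fraction(n, len(sequences)) for x, n in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
print("total", sum(distribution.values()))
```

Fraction reduces 2/4 to 1/2 when printing, so the middle row shows 1/2 rather than 2/4.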
Mean of a population, μ:
μ = middle value of a population
= (not statistical, but) weighted (P(X)) average of all population values (X)
= Σ X*P(X)
Process steps:
1. Multiply outcomes by respective probabilities: X*P(X)
2. Add all up
Example #1 (cont.):
                                  Step 1
X               P(X)              X*P(X)
0               1/4               0
1               2/4               2/4
2               1/4               2/4
All (0≤X≤2)     4/4 = 1 = 100%    1 = μ (Step 2)
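The two process steps for the mean translate directly into a short calculation. This sketch assumes the distribution of Example #1 (number of tails in two tosses of a fair coin); the variable names are illustrative.

```python
from fractions import Fraction

# Example #1: X = number of tails in two tosses of a fair coin.
distribution = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}

# Step 1: multiply each outcome by its probability, X*P(X).
weighted = {x: x * p for x, p in distribution.items()}

# Step 2: add all up.
mean = sum(weighted.values())
print(mean)  # 1
```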
Variance of a population, σ²:
σ² = average squared dispersion in the population
= (not statistical, but) weighted (P(X)) average of all population squared deviations from μ.
= Σ (X - μ)²*P(X)
Process steps:
1. Difference the outcomes with the population mean, μ, to get DEVIATIONS: (X - μ).
2. Square the DEVIATIONS: (X - μ)².
3. Multiply the squared DEVIATIONS by their respective probabilities: (X - μ)²*P(X).
4. Add all up.
                                          Step 1    Step 2     Step 3
X             P(X)             X*P(X)     (X - μ)   (X - μ)²   P(X)*(X - μ)²
0             1/4              0          -1        1          1/4
1             2/4              2/4         0        0          0
2             1/4              2/4        +1        1          1/4
All (0≤X≤2)   4/4 = 1 = 100%   1 = μ                           1/2 = σ² (Step 4)
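The four process steps for the variance can be followed line by line. This sketch assumes the distribution of Example #1 (number of tails in two tosses of a fair coin); the variable names are illustrative.

```python
from fractions import Fraction

# Example #1: X = number of tails in two tosses of a fair coin.
distribution = {0: Fraction(1, 4), 1: Fraction(2, 4), 2: Fraction(1, 4)}
mean = sum(x * p for x, p in distribution.items())  # population mean

deviations = {x: x - mean for x in distribution}          # Step 1: X minus the mean
squared = {x: d ** 2 for x, d in deviations.items()}      # Step 2: square deviations
weighted = {x: squared[x] * distribution[x] for x in distribution}  # Step 3: times P(X)
variance = sum(weighted.values())                         # Step 4: add all up

print(mean, variance)  # 1 1/2
```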