Lecture 10 - Rice University

Permutations & Combinations and Distributions
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University
Take Home II - Generalizing the sum of expectations result (hint)
2. Prove that the expectation of a sum of n random variables is equal to the sum of the expectations of the n random variables.
• Let $x_1, x_2, x_3, \dots, x_n$ be n random variables
• Let $z = x_1 + x_2 + x_3 + \dots + x_n$
To prove:
$E(z) = \sum_{i=1}^{n} E(x_i)$
Hint for the proof
• Use the result E(X+Y) = E(X) + E(Y) to generalize to n random variables
Consider $E(X_1 + X_2 + X_3 + \dots + X_n)$
Let $Y_1 = X_2 + X_3 + \dots + X_n$
Then $E(X_1 + Y_1) = E(X_1) + E(Y_1)$, i.e.
$E(X_1 + X_2 + \dots + X_n) = E(X_1) + E(X_2 + X_3 + \dots + X_n)$
Now let $Y_2 = X_3 + \dots + X_n$ and repeat the same procedure
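A quick numerical check (not a proof) of the generalized result, assuming three fair six-sided dice as the random variables: every joint outcome is enumerated exactly with Python's `fractions` module, and E(X1 + X2 + X3) is compared with E(X1) + E(X2) + E(X3).

```python
from fractions import Fraction
from itertools import product

# Three fair dice: each joint outcome (d1, d2, d3) has probability 1/216.
faces = range(1, 7)
outcomes = list(product(faces, repeat=3))
prob = Fraction(1, len(outcomes))

def expectation(f):
    """E[f(outcome)] computed by direct enumeration of all joint outcomes."""
    return sum(prob * f(o) for o in outcomes)

# E(X1 + X2 + X3), computed directly on the joint outcomes
e_sum = expectation(lambda o: sum(o))
# E(X1) + E(X2) + E(X3), computed one variable at a time (i=i binds the index)
sum_e = sum(expectation(lambda o, i=i: o[i]) for i in range(3))

print(e_sum, sum_e)  # 21/2 21/2
```

The induction in the hint is what turns this single check into a statement for every n.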
Contents
• Permutations and Combinations
• Calculating probabilities using combinations
• Distribution
• Proof of Law of Large Numbers
• Binomial Distribution
• Normal Distribution
Permutations vs. Combinations
• Both are ways to count the possibilities
• The difference between them is whether order matters or not
• Consider a 5-card hand:
  A♦, 5♥, 7♣, 10♠, K♠
• Is that the same hand as:
  K♠, 10♠, 7♣, 5♥, A♦
• Does the order in which the cards are dealt matter?
  - If yes, then we are dealing with permutations
  - If no, then we are dealing with combinations
Permutations
• A permutation is an ordered arrangement of the elements of some set S
  - Let S = {a, b, c}
  - c, b, a is a permutation of S
  - b, c, a is a different permutation of S
• An r-permutation is an ordered arrangement of r elements of the set
  - A♦, 5♥, 7♣, 10♠, K♠ is a 5-permutation of the set of cards
• The notation for the number of r-permutations of an n-element set: P(n,r)
  - For example, a poker hand dealt in order is one of P(52,5) permutations
Permutations
• Number of poker hands (5 cards), counted in order:
  P(52,5) = 52*51*50*49*48 = 311,875,200
• r-permutation notation: P(n,r)
$P(n,r) = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!} = \prod_{i=n-r+1}^{n} i$
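The product form of P(n,r) translates directly into a loop. A minimal sketch, assuming a hypothetical helper name `num_permutations`; it is cross-checked against the factorial form and Python's built-in `math.perm` (Python 3.8+).

```python
import math

def num_permutations(n, r):
    """P(n, r) = n(n-1)...(n-r+1): the product of the integers n-r+1 .. n."""
    result = 1
    for i in range(n - r + 1, n + 1):
        result *= i
    return result

print(num_permutations(52, 5))  # 311875200

# Cross-check against the factorial form n!/(n-r)! and Python's math.perm
assert num_permutations(52, 5) == math.factorial(52) // math.factorial(47)
assert num_permutations(52, 5) == math.perm(52, 5)
```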
Deriving the formula for Permutations
• There are n ways to choose the first element,
  n-1 ways to choose the second,
  n-2 ways to choose the third,
  …
  n-r+1 ways to choose the rth element
• By the product rule, that gives us:
  P(n,r) = n(n-1)(n-2)…(n-r+1)
Combinations
• What if order doesn’t matter?
• In poker, the following two hands are equivalent:
  A♦, 5♥, 7♣, 10♠, K♠
  K♠, 10♠, 7♣, 5♥, A♦
• The number of r-combinations of a set with n elements, where n is non-negative and 0 ≤ r ≤ n, is:
$C(n,r) = \frac{n!}{r!(n-r)!}$
Deriving the formula for Combinations
• Let C(n,r) be the number of unordered combinations
• The number of ordered combinations (i.e. r-permutations) is P(n,r)
• The number of ways to order a single one of those combinations is P(r,r)
• The total number of unordered combinations is the total number of ordered combinations (i.e. r-permutations) divided by the number of ways to order each combination
• Thus,
$C(n,r) = P(n,r)/P(r,r)$   (1)
Deriving the formula for Combinations
• But from the derivation of the permutation formula, we know that
$P(n,r) = n(n-1)(n-2)\cdots(n-r+1) = \frac{n!}{(n-r)!}$   (2)
• Hence, substituting n = r, we get
$P(r,r) = \frac{r!}{(r-r)!}$   (3)
• Substituting (2) and (3) into (1), we get
$C(n,r) = \frac{P(n,r)}{P(r,r)} = \frac{n!/(n-r)!}{r!/(r-r)!} = \frac{n!}{r!(n-r)!}$   (since 0! = 1)
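Equation (1) can be checked by brute force on a small set. A minimal sketch, assuming a hypothetical helper `num_combinations` and a five-letter set chosen only for illustration: unordered selections are counted with `itertools.combinations`, ordered ones with `itertools.permutations`, and both are compared with the formula.

```python
import math
from itertools import combinations, permutations

def num_combinations(n, r):
    """C(n, r) = P(n, r) / P(r, r), following equation (1)."""
    return math.perm(n, r) // math.perm(r, r)

# Brute-force check on a small set: count unordered and ordered selections directly
items = "abcde"
for r in range(len(items) + 1):
    unordered = sum(1 for _ in combinations(items, r))
    ordered = sum(1 for _ in permutations(items, r))
    assert unordered == num_combinations(len(items), r) == math.comb(len(items), r)
    assert ordered == unordered * math.factorial(r)  # each combination has r! orderings

print(num_combinations(52, 5))  # 2598960
```

The integer division is exact because P(r,r) = r! always divides P(n,r).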
In-class Exercise - 1
• Card Terminology:
  - face value: cards with the same number or rank (2-10, J, Q, K, A); each face value has 4 cards
  - suit: the set of cards with the same symbol
  - four suits: diamonds, hearts, spades, clubs
  - each suit has 13 cards
• Q) In a standard deck of cards, compute the number of ways you can deal each of the following five-card hands in poker.
  1. Total number of different possible hands (five cards in a hand)
  2. Number of distinct flushes (all 5 cards have the same suit)
  3. Number of distinct four of a kind hands (4 cards of the same face value)
• A) 1. C(52,5)
  2. C(13,5) * C(4,1)
  3. C(13,1) * C(48,1)
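Evaluating the three answers numerically, as a sketch using Python's `math.comb`. Note that C(13,5) * C(4,1) counts every hand whose five cards share a suit, so straight flushes are included in this count.

```python
from math import comb

total_hands = comb(52, 5)                   # 1. any 5 of the 52 cards
flushes = comb(13, 5) * comb(4, 1)          # 2. choose a suit, then 5 of its 13 cards
four_of_a_kind = comb(13, 1) * comb(48, 1)  # 3. choose the face value, then 1 of the 48 other cards

print(total_hands, flushes, four_of_a_kind)  # 2598960 5148 624
```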
In-Class Exercise - 2
• Q) Now, compute the probability of getting a flush in a five-card poker game.
• A) Probability of an outcome = (No. of favorable outcomes) / (Total no. of outcomes)
• Number of favorable outcomes = C(13,5) * C(4,1)
• Total no. of outcomes = C(52,5)
• Hence, Probability = C(13,5) * C(4,1) / C(52,5)
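The same ratio computed exactly, as a sketch using `fractions.Fraction` so the result is kept as a reduced fraction before converting to a decimal.

```python
from fractions import Fraction
from math import comb

p_flush = Fraction(comb(13, 5) * comb(4, 1), comb(52, 5))
print(p_flush, float(p_flush))  # 33/16660, roughly 0.00198
```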
In-Class Exercise - 3
• Consider an example: In an experiment of 20 coin tosses, we want to calculate the probability of heads falling exactly 5 times. How do we do this?
• Solution:
  - Probability of heads in 1 coin toss = ½
  - Probability of heads falling in 5 of the coin tosses = ½ * ½ * ½ * ½ * ½ = $(1/2)^5$ (probability of the intersection of independent events)
  - Probability of heads not falling in 1 coin toss = ½
  - Probability of heads not falling in the rest (20 - 5 = 15) of the coin tosses = $(1/2)^{15}$
Use of Combinations to Calculate Probabilities
• Hence, the probability of getting exactly 5 heads out of 20 tosses = $(1/2)^5 \cdot (1/2)^{15} = (1/2)^{20}$
Is this correct?
• Q) Did we account for which of the coin tosses came up HEADS?
• A) No
• Q) How do we account for it? Permutations or Combinations?
• A) Combinations, as the order of selection is not important
Use of Combinations to Calculate Probabilities
Q) How do we select, using combinations, the 5 tosses out of 20 that come up heads?
Let us make a table of the possible outcomes of 20 coin tosses that contain exactly 5 HEADs:
Toss:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Row 1:    H  H  H  H  H  T  T  T  T  T  T  T  T  T  T  T  T  T  T  T
Row 2:    T  T  T  T  T  T  T  T  T  T  T  T  T  T  T  H  H  H  H  H
Row 3:    H  H  T  T  H  H  T  T  H  T  T  T  T  T  T  T  T  T  T  T
…
We can see that each row of this table is determined by choosing the 5 places, out of the 20, where H occurs.
Thus the total number of such outcomes is C(20,5).
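The table can be generated exhaustively to confirm the count. A minimal sketch, assuming each 20-toss sequence is encoded as a 20-bit integer (bit set = heads): count the sequences with exactly five set bits and compare with C(20,5).

```python
from math import comb

n_tosses, n_heads = 20, 5
# Enumerate all 2^20 sequences; bit i set means toss i+1 came up heads
count = sum(1 for mask in range(1 << n_tosses) if bin(mask).count("1") == n_heads)
print(count, comb(n_tosses, n_heads))  # 15504 15504

# Each specific sequence has probability (1/2)^20
prob = count / 2**n_tosses
print(prob)  # roughly 0.0148
```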
Use of Combinations to Calculate Probabilities
• Hence, the probability of getting exactly 5 heads out of 20 coin tosses is given by $C(20,5) \cdot (1/2)^{20}$
How do we generalize this method of computing probabilities?
Use of Combinations to Calculate Probabilities
• Consider an example: In an experiment of 20 “biased” coin tosses, we want to calculate the probability of heads falling exactly 5 times. How do we do this?
• Assume that the probability of HEAD = p
• Solution:
  - Probability of heads in 1 coin toss = p
  - Probability of heads falling in 5 of the coin tosses = p*p*p*p*p = $p^5$
  - Probability of heads not falling in 1 coin toss = 1-p
  - Probability of heads not falling in the rest (20 - 5 = 15) of the coin tosses = $(1-p)^{15}$
Use of Combinations to Calculate Probabilities
• Generalizing both the number of trials and the number of HEADs (successes), we obtain:
• Assume that in n trials of an event we want to compute the probability P of getting k successes when the probability of success in each trial is p
• We denote this by the following expression
$P(\text{number of heads} = k) = C(n,k)\, p^k (1-p)^{n-k}$
This is the Binomial Distribution.
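The general expression as a small function. A minimal sketch, assuming a hypothetical helper `binomial_pmf` and exact `Fraction` arithmetic; the biased-coin value p = 3/10 is chosen only for illustration. It reproduces the fair-coin answer and checks that the probabilities over k = 0..n sum to 1.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(number of successes = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin: reproduces C(20,5) * (1/2)^20 from the previous slides
half = Fraction(1, 2)
assert binomial_pmf(5, 20, half) == Fraction(comb(20, 5), 2**20)

# Biased coin (illustrative p = 3/10): probabilities over k = 0..20 sum to 1
p = Fraction(3, 10)
total = sum(binomial_pmf(k, 20, p) for k in range(21))
print(total)  # 1
```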
Schedule for the next 2 weeks
• 5 Oct – Tutorial Session II
  - Covers Expectations, Permutations & Combinations, Basic Distributions
• 7 Oct – Mini Project 1
  - 15 % of Final Grade
  - Can be done as a take-home if the time provided in class is not sufficient
• 12 Oct – Fall Break Holiday
• 14 Oct – Project Proposal report due and in-class discussion of the proposals
Project Discussion
What is a distribution?
Consider the following experiment:
Event   | Probability
Event 1 | p1
Event 2 | p2
..Etc.  | ..Etc.
Define a variable x which takes as many values as there are events:
Event   | x
Event 1 | 1
Event 2 | 2
..Etc.  | ..Etc.
Therefore, using the probabilities of the events, we can define a function which relates the variable x to the probabilities of the events:
$p(x = i) = p_i$ = Probability(Event i), where i = {1, 2, …}
Here ‘x’ is called a random variable, and this function is its distribution.
What is a distribution?
A distribution is a function defined on the random variable that gives the probability of the random variable taking a particular value.
The probability distribution describes the range of possible values that a random variable can attain and the probability that the value of the random variable is within any (measurable) subset of that range.
$\forall x \in \{\text{event space}\}$
Examples of a Distribution
Uniform Distribution:
i    | 1   | 2   | 3   | 4   | 5   | 6
p(i) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6
[Figures: Uniform Distribution, Binomial Distribution, Gaussian Distribution]
Example of a Distribution
• Suppose you flip a coin two times.
• This experiment can have four possible outcomes: HH, HT, TH, and TT.
• Now, let the random variable X represent the number of Heads that result from this experiment.
• X can take on the values 0, 1, or 2.
• The table, equation and graph below, which associate each outcome with its probability, are all representations of the probability distribution for the above example.
P(X = 0) = 0.25
P(X = 1) = 0.50
P(X = 2) = 0.25
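The distribution can be derived mechanically by enumerating the four equally likely outcomes and tallying the number of heads in each. A minimal sketch using `itertools.product` and `collections.Counter`, assuming a fair coin:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two tosses: HH, HT, TH, TT
outcomes = ["".join(t) for t in product("HT", repeat=2)]
# X = number of heads; count how many outcomes give each value of X
counts = Counter(o.count("H") for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
print(distribution)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```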
[Figures: Distribution Table, Distribution Equation, Distribution graph]
Video on Terms in Distributions
Variance
• The variance of a random variable or probability distribution is a measure of statistical dispersion: it averages the squared distance of the variable's possible values from the expected value (mean).
• If random variable X has expected value (mean) μ = E(X), then the variance Var(X) of X is given by:
$\mathrm{Var}(X) = E[(X - \mu)^2]$
Standard Deviation
• Standard deviation is the positive square root of the variance. It is given by:
$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{E[(X - \mu)^2]}$
• A low standard deviation indicates that the data points tend to be very close to the same value (the mean), while a high standard deviation indicates that the data are “spread out” over a large range of values
[Figure: A plot of a normal distribution (or bell curve). Each colored band has a width of one standard deviation.]
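Applying the definitions to the uniform distribution from the examples slide (a fair six-sided die, p(i) = 1/6), as a sketch with exact fractions for the mean and variance and a float for the square root:

```python
import math
from fractions import Fraction

# Fair six-sided die: p(i) = 1/6 for i = 1..6
dist = {i: Fraction(1, 6) for i in range(1, 7)}

mu = sum(x * p for x, p in dist.items())               # E(X)
var = sum((x - mu) ** 2 * p for x, p in dist.items())  # E[(X - mu)^2]
sd = math.sqrt(var)                                    # positive square root

print(mu, var, round(sd, 4))  # 7/2 35/12 1.7078
```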
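Useful derivation for Variance
• In probability theory, the computational formula for the variance Var(X) of a random variable X is
$\mathrm{Var}(X) = E(X^2) - (E(X))^2$
Derivation
$\mathrm{Var}(X) = E[(X - \mu)^2]$   (from definition)
$= E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2$   (expansion of expectation formula)
$= E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - (E(X))^2$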
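A sketch confirming that the definition and the computational formula agree, assuming the two-coin distribution and the fair die from the earlier slides as test cases; the helper names are hypothetical.

```python
from fractions import Fraction

def var_definition(dist):
    """Var(X) = E[(X - mu)^2]."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def var_computational(dist):
    """Var(X) = E(X^2) - (E(X))^2."""
    mu = sum(x * p for x, p in dist.items())
    e_x2 = sum(x * x * p for x, p in dist.items())
    return e_x2 - mu ** 2

two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
die = {i: Fraction(1, 6) for i in range(1, 7)}
for dist in (two_coins, die):
    assert var_definition(dist) == var_computational(dist)
print(var_computational(two_coins))  # 1/2
```

The computational form needs only one pass over the values (accumulating E(X) and E(X²)), which is why it is convenient in derivations such as the binomial variance.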
Law of Large Numbers
• The law of large numbers (LLN) describes the long-term stability of the mean of a random variable.
• Given a random variable with a finite expected value, if its values are repeatedly sampled, then as the number of these observations increases, their mean will tend to approach and stay close to the expected value
  - For example, consider the coin toss experiment with a fair coin. The frequency of heads (or tails) will increasingly approach 50% over a large number of trials.
• Mathematically, it can be represented as follows: if the mean is $\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}$, then for any $\epsilon > 0$,
$P(|\bar{X}_n - \mu| < \epsilon) \to 1$ as $n \to \infty$
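A small simulation of the coin-toss example, as a sketch: it assumes a fair coin simulated with Python's `random.Random` and a fixed seed so reruns are repeatable. It records the running fraction of heads at a few checkpoints.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable
heads = 0
running_mean = {}
for n in range(1, 100_001):
    heads += rng.random() < 0.5          # a True comparison adds 1: one fair-coin head
    if n in (10, 100, 1_000, 10_000, 100_000):
        running_mean[n] = heads / n      # sample mean after n tosses

for n, m in running_mean.items():
    print(f"{n:>7} tosses: fraction of heads = {m:.4f}")
```

Early checkpoints can wander noticeably from 0.5; the later ones settle close to it, which is the behavior the LLN describes (a simulation illustrates it but does not prove it).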
Proof of Law of Large Numbers
• First, let us derive the Chebyshev Inequality, which simplifies the derivation of the law of large numbers
• Chebyshev Inequality: Let X be a discrete random variable with expected value µ = E(X), and let ε > 0 be any positive real number. Then
$P(|X - \mu| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}$
Proof of Chebyshev Inequality
• Let m(x) denote the distribution function of X. Then the probability that X differs from µ by at least ε is given by
$P(|X - \mu| \ge \epsilon) = \sum_{|x - \mu| \ge \epsilon} m(x)$
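Proof of Law of Large Numbers
• We know that
$V(X) = \sum_{x} (x - \mu)^2 m(x)$
• But V(X) is clearly at least as large as
$\sum_{|x - \mu| \ge \epsilon} (x - \mu)^2 m(x)$
since all the summands are non-negative and we have only restricted the range of summation
• Replacing $(x - \mu)^2$ with $\epsilon^2$ to get a lower bound,
$V(X) \ge \sum_{|x - \mu| \ge \epsilon} \epsilon^2 m(x) = \epsilon^2 \sum_{|x - \mu| \ge \epsilon} m(x) = \epsilon^2 P(|X - \mu| \ge \epsilon)$
• Hence, we get
$P(|X - \mu| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}$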
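Checking Chebyshev's inequality exactly on a concrete distribution, as a sketch: it assumes the fair-die distribution (p(i) = 1/6) and a few illustrative values of ε, and compares the exact tail probability with V(X)/ε².

```python
from fractions import Fraction

die = {i: Fraction(1, 6) for i in range(1, 7)}
mu = sum(x * p for x, p in die.items())
var = sum((x - mu) ** 2 * p for x, p in die.items())

for eps in (Fraction(1, 2), Fraction(1), Fraction(2), Fraction(5, 2)):
    # Exact left-hand side: total probability of values at distance >= eps from mu
    tail = sum(p for x, p in die.items() if abs(x - mu) >= eps)
    bound = var / eps**2
    assert tail <= bound
    print(f"eps={eps}: P(|X-mu|>=eps) = {tail}, V(X)/eps^2 = {bound}")
```

The bound holds in every case but can be loose (for small ε it even exceeds 1); its value is that it needs nothing beyond the variance.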
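Proof of Law of Large Numbers
• Let X1, X2, . . . , Xn be an independent trials process, with finite expected value µ = E(Xj) and finite variance σ² = V(Xj).
• Let $\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}$ be the mean of X1, X2, …, Xn. Hence, $E(\bar{X}_n) = \mu$ and, since the Xj are independent, $V(X_1 + X_2 + \dots + X_n) = n\sigma^2$
• Equivalently,
$V(\bar{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
• But from Chebyshev’s inequality, we have
$P(|X - \mu| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}$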
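Proof of Law of Large Numbers
• Replacing X with $\bar{X}_n$, we get
$P(|\bar{X}_n - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}$
• Hence, we get
$P(|\bar{X}_n - \mu| < \epsilon) \ge 1 - \frac{\sigma^2}{n\epsilon^2}$
• As n approaches infinity, the right-hand side approaches 1, and a probability cannot exceed 1. Hence, we have obtained
$P(|\bar{X}_n - \mu| < \epsilon) \to 1$ as $n \to \infty$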
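Comparing the Chebyshev bound σ²/(nε²) with the exact deviation probability for a fair coin, as a sketch. Assumptions: each toss Xj is 1 for heads and 0 for tails (so σ² = 1/4), ε = 1/20, and n takes a few illustrative values; the exact probability is obtained by counting binomial outcomes with big-integer arithmetic.

```python
from fractions import Fraction
from math import comb

def exact_deviation_prob(n, eps):
    """P(|S_n/n - 1/2| >= eps) for n fair-coin tosses, by exact binomial counting."""
    favorable = sum(comb(n, k) for k in range(n + 1)
                    if abs(Fraction(k, n) - Fraction(1, 2)) >= eps)
    return favorable / 2**n   # big-integer true division, rounded to a float

sigma2 = Fraction(1, 4)   # variance of a single fair-coin toss (1 = heads, 0 = tails)
eps = Fraction(1, 20)
results = {}
for n in (100, 1_000, 4_000):
    bound = float(sigma2 / (n * eps**2))   # Chebyshev: sigma^2 / (n eps^2)
    results[n] = (exact_deviation_prob(n, eps), bound)
    print(n, results[n])
```

Both columns shrink as n grows; the exact probability falls much faster than the bound, which is enough for the proof but far from tight.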
Binomial Distribution
• The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p
• It can be applied in a wide variety of practical situations
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
for k = 0, 1, 2, 3, …, n, where
$\binom{n}{k} = C(n,k) = \frac{n!}{k!(n-k)!}$
is called the ‘Binomial Coefficient’
Binomial Distribution
• The binomial distribution is a very interesting distribution in the sense that it can be applied in a wide variety of practical situations.
• An example:
  - Assume 5% of a very large population to be green-eyed.
  - You pick 40 people randomly.
  - The number of green-eyed people you pick is a random variable X which follows a binomial distribution with n = 40 and p = 0.05.
• Let us see how this distribution varies with different values of n and p with respect to X.
[Figure: Binomial Distribution; x-axis: X = number of green-eyed people; y-axis: probability (p)]
For the previous example, this graph shows the variation in probability. Notice how it peaks around the mean (np = 2) and dies away at the ends.
Another elementary example of a binomial distribution is:
Roll a standard die ten times and count the number of sixes. Denote the number of sixes by the random variable X.
The distribution of this random number X is a binomial distribution with n = 10 and p = 1/6.
Can you plot this distribution and see how it varies with X?
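A text-only way to "plot" both examples, as a sketch: it tabulates the binomial probabilities for the green-eyed example (n = 40, p = 0.05) and the ten-rolls example (n = 10, p = 1/6), prints a crude bar for the first few values of X, and reports the most likely value. The helper names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def show(n, p, label):
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mode = max(range(n + 1), key=probs.__getitem__)   # value of X with highest probability
    print(f"{label}: most likely X = {mode}")
    for k in range(min(n, 8) + 1):                     # text "bar chart" of the first few values
        print(f"  X={k:2d}  {probs[k]:.4f}  " + "#" * round(100 * probs[k]))
    return mode

green_mode = show(40, 0.05, "green-eyed people, n=40, p=0.05")
sixes_mode = show(10, 1 / 6, "sixes in 10 rolls, n=10, p=1/6")
```

Changing n and p in the two calls is an easy way to see how the shape shifts.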
In-Class Exercise
• Let us try out an example of a binomial distribution:
• Consider rolling a standard die 20 times
Q) Denote the number of times the outcome of the roll is an even number by a random variable X. Compute the probability that X = 8.
Q) Denote the number of times the outcome of the roll is ‘6’ by the random variable Y. Compute the probability that Y = 4.
Q) Denote the number of times the outcome of the roll is ‘2’ by the random variable Z. Compute the probability that Z ≤ 4.
Use the Binomial Distribution to solve these questions.
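One way to check answers to the exercise, as a sketch: it assumes a fair die, so an even outcome has probability 1/2 and any single face has probability 1/6, and evaluates the binomial formula with exact fractions.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 20
p_even = Fraction(1, 2)   # 2, 4 or 6: 3 of the 6 faces
p_face = Fraction(1, 6)   # one specific face, e.g. '6' or '2'

p_x8 = binomial_pmf(8, n, p_even)                            # P(X = 8)
p_y4 = binomial_pmf(4, n, p_face)                            # P(Y = 4)
p_z_le4 = sum(binomial_pmf(k, n, p_face) for k in range(5))  # P(Z <= 4) = P(Z=0) + ... + P(Z=4)

for name, value in [("P(X=8)", p_x8), ("P(Y=4)", p_y4), ("P(Z<=4)", p_z_le4)]:
    print(f"{name} = {float(value):.4f}")
```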
Attributes of Binomial Distribution
• If X ~ B(n, p) (that is, X is a binomially distributed random variable with total ‘n’ events and probability of success ‘p’ in each event),
  - Expected value or mean of X is $E(X) = np$
  - Variance of X is $\mathrm{Var}(X) = np(1-p)$
  - Standard deviation of X is $\sigma = \sqrt{np(1-p)}$
Video on Binomial Distribution: A Summary
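The three attributes can be verified by summing directly over the probability distribution. A sketch for the green-eyed example (n = 40, p = 0.05 = 1/20), using exact fractions so the comparison with np and np(1−p) is exact:

```python
import math
from fractions import Fraction
from math import comb

n, p = 40, Fraction(1, 20)   # the green-eyed example: n = 40, p = 0.05
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * q for k, q in pmf.items())                  # E(X) from the definition
var = sum((k - mean) ** 2 * q for k, q in pmf.items())     # Var(X) from the definition
assert mean == n * p and var == n * p * (1 - p)
print(mean, var, round(math.sqrt(var), 4))                 # 2 19/10 1.3784
```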
Deriving the Expectation of Binomial Distribution
• If X ~ B(n, p) (that is, X is a binomially distributed random variable with total ‘n’ events and probability of success ‘p’ in each event), then the expected value of X is
$E(X) = np$
• We apply the definition of the expected value of a discrete random variable to the binomial distribution:
$E(X) = \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k}$
• The first term in the summation (for k = 0) equals 0 and can be removed. In the rest of the summation, we expand the $\binom{n}{k}$ term:
$E(X) = \sum_{k=1}^{n} k \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k} = \sum_{k=1}^{n} \frac{n (n-1)!}{(k-1)!(n-k)!}\, p\, p^{k-1} (1-p)^{n-k}$
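Deriving the Expectation of Binomial Distribution
• Since n and p do not depend on the summation index k, we can take them out of the sum:
$E(X) = np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k}$
• Let m = n − 1 and s = k − 1, so that n − k = m − s.
• Changing the limits accordingly (k = 1 … n becomes s = 0 … m):
$E(X) = np \sum_{s=0}^{m} \frac{m!}{s!(m-s)!} p^{s} (1-p)^{m-s}$
This is the binomial theorem expansion $(x+y)^m = \sum_{s=0}^{m} \binom{m}{s} x^{m-s} y^{s}$ with x = 1 − p and y = p
• Hence, as x + y = (1 − p) + p = 1, we get
$E(X) = np \cdot 1^m = np$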
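Each step of the derivation can be spot-checked with exact arithmetic. A sketch with illustrative values n = 12 and p = 1/3 (any values would do): it verifies the identity k·C(n,k) = n·C(n−1,k−1) used when expanding the coefficient, that the re-indexed sum equals 1 by the binomial theorem, and that E(X) = np.

```python
from fractions import Fraction
from math import comb

n, p = 12, Fraction(1, 3)   # arbitrary illustrative values
m = n - 1

# Step used above: k * C(n, k) = n * C(n - 1, k - 1) for k >= 1
assert all(k * comb(n, k) == n * comb(n - 1, k - 1) for k in range(1, n + 1))

# After substituting s = k - 1, the remaining sum is ((1 - p) + p)^m = 1
inner = sum(comb(m, s) * p**s * (1 - p)**(m - s) for s in range(m + 1))
assert inner == 1

# Hence E(X) = np * 1
e_x = sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))
assert e_x == n * p
print(e_x)  # 4
```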
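Derivation of Variance of Binomial Distribution
• We have seen that variance is equal to
$\mathrm{Var}(X) = E(X^2) - (E(X))^2$
• We now compute the value of E(X²):
$E(X^2) = \sum_{k=0}^{n} k^2 \binom{n}{k} p^k (1-p)^{n-k}$
• Use a similar approach as in the derivation of the mean:
  - expand $\binom{n}{k}$ and cancel one factor of k, which gives
$E(X^2) = np \sum_{k=1}^{n} k \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1} (1-p)^{n-k}$
  - let m = n − 1 and s = k − 1, so that k = s + 1:
$E(X^2) = np \sum_{s=0}^{m} (s+1) \binom{m}{s} p^{s} (1-p)^{m-s}$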
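Derivation of Variance of Binomial Distribution
• We split the sum into two separate sums:
$E(X^2) = np \left( \sum_{s=0}^{m} s \binom{m}{s} p^{s} (1-p)^{m-s} + \sum_{s=0}^{m} \binom{m}{s} p^{s} (1-p)^{m-s} \right)$
• The first sum is identical in form to the one we calculated for the Mean (above), so it sums to mp. The second sum is unity (binomial theorem).
• Hence, we get
$E(X^2) = np(mp + 1) = np((n-1)p + 1) = (np)^2 + np(1-p)$
$\mathrm{Var}(X) = E(X^2) - (np)^2 = np(1-p)$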
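A sketch that checks the closed form for E(X²) and the resulting variance against direct summation, for a few illustrative (n, p) pairs, using exact fractions:

```python
from fractions import Fraction
from math import comb

def second_moment(n, p):
    """E(X^2) for X ~ B(n, p), summed directly from the definition."""
    return sum(k * k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

for n, p in [(10, Fraction(1, 6)), (40, Fraction(1, 20)), (25, Fraction(2, 3))]:
    m = n - 1
    assert second_moment(n, p) == n * p * (m * p + 1)             # np(mp + 1)
    assert second_moment(n, p) - (n * p) ** 2 == n * p * (1 - p)  # Var(X) = np(1 - p)
print("E(X^2) = np((n-1)p + 1) and Var(X) = np(1-p) hold for all test cases")
```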
In-Class Exercise
• Let us continue the previous example of the binomial distribution:
• Consider rolling a standard die 100 times instead of 20 times
Q) Denote the number of times the outcome of the roll is ‘2’ by the random variable X. Compute the probability that X is greater than or equal to 60.
Difficult
• What if we roll the die a million times and need to compute the probability that X is greater than or equal to 100,000?
Impossible!
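By hand the 100-roll sum is tedious, but a computer can evaluate it exactly. A sketch assuming a fair die (p = 1/6 for rolling a '2'), with exact rational arithmetic so nothing is rounded until the final conversion:

```python
from fractions import Fraction
from math import comb

n, p = 100, Fraction(1, 6)   # 100 rolls; success = rolling a '2'
# Exact P(X >= 60) = sum of the binomial probabilities for k = 60..100
tail = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(60, n + 1))
print(float(tail))   # on the order of 1e-22
```

The answer is astronomically small because 60 is far above the mean np ≈ 16.7. For a million rolls the same exact sum becomes very expensive, which is what motivates an approximation.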
How to Compute Distributions for Large ‘N’?
• Abraham de Moivre noted that the shape of the binomial distribution approached a very smooth curve as the number of events increased
  - he considered a coin toss experiment
• De Moivre tried to find a mathematical expression for this curve
  - to find the probabilities involving a large number of events more easily
  - this led to the discovery of the Normal curve
[Figure: Example by De Moivre. Coin toss experiment; random variable X = number of heads. As the number of events ‘N’ increases, the distribution can be approximated as a curve.]
Video on Galton Board Game
• Demonstrates how the Binomial distribution gives rise to a Normal/Gaussian distribution as the number of trials/events tends to infinity
Video on Normal Distribution (first 2 mins only)
Normal Distribution
• To indicate that a real-valued random variable X is normally distributed with mean μ and variance σ² ≥ 0, we write
$X \sim \mathcal{N}(\mu, \sigma^2)$
• The normal distribution is defined by the following probability density function:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
• All normal distributions are symmetric and have bell-shaped density curves with a single peak.
Note: The Normal distribution is a continuous probability distribution, while the Binomial distribution is a discrete probability distribution
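The density formula coded directly, as a sketch with hypothetical helper names and illustrative parameters μ = 500, σ² = 250 (the values used in the next exercise). A simple trapezoidal rule over μ ± 10σ shows the total area is 1, and a spot check shows the symmetry about μ.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2): exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=100_000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, steps)) + f(b) / 2)

mu, sigma = 500.0, math.sqrt(250.0)
area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma)
print(round(area, 6))                                                  # 1.0
print(normal_pdf(mu + 7, mu, sigma) == normal_pdf(mu - 7, mu, sigma))  # True: symmetric about mu
```

Unlike the binomial probabilities, f(x) is a density: probabilities come from areas under it, not from individual values.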
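In-Class Exercise
• Let us try out an example of a normal distribution:
• Consider a coin toss experiment with 1000 tosses
Q) Denote the number of times the outcome of the toss is heads by a random variable X. Compute the probability that X is at most 600.
How would you use the Binomial Distribution to solve this question?
A) $P(X \le 600) = \sum_{k=0}^{600} C(1000, k) \cdot (1/2)^{1000}$
Difficult
How would you use the Normal Distribution to solve this question?
A) Since the original event follows a binomial distribution and we use the normal distribution to approximate it, we can use µ = np and σ² = np(1−p). Hence, for x ≤ 600:
µ = 1000 * 1/2 = 500 and σ² = 1000 * 1/2 * (1 − 1/2) = 250, so σ = √250 ≈ 15.81
Substituting this in the normal distribution equation, we get
$P(X \le 600) \approx \int_{-\infty}^{600} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \Phi\left(\frac{600 - 500}{\sqrt{250}}\right) \approx \Phi(6.32)$, where Φ is the standard normal cumulative distribution function
Calculating, we get Probability of x ≤ 600 ≈ 1 (the remaining tail above 600 is on the order of 10⁻¹⁰). Note that z must be computed with σ, not σ²: dividing by 250 would give z = 0.4 and the incorrect value 0.65542.
Source of calculation: http://stattrek.com/Tables/Normal.aspx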
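A sketch comparing the exact binomial sum with the normal approximation, computing the standard normal CDF through the identity Φ(z) = ½(1 + erf(z/√2)) with Python's `math.erf`. It also shows what happens when the variance 250 is used where the standard deviation belongs.

```python
import math
from math import comb

def std_normal_cdf(z):
    """Phi(z) = P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, x = 1000, 0.5, 600
mu = n * p                           # 500
sigma = math.sqrt(n * p * (1 - p))   # sqrt(250), roughly 15.81

exact = sum(comb(n, k) for k in range(x + 1)) / 2**n  # exact binomial sum (big integers)
approx = std_normal_cdf((x - mu) / sigma)             # z is roughly 6.32
wrong = std_normal_cdf((x - mu) / (n * p * (1 - p)))  # dividing by the variance: z = 0.4

print(exact, approx, round(wrong, 5))
```

The exact value and the approximation agree that P(X ≤ 600) is essentially 1; only the variance-for-σ mix-up produces 0.65542.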
Some Example Applications of the Normal Distribution
• Approximately normal distributions occur in many situations
• In counting problems
  - Binomial random variables, associated with yes/no questions;
  - Poisson random variables, associated with rare events;
• In physiological measurements of biological specimens:
  - logarithm of measures of size of living tissue (length, height, weight);
  - length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth
• Measurement errors
• Financial variables
• Light intensity
  - intensity of laser light is normally distributed;
In-Class Exercise
• Let us try out the previously stated “nearly impossible” problem using a normal distribution:
• Consider a coin toss experiment with 1,000,000 tosses
Q) Denote the number of times the outcome of the toss is heads by a random variable X. Compute the probability that X is at most 100,000.
How would you use the Binomial Distribution to solve this question?
A) $P(X \le 100{,}000) = \sum_{k=0}^{100{,}000} C(1000000, k) \cdot (1/2)^{1{,}000{,}000}$
Difficult
How would you use the Normal Distribution to solve this question?
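In-Class Exercise
• Since the original event follows a binomial distribution, we can use the normal distribution to approximate it.
• We know that µ = np and σ² = np(1−p). Hence, for x ≤ 100,000:
µ = 1,000,000 * 1/2 = 500,000 and σ² = 1,000,000 * 1/2 * (1 − 1/2) = 250,000, so σ = 500
• Substituting this in the normal distribution equation, we get
$P(X \le 100{,}000) \approx \int_{0}^{100{,}000} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$
• Calculating the integral with limits from 0 to 100,000: the upper limit lies (100,000 − 500,000)/500 = −800 standard deviations from the mean, so we get Probability of x ≤ 100,000 ≈ 0. (Using σ² = 250,000 in place of σ would give z = −1.6 and the incorrect value 0.0548.)
Source of calculation: http://stattrek.com/Tables/Normal.aspx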
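Even the exact binomial answer can be bounded without summing 100,001 enormous terms. A sketch using log-space arithmetic with `math.lgamma`, assuming a fair coin: below the mean the binomial probabilities increase with k, so P(X ≤ x) ≤ (x + 1)·P(X = x), and only the log of one term is needed. The helper name is hypothetical.

```python
import math

def log10_binomial_pmf(k, n, p):
    """log10 of C(n,k) p^k (1-p)^(n-k), via lgamma to avoid overflow/underflow."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return (log_c + k * math.log(p) + (n - k) * math.log(1 - p)) / math.log(10)

n, p, x = 1_000_000, 0.5, 100_000
# Below the mean the pmf increases with k, so P(X <= x) <= (x + 1) * P(X = x)
log10_upper_bound = math.log10(x + 1) + log10_binomial_pmf(x, n, p)
print(round(log10_upper_bound))   # roughly -1.6e5: P(X <= 100,000) is effectively 0

z = (x - n * p) / math.sqrt(n * p * (1 - p))   # (100,000 - 500,000) / 500 = -800
print(z)
```

An upper bound around 10 to the power −160,000 confirms that the probability is effectively zero, consistent with a z-score of −800.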
Some Example Applications of the Normal Distribution
• Approximately normal distributions occur in many situations
• In counting problems
  - Binomial random variables, associated with yes/no questions;
  - Poisson random variables, associated with rare events;
• In sports statistical analyses:
  - calculating mean physical attributes such as heights and weights, and their standard deviations
  - estimating the probabilities of winning games
• Measurement errors
• Financial variables
• Light intensity
  - intensity of laser light is normally distributed;
END

Example Application of Bayes Theorem