Foundations of Probability Theory and Mathematical Statistics
Slide ‹#›
An experiment is a situation involving chance or probability that leads to results called outcomes. In the spinner problem, the experiment is spinning the spinner.
An outcome is the result of a single trial of an experiment. The possible outcomes are landing on yellow, blue, green or red.
An event is one or more outcomes of an experiment. One event of this experiment is landing on blue.
Probability is the measure of how likely an event is.
Slide ‹#›
Definitions
Probability is the numerical measure of the likelihood that an event will occur.
Its value is between 0 and 1:
0 = impossible, .5 = 50/50, 1 = certain.
The sum of the probabilities of all the outcomes in the sample space is 1.
Slide ‹#›
Experimental vs. Theoretical
Experimental probability:
P(event) = (number of times the event occurs) / (total number of trials)
Theoretical probability:
P(E) = (number of favorable outcomes) / (total number of possible outcomes)
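The two formulas can be written as Python functions over exact fractions, as a small illustration. The function names are illustrative. The draw record and bag contents are hypothetical values chosen to match the 2-red-in-6 example. They are not data from the slides.

```python
from fractions import Fraction

def experimental_probability(trials, event):
    """Number of trials in which the event occurred / total number of trials."""
    hits = sum(1 for outcome in trials if outcome in event)
    return Fraction(hits, len(trials))

def theoretical_probability(sample_space, event):
    """Favorable outcomes / total possible outcomes (outcomes assumed equally likely)."""
    favorable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favorable, len(sample_space))

# Hypothetical record of 6 draws (with replacement) from a bag of red and blue marbles
draws = ["red", "blue", "blue", "red", "blue", "blue"]
print(experimental_probability(draws, {"red"}))                  # 1/3

# Hypothetical bag holding 1 red and 2 blue marbles
print(theoretical_probability(["red", "blue", "blue"], {"red"}))  # 1/3
```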
Slide ‹#›
Identifying the Type of Probability
You draw a marble out of the bag, record the color, and replace the marble. After 6 draws, you have recorded 2 red marbles.
P(red) = 2/6 = 1/3
Experimental (the result is found by repeating an experiment).
Tally after 6 trials:
             Red   Blue
Total        2     4
Exp. Prob.   1/3   2/3
Slide ‹#›
The complement of A, written A^c, is everything in the sample space S that is NOT in A.
•If the rectangular box is S and the white circle is A, then everything in the box that's outside the circle is A^c, the complement of A.
Slide ‹#›
Theorem
Pr(A^c) = 1 - Pr(A)
Example:
If A is the event that a randomly selected student is male, and the probability of A is 0.6, what is A^c and what is its probability?
A^c is the event that a randomly selected student is not male (here, female), and its probability is 1 - 0.6 = 0.4.
Slide ‹#›
The union of A & B (denoted A ∪ B) is everything in the sample space that is in either A or B or both.
•In the Venn diagram (rectangle S containing circles A and B), the union of A & B is the whole white area.
Slide ‹#›
The intersection of A & B (denoted A ∩ B) is everything in the sample space that is in both A & B.
•In the Venn diagram (rectangle S containing circles A and B), the intersection of A & B is the pink overlapping area.
Slide ‹#›
Example
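A family is planning to have 2 children.
Suppose boys (B) & girls (G) are equally likely, and the children are recorded in birth order, so BG (boy first) and GB (girl first) are different outcomes.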
What is the sample space S?
S = {BB, GG, BG, GB}
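One quick way to list this sample space is to enumerate ordered pairs. The sketch below uses Python's `itertools.product`. The ordering (first child, second child) is the assumption that makes BG and GB separate outcomes.

```python
from itertools import product

# Each child is B or G; each string records (first child, second child)
S = ["".join(pair) for pair in product("BG", repeat=2)]
print(S)  # ['BB', 'BG', 'GB', 'GG']
```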
Slide ‹#›
Example continued
If E is the event that both children are the
same sex, what does E look like & what is
its probability?
E = {BB, GG}
Since boys & girls are equally likely, each of the four outcomes in the sample space S = {BB, GG, BG, GB} is equally likely & has a probability of 1/4.
So Pr(E) = 2/4 = 1/2 = 0.5
Slide ‹#›
Example cont’d: Recall that
E = {BB, GG} & Pr(E)=0.5
What is the complement of E and what is its probability?
E^c = {BG, GB}
Pr(E^c) = 1 - Pr(E) = 1 - 0.5 = 0.5
Slide ‹#›
Example continued
If F is the event that at least one of the children is a girl, what does F look like & what is its probability?
F = {BG, GB, GG}
Pr(F) = 3/4 = 0.75
Slide ‹#›
Recall: E = {BB, GG} & Pr(E)=0.5
F = {BG, GB, GG} & Pr(F) = 0.75
What is E∩F?
{GG}
What is its probability? 1/4 = 0.25
Slide ‹#›
Recall: E = {BB, GG} & Pr(E)=0.5
F = {BG, GB, GG} & Pr(F) = 0.75
What is E ∪ F?
{BB, GG, BG, GB} = S
What is the probability of E ∪ F?
1
If you add the separate probabilities of E & F together, do you get Pr(E ∪ F)? Let's try it.
Pr(E) + Pr(F) = 0.5 + 0.75 = 1.25 ≠ 1 = Pr(E ∪ F)
Why doesn't it work?
We counted GG (the intersection of E & F) twice.
Slide ‹#›
A formula for Pr(E ∪ F)
Pr(E ∪ F) = Pr(E) + Pr(F) - Pr(E ∩ F)
Check: 0.5 + 0.75 - 0.25 = 1 = Pr(E ∪ F).
If E & F do not overlap, then the intersection is the empty set, & the probability of the intersection is zero.
When there is no overlap, Pr(E ∪ F) = Pr(E) + Pr(F).
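The calculations on the last few slides can be checked by brute force over the sample space. This sketch assumes, as the example does, that the four outcomes are equally likely. `pr` is an illustrative helper, not a library function.

```python
from fractions import Fraction
from itertools import product

S = {"".join(p) for p in product("BG", repeat=2)}   # {'BB', 'BG', 'GB', 'GG'}

def pr(event):
    """Classical probability: every outcome in S is equally likely."""
    return Fraction(len(event), len(S))

E = {s for s in S if s[0] == s[1]}   # both children the same sex
F = {s for s in S if "G" in s}       # at least one girl

print(pr(S - E))       # complement of E: 1/2
print(pr(E & F))       # E ∩ F = {GG}: 1/4
print(pr(E | F))       # E ∪ F = S: 1
print(pr(E) + pr(F))   # 5/4, because GG is counted twice

# Addition rule: Pr(E ∪ F) = Pr(E) + Pr(F) - Pr(E ∩ F)
assert pr(E | F) == pr(E) + pr(F) - pr(E & F)
```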
Slide ‹#›
Independent Events
We can deduce an important result from the conditional probability formula:
If B has no effect on A, then P(A | B) = P(A), and we say the events are independent.
(The probability of A does not depend on B.)
So P(A | B) = P(A ∩ B) / P(B)
becomes
P(A) = P(A ∩ B) / P(B)
or
P(A ∩ B) = P(A) × P(B)
Slide ‹#›
Independent Events
Tests for independence (assuming P(A) > 0 and P(B) > 0, any one of these shows that A and B are independent):
P(A | B) = P(A)
P(B | A) = P(B)
or
P(A ∩ B) = P(A) × P(B)
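Here is a minimal sketch of the product test on the two-child sample space. E and F are the events from the earlier example. G1 (first child is a girl) and B2 (second child is a boy) are hypothetical extra events added for contrast.

```python
from fractions import Fraction
from itertools import product

S = {"".join(p) for p in product("BG", repeat=2)}

def pr(event):
    """Equally likely outcomes: |event| / |S|."""
    return Fraction(len(event), len(S))

def independent(a, b):
    """Product test: P(A ∩ B) == P(A) * P(B)."""
    return pr(a & b) == pr(a) * pr(b)

E = {s for s in S if s[0] == s[1]}   # both children the same sex
F = {s for s in S if "G" in s}       # at least one girl
G1 = {s for s in S if s[0] == "G"}   # hypothetical: first child is a girl
B2 = {s for s in S if s[1] == "B"}   # hypothetical: second child is a boy

print(independent(E, F))    # False: 1/4 != 1/2 * 3/4
print(independent(G1, B2))  # True: 1/4 == 1/2 * 1/2
```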
Slide ‹#›
The Multiplication Rule
If events A and B are independent, then the probability that both A and B occur, in sequence or simultaneously, is:
P(A and B) = P(A ∩ B) = P(A) × P(B)
This rule extends to any number of independent events.
Two events are independent if the occurrence of the first event does not affect the probability of the occurrence of the second event. (More on this later.)
Slide ‹#›
Mutually Exclusive
Two events A and B are mutually exclusive if and only if:
P(A ∩ B) = 0
In a Venn diagram this means that event A is disjoint from event B.
[Venn diagrams: separate circles A and B, "A and B are M.E."; overlapping circles A and B, "A and B are not M.E."]
Slide ‹#›
The Addition Rule
The probability that at least one of the events A or B will occur, P(A or B), is given by:
P(A or B) = P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
If events A and B are mutually exclusive, then the addition rule simplifies to:
P(A or B) = P(A ∪ B) = P(A) + P(B)
This simplified rule can be extended to any number of mutually exclusive events.
Slide ‹#›
Conditional Probability
Conditional probability is the probability of an event
occurring, given that another event has already
occurred.
Conditional probability restricts the sample space.
The conditional probability of event B occurring,
given that event A has occurred, is denoted by
P(B|A) and is read as “probability of B, given A.”
We use conditional probability when two events
occurring in sequence are not independent. In other
words, the fact that the first event (event A) has
occurred affects the probability that the second
event (event B) will occur.
Slide ‹#›
Conditional Probability
Formula for Conditional Probability
P(A | B) = P(A ∩ B) / P(B)   or   P(B | A) = P(B ∩ A) / P(A)
(defined when P(B) > 0 and P(A) > 0, respectively)
It is often easier to work out a conditional probability by looking at the restricted sample space; otherwise, use the formula.
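For example, in the two-child sample space, the probability that both children are girls given that at least one is a girl can be computed both ways. This sketch assumes equally likely outcomes. The event names A and B are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

S = {"".join(p) for p in product("BG", repeat=2)}
A = {"GG"}                          # both children are girls
B = {s for s in S if "G" in s}      # at least one girl

# Formula: P(A | B) = P(A ∩ B) / P(B), with equally likely outcomes
p_formula = Fraction(len(A & B), len(S)) / Fraction(len(B), len(S))

# Restricted sample space: keep only the outcomes in B, then count A among them
p_restricted = Fraction(len(A & B), len(B))

print(p_formula, p_restricted)  # 1/3 1/3
```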
Slide ‹#›
Assigning Probabilities
Two basic requirements for assigning probabilities
1. The probability assigned to each experimental outcome must be between 0 and 1, inclusive. If we let Ei denote the ith experimental outcome and P(Ei) its probability, then this requirement can be written as
0 ≤ P(Ei) ≤ 1 for all i
2. The sum of the probabilities for all the experimental outcomes must equal 1.0. For n experimental outcomes, this requirement can be written as
P(E1) + P(E2) + … + P(En) = 1
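The two requirements translate directly into a check. This sketch uses `fractions.Fraction` so that the sum test is exact; adding the float 1/6 six times is not guaranteed to give exactly 1.0. The function name is illustrative.

```python
from fractions import Fraction

def is_valid_assignment(probs):
    """Requirement 1: each 0 <= P(Ei) <= 1. Requirement 2: the P(Ei) sum to 1."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

die = [Fraction(1, 6)] * 6
print(is_valid_assignment(die))                                # True
print(is_valid_assignment([Fraction(1, 2), Fraction(1, 3)]))   # False: sums to 5/6
```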
Slide ‹#›
Classical Method
If an experiment has n equally likely possible outcomes, this method assigns a probability of 1/n to each outcome.
Example
Experiment: Rolling a die
Sample Space: S = {1, 2, 3, 4, 5, 6}
Probabilities: Each sample point has a
1/6 chance of occurring
Slide ‹#›
THEORETICAL PROBABILITY
I have a quarter.
My quarter has a heads side and a tails side.
Since my quarter has only 2 sides, there are only 2 possible outcomes when I flip it: it will land on either heads or tails.
[Image: the HEADS and TAILS sides of the quarter]
Slide ‹#›
THEORETICAL PROBABILITY
When I flip my coin, the probability that my coin will land on heads is 1 in 2.
What is the probability that my coin will land on tails?
Slide ‹#›
Theoretical Probability
Right!!! There is a 1 in 2 probability that my coin will land on tails!
A probability of 1 in 2 can be written in three ways:
•As a fraction: ½
•As a decimal: .50
•As a percent: 50%
Slide ‹#›
Theoretical Probability
I have three marbles in a bag.
1 marble is red
1 marble is blue
1 marble is green
I am going to take 1 marble
from the bag.
What is the probability that I will
pick out a red marble?
Slide ‹#›
Theoretical Probability
Since there are three
marbles and only one
is red, I have a 1 in 3
chance of picking out a
red marble.
I can write this in three
ways:
As a fraction: 1/3
As a decimal: about .33
As a percent: about 33%
Slide ‹#›
Experimental Probability
Experimental probability is
found by repeating an
experiment and observing the
outcomes.
Slide ‹#›
Experimental Probability
Remember the bag of marbles?
The bag has only 1 red, 1 green,
and 1 blue marble in it.
There are a total of 3 marbles in
the bag.
Theoretical probability says there is a 1 in 3 chance of selecting each color: red, green, or blue.
Slide ‹#›
Experimental Probability
Draw 1 marble from the bag.
It is a red marble.
Record the outcome on the tally sheet.
Marble number   red   blue   green
1               1
2
3
4
5
6
Slide ‹#›
Experimental Probability
Put the red marble back in the bag and draw again.
This time you drew a green marble.
Record this outcome on the tally sheet.
Marble number   red   blue   green
1               1
2                            1
3
4
Slide ‹#›
Experimental Probability
Place the green marble back in the bag.
Continue drawing marbles and recording outcomes until you have drawn 6 times.
(Remember to place each marble back in the bag before drawing again.)
Slide ‹#›
Experimental Probability
After 6 draws your chart will look similar to this.
Look at the red column.
Of our 6 draws, we selected a red marble 2 times.
Marble number   red   blue   green
Total           2     1      3
Slide ‹#›
Experimental Probability
The experimental probability of drawing a red marble was 2 in 6.
This can be expressed as a fraction: 2/6 or 1/3, a decimal: about .33, or a percentage: about 33%.
Marble number   red   blue   green
Total           2     1      3
Slide ‹#›
Experimental Probability
Notice the experimental probability of drawing a red, blue or green marble.
             red          blue   green
Total        2            1      3
Exp. Prob.   2/6 or 1/3   1/6    3/6 or 1/2
Slide ‹#›
Comparing Experimental and
Theoretical Probability
Look at the chart below.
Is the experimental probability always the same as the theoretical probability?
              red   blue   green
Exp. Prob.    1/3   1/6    1/2
Theo. Prob.   1/3   1/3    1/3
Slide ‹#›
Comparing Experimental and
Theoretical Probability
In this experiment, the experimental and theoretical probabilities of selecting a red marble are equal.
              red   blue   green
Exp. Prob.    1/3   1/6    1/2
Theo. Prob.   1/3   1/3    1/3
Slide ‹#›
Comparing Experimental and
Theoretical Probability
The experimental probability of selecting a blue marble is less than the theoretical probability.
The experimental probability of selecting a green marble is greater than the theoretical probability.
              red   blue   green
Exp. Prob.    1/3   1/6    1/2
Theo. Prob.   1/3   1/3    1/3
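Here is a simulation sketch of the marble experiment. It assumes draws with replacement from a bag holding one red, one blue and one green marble. The seed and draw counts are arbitrary choices. Short runs, like the 6 draws above, can differ noticeably from 1/3. As the number of draws grows, the experimental probabilities tend to settle near the theoretical value.

```python
import random
from collections import Counter

def experimental_probabilities(n_draws, seed=0):
    """Draw with replacement from a bag of 1 red, 1 blue and 1 green marble."""
    rng = random.Random(seed)            # fixed seed so the run is reproducible
    bag = ["red", "blue", "green"]
    counts = Counter(rng.choice(bag) for _ in range(n_draws))
    return {color: counts[color] / n_draws for color in bag}

for n in (6, 600, 60000):
    print(n, experimental_probabilities(n))
```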
Slide ‹#›
Point and interval estimation of the parameters of a normally distributed variable. The concept of a statistical estimate.
Slide ‹#›
What is statistics?
A branch of mathematics that provides techniques to analyze whether or not your data are significant (meaningful)
Statistical applications are based on
probability statements
Nothing is “proved” with statistics
Statistics are reported
Statistics report the probability that similar
results would occur if you repeated the
experiment
Slide ‹#›
Statistics deals with numbers
Need to know the nature of the numbers collected
Continuous variables: type of numbers associated with measuring or weighing; any value in a continuous interval of measurement.
Examples: Weight of students, height of plants, time to flowering
Discrete variables: type of numbers that are counted or categorical
Examples: Numbers of boys, girls, insects, plants
Slide ‹#›
Standard Deviation and
Variance
Standard deviation and variance are the
most common measures of total risk
They measure the dispersion of a set of observations around the mean
Slide ‹#›
Standard Deviation and
Variance (cont’d)
General equation for variance:
$$\text{Variance} = \sigma^2 = \sum_{i=1}^{n} \text{prob}(x_i)\,(x_i - \bar{x})^2$$
If all outcomes are equally likely:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
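Here is a worked check of both forms for one roll of a fair die, written as a Python sketch with exact fractions. Here x̄ is taken to be the probability-weighted mean of the outcomes, which is an assumption about the slide's notation. The function names are illustrative.

```python
from fractions import Fraction

def variance(values, probs):
    """General form: sum of prob(x_i) * (x_i - mean)^2, with mean = sum of prob(x_i) * x_i."""
    mean = sum(p * x for x, p in zip(values, probs))
    return sum(p * (x - mean) ** 2 for x, p in zip(values, probs))

def variance_equally_likely(values):
    """Special case prob(x_i) = 1/n: (1/n) * sum of (x_i - mean)^2."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n

faces = [1, 2, 3, 4, 5, 6]                     # one roll of a fair die
print(variance(faces, [Fraction(1, 6)] * 6))   # 35/12
print(variance_equally_likely(faces))          # 35/12
```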
Slide ‹#›
Standard Deviation and
Variance (cont’d)
Equation for standard deviation:
$$\text{Standard deviation} = \sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i=1}^{n} \text{prob}(x_i)\,(x_i - \bar{x})^2}$$
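Continuing the fair-die example as a sketch, the standard deviation is the square root of the variance. For equally likely outcomes it should agree with `statistics.pstdev`, the standard library's population standard deviation.

```python
import math
import statistics

def standard_deviation(values, probs):
    """sigma = sqrt(sum of prob(x_i) * (x_i - mean)^2)."""
    mean = sum(p * x for x, p in zip(values, probs))
    return math.sqrt(sum(p * (x - mean) ** 2 for x, p in zip(values, probs)))

faces = [1, 2, 3, 4, 5, 6]
sigma = standard_deviation(faces, [1 / 6] * 6)
print(round(sigma, 4))                                 # 1.7078
print(math.isclose(sigma, statistics.pstdev(faces)))  # True
```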
Slide ‹#›
1. The Normal distribution: parameters μ and σ (or σ²)
Comment: If μ = 0 and σ = 1, the distribution is called the standard normal distribution.
[Figure: normal densities with μ = 50, σ = 15 and with μ = 70, σ = 20; x from 0 to 120, density from 0 to 0.03]
Slide ‹#›
The probability density of the normal distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
If a random variable, X, has a normal distribution with mean μ and variance σ², then we will write:
$$X \sim N(\mu, \sigma^2)$$
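The density can be coded directly from the formula. This sketch cross-checks it against `statistics.NormalDist.pdf` from Python's standard library. It uses the two parameter pairs from the plotted curves.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Cross-check against the standard library for the two plotted curves
for mu, sigma in [(50, 15), (70, 20)]:
    for x in (30, mu, 100):
        assert math.isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

print(round(normal_pdf(50, 50, 15), 4))  # peak height of the mu = 50, sigma = 15 curve: 0.0266
```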
Slide ‹#›
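The Chi-square distribution
The Chi-square (χ²) distribution with n d.f.
$$f(x) = \begin{cases} \dfrac{(1/2)^{n/2}}{\Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2} = \dfrac{x^{n/2 - 1}\, e^{-x/2}}{2^{n/2}\,\Gamma(n/2)}, & x \ge 0 \\[2ex] 0, & x < 0 \end{cases}$$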
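Here is a sketch of the χ² density coded from the formula above with `math.gamma`. The two checks are sanity tests chosen for illustration: the n = 2 special case, and a crude numerical integral for n = 4. The parameter is named `df` here.

```python
import math

def chi2_pdf(x, df):
    """x**(df/2 - 1) * exp(-x/2) / (2**(df/2) * Gamma(df/2)) for x >= 0, and 0 for x < 0.
    For df < 2 the density is unbounded at x = 0, so evaluate it at x > 0 there."""
    if x < 0:
        return 0.0
    return x ** (df / 2 - 1) * math.exp(-x / 2) / (2 ** (df / 2) * math.gamma(df / 2))

# With df = 2 the density reduces to (1/2) * exp(-x/2)
print(math.isclose(chi2_pdf(3.0, 2), 0.5 * math.exp(-1.5)))  # True

# Midpoint-rule check that the df = 4 density integrates to about 1 over [0, 60]
step = 0.01
area = sum(chi2_pdf((k + 0.5) * step, 4) * step for k in range(6000))
print(round(area, 4))  # 1.0
```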
Slide ‹#›
Graph: The χ² distribution
[Figure: χ² densities for n = 4, 5 and 6; x from 0 to 16, density from 0 to 0.2]
Slide ‹#›
Basic Properties of the Chi-Square distribution
1. If z has a Standard Normal distribution, then z² has a χ² distribution with 1 degree of freedom.
2. If z1, z2, …, zn are independent random variables, each having a Standard Normal distribution, then
$$U = z_1^2 + z_2^2 + \dots + z_n^2$$
has a χ² distribution with n degrees of freedom.
3. Let X and Y be independent random variables having χ² distributions with n1 and n2 degrees of freedom respectively; then X + Y has a χ² distribution with n1 + n2 degrees of freedom.
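Here is a Monte Carlo sketch of properties 2 and 3, assuming Python's `random.gauss` as the source of standard normal draws. The simulated values are compared with mean n and variance 2n for a χ² variable with n degrees of freedom. The sample size and seed are arbitrary.

```python
import random
import statistics

def chi2_sample(n, rng):
    """Property 2: U = z1^2 + ... + zn^2 with independent standard normal z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n))

rng = random.Random(42)            # fixed seed for reproducibility
n, reps = 5, 20000
samples = [chi2_sample(n, rng) for _ in range(reps)]
print(round(statistics.fmean(samples), 2), round(statistics.variance(samples), 2))  # near 5 and 10

# Property 3: chi-square(2) + chi-square(3), independent, behaves like chi-square(5)
sums = [chi2_sample(2, rng) + chi2_sample(3, rng) for _ in range(reps)]
print(round(statistics.fmean(sums), 2))  # near 5
```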
Slide ‹#›
continued
4. Let x1, x2, …, xn be independent random variables having χ² distributions with n1, n2, …, nn degrees of freedom respectively; then x1 + x2 + … + xn has a χ² distribution with n1 + … + nn degrees of freedom.
5. Suppose X and Y are independent random variables, with X and X + Y having χ² distributions with n1 and n (n > n1) degrees of freedom respectively; then Y has a χ² distribution with n - n1 degrees of freedom.
Slide ‹#›
The non-central Chi-squared distribution
If z1, z2, …, zn are independent random variables, each having a Normal distribution with mean μi and variance σ² = 1, then
$$U = z_1^2 + z_2^2 + \dots + z_n^2$$
has a non-central χ² distribution with n degrees of freedom and non-centrality parameter
$$\lambda = \frac{1}{2}\sum_{i=1}^{n} \mu_i^2$$
Slide ‹#›
Mean and Variance of non-central χ² distribution
If U has a non-central χ² distribution with n degrees of freedom and non-centrality parameter
$$\lambda = \frac{1}{2}\sum_{i=1}^{n} \mu_i^2$$
then
$$E[U] = n + 2\lambda = n + \sum_{i=1}^{n} \mu_i^2$$
$$\mathrm{Var}[U] = 2n + 8\lambda = 2n + 4\sum_{i=1}^{n} \mu_i^2$$
If U has a central χ² distribution with n degrees of freedom, λ is zero, thus
$$E[U] = n, \qquad \mathrm{Var}[U] = 2n$$
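Here is a Monte Carlo sketch checking these moments. The means μi below are hypothetical values chosen for illustration. With Σμi² = 5.25, the formulas give E[U] = 8.25 and Var[U] = 27.

```python
import random
import statistics

rng = random.Random(7)                 # fixed seed for reproducibility
mu = [1.0, -0.5, 2.0]                  # hypothetical means mu_i; each z_i has variance 1
n = len(mu)
lam = 0.5 * sum(m * m for m in mu)     # non-centrality parameter: lambda = 1/2 * sum(mu_i^2)

samples = [sum(rng.gauss(m, 1.0) ** 2 for m in mu) for _ in range(40000)]
print("theory:", n + 2 * lam, 2 * n + 8 * lam)   # theory: 8.25 27.0
print("simulated:", round(statistics.fmean(samples), 2), round(statistics.variance(samples), 2))
```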
Slide ‹#›
Estimation of Population Parameters
Statistical inference refers to making inferences about a population
parameter through the use of sample information
The sample statistics summarize sample information and can be
used to make inferences about the population parameters
Two approaches to estimate population parameters
Point estimation: Obtain a single value as the estimate of the population parameter
Interval estimation: Construct an interval that will contain the population parameter with a specified probability (the confidence level)
Slide ‹#›
Point Estimation
In attempting to obtain point estimates of population parameters, the
following questions arise
What is a point estimate of the population mean?
How good is the estimate obtained with the method we follow?
Example: What is a point estimate of the average yield on ten-year
Treasury bonds?
To answer this question, we use a formula that takes sample
information and produces a number
Slide ‹#›
Point Estimation
A formula that uses sample information to produce an estimate of a
population parameter is called an estimator
A specific value of an estimator obtained from information of a
specific sample is called an estimate
Example: We said that the sample mean is a good estimator of the population mean
The sample mean is an estimator
A particular value of the sample mean is an estimate
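In code, the distinction is between a function and its return value. This is a minimal sketch. The samples below are hypothetical numbers, not actual Treasury yields.

```python
import statistics

def sample_mean_estimator(sample):
    """Estimator: a rule (formula) that maps any sample to a number."""
    return statistics.fmean(sample)

# Two hypothetical samples of ten-year Treasury yields (%), invented for illustration
sample_a = [4.1, 4.3, 3.9, 4.0]
sample_b = [4.4, 4.2, 4.5, 4.3]

# Same estimator, different samples, different estimates
print(sample_mean_estimator(sample_a))
print(sample_mean_estimator(sample_b))
```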
Slide ‹#›
Interval Estimation
In the probabilistic interpretation, we say that
A 95% confidence interval for a population parameter means that, in
repeated sampling, 95% of such confidence intervals will include the
population parameter
In the practical interpretation, we say that
We are 95% confident that the 95% confidence interval will include the
population parameter
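The probabilistic interpretation can be checked by simulation. This sketch assumes a normal population with known σ and uses the known-variance interval for the mean. The values of μ, σ, n, the number of repetitions and the seed are arbitrary illustrations.

```python
import math
import random
import statistics

def coverage(mu=10.0, sigma=2.0, n=25, reps=4000, seed=1):
    """Fraction of 95% intervals xbar ± z * sigma / sqrt(n) that contain the true mean mu."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.975)      # about 1.96
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += (xbar - half_width <= mu <= xbar + half_width)
    return hits / reps

print(coverage())   # close to 0.95
```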
Slide ‹#›
Constructing Confidence Intervals
Confidence intervals have a common structure:
Point Estimate ± Reliability Factor × Standard Error
Reliability factor: a number based on the assumed distribution of the point estimate and the level of confidence
Standard error: the standard error of the sample statistic that provides the point estimate
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Known Variance
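If X̄ is the sample mean, then we are interested in the confidence interval such that the following probability is .9:
$$
\begin{aligned}
0.9 &= P(-1.645 \le Z \le 1.645) \\
&= P\left(-1.645 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.645\right) \\
&= P\left(-1.645\,\frac{\sigma}{\sqrt{n}} \le \bar{X} - \mu \le 1.645\,\frac{\sigma}{\sqrt{n}}\right) \\
&= P\left(\bar{X} - 1.645\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.645\,\frac{\sigma}{\sqrt{n}}\right)
\end{aligned}
$$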
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Known Variance
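Following the above expression for the structure of a confidence interval, we rewrite the confidence interval as follows
$$\bar{X} \pm 1.645\,\frac{\sigma}{\sqrt{n}}$$
Note that from the standard normal distribution function (with 1.645 rounded to 1.65)
$$P(Z \le 1.65) = F_Z(1.65) \approx 0.95$$
$$P(Z \le -1.65) = F_Z(-1.65) \approx 0.05$$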
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Known Variance
Given this result and that the level of confidence for this interval (1 - α) is .90, we conclude that
The area under the standard normal to the left of -1.65 is 0.05
The area under the standard normal to the right of 1.65 is 0.05
Thus, the two reliability factors represent the cutoffs $-z_{\alpha/2}$ and $z_{\alpha/2}$ for the standard normal
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Known Variance
In general, a 100(1 - α)% confidence interval for the population mean μ when we draw samples from a normal distribution with known variance σ² is given by
$$\bar{X} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2}$ is the number for which
$$P(Z > z_{\alpha/2}) = \frac{\alpha}{2}$$
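Here is a sketch of this general formula. It uses `statistics.NormalDist().inv_cdf` to obtain $z_{\alpha/2}$ instead of a table. The function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, alpha):
    """100(1 - alpha)% CI for mu with known sigma: xbar ± z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # the z with P(Z > z) = alpha / 2
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Reliability factors for the usual confidence levels
for alpha in (0.10, 0.05, 0.01):
    print(f"{100 * (1 - alpha):.0f}%: z = {NormalDist().inv_cdf(1 - alpha / 2):.3f}")
```

The printed factors are 1.645, 1.960 and 2.576. The slides round these to 1.65, 1.96 and 2.58.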
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Known Variance
Note: We typically use the following reliability factors when
constructing confidence intervals based on the standard normal
distribution
90% interval: z0.05 = 1.65
95% interval: z0.025 = 1.96
99% interval: z0.005 = 2.58
Implication: As the degree of confidence increases, the interval becomes wider
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Known Variance
Example: Suppose we draw a sample of 100 observations of returns
on the Nikkei index, assumed to be normally distributed, with sample
mean 4% and standard deviation 6%
What is the 95% confidence interval for the population mean?
The standard error is .06/√100 = .006
The confidence interval is .04 ± 1.96(.006) = .04 ± .01176, that is, from about .0282 to .0518 (2.82% to 5.18%)
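Here is the same arithmetic as a short Python sketch, using the numbers from the example.

```python
import math

xbar, s, n = 0.04, 0.06, 100        # sample mean 4%, standard deviation 6%, n = 100
z = 1.96                            # reliability factor for 95% confidence
se = s / math.sqrt(n)               # standard error = 0.006
lower, upper = xbar - z * se, xbar + z * se
print(f"({lower:.4f}, {upper:.4f})")   # (0.0282, 0.0518)
```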
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
In a more typical scenario, the population variance is unknown
Note that, if the sample size is large, the previous results can be
modified as follows
The population distribution need not be normal
The population variance need not be known
The sample standard deviation will be a sufficiently good estimator of
the population standard deviation
Thus, the confidence interval for the population mean derived above can be used by substituting s for σ
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
However, if the sample size is small and the population variance is
unknown, we cannot use the standard normal distribution
If we replace the unknown σ with the sample standard deviation s, the following quantity
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
follows Student's t distribution with (n - 1) degrees of freedom
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
The t-distribution has mean 0 and (n – 1) degrees of freedom
As degrees of freedom increase, the t-distribution approaches the
standard normal distribution
Also, t-distributions have fatter tails than the standard normal, but as the degrees of freedom increase (df = 8 or more), the tails become thinner and resemble those of a normal distribution
Slide ‹#›
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
In general, a 100(1 - α)% confidence interval for the population mean μ when we draw small samples from a normal distribution with an unknown variance σ² is given by
$$\bar{X} \pm t_{n-1,\alpha/2}\,\frac{s}{\sqrt{n}}$$
where $t_{n-1,\alpha/2}$ is the number for which
$$P(t_{n-1} > t_{n-1,\alpha/2}) = \frac{\alpha}{2}$$
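Here is a sketch of the t interval for a hypothetical sample of n = 10 observations; the data are invented for illustration. The critical value $t_{9,0.025}$ = 2.262 is taken from a standard t table. If SciPy is available, `scipy.stats.t.ppf(0.975, 9)` returns the same value to three decimals.

```python
import math
import statistics

# Hypothetical small sample (n = 10), invented for illustration
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.6]
n = len(sample)
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)        # sample standard deviation (divisor n - 1)

t_crit = 2.262                      # t_{9, 0.025} from a t table: 95% interval, n - 1 = 9 d.f.
half_width = t_crit * s / math.sqrt(n)
print(f"{xbar:.3f} ± {half_width:.3f}")   # 10.200 ± 0.286
```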
Slide ‹#›
Confidence Interval for the Population Variance
of a Normal Population
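Suppose we have obtained a random sample of n observations from a normal population with variance σ² and that the sample variance is s². A 100(1 - α)% confidence interval for the population variance is
$$\frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}}$$
where $\chi^2_{n-1,\alpha/2}$ is the value with area α/2 to its right under the χ² distribution with n - 1 degrees of freedom, and similarly for $\chi^2_{n-1,1-\alpha/2}$.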
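Here is a sketch of the variance interval for a hypothetical sample of n = 10 observations; the data are invented for illustration. The χ² critical values for 9 degrees of freedom are table values. With SciPy, `scipy.stats.chi2.ppf(0.975, 9)` and `scipy.stats.chi2.ppf(0.025, 9)` give about 19.023 and 2.700.

```python
import statistics

sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.6]   # hypothetical data
n = len(sample)
s2 = statistics.variance(sample)     # sample variance s^2 (divisor n - 1)

# Chi-square critical values for 9 d.f. from a table
chi2_upper = 19.023   # area 0.025 to the right: chi^2_{9, 0.025}
chi2_lower = 2.700    # area 0.975 to the right: chi^2_{9, 0.975}

lower = (n - 1) * s2 / chi2_upper
upper = (n - 1) * s2 / chi2_lower
print(f"95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")   # 95% CI for sigma^2: (0.0757, 0.5333)
```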
Slide ‹#›
End
Slide ‹#›