1 Probability
A chance mechanism (CM), when operated, produces an outcome. The sample space, S, is the collection of all possible outcomes of the CM. An event is any collection of outcomes.
Before the CM is operated, the probability of the event A, denoted by P (A), is a number that
measures the likelihood that A will occur.
How are probabilities assigned to events?
If we assume the ELC (the Equally Likely Case), then each possible outcome is equally likely to occur, and we assign probabilities to events as follows. For any event A,

P (A) = (the number of outcomes in A) / (the number of outcomes in S).
If we do not assume the ELC, there are two possibilities.
1. Suppose that the sample space is finite and consists of k possible outcomes: 1, 2, . . . k. The
probability of outcome j is denoted by pj . Each pj ≥ 0 and they sum to one.
2. Suppose that the sample space is an infinite sequence and consists of possible outcomes:
0, 1, 2, . . .. The probability of outcome j is denoted by pj . Each pj ≥ 0 and they sum to one.
If A and B are events, then (A or B) is the event that contains all elements that are in A and/or
B; (AB) is the event that contains all elements that are in both A and B.
Two events, A and B, are called disjoint or mutually exclusive if they have no elements in
common; in other words, if AB is the empty set.
Rules of Probability
1. The probability of the sample space equals 1.
2. For any event A, 0 ≤ P (A) ≤ 1.
3. If A and B are disjoint events, then P (A or B) = P (A) + P (B).
4. P (Aᶜ) = 1 − P (A), where Aᶜ denotes the complement of A (the outcomes not in A).
5. If A is a subset of B, then P (A) ≤ P (B).
6. For any events A and B, P (A or B) = P (A) + P (B) − P (AB).
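Under the ELC, these rules can be verified by direct counting. A minimal Python sketch (the die, the events A and B, and the function name are invented for illustration):

```python
from fractions import Fraction

# Sample space for one roll of a fair die, assuming the ELC.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    # P(A) = (number of outcomes in A) / (number of outcomes in S)
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # the event "the roll is even"
B = {4, 5, 6}   # the event "the roll is at least 4"

# Rule 6: P(A or B) = P(A) + P(B) - P(AB)
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))   # True

# Rule 4: P(complement of A) = 1 - P(A)
print(prob(S - A) == 1 - prob(A))                       # True
```

Using exact fractions rather than floating point keeps the equalities exact.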
2 Trials
Consider repeated operations of a CM. Each operation is called a trial and yields the value of a
random variable. The random variables are denoted by X1 for the first trial, X2 for the second
trial, and so on. Trials are i.i.d. if, and only if, the Xi ’s all have the same probability distribution (i.d.) and they are independent. The major consequence of independence is the multiplication rule. For example,

P (X1 = 3, X2 = 1, X3 = 4) = P (X1 = 3)P (X2 = 1)P (X3 = 4).
Let A be an event for which on every trial A either occurs or does not occur; i.e. A is a function
of a single operation of the CM. If the trials are i.i.d. then the relative frequency of occurrence of
event A in n trials converges to P (A) as n tends to infinity. In other words, for large n the relative
frequency of occurrence of A is approximately equal to P (A).
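This convergence can be illustrated by simulation. A sketch, where the CM is a fair die and the event A is "the roll is a 6" (an invented example, so P (A) = 1/6):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 100_000
count = 0
for _ in range(n):
    roll = random.randint(1, 6)  # one trial of the CM; trials are i.i.d.
    if roll == 6:                # does event A occur on this trial?
        count += 1

rel_freq = count / n
print(rel_freq)  # approximately P(A) = 1/6 ≈ 0.1667 for large n
```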
If each trial has two possible outcomes, 1 (S) and 0 (F), then we have Bernoulli Trials.
Let X denote the total number of successes in n BT. The probability distribution of X is given by the following equation, where p is the probability of success on a single trial and q = 1 − p:

P (X = x) = [n!/(x!(n − x)!)] p^x q^(n−x), for x = 0, 1, . . . , n.   (1)
Equation 1 is called the binomial probability distribution with parameters n and p; it is
denoted by Bin(n, p).
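Equation 1 can be evaluated directly with `math.comb`. A sketch (the values n = 10, p = 0.3, x = 3 are invented for illustration):

```python
from math import comb

def binom_pmf(x, n, p):
    # P(X = x) for X ~ Bin(n, p): n!/(x!(n-x)!) * p^x * q^(n-x)
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

n, p = 10, 0.3
print(binom_pmf(3, n, p))  # ≈ 0.2668

# The probabilities sum to one over x = 0, 1, ..., n.
total = sum(binom_pmf(x, n, p) for x in range(n + 1))
print(total)               # ≈ 1.0
```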
Suppose that a random variable X takes on values that are equally spaced. Let δ denote the
distance between successive values of X. Then the probability histogram for the probability distribution is drawn as follows.
1. Place the possible values, x, of X on a horizontal number line.
2. Draw rectangles that are centered at the x’s. The base of each rectangle is δ and the height
is P (X = x)/δ.
Every probability distribution has a mean (µ), variance (σ²) and standard deviation (σ). For Bin(n, p), µ = np, σ² = npq and σ = √(npq).
Let X be any random variable with mean µ and standard deviation σ. Then the standardized
version of X is denoted by Z and is given by the equation:
Z = (X − µ)/σ.
If X ∼ Bin(n, p) we can approximate P (X ≥ x) by standardizing to P (Z ≥ (x − 0.5 − np)/√(npq)). We can approximate this probability for Z by using the snc (the standard normal curve).
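The snc area can be computed with `math.erf`, so the approximation above can be sketched as follows (the values n = 100, p = 0.5, x = 60 are invented for illustration):

```python
from math import erf, sqrt

def snc_right_area(z):
    # Area under the standard normal curve to the right of z.
    return 0.5 * (1 - erf(z / sqrt(2)))

def binom_upper_tail_approx(x, n, p):
    # Approximate P(X >= x) for X ~ Bin(n, p), with the 0.5 continuity correction.
    q = 1 - p
    z = (x - 0.5 - n * p) / sqrt(n * p * q)
    return snc_right_area(z)

print(binom_upper_tail_approx(60, 100, 0.5))  # ≈ 0.0287 (exact binomial ≈ 0.0284)
```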
A finite population is a well-defined collection of individuals.
We visualize a finite population as a box of cards. Let p denote the proportion of the cards in
the box marked ‘1’ and q = 1 − p.
We refer to a population box as Box(N,p) to denote a box with N cards, of which N × p cards
are marked ‘1.’
Consider the CM: Select one card at random from Box(N, p). After operating this CM, place
the selected card back into the population box. Repeat this process n times. This operation is
referred to as selecting n cards at random with replacement. Viewing each selection as a trial,
we can see that we have BT.
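Sampling at random with replacement from Box(N, p) is easy to simulate. A sketch (the values N = 1000, p = 0.4, n = 50 are invented for illustration):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

N, p, n = 1000, 0.4, 50
box = [1] * int(N * p) + [0] * int(N * (1 - p))  # Box(N, p): N*p cards marked '1'

# Select n cards at random WITH replacement: each selection is one Bernoulli
# Trial with success probability p, independent of the other selections.
draws = [random.choice(box) for _ in range(n)]
x = sum(draws)       # total number of successes in the n trials
print(x, x / n)      # x/n should be near p
```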
3 Estimation of p
Assume that we have n BT, but we don’t know the value of p. Nature knows the value of p. We
collect data and observe x successes in the n trials.
The point estimate of p is p̂ = x/n. The point estimate is correct if p̂ = p. Only Nature knows whether a particular p̂ is correct; point estimates are rarely correct.
A confidence interval (CI) estimate of p is correct if the interval includes p. Only Nature knows
whether a particular CI is correct, but, before collecting data, the researcher knows the probability
that an interval will be correct. If this probability is, say, 95% then after collecting data we call the
resultant interval a 95% confidence interval.
Intervals can be 2-sided (the default value if not stated) or 1-sided. In this exam, a 1-sided CI
is always an upper bound interval.
There are two possible CIs: exact and approximate. The exact CI is obtained from a website. The approximate CIs are

p̂ ± z√(p̂q̂/n) and p̂ + z√(p̂q̂/n)

for the 2-sided and 1-sided CI, respectively. The value of z is determined by the confidence level as shown in the following table.
z:           1.282   1.645   1.960   2.326   2.576
2-sided CL:    80%     90%     95%     98%     99%
1-sided CL:    90%     95%   97.5%     99%   99.5%
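Given the table above, the approximate CI is simple to compute. A sketch (the data x = 60 successes in n = 100 trials and the 95% 2-sided level are invented for illustration):

```python
from math import sqrt

def approx_ci_p(x, n, z=1.960):
    # Approximate 2-sided CI for p: p_hat +/- z * sqrt(p_hat * q_hat / n).
    p_hat = x / n
    q_hat = 1 - p_hat
    half_width = z * sqrt(p_hat * q_hat / n)
    return p_hat - half_width, p_hat + half_width

low, high = approx_ci_p(60, 100)             # z = 1.960 gives a 95% 2-sided CI
print(round(low, 4), round(high, 4))         # ≈ 0.504 to 0.696
```

For the 1-sided upper bound, the same half-width is simply added to p̂, using the 1-sided z from the table.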
4 The Poisson Distribution
If the sample space is the sequence, 0, 1, . . . , then a researcher might want to use the Poisson
Distribution. The Poisson has one parameter, θ > 0. Exact Poisson probabilities can be obtained
from a website.
For a Poisson(θ), µ = σ² = θ. The snc can be used to obtain approximate probabilities: in P (X ≥ x), just standardize x to z = (x − 0.5 − θ)/√θ and proceed as for the binomial.
If θ is unknown we can use the value of X to estimate it. The point estimate is x and, following
the presentation for the binomial, we can use the snc to obtain an approximate confidence interval
for θ. The results for the 2- and 1-sided intervals are

x ± z√x and x + z√x,

respectively.
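A sketch of the Poisson point estimate and approximate CI (the observed count x = 25 and the 95% 2-sided level are invented for illustration):

```python
from math import sqrt

def approx_ci_theta(x, z=1.960):
    # Approximate 2-sided CI for theta: x +/- z * sqrt(x).
    half_width = z * sqrt(x)
    return x - half_width, x + half_width

x = 25                    # observed value of X; also the point estimate of theta
low, high = approx_ci_theta(x)                # z = 1.960 gives a 95% 2-sided CI
print(round(low, 2), round(high, 2))          # 25 ± 1.96*5, i.e. ≈ 15.2 to 34.8
```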
It is also possible to use a website to get an exact CI for θ.
If X ∼ Bin(n, p) with n very large, p very small and np moderate, then probabilities for X can be approximated by using Poisson(θ), where θ = np.
Below are the assumptions for a Poisson Process.
1. The numbers of successes in disjoint intervals are independent of each other.
2. The probability distribution of the number of successes counted in any time interval depends only on the length of the interval.
3. Successes cannot be simultaneous.
With these assumptions, it turns out that the probability distribution of the number of successes in any interval of time is the Poisson distribution with parameter θ = λ × w, where w > 0 is the length of the interval and λ > 0 is a feature of the process, often called its rate.
5 The Goodness of Fit Test
A Chi-Squared curve is characterized by the value of one parameter, called its degrees of freedom (df ). The degrees of freedom can be any positive integer: 1, 2, 3, . . . . Our symbol for this curve is χ²(df ). For example, χ²(5) is the Chi-Squared curve with df = 5.
We assume that we have a CM with a finite number, k, of possible outcomes that can be operated repeatedly and, when so operated, yields i.i.d. trials. Whether the outcomes are categories or numbers, we assign numbers to each outcome: 1, 2, . . . , k or 0, 1, 2, . . . , (k − 1). The probability of outcome i is denoted by pi . The Goodness of Fit Test is used when we have a theory about the values of the pi ’s and we want to evaluate whether or not the theory is reasonable.
In general, let pi0 denote the theory’s value of pi . This makes the hypotheses:
• H0 : pi = pi0 for all i
• H1 : Not H0 ; i.e. pi ≠ pi0 for at least one i.
A test can result in a correct decision, a Type 1 error (rejecting a true null) or a Type 2 error (failing to reject a false null).
The researcher must specify the significance level of the test. It is denoted by α and is the probability of a Type 1 error.
The data consist of the observed frequencies of the various possible outcomes, denoted by O’s. Each O has its expected frequency E, calculated as Ei = npi0 .
The observed value of the test statistic is

χ² = Σ (Oi − Ei )²/Ei ,

summed over all cells.
We reject the null if, and only if, χ² ≥ χ²_α(k − 1). The P-value is the area under the χ²(k − 1) curve to the right of the observed value of the test statistic.
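A sketch of the test statistic computation, where the CM is a die and the theory is that the die is fair (the observed frequencies and the critical value χ²_0.05(5) ≈ 11.07 are supplied here as an invented example):

```python
# Goodness of Fit Test for a die; H0: p_i = 1/6 for all i.
observed = [18, 22, 16, 25, 19, 20]   # invented observed frequencies, n = 120
n = sum(observed)
p0 = [1 / 6] * 6                      # the theory's values p_i0
expected = [n * p for p in p0]        # E_i = n * p_i0 = 20 for each cell

# chi-squared = sum over all cells of (O_i - E_i)^2 / E_i
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))                 # 2.5

# Reject H0 at alpha = 0.05 iff chi2 >= the critical value for df = k - 1 = 5.
CRITICAL_DF5_ALPHA05 = 11.07          # chi-squared table value (assumed here)
print(chi2 >= CRITICAL_DF5_ALPHA05)   # False: do not reject H0
```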