Probability: Fundamentals (discrete and continuous probability models)
Neha Barve
Lecturer, Bioinformatics
DAVV
Topics covered
• Probability fundamentals
• Definitions
• Events
• Probability models (discrete and continuous)
• Expectation and variance
• Examples
Probability
• Probability is the study of random experiments.
• It is a measure of how likely a particular event is to occur.
• The probability of an event is a measure of its chance of occurrence: a number between 0 and 1.
• If an event is certain to occur, its probability is 1 (100%).
• If an event cannot occur, its probability is 0 (0%).
• Otherwise, the probability lies strictly between 0 and 1.
• The uses of probability
  – The subject began with gambling.
  – It is now applied to analyze data in astronomy, mortality studies, traffic flow, telephone exchanges, genetics, epidemics, investment...
Probability Terms
• Random Experiment: A process leading to at least 2 possible outcomes, with uncertainty as to which will occur.
• Event: An event is a subset of all possible outcomes of an experiment.
  – Intersection of Events: Let A and B be two events. Then the intersection of the two events, denoted A ∩ B, is the event that both A and B occur.
  – Union of Events: The union of the two events, denoted A ∪ B, is the event that A or B (or both) occurs.
  – Complement: Let A be an event. The complement of A (denoted Aᶜ) is the event that A does not occur.
  – Mutually Exclusive Events: A and B are said to be mutually exclusive if at most one of the events A and B can occur (two events are mutually exclusive if they cannot occur at the same time; an example is tossing a coin once, which can result in either heads or tails, but not both).
• Basic Outcomes: The simple possible results of an experiment. One and exactly one of these outcomes must occur. The set of basic outcomes is mutually exclusive and collectively exhaustive.
• Sample Space: The totality of basic outcomes of an experiment.
Collectively exhaustive
• Means that at least one of the outcomes must happen, so the possibilities together exhaust all the possibilities. However, not all mutually exclusive events are collectively exhaustive. For example, the outcomes 1 and 4 of a single roll of a six-sided die are mutually exclusive (they cannot both happen) but not collectively exhaustive (there are other possible outcomes).
Basic Probability Rules
1. For any event A, 0 ≤ P(A) ≤ 1.
2. If A and B can never both occur (they are mutually exclusive), then
   P(A and B) = P(A ∩ B) = 0.
3. P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. If A and B are mutually exclusive events, then P(A or B) =
   P(A ∪ B) = P(A) + P(B).
5. P(Aᶜ) = 1 − P(A).
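These rules are easy to check by brute-force enumeration. Below is a minimal Python sketch (ours, not from the lecture) that verifies rules 1-5 on the sample space of a single roll of a fair die; the events A, B and C are arbitrary examples chosen for illustration.

from fractions import Fraction

space = {1, 2, 3, 4, 5, 6}                       # sample space of one die roll
P = lambda event: Fraction(len(event & space), len(space))

A = {1, 2}          # "roll a 1 or a 2"
B = {4, 5, 6}       # "roll at least a 4"
C = {2, 4, 6}       # "roll an even number"

assert 0 <= P(A) <= 1                            # Rule 1
assert P(A & B) == 0                             # Rule 2: A, B mutually exclusive
assert P(A | C) == P(A) + P(C) - P(A & C)        # Rule 3: addition rule
assert P(A | B) == P(A) + P(B)                   # Rule 4: mutually exclusive case
assert P(space - A) == 1 - P(A)                  # Rule 5: complement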
Independent Events
• Two events A and B are said to be independent if the fact
that A has occurred or not does not affect your assessment
of the probability of B occurring. Conversely, the fact that B
has occurred or not does not affect your assessment of the
probability of A occurring.
6. If A and B are independent events, then
P(A and B) = P(A ∩ B) = P(A) × P(B).
Probability models
• A probability model has two parts, e.g. for coin tossing:
  – A list of possible outcomes.
  – A probability for each outcome.
• The sample space S of a random phenomenon is the set of all possible outcomes.
  – Example: S = {heads, tails} = {H, T}
  – General analysis is possible.
• What is the probability of "exactly 2 heads in four tosses of a coin"?
• What rules must any assignment of probabilities satisfy?
• An event is an outcome or a set of outcomes (i.e., it is a subset of the sample space).
• A = {HHTT, HTHT, HTTH, THHT, THTH, TTHH}
• In a probability model, events have probabilities that satisfy the basic probability rules above.
• Two events A and B are independent if knowing that one occurs does not change the probability that the other occurs.
• If A and B are independent,
  P(A and B) = P(A)P(B)
  the multiplication rule for independent events. (A worked enumeration follows below.)
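The four-toss question above can be settled by enumeration. The sketch below (illustrative, not part of the original slides) lists all 16 outcomes, computes the probability of the event A = "exactly 2 heads", and checks the multiplication rule on two single-toss events.

from itertools import product
from fractions import Fraction

S = [''.join(t) for t in product('HT', repeat=4)]   # 16 equally likely outcomes
P = lambda event: Fraction(sum(event(o) for o in S), len(S))

# A = "exactly 2 heads in four tosses" (HHTT, HTHT, HTTH, THHT, THTH, TTHH)
print(P(lambda o: o.count('H') == 2))               # 6/16 = 3/8

# Independence: "first toss is H" and "second toss is H"
first_H  = lambda o: o[0] == 'H'
second_H = lambda o: o[1] == 'H'
both     = lambda o: first_H(o) and second_H(o)
assert P(both) == P(first_H) * P(second_H)          # multiplication rule holds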
Conditional Probability
• A conditional probability is the probability that event A occurs when the sample space is limited to event B.
• It is read "the probability of A, given B" and is commonly notated P(A | B); the two events are separated by a vertical line.
• Example: One of the businesses that has grown out of the public's increased use of the internet is providing internet service to individual customers; those who provide this service are called Internet Service Providers (ISPs).
  – More recently, a number of ISPs have developed business models whereby they do not need to charge customers for internet service at all, collecting fees from advertisers instead and requiring the non-paying customers to view advertisements.
  – Jupiter Communications estimates that by the end of 2003, 20% of web users will have a free ISP, and 6% of all web users will have both a free ISP and a paid ISP account.
• In 2003, what proportion of internet users is expected to:
  a) subscribe to both a free ISP and a paid ISP?
  b) subscribe only to a paid ISP?
  c) subscribe only to a free ISP?
P(A  B)= P(A|B)P(B)= P(B|A)P(A)
• In these simple calculations, we are making use of the
conditional probability formula:
P(A|B) = P(A holds given that B holds) = P(A∩B)/P(B)
• This relationship is known as Bayes' Law, after the English
clergyman Thomas Bayes (1702-1761), who first derived it.
Bayes' Law was later generalized by the French
mathematician Pierre-Simon LaPlace (1749-1827).
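As a quick check of the formula with the figures given in the ISP example (20% free, 6% both), the snippet below computes the conditional probability that a web user has a paid account given that they have a free one; the variable names are ours.

p_free = 0.20              # P(B): user has a free ISP
p_both = 0.06              # P(A and B): user has both a free and a paid ISP

# P(paid | free) = P(A ∩ B) / P(B)
p_paid_given_free = p_both / p_free
print(p_paid_given_free)   # 0.3: 30% of free-ISP users also have a paid account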
Random Variables
• A random variable is a variable whose value is a
numerical outcome of a random phenomenon.
• A random variable is a numerical description of the
outcome of an experiment (e.g., the possible results of
rolling two dice: (1, 1), (1, 2) , etc.).
• Random variables can be classified as either discrete (a
random variable that may assume either a finite number
of values or an infinite sequence of values) or as
continuous (a variable that may assume any numerical
value in an interval or collection of intervals).
Random Variable
• A random variable is called discrete if it has countably many
possible values; otherwise, it is called continuous.
• The following quantities would typically be modeled as
discrete random variables:
– The number of defects in a batch of 20 items.
– The number of people preferring one brand over another in a market
research study.
– The credit rating of a debt issue at some date in the future.
• The following would typically be modeled as continuous
random variables:
– The proportion of defects in a batch of 10,000 items.
– The time between breakdowns of a machine.
– Sometimes we approximate a discrete random variable with a continuous one if the possible values are very close together; e.g., stock prices are often treated as continuous random variables.
Difference
• A continuous variable is one that can take any real numerical value. For example:
  – The length of a strip can be anything.
  – A person's height and age can take any real values, within reasonable limits.
• Discrete variables, by contrast, only take values that are whole numbers. For example:
  – The number of people on a football team.
  – The number of major planets in the solar system; no star could ever have 5.62 major planets.
Distribution: discrete
• If X is a discrete random variable then we denote its pmf by PX.
  – The rule that assigns specific probabilities to specific values of a discrete random variable is called its probability mass function, or pmf.
  – For any value x, PX(x) is the probability of the event that X = x; i.e., PX(x) = P(X = x) = probability that the value of X is x.
  – We always use capital letters for random variables. Lower-case letters like x and y stand for possible values (i.e., numbers) and are not random.
  – A pmf is graphed by drawing a vertical line of height PX(x) at each possible value x. It is similar to a histogram, except that the height of the line (or bar) gives the theoretical probability rather than the observed frequency.
• The pmf gives us one way to describe the distribution of a random variable. Another way is provided by the cumulative probability function, denoted FX and defined by FX(x) = P(X ≤ x).
  – It is the probability that X is less than or equal to x.
  – Where the pmf gives the probability that the random variable lands on a particular value, the cpf gives the probability that it lands on or below a particular value. In particular, FX is always a non-decreasing function.
Examples
• Three tosses of a fair coin:
• There are eight possible outcomes.
• These constitute the sample space S.
• Let the number of heads be the random variable X.
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
X = (x1, x2, x3, x4, x5, x6, x7, x8) = (3, 2, 2, 2, 1, 1, 1, 0)
• Let X be a discrete random variable, and let x1, x2, x3, … be the values that X can assume, in increasing order of magnitude.
• Let
  P(X = xi) = f(xi), i = 1, 2, 3, …
• be the probability of xi, with
  Σi f(xi) = 1
• f(x) is known as the probability function, or pmf.
• Probability function for the three coin tosses:
• The probability of each of the 8 outcomes is 1/8.
  P(X = 0) = P(x8) = 1/8
  P(X = 1) = P(x5) + P(x6) + P(x7) = 1/8 + 1/8 + 1/8 = 3/8
  P(X = 2) = P(x2) + P(x3) + P(x4) = 1/8 + 1/8 + 1/8 = 3/8
  P(X = 3) = P(x1) = 1/8
The probability function is therefore:

  x    : 0    1    2    3
  f(x) : 1/8  3/8  3/8  1/8

[Histogram: vertical bars of heights 1/8, 3/8, 3/8, 1/8 at x = 0, 1, 2, 3]
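The table above can be reproduced by enumerating the eight outcomes. A short sketch (ours, for illustration) that builds both the pmf and the cumulative probability function:

from itertools import product
from collections import Counter
from fractions import Fraction

outcomes = list(product('HT', repeat=3))                  # the 8 points of S
counts = Counter(o.count('H') for o in outcomes)          # heads per outcome
pmf = {x: Fraction(c, len(outcomes)) for x, c in counts.items()}

# Cumulative probability function F(x) = P(X <= x): non-decreasing in x
cpf = {x: sum(p for v, p in pmf.items() if v <= x) for x in pmf}

for x in sorted(pmf):
    print(x, pmf[x], cpf[x])   # 0 1/8 1/8, 1 3/8 1/2, 2 3/8 7/8, 3 1/8 1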
Distribution: continuous
• The distribution of a continuous random variable cannot be
specified through a probability mass function because if X is
continuous, then P(X = x) = 0 for all x; i.e., the probability of
any particular value is zero. Instead, we must look at
probabilities of ranges of values.
– The probabilities of ranges of values of a continuous random variable
are determined by a density function. It is denoted by fX. The area
under a density is always 1.
– The probability that X falls between two points a and b is the area
under fX between the points a and b. The familiar bell-shaped normal
curve is an example of a density.
• The cumulative distribution function or cdf of a continuous random variable is obtained from the density in much the same way a cpf is obtained from the pmf of a discrete distribution.
  – The cdf of X, denoted by FX, is given by FX(x) = P(X ≤ x).
  – FX(x) is the area under the density fX to the left of x.
• Let there be a function f(x) such that
  – f(x) ≥ 0 for all x, and
  – ∫ f(x) dx = 1 (then f(x) is a pdf).
• The probability of X lying between a and b is defined by
  P(a < X < b) = ∫_a^b f(x) dx
Expectation and Variance
• In probability theory, the expected
value (or expectation, or mathematical
expectation, or mean, or the first moment) of
a random variable is the weighted average of
all possible values that this random variable
can take on.
• The weights used in computing this average
correspond to the probabilities in case of a
discrete random variable, or densities in case
of a continuous random variable.
Discrete random variable, finite case
• Suppose random variable X can take value x1 with probability p1, value x2 with probability p2, and so on, up to value xk with probability pk. Then the expectation of this random variable X is defined as
  E[X] = x1 p1 + x2 p2 + … + xk pk
• Since all the probabilities pi add up to one (p1 + p2 + … + pk = 1), the expected value can be viewed as a weighted average, with the pi's being the weights:
  E[X] = (x1 p1 + x2 p2 + … + xk pk) / (p1 + p2 + … + pk)
Example
• Let X represent the outcome of a roll of a six-sided die.
• The possible values for X are 1, 2, 3, 4, 5, 6, all equally likely (each having probability 1/6). The expectation of X is
  E[X] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 3.5
• Hence the general formula for the expectation of a discrete random variable is
  E[X] = Σi xi P(X = xi)
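A one-line check of this computation (illustrative):

from fractions import Fraction

values = [1, 2, 3, 4, 5, 6]
p = Fraction(1, 6)                       # each face equally likely
expectation = sum(x * p for x in values)
print(expectation)                       # 7/2 = 3.5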
Variance
• In probability theory and statistics, the variance is a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). Formally,
  Var(X) = E[(X − E[X])²]
• In particular, the variance is one of the moments of a distribution.
Example
• If a coin is tossed twice, the number of heads is 0 with probability 0.25, 1 with probability 0.5 and 2 with probability 0.25.
• Thus the mean of the number of heads is 0.25 × 0 + 0.5 × 1 + 0.25 × 2 = 1,
• and the variance is
  0.25 × (0 − 1)² + 0.5 × (1 − 1)² + 0.25 × (2 − 1)² = 0.5
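The same numbers can be checked directly from the pmf (a small illustrative snippet, not from the slides):

pmf = {0: 0.25, 1: 0.5, 2: 0.25}           # number of heads in two tosses

mean = sum(x * p for x, p in pmf.items())                  # 1.0
variance = sum(p * (x - mean) ** 2 for x, p in pmf.items())
print(mean, variance)                                      # 1.0 0.5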
Cumulative distribution function
• If a random variable can take values x1, x2, x3, …, then the distribution function is given by
  F(x) = P(X ≤ x) = Σ_{xi ≤ x} f(xi)
Distribution function of a random variable
F(x) remains the same or increases as x increases. Hence F(x) is said to be a monotonically non-decreasing function.
Continuous random variable
• A random variable that can take on an infinite number of values is known as a continuous random variable.
• Since there are infinitely many possible values of X, the probability that it takes on any particular value is 1/∞, i.e. 0.
• Hence a probability function cannot be defined as in the discrete case.
• In the continuous case, the probability that X lies between two different values is non-zero.
• Examples:
• 1) If X represents the height of a person, the probability that it is exactly 160 cm is zero, but the probability that it is between 155 cm and 165 cm is non-zero.
• 2) If one measures the width of an oak leaf, the result of 3½ cm is possible; however, it has probability zero because there are uncountably many other potential values even between 3 cm and 4 cm. Each of these individual outcomes has probability zero, yet the probability that the outcome will fall into the interval (3 cm, 4 cm) is nonzero. (Formally, each value has an infinitesimally small probability, which statistically is equivalent to zero.)
• Let there be a function f(x) such that
  – f(x) ≥ 0 for all x, and
  – ∫ f(x) dx = 1 (then f(x) is a pdf).
• The probability of X lying between a and b is defined by
  P(a < X < b) = ∫_a^b f(x) dx
• For a continuous random variable the probability of X being equal to any particular value is zero, hence the < sign can be replaced by the sign ≤; thus
  P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b) = P(a ≤ X ≤ b)
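The sketch below illustrates both points numerically: a single value has probability zero, while an interval has positive probability. The height example is modeled as a normal density; the values μ = 160 cm and σ = 5 cm are assumed purely for illustration, not taken from the slides.

import math

mu, sigma = 160.0, 5.0     # hypothetical height distribution

def pdf(x):
    """Normal density f(x) with the parameters above."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob(a, b, n=10_000):
    """P(a < X < b) by simple midpoint numerical integration of the density."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(prob(160, 160))      # 0.0: any single point has probability zero
print(prob(155, 165))      # ~0.683: a range of values has nonzero probability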
Probability density function of a continuous random variable
Normal distribution
• Data can be "distributed" (spread out) in different ways: skewed left or right, or jumbled.
• But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution".
We say the data is "normally distributed" if the distribution has:
• mean = median = mode
• symmetry about the center
• 50% of values less than the mean and 50% greater than the mean
Mathematical term
• The normal (or Gaussian) distribution is a continuous probability distribution that has a bell-shaped probability density function, known as the Gaussian function or informally the bell curve, given by
  f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))
where the parameter μ is the mean or expectation (the location of the peak) and σ² is the variance, the mean of the squared deviation (a "measure" of the width of the distribution). σ is the standard deviation. The distribution with μ = 0 and σ² = 1 is called the standard normal.
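A direct transcription of this density into code (a small illustrative check, ours):

import math

def gaussian(x, mu=0.0, sigma=1.0):
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))"""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(gaussian(0.0))   # peak of the standard normal: 1/sqrt(2*pi) ~= 0.3989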
Standard Deviations
• For normally distributed data, when you calculate the standard deviation you will find that:
  – 68% of values are within 1 standard deviation of the mean
  – 95% are within 2 standard deviations
  – 99.7% are within 3 standard deviations
• The number of standard deviations from the mean is also called the "Standard Score", "sigma" or "z-score":
  z = (x − μ) / σ
where
• z is the "z-score" (Standard Score)
• x is the value to be standardized
• μ is the mean
• σ is the standard deviation
Example
• A survey of daily travel time had these results (in minutes): 26, 33, 65, 28, 34, 55, 25, 44, 50, 36, 26, 37, 43, 62, 35, 38, 45, 32, 28, 34
• The mean is 38.8 minutes, and the standard deviation is 11.4 minutes.
• Convert the values to z-scores ("standard scores").
• To convert 26:
  – first subtract the mean: 26 − 38.8 = −12.8,
  – then divide by the standard deviation: −12.8/11.4 = −1.12
• So 26 is −1.12 standard deviations from the mean.
• Here are the first three conversions:

  Original Value | Calculation        | Standard Score (z-score)
  26             | (26 − 38.8) / 11.4 | −1.12
  33             | (33 − 38.8) / 11.4 | −0.51
  65             | (65 − 38.8) / 11.4 | +2.30
  ...            | ...                | ...
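The conversions can be reproduced with a few lines of Python (note that the slide uses the population standard deviation, which works out to exactly 11.4 here):

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

mean = sum(times) / len(times)                                   # 38.8
sd = (sum((x - mean) ** 2 for x in times) / len(times)) ** 0.5   # 11.4 (population SD)

for x in times[:3]:
    print(x, round((x - mean) / sd, 2))    # 26 -1.12, 33 -0.51, 65 2.3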
Central limit theorem
• The CLT says that the probability density of the sum of N independent random variables tends to approach a normal density as N increases.
• The mean and variance of this normal density are the sums of the means and variances of the N independent random variables.
• When you throw a die ten times, you rarely get only ones. The usual result is approximately the same number of each value between one and six. Of course, sometimes you may get five sixes, for example, but certainly not often.
• The reason for this is that you can get the middle values in many more different ways than the extremes. Example: when throwing two dice, 1+6 = 2+5 = 3+4 = 7, but only 1+1 = 2 and only 6+6 = 12.
• The formal representation of the central limit theorem looks like this: when X1, X2, … are independent observations of a random variable X with mean μ and variance σ², then as n → ∞ the standardized sample mean
  (X̄n − μ) / (σ/√n), where X̄n = (X1 + … + Xn)/n,
  approaches the standard normal distribution N(0, 1).
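A small simulation (ours, not from the slides) makes this concrete: the average of n die rolls concentrates around the mean 3.5, and its variance shrinks roughly as (35/12)/n.

import random
random.seed(0)

def sample_mean(n):
    """Average of n rolls of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

for n in (1, 10, 100):
    means = [sample_mean(n) for _ in range(10_000)]
    m = sum(means) / len(means)
    var = sum((x - m) ** 2 for x in means) / len(means)
    print(n, round(m, 2), round(var, 3))   # variance ~ (35/12) / n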
Hypothesis testing
• Hypothesis testing is a way of systematically quantifying how certain you are of the result of a statistical experiment.
• Example: toss a coin 100 times and make a judgment about whether the coin is fair or not.
• Null Hypothesis: a hypothesis which states that there is no difference between the procedures; it is denoted by H0.
• Alternative Hypothesis: a hypothesis which states that there is a difference between the procedures; it is denoted by HA.
• Test Statistic: the random variable X whose value is tested to arrive at a decision.
• Conclusion: if the test statistic falls in the rejection/critical region, H0 is rejected; otherwise we fail to reject H0.
• Table 1. Various types of H0 and HA

  Case | Null Hypothesis H0 | Alternate Hypothesis HA
  1    | μ1 = μ2            | μ1 ≠ μ2
  2    | μ1 ≤ μ2            | μ1 > μ2
  3    | μ1 ≥ μ2            | μ1 < μ2
• Rejection Region: the part of the sample space (critical region) where the null hypothesis H0 is rejected. The size of this region is determined by the probability (α) of the sample point falling in the critical region when H0 is true. α is also known as the level of significance: the probability of the value of the random variable falling in the critical region. It should also be noted that the term "statistical significance" refers only to the rejection of a null hypothesis at some level α; it implies only that the observed difference between the sample statistic and the mean of the sampling distribution did not occur by chance alone.
Example - Efficacy Test for New Drug
• A drug company has a new drug and wishes to compare it with the current standard treatment.
• Federal regulators tell the company that they must demonstrate that the new drug is better than the current treatment to receive approval.
• The firm runs a clinical trial where some patients receive the new drug and others receive the standard treatment.
• A numeric response measuring therapeutic effect is obtained (higher scores are better).
• Parameter of interest: μNew − μStd
Example - Efficacy Test for New Drug
• Null hypothesis - the new drug is no better than the standard treatment:
  H0: μNew − μStd ≤ 0
• Alternative hypothesis - the new drug is better than the standard treatment:
  HA: μNew − μStd > 0
• Experimental (sample) data: the sample means (ȳNew, ȳStd), sample standard deviations (sNew, sStd) and sample sizes (nNew, nStd) of the two groups.
Sampling Distribution of Difference in Means
• In large samples, the difference in two sample means is approximately normally distributed:
  Ȳ1 − Ȳ2 ~ N( μ1 − μ2 , σ1²/n1 + σ2²/n2 )
• Under the null hypothesis, μ1 − μ2 = 0 and:
  Z = (Ȳ1 − Ȳ2) / √(σ1²/n1 + σ2²/n2) ~ N(0, 1)
• σ1² and σ2² are unknown and estimated by s1² and s2².
Example - Efficacy Test for New Drug
• Type I error - concluding that the new drug is better than the standard (HA) when in fact it is no better (H0). An ineffective drug is deemed better.
  – Traditionally α = P(Type I error) = 0.05
• Type II error - failing to conclude that the new drug is better (HA) when in fact it is. An effective drug is deemed to be no better.
  – Traditionally a clinically important difference (Δ) is assigned and sample sizes chosen so that:
    β = P(Type II error | μ1 − μ2 = Δ) ≤ 0.20
Error
• When using probability to decide whether a statistical test provides evidence for or against our predictions, there is always a chance of drawing the wrong conclusions. Even when choosing a probability level of 95%, there is always a 5% chance that one rejects the null hypothesis when it was actually correct. This is called a Type I error, represented by the Greek letter α.
• It is possible to err in the opposite way if one fails to reject the null hypothesis when it is, in fact, incorrect. This is called a Type II error, represented by the Greek letter β.
These two errors are represented in the following chart:

  Type of decision | H0 true                  | H0 false
  Reject H0        | Type I error (α)         | Correct decision (1 − β)
  Accept H0        | Correct decision (1 − α) | Type II error (β)
Steps in Hypothesis Testing
1. Identify the null hypothesis H0 and the alternate hypothesis HA.
2. Choose α. The value should be small, usually less than 10%. It is important to consider the consequences of both types of errors.
3. Select the test statistic and determine its value from the sample data. This value is called the observed value of the test statistic. Remember that a t statistic is usually appropriate for a small number of samples; for larger numbers of samples, a z statistic can work well if the data are normally distributed.
4. Compare the observed value of the statistic to the critical value obtained for the chosen α.
5. Make a decision: if the test statistic falls in the critical region, reject H0 in favour of HA; if the test statistic does not fall in the critical region, conclude that there is not enough evidence to reject H0.

A worked sketch of these steps follows below.
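As an illustration of the five steps, the sketch below runs a one-sample z test on the travel-time data from the earlier example; the null value μ0 = 35 and the choice of a z test with known σ = 11.4 are assumptions made for this illustration only.

import math

times = [26, 33, 65, 28, 34, 55, 25, 44, 50, 36,
         26, 37, 43, 62, 35, 38, 45, 32, 28, 34]

# Step 1: H0: mu = 35, HA: mu != 35 (the value 35 is assumed for illustration)
mu0 = 35.0
# Step 2: choose alpha
alpha = 0.05
# Step 3: observed value of the z statistic (sigma treated as known, = 11.4)
mean, sd, n = sum(times) / len(times), 11.4, len(times)
z = (mean - mu0) / (sd / math.sqrt(n))
# Step 4: compare with the two-sided critical value for alpha = 0.05
z_crit = 1.96
# Step 5: decide
print(round(z, 2), "reject H0" if abs(z) > z_crit else "fail to reject H0")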
Thank you