Chapter 7 – Random Variables and Probability Distributions
(Linking probability with statistical inference)
7.1 Random Variables
Random variable – a variable whose value is subject to uncertainty because it depends
on the outcome of a chance experiment. Random variables are represented using lower
case letters.
Discrete random variable – the set of possible values consists of isolated points on a
number line. A discrete set of values can be finite or infinite!
Continuous random variable – the set of possible values includes an entire interval of
the number line.
7.2 Probability Distributions for Discrete Random Variables
Probability Distribution of a discrete random variable x
 A model that describes the long-run behavior of the variable.
 It gives the probability associated with each possible value of x. Each probability
is the limiting relative frequency of occurrence of the corresponding x value
when the chance experiment is performed repeatedly.
 Commonly displayed using tables, probability histograms, and formulae.
Properties of Discrete Probability Distributions
 For every possible x value, 0 ≤ p(x) ≤ 1

 Σ_{all x} p(x) = 1
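As a quick illustration (not from the original notes; the values and probabilities below are made up), these two properties and a simple probability calculation can be checked in Python:

```python
# Hypothetical discrete distribution: possible values of x and their probabilities p(x).
x_values = [0, 1, 2, 3]
p = [0.1, 0.4, 0.3, 0.2]

# Property 1: every p(x) is between 0 and 1.
assert all(0 <= px <= 1 for px in p)

# Property 2: the probabilities sum to 1.
assert abs(sum(p) - 1) < 1e-12

# The probability of an event such as (x >= 2) is the sum of p(x) over qualifying values.
prob_at_least_2 = sum(px for xv, px in zip(x_values, p) if xv >= 2)
print(prob_at_least_2)  # 0.5
```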
7.3 Probability Distributions for Continuous Random Variables
Probability Distribution for a continuous random variable x
 Is specified by a mathematical function denoted by f(x) and called the density
function
 The graph of density function is a smooth curve (i.e. density curve)
 Two requirements:
o f(x) ≥ 0 (i.e. curve cannot dip below horizontal axis)
o Total area under the density curve is 1.
 The probability that x falls in any particular interval is the area under the curve
between the least and greatest values of the interval.
 Three common events with probability calculations for continuous random
variables:
o a < x < b (x is a value between two numbers)
o x < a (x is a value less than a given number)
o x > b (x is a value greater than a given number)
 For any two numbers a and b, with a < b
P( a ≤ x ≤ b) = P(a < x ≤ b) = P(a ≤ x < b) = P(a < x < b)
o The probability that a continuous random variable x lies between a lower limit a
and an upper limit b is:
P(a < x < b) = (cumulative area to the left of b) – (cumulative area to the left of a) or
P(a < x < b) = P(x < b) – P (x < a)
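As a sketch (not part of the notes), the “area between a and b” rule can be evaluated with any density whose cumulative areas are known; here a normal density from scipy.stats is used purely as an example, with a hypothetical mean of 100 and standard deviation of 15:

```python
from scipy import stats

# Hypothetical continuous distribution: normal with mean 100 and standard deviation 15.
dist = stats.norm(loc=100, scale=15)

a, b = 90, 120
# P(a < x < b) = (cumulative area to the left of b) - (cumulative area to the left of a)
prob = dist.cdf(b) - dist.cdf(a)
print(prob)  # about 0.656

# Including or excluding the endpoints does not change the area for a continuous variable.
```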
7.4 Mean and Standard Deviation of a Random Variable
Mean Value of a Random Variable x – denoted by μ_x, describes where the probability
distribution is centered. The “mean” value is sometimes referred to as the expected value
and is denoted by E(x).
μ_x = Σ_{all x} x·p(x)
Standard Deviation of a Random Variable x – denoted by σ_x, describes variability in the
probability distribution. A small σ_x indicates the observed values tend to be close to the
mean (i.e. little variability) and a large σ_x indicates more variability in the observed
values of x.
Variance (denoted by σ²_x) = Σ_{all x} (x − μ_x)²·p(x)
So (standard deviation) σ_x = √(σ²_x)
(The mean, variance, and standard deviation of a random variable can be found in your
calculator by putting the values of the random variable in L1 and the probabilities in L2.
μ_x will be the x̄ and σ_x will be the σx on the output, and note that n = 1 because the
sum of the probabilities is 1!)
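The same computation can be sketched in Python (the distribution below is hypothetical), mirroring what the calculator does with L1 and L2:

```python
import numpy as np

# Hypothetical distribution: values of x in one array, probabilities p(x) in another.
x = np.array([0, 1, 2, 3])
p = np.array([0.1, 0.4, 0.3, 0.2])

mu = np.sum(x * p)               # mean: sum of x * p(x)
var = np.sum((x - mu) ** 2 * p)  # variance: sum of (x - mu)^2 * p(x)
sigma = np.sqrt(var)             # standard deviation

print(mu, var, sigma)  # 1.6, 0.84, about 0.917
```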
The mean, variance and standard deviation of a linear function –
If x is a random variable with mean μ_x, variance σ²_x, and standard deviation σ_x,
and a and b are numerical constants, then the random variable y can be defined by:
y = a + bx and is called a linear function of the random variable x
the mean of y = a + bx is μ_y = μ_{a+bx} = a + b·μ_x
the variance of y is σ²_y = σ²_{a+bx} = b²·σ²_x, from which it follows that
the standard deviation of y is σ_y = |b|·σ_x
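A quick numerical check of these rules (the numbers are hypothetical, not from the notes):

```python
# Suppose x has mean 1.6 and standard deviation 0.917 (hypothetical values),
# and y = a + b*x with a = 5 and b = 2.
mu_x, sigma_x = 1.6, 0.917
a, b = 5, 2

mu_y = a + b * mu_x            # mean of y: a + b*mu_x -> 8.2
var_y = b ** 2 * sigma_x ** 2  # variance of y: b^2 * sigma_x^2
sigma_y = abs(b) * sigma_x     # standard deviation of y: |b|*sigma_x -> 1.834

print(mu_y, var_y, sigma_y)
```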
The mean, variance and standard deviation of a linear combination –
If x1, x2, …, xn are random variables and a1, a2, …, an are numerical constants, then the
random variable y defined as:
y = a1x1 + a2x2 + … + anxn is a linear combination of the xi’s!
and μ_y = a1μ1 + a2μ2 + … + anμn (regardless of whether the xi’s are independent)
and, when the xi’s are independent, σ²_y = a1²σ1² + a2²σ2² + … + an²σn²
and σ_y = √(a1²σ1² + a2²σ2² + … + an²σn²)
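A sketch of these rules for a hypothetical combination of two independent random variables (for example y = x1 − x2):

```python
import numpy as np

# Hypothetical means, standard deviations, and coefficients.
mus = np.array([10.0, 20.0])    # mu_1, mu_2
sigmas = np.array([2.0, 3.0])   # sigma_1, sigma_2
a = np.array([1.0, -1.0])       # a_1, a_2 (so y = x1 - x2)

mu_y = np.sum(a * mus)                # holds whether or not the xi's are independent
var_y = np.sum(a ** 2 * sigmas ** 2)  # requires the xi's to be independent
sigma_y = np.sqrt(var_y)

print(mu_y, var_y, sigma_y)  # -10.0, 13.0, about 3.61
```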
7.5 Binomial and Geometric Distributions
These distributions arise when an experiment consists of making a sequence of dichotomous
(two possible values) observations called trials.
Binomial Probability Distribution – results from a sequence of trials that meet these
criteria:
 There are a fixed number of observations
 Each trial can result in one of only 2 mutually exclusive outcomes (labeled S –
success or F- failure)
 Outcomes of different trials are independent
 The probability that a trial results in S is the same for each trial
Binomial Random Variable x = number of successes observed when a binomial
experiment is performed
n x
Then the binomial distribution is :
P(x) = n Cx x 1   
where n = number of independent trials,  = constant probability of a success, and
nCx is a combination (n objects taken x at a time when order does not matter)
n!
represented by
or in other words:
x ! n  x !
n!
n x
 x 1   
x ! n  x !
Can be done in your calculator using: binompdf(#trials, prob. of success, # successes)
(found in the distr menu by pressing 2nd, vars on the TI-84 or going into the Stats/List
application on the TI-89 and then F5 to get to the distr menu.)
P(x) =
The cumulative binomial distribution is similar but gives the probability of up to a certain
number of successes:
binomcdf(#trials, prob. of success, maximum # of successes)
mean value of a binomial random variable: μ_x = nπ
variance of a binomial random variable: σ²_x = nπ(1 − π)
standard deviation of a binomial random variable: σ_x = √(nπ(1 − π))
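Outside the calculator, the same quantities can be sketched with scipy.stats.binom (the n and π values below are hypothetical):

```python
from scipy import stats

n, pi = 10, 0.3  # hypothetical number of trials and probability of success

# binompdf(n, pi, x): probability of exactly x successes.
print(stats.binom.pmf(4, n, pi))  # P(x = 4), about 0.200

# binomcdf(n, pi, x): probability of at most x successes.
print(stats.binom.cdf(4, n, pi))  # P(x <= 4), about 0.850

# Mean, variance, and standard deviation: n*pi, n*pi*(1 - pi), sqrt(n*pi*(1 - pi)).
print(n * pi, n * pi * (1 - pi), (n * pi * (1 - pi)) ** 0.5)  # 3.0, 2.1, about 1.449
```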
Geometric Probability Distribution – is interested in the number of trials needed to get a
success and has the criteria:
 Each trial can result in one of only 2 mutually exclusive outcomes (labeled S –
success or F- failure)
 Outcomes of different trials are independent
 The probability that a trial results in S is the same for each trial
(only difference is there is not a fixed number of trials!!!!)
Geometric Random Variable x = number of trials to first success when an experiment is
performed
x 1
Then the geometric distribution is:
P(x) = 1    
Can be done in the calculator using geometpdf(prob. of success, # of first successful
trial)
or geometcdf(prob. of success, maximum # of trials)
mean value of a geometric random variable: μ_x = 1/π
variance of a geometric random variable: σ²_x = (1 − π)/π²
standard deviation of a geometric random variable: σ_x = √((1 − π)/π²)
7.6 Normal Distributions
Normal Distributions (also called a normal curve) are important because:
 They provide a reasonable approximation to the distribution of many variables by
using a function that matches intervals of possible outcomes of a random phenomenon
with their associated probabilities. Since there are an infinite number of possible
outcomes, a definite probability cannot be assigned to any single outcome.
 They play an important role in inferential statistics.
There are many normal distributions and they are:
 bell-shaped, symmetrical, and unimodal
 the area under the curve is still 1
 The curve continues infinitely in both directions and is asymptotic to the x-axis as
it approaches ±∞
 they are defined by their mean (μ) and their standard deviation (σ).
 The smaller the standard deviation, the taller and thinner the curve
 The larger the standard deviation, the wider and flatter the curve
 The “points of inflection” (where the curve changes from concave down to concave up)
are at μ ± 1σ
The Standard Normal Curve (or z curve) is the normal distribution with μ = 0 and σ = 1
and it is consistent with the Empirical Rule; where 68% fall within ±1σ of the mean, 95%
fall within ±2σ, and 99.7% fall within ±3σ.
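These Empirical Rule percentages can be verified numerically; a sketch using scipy.stats.norm:

```python
from scipy import stats

z = stats.norm(loc=0, scale=1)  # the standard normal (z) curve

# Area under the z curve within 1, 2, and 3 standard deviations of the mean.
for k in (1, 2, 3):
    print(k, z.cdf(k) - z.cdf(-k))  # about 0.6827, 0.9545, 0.9973
```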
Table A: Standard normal probabilities gives the probabilities for any z* between -3.89
and 3.89.
Area under z curve to the left of z* = P(z < z*) = P(z ≤ z*)
Area under z curve to the right of z* = 1 – (area under z curve to the left of z*)
Area under z curve between a and b = P( z < b) – P( z < a)
We can also use the Table A: Standard normal probabilities in reverse to find values that
give a certain percentage. This can also be done using the invNorm function of the
calculator where the arguments are invNorm(area, mean, standard deviation). The calculator
uses a default mean of zero and default standard deviation of 1 if none is given.
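A sketch of the reverse lookup with norm.ppf, which plays the same role as invNorm (the 90th percentile and the mean/standard deviation of 100 and 15 are just example values):

```python
from scipy import stats

# invNorm(0.90): the z* value with 90% of the area to its left.
print(stats.norm.ppf(0.90))  # about 1.2816

# With a mean and standard deviation supplied (hypothetical values 100 and 15).
print(stats.norm.ppf(0.90, loc=100, scale=15))  # about 119.22
```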
If locations other than whole standard deviations are desired, then we compute a z-score
and use Table A: Standard normal probabilities.
The z-score is the number of standard deviations a value is from the mean. It is
computed by:
z = (x − μ)/σ
(areas under the normal curve can also be found using the normalcdf function of the
calculator, using -1000 or 1000 in place of −∞ or ∞ as the lower or upper limit)
To convert a z-score back to an x value use: x = μ + zσ
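A short sketch of the z-score conversion in both directions (the values 130, 100, and 15 are hypothetical):

```python
from scipy import stats

mu, sigma = 100, 15  # hypothetical mean and standard deviation
x = 130

z = (x - mu) / sigma                 # z-score: number of SDs x lies from the mean -> 2.0
print(stats.norm.cdf(z))             # area to the left of z on the z curve, about 0.9772
print(stats.norm.cdf(x, mu, sigma))  # the same area without standardizing first

x_back = mu + z * sigma              # convert the z-score back to an x value -> 130.0
print(x_back)
```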