University of California, Davis
Department of Statistics
Summer Session II
Statistics 13
August 8, 2012
Date of latest update: August 8
Lecture 4: Random Variables and Probability Distributions
Definition 4.1 A random variable is a variable that assumes numerical values associated with the random outcomes of an experiment, where one (and only one) numerical
value is assigned to each sample point.
Example 1 For the coin tossing example, the sample space is {H, T }. We can represent
the outcome by a random variable X such that
X = 1 if the outcome is H, and X = 0 if the outcome is T.
Example 2 A die is rolled and the up face is recorded. The sample space is {1, 2, 3, 4, 5, 6}.
Here, we can represent the outcome by a random variable X where X is the number shown
on the up face. Notice that X is just equal to the realized sample point (i.e., 1, 2, 3, 4, 5, 6).
4.1 Two Types of Random Variables
Definition 4.2 Random variables that can assume a countable number of values are
called discrete.
Example 3 The random variables in Examples 1 and 2.
Example 4 The number of earthquakes next year: it can be 0, 1, 2, . . . .
Definition 4.3 Random variables that can assume values corresponding to any of the
points contained in an interval are called continuous.
Example 5 The length of time X (in minutes) a student takes to complete a one-hour
exam: 0 ≤ X ≤ 60.
4.2 Probability Distributions for Discrete Random Variables
Definition 4.4 The probability distribution of a discrete random variable is a graph,
table, or formula that specifies the probability associated with each possible value that
the random variable can assume.
Requirements for the Probability Distribution of a Discrete Random Variable X
1. p(x) ≥ 0, for all values of x.
2. Σₓ p(x) = 1, where the summation of p(x) is over all possible values of x.
Example 6 Suppose you pay $5 to play a game. If you win, you get $6 back, otherwise
you get nothing. Let X be the winnings (in dollars) from the game. The probability
distribution of X is given by

 x      6    0
p(x)   1/3  2/3
Definition 4.5 The mean, or expected value, of a discrete random variable X is
µ = E(X) = Σₓ x p(x).
Definition 4.6 The variance of a random variable X is
σ² = E[(X − µ)²] = Σₓ (x − µ)² p(x) = Σₓ x² p(x) − µ².
Example 7 Refer to Example 6; the expected value of X is given by
µ = E(X) = 6 × (1/3) + 0 × (2/3) = 2,
and the variance is given by the following calculation:
E(X²) = 6² × (1/3) + 0² × (2/3) = 12.
Then the variance is given by
σ² = Var(X) = E(X²) − µ² = 12 − 2² = 8.
Example 8 Referring to Examples 6 and 7, the expected profit of the game is
E(X) − 5 = 2 − 5 = −3.
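As a quick numerical check of Examples 7 and 8, here is a minimal Python sketch (using exact fractions) that computes the mean, variance, and expected profit directly from the table in Example 6:

    from fractions import Fraction

    # Distribution of X from Example 6: P(X = 6) = 1/3, P(X = 0) = 2/3
    dist = {6: Fraction(1, 3), 0: Fraction(2, 3)}

    mean = sum(x * p for x, p in dist.items())               # E(X) = sum of x p(x)
    var = sum(x**2 * p for x, p in dist.items()) - mean**2   # E(X^2) - mu^2

    print(mean)      # 2
    print(var)       # 8
    print(mean - 5)  # -3, the expected profit after paying $5 to play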
Definition 4.7 The standard deviation of a discrete random variable is equal to the
square root of the variance: σ = √σ².
Chebyshev’s Rule and Empirical Rule for a Discrete Random Variable
Let X be a discrete random variable with probability distribution p(x), mean µ, and
standard deviation σ. Then, depending on the shape of p(x), the following probability
statements can be made:
                           Chebyshev’s Rule            Empirical Rule
                           Applies to any              Applies to probability distributions
                           probability distribution    that are mound shaped and symmetric
P(µ − σ < X < µ + σ)           ≥ 0                         ≈ 0.68
P(µ − 2σ < X < µ + 2σ)         ≥ 3/4                       ≈ 0.95
P(µ − 3σ < X < µ + 3σ)         ≥ 8/9                       ≈ 1.00
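As an illustration, the following Python sketch checks the three intervals for the game distribution of Example 6; since that distribution is not mound shaped, only Chebyshev's bounds, not the Empirical Rule, are guaranteed to hold:

    import math

    # Distribution from Example 6: P(X = 6) = 1/3, P(X = 0) = 2/3
    dist = {6: 1 / 3, 0: 2 / 3}
    mu = sum(x * p for x, p in dist.items())                           # 2
    sigma = math.sqrt(sum(x**2 * p for x, p in dist.items()) - mu**2)  # sqrt(8)

    for k in (1, 2, 3):
        lo, hi = mu - k * sigma, mu + k * sigma
        prob = sum(p for x, p in dist.items() if lo < x < hi)
        print(f"P({lo:.2f} < X < {hi:.2f}) = {prob:.4f}")
    # prints 0.6667, 1.0000, 1.0000: each respects the bounds >= 0, 3/4, 8/9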
4.3a The Binomial Distribution
Many experiments result in dichotomous responses (i.e., responses for which there exist
two possible alternatives, such as Yes-No, Pass-Fail, Defective-Nondefective, or Male-Female).
A simple example of such an experiment is the coin-toss experiment. A coin
is tossed a number of times, say, 10. Each toss results in one of two outcomes, Head or
Tail, and the probability of observing each of these two outcomes remains the same for
each of the 10 tosses. Ultimately, we are interested in the probability distribution of X,
the number of heads observed. Many other experiments are equivalent to tossing a coin
(either balanced or unbalanced) a fixed number n of times and observing the number X of
times that one of the two possible outcomes occurs. Random variables that possess these
characteristics are called binomial random variables.
Characteristics of a Binomial Random Variable
1. The experiment consists of n identical trials.
2. There are only two possible outcomes on each trial. We will denote one outcome by
S (for Success) and the other by F (for Failure).
3. The probability of S remains the same from trial to trial. This probability is denoted
by p, and the probability of F is denoted by q = 1 − p.
4. The trials are independent.
5. The binomial random variable X is the number of S’s in n trials.
The Binomial Probability Distribution
p(x) = (n choose x) p^x q^(n−x),   x = 0, 1, 2, . . . , n
where
p = probability of a success on a single trial
q = 1 − p
n = number of trials
x = number of successes in n trials
(n choose x) = n!/(x!(n − x)!)
We write X ∼ B(n, p).
Mean, Variance, and Standard Deviation for a Binomial Random Variable
Mean: µ = np
Variance: σ² = npq
Standard deviation: σ = √(npq)
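These formulas translate directly into code. A minimal Python sketch (the 10-toss balanced-coin setting is only an illustration):

    from math import comb, sqrt

    def binom_pmf(x, n, p):
        """P(X = x) for X ~ B(n, p): (n choose x) p^x q^(n - x)."""
        q = 1 - p
        return comb(n, x) * p**x * q**(n - x)

    n, p = 10, 0.5                # 10 tosses of a balanced coin
    print(binom_pmf(5, n, p))     # P(exactly 5 heads) = 0.2461
    print(n * p)                  # mean np = 5.0
    print(sqrt(n * p * (1 - p)))  # standard deviation sqrt(npq) = 1.5811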
4.3b The Poisson Distribution
Consider a type of event occurring randomly through time, say the number of earthquakes.
Let X be the number occurring in a unit interval of time. Then under the following
conditions, X can be shown mathematically to have a Poisson(λ) distribution.
Characteristics of a Poisson Random Variable
1. The events occur at a constant average rate of λ per unit time.
2. Occurrences are independent of one another.
3. Two or more occurrences cannot happen at exactly the same time.
The Poisson Probability Distribution
A random variable X taking values 0, 1, 2, . . . has a Poisson distribution if
P(X = x) = e^(−λ) λ^x / x!,   for x = 0, 1, 2, . . .
We write X ∼ Poisson(λ). [Note that P(X = 0) = e^(−λ), since λ⁰ = 1 and 0! = 1.]
For example, if λ = 2, we have
P(0) = e^(−2) = 0.135335,   P(3) = e^(−2) × 2³/3! = 0.180447.
As required for a probability function, it can be shown that the probabilities P (X = x)
all sum to 1.
Learning to use the formula: Using the Poisson probability formula, verify the following: if λ = 1, then P(0) = 0.36788 and P(3) = 0.061313.
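A short Python sketch carries out this verification:

    from math import exp, factorial

    def poisson_pmf(x, lam):
        """P(X = x) for X ~ Poisson(lam): e^(-lam) lam^x / x!."""
        return exp(-lam) * lam**x / factorial(x)

    print(poisson_pmf(0, 1))  # 0.36788
    print(poisson_pmf(3, 1))  # 0.061313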
Example 9 While checking the proofs of some theorems in the first four chapters of a
mathematical statistics textbook, the authors found 1.6 printer’s errors per page on average. We will assume the errors were occurring randomly according to a Poisson process.
Let X be the number of errors on a single page. Then X ∼ Poisson(λ = 1.6). We will
use this information to calculate a number of probabilities.
a. The probability of finding no errors on a particular page is
P(X = 0) = e^(−1.6) = 0.2019
b. The probability of finding 2 errors on any particular page is
P(X = 2) = e^(−1.6)(1.6²)/2! = 0.2584
c. The probability of no more than 2 errors on a page is
P(X ≤ 2) = P(0) + P(1) + P(2)
         = e^(−1.6)(1.6⁰)/0! + e^(−1.6)(1.6¹)/1! + e^(−1.6)(1.6²)/2!
         = 0.2019 + 0.3230 + 0.2584 = 0.7833.
d. The probability of more than 4 errors on a page is
P (X > 4) = P (5) + P (6) + P (7) + P (8) + . . .
so if we tried to calculate it in a straightforward fashion, there would be an infinite
number of terms to add. However, if we use P(A) = 1 − P(A^c), we get
P (X > 4) = 1 − P (X ≤ 4) = 1 − [P (0) + P (1) + P (2) + P (3) + P (4)]
= 1 − (0.2019 + 0.3230 + 0.2584 + 0.1378 + 0.0551)
= 1 − 0.9762 = 0.0238.
e. Let us now calculate the probability of getting a total of 5 errors on 3 consecutive
pages. Let Y be the number of errors in 3 pages. The only thing that has changed
is that we are now looking for errors in bigger units of the manuscript so that the
average number of events per unit we should use changes from 1.6 errors per page
to 3 × 1.6 = 4.8 errors per 3 pages. Thus,
Y ∼ Poisson(λ = 4.8)
and
P(Y = 5) = e^(−4.8)(4.8⁵)/5! = 0.1747.
f. What is the probability that in a block of 10 pages, exactly 3 pages have no errors?
There is quite a big change now. We are no longer counting events (errors) in
a single block of material so we have left the territory of the Poisson distribution.
What we have now is akin to making 10 tosses of a coin. It lands “heads” if the page
contains no errors. Otherwise it lands “tails”. The probability of landing “heads”
(having no errors on the page) is given by (a), namely P (X = 0) = 0.2019. Let W
be the number of pages with no errors. Then
W ∼ B(n = 10, p = 0.2019) and
P(W = 3) = (10 choose 3)(0.2019)³(0.7981)⁷ = 0.2037.
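Part (f) chains a Poisson probability into a binomial one; a minimal Python sketch of that calculation:

    from math import comb, exp

    p0 = exp(-1.6)                               # part (a): P(X = 0) = 0.2019
    n, w = 10, 3
    pW = comb(n, w) * p0**w * (1 - p0)**(n - w)  # part (f): P(W = 3)
    print(p0, pW)                                # 0.2019, 0.2037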
4.4 Probability Distributions for Continuous Random Variables
Just as we describe the probability distribution of a discrete random variable by specifying
the probability that the random variable takes on each possible value, we describe the
probability distribution of a continuous random variable by giving its density function.
If X is a continuous random variable, then a density function is a function f (x) with
domain (a, b) which satisfies the following three properties:
1. f(x) ≥ 0 for all x in (a, b),
2. ∫_a^b f(x) dx = 1, and
3. for any a ≤ c < d ≤ b, P(c < X < d) = ∫_c^d f(x) dx.
Notes:
1. a and b could possibly be −∞ and ∞.
2. f(x) is NOT a probability (i.e., f(1) is not the probability that X = 1); it is the
probability density.
3. P(c < X < d) is the area under the curve f(x) between c and d.
4. For a continuous random variable, the probability of any single value is zero, i.e.,
P(X = c) = 0. This also means that P(c ≤ X ≤ d) = P(c < X < d) = P(c ≤ X <
d) = P(c < X ≤ d).
5. The domain of f(x) can include both endpoints, [a, b], exclude both, (a, b), or include
just one of them.
Characteristics of the density function
1. The density function is always nonnegative, i.e., the graph of the density function
always lies on or above the x-axis.
2. The total area under the density curve is 1.
We think of a continuous random variable with density function f as being a random
variable that can be obtained by picking a point at random from under the density curve
and then reading off the x-coordinate of that point. Because the total area under the
density curve is 1, the probability that the random variable takes on a value between a
and b is the area under the curve between a and b. More precisely, if X is a random
variable with density function f and a < b, then
P(a ≤ X ≤ b) = ∫_a^b f(x) dx
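As a numerical illustration, suppose (purely as an assumption for this sketch) that the exam time X of Example 5 is uniform on (0, 60), so f(x) = 1/60 on that interval. A crude Riemann sum then recovers both the total area of 1 and an interval probability:

    def f(x):
        """Assumed uniform density on (0, 60)."""
        return 1 / 60 if 0 < x < 60 else 0.0

    def integrate(f, lo, hi, n=100_000):
        """Midpoint Riemann-sum approximation of the integral of f from lo to hi."""
        h = (hi - lo) / n
        return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

    print(integrate(f, 0, 60))   # total area under the curve: ~1.0
    print(integrate(f, 15, 30))  # P(15 < X < 30) = 15/60 = 0.25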
4.5 The Normal Distribution
One of the most commonly observed continuous random variables has a bell-shaped
probability distribution (or bell curve). It is known as a normal random variable
and its probability distribution is called a normal distribution.
Probability Distribution for a Normal Random Variable X
Probability density function:
f(x) = (1/(σ√(2π))) e^(−(1/2)[(x−µ)/σ]²)
where
µ = mean of the normal random variable X
σ = standard deviation
π = 3.1416 . . .
e = 2.71828 . . .
P(X < a) is obtained from a table of normal probabilities.
Definition 4.8 The standard normal distribution is a normal distribution with
µ = 0 and σ = 1. A random variable with a standard normal distribution, denoted by
the symbol Z, is called a standard normal random variable.
Example 10 The probability that a standard normal random variable exceeds 1.96 in
absolute value is
P(|Z| > 1.96) = P(Z < −1.96 or Z > 1.96) = 0.05.
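This value can be checked in Python with scipy (assuming scipy is available; a standard normal table gives the same answer):

    from scipy.stats import norm

    # P(|Z| > 1.96) = P(Z < -1.96) + P(Z > 1.96), using the standard normal cdf
    p = norm.cdf(-1.96) + (1 - norm.cdf(1.96))
    print(p)  # ~0.05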
Property of Normal Distributions
If X is a normal random variable with mean µ and standard deviation σ, then the random
variable Z defined by the formula
Z = (X − µ)/σ
has a standard normal distribution. The value z describes the number of standard deviations between x and µ.
Steps for Finding a Probability Corresponding to a Normal Random Variable
1. Sketch the normal distribution and indicate the mean of the random variable X.
Then shade the area corresponding to the probability you want to find.
2. Convert the boundaries of the shaded area from x values to standard normal random
variable z values by using the formula
z = (x − µ)/σ
Show the z values under the corresponding x values on your sketch.
3. Use the standard normal table to find the areas corresponding to the z values. If
necessary, use the symmetry of the normal distribution to find areas corresponding
to negative z values and the fact that the total area on each side of the mean equals
0.5 to convert the areas from the table to the probabilities of the event you have
selected.
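As a worked illustration of these steps with made-up numbers (µ = 50 and σ = 10 are hypothetical, not from the text), the sketch below finds P(45 < X < 60):

    from scipy.stats import norm

    mu, sigma = 50, 10        # hypothetical normal random variable X
    z_lo = (45 - mu) / sigma  # step 2: convert boundaries to z values, z = -0.5
    z_hi = (60 - mu) / sigma  # z = 1.0
    # step 3: area between the z values from the standard normal cdf
    print(norm.cdf(z_hi) - norm.cdf(z_lo))  # ~0.5328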
4.6 Descriptive Methods for Assessing Normality
In the next session, we learn how to make inferences about the population based on the information contained in the sample. Several of these techniques are based on the assumption
that the population is approximately normally distributed. Consequently, it will be important to determine whether the sample data come from a normal population before we
can apply these techniques properly. A number of descriptive methods can be used to
check for normality.
Determining whether the Data Are from an Approximately
Normal Distribution
1. Construct either a histogram or stem-and-leaf display for the data, and note the
shape of the graph. If the data are approximately normal, the shape of the histogram
or stem-and-leaf display will be similar to the normal curve (i.e., the display will be
mound shaped and symmetric about the mean).
2. Compute the intervals x̄±s, x̄±2s, and x̄±3s, and determine the percentage of measurements falling into each. If the data are approximately normal, the percentages
will be approximately equal to 68%, 95%, and 100%, respectively.
3. Find the interquartile range IQR and standard deviation s for the sample, and then
calculate the ratio IQR/s. If the data are approximately normal, then IQR/s ≈ 1.3.
4. Construct a normal probability plot for the data. If the data are approximately
normal, the points will fall (approximately) on a straight line.
Definition 4.9 A normal probability plot for a data set is a scatterplot with the
ranked data values on one axis and their corresponding expected z-scores from a standard
normal distribution on the other axis.
Example 11 Below is a normal probability plot for the NBA heights from the 2008-9
season. Do these data appear to follow a normal distribution?
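The plot itself is not reproduced here, but a plot of this kind can be constructed with scipy and matplotlib; the sketch below uses simulated (hypothetical) heights rather than the actual NBA data:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import probplot

    # Hypothetical sample standing in for the NBA heights (in inches)
    rng = np.random.default_rng(0)
    sample = rng.normal(loc=79, scale=3.5, size=200)

    # Ranked data values against their expected standard normal z-scores;
    # approximately normal data fall close to a straight line.
    probplot(sample, plot=plt)
    plt.show()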
4.7 Sampling Distribution
So far, we have assumed that we knew the probability distribution of a random variable, and using this
knowledge, we were able to compute the mean, variance, and probabilities associated with
the random variable. However, in most practical applications, the true mean and standard
deviation are unknown quantities that have to be estimated. Numerical quantities that
describe probability distributions are called parameters. For instance, p, the probability
of a success in a binomial experiment, and µ and σ, the mean and standard deviation,
respectively, of a normal distribution, are examples of parameters.
Definition 4.10 A parameter is a numerical descriptive measure of a population.
Because it is based on the observations in the population, its value is almost always
unknown.
We have also discussed the sample mean x̄, sample variance s², sample standard
deviation s, and the like, which are numerical descriptive measures calculated from the
sample. We will often use the information contained in these sample statistics to make
inferences about the parameters of a population.
Definition 4.11 A sample statistic is a numerical descriptive measure of a sample.
It is calculated from the observations in the sample.
Note that the term statistic refers to a sample quantity and the term parameter refers
to a population quantity.
Definition 4.12 The sampling distribution of a sample statistic calculated from a
sample of n measurements is the probability distribution of the statistic.
4.8 The Central Limit Theorem
We are always interested in making an inference about the parameters of some population. For example, the mean µ and variance σ² completely specify a normal
distribution. From lecture note 2, we know the sample mean X̄ and sample variance are,
in general, good estimators of µ and σ², respectively. We now develop pertinent information about the sampling distributions of these useful statistics.
Properties of the Sampling Distribution of X̄
1. The mean of the sampling distribution equals the mean of the sampled population.
That is, µX̄ = E(X̄) = µ. When this property holds, we say X̄ is an unbiased
estimator of µ.
2. The standard deviation of the sampling distribution equals
(standard deviation of sampled population)/(square root of sample size).
That is, σX̄ = σ/√n.
The standard deviation σX̄ is often referred to as the standard error of the mean.
Theorem 4.1 If a random sample of n observations is selected from a population with a
normal distribution, the sampling distribution of X̄ will be a normal distribution.
Theorem 4.2: Central Limit Theorem Consider a random sample of n observations
selected from a population (any population) with mean µ and standard deviation σ.
Then when n is sufficiently large, the sampling distribution of X̄ will be approximately a
normal distribution with mean µX̄ = µ and standard deviation σX̄ = σ/√n. The larger
the sample size, the better will be the normal approximation to the sampling distribution
of X̄.
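A small simulation illustrates the theorem; the exponential population below is an arbitrary non-normal choice made for this sketch:

    import numpy as np

    # Draw 10,000 samples of size n = 50 from an exponential population
    # (mean = standard deviation = 1), then look at the sample means.
    rng = np.random.default_rng(1)
    n = 50
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

    print(means.mean())  # ~1.0, matching mu_xbar = mu
    print(means.std())   # ~0.14, matching sigma/sqrt(n) = 1/sqrt(50) = 0.1414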