STP 420 SUMMER 2002
STP 420
INTRODUCTION TO APPLIED STATISTICS
NOTES
PART 2 – PROBABILITY AND INFERENCE
CHAPTER 5
FROM PROBABILITY TO INFERENCE
Introduction
The distribution of a statistic
A statistic from a random sample or randomized experiment is a random variable. The
probability distribution of the statistic is its sampling distribution.
Population Distribution
The population distribution of a variable is the distribution of its values for all members
of the population. The population distribution is also the probability distribution of the
variable when we choose one individual from the population at random.
5.1 Sampling Distributions for Counts and Proportions
The random variable X is a count of the occurrences of some outcome in a fixed number
of observations n.
Sample proportion: $\hat{p} = X/n$
The binomial distributions for sample counts
The binomial setting
1. There are a fixed number n of observations.
2. The n observations are all independent.
3. Each observation falls into one of just two categories, which for convenience we call “success” and “failure”.
4. The probability of a success, call it p, is the same for each observation.
Binomial distributions
The distribution of the count X of successes in the binomial setting is called the binomial
distribution with parameters n and p. The parameter n is the number of observations, and
p is the probability of a success on any one observation. The possible values of X are the
whole numbers from 0 to n. X is B(n, p).
Binomial distributions in statistical sampling
Sampling distribution of a count
When the population is much larger than the sample, the count X of successes in an SRS
of size n has approximately the B(n, p) distribution if the population proportion of
successes is p. (Rule of thumb: the population should be at least 10 times as large as the sample.)
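The approximation above can be checked numerically. The sketch below compares the exact distribution of the count in an SRS (sampling without replacement, so a hypergeometric count) with B(n, p). The population size 10,000, the sample size 100, and p = 0.3 are illustrative assumptions, not values from these notes, and the function names are made up for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) when X is B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def srs_count_pmf(k, n, successes, population):
    """Exact P(X = k) for an SRS of size n drawn without replacement
    from a population that contains `successes` successes."""
    return comb(successes, k) * comb(population - successes, n - k) / comb(population, n)

# Illustrative values (assumptions): the population is 100 times the sample size.
population, successes, n = 10_000, 3_000, 100
p = successes / population

# Largest disagreement between the exact and the binomial probabilities over all k.
max_gap = max(
    abs(srs_count_pmf(k, n, successes, population) - binomial_pmf(k, n, p))
    for k in range(n + 1)
)
print(f"largest difference in P(X = k): {max_gap:.5f}")
```

Shrinking `population` toward `n` in this sketch makes the gap grow, which is why the rule of thumb asks for a population much larger than the sample.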
Finding binomial probabilities: tables
Table C gives the probabilities P(X = k) for each possible value of k (k ≥ 0) in n (n > 1) trials.
Consider the experiment of rolling two dice and recording the sum of the two faces.
X is the discrete random variable with values 2, 3, …, 12.
$P(X \le 3) = P(X = 3) + P(X = 2)$
$P(X < 4) = P(X = 3) + P(X = 2)$
$P(X \ge 7) = P(X = 7) + P(X = 8) + \dots + P(X = 12)$
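The three probabilities above can be computed by listing the 36 outcomes. This is a small sketch that assumes both dice are fair (the notes do not say so explicitly); the helper name `prob` is made up for the example.

```python
from fractions import Fraction
from itertools import product

# Distribution of the sum X of two fair dice: each of the 36 outcomes has probability 1/36.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

def prob(event):
    """Add P(X = x) over every value x for which event(x) is true."""
    return sum((p for x, p in pmf.items() if event(x)), Fraction(0))

p_at_most_3 = prob(lambda x: x <= 3)    # P(X = 2) + P(X = 3)
p_less_than_4 = prob(lambda x: x < 4)   # the same event, because X is a whole number
p_at_least_7 = prob(lambda x: x >= 7)   # P(X = 7) + ... + P(X = 12)
print(p_at_most_3, p_less_than_4, p_at_least_7)  # 1/12 1/12 7/12
```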
Binomial mean and standard deviation
If a count X has the binomial distribution B(n, p), then
$\mu_X = np$
$\sigma_X = \sqrt{np(1 - p)}$
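As a check on the binomial mean and standard deviation formulas, the sketch below computes both directly from the definition of the mean and variance of a discrete random variable, using the binomial probability formula given under "Binomial formulas" in these notes. The values n = 20 and p = 0.3 are illustrative assumptions.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) when X is B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.3  # illustrative values (assumptions)

# Mean and variance from the definition: sums over all possible counts 0..n.
mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))

print(f"mean from definition {mean:.4f}, np = {n * p:.4f}")
print(f"sd from definition   {sqrt(variance):.4f}, sqrt(np(1-p)) = {sqrt(n * p * (1 - p)):.4f}")
```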
Sample proportions
$\hat{p} = \dfrac{\text{count of successes in sample}}{\text{size of sample}} = \dfrac{X}{n}$
Mean and Standard Deviation of a sample proportion
Let $\hat{p}$ be the sample proportion of successes in an SRS of size n drawn from a large
population having population proportion p of successes. The mean and standard deviation
of $\hat{p}$ are
$\mu_{\hat{p}} = p$
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}}$
Normal approximation for counts and proportions
Draw an SRS of size n from a large population having population proportion p of
successes. Let X be the count of successes in the sample and $\hat{p} = X/n$ the sample
proportion of successes. When n is large, the sampling distributions of these statistics are
approximately normal:
X is approximately $N\left(np, \sqrt{np(1 - p)}\right)$
$\hat{p}$ is approximately $N\left(p, \sqrt{\dfrac{p(1 - p)}{n}}\right)$
We want n and p such that $np \ge 10$ and $n(1 - p) \ge 10$.
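A minimal sketch of how the normal approximation for a sample proportion might be used, together with the rule of thumb above. The values n = 1500, p = 0.6, and the cutoff 0.58 are illustrative assumptions; `normal_cdf` and `rule_of_thumb_ok` are helper names made up for this example.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(Y <= x) for Y distributed N(mean, sd)."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

def rule_of_thumb_ok(n, p):
    """Condition from the notes: np >= 10 and n(1 - p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10

n, p = 1500, 0.6  # illustrative values (assumptions)
sd_phat = sqrt(p * (1 - p) / n)  # standard deviation of the sample proportion

# Approximate P(p_hat <= 0.58), treating p_hat as N(p, sqrt(p(1-p)/n)).
approx = normal_cdf(0.58, p, sd_phat)
print(rule_of_thumb_ok(n, p), round(sd_phat, 4), round(approx, 4))
```

For a small sample such as n = 20 with p = 0.1, `rule_of_thumb_ok` returns False, and the normal approximation should not be trusted.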
The continuity correction
Figure 5.5 shows how a normal curve approximates the binomial distribution. A continuity
correction improves the accuracy of the approximation.
For a discrete distribution (binomial), P(X = 6) can be computed directly, but the histogram
bar for X = 6 actually begins at 5.5 and stops at 6.5.
If a continuous distribution (normal) is used to approximate, P(X = 6) = 0, so we
find the probability of an interval instead.
E.g. $P(X \le 9) = P(X \le 9.5) = P\left(\dfrac{X - 10}{3} \le \dfrac{9.5 - 10}{3}\right) \approx P(Z \le -0.17) = 0.4325$
Continuity correction – interval 0.5 below a whole number to 0.5 above the whole
number.
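The effect of the continuity correction can be seen by comparing both normal approximations with the exact binomial probability. The worked example in the notes uses mean 10 and standard deviation 3 but does not state n and p; the sketch below assumes B(100, 0.1), chosen only because it has np = 10 and sqrt(np(1 - p)) = 3.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard normal P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Assumption: B(100, 0.1), picked because its mean is 10 and its standard deviation is 3.
n, p = 100, 0.1
mean, sd = n * p, sqrt(n * p * (1 - p))

# Exact P(X <= 9): add the binomial probabilities for k = 0, 1, ..., 9.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(10))

without_correction = normal_cdf((9 - mean) / sd)
with_correction = normal_cdf((9.5 - mean) / sd)

print(f"exact binomial:             {exact:.4f}")
print(f"normal, no correction:      {without_correction:.4f}")
print(f"normal, with correction:    {with_correction:.4f}")
```

Under this assumption the corrected value lands much closer to the exact probability. It comes out near 0.434 rather than 0.4325 because the notes round z to -0.17 before using the table.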
Binomial formulas
Binomial coefficient: the number of ways of arranging k successes among n observations,
$\dbinom{n}{k} = \dfrac{n!}{k!\,(n - k)!}$ for k = 0, 1, 2, …, n,
where $n! = n \times (n - 1) \times (n - 2) \times \dots \times 3 \times 2 \times 1$
Binomial Probability
If X has the binomial distribution B(n, p) with n observations and probability p of
success on each observation, the possible values of X are 0, 1, 2, …, n. If k is any one
of these values, the binomial probability is
$P(X = k) = \dbinom{n}{k} p^k (1 - p)^{n - k}$
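To see why the binomial coefficient appears in the formula, the sketch below lists every success/failure sequence for a small case and counts those with exactly k successes. The values n = 5, k = 2, and p = 0.3 are illustrative assumptions.

```python
from itertools import product
from math import comb, factorial

n, k, p = 5, 2, 0.3  # illustrative values (assumptions)

# Every possible sequence of n observations (1 = success, 0 = failure)
# that contains exactly k successes.
sequences = [s for s in product((0, 1), repeat=n) if sum(s) == k]

# The binomial coefficient n! / (k! (n - k)!) counts exactly those sequences.
coefficient = factorial(n) // (factorial(k) * factorial(n - k))
assert len(sequences) == coefficient == comb(n, k)

# Each such sequence has probability p^k (1 - p)^(n - k); adding them gives P(X = k).
by_enumeration = sum(p**k * (1 - p) ** (n - k) for _ in sequences)
by_formula = coefficient * p**k * (1 - p) ** (n - k)
print(coefficient, round(by_formula, 4))  # 10 0.3087
```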
5.2 The Sampling Distribution of a Sample Mean
Discrete random variables: counts and proportions.
Continuous random variables: measured data, for which we can find the mean, percentiles, or standard deviation.
Important: averages are less variable than individual observations, and their distribution is closer to normal.
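The mean and standard deviation of $\bar{x}$
The sample mean is $\bar{x} = \dfrac{1}{n}(X_1 + X_2 + \dots + X_n)$.
The mean of $\bar{x}$ is $\mu_{\bar{x}} = \dfrac{1}{n}(\mu_{X_1} + \mu_{X_2} + \dots + \mu_{X_n}) = \dfrac{1}{n}(\mu + \mu + \dots + \mu) = \mu$, which is the same as the
population mean.
For independent observations,
the variance is $\sigma^2_{\bar{x}} = \left(\dfrac{1}{n}\right)^2(\sigma^2_{X_1} + \sigma^2_{X_2} + \dots + \sigma^2_{X_n}) = \left(\dfrac{1}{n}\right)^2(\sigma^2 + \sigma^2 + \dots + \sigma^2) = \dfrac{\sigma^2}{n}$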
Let $\bar{x}$ be the mean of an SRS of size n from a population having mean $\mu$ and standard
deviation $\sigma$.
The mean of $\bar{x}$ is $\mu_{\bar{x}} = \mu$
The standard deviation of $\bar{x}$ is $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
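A short simulation can illustrate that sample means center on the population mean and vary less than individual observations. This is a sketch under stated assumptions: the population (uniform on 0 to 12), the sample size 16, the number of repetitions, and the seed are all illustrative choices, not part of the notes.

```python
import random
from statistics import mean, pstdev

rng = random.Random(420)  # fixed seed so the run is repeatable

# Hypothetical population: uniform on [0, 12], so mu = 6 and sigma = 12 / sqrt(12).
mu, sigma = 6.0, 12 / 12**0.5
n, repetitions = 16, 20_000

# Draw many samples of size n and record the mean of each one.
sample_means = [
    sum(rng.uniform(0, 12) for _ in range(n)) / n
    for _ in range(repetitions)
]

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)
print(f"mean of x-bar: {mean_of_means:.3f}  (mu = {mu})")
print(f"sd of x-bar:   {sd_of_means:.3f}  (sigma / sqrt(n) = {sigma / n**0.5:.3f})")
```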
The sampling distribution of $\bar{x}$
Sampling Distribution of a Sample Mean
If a population has the $N(\mu, \sigma)$ distribution, then the sample mean $\bar{x}$ of n independent
observations has the $N(\mu, \sigma/\sqrt{n})$ distribution.
The sample mean of an SRS from a normal population has a normal distribution.
In general, any linear combination of independent normal random variables is also
normally distributed.
The central limit theorem
Draw an SRS of size n from a population with mean $\mu$ and finite standard deviation $\sigma$.
When n is large, the sampling distribution of the sample mean $\bar{x}$ is approximately
normal:
$\bar{x}$ is approximately $N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
The population does not have to be normally distributed. More general versions of the
theorem also allow the quantities to be dependent, as long as they are not too highly correlated.
Exponential distribution: strongly right skewed
- applicable to problems involving the time required to serve a customer or to
repair a machine.
- as the sample size n increases, the distribution of the sample mean $\bar{x}$ from an
exponential population looks more and more like a normal curve.
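The central limit theorem can be watched at work on an exponential population. The sketch below simulates sample means for several sample sizes and reports their skewness, which is 0 for a symmetric distribution such as the normal. The population mean of 1, the sample sizes, the number of repetitions, and the seed are illustrative assumptions.

```python
import random

rng = random.Random(2002)  # fixed seed so the run is repeatable

def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric distribution)."""
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def sample_means(n, repetitions=20_000):
    """Means of `repetitions` samples of size n from an exponential population with mean 1."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(repetitions)]

# Skewness of the simulated sampling distribution of x-bar for each sample size.
skew_by_n = {n: skewness(sample_means(n)) for n in (1, 2, 10, 25)}
for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of x-bar is about {skew:.2f}")
```

The skewness should shrink toward 0 as n grows, which is the sense in which the distribution of the sample mean, not the exponential population itself, becomes more normal.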
Beyond the basics – Weibull distributions
- Appropriate distribution for experiments that measure how long something lasts
before it fails.
- Engineers use it to study the reliability of products.
- e.g. Infant mortality (most products fail almost immediately or very early)
- e.g. Early failure (somewhat right skewed)
- e.g. Old-age wear-out (somewhat left skewed)
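The three cases above correspond to different values of the Weibull shape parameter. The sketch below uses the standard formulas for the Weibull mean and median and compares them as a rough indicator of skew direction (mean above the median suggests right skew, mean below it suggests left skew). The shape values 0.5, 1.5, and 5 are illustrative assumptions chosen for this example, not values from the notes.

```python
from math import gamma, log

def weibull_mean(shape, scale=1.0):
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * gamma(1 + 1 / shape)

def weibull_median(shape, scale=1.0):
    """Median of a Weibull distribution: scale * (ln 2)^(1/shape)."""
    return scale * log(2) ** (1 / shape)

# Shape values are illustrative assumptions, one per case listed in the notes.
cases = {"infant mortality": 0.5, "early failure": 1.5, "old-age wear-out": 5.0}

for label, shape in cases.items():
    m, med = weibull_mean(shape), weibull_median(shape)
    direction = "right" if m > med else "left"
    print(f"{label:17s} shape={shape}: mean={m:.3f}, median={med:.3f}, skewed {direction}")
```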