Formulas for Ch 5
For categorical data, we can use the count of successes, X, with
$\mu_X = n\pi$ and $\sigma_X = \sqrt{n\pi(1-\pi)}$ (this is needed for the first homework problem),
or we can use the sample proportion, p, where p = X/n. Note: we prefer the proportion because we can
control the size of the standard deviation with the sample size, n.
The mean of p is the mean of the population, $\pi$, the true proportion of the population, and the standard
deviation of p is the square root of the product of the true proportion of successes times the true proportion
of failures, divided by the sample size, n:
$$\mu_p = \pi, \quad \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
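The formulas for the count and for the proportion can be checked numerically. The following is a minimal sketch; the function names and the values n = 100 and pi = 0.10 are assumptions chosen for illustration, not part of the handout.

```python
import math

def count_mean_sd(n, pi):
    """Mean and standard deviation of the count of successes, X."""
    return n * pi, math.sqrt(n * pi * (1 - pi))

def proportion_mean_sd(n, pi):
    """Mean and standard deviation of the sample proportion, p = X / n."""
    return pi, math.sqrt(pi * (1 - pi) / n)

# Hypothetical values: n = 100 observations, true proportion pi = 0.10
n, pi = 100, 0.10
mu_x, sd_x = count_mean_sd(n, pi)
mu_p, sd_p = proportion_mean_sd(n, pi)

print("count:      mean =", round(mu_x, 4), " sd =", round(sd_x, 4))
print("proportion: mean =", round(mu_p, 4), " sd =", round(sd_p, 4))
```

With these values the count has mean 10 and standard deviation 3, while the proportion has mean 0.10 and standard deviation 0.03. Dividing the count's standard deviation by n gives the proportion's standard deviation, which is why a larger n shrinks the spread of p.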
If $n\pi$ and $n(1-\pi)$ are at least 10, then the sampling distribution of the sample proportion, p, is approximately
$$N\left(\pi, \left(\sqrt{\frac{\pi(1-\pi)}{n}}\right)^2\right)$$
Note: there must be at least 10 successes and 10 failures for p to be approximately normal.
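The success/failure condition is easy to check before using the normal approximation. This is a small sketch; the helper name `normal_approx_ok` and the sample sizes are hypothetical.

```python
def normal_approx_ok(n, pi, minimum=10):
    """Return True when the expected successes (n * pi) and the expected
    failures (n * (1 - pi)) are both at least `minimum`."""
    return n * pi >= minimum and n * (1 - pi) >= minimum

# Hypothetical cases with true proportion pi = 0.10
print(normal_approx_ok(200, 0.10))  # 20 expected successes, 180 expected failures
print(normal_approx_ok(50, 0.10))   # only 5 expected successes
```

The first case meets the condition and the second does not, because 50 observations at a 10% success rate give only 5 expected successes.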
To find probabilities, we need the distribution --- shape, center and spread.
1. shape is normal IF $n\pi$ AND $n(1-\pi)$ are at least 10.
2. center (mean) is $\pi$ if we have a random sample
3. spread (sd) is $\sqrt{\pi(1-\pi)/n}$, again if we have a random sample.
We then convert p to a z-score by subtracting the mean and dividing by the standard deviation.
For example: Say the sample proportion of 100 M&M’s, $p_{100}$, is approximately normal with a mean of
10% and standard deviation of 3%, so $p_{100} \sim N(0.10, 0.03^2)$. How likely are we to see a sample
proportion greater than 20%?
$$P(p > 0.2) = P\left(Z > \frac{0.2 - \mu_p}{\sigma_p}\right) = P\left(Z > \frac{0.20 - 0.10}{0.03}\right) = P(Z > 3.33) = P(Z < -3.33) = 0.0004$$
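The M&M calculation can be reproduced without a z-table. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Values taken from the example: mean 10%, standard deviation 3%
mean_p = 0.10
sd_p = 0.03
p100 = NormalDist(mu=mean_p, sigma=sd_p)

# z-score for a sample proportion of 20%
z = (0.20 - mean_p) / sd_p

# Upper-tail probability computed directly on the distribution of p
upper_tail = 1 - p100.cdf(0.20)

# Same probability from the standard normal, using P(Z > z) = P(Z < -z)
same_via_z = NormalDist().cdf(-z)

print("z =", round(z, 2))
print("P(p > 0.20) =", round(upper_tail, 4))
```

Both routes give the same tail probability, roughly 0.0004, so a sample proportion above 20% would be very unusual under these assumptions.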
For numeric data, we use the sample mean, $\bar{X} = \sum x_i / n$. The mean of the sample mean is the mean of the
population (center) and the standard deviation of the sample mean is the standard deviation of the
population divided by the square root of the sample size, n (spread).
$$\mu_{\bar{X}} = \mu_X, \quad \sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$$
If the population is normal, then the sampling distribution of the sample mean will be normal (shape). If the
population is not normal, but the sample size is large enough (at least 30), the sampling distribution of the
sample mean will be approximately normal.
$\bar{X}$ is approximately $N\left(\mu_X, \left(\frac{\sigma_X}{\sqrt{n}}\right)^2\right)$ and
$\frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = z$, a standard normal, $N(0, 1^2)$
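The same steps apply to a sample mean: find the center and spread, then convert to a z-score. The sketch below uses `statistics.NormalDist` (Python 3.8+); the population mean of 50, standard deviation of 12, sample size of 36, and cutoff of 53 are hypothetical numbers, not values from the handout.

```python
import math
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Sampling distribution of the sample mean: same center as the
    population, spread equal to sigma divided by the square root of n."""
    return NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Hypothetical population: mean 50, sd 12; sample size n = 36 (at least 30)
xbar = sample_mean_dist(50, 12, 36)

# z-score and upper-tail probability for a sample mean of 53
z = (53 - xbar.mean) / xbar.stdev
prob = 1 - xbar.cdf(53)

print("sd of sample mean =", round(xbar.stdev, 4))
print("z =", round(z, 2))
print("P(sample mean > 53) =", round(prob, 4))
```

Here the standard deviation of the sample mean is 12 / 6 = 2, so a sample mean of 53 sits 1.5 standard deviations above the center.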



The distribution of the difference of two independent, normally distributed random variables, X and Y, is normal with
mean $\mu_{X-Y} = \mu_X - \mu_Y$ and standard deviation $\sigma_{X-Y} = \sqrt{\sigma_X^2 + \sigma_Y^2}$, so
$X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
Often we need to know how likely it is for X to be bigger than Y (or smaller). This is the same as asking
whether the difference, $X - Y$, is positive (or negative).
$$P(X - Y > 0) = P\left(Z > \frac{0 - \mu_{X-Y}}{\sigma_{X-Y}}\right)$$
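A short sketch of the comparison of two normal variables follows. It assumes X and Y are independent, which is what allows the variances to be added; the function name and the example distributions are hypothetical.

```python
import math
from statistics import NormalDist

def prob_x_exceeds_y(mu_x, sd_x, mu_y, sd_y):
    """P(X > Y) for independent normal X and Y, found as P(X - Y > 0)."""
    mu_diff = mu_x - mu_y                       # mean of X - Y
    sd_diff = math.sqrt(sd_x ** 2 + sd_y ** 2)  # variances add, not sds
    z = (0 - mu_diff) / sd_diff                 # z-score of the cutoff 0
    return 1 - NormalDist().cdf(z)              # upper tail, P(Z > z)

# Hypothetical example: X ~ N(70, 3^2), Y ~ N(65, 4^2)
prob = prob_x_exceeds_y(70, 3, 65, 4)
print("P(X > Y) =", round(prob, 4))
```

In this example the difference has mean 5 and standard deviation 5, so the cutoff 0 is one standard deviation below the mean and the probability is P(Z > -1), about 0.84.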