Review of Statistics: Understanding P-Values (Required, Page 784, 5th edition)
Monte Carlo Simulation (Optional)
After we compute the t statistic, the p-value for a two-tailed test is given by
p-value = $P(|T| > |t|) = P(T > |t|) + P(T < -|t|) = 2 \cdot P(T > |t|)$
The last equality is due to the symmetry of the Student's t distribution.
$P(T > |t|)$ can be found using the Stata function ttail(df, |t|).
In Stata, we can run a simulation (called Monte Carlo) to approximate the p-value. We generate
many random numbers that follow the t distribution. Then we compute the frequency (estimated probability)
with which those random numbers are greater than the t statistic. For a two-tailed test, we multiply that
frequency by 2 and get the p-value. The Stata code is below:
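* count how many of 100000 draws from the t distribution (51 degrees of freedom)
* exceed the t statistic 0.5320
sca count = 0
forvalues i = 1/100000 {
    sca randomnumber = rt(51)
    sca count = count + (randomnumber > 0.5320)
}
* the frequency of exceedances estimates the tail probability
dis "pvalue for one-tailed test is " count/100000
dis "pvalue for two-tailed test is " 2*count/100000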
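The simulated p-values can be compared with the exact ones from ttail(). The lines below are a minimal sketch that reuses the degrees of freedom (51) and the t statistic (0.5320) from the simulation code; with 100000 draws the two sets of numbers should be close, but not identical.

```stata
* exact tail probabilities from the t distribution with 51 degrees of freedom
* (df = 51 and t = 0.5320 are taken from the simulation code in these notes)
dis "exact pvalue for one-tailed test is " ttail(51, 0.5320)
dis "exact pvalue for two-tailed test is " 2*ttail(51, abs(0.5320))
```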
The basic ideas of hypothesis testing are
(1) if the null hypothesis is true, big t values (extremely large positive or negative values) are unlikely to happen
(2) if we do get a big t value, this is evidence against the null hypothesis.
More explicitly, if the t statistic is big (in absolute value), the p-value will be very small (close to zero).
Therefore, we can reject the null hypothesis when the t statistic is big or, equivalently, when the p-value
is small.
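To see how the p-value shrinks as the t statistic grows, we can tabulate a few cases. This is only an illustrative sketch: the t values 0.5, 1, 2 and 3 are made up for the example, and 51 degrees of freedom is assumed to match the simulation earlier in these notes.

```stata
* two-tailed p-values for increasingly large t statistics (df = 51 assumed)
foreach t of numlist 0.5 1 2 3 {
    display "t = " `t' "   two-tailed p-value = " %6.4f 2*ttail(51, `t')
}
```

The printed p-values fall as t rises, which is why a big t statistic and a small p-value lead to the same decision.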
Review of Statistics: Sample Mean
(Appendix C2)
The expectation and variance have the following properties (c is a constant, A and B are random variables):
$E(c) = c$
$E(cA) = cE(A)$
$E(A + B) = E(A) + E(B)$
$var(c) = 0$
$var(cA) = c^2 var(A)$
$var(A + B) = var(A) + var(B) + 2cov(A, B)$
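A standard property of the variance is that var(A + B) = var(A) + var(B) + 2cov(A, B). The sketch below checks it numerically on simulated data. Everything in it is an assumption made for illustration: the seed, the 10000 observations, and the way b is built to be correlated with a.

```stata
clear
set seed 12345
set obs 10000
* two correlated random variables (made-up example)
generate double a = rnormal()
generate double b = 0.5*a + rnormal()
generate double s = a + b
* left-hand side: sample variance of the sum
quietly summarize s
scalar var_s = r(Var)
* right-hand side: var(a) + var(b) + 2*cov(a, b)
quietly summarize a
scalar var_a = r(Var)
quietly summarize b
scalar var_b = r(Var)
quietly correlate a b, covariance
scalar rhs = var_a + var_b + 2*r(cov_12)
display "var(a + b) = " var_s "   var(a) + var(b) + 2cov(a, b) = " rhs
```

The two displayed numbers should agree up to rounding, because the identity also holds for sample variances and covariances.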
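The sample mean is
$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$
Consider a random sample (i.i.d. sample) in which
$E(Y_i) = \mu$, $var(Y_i) = \sigma^2$, $cov(Y_i, Y_j) = 0$ for any i, j with $i \neq j$.
We can show that for a random sample we have
$E(\bar{Y}) = E(Y_i) = \mu$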
So the sample mean obtained from a random sample is an unbiased estimator for the population mean.
Nice!
Moreover,
$var(\bar{Y}) = \sigma^2 / n$
Note $\sigma$ is the standard deviation. $\sigma / \sqrt{n}$ is the standard error.
As n goes to infinity, $var(\bar{Y})$ approaches zero. In other words, the sample mean $\bar{Y}$ gets closer and
closer to the population mean as the sample size increases. This result is called the law of large numbers.
Nice!
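The law of large numbers can be illustrated with a short simulation. This is a sketch under assumed values: the population is taken to be normal with mean 21 and standard deviation 5, and the seed and sample sizes are arbitrary.

```stata
clear
set seed 2024
set obs 100000
* draws from an assumed population with mean 21 and standard deviation 5
generate double y = rnormal(21, 5)
* sample mean and standard error using the first n observations
foreach n of numlist 10 1000 100000 {
    quietly summarize y in 1/`n'
    display "n = " `n' "   sample mean = " %8.4f r(mean) "   standard error = " %8.4f r(sd)/sqrt(r(N))
}
```

As n grows, the standard error shrinks and the sample mean settles near the population mean of 21.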
Review of Statistics: Normal Distribution
(Appendix B5)
A normal random variable can be denoted as
$Y \sim N(\mu, \sigma^2)$
Consider a special linear transformation, called standardizing (obtaining the z-score):
$Z = (Y - \mu) / \sigma$
Exercise 1: Find $E(Z)$ and $var(Z)$.
It follows that
$Z \sim N(0, 1)$
For the standard normal distribution, Table G.1 in the textbook reports $P(Z \le z)$, where the lowercase
z is a specific value. In Stata, $P(Z \le z)$ is given by the function normal(z).
Exercise 2: Find $P(\ldots)$
Exercise 3: Find ... so that $P(\ldots)$ ...
Exercise 3. Let $Z \sim N(0, 1)$. Find $c_1$ and $c_2$ so that $P(c_1 < Z < c_2) = 0.95$. Those two values are called critical values.
Those two values define the 95% confidence interval for Z. With 95% probability, Z will take a value inside that interval.
Exercise 4: Find the 95% confidence interval for the population mean if we have a large sample (n > 120).
In Stata, $P(Z \le z)$ is given by the function normal(z). The value of z so that $P(Z \le z) = p$ is given by
the function invnormal(p).
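The two functions are inverses of each other, which a quick check makes concrete. The value z = 1 below is an arbitrary choice for illustration and is not one of the exercises.

```stata
* cumulative probability P(Z <= 1) for a standard normal variable
display normal(1)
* invnormal() undoes normal(): this recovers the original z = 1
display invnormal(normal(1))
* by symmetry, P(Z > 1) equals P(Z <= -1)
display 1 - normal(1)
display normal(-1)
```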
Exercise 5: Suppose the SAT score (Y) follows a normal distribution. The average score is 21, so
$\mu = 21$, and the standard deviation is 5, so $\sigma = 5$.
(a) Find the probability that a student's SAT score is greater than 30.
(b) Suppose one student's score is c. Find c so that 90% of students earn scores lower than c. Now we
know how the admission office decides the cutoff number for the SAT score (a student is accepted if the
SAT score is above that number).
Critical thinking: how can we check whether a random variable follows a normal distribution?
Central Limit Theorem (page 767, Appendix C3, 5th edition)
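The central limit theorem says that as n goes to infinity, the sample mean approximately follows a normal distribution
$\bar{Y} \sim N(\mu, \sigma^2 / n)$
Or equivalently,
$(\bar{Y} - \mu) / (\sigma / \sqrt{n}) \sim N(0, 1)$
The standardized sample mean $(\bar{Y} - \mu) / (\sigma / \sqrt{n})$ is also called the t statistic (t value, t ratio).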
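The central limit theorem can be illustrated by simulating sample means from a distribution that is clearly not normal. The sketch below is built on assumptions chosen for the example: uniform(0,1) draws (mean 1/2, variance 1/12), n = 30 draws per sample, 5000 simulated samples, and an arbitrary seed.

```stata
clear
set seed 42
* each observation will hold one simulated sample mean (5000 samples)
set obs 5000
generate double total = 0
* add up n = 30 uniform(0,1) draws for every sample
forvalues i = 1/30 {
    quietly replace total = total + runiform()
}
generate double ybar = total/30
* standardize with the known mean 1/2 and standard deviation sqrt(1/12)
generate double z = (ybar - 0.5)/(sqrt(1/12)/sqrt(30))
* z should have mean close to 0 and standard deviation close to 1
summarize z
* share of standardized means beyond +/-1.96 (about 5% under normality)
quietly count if abs(z) > 1.96
display "share with |z| > 1.96: " r(N)/_N
```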
Summary: how to run a t test?
Step 1: compute sample mean, standard deviation, standard error
Step 2: specify the null hypothesis, and obtain the t statistic
Step 3: specify the alternative hypothesis (one-sided or two-sided), and compute the p-value
Step 4: reject the null hypothesis if the p-value is less than 0.05
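The four steps can be sketched in Stata as follows. The data are simulated and purely hypothetical: 52 observations (so that there are 51 degrees of freedom), an arbitrary seed, and a null hypothesis that the population mean equals 0. The built-in ttest command at the end serves as a cross-check of the manual calculation.

```stata
clear
set seed 1
set obs 52
* hypothetical data for illustration only
generate double y = rnormal(0.2, 1)
* Step 1: sample mean, standard deviation, standard error
quietly summarize y
scalar ybar = r(mean)
scalar se = r(sd)/sqrt(r(N))
scalar df = r(N) - 1
* Step 2: null hypothesis is that the population mean equals 0
scalar t = (ybar - 0)/se
* Step 3: two-sided alternative, so the p-value doubles the upper tail
scalar p = 2*ttail(df, abs(t))
display "t statistic = " t "   p-value = " p
* Step 4: decision at the 5% level
display cond(p < 0.05, "reject the null hypothesis", "do not reject the null hypothesis")
* cross-check with the built-in one-sample t test
ttest y == 0
```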