George Hamada
Stats 460
Lecture on T-stats Fri. 9/10/2004
What is a t-test?
1. It is based on the t-statistic, a quantity computed from the sample and used to decide a hypothesis test.
2. When dealing with quantitative data, we use a t-multiplier, as opposed to the z-multiplier we used with a proportion.
3. The general formula for calculating a t-statistic for an inference about a single population mean is
$$t = \frac{\text{observed sample statistic} - \text{tested value}}{\text{standard error}}$$
or, equivalently,
$$t = \frac{\text{sample statistic} - \text{null value}}{\text{null standard error}}$$
where
Observed sample statistic = the statistic of interest from the sample (in a given problem it is usually the mean, $\bar{x}$).
Tested value = the hypothesized population parameter.
Standard error = $s/\sqrt{n}$ = the sample standard deviation divided by the square root of n. It estimates the standard deviation of the sampling distribution of $\bar{x}$.
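The formula above is easy to compute directly. The sketch below assumes we already have the summary numbers. The function names and the numbers (sample mean 10.5, tested value 10, s = 2, n = 16) are hypothetical and used only for illustration.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def t_statistic(observed: float, tested: float, se: float) -> float:
    """(observed sample statistic - tested value) / standard error."""
    return (observed - tested) / se

# Hypothetical summary numbers for one sample.
se = standard_error(2.0, 16)      # 2 / sqrt(16) = 2 / 4 = 0.5
t = t_statistic(10.5, 10.0, se)   # (10.5 - 10) / 0.5 = 1.0
print(se, t)                      # 0.5 1.0
```

A t-statistic of 1.0 means the sample mean sits one standard error above the tested value.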
Confidence Interval for a population mean
1. The basic formula is the same for all types of confidence intervals:
$$\text{estimate} \pm \text{multiplier} \times \text{standard error}$$
2. Point estimate = a number computed from a sample to represent a population parameter.
The point estimate for a population mean $\mu$ is the sample mean $\bar{x}$, and the standard error of $\bar{x}$ is $s/\sqrt{n}$.
3. When computing a CI for quantitative data, we always need a t-multiplier that reflects the sample size (n) through its degrees of freedom.
a. For a single mean, df = n - 1.
4. So the final confidence interval for a population mean is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
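Here is a minimal sketch of the one-sample interval computed from raw data. It assumes the multiplier t* is read from a t-table and passed in. The data values are hypothetical. For n = 10 (df = 9) and 95% confidence, the usual table value is t* = 2.262.

```python
import math
import statistics

def t_interval(data, t_star):
    """One-sample t-interval: x_bar +/- t* * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)          # sample SD (divides by n - 1)
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10, so df = n - 1 = 9.
data = [4.1, 5.0, 5.3, 4.8, 5.6, 4.9, 5.2, 4.4, 5.1, 5.6]
low, high = t_interval(data, t_star=2.262)   # t* for 95% confidence, df = 9
print(f"df = {len(data) - 1}, 95% CI: ({low:.3f}, {high:.3f})")
```

The interval is symmetric around $\bar{x}$. Only the multiplier changes with the confidence level and df.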
In Stat 200, we learned the following inference procedures, all of which are based on the t-distribution:
Estimating a mean
Test about a mean
Estimating the difference of two means
Test to compare two means
Estimating a mean with paired data
Test about a mean with paired data
Estimating a mean
Parameter = One population mean mu
Statistic = x bar
Type of data = Quantitative
Examples = What is the average cholesterol level of adult males?
Analysis= 1-sample t-interval
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$
Conditions = The data are approximately normal, or the sample size is large (rule of thumb: $n \ge 30$).
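The multiplier t* normally comes from a t-table or from software. SciPy, for example, provides `scipy.stats.t.ppf` for this. The sketch below stays with the Python standard library. It integrates the t density with Simpson's rule and then finds t* by bisection. This is a swapped-in numerical method chosen for illustration, not something from the lecture. The function names are hypothetical.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # area under the density from 0 to |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def t_star(confidence: float, df: int) -> float:
    """Multiplier t* with P(-t* < T < t*) = confidence, found by bisection."""
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 100.0                # search range; assumes t* < 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_star(0.95, 9), 3))       # 2.262, matching the t-table value
```

As df grows, t* approaches the familiar z-multiplier of about 1.96.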
Test about a mean
Parameter=One population mean mu
Statistic= Sample mean x bar
Examples = Is the average GPA of science majors higher than 3.2?
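Analysis = 1-sample t-test
$H_0: \mu = \mu_0$
$H_a: \mu \ne \mu_0$ (two-sided)
$H_a: \mu > \mu_0$ (one-sided)
$H_a: \mu < \mu_0$ (one-sided)
Test statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with df = n - 1.
For $H_a: \mu \ne \mu_0$, the p-value is 2 times the probability that a t-variable is at least as far from 0 as the observed t.
For $H_a: \mu > \mu_0$ and $H_a: \mu < \mu_0$, the p-value is the one-tail probability in the direction of $H_a$ (above the observed t, or below it, respectively).
Conditions = The data are approximately normal, or the sample size is large.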
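The p-value rules above can be sketched as follows. The sketch uses a standard-library approximation of the t CDF (Simpson's rule, an assumption made for self-containment) in place of a table. The GPA data are hypothetical. The `alternative` parameter name is made up for this example.

```python
import math
import statistics

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t: Simpson's rule on [0, |x|] plus symmetry."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c) * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    total += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_test(data, mu0, alternative="two-sided"):
    """Return (t, df, p-value) for H0: mu = mu0."""
    n = len(data)
    t = (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))
    df = n - 1
    if alternative == "greater":       # Ha: mu > mu0, upper tail
        p = 1 - t_cdf(t, df)
    elif alternative == "less":        # Ha: mu < mu0, lower tail
        p = t_cdf(t, df)
    else:                              # Ha: mu != mu0, double the tail beyond |t|
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, p

# Hypothetical GPAs of 8 science majors; Ha: mean GPA > 3.2.
gpas = [3.4, 3.1, 3.6, 3.3, 3.5, 3.0, 3.7, 3.4]
t, df, p = one_sample_t_test(gpas, 3.2, alternative="greater")
print(f"t = {t:.3f}, df = {df}, one-sided p = {p:.4f}")
```

When t is positive, the two-sided p-value is exactly twice the "greater" one-sided p-value, which is the doubling rule from the notes.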
Estimating the difference of two means
Parameter=Difference in two population means mu1 – mu2
Statistic=Difference in two sample means xbar1-xbar2
Examples = How different are the mean SATs of males and females at Penn
State?
Analysis = 2-sample t-interval
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Conditions = Independent samples from the two populations.
The data in each sample are approximately normal, or both samples are large.
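Here is a minimal sketch of the 2-sample interval. The lecture does not say which df to use for t* here. The sketch assumes the conservative textbook choice df = min(n1, n2) - 1, which gives t* = 2.262 for two samples of 10 at 95% confidence. Software usually uses the Welch df instead. The SAT scores are hypothetical.

```python
import math
import statistics

def two_sample_t_interval(x1, x2, t_star):
    """(x_bar1 - x_bar2) +/- t* * sqrt(s1^2/n1 + s2^2/n2)."""
    diff = statistics.mean(x1) - statistics.mean(x2)
    se = math.sqrt(statistics.variance(x1) / len(x1)
                   + statistics.variance(x2) / len(x2))
    return diff - t_star * se, diff + t_star * se

# Hypothetical SAT scores; conservative df = min(10, 10) - 1 = 9.
males = [1180, 1220, 1090, 1310, 1250, 1150, 1200, 1280, 1130, 1240]
females = [1160, 1210, 1120, 1190, 1230, 1100, 1250, 1170, 1140, 1200]
low, high = two_sample_t_interval(males, females, t_star=2.262)
print(f"95% CI for mu1 - mu2: ({low:.1f}, {high:.1f})")
```

If the interval contains 0, the data are consistent with no difference in the two population means.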
Test to Compare two means
Parameter= Difference in two population means mu1-mu2
Statistic=Difference in two sample means xbar1 – xbar2.
Examples=Do the mean pulse rates of athletes and non-athletes differ?
Analysis = 2-sample t-test
$H_0: \mu_1 = \mu_2$
$H_a: \mu_1 \ne \mu_2$
$H_a: \mu_1 > \mu_2$
$H_a: \mu_1 < \mu_2$
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Conditions = Independent samples from the two populations.
The data in each sample are approximately normal, or both samples are large.
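The 2-sample t-statistic above can be sketched in a few lines. The sketch also computes the Welch-Satterthwaite degrees of freedom, which statistical software commonly pairs with this statistic. The Welch df is not part of the lecture and is included only as one common choice. The pulse-rate data are hypothetical.

```python
import math
import statistics

def two_sample_t(x1, x2):
    """Return (t, Welch df) for H0: mu1 = mu2 (unpooled standard error)."""
    a = statistics.variance(x1) / len(x1)   # s1^2 / n1
    b = statistics.variance(x2) / len(x2)   # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2) - 0) / math.sqrt(a + b)
    welch_df = (a + b) ** 2 / (a ** 2 / (len(x1) - 1) + b ** 2 / (len(x2) - 1))
    return t, welch_df

# Hypothetical resting pulse rates (beats per minute).
athletes = [58, 62, 55, 60, 64, 57, 59, 61]
non_athletes = [72, 68, 75, 70, 66, 74, 71, 69, 73]
t, df = two_sample_t(athletes, non_athletes)
print(f"t = {t:.2f}, Welch df = {df:.1f}")
```

The Welch df always falls between min(n1, n2) - 1 and n1 + n2 - 2. This is why the conservative min(n1, n2) - 1 is a safe table choice.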
Estimating a mean with paired data
Parameter = Mean of the paired differences, $\mu_d$
Statistic = Sample mean of the differences, $\bar{d}$
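Examples = What is the difference in pulse rates, on average, before and after exercise?
Analysis = Paired t-interval
For each pair, compute the difference $d = x_2 - x_1$ (for example, after minus before). Then
$$\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}$$
where n is the number of pairs and df = n - 1.
Conditions = The differences are approximately normal, or there is a large number of pairs.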
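The key step in the paired interval is that it works on the list of differences, not on the two original lists. The sketch below assumes hypothetical before/after pulse rates for 8 people. It also assumes t* = 2.365, the usual 95% table value for df = 7.

```python
import math
import statistics

def paired_t_interval(before, after, t_star):
    """d_bar +/- t* * s_d / sqrt(n), where d = after - before for each pair."""
    d = [a - b for a, b in zip(after, before)]   # one difference per subject
    n = len(d)
    d_bar = statistics.mean(d)
    margin = t_star * statistics.stdev(d) / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Hypothetical pulse rates for the same 8 people; df = 8 - 1 = 7.
before = [68, 72, 65, 70, 75, 66, 71, 69]
after = [88, 95, 84, 92, 99, 85, 90, 91]
low, high = paired_t_interval(before, after, t_star=2.365)
print(f"95% CI for mean increase: ({low:.2f}, {high:.2f})")
```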
Test about a mean with paired data
Parameter = Mean of the paired differences, $\mu_d$
Statistic = Sample mean of the differences, $\bar{d}$
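Examples = Is the difference in IQ between pairs of twins zero?
Are people's pulse rates higher after exercise?
Analysis = Paired t-test
$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$
$H_a: \mu_d > 0$
$H_a: \mu_d < 0$
Test statistic: $t = \dfrac{\bar{d} - 0}{s_d/\sqrt{n}}$, with df = n - 1 (n = number of pairs).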
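The paired test statistic is a one-sample t-statistic applied to the differences, with null value 0. Here is a minimal sketch with hypothetical IQ scores for 6 pairs of twins. The function name is made up for this example.

```python
import math
import statistics

def paired_t_test(x, y):
    """Return (t, df) for H0: mu_d = 0, where d = x - y for each pair."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = (statistics.mean(d) - 0) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1

# Hypothetical IQ scores; twin_a[i] and twin_b[i] are the same pair of twins.
twin_a = [105, 98, 112, 120, 95, 101]
twin_b = [103, 100, 110, 117, 96, 99]
t, df = paired_t_test(twin_a, twin_b)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is compared with the t-distribution on n - 1 df, just like the one-sample test.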
Paired v.s. Independent
Independent data: Occurs when the observations are not related in any way.
For example, we take a random sample of GPAs from 50 Greeks and 50 non-Greeks.
The scores of the first Greek and the first non-Greek are not related, so the
observations are independent.
Paired data: We select 50 random subjects to participate in a diet study and
record each subject's weight before and after a certain time period. These data are paired
because they are repeated measurements on the same subject. Likewise, if the problem
involves twins or husbands and wives, the data are considered paired.
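Choosing the wrong analysis matters. The sketch below uses hypothetical before/after measurements on 5 subjects, where each person changes by a similar amount but people differ a lot from each other. On these numbers the paired t-statistic is about 13.9. Treating the same numbers as two independent samples gives about 1.15, because the person-to-person variation then lands in the standard error.

```python
import math
import statistics

before = [70, 72, 68, 75, 80]   # hypothetical measurements, same 5 subjects
after = [74, 75, 71, 79, 83]

# Correct for this design: paired t on the differences.
d = [a - b for a, b in zip(after, before)]
paired_t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Incorrect for this design: treating the lists as independent samples.
se_indep = math.sqrt(statistics.variance(after) / len(after)
                     + statistics.variance(before) / len(before))
independent_t = (statistics.mean(after) - statistics.mean(before)) / se_indep

print(f"paired t = {paired_t:.2f}, independent t = {independent_t:.2f}")
```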
When doing inference, there are 5 basic steps you should follow.
1. Decide which technique/test to use for the problem.
2. State the null and alternative hypotheses.
Keep in mind that the null is the claim we assume to be true unless the data give evidence against it, and the alternative is the claim we are looking for evidence to support.
3. Check the conditions. If the conditions are met, then the test is valid.
4. Calculate the p-value and/or confidence interval.
5. Write a conclusion.
The point of performing a hypothesis test is to decide whether or not to reject the null.
We never accept or prove the null in these problems.
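Step 5 can be written as a small rule that never says "accept H0." The sketch below assumes a significance level of alpha = 0.05 and the convention "reject when p <= alpha." Neither is stated in the lecture. The function name is hypothetical.

```python
def conclusion(p_value: float, alpha: float = 0.05) -> str:
    """Step 5: compare the p-value with the significance level alpha."""
    if p_value <= alpha:
        return "Reject H0: the data give evidence for Ha."
    # A large p-value means weak evidence against H0, not proof that H0 is true.
    return "Fail to reject H0: not enough evidence for Ha."

print(conclusion(0.01))
print(conclusion(0.20))
```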
Key Note
There are two types of errors in hypothesis testing. When making a conclusion, be
aware that we could have made one of them.
Type 1 error = occurs when the null hypothesis is actually true, but we reject it.
Type 2 error = occurs when the null hypothesis is actually false, but we fail to reject it.
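A Type 1 error can be made concrete with a simulation. The sketch generates many samples from a normal population where the null is true by construction, runs a two-sided 1-sample t-test on each, and counts how often H0 is wrongly rejected. The assumptions are n = 10 and the 95% multiplier t* = 2.262 for df = 9, which corresponds to a 5% significance level. The rejection rate should come out close to 0.05. The exact value depends on the random seed.

```python
import math
import random
import statistics

def type1_error_rate(trials=10000, n=10, mu0=0.0, t_star=2.262, seed=1):
    """Fraction of two-sided t-tests that reject H0: mu = mu0 when H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, 1.0) for _ in range(n)]   # H0 true by construction
        t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
        if abs(t) > t_star:       # t* = 2.262 matches df = n - 1 = 9 only
            rejections += 1
    return rejections / trials

rate = type1_error_rate()
print(rate)   # roughly 0.05
```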
The End