Basic Quantitative Methods in the Social
Sciences
(AKA Intro Stats)
02-250-01
Lecture 7
In Review…
• Sampling distribution = The distribution of a statistic over repeated sampling
from a specified population.
• Standard error = The standard deviation of a sampling distribution (tells us how
much variability we will get over repeated sampling)
• If we know the shape and parameters (e.g., mean and standard deviation) of the
sampling distribution of a statistic, we can derive the position of a particular
statistic in the overall distribution.
More Review…
• John got a 76% on the last midterm in statistics. The class mean was 65%, and
the standard deviation was 6.
• Can we determine the position of John’s score in the class distribution? Yes! We
can calculate a Z-score.
• SO: We know the position of John’s score (i.e., z-score), the probability of this
score occurring in the class (which we get from the z-score), and the amount of
sampling error.
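John's position can also be worked out numerically. The sketch below is only an illustration using Python's standard library; the variable names are our own, and it assumes the class scores are roughly normally distributed so that a z-score can be turned into a proportion.

```python
from statistics import NormalDist

score, class_mean, class_sd = 76, 65, 6   # values from the slide

# z-score: how many standard deviations John's score lies above the class mean
z = (score - class_mean) / class_sd

# Proportion of the class expected to score below John (assumes normality)
proportion_below = NormalDist().cdf(z)

print(f"z = {z:.2f}, proportion scoring below John = {proportion_below:.3f}")
```

A z of about 1.83 places John well above the class mean, with only a small share of the class expected to score higher.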
And More Review…
• What if we know that our class has a mean on the midterm of 68%
with a Standard Deviation of 7, and we want to know if the first 3 rows
did better than the rest of class…
• Can we consider the first 3 rows as a sample of the class (population)
and do a Z-test? YES! Why? We know the POPULATION PARAMETERS
(mean and standard deviation – so the modified Z formula will work).
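The "modified Z formula" is not written out on this slide. Assuming it refers to the usual z-test for a sample mean when the population parameters are known, it would read:

```latex
z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}}
```

Here \bar{X} would be the mean of the first 3 rows, N the number of students in those rows, and \mu = 68 and \sigma = 7 the class (population) parameters.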
More Review….
• Crucial to understand: We can do this because we KNOW the standard deviation
of the population.
• What if we want to know how our class mean (that is, 65%) compares with
introductory statistics courses across the country.
• Can we calculate the Z-score of our class mean to find out its position?
More Review…
• NO! Why not? Because we do not know the standard deviation of the
sampling distribution of the mean (i.e., our class is no longer the
population. Now our class is a sample in a larger population of
statistics classes)
Last Review Slide…
• Central Limit Theorem: Given a population with mean μ and standard
deviation σ, the sampling distribution of the mean (the distribution of
sample means) will have a mean equal to μ and a standard deviation
equal to \( \sigma_{\bar{X}} = \sigma / \sqrt{N} \). The distribution will approach the normal
distribution as N (sample size) increases.
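The theorem can be checked with a small simulation. This is a sketch under our own assumptions, not part of the lecture: the population is uniform on [0, 1) (deliberately non-normal), and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable

N = 25               # size of each sample (illustrative choice)
num_samples = 5000   # number of repeated samples (illustrative choice)

# Population: uniform on [0, 1), with known mean and standard deviation
pop_mean = 0.5
pop_sd = (1 / 12) ** 0.5

# Draw repeated samples and record the mean of each one
sample_means = [
    statistics.fmean(random.random() for _ in range(N))
    for _ in range(num_samples)
]

observed_se = statistics.stdev(sample_means)   # SD of the sample means
predicted_se = pop_sd / N ** 0.5               # sigma / sqrt(N)

print(f"mean of sample means: {statistics.fmean(sample_means):.4f} (population mean {pop_mean})")
print(f"SD of sample means:   {observed_se:.4f} (predicted {predicted_se:.4f})")
```

The two printed pairs should agree closely, and a histogram of `sample_means` would look roughly normal even though the population is flat.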
Introduction to t-Distributions
• The reality is, we rarely know the population standard deviation (σ).
• The t-distributions are a “family of theoretical distributions” that can be used
when:
  - We are dealing with interval or ratio data
  - Our data are normally distributed
  - The population standard deviation (σ) is unknown.
t-Distributions continued..
• Review (ok, I lied, this is the last review slide):
• A normal distribution is a population of z-scores where z is defined as:
\[ z = \frac{\bar{X} - \mu}{\sigma / \sqrt{N}} \]
• Note: Here we know the population’s S.D. (σ)
t-Distributions continued..
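• A t-distribution is a population of t-scores where t is defined as:
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{N}} \]
• X-bar is the mean of a random sample, and s is the sample standard deviation
  - Do you see the standard error of the mean in the formula? (It is estimated by \( s / \sqrt{N} \).)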
t-Distributions continued….
• t-distributions are similar to the normal distribution in that they are
unimodal and symmetrical. They have a mean of 0, negative values
below the mean, and positive values above the mean.
t-distributions and degrees of freedom
• Because the definition of t involves a term obtained from a sample (that is, the
estimated standard error of the mean), which in turn involves the degrees of
freedom associated with the sample, there is a different t distribution for each
number of degrees of freedom (df = N − 1, so one for each sample size).
• The t-distribution can be found in Table E.6 (p.444 in Howell). You will notice an
extra column not in the normal curve table (df).
t-distributions and degrees of freedom (continued)
• Here we see a representation of a set of t distributions.
• Note that when the df is large, the t and z distributions are nearly identical; they diverge as the df gets smaller.
Basic Properties of t-Curves
• Property 1: The total area under a t-curve is equal to 1.
• Property 2: A t-curve extends indefinitely in both directions, approaching, but
never touching the horizontal axis as it does so.
• Property 3: A t-curve is symmetrical about 0.
• Property 4: As the number of degrees of freedom becomes larger, t-curves look
increasingly like the standard normal curve.
• Property 5: Every t-score has a certain probability of occurrence in a specific
t-distribution. As such, the values of t which enclose or cut-off given proportions
of the appropriate t-distribution can be calculated.
z table vs. t table
• z table:
  - Gives the area above and below each specified value of z.
• t table:
  - A different t distribution is defined for each possible number of degrees of
freedom.
  - Gives values of t that cut off particular critical areas, for example, the .05 and
.01 levels of significance.
Values of t
Finding the t-value having area 0.05 to its right
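The table lookup can be reproduced numerically. The sketch below uses only Python's standard library: it integrates the t density with Simpson's rule and searches for the cut-off by bisection. The function names are hypothetical, and df = 14 is our assumption, since the slide does not state the degrees of freedom it uses.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail_area(t_value, df, steps=2000):
    """Area to the right of t_value (for t_value >= 0), via Simpson's rule."""
    if t_value == 0:
        return 0.5
    h = t_value / steps                      # steps must be even
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The curve is symmetric about 0, so the area right of 0 is 0.5
    return 0.5 - total * h / 3

def t_critical(area_right, df):
    """Bisection search for the t-value with area_right (< 0.5) to its right."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, df) > area_right:
            lo = mid    # too much area remains, so move right
        else:
            hi = mid
    return (lo + hi) / 2

# df = 14 is an assumed example value
print(f"t with area 0.05 to its right, df = 14: {t_critical(0.05, 14):.3f}")
```

For df = 14 this should agree with the tabled one-tailed .05 value of 1.761. For a two-tailed .05 test, ask for an area of 0.025 in each tail.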
Intro to Confidence Intervals
• Recall: Although random sampling has no inherent bias, we cannot
expect any given sample to perfectly represent its population. Why?
Sampling error!
• SO: The sample mean will almost always be a different value than the
population mean.
A Confidence Interval Is:
• A score interval calculated by a procedure with a specified probability
of producing an interval containing the parameter (i.e., from the
population).
• Can we make a statement about how confident we are that a sample
mean is close to the (unknown) population mean?
Example:
• 25 people around Windsor are approached at random and asked to
rate how good a job Jean Chretien is doing as Prime Minister, on a
scale of 1 (he stinks) to 20 (he’s great). The mean rating was 8, and
the standard deviation was 7.558. How confident are we that this
mean of 8 is close to the mean of the overall population of Ontario?
SO:
• The sample mean (8) is deemed to be the mean of a distribution which conforms
to the t distribution at df = n-1.
• By choosing t values which enclose a specified proportion of that t distribution,
we can construct an interval of plausible values of μ.
Don’t worry, it’s not as complicated as that last sentence seemed
• If we choose critical values for t at the 0.05 level of significance (two-tailed), there is a
0.95 probability that the score interval we generate will contain μ.
• This score interval is termed the 95% confidence interval (95% C.I.)
Confidence Limits on Mean
• Sample mean (8) is a point estimate
• We want an interval estimate
  - The probability that an interval computed this way includes μ is 0.95
How do we get the t value?
• n = 25, so df = n-1 = 24
• Look at the t distribution table for df = 24 at the 0.05 level of significance
(for two tails).
• The critical value for t = 2.064
That is to say….
• If we took 100 samples (25 people in each sample) from the same
population and built an interval this way from each one, about 95 of those
intervals would contain the population mean. For our sample, the interval
runs from 4.88 to 11.12.
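The arithmetic behind those limits is the sample mean plus or minus the critical t times the estimated standard error. A minimal sketch follows; the function name is our own, and the critical value is read from the table rather than computed.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_crit):
    """Return (lower, upper) limits: sample_mean +/- t_crit * (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)   # 7.558 / 5 = 1.5116 in the example
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin

# Windsor example: mean 8, SD 7.558, n = 25, two-tailed t at .05 with df = 24
lower, upper = t_confidence_interval(8, 7.558, 25, 2.064)
print(f"95% CI: {lower:.2f} to {upper:.2f}")   # 95% CI: 4.88 to 11.12
```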
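Our example illustrated…
What if we wanted to be 99% confident?
• t.01 (with df = 24) = 2.797
• t.01 (s / √N) = 2.797 (1.5116) = 4.23
SO: 8 ± 4.23 = 3.77 to 12.23
SO: If we took 100 samples of 25 people and built an interval this way from each one,
about 99 of those intervals would contain the population mean. For our sample, the
interval runs from 3.77 to 12.23.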
Things to Remember…
• Other things being equal, increasing the confidence level (say from
95% to 99%) increases the size of the confidence interval. Why?
• Because greater certainty (confidence) comes at the cost of precision: to be more
sure that the interval captures μ, we have to make the interval larger.
Things to Remember (continued)
• Other things being equal, an increase in the size of the sample
standard deviation increases the size of the confidence interval. Why?
• More variable data indicate more sampling error which in turn means
less certainty can be attached to the accuracy of a particular estimate.
Things to Remember (continued)
• Other things being equal, an increase in the size of the sample
decreases the size of the confidence interval. Why?
• Because larger samples provide more stable (less variable) estimates
which in turn means that on average, sampling error is less and
greater certainty can be attached to the accuracy of an estimate.
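All three effects can be seen by varying one ingredient at a time in the half-width of the interval, t × s / √n. The sketch below reuses numbers from the lecture examples where possible. The function name is hypothetical, and the standard deviation of 10 is an invented comparison value.

```python
import math

def margin(t_crit, sample_sd, n):
    """Half-width of the confidence interval: t_crit * s / sqrt(n)."""
    return t_crit * sample_sd / math.sqrt(n)

baseline = margin(2.064, 7.558, 25)            # 95%, df = 24 (Windsor example)
higher_confidence = margin(2.797, 7.558, 25)   # 99%, same data
larger_sd = margin(2.064, 10.0, 25)            # hypothetical larger SD
smaller_sample = margin(2.131, 7.558, 16)      # n = 16, so df = 15

print(f"baseline (95%, s = 7.558, n = 25): +/- {baseline:.2f}")
print(f"99% confidence:                    +/- {higher_confidence:.2f}")
print(f"larger SD (s = 10):                +/- {larger_sd:.2f}")
print(f"smaller sample (n = 16):           +/- {smaller_sample:.2f}")
```

Each change widens the interval relative to the baseline. Note that a smaller sample hurts twice: the standard error grows and the critical t grows as well.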
One more example:
• 16 University of Windsor students were polled regarding how much
they pay for rent each month, producing a mean of $500.00 a month,
with a standard deviation of $60.00. Compute 95% confidence limits
for the population mean of University of Windsor students.
Here we go…
• t.05 (with df = 15) = 2.131
• t.05 (s / √N) = 2.131 (60 / 4) = 2.131 (15) = 31.97
SO: 500 ± 31.97 = 468.03 to 531.97
SO: If we took 100 samples of 16 people and built an interval this way from each one,
about 95 of those intervals would contain the population mean. For our sample, the
interval runs from $468.03 to $531.97 per month.
One sample t-tests: Rationale
• Sometimes we know the population mean (μ) of a variable, and we wish to
determine whether the mean of a sample (X-bar) differs significantly from the
population mean.
Assumptions:
• Normal population or large sample.
• The population’s standard deviation is not known.
Why t and not z?: Review
• Gosset noted that when we use the sample’s standard deviation
instead of the population’s (which we do not know), the distribution
changes as a function of sample size. If n is large, it is very close to the
normal distribution. But with smaller samples, the sample standard deviation
tends to underestimate σ, so the statistic has heavier tails than the normal
curve, and comparing it with z would give us too many “significant” results.
• To compensate, we compare our t-value with its own distribution.
Hypothesis testing with the one sample t-test
• Say we know that the average cell phone user uses 3000 minutes of cellular air
time each year (μ).
• Dr. Z hypothesizes that business executives spend more time on their cell phone
each year than does the average cell phone user. She interviews a sample of 20
business executives, and finds that they use on average 3500 minutes of cellular
air time each year, with a standard deviation of 300 minutes. Did this sample of
business executives use significantly more cellular air time than the average cell
phone user? Test at the .01 level of significance.
• We can test the null hypothesis:
• H0: The mean number of cell phone minutes used per year by business
executives does not differ from the mean of the average cell phone user.
Let’s try it!
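• Estimated standard error: \( s_{\bar{X}} = s / \sqrt{N} = 300 / \sqrt{20} = 300 / 4.4721 = 67.0826 \)
Calculating t…
• \( t = (\bar{X} - \mu) / s_{\bar{X}} = (3500 - 3000) / 67.0826 = 7.454 \)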
One tailed or two tailed?
Do we want to use a two-tailed, a left tailed, or a right-tailed test in
this example?
Is it significant? P values revisited
Refer to the t-table…
• Remember, df = n − 1 = 19.
• As mentioned, we’ll use a one-tailed test.
• If we set our level of significance at .01, the critical t-value is 2.539
(this is called tcrit).
• tobt (7.454) > tcrit (2.539). Therefore: We reject the H0.
• Can we state our conclusion in words?
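The whole test fits in a few lines. This is a sketch rather than a general-purpose routine: the function name is our own, and the critical value is taken from the t table as in the slides.

```python
import math

def one_sample_t(sample_mean, pop_mean, sample_sd, n):
    """Return (t, df) where t = (sample_mean - pop_mean) / (sample_sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error, n - 1

t_obt, df = one_sample_t(3500, 3000, 300, 20)
t_crit = 2.539                 # one-tailed, .01 level, df = 19 (from the t table)
reject_h0 = t_obt > t_crit     # right-tailed: executives are predicted to use MORE

print(f"t({df}) = {t_obt:.3f}, tcrit = {t_crit}, reject H0: {reject_h0}")
```

In words: the executives in this sample used significantly more air time than the average cell phone user, t(19) = 7.454, p < .01, one-tailed.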
The size of t and the Decision about H0 are affected by…
• The actual obtained difference
• The magnitude of sample variance
• The sample size
• The significance level (.05? .01?)
• Whether the test is 1- or 2-tailed
Underlying Assumptions
• The one-sample t-test assumes that the raw data were a random
sample – that is, the raw scores must be independent of each other.
• This assumption must not be violated, or the t-test is worthless.
• The one-sample t-test assumes that the dependent variable is
normally distributed in the population.
• This assumption can be (and usually is) violated to some degree – the
t-test is a “robust” test, it tolerates some violation of the normality
assumption.
A Second Approach: Confidence Intervals Revisited
• An alternative to the one-sample t-test is to calculate confidence
limits around the sample mean. If the population’s mean falls outside
that confidence interval, H0 is rejected.
• It’s logical: 95% of the intervals built this way will contain the mean of the
population the sample was drawn from. If the hypothesized population mean falls
outside the interval, it is significantly different from the sample mean
(two-tailed, at the .05 level).
Let’s look at our example:
• Student’s t distribution - Example
1. Calculate/state the mean of the sample - 302
2. Calculate/state the standard deviation of the sample - 56
3. Calculate t
And using the CI approach…
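Applied to the cell phone example, the confidence interval approach might look like the sketch below. The two-tailed critical value 2.093 (df = 19, .05 level) is our assumption taken from a standard t table; it does not appear in the slides.

```python
import math

sample_mean, sample_sd, n = 3500, 300, 20   # business executives sample
pop_mean = 3000                             # average cell phone user
t_crit_two_tailed = 2.093                   # assumed table value: df = 19, .05, two tails

standard_error = sample_sd / math.sqrt(n)
lower = sample_mean - t_crit_two_tailed * standard_error
upper = sample_mean + t_crit_two_tailed * standard_error

# H0 is rejected when the population mean lies outside the interval
outside = not (lower <= pop_mean <= upper)
print(f"95% CI: {lower:.1f} to {upper:.1f}; population mean outside: {outside}")
```

Because a confidence interval has two limits, it corresponds to a two-tailed test, which is why the critical value differs from the one-tailed 2.539 used earlier.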
Some Examples
• A researcher is concerned that police officers are not in good physical
shape because they eat too many doughnuts. He hypothesizes that
police officers eat significantly more doughnuts than do the
population. If the average Windsorite eats 30 doughnuts every year
(μ), and a sample of 15 police officers produces a mean of 26
doughnuts a year with a standard deviation of 10, what conclusions
can be drawn at the .05 level of significance?
• Ho: ?
• Ha: ?
• One tail or two tail?
• What test do we use?
• Formula?
• Decision re: the Ho?
• Conclusion?
Get to work!
• The average Canadian has an IQ of 100 with a standard deviation of
15. Dr. F hypothesizes that chronic Ecstasy users would lose
intelligence over time. She samples 30 chronic Ecstasy users and
gives them IQ tests. This sample produces a mean IQ of 94 with a
standard deviation of 13.
• What conclusions can be drawn at the .01 level of significance?
• HINT: Watch out for extraneous information!
• Ho: ?
• Ha: ?
• One tail or two tail?
• What test do we use?
• Formula?
• Decision re: the Ho?
• Conclusion?
Get to work!
• The average movie-goer spends $8.00 on food each time he or she goes to the
movies. George Lucas thinks he can help theatres sell more food by putting
subliminal messages in Star Wars that show Yoda with a bag of popcorn stating
“May the Corn be with you”. A theatre manager evaluates George Lucas’ claim by
keeping track of how much Star Wars viewers spend on food. His sample of 120
Star Wars viewers generates a mean of $8.75 (spent on food while viewing Star
Wars) with a standard deviation of $0.84. Is George’s subliminal ad campaign
effective?
• What conclusions can be drawn at the .05 level of significance?
• Ho: ?
• Ha: ?
• One tail or two tail?
• What test do we use?
• Formula?
• Decision re: the Ho?
• Conclusion?
Get to work!
• The average University of Windsor student spends a total of 25 hours
studying for final exams each semester. The Dean is concerned about
students who live in Residence – she believes they may be partying
too much at the expense of studying. She samples 40 students who
live in Res, and finds that they study on average 18 hours for final
exams, with a standard deviation of 4.3 hours. Does the Dean have
reason to be concerned?
• Test at the .01 level of significance.
• Ho: ?
• Ha: ?
• One tail or two tail?
• What test do we use?
• Formula?
• Decision re: the Ho?
• Conclusion?
Get to work!
Last one
• Now solve the last problem using the confidence interval approach.
Create 95% and 99% confidence limits.
• Hint: Watch your tails!
And Now the Answers
• The following slides contain the answers to these last few problems.
No peeking until you try the problems and calculate them through to
the end!
Problem #1
• A researcher is concerned that police officers are not in good physical
shape because they eat too many doughnuts. He hypothesizes that
police officers eat significantly more doughnuts than do the
population. If the average Windsorite eats 30 doughnuts every year
(μ), and a sample of 15 police officers produces a mean of 26
doughnuts a year with a standard deviation of 10, what conclusions
can be drawn at the .05 level of significance?
Problem #1 - Solution
• One tail or two tail? 1 tailed
• What test do we use? t-test – we don’t have the population standard deviation
• n=15, df=14 tcrit.05 = 1.761
• To reject Ho, tobs would have to be > +1.761. Therefore, we retain the Ho: police officers do not eat
significantly more doughnuts than does the general population.
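The observed t for this problem does not appear in the transcript. A short sketch of the calculation, with variable names of our own choosing:

```python
import math

sample_mean, pop_mean, sample_sd, n = 26, 30, 10, 15

standard_error = sample_sd / math.sqrt(n)          # 10 / sqrt(15)
t_obs = (sample_mean - pop_mean) / standard_error  # negative: officers ate FEWER

t_crit = 1.761    # one-tailed, .05 level, df = 14 (from the solution above)
reject_h0 = t_obs > t_crit   # the hypothesis predicts MORE, so test the right tail

print(f"tobs = {t_obs:.3f}, reject H0: {reject_h0}")
```

Because the sample mean falls below the population mean, tobs is negative and cannot exceed the positive critical value, whatever its size.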
• Problem #2:
• The average Canadian has an IQ of 100 with a standard deviation of
15. Dr. F hypothesizes that chronic Ecstasy users would lose
intelligence over time. She samples 30 chronic Ecstasy users and
gives them IQ tests. This sample produces a mean IQ of 94 with a
standard deviation of 13.
• What conclusions can be drawn at the .01 level of significance?
• HINT: Watch out for extraneous information!
Problem #2 - Solution
• One tail or two tail? 1 tailed
• What test do we use? z-test – we have the population standard deviation
• 1-tailed test at .01 alpha, so Zcrit.01 = -2.33
• Zobs > Zcrit so we retain the Ho, ecstasy users do not differ in intelligence from the average Canadian.
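The observed z can be checked as follows. This sketch uses the population standard deviation (15) and ignores the sample standard deviation (13), which is the extraneous information the hint warns about. The critical value is computed with Python's `statistics.NormalDist` instead of being read from a table.

```python
import math
from statistics import NormalDist

sample_mean, pop_mean, pop_sd, n = 94, 100, 15, 30

standard_error = pop_sd / math.sqrt(n)           # sigma / sqrt(N)
z_obs = (sample_mean - pop_mean) / standard_error

z_crit = NormalDist().inv_cdf(0.01)   # left tail, .01 level: about -2.33
reject_h0 = z_obs < z_crit            # left-tailed: users are predicted to score LOWER

print(f"zobs = {z_obs:.2f}, zcrit = {z_crit:.2f}, reject H0: {reject_h0}")
```

The observed z is close to the cut-off but does not pass it, so H0 is retained at the .01 level.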
• Problem #3:
• The average movie-goer spends $8.00 on food each time he or she goes to the
movies. George Lucas thinks he can help theatres sell more food by putting
subliminal messages in Star Wars that show Yoda with a bag of popcorn stating
“May the Corn be with you”. A theatre manager evaluates George Lucas’ claim by
keeping track of how much Star Wars viewers spend on food. His sample of 120
Star Wars viewers generates a mean of $8.75 (spent on food while viewing Star
Wars) with a standard deviation of $0.84. Is George’s subliminal ad campaign
effective?
• What conclusions can be drawn at the .05 level of significance?
Problem #3 - Solution
• One tail or two tail? 1 tailed
• What test do we use? t-test – we don’t have the population standard deviation
• n=120, df=119 tcrit.05 = 1.645 (note: use the last row in the table)
• tobs > tcrit so we reject the Ho: the subliminal messages seem to work. Star Wars viewers seem to
spend more on food than do other moviegoers.
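The value of tobs is not shown in the transcript. Working it through with the usual one-sample formula, the arithmetic would be approximately:

```latex
t_{obs} = \frac{\bar{X} - \mu}{s / \sqrt{N}}
        = \frac{8.75 - 8.00}{0.84 / \sqrt{120}}
        \approx \frac{0.75}{0.0767}
        \approx 9.78
```

This is far beyond the critical value of 1.645, which is why Ho is rejected.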
• Problem #4:
• The average University of Windsor student spends a total of 25 hours studying
for final exams each semester. The Dean is concerned about students who live in
Residence – she believes they may be partying too much at the expense of
studying. She samples 40 students who live in Res, and finds that they study on
average 18 hours for final exams, with a standard deviation of 4.3 hours. Does
the Dean have reason to be concerned?
• Test at the .01 level of significance.
Problem #4 - Solution
• One tail or two tail? 1 tailed at .01
• What test do we use? t-test – we don’t have the population standard deviation
• n=40, df=39 tcrit.01 = -2.423 (note: use df=40 since there is no row for 39 and 40 is the one right
after)
• tobs < tcrit so we reject the Ho, the dean’s concern does hold up – students in res do seem to study
less than other students.
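The value of tobs is again missing from the transcript. With the same one-sample formula, the arithmetic would be approximately:

```latex
t_{obs} = \frac{\bar{X} - \mu}{s / \sqrt{N}}
        = \frac{18 - 25}{4.3 / \sqrt{40}}
        \approx \frac{-7}{0.680}
        \approx -10.30
```

Since -10.30 lies well below the critical value of -2.423, Ho is rejected.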
• Problem #4b:
• Now solve the last problem using the confidence interval approach.
Create 95% and 99% confidence limits.
• Hint: Watch your tails!
Problem #4b Solution
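The worked solution for this slide did not survive in the transcript. The sketch below shows one way it could be done. The two-tailed critical values for df = 40 (2.021 at .05 and 2.704 at .01) are our assumption from a standard t table, using the df = 40 row as in Problem #4.

```python
import math

sample_mean, sample_sd, n = 18, 4.3, 40   # Residence students sample
pop_mean = 25                             # average University of Windsor student

standard_error = sample_sd / math.sqrt(n)

# Confidence limits are two-sided, so use TWO-tailed critical values (assumed)
two_tailed_t = {"95%": 2.021, "99%": 2.704}

limits = {}
for level, t_crit in two_tailed_t.items():
    half_width = t_crit * standard_error
    limits[level] = (sample_mean - half_width, sample_mean + half_width)
    lower, upper = limits[level]
    print(f"{level} CI: {lower:.2f} to {upper:.2f}; "
          f"contains 25: {lower <= pop_mean <= upper}")
```

The population mean of 25 hours falls outside both intervals, so this approach reaches the same decision as the t-test: students in Res appear to study less.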