Welcome to the Unit 5 Seminar for MM305!
I hope you have had a good evening so far.
I have a lot of information to share with you tonight. We may
not be able to go over all of it in our seminar, but I have
written it up (hopefully) clearly enough that you can go over
it after our seminar and follow the concepts. My notes will
give you the big picture; for a little more detail, you also
need to read the book along with them.
It is not too late to catch up in this class but you need to hurry
up a little as concepts are going to get a little busier and you
need to spend more time to understand them. It won’t
necessarily get harder, just busier. I don’t mean to scare you.
I just want you to be aware of it so you can plan your time
accordingly.
Need Help?
Please use all the options to get help in this class. You can use
our office hours on Mondays and Wednesdays (8:00 pm to
9:00 pm ET on AIM) to get one-on-one help.
You can also email me your questions but I really prefer that
you post your questions on the board (under Any Questions
link) so other students can also benefit from the questions and
their answers.
You can also use the NetTutor online tutoring service that is
sponsored by Kaplan. To access their service just click on the
NetTutor icon on your "MyDesk" page.
Is anyone using them on a regular basis and, if so, are you
happy with the service? I know I have talked about this before
but it won’t hurt to share it again.
Excel Note:
It seems most of you are using Excel and are not experiencing
too many problems. The problems will become a little more
challenging and Excel will really help. Do practice, though; it
will help.
You can get information about any statistical procedure by
typing the name of the procedure in the HELP command of
Excel. You will get an explanation and an example for that
command or procedure (e.g., mean, standard deviation,
regression).
Definition: Point Estimates
In most cases we don’t know the mean and standard deviation of a
particular variable of interest (like height) in a large population
(think of census data, for example). So, we estimate those
values by taking a sample from the population and calculating the
mean and standard deviation of that sample.
We call these sample mean and sample standard deviation values
the “Point Estimates” of the population mean and population
standard deviation. (A point estimate is just a single number used to
estimate some parameter of a population.)
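As a small illustration of point estimates, here is a minimal Python sketch that computes a sample mean and a sample standard deviation. The height values are made up for this example; they do not come from the book.

```python
import statistics

# Hypothetical sample of 8 heights in inches (made-up values for illustration).
heights = [64.0, 67.5, 70.0, 62.5, 68.0, 71.5, 66.0, 69.5]

# Point estimate of the population mean: the sample mean.
sample_mean = statistics.mean(heights)

# Point estimate of the population standard deviation:
# the sample standard deviation (it divides by n - 1).
sample_sd = statistics.stdev(heights)

print(f"sample mean = {sample_mean:.2f}")
print(f"sample standard deviation = {sample_sd:.2f}")
```

Each printed value is a single number, which is exactly what makes it a point estimate.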
Definition: Interval Estimates
Since these sample means and standard deviations may not
be very accurate (i.e., the sample may not represent the
population well), we want to set an interval around the value
of the sample mean and state that this interval contains the
true population mean with a certain degree of confidence.
This is called a confidence interval.
Sampling from a population
Suppose we draw 20 samples from a population and calculate
the mean of each. On average, we would expect only about 1 in
20 of those sample means to fall outside the interval that is
2 standard errors (standard deviations of the sample mean)
above and below the population mean. (The arrow is pointing to
the sample whose mean is outside the confidence interval.)
95% Confidence Interval
Now, suppose we draw a single sample from the population.
Also suppose we then build an interval of 2 standard errors
on either side of the sample mean obtained. In most cases
(about 95% of the time), the true mean of the population will
be within the interval. (The exception would be a sample whose
mean falls outside of the interval around the true mean.)
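If you want to see the "about 95%" idea in action, here is a rough simulation sketch in Python. The population values (mean 30, standard deviation 2.5, samples of size 50) are assumptions chosen only for illustration, and the function name coverage is my own.

```python
import math
import random

# Assumed population values, chosen only for this illustration.
MU = 30.0      # true population mean
SIGMA = 2.5    # population standard deviation
N = 50         # size of each sample

def coverage(num_samples, seed=0):
    """Fraction of samples whose 2-standard-error interval contains MU."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    standard_error = SIGMA / math.sqrt(N)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        sample_mean = sum(sample) / N
        lower = sample_mean - 2 * standard_error
        upper = sample_mean + 2 * standard_error
        if lower <= MU <= upper:
            hits += 1
    return hits / num_samples

fraction = coverage(2000)
print(f"{fraction:.1%} of the intervals contain the true mean")
```

The printed fraction should land close to 95%, which matches the "1 in 20 falls outside" picture.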
Example: Confidence Interval
Suppose we observe that, in a sample of 50 commuters, the
average length of travel to work is 30 minutes with a
population standard deviation of 2.5 minutes.
What is the standard error for the sampling distribution?
Answer: 2.5 / sqrt(50) = 0.35
To create the 95% confidence interval, take 2 standard errors
(2 x 0.35 = 0.7), then subtract that amount from the mean and
add it to the mean.
[ 30 – 0.7, 30 + 0.7 ] = [ 29.3, 30.7 ]
We can be 95% confident that the true population mean is
within that interval.
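The arithmetic above can be sketched in a few lines of Python. The function name confidence_interval and its multiplier parameter are my own choices for this illustration; the numbers are the ones from the commuter example.

```python
import math

def confidence_interval(sample_mean, sigma, n, multiplier=2.0):
    """Return (standard_error, lower, upper) for a mean with known sigma."""
    standard_error = sigma / math.sqrt(n)
    margin = multiplier * standard_error
    return standard_error, sample_mean - margin, sample_mean + margin

# Commuter example: n = 50, mean = 30 minutes, sigma = 2.5 minutes.
se, lower, upper = confidence_interval(30, 2.5, 50)
print(f"standard error = {se:.2f}")
print(f"interval = [{lower:.1f}, {upper:.1f}]")
```

Passing multiplier=1.96 instead of 2 gives the slightly narrower interval based on the more exact normal value.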
Confidence Interval
Suppose we observe that, in a sample of 100 cereal boxes,
the average weight of the cereal is 26.8 ounces with a
population standard deviation of 2 ounces.
Everyone: What is the standard error for the sampling
distribution?
Answer: 2 / √(100) = 0.2
Everyone: Construct a 95% confidence interval for the mean.
Answer: 2 standard errors = 0.4 so subtract it from and add it
to the mean.
[ 26.8 – 0.4, 26.8 + 0.4] = [26.4, 27.2]
We can be 95% confident that the true population mean is
within that interval.
Excel: Confidence Interval
Suppose we observe that, in a sample of 50 commuters, the
average length of travel to work is 30 minutes with a
population standard deviation of 2.5 minutes.
Click on a cell, type =CONFIDENCE(0.05,2.5,50) in the Excel
input box, and click OK. The three inputs are alpha, the
standard deviation, and the sample size; the sample mean is
not used in this command. You will get the value 0.692951,
which rounds to 0.7.
Therefore, the interval for the average length of travel to
work (30 minutes) is 30 +/- 0.7 minutes:
30 – 0.7 = 29.3 and 30 + 0.7 = 30.7
We are 95% confident that the mean commute time is between
29.3 and 30.7 minutes.
Everyone: Use Excel to determine a Confidence Interval
Suppose we observe that, in a sample of 100 cereal boxes,
the average weight of the cereal is 26.8 ounces with a
population standard deviation of 2 ounces. Use Excel to
construct a 95% Confidence Interval for the mean.
Answer:
=CONFIDENCE(0.05,2,100)
Using Excel, the above is equal to 0.391993. The confidence
interval is therefore:
[ 26.8 – 0.39, 26.8 + 0.39 ] = [ 26.41, 27.19 ]
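If you would like to check the Excel results another way, here is a small Python sketch. It assumes that =CONFIDENCE(alpha, standard deviation, size) returns the normal-based margin z x sigma / sqrt(n); the function name confidence_margin is my own and is not an Excel or Python built-in.

```python
import math
from statistics import NormalDist

def confidence_margin(alpha, sigma, n):
    """Margin of error z * sigma / sqrt(n) for a (1 - alpha) confidence level."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z * sigma / math.sqrt(n)

commute_margin = confidence_margin(0.05, 2.5, 50)   # commuter example
cereal_margin = confidence_margin(0.05, 2, 100)     # cereal example

print(f"commute: 30 +/- {commute_margin:.2f}")
print(f"cereal: 26.8 +/- {cereal_margin:.2f}")
```

Both margins should agree with the Excel values to several decimal places. They are a little smaller than the hand calculations because Excel uses about 1.96 standard errors rather than 2.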
Hypothesis Testing
Basically, the t test statistic and the Z test statistic are
used in hypothesis testing to reject or fail to reject a claim.
The claim is usually the Null Hypothesis (called H0), and if we
reject H0 we automatically accept the Alternative Hypothesis
(called H1) because that is the only other option (kind of
like plan B) available to us.
The Null and Alternative hypotheses are complements of each
other. For example, if the Null hypothesis claims that the mean
value of something is less than or equal to a certain value
(the book calls this directional), then the Alternative is that
the mean value is greater than that value. Or, if the Null says
the mean is equal to a certain value, then the Alternative says
the mean is NOT equal to that value. The book calls this
non-directional because it can go in either direction.
Classical Approach
Calculate a test statistic, t or Z. The formulas for
calculating t and Z are in the book.
The Z test statistic is given on page 318.
The t test statistic is given on page 328.
There are 3 possible tests:
1. Right tailed test (described on pages 320 and 321)
2. Left tailed test (similar to a right tailed test)
3. Two tailed test (described on pages 318 and 319)
The flow chart on page 316 is a great help in guiding you on
which method to use for any particular hypothesis situation.
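To make the three kinds of tests concrete, here is a minimal Python sketch of the classical decision rule for a Z test. The function name reject_null and the calculated Z of 1.80 are hypothetical; they are not taken from the book.

```python
from statistics import NormalDist

def reject_null(z, alpha, tail):
    """Classical approach: compare the calculated Z with the critical Z.

    tail is "right", "left", or "two".
    """
    std_normal = NormalDist()
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)
    if tail == "two":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left', or 'two'")

# Hypothetical calculated Z of 1.80 at alpha = 0.05.
print(reject_null(1.80, 0.05, "right"))  # critical Z is about 1.645
print(reject_null(1.80, 0.05, "left"))   # critical Z is about -1.645
print(reject_null(1.80, 0.05, "two"))    # critical Z is about 1.96
```

Notice that the same calculated Z can lead to rejection in a right tailed test but not in a two tailed test, because the two tailed test splits alpha between both tails.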
Difference Between t and Z tests
The only major difference is that, to find a t value from
Table t in the back of the book, you need to take TWO things
to the t table.
One is alpha (which you already know about and is usually
given) and the other is the DEGREES of FREEDOM. Degrees of
freedom is just a number that helps us get a more accurate
value for our t statistic.
The DF (degrees of freedom) value is the sample size, n, minus
1 (n – 1). It is basically another factor that comes into play
to bring accuracy to the calculations based on different
sample sizes. That is all there is to degrees of freedom for
us!
Example: t table
The t table is located just before the Z table on the inside
cover of the book.
As the book shows in Table t in the back of the book, if you
are looking for a t value when alpha is 5% (one-sided or one
tailed test) and the sample size is 74, you go to the table and
look up t (0.05, 73). The t value when the degrees of freedom
is 73 (sample size 74) and alpha is 0.05 is 1.666. Let me know
if you are not getting this value from the t table in the back
of the book right before the Z table.
T table
Just remember that the values in the body of the table are the
t values that cut off the shaded area (blue) in the tail of
the t distribution for the given alpha, as shown in Table t
in the back of the book.
If the sample size approaches infinity, then the t
distribution approaches the standard normal distribution and
the two curves become identical. So, for example, Z for alpha
= 0.05 is 1.645, which is the same value as t (df = infinity,
alpha = 0.05) = 1.645.
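If you are curious where the table values come from, the sketch below recomputes one-tailed critical t values in plain Python by numerically integrating the t density and searching for the cutoff. This is only one possible way to do it (the helper names t_pdf, t_cdf and t_critical are my own), and the book's table remains the reference for this class.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Area to the left of x, using Simpson's rule on [0, |x|] and symmetry."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df):
    """One-tailed critical t: the value with area alpha to its right."""
    lo, hi = 0.0, 100.0
    for _ in range(60):                 # bisection search for the cutoff
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid                    # tail area still too big: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(f"t(0.05, df=73) = {t_critical(0.05, 73):.3f}")
print(f"t(0.05, df=1000000) = {t_critical(0.05, 1_000_000):.3f}")
```

The first value should be close to the 1.666 from the table, and the second (a very large df standing in for infinity) should be close to the Z value of 1.645.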
In Practice
The t test is the more practical case, as we usually don't
know the standard deviation of the population. If you
recall, when we know the standard deviation of the population
we use the Z test. Now, we use the Student's t test (which has
a formula very similar to the Z test formula) because the
standard deviation of the population is not known.
The only difference is that we use the standard deviation of
the sample, s, in the formula instead of sigma. The
calculation and conclusion of a t test are very similar to
those of a Z test. We call these values the calculated t or
calculated Z.
Example: t-test
A study of the process costs indicates that the average weight
of the diamonds must be greater than 0.5 carat in order for
the process to be operated at a profitable level. Do the six
diamond-weight measurements, 0.46, 0.61, 0.52, 0.48, 0.57,
0.54, present sufficient evidence to indicate that the average
weight of the diamonds produced by the process is in excess
of 0.5 carat?
We use the t test because the sample size is 6 (less than 30)
and the population standard deviation is not known. It is a
one-sided test because the question is about the value being
“greater than”.
H0: population average weight of the diamonds (mu) = 0.5
H1: population average weight of the diamonds (mu) > 0.5
Example t-test
We choose the value of alpha to be 0.05 (rejecting the top
5% of the t values). The degrees of freedom is the sample size
minus 1, so the degrees of freedom (df) for this problem is
6 – 1 = 5.
The critical t value has the format t alpha, df. So, for this
problem, it is t 0.05, 5 = 2.015 (from the t table in the back
of the book).
That is, we will reject H0 if the calculated t (calculated
using the formula) is greater than the maximum acceptable
table t, which is 2.015 for this problem. In that case, we say
the calculated t is too large to be accepted according to our
5% policy.
Example t-test
So, the Rejection Region for alpha = 5% and (6 – 1) = 5
degrees of freedom is when the calculated t (using the
formula) is greater than 2.015. (Look at the t distribution
figure at the top of the t table in the back of the book. The
red area is the rejection area.)
If you use the t formula for this problem (sample mean 0.53,
sample standard deviation s = 0.0559), you will find the
calculated t value to be 1.31. In this case the calculated t
is less than the critical t (table t); therefore, we do not
reject H0.
This implies that the data do not present sufficient evidence
to indicate that the mean diamond weight exceeds 0.5 carat.
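Here is a short Python sketch of the diamond calculation, in case you want to check the numbers yourself. The critical value 2.015 is simply typed in from the t table, as in the example; the variable names are my own.

```python
import math
import statistics

weights = [0.46, 0.61, 0.52, 0.48, 0.57, 0.54]  # the six measurements
mu_0 = 0.5            # value stated in H0
critical_t = 2.015    # t(0.05, 5), copied from the t table

n = len(weights)
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)           # divides by n - 1
calculated_t = (sample_mean - mu_0) / (sample_sd / math.sqrt(n))

print(f"mean = {sample_mean:.2f}, s = {sample_sd:.4f}, t = {calculated_t:.3f}")
print("reject H0" if calculated_t > critical_t else "do not reject H0")
```

The calculated t comes out near 1.3 (the last digit depends on how much you round s along the way), which is well below 2.015, so the conclusion is the same: do not reject H0.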
P-Value Approach
The calculations, the meaning of alpha and the p-value, and the
conclusion process are the same in both methods, but the
formulas are a little different. We will practice getting a
t value from the table a little later on in tonight's seminar.
Steps are outlined on pages 322 and 323.
p-value
The p-value is the probability in the “tail” area beyond the
calculated test statistic. In the classical approach, you
either reject or fail to reject. It doesn’t give any
information about whether a different value of alpha would
have given the opposite conclusion.
In the p-value approach, you find the smallest level of alpha
at which the null hypothesis would be rejected. For example,
if the p-value is 0.2 and it is a one tailed test, it indicates
that, if the null hypothesis is true, the probability of
getting a sample mean at least as extreme as the one observed
is twenty percent. (Pretty high, and you would not want to
reject the null.) If the p-value is 0.0001, a sample like ours
would be very, very unlikely if the null hypothesis were true.
(In this case, you would reject the null.)
p-value
The example shown is for a right tailed test.
[Figure; axis label: test statistic value]
p-value
The example shown is for a left tailed test.
[Figure; axis label: test statistic value]
p-value
The example shown is for a two tailed test. (Find the area in
one tail, and double it.)
[Figure; axis label: test statistic value]
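The three tail cases can be sketched in Python for a Z test. The function name p_value and the calculated Z of 1.80 are hypothetical and are used only to show how the tail area is found and compared with alpha.

```python
from statistics import NormalDist

def p_value(z, tail):
    """Tail area for a calculated Z; tail is "right", "left", or "two"."""
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)               # area to the right of z
    if tail == "left":
        return cdf(z)                   # area to the left of z
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))    # area in one tail, doubled
    raise ValueError("tail must be 'right', 'left', or 'two'")

z = 1.80        # hypothetical calculated Z
alpha = 0.05
for tail in ("right", "left", "two"):
    p = p_value(z, tail)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"{tail}-tailed: p-value = {p:.4f} -> {decision}")
```

With the p-value in hand, you can see right away what would happen at other alpha levels. For example, a p-value of about 0.036 leads to rejection at alpha = 0.05 but not at alpha = 0.01.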