Statistics Chapter 10b
Introduction to Inference
Time: 3 weeks
Confidence Intervals Vs. Tests of Significance
The goals:
CI:
Tests of Significance:
Star Free Throw Shooter
I claim that I make 80% of my free throws. To test my claim, you ask
me to shoot 20 free throws. I make only 8 of the 20. “Aha!” you say.
“Someone who makes 80% of his free throws would almost never
make only 8 out of 20. So I don’t believe your claim.”
You can say how strong the evidence against my claim is by giving the
probability that I would make as few as 8 out of 20 shots if I really
make 80% in the long run.
This probability is 0.0001. The small probability convinces you that
my claim is false.
The basic idea is that an outcome that would rarely happen if a claim
were true is good evidence that the claim is not true.
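The 0.0001 figure quoted above can be checked directly. The sketch below assumes the 20 shots are independent, each with an 80% chance of success, so the number made follows a binomial distribution; the helper name `binom_cdf` is made up for this illustration.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Chance of making 8 or fewer of 20 shots if the long-run rate really is 80%
p_value = binom_cdf(8, 20, 0.80)
print(round(p_value, 4))
```

The result is about 1 in 10,000, which matches the probability stated in the example.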
Hypothesis Testing
A hypothesis is a claim or statement about the value of a single
population parameter.
Fritos
For example, Frito-Lay claims that an average bag of Ruffles weighs
14 oz, so we choose a random sample of bags and see if the data
support the hypothesis that μ = 14.
Not guilty means
Guilty means
Null & Alternative Hypothesis
We begin by assuming that one of the hypotheses is true: the null
hypothesis (Ho), the hypothesis that says that there is no effect or no
change in the population.
We reject the null hypothesis in favor of the alternative hypothesis (Ha)
only if the sample data provide convincing evidence against the
null hypothesis.
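Possible Outcomes:
• There are 2 possible outcomes of a test of hypothesis: reject Ho or
fail to reject Ho.
• We reject Ho when the sample provides convincing evidence against it.
• If the sample doesn’t contain such evidence, we fail to reject Ho.
• In other words, we won’t be proving the “innocence” of Ho; we only
fail to find convincing evidence against it.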
General Format:
Ho:
Ha:
P-Value
• ...is the probability that the observed statistic value (or an even
more extreme value) could occur if the null model were correct.
• If the P-value is small enough, we reject the null hypothesis.
• How small is small enough?
• Rule of thumb: a P-value below .05,
which is saying that chance alone would rarely produce such a result.
Formal Hypothesis Testing in 5 Easy Steps
1. Name the test, define the variable, and set the α level.
2. State the hypotheses.
3. State and check the conditions.
4. Calculate the test statistic and P-value (include a picture).
5. Make a decision based on the P-value (in context).
Errors in Hypothesis Testing
Type I error:
Type II error:
Bolt Problem
A machine is set to produce bolts with an average diameter of 1 cm.
Every hour a sample is inspected and the machine is adjusted if there
is convincing evidence that the average diameter is not 1 cm. State the
hypothesis and describe both kinds of errors for this testing procedure.
Pregnancy Test
A pregnancy test is designed so that it will correctly detect a
pregnancy 99% of the time and correctly determine that a person isn’t
pregnant 90% of the time. If the null hypothesis is “not pregnant”,
describe both types of errors and find their probabilities.
α and β
If α = .05, then we are using a testing procedure that will make a
Type I error about 5% of the time when the null hypothesis is true.
That is, if the null hypothesis were true and we took many samples and
performed many tests, in about 5 out of every 100 tests we would
wrongly reject it.
For a fixed sample size, if the value of α goes down, the value of β
goes up. (They are inversely related.)
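The long-run meaning of α can be seen in a small simulation. This is only a sketch under assumptions that are not in the notes: the population is Normal(0, 1) with σ known, each sample has 25 observations, and a two-sided z test is used. Because Ho is true in every simulated sample, every rejection is a Type I error.

```python
import random
from statistics import NormalDist, mean

random.seed(1)           # fixed seed so the run is repeatable
ALPHA = 0.05
N = 25                   # sample size per test (arbitrary choice)
TRIALS = 2000            # number of simulated tests
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # two-sided cutoff, about 1.96

rejections = 0
for _ in range(TRIALS):
    # Ho is TRUE here: the data really come from a Normal(0, 1) population
    sample = [random.gauss(0, 1) for _ in range(N)]
    z = mean(sample) / (1 / N ** 0.5)          # (x-bar - 0) / (sigma / sqrt(n))
    if abs(z) > z_crit:
        rejections += 1                        # a Type I error

type1_rate = rejections / TRIALS
print(type1_rate)
```

The printed rate should land close to 0.05, and lowering `ALPHA` lowers it accordingly.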
Choosing α
• Choose the largest α value tolerable (between .01 and .10)
• Judicial system
• Pregnancy test
Antibacterial Cream
We are testing a new cream on a small cut. We know from
previous research that with no medication, the mean healing time
is 7.6 days, with a standard deviation of 1.4 days. The claim we
want to test is that the new formulation speeds healing. We will
use a 5% significance level.
Hypothesis Testing in 5 Easy Steps
1. Name the test, define the variable, and set the α level.
2. State the hypotheses.
3. State and check the conditions.
4. Calculate the test statistic and P-value (include a picture).
5. Make a decision based on the P-value (in context).
Procedure: Cut 25 volunteer college students and apply the new
formula to the wound. The mean healing time for these subjects is x̄ =
7.1 days. We will assume that σ = 1.4 days.
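One way to carry out the calculation step for the antibacterial cream problem is a one-sample z test with Ho: μ = 7.6 and Ha: μ < 7.6 (faster healing). The sketch assumes the 25 volunteers can be treated as a random sample and that the sample mean is approximately normal; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 7.6, 1.4    # null mean and known standard deviation (days)
n, xbar = 25, 7.1        # sample size and observed mean healing time
alpha = 0.05             # significance level stated in the problem

z = (xbar - mu0) / (sigma / sqrt(n))   # test statistic
p_value = NormalDist().cdf(z)          # lower-tail area, since Ha: mu < 7.6

print(round(z, 2), round(p_value, 4))
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

With these numbers z is about -1.79 and the P-value is about 0.037, which is below the 5% level.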
Hypothesis Tests for a Population Proportion
According to an article in the San Gabriel Valley Tribune (2-13-03), “Most people are kissing the ‘right way’.” That is,
according to the study, the majority of couples tilt their heads to
the right when kissing.
Define p =
A researcher observed 124 couples kissing in various public places and
found that 83/124 (66.9%) of the couples tilted to the right.
Is this convincing evidence that p > .5?
What is the probability that we get a sample proportion this high by
random chance, assuming the null hypothesis is true?
Class Activity
Perform the simulation:
0-4 = kiss to the right
5-9 = kiss to the left
For each run we will generate 124 integers from 0-9 to
represent the 124 observed couples. We will then count the number of
digits from 0-4. Finally, we will compute p̂, the sample proportion of
couples that tilt to the right.
randInt(0,9,124)→L1 then set window x:scale to 5
graph the histogram then trace
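The same simulation can be sketched in Python for anyone without a calculator at hand. It mirrors the digit scheme above (0-4 = right, 5-9 = left); the number of runs (2000) and the seed are arbitrary choices, not part of the activity.

```python
import random

random.seed(2003)        # fixed seed so the run is repeatable
RUNS = 2000              # number of simulated samples (arbitrary)
N_COUPLES = 124
observed = 83 / 124      # sample proportion from the study, about .669

p_hats = []
for _ in range(RUNS):
    # One run: 124 random digits, one per couple, assuming p = .5
    digits = [random.randint(0, 9) for _ in range(N_COUPLES)]
    right = sum(1 for d in digits if d <= 4)   # digits 0-4 = tilt right
    p_hats.append(right / N_COUPLES)

# How often does chance alone give a proportion at least as high as observed?
as_extreme = sum(1 for p in p_hats if p >= observed)
estimated_p_value = as_extreme / RUNS
print(as_extreme, estimated_p_value)
```

The list `p_hats` plays the role of the calculator histogram: it shows where sample proportions fall when the null hypothesis is true.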
Can we reject the null hypothesis and conclude that the majority of
couples do tilt to the right when kissing?
P-Value
• The probability that we get a value as extreme as or more
extreme than the one we observed (assuming the null hypothesis
is true) is called a p-value.
• In the previous kissing problem, the p-value was approximately 0.
What if the p-value for the kissing problem was .23 instead of 0?
Likely vs. Unlikely to Happen by Random Chance
What is the cut-off?
Calculating P-values
In the kissing example, we start by assuming p = .5
Test conditions:
P(p̂ ≥ .669) =
Use the standard deviation of the sampling distribution:
σ_p̂ = √( p(1 − p) / n )
1-Sample z Test for Population Proportion
• Using the following conditions:
• Random sample from the population of interest
• Large sample size: np > 10 and n(1-p) > 10
• Note: when checking conditions and calculating σ_p̂, we
always use the true value (p) if we know it. Since we assume a
value for p (Ho) when doing a hypothesis test, we will always
use this value.
• With confidence intervals, we do not make any assumptions
about the true value of p, so we have to use the value of p̂ to
estimate p.
Kissing Example Revisited
Hypothesis Testing in 5 Easy Steps
1. Name the test, define the variable, and set the α level.
2. State the hypotheses.
3. State and check the conditions.
4. Calculate the test statistic and P-value (include a picture).
5. Make a decision based on the P-value (in context).
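For the calculation step of the kissing example, a minimal sketch of the 1-sample z test for a proportion follows, with Ho: p = .5 and Ha: p > .5. The α level of .05 is an assumption, since the notes leave it to be chosen, and the random-sample condition is taken for granted here rather than verified.

```python
from math import sqrt
from statistics import NormalDist

n, right = 124, 83       # couples observed, couples tilting right
p0 = 0.5                 # value of p assumed under Ho
alpha = 0.05             # ASSUMED significance level (not given in the notes)

# Large-sample condition, checked with the hypothesized p
assert n * p0 > 10 and n * (1 - p0) > 10

p_hat = right / n
sd = sqrt(p0 * (1 - p0) / n)           # standard deviation of p-hat under Ho
z = (p_hat - p0) / sd                  # test statistic
p_value = 1 - NormalDist().cdf(z)      # upper-tail area, since Ha: p > .5

print(round(p_hat, 3), round(z, 2), p_value)
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```

The z statistic comes out near 3.77, so the P-value is far below any usual α level.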
Eating Alone
Are women less likely to eat alone? Suppose that a restaurant manager
observed people eating alone at his restaurant over several days. Of
the 48 solo eaters he observed, 20 were women. Does this data give
evidence at the .01 level that women are less likely than men to eat
alone?
Teenage Births
According to the National Center for Health Statistics, 12.3% of all births
in the US were to teenagers in 1999. To see if this percentage is the
same in California, a random sample of 1000 CA births was
investigated and 111 were to teenage mothers. Can we conclude that
the percentage of teenage births is different in CA at the 10%
significance level?
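Both this problem and the Eating Alone problem use the same 1-sample z test for a proportion, differing only in the direction of Ha. The helper below, `one_prop_z_test`, is a hypothetical function written for this illustration. For Eating Alone it assumes p is the proportion of solo eaters who are women, with Ho: p = .5 and Ha: p < .5; that framing is one reading of the question, not the only possible one.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative):
    """Return (z, P-value) for a 1-sample z test of Ho: p = p0.

    alternative is "less", "greater", or "two-sided".
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # uses p0, as Ho is assumed true
    lower = NormalDist().cdf(z)
    if alternative == "less":
        p_value = lower
    elif alternative == "greater":
        p_value = 1 - lower
    else:
        p_value = 2 * min(lower, 1 - lower)      # doubled for a two-tailed test
    return z, p_value

# Eating Alone: 20 women among 48 solo eaters, Ha: p < .5, alpha = .01
z_eat, p_eat = one_prop_z_test(20, 48, 0.5, "less")
# Teenage Births: 111 of 1000 CA births, Ha: p is not .123, alpha = .10
z_teen, p_teen = one_prop_z_test(111, 1000, 0.123, "two-sided")

print(round(z_eat, 2), round(p_eat, 3))
print(round(z_teen, 2), round(p_teen, 3))
```

Under these assumptions both P-values are larger than their stated significance levels (.01 and .10), so neither test rejects Ho.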
Statistically Significant
• Results are statistically significant when they are unlikely to
happen by chance alone.
• Whenever we reject the null hypothesis, we have statistically
significant results.
• This does not mean that the results are also practically
significant.
• If results are practically significant, they usually lead to a
change of policy.
• Caution: when the sample size is really large, even very small
differences will give significant results.
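The caution about large samples can be illustrated with made-up numbers: a sample proportion of .51 tested against Ho: p = .50 with a two-sided z test. The difference of .01 is the same in every row; only the hypothetical sample size changes.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(p_hat, p0, n):
    """Two-sided P-value for a 1-sample z test of Ho: p = p0."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny difference (.51 vs .50), increasingly large samples
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p(0.51, 0.50, n))
```

With n = 100 the difference is nowhere near significant, while with n = 10,000 it already falls below .05, even though a one-point difference may have no practical importance.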
Power of a Test
The power of a test is the probability of correctly rejecting Ho when the
alternative is really true. That is, the probability of rejecting Ho when
it is false.
Decision & Action (rows: the true situation; columns: the decision)

Do not reject Ho. There isn’t a problem; the situation is as it should be.
  Decide there is no problem (Ho is true): 100% - alpha
    • chance of acquitting an innocent defendant
    • quality acceptance sampling: chance of accepting a good lot
    • SPC: chance of calling the process in control when it is
    • DOE: conclude that there is no difference between the treatments when there isn’t
  Decide that there is a problem (Ho is false): Type I risk (α), false alarm risk
    • risk of crying wolf when there isn’t one
    • risk of convicting an innocent defendant
    • quality acceptance sampling: risk of rejecting a good lot
    • SPC: risk of calling the process out of control when it is in control
    • Design of experiments (DOE): risk of concluding that there is a difference between the treatments when there isn’t

Reject Ho. There is a problem; the situation requires adjustment.
  Decide there is no problem (Ho is true): Type II risk (β), risk of missing the problem
    • risk of not seeing the wolf
    • risk of acquitting a guilty defendant
    • quality acceptance sampling: risk of shipping a bad lot
    • SPC: risk of calling the process in control when it is out of control
    • DOE: chance of missing a difference between the treatments
  Decide that there is a problem (Ho is false): Power (1 - β), a test’s ability to detect a real problem or difference
    • chance of seeing the wolf
    • chance of convicting a guilty defendant
    • quality acceptance sampling: chance of rejecting a bad lot
    • SPC: chance of calling the process out of control when it is
    • DOE: chance of detecting a difference between treatments
If Ho is false, P(Type II error) =
The greater the power, the greater the chance of detecting the truth.
High School Diploma
Suppose that in the 1990 Census, 83% of Californians had a high
school diploma. The Department of Education believes this
percentage has gone up since then so they commission a survey to
estimate the true percentage. Define the parameter of interest, state the
hypotheses, and describe each kind of error in context. Describe the
power in context.
What Affects Power?
• The power of a test will be higher if:
• 1. you increase the significance level (α)
• 2. you increase the sample size (n)
• 3. there is a large discrepancy between the null
hypothesis and the true value.
• 4. there is little variability in the population
• Which can you control?
• What are the disadvantages of options 1 & 2?
• Option 1 increases the probability of a Type I error
• Option 2 increases the cost and time required for the study
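The four factors that raise power can be checked numerically. The sketch below assumes a one-sided z test for a mean (Ha: μ > μ0) with σ known; the function `power_upper_tail` and all of the baseline numbers are invented for illustration. Each line changes one factor from the baseline.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(alpha, n, shift, sigma):
    """Power of a one-sided z test (Ha: mu > mu0) when the true mean is mu0 + shift."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)           # rejection cutoff on the z scale
    return 1 - std_normal.cdf(z_crit - shift * sqrt(n) / sigma)

base = power_upper_tail(alpha=0.05, n=25, shift=0.5, sigma=2)
bigger_alpha = power_upper_tail(alpha=0.10, n=25, shift=0.5, sigma=2)
bigger_n = power_upper_tail(alpha=0.05, n=100, shift=0.5, sigma=2)
bigger_shift = power_upper_tail(alpha=0.05, n=25, shift=1.0, sigma=2)
smaller_sigma = power_upper_tail(alpha=0.05, n=25, shift=0.5, sigma=1)

print("baseline               ", round(base, 3))
print("1. larger alpha        ", round(bigger_alpha, 3))
print("2. larger n            ", round(bigger_n, 3))
print("3. larger discrepancy  ", round(bigger_shift, 3))
print("4. less variability    ", round(smaller_sigma, 3))
```

Every changed line prints a power above the baseline, which is the pattern the list describes.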
TBBMC Problem
Can a 6-month exercise program increase the total body bone mineral
content of young women? A team of researchers is planning a study to
examine this question. Based on the results of a previous study, they
are willing to assume that σ = 2 for the percent change. A change in
TBBMC of 1% would be considered important, and the researchers
would like to have a reasonable chance of detecting a change this large
or larger. Is 25 subjects a large enough sample for this project?
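One way to approach the question is to compute the power of the test against a true change of 1%. The sketch assumes Ho: μ = 0 and Ha: μ > 0 for the mean percent change, a z test with σ = 2, and a 5% significance level; the significance level is an assumption, since the problem does not state one.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 2.0, 25
alpha = 0.05       # ASSUMED significance level (not stated in the problem)
mu_alt = 1.0       # the change the researchers want to be able to detect

se = sigma / sqrt(n)                              # standard deviation of x-bar
cutoff = NormalDist().inv_cdf(1 - alpha) * se     # reject Ho if x-bar > cutoff

# Power: chance that x-bar exceeds the cutoff when the true mean is 1%
power = 1 - NormalDist(mu=mu_alt, sigma=se).cdf(cutoff)
print(round(cutoff, 3), round(power, 3))
```

Under these assumptions the test rejects Ho when the sample mean exceeds about 0.658, and the power against a 1% change is about 0.80. Whether that counts as a reasonable chance is a judgment for the researchers.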
Summary
What is the relationship between confidence intervals and hypothesis
tests?
Hypothesis tests are designed to answer the questions:
Confidence intervals are designed to answer the question:
Chapter 10b problems: # 27, 29, 30, 31, 33, 38, 42, 44, 49, 58, 61, 63, 65, 66, 68, 73, 74
In Quest of the Perfect Hypothesis Test
State what kind of distribution is appropriate for the data
Normal
t
Binomial
Geometric
Chi-Square
Other
AND state what you are comparing
A population to a sample
...called a 1 sample ______ distribution (fill in the blank with the
word “normal”, “z”, “Binomial”, etc.)
OR are you comparing two samples to each other
...called a 2 sample ______ distribution
Choose type of test:
1 tailed test (< or >) or a 2 tailed test (not equal)
a sketch is a good idea
Use correct format of Ho and Ha
Ho: ______ = ______.
Ha: ______ >, <, or ≠ ______.
Or state null and alternative hypothesis using sentences.
In general, use Greek symbols in hypothesis, not regular alphabetic symbols
****if you are going to set an α , do it HERE
Correct calculation of standard deviation
This is determined by knowing which distribution you choose.
Correctly calculate test statistic
The z score, or t, or chi-square, etc
Correct P-value (Doubled for two tailed)
Correct decision
In general, a small P-value indicates you should reject Ho. A large P-value
indicates you should fail to reject.
Correct and complete statement of conclusion, restating the question.
Example of a perfect conclusion:
“I reject the null hypothesis. Since the probability of having a sample mean of 20.1 is so
small when the population mean is 18.2, (P-value of .0002) I conclude that Logan High’s
mean ACT is truly different from the national mean ACT.”
Or:
“I fail to reject the null hypothesis. There is not enough evidence to show that the
sample mean of 18.1 is different from the population mean of 18.2. My P-value of .31
indicates I could have gotten results this close to the population mean in random samples
31% of the time.”
Always use excellent organization, clarity and readability.