Extra PowerPoint: Significance Testing, Means and Proportions

Tests of Significance
Statistical Significance

An observed effect so large that it would
rarely occur by chance is called
statistically significant.
The Test of Significance

The test of significance asks the question:
• “Does the statistic result from a real difference from the supposition?”
or
• “Does the statistic result from just chance variation?”
Example
I claim that I make 80% of my free throws.
• To test my claim, you ask me to shoot 20 free throws.
• I make only 8 out of 20.
• You respond: “I don’t believe your claim. It is unlikely that an 80% shooter makes only 8 of 20.”

Significance Test Procedure

Step 1: Define the population and parameter
of interest. State null and alternative
hypotheses in words and symbols.
• Population: My free throw shots.
• Parameter of interest: proportion of made shots.
• Suppose I am an 80% shooter. This is a hypothesis, and we think that it is false. So we’ll call it the null hypothesis, and use the symbol H0 (pronounced “H-nought”). H0: p=.8
• You are trying to show that I’m worse than an 80% shooter. Your alternative hypothesis is: Ha: p<.8
Significance Test Procedure

Step 2: Choose the appropriate inference
procedure. Verify the conditions for using
the selected procedure.
• We are going to use the Binomial Distribution:
  • Each trial has either success or failure.
  • Set number of trials.
  • Trials are independent.
  • Probability of success is constant.
Significance Test Procedure

Step 3: Calculate the P-value. The P-value
is the probability that our sample statistic
is at least that extreme, assuming that H0 is true.
• Look at Ha to calculate “What is the probability of making 8 or fewer shots out of 20?”
• X is the number of shots made.
• P(X≤8)=.0001017=binomcdf(20,.8,8)
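The calculator call binomcdf(20,.8,8) can be checked by summing the binomial probabilities directly. The sketch below is one way to do that in Python; the function name binom_cdf is made up for this illustration and is not part of the original slides.

```python
from math import comb


def binom_cdf(n: int, p: float, k: int) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p) by summing each term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Free-throw example: 20 shots, claimed success rate 0.8, 8 shots made.
p_value = binom_cdf(20, 0.8, 8)
print(f"P(X <= 8) = {p_value:.7f}")
```

The printed value should agree with the binomcdf result on the slide to the digits shown there.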
Significance Test Procedure

Step 4: Interpret the results in the context of
the problem.
• You reject H0 because the probability of being an 80% shooter and making only 8 of 20 shots is extremely low. You conclude that Ha is correct; the true proportion is less than 80%.

There are only two possibilities at this step:
• “We reject H0 because the probability is so low. We accept Ha.”
• “We fail to reject H0 because the probability is not low enough.”
Significance Test Procedure
1. Identify the population of interest and the parameter you want to draw conclusions about. State null and alternate hypotheses.
2. Choose the appropriate procedure. Verify the conditions for using the selected procedure.
3. If the conditions are met, carry out the inference procedure.
   • Calculate the test statistic.
   • Find the P-value.
4. Interpret your results in the context of the problem.
Example
Diet colas use artificial sweeteners to avoid
sugar. These sweeteners gradually lose their
sweetness over time. Manufacturers
therefore test new colas for loss of sweetness
before marketing them. Trained tasters sip the
cola along with drinks of standard sweetness
and score the cola on a “sweetness score” of
1 to 10. The cola is then stored for a month at
high temperature to imitate the effect of four
months’ storage. Each taster scores the cola
again after storage.
• What kind of experiment is this?

Example
Here’s the data:
• 2.0, .4, .7, 2.0, -.4, 2.2, -1.3, 1.2, 1.1, 2.3
• Positive scores indicate a loss of sweetness.
• Are these data good evidence that the cola lost sweetness in storage?

Significance Test Procedure

Step 1: Define the population and parameter
of interest. State null and alternative
hypotheses in words and symbols.
• Population: Diet cola.
• Parameter of interest: mean sweetness loss.
• Suppose there is no sweetness loss (nothing special going on). H0: µ=0.
• You are trying to find out whether there was sweetness loss. Your alternative hypothesis is: Ha: µ>0.
Significance Test Procedure

Step 2: Choose the appropriate inference
procedure. Verify the conditions for using the
selected procedure.
• We are going to use the distribution of the sample mean:
  • Do the samples come from an SRS?
    • We don’t know.
  • Is the population at least ten times the sample size?
    • Yes.
  • Is the population normally distributed, or is the sample size at least 25?
    • We don’t know if the population is normally distributed, and the sample is not big enough for the CLT to come into play.
Significance Test Procedure

Step 3: Calculate the test statistic and the P-value. The P-value is the probability that our
sample statistic is at least that extreme, assuming
that H0 is true.
• x-bar=1.02, σ=1, µ=0
• Look at Ha to calculate “What is the probability of having a sample mean greater than 1.02?”
• z=(1.02-0)/(1/root(10))=3.226
• P(Z>3.226)≈.00063=normalcdf(3.226,1E99)
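The z calculation for the cola data can be reproduced with a few lines of Python. This is only a sketch: it takes σ=1 as given on the slide, and it gets the upper-tail normal area from math.erfc rather than from a calculator's normalcdf. The variable names are chosen for this illustration.

```python
from math import erfc, sqrt
from statistics import mean

# Sweetness losses for the ten tasters (from the slide).
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]
sigma = 1.0  # population standard deviation, taken as known on the slide
mu0 = 0.0    # mean sweetness loss under H0

x_bar = mean(losses)
z = (x_bar - mu0) / (sigma / sqrt(len(losses)))

# Upper-tail area of the standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
p_value = 0.5 * erfc(z / sqrt(2))
print(f"x-bar = {x_bar:.2f}, z = {z:.3f}, P(Z > z) = {p_value:.5f}")
```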
Significance Test Procedure

Step 4: Interpret the results in the context of
the problem.
• You reject H0 because the probability of having a sample mean of 1.02 or greater, if H0 were true, is very small. We therefore accept the alternative hypothesis; we think the colas lost sweetness.
Assignment

Against All Odds Video www.learner.org,
Episode 20.
Making Sense of
Statistical Significance
Choosing a level of significance
(alpha level)

How plausible is H0?
• Depending on how plausible H0 is, you may choose a smaller alpha.
• If H0 is very plausible, you will need to collect “more” evidence to reject it.
What are the consequences of
rejecting H0?

If rejecting H0 would
• cost lots of money
• possibly cost lives
• cost jobs

then alpha is usually very small.
Fishing for significance



• Let’s say we’re trying to find a connection between eating habits and intelligence.
• Choose 40 foods, assign people to increase the amount of those foods they eat, and see if there are any foods that make people smarter.
• Of the 40 foods, we find that Peeps and green beans make you smarter with alpha=.05. Is this a problem?
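One way to think about the question is to ask how many "significant" foods we would expect to see by chance alone. The sketch below assumes, purely for illustration, that none of the 40 foods has any real effect and that the 40 tests are independent; neither assumption is stated on the slide.

```python
alpha = 0.05   # significance level used for each food
n_tests = 40   # number of foods tested

# Expected number of tests that reach significance by chance alone.
expected_false_positives = n_tests * alpha

# Probability that at least one of the tests reaches significance by chance.
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive) = {p_at_least_one:.3f}")
```

Under those assumptions, finding two "significant" foods out of 40 is about what chance alone would produce.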
Inference for the
Mean of a Population
The t statistic


• The t statistic is used when we don’t know the standard deviation of the population, and instead we use the sample standard deviation s as an estimate.
• The t statistic has n-1 degrees of freedom (df).

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
The t statistic




• The t critical value is bigger than the z critical value for the same confidence level.
• We say that the t distribution is a more conservative distribution.
• There is more area in the tails.
• The t statistic has n-1 degrees of freedom.

$CI = \bar{x} \pm t^{*} \dfrac{s}{\sqrt{n}}$
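As a sketch of how the t confidence interval formula is used, the code below applies it to the ten cola sweetness-loss scores from the earlier example. Two things here are assumptions made for illustration: the population standard deviation is treated as unknown (the earlier slide used σ=1), and a 95% confidence level is chosen, with the critical value t*=2.262 for df=9 read from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev

# Sweetness losses from the cola example.
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

n = len(losses)
x_bar = mean(losses)
s = stdev(losses)   # sample standard deviation (divides by n - 1)
t_star = 2.262      # assumed: 95% critical value for df = n - 1 = 9, from a t table

margin = t_star * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"x-bar = {x_bar:.2f}, s = {s:.3f}")
print(f"95% CI for the mean: ({lower:.3f}, {upper:.3f})")
```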
The t statistic



• In statistical tests of significance, we still have H0 and Ha.
• We need to provide the mu (the value of µ under H0) in the calculation of the t statistic.
• Looking at the t table is fundamentally different from looking at the z table.

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
Example: Mr. Young Mopping



• Let’s suppose that Mr. Young has been told that he should mop by 25 after 1 (1:25 p.m.).
• We collect a sample of 12 mopping times with an average of 27.58 minutes after 1 p.m. and a standard deviation of 3.848 minutes.
• Is this evidence that his true mean is after 1:25?

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
Step 1: Mr. Young Mopping

Population of interest:
• Mr. Young’s mopping
Parameter of interest:
• average time of arrival during mopping
Hypotheses:
• H0: µ=25
• Ha: µ>25
Step 2: Mr. Young Mopping


• We are using a 1-sample t-test.
• SRS?
  • No. Proceed with caution.
• Normality?
  • Big sample size (> 40)? No, n = 12.
  • Sample is somewhat normal because the sample distribution is single peaked, with no obvious outliers.
• Population size is at least 10 times the sample size?
  • We assume that Mr. Young has done a lot of mopping.
Step 3: Mr. Young Mopping

Calculate the test statistic, and calculate the p-value.

$t = \dfrac{27.58 - 25}{3.848 / \sqrt{12}} \approx 2.322$

$P(t \ge 2.322)$ is between .02 and .025 (df = 11).
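The t statistic above is easy to recompute, and the table lookup can be cross-checked numerically. The sketch below integrates the t density with Simpson's rule, since Python's standard library has no t distribution; the helper names t_pdf and t_upper_tail are invented for this illustration, and a statistics package would normally do this step.

```python
from math import gamma, pi, sqrt


def t_pdf(t: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T >= t) for t >= 0 with Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the area lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3


# Mr. Young example: n = 12, x-bar = 27.58, s = 3.848, H0: mu = 25.
n, x_bar, s, mu0 = 12, 27.58, 3.848, 25.0
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.3f}, P(T >= t) = {p_value:.4f}")
```

The numerical p-value should land inside the interval read from the t table.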
Inference for a
Population Proportion
Parameters vs Statistics

Parameters
• Mean: µ
• Standard Deviation: σ
• Proportion: p

Statistics
• Mean: x-bar
• Standard Deviation: s
• Proportion: p-hat
What we know about inference

• We are trying to make sense of what is happening at the population level by looking at sample data.
  • Step 1: “What is the population and the parameter of interest?”
• We make assumptions in the form of H0.
  • Step 1: “What is H0?”
• We need to know about the distribution of the sample statistic.
  • Step 2: “Is the distribution of sample means normal?”
Our inferential work so far…

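Has been about the distribution of sample means and the distribution of the difference of sample means.

$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2, \qquad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$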
But what about proportions?

We learned in Chapter 9 about the distribution of sample proportions.

$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$
But what about proportions?

We know that the distribution of sample proportions is approximately normal when these conditions are met…
• np>10
• nq>10

Test statistic = (statistic - parameter) / (standard dev. of statistic)

$z = \dfrac{\hat{p} - p}{\sqrt{pq/n}}$
Simulation
A recent study concluded that 25% of all
U.S. teenage females have an STD.
• Simulate sampling 500 randomly chosen teenage females using…
  randBin(500,.25)
• Simulate finding the sample proportion by using…
  randBin(500,.25)/500
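For anyone without a calculator handy, the same simulation can be sketched in Python. The function rand_bin below is a stand-in written for this illustration (it is not a library function) that mimics randBin by counting successes in n independent trials; the seed is arbitrary and only makes the run repeatable.

```python
import random


def rand_bin(n: int, p: float, rng: random.Random) -> int:
    """Count successes in n independent trials, each with success probability p."""
    return sum(rng.random() < p for _ in range(n))


rng = random.Random(2024)  # arbitrary seed so the run can be repeated

count = rand_bin(500, 0.25, rng)  # like randBin(500,.25)
p_hat = count / 500               # like randBin(500,.25)/500
print(f"count = {count}, sample proportion = {p_hat:.3f}")
```

Running it several times with different seeds gives a feel for how much the sample proportion varies around .25.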
Test of significance
(This is made up.) A recent sample of 500
female teenagers from southeastern
Oakland County found that 22% have an
STD.
• Is this strong evidence to suggest that teenage females from SE Oakland County have a lower infection rate than the national average?

1: Population, Parameter of Interest,
H0 and Ha
2: Procedure Name & Conditions
3: Calculations
4: Interpret
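The calculation step for this made-up sample might look like the sketch below. It assumes H0: p=.25 and Ha: p<.25, checks the np>10 and nq>10 conditions, and uses the one-proportion z statistic from the earlier slide. The variable names are chosen for this illustration, and the interpretation step is still left to the reader.

```python
from math import erfc, sqrt

n = 500
p0 = 0.25     # national proportion, used as the value of p under H0
p_hat = 0.22  # sample proportion from the made-up sample

# Conditions for the normal approximation: np > 10 and nq > 10.
np0, nq0 = n * p0, n * (1 - p0)
conditions_met = np0 > 10 and nq0 > 10

# z = (p-hat - p) / sqrt(pq / n), with p and q taken from H0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Lower-tail area, because Ha is p < .25: P(Z < z) = erfc(-z / sqrt(2)) / 2
p_value = 0.5 * erfc(-z / sqrt(2))
print(f"conditions met: {conditions_met}, z = {z:.3f}, P(Z < z) = {p_value:.4f}")
```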
Confidence Intervals
CI = statistic ± (critical value) × (standard dev. of statistic)

$CI = \hat{p} \pm z^{*} \sqrt{\dfrac{\hat{p}\hat{q}}{n}}$
Calculate the Confidence Interval
1: Population & Parameter of Interest
2: Procedure Name & Conditions
3: Calculations
4: Interpret
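The calculation step could be sketched as follows. The slide does not say which sample or confidence level to use, so this example assumes the made-up Oakland County sample (p-hat=.22, n=500) and a 95% level; z* comes from statistics.NormalDist in the Python standard library.

```python
from math import sqrt
from statistics import NormalDist

n = 500
p_hat = 0.22       # assumed: the made-up sample proportion from the earlier slide
q_hat = 1 - p_hat
confidence = 0.95  # assumed confidence level

# Critical value z* leaves (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * sqrt(p_hat * q_hat / n)
lower, upper = p_hat - margin, p_hat + margin
print(f"z* = {z_star:.3f}, 95% CI for p: ({lower:.4f}, {upper:.4f})")
```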