Download t - Wharton Statistics

Lecture 4
• Chapter 11 wrap-up
• Chapter 12.2 - Inference about the mean
when the s.d. is unknown
• Chapter 12.3 – Inference about a population
proportion
Hypothesis Testing – Basic Steps
1. Set up alternative and null hypotheses
2. Calculate test statistic, e.g. z-score
3. Find critical values and compare the test
statistic to critical value (rejection region
method) or find p-value (p-value method)
4. Make substantive conclusions.
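Right-, Left-, Two-Sided Tests
• Right-sided:
$H_1: \mu > \mu_0$; $\bar{x}_L = \mu_0 + z_\alpha \sigma/\sqrt{n}$; rej.: $\bar{x} > \bar{x}_L$
• Left-sided:
$H_1: \mu < \mu_0$; $\bar{x}_L = \mu_0 - z_\alpha \sigma/\sqrt{n}$; rej.: $\bar{x} < \bar{x}_L$
• 2-sided:
$H_1: \mu \ne \mu_0$; $\bar{x}_L^{\pm} = \mu_0 \pm z_{\alpha/2} \sigma/\sqrt{n}$; rej.: $\bar{x} > \bar{x}_L^{+}$ or $\bar{x} < \bar{x}_L^{-}$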
Summary: Steps in Testing
• Determine $H_0$ ($\mu = \mu_0$) and $H_1$ (right // left // 2-sided),
and decide on a significance level $\alpha$.
• Rejection region method: calculate $\bar{x}$ and $\bar{x}_L$ ($\bar{x}_L^{\pm}$);
reject if $\bar{x} > \bar{x}_L$ // $\bar{x} < \bar{x}_L$ // $\bar{x} > \bar{x}_L^{+}$ or $\bar{x} < \bar{x}_L^{-}$.
• P-value method: calculate $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$
=> P(Z>z) // P(Z<z) // P(|Z|>|z|) from z-tables,
or "Prob>z" // "Prob<z" // "Prob>|z|" from JMP;
reject if p-value < $\alpha$.
• Interpret the result and tell a story.
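The two methods in the summary can be sketched in a few lines of Python. This is only an illustration: the function name `z_test`, the `tail` argument, and the input numbers are assumptions made for the example, not part of the lecture, and `statistics.NormalDist` stands in for the z-table or JMP.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="right"):
    """One-sample z-test (sigma known) by both methods in the summary."""
    std_normal = NormalDist()          # standard normal Z
    se = sigma / sqrt(n)               # standard error of the sample mean
    z = (xbar - mu0) / se
    if tail == "right":
        reject_region = xbar > mu0 + std_normal.inv_cdf(1 - alpha) * se
        p_value = 1 - std_normal.cdf(z)             # P(Z > z)
    elif tail == "left":
        reject_region = xbar < mu0 - std_normal.inv_cdf(1 - alpha) * se
        p_value = std_normal.cdf(z)                 # P(Z < z)
    elif tail == "two":
        half_width = std_normal.inv_cdf(1 - alpha / 2) * se
        reject_region = abs(xbar - mu0) > half_width
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # P(|Z| > |z|)
    else:
        raise ValueError("tail must be 'right', 'left' or 'two'")
    return {"z": z, "p_value": p_value,
            "reject_by_region": reject_region,
            "reject_by_p_value": p_value < alpha}

# Hypothetical inputs, chosen only to exercise the function.
result = z_test(xbar=178, mu0=170, sigma=65, n=400, alpha=0.05, tail="right")
print(result)
```

Both methods give the same decision for any input, since they are two ways of writing the same comparison.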
Relationship Between CIs and
Hypothesis Tests
• There is a duality between confidence intervals
and hypothesis tests
• We can construct a level $\alpha$ hypothesis test based
on a level $100(1-\alpha)\%$ confidence interval by
rejecting $H_0: \mu = \mu_0$ if and only if $\mu_0$ is not in the
confidence interval
• We can construct a level $100(1-\alpha)\%$ confidence
interval based on a level $\alpha$ hypothesis test by
including $\mu_0$ in the confidence interval if and only
if the test does not reject $H_0: \mu = \mu_0$
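A small numerical check of this duality, for the two-sided z-test with known standard deviation. The helper names and all numbers below are hypothetical and serve only to illustrate the statement above.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Level 100(1 - alpha)% confidence interval for the mean, sigma known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def two_sided_z_test_rejects(xbar, mu0, sigma, n, alpha=0.05):
    """True if the level-alpha two-sided z-test rejects H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

# Hypothetical sample summary.
xbar, sigma, n = 178.0, 65.0, 400
low, high = z_confidence_interval(xbar, sigma, n)

rows = []
for mu0 in (170.0, 175.0, 180.0, 185.0):
    in_interval = low <= mu0 <= high
    rejected = two_sided_z_test_rejects(xbar, mu0, sigma, n)
    rows.append((mu0, in_interval, rejected))
    print(f"mu0={mu0}: in CI={in_interval}, test rejects={rejected}")
```

For every candidate value, "inside the interval" and "rejected" come out as opposites, which is the duality in miniature.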



Calculation of Type II error
1. State alternative for which you want to find
P(Type II error).
2. Find rejection region in terms of unstandardized
statistic (sample mean)
3. Find the probability of the sample mean falling
outside the rejection region if the alternative
under consideration is true (use standardization
relative to the alternative hypothesis mean to
calculate this probability).
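The three steps can be sketched as follows for one-sided z-tests. The function name and the input values (null mean 170, alternative mean 180, standard deviation 65, n = 400) are assumptions chosen for illustration, not figures from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha=0.05, tail="right"):
    """P(Type II error) of a one-sided z-test when the true mean is mu1."""
    std_normal = NormalDist()
    se = sigma / sqrt(n)
    z_alpha = std_normal.inv_cdf(1 - alpha)
    if tail == "right":
        x_l = mu0 + z_alpha * se                       # reject if xbar > x_l
        beta = std_normal.cdf((x_l - mu1) / se)        # P(xbar < x_l | mu1)
    elif tail == "left":
        x_l = mu0 - z_alpha * se                       # reject if xbar < x_l
        beta = 1 - std_normal.cdf((x_l - mu1) / se)    # P(xbar > x_l | mu1)
    else:
        raise ValueError("tail must be 'right' or 'left'")
    return x_l, beta

# Step 1: alternative under consideration (hypothetical): mu1 = 180.
# Steps 2 and 3 happen inside the function.
x_l, beta = type_ii_error(mu0=170, mu1=180, sigma=65, n=400, alpha=0.05)
print(f"x_L = {x_l:.2f}, beta = {beta:.4f}, power = {1 - beta:.4f}")
```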
Summary: Power Calculations
• Works only for the rejection region method,
and we don't do it for 2-sided tests.
• Calculate $\bar{x}_L$ for the level-$\alpha$ test, then
$z = \dfrac{\bar{x}_L - \mu_1}{\sigma/\sqrt{n}}$
• Right-sided: P(Z<z) from z-table; $\beta$ = P(Z<z)
• Left-sided: P(Z>z) from z-table; $\beta$ = P(Z>z)
Frequent z-values
$\alpha$       0.10   0.05   0.025  0.01   0.005
$z_\alpha$     1.28   1.64   1.96   2.33   2.58
Practice Problems
• 11.68, 11.84, 12.40, 12.46
Chapter 12
• In this chapter we utilize the approach
developed before to describe a population.
– Identify the parameter to be estimated or tested.
– Specify the parameter’s estimator and its
sampling distribution.
– Construct a confidence interval estimator or
perform a hypothesis test.
12.2 Inference About a Population
Mean When the Population Standard
Deviation Is Unknown
Recall that when $\sigma$ is known we use the following
statistic to estimate and test a population mean
$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
When $\sigma$ is unknown, we use its point estimator s,
and the z-statistic is then replaced by the t-statistic
t-Statistic
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
• When the sampled population is normally distributed, the t
statistic is Student t distributed with n-1 degrees of
freedom.
• Confidence Interval: $\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}$,
where $t_{\alpha/2,\,n-1}$ is the upper $\alpha/2$ quantile of the
Student t-distribution with n-1 degrees of freedom
(the value with area $\alpha/2$ to its right).
The t-Statistic
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
The t distribution is mound-shaped,
and symmetrical around zero.
The "degrees of freedom"
(a function of the sample size)
determine how spread the
distribution is (compared to the
normal distribution)
[Figure: t densities centered at 0 for d.f. = v1 and d.f. = v2, with v1 < v2; the right-tail area A = .05 lies beyond tA]
Degrees of Freedom   t.100    t.05     t.025    t.01     t.005
1                    3.078    6.314    12.706   31.821   63.657
2                    1.886    2.920    4.303    6.965    9.925
.                    .        .        .        .        .
20                   1.325    1.725    2.086    2.528    2.845
.                    .        .        .        .        .
200                  1.286    1.653    1.972    2.345    2.601
∞                    1.282    1.645    1.960    2.326    2.576
Testing $\mu$ when $\sigma$ is unknown
• Example 12.1
– In order to determine the number of workers required to
meet demand, the productivity of newly hired trainees
is studied.
– It is believed that trainees can process and distribute
more than 450 packages per hour within one week of
hiring.
– Fifty trainees were observed for one hour. In this
sample of 50 trainees, the mean number of packages
processed is 460.38 and s=38.82.
– Can we conclude that the belief is correct, based on the
productivity observation of 50 trainees?
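A sketch of the calculation for Example 12.1, testing $H_0: \mu = 450$ against $H_1: \mu > 450$. The lecture would read the p-value from a t-table or JMP; here, as a stand-in that needs only the standard library, the tail area is obtained by integrating the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `t_upper_tail` and the 5% significance level are assumptions for this illustration.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t): Simpson's rule on [0, |t|] plus symmetry around zero."""
    a = abs(t)
    h = a / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # P(0 < T < |t|)
    tail = 0.5 - central
    return tail if t >= 0 else 1 - tail

# Sample summary from Example 12.1.
n, xbar, s, mu0 = 50, 460.38, 38.82, 450
alpha = 0.05                           # assumed significance level

t_stat = (xbar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```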
Checking the required conditions
• In deriving the test and confidence interval, we have made
two assumptions: (i) the sample is a random sample from
the population; (ii) the distribution of the population is
normal.
• The t test is robust – the results are still approximately
valid as long as the population is not extremely nonnormal.
Also if the sample size is large, the results are
approximately valid.
• A rough graphical approach to examining normality is to
look at the sample histogram.
[Figure: JMP histogram of Packages, axis from 350 to 550]
JMP Example
• Problem 12.45: Companies that sell groceries over
the Internet are called e-grocers. Customers enter
their orders, pay by credit card, and receive
delivery by truck. A potential e-grocer analyzed
the market and determined that to be profitable the
average order would have to exceed $85. To
determine whether an e-grocer would be profitable
in one large city, she offered the service and
recorded the size of the order for a random sample
of customers. Can we infer from the data that e-grocery
will be profitable in this city at
significance level 0.05?
12.3 Inference About a Population
Variance
• Sometimes we are interested in making
inference about the variability of processes.
• Examples:
– The consistency of a production process for
quality control purposes.
– Investors use variance as a measure of risk.
• To draw inference about variability, the
parameter of interest is $\sigma^2$.
12.3 Inference About a Population
Variance
• The sample variance $s^2$ is an unbiased, consistent
and efficient point estimator for $\sigma^2$.
• The statistic $\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}$ has a distribution called
Chi-squared, with d.f. = n - 1, if the population is normally
distributed.
[Figure: chi-squared densities for d.f. = 5 and d.f. = 10]
Confidence Interval for
Population Variance
• From the following probability statement
$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$
we have (by substituting $\chi^2 = (n-1)s^2/\sigma^2$)
$\dfrac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
Testing the Population Variance
• Example 12.3 (operation management application)
– A container-filling machine is believed to fill 1 liter
containers so consistently, that the variance of the
filling will be less than 1 cc (.001 liter).
– To test this belief a random sample of 25 1-liter fills
was taken, and the results recorded (Xm12-03).
$s^2 = 0.8659$.
– Do these data support the belief that the variance is less
than 1cc at 5% significance level?
– Find a 99% confidence interval for the variance of fills.
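A sketch of both parts of Example 12.3 using only the standard library. Chi-squared probabilities are approximated by Simpson's rule and quantiles by bisection, which replaces the chi-squared table or JMP; the helper names are assumptions made for this illustration, and the integration is intended for d.f. > 2.

```python
from math import exp, gamma

def chi2_pdf(x, df):
    """Density of the chi-squared distribution with df degrees of freedom."""
    return x ** (df / 2 - 1) * exp(-x / 2) / (2 ** (df / 2) * gamma(df / 2))

def chi2_cdf(x, df, steps=2000):
    """P(chi2 < x) by Simpson's rule on [0, x]; intended for df > 2."""
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return total * h / 3

def chi2_quantile(prob, df, upper=200.0):
    """Value x with P(chi2 < x) = prob, found by bisection on [0, upper]."""
    lo, hi = 0.0, upper
    for _ in range(50):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Sample summary from Example 12.3.
n, s2, sigma2_0 = 25, 0.8659, 1.0
df = n - 1

# Left-sided test of H0: sigma^2 = 1 against H1: sigma^2 < 1.
chi2_stat = df * s2 / sigma2_0
p_left = chi2_cdf(chi2_stat, df)
print(f"chi2 = {chi2_stat:.4f}, P(chi2 < stat) = {p_left:.4f}")

# 99% confidence interval for sigma^2.
alpha = 0.01
lcl = df * s2 / chi2_quantile(1 - alpha / 2, df)   # divide by upper critical value
ucl = df * s2 / chi2_quantile(alpha / 2, df)       # divide by lower critical value
print(f"99% CI for the variance: ({lcl:.3f}, {ucl:.3f})")
```

The statistic and the left-tail probability can be compared with the "Test Statistic" and "Prob < ChiSq" entries in the JMP output for this example.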
JMP implementation of two-sided test
[Figure: JMP histogram of Fills, axis from 998.0 to 1001.5]
Test Standard Deviation = value
Hypothesized Value   1
Actual Estimate      0.93054
df                   24

                     ChiSquare
Test Statistic       20.7816
Prob > |ChiSq|       0.6969
Prob < ChiSq         0.3484
Prob > ChiSq         0.6516
12.4 Inference About a
Population Proportion
• When the population consists of nominal data (e.g.,
does the customer prefer Pepsi or Coke), the only
inference we can make is about the proportion of
occurrence of a certain value.
• When there are two categories (success and
failure), the parameter p describes the proportion of
successes in the population. The probability of
obtaining X successes in a random sample of size n
from a large population can be calculated using the
binomial distribution.
12.4 Inference About a
Population Proportion
• Statistic and sampling distribution
– the statistic used when making inference about p is:
$\hat{p} = \dfrac{x}{n}$, where
x = the number of successes,
n = sample size.
– Under certain conditions, [np > 5 and n(1-p) > 5],
$\hat{p}$ is approximately normally distributed, with
$\mu = p$ and $\sigma^2 = p(1-p)/n$.
Testing and Estimating the
Proportion
• Test statistic for p
$Z = \dfrac{\hat{p} - p}{\sqrt{p(1-p)/n}}$
where np > 5 and n(1-p) > 5
• Interval estimator for p ($1-\alpha$ confidence
level)
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
provided $n\hat{p} > 5$ and $n(1-\hat{p}) > 5$
Testing the Proportion
• Example 12.5 (Predicting the winner on
election day)
– Voters are asked by a certain network to participate
in an exit poll in order to predict the winner on
election day.
– The exit poll consists of 765 voters. 407 say that
they voted for the Republican candidate.
– The polls close at 8:00. Should the network
announce at 8:01 that the Republican candidate will
win?
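A sketch of the calculation for Example 12.5, testing $H_0: p = .5$ against $H_1: p > .5$ with the z-statistic for a proportion. The choice of a right-sided test at the 5% level is an assumption for this illustration, and `statistics.NormalDist` replaces the z-table.

```python
from math import sqrt
from statistics import NormalDist

# Exit poll data from Example 12.5.
x, n = 407, 765
p0 = 0.5          # H0: p = .5; H1: p > .5 (assumed right-sided test)
alpha = 0.05      # assumed significance level

# Condition for the normal approximation under H0.
assert n * p0 > 5 and n * (1 - p0) > 5

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)      # P(Z > z)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```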
Selecting the Sample Size to
Estimate the Proportion
• Recall: The confidence interval for the proportion
is
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
• Thus, to estimate the proportion to within W, we
can write
$W = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
• The required sample size is:
$n = \left(\dfrac{z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})}}{W}\right)^2$
Sample Size to Estimate the
Proportion
• Example
– Suppose we want to estimate the proportion of
customers who prefer our company's brand to within
.03 with 95% confidence.
– Find the sample size needed.
– Solution
W = .03; $1 - \alpha$ = .95,
therefore $\alpha/2$ = .025,
so $z_{.025}$ = 1.96
$n = \left(\dfrac{1.96\sqrt{\hat{p}(1-\hat{p})}}{.03}\right)^2$
Since the sample has not yet
been taken, the sample proportion
is still unknown.
We proceed using either one of the
following two methods:
Sample Size to Estimate the
Proportion
• Method 1:
– There is no knowledge about the value of $\hat{p}$
• Let $\hat{p} = .5$. This results in the largest possible n needed for a
$1-\alpha$ confidence interval of the form $\hat{p} \pm .03$.
• If the sample proportion does not equal .5, the actual W will be
narrower than .03 with the n obtained by the formula below.
$n = \left(\dfrac{1.96\sqrt{.5(1-.5)}}{.03}\right)^2 \approx 1,068$
• Method 2:
– There is some idea about what $\hat{p}$ will turn out to be.
• Use a probable value of $\hat{p}$ to calculate the sample size
$n = \left(\dfrac{1.96\sqrt{.2(1-.2)}}{.03}\right)^2 \approx 683$

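The two sample-size calculations can be checked with a short function. The name `sample_size_for_proportion` and its parameters are assumptions for this sketch; the result is rounded up, since a fractional observation cannot be sampled.

```python
from math import ceil, sqrt

def sample_size_for_proportion(w, p_guess=0.5, z=1.96):
    """Smallest n that estimates a proportion to within w.

    p_guess is the planning value for the sample proportion
    (.5 is the most conservative choice); z is the critical value z_{alpha/2}.
    """
    return ceil((z * sqrt(p_guess * (1 - p_guess)) / w) ** 2)

n_method_1 = sample_size_for_proportion(w=0.03, p_guess=0.5)   # no prior knowledge
n_method_2 = sample_size_for_proportion(w=0.03, p_guess=0.2)   # probable value .2
print(n_method_1, n_method_2)
```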
Practice Problems
• 12.40, 12.46, 12.58, 12.77, 12.98