Chapter 18 & 20: Inferences
Involving One Population
[Figure: Student’s t-distribution curves for df = 25, df = 15, and df = 5, plotted against t and centered at 0]
Chapter Goals
• Learned about confidence intervals
• Assumed σ was known
• Consider inference about p, the probability of success
• Consider inference about μ when σ is unknown
Inferences About the
Probability of Success
• Possibly the most common inference of all
• Many examples of situations in which we are
concerned about something either happening or not
happening
• Two possible outcomes, and multiple independent
trials
Background
1. p: the binomial parameter, the probability of success on a
single trial
2. p': the observed or sample binomial probability,
$$p' = \frac{x}{n}$$
where x represents the number of successes that occur in a sample
consisting of n trials
3. For the binomial random variable x:
$$\mu = np, \qquad \sigma = \sqrt{npq}, \qquad \text{where } q = 1 - p$$
4. The distribution of x is approximately normal if n is larger than 20
and if np and nq are both larger than 5
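The background quantities can be checked numerically. The following is a minimal Python sketch, not part of the original slides; the function name `binomial_summary` and the values n = 50, p = 0.3 are illustrative assumptions.

```python
import math

def binomial_summary(n, p):
    """Return the mean, standard deviation, and rule-of-thumb check
    for a binomial count x with n trials and success probability p."""
    q = 1 - p
    mean = n * p                  # mu = np
    sd = math.sqrt(n * p * q)     # sigma = sqrt(npq)
    # Rule of thumb from the slide: n > 20, np > 5, and nq > 5
    approx_normal = n > 20 and n * p > 5 and n * q > 5
    return mean, sd, approx_normal

# Hypothetical values chosen only for illustration
mean, sd, ok = binomial_summary(50, 0.3)
print(mean, round(sd, 3), ok)
```

The third return value only reports whether the slide's guideline is met; it is a screening rule, not a formal test of normality.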
Sampling Distribution of p'
Sampling Distribution of p': If a sample of size n is randomly
selected from a large population with p = P(success), then the
sampling distribution of p' has:
1. a mean $\mu_{p'}$ equal to p,
2. a standard error $\sigma_{p'}$ equal to $\sqrt{pq/n}$, and
3. an approximately normal distribution if n is sufficiently large
In practice, the normal approximation is considered adequate when the following guidelines are met:
1. The sample size is greater than 20
2. The sample consists of less than 10% of the population
3. The products np and nq are both larger than 5
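A small simulation can illustrate these properties. The sketch below is not from the slides; it assumes hypothetical values n = 100 and p = 0.3, uses a fixed random seed, and compares the simulated mean and standard deviation of p' with p and √(pq/n).

```python
import math
import random

def simulate_p_prime(n, p, reps, seed=0):
    """Draw `reps` samples of n Bernoulli(p) trials and return each p' = x/n."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 100, 0.3                      # hypothetical values for illustration
values = simulate_p_prime(n, p, reps=10_000)

sim_mean = sum(values) / len(values)
sim_sd = math.sqrt(sum((v - sim_mean) ** 2 for v in values) / (len(values) - 1))
theory_se = math.sqrt(p * (1 - p) / n)   # sqrt(pq/n)

print(f"simulated mean of p': {sim_mean:.4f}  (theory: {p})")
print(f"simulated sd of p':   {sim_sd:.4f}  (theory: {theory_se:.4f})")
```

The simulated values should land close to the theoretical ones, with small differences due to sampling variation.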
The Assumptions...
The assumptions for inferences about the binomial parameter p:
The n random observations forming the sample are selected
independently from a population that is not changing during the
sampling
Confidence Interval Procedure:
The unbiased sample statistic p' is used to estimate the population
proportion p
The formula for the 1 − α confidence interval for p is:
$$p' - z(\alpha/2)\sqrt{\frac{p'q'}{n}} \quad \text{to} \quad p' + z(\alpha/2)\sqrt{\frac{p'q'}{n}}$$
where p' = x/n and q' = 1 − p'
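The interval formula translates directly into code. This is a minimal Python sketch under stated assumptions: the function name `proportion_ci` and the sample values x = 42, n = 120 are hypothetical, and z(α/2) is computed with the standard library's `statistics.NormalDist` rather than read from Table A.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for p."""
    p_hat = x / n                               # p' = x / n
    q_hat = 1 - p_hat                           # q' = 1 - p'
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z(alpha/2)
    error = z * math.sqrt(p_hat * q_hat / n)    # maximum error of estimate E
    return p_hat - error, p_hat + error

# Hypothetical sample: 42 successes in 120 trials
lower, upper = proportion_ci(x=42, n=120)
print(f"{lower:.4f} to {upper:.4f}")
```

Because the computed z(0.025) is 1.959964 rather than the rounded table value 1.96, results may differ from hand calculations in the last decimal place.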
Example
Example: A recent survey of 300 randomly selected fourth graders
showed 210 participate in at least one organized sport during
one calendar year. Find a 95% confidence interval for the
proportion of fourth graders who participate in an organized
sport during the year.
Solution:
1. Describe the population parameter of concern
The parameter of interest is the proportion of fourth graders who
participate in an organized sport during the year
2. Specify the confidence interval criteria
a. Check the assumptions
The sample was randomly selected
Each subject’s response was independent
Solution Continued
b. Identify the probability distribution
z is the test statistic
p' is approximately normal
n = 300 > 20
np' = 300(210/300) = 210 > 5
nq' = 300(90/300) = 90 > 5
c. Determine the level of confidence: 1 − α = 0.95
3. Collect and present sample evidence
Sample information: n = 300, and x = 210
The point estimate: p' = x/n = 210/300 = 0.70
Solution Continued
4. Determine the confidence interval
a. Determine the confidence coefficients:
Using Table A: z(α/2) = z(0.025) = 1.96
b. The maximum error of estimate:
$$E = z(\alpha/2)\sqrt{\frac{p'q'}{n}} = 1.96\sqrt{\frac{(0.70)(0.30)}{300}} = (1.96)\sqrt{0.0007} = (1.96)(0.0265) = 0.0519$$
c. Find the lower and upper confidence limits:
p' − E to p' + E
0.70 − 0.0519 to 0.70 + 0.0519
0.6481 to 0.7519
Solution Continued
d. The Results
0.6481 to 0.7519 is a 95% confidence interval for the
true proportion of fourth graders who participate in an
organized sport during the year
Sample Size Determination:
$$n = \frac{[z(\alpha/2)]^2 \cdot p^* \cdot q^*}{E^2}$$
E: maximum error of estimate
1 − α: confidence level
p*: provisional value of p (q* = 1 − p*)
If no provisional values for p and q are given, use p* = q* = 0.5
(Always round up)
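The sample size formula, including the round-up rule, can be sketched in Python as follows. The function name `sample_size_for_proportion` and the maximum error 0.03 are hypothetical choices for illustration; z(α/2) is computed with `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(max_error, confidence=0.95, p_star=0.5):
    """Smallest n for which the maximum error of estimate is at most max_error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)     # z(alpha/2)
    q_star = 1 - p_star                         # q* = 1 - p*
    n = (z ** 2) * p_star * q_star / max_error ** 2
    return math.ceil(n)                         # always round up

# Hypothetical requirement: estimate p to within 0.03 with 95% confidence
print(sample_size_for_proportion(0.03))               # no provisional value: p* = 0.5
print(sample_size_for_proportion(0.03, p_star=0.2))   # with a provisional value
```

Defaulting `p_star` to 0.5 mirrors the slide's rule: the product p*q* is largest at 0.5, so this choice gives the most conservative (largest) sample size.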
Example
Example: Determine the sample size necessary to estimate the true
proportion of laboratory mice with a certain genetic
defect. We would like the estimate to be within 0.015
with 95% confidence.
Solution:
1. Level of confidence: 1 − α = 0.95, z(α/2) = z(0.025) = 1.96
2. Desired maximum error is E = 0.015.
3. No estimate of p given, use p* = q* = 0.5
4. Use the formula for n:
$$n = \frac{[z(\alpha/2)]^2 \cdot p^* \cdot q^*}{E^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.015)^2} = \frac{0.9604}{0.000225} = 4268.44 \;\Rightarrow\; n = 4269$$
Note
Note: Suppose we know the genetic defect occurs in approximately
1 of 80 animals
Use p* = 1/80 = 0.0125:
$$n = \frac{[z(\alpha/2)]^2 \cdot p^* \cdot q^*}{E^2} = \frac{(1.96)^2 (0.0125)(0.9875)}{(0.015)^2} = \frac{0.0474}{0.000225} \approx 210.75 \;\Rightarrow\; n = 211$$
As illustrated here, it is an advantage to have some indication of the
value expected for p, especially when p is far from 0.5
Inference About Mean μ (σ unknown)
• Inferences about μ are based on the sample mean x̄
• If the sample size is large or the sampled population is
normal:
$$z^* = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has a standard normal distribution
• If σ is unknown, use s as a point estimate for σ
• Estimated standard error of the mean: s/√n
Student’s t-Statistic
1. When s is used as an estimate for σ, the test statistic has two
sources of variation: x̄ and s
2. The resulting test statistic, known as the Student’s t-statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
3. Assumption: samples are taken from normal populations
4. The population standard deviation, σ, is almost never known in
real-world problems
The standard error will almost always be estimated using s/√n
Almost all real-world inference about the population mean will be
completed using the Student’s t-statistic
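The t-statistic can be computed from raw data in a few lines. This is a minimal Python sketch; the function name `t_statistic`, the data values, and the hypothesized mean 7.0 are all hypothetical and serve only to illustrate the formula.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Student's t-statistic for a sample and a hypothesized mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation, divisor n - 1
    standard_error = s / math.sqrt(n)     # estimated standard error s / sqrt(n)
    return (x_bar - mu) / standard_error

data = [6.1, 7.4, 8.0, 9.3, 7.7]          # hypothetical observations
print(round(t_statistic(data, mu=7.0), 3))
```

Both x̄ and s are recomputed from the sample, which is the source of the two kinds of variation mentioned in item 1.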
Properties of the t-Distribution (df>2)
1. t is distributed with a mean of 0
2. t is distributed symmetrically about its mean
3. t is distributed so as to form a family of distributions, a
separate distribution for each different number of degrees of
freedom (df ≥ 1)
4. The t-distribution approaches the normal distribution as the
number of degrees of freedom increases
5. t is distributed with a variance greater than 1, but as the
degrees of freedom increase, the variance approaches 1
6. t is distributed so as to be less peaked at the mean and
thicker at the tails than the normal distribution
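Property 5 can be made concrete with a standard result that is not stated on the slide: for df > 2, the variance of Student's t is df/(df − 2). A minimal Python sketch (the function name `t_variance` is illustrative) shows the variance shrinking toward 1 as df grows.

```python
def t_variance(df):
    """Variance of Student's t-distribution; finite only for df > 2."""
    if df <= 2:
        raise ValueError("variance of t is finite only for df > 2")
    return df / (df - 2)

for df in (3, 5, 15, 25, 100):
    print(df, round(t_variance(df), 4))
```

This is also why the slide title restricts the listed properties to df > 2.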
Student’s t-Distributions
[Figure: the normal distribution compared with Student’s t-distributions for df = 15 and df = 5, plotted against t and centered at 0]
Degrees of Freedom, df: A parameter that identifies each
different distribution of Student’s t-distribution. For the
methods presented in this chapter, the value of df will be the
sample size minus 1, df = n − 1.
Notes
1. The number of degrees of freedom associated with s² is the
divisor (n − 1) used to define the sample variance s²
Thus: df = n − 1
2. The number of degrees of freedom is the number of unrelated
deviations available for use in estimating σ²
3. Table for Student’s t-distribution (Table C) is a table of critical
values. Left column = df. When df > 100, critical values of the t-distribution are approximately the same as the corresponding critical values of
the standard normal distribution.
4. Notation: t(df, α)
Read as: t of df, α
t-Distribution Showing t(df, α)
[Figure: t-distribution curve centered at 0, with the area α shaded in the tail to the right of t(df, α)]
Example
Example: Find the value of t(12, 0.025)
[Figure: t-distribution centered at 0 with area 0.025 in each tail; −t(12, 0.025) = −2.18 on the left and t(12, 0.025) = 2.18 on the right]
Portion of Table 6:
        Amount of α in one tail
df      ...    0.025    ...
...
12             2.18
Notes
1. If the df is not listed in the left-hand column of Table C, use the
next smaller value of df that is listed
2. Most computer software packages will calculate either the area
related to a specified t-value or the t-value that bounds a specified
area
3. The cumulative distribution function (CDF) is often used to find
area from −∞ to t
4. If the area from −∞ to t is known and the value of t is wanted,
then the inverse cumulative distribution function (INVCDF) is
used
[Figure: t-distribution with the cumulative probability shaded from −∞ up to t]
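To show what a CDF and an inverse CDF for Student's t actually compute, here is a minimal Python sketch using only the standard library. It is an illustration, not the method any particular software package uses: the function names are hypothetical, the CDF is obtained by Simpson's rule on the density, and the inverse is found by bisection.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_cdf(t, df, steps=2000):
    """Area from -infinity to t: Simpson's rule on [0, |t|] plus symmetry."""
    if t == 0:
        return 0.5
    upper = abs(t)
    h = upper / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                  # area between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area

def t_invcdf(prob, df, lo=-100.0, hi=100.0):
    """Value of t whose cumulative area is `prob`, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# t(12, 0.025) leaves area 0.025 to its right, so the area to its left is 0.975
print(round(t_invcdf(0.975, 12), 2))
```

The printed value should agree with the table entry 2.18 for df = 12. The search range of −100 to 100 is an assumption that covers ordinary table values; extreme tail areas with very small df would need wider bounds.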
The Assumption...
The assumption for inferences about mean μ when σ is
unknown: The sampled population is normally distributed
Confidence Interval Procedure:
1. Procedure for constructing confidence intervals similar to
that used when σ is known
2. Use t in place of z, use s in place of σ
3. The formula for the 1 − α confidence interval for μ is:
$$\bar{x} - t(\text{df}, \alpha/2)\frac{s}{\sqrt{n}} \quad \text{to} \quad \bar{x} + t(\text{df}, \alpha/2)\frac{s}{\sqrt{n}}$$
where df = n − 1
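The interval for μ can be sketched in Python as follows. The function name `mean_ci` and the summary statistics are hypothetical; the critical value t(df, α/2) is passed in as an argument, on the assumption that it has been looked up in a t-table or obtained from statistical software.

```python
import math

def mean_ci(x_bar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown.

    t_crit is t(df, alpha/2) with df = n - 1, supplied by the caller.
    """
    error = t_crit * s / math.sqrt(n)     # maximum error E
    return x_bar - error, x_bar + error

# Hypothetical summary statistics: n = 10, so df = 9; t(9, 0.025) is about 2.26
lower, upper = mean_ci(x_bar=50.0, s=4.0, n=10, t_crit=2.26)
print(f"{lower:.2f} to {upper:.2f}")
```

Keeping the critical value as an explicit input makes the dependence on df visible: the same x̄, s, and n give a wider interval than the z-based formula because t(df, α/2) exceeds z(α/2).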
Example
Example: A study is conducted to learn how long it takes the typical taxpayer
to complete their federal income tax return. A random sample of 17
income tax filers showed a mean time (in hours) of 7.8 and a
standard deviation of 2.3. Find a 95% confidence interval for the
true mean time required to complete a federal income tax return.
Assume the time to complete the return is normally distributed.
Solution:
1. Parameter of Interest
The mean time required to complete a federal income tax return
2. Confidence Interval Criteria
a. Assumptions: Sampled population assumed normal, σ unknown
b. Test statistic: t will be used
c. Confidence level: 1 − α = 0.95
Solution Continued
3. The Sample Evidence:
n = 17, x̄ = 7.8, and s = 2.3
4. The Confidence Interval
a. Confidence coefficients: t(df, α/2) = t(16, 0.025) = 2.12
b. Maximum error:
$$E = t(16, 0.025)\frac{s}{\sqrt{n}} = (2.12)\frac{2.3}{\sqrt{17}} = (2.12)(0.5578) = 1.18$$
c. Confidence limits:
x̄ − E to x̄ + E
7.8 − 1.18 to 7.8 + 1.18
6.62 to 8.98
5. The Results:
6.62 to 8.98 is the 95% confidence interval for μ