Class Four
To Be Turned In:
Chapter 3: 28, 30, 36, 42
Chapter 4: 24, 32, 36, 38
Chapter 5: 30, 32, 34, 44
For Class Five:
Chapter 11: problems 24, 32, 36
Chapter 14: problems 36, 38, 50
Quiz 2
Read Chapters 15 & 17
Objectives for Class Four
• Compare and contrast population parameters and sample statistics.
• Define basic terms related to statistics, including sampling variability,
sampling distribution of a statistic, and unbiased statistic.
• Describe the sampling distribution of the sample mean and calculate
probabilities regarding the sample mean by using the central limit
theorem.
• Explain the meaning of statistical confidence.
• Compute confidence intervals for the mean of a population (assuming
the standard deviation is known).
• Discuss the relationship between sample size and margin of error in a
confidence interval.
• Correctly state the null and alternative hypotheses for one-sample
hypothesis tests.
• Calculate and interpret the p-value for a one-sample hypothesis test.
• Explain the meaning of statistical significance and determine if a
hypothesis test is significant at a given level.
Parameters and Statistics
• parameter: a number that describes the population; it is
usually unknown because we cannot examine the entire
population
• statistic: a number that can be computed from the
sample data without making use of any unknown
parameters; it is usually used to estimate an unknown
parameter
• “population mean” = μ
• “sample mean” = x̄
• Law of Large Numbers: draw observations from any
population with finite mean μ. As the number of
observations drawn increases, the mean x̄ of the observed
values gets closer and closer to the mean μ of the
population.
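The Law of Large Numbers can be illustrated with a small simulation. The sketch below is only an illustration: the population (Normal with mean 25 and standard deviation 7), the checkpoint sizes, and the function name `running_means` are assumptions chosen for the example, not values from the course.

```python
import random


def running_means(mu, sigma, checkpoints, seed=0):
    """Return {n: mean of the first n draws} for each n in checkpoints."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0.0
    drawn = 0
    means = {}
    for n in sorted(checkpoints):
        # Keep drawing from the same stream until n observations are in hand.
        while drawn < n:
            total += rng.gauss(mu, sigma)
            drawn += 1
        means[n] = total / drawn
    return means


# Hypothetical population: mean 25, standard deviation 7 (assumed values).
means = running_means(mu=25.0, sigma=7.0, checkpoints=[10, 100, 10_000, 100_000])
for n, xbar in means.items():
    print(f"n = {n:>6}: sample mean = {xbar:.3f}")
```

As n grows, the printed sample means should settle near the population mean of 25.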
Sampling Distributions
• sampling distribution of a statistic: the distribution of
values taken by the statistic in all possible samples of the
same size from the same population
• sampling distribution of a sample mean: if individual
observations have the N(μ, σ) distribution, then the sample
mean x̄ of n independent observations has the N(μ, σ/√n)
distribution.
• mean and standard deviation of sample mean: if x̄ is the
mean of an SRS of size n drawn from a large population
with mean μ and standard deviation σ, then the mean of the
sampling distribution of x̄ is μ and its standard deviation is
σ/√n
– the average of all the sample means equals the mean of the
population, which makes x̄ an unbiased estimator of the
parameter μ
– averages of this nature are less variable than individual
observations
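A quick simulation can check the two facts above: the sample means center on μ, and their standard deviation is close to σ/√n. This is a sketch under assumed values (μ = 100, σ = 15, n = 25); the function name `simulate_sample_means` is made up for the example.

```python
import random
import statistics
from math import sqrt


def simulate_sample_means(mu, sigma, n, repetitions, seed=1):
    """Draw `repetitions` samples of size n from N(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]


# Assumed population values, chosen only for illustration.
mu, sigma, n = 100.0, 15.0, 25
xbars = simulate_sample_means(mu, sigma, n, repetitions=5_000)

print(f"mean of sample means: {statistics.fmean(xbars):.2f} (theory: {mu})")
print(f"sd of sample means:   {statistics.stdev(xbars):.2f} (theory: {sigma / sqrt(n):.2f})")
```

The simulated standard deviation should be near 15/√25 = 3, much smaller than the standard deviation of 15 for individual observations.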
Central Limit Theorem
• Draw an SRS of size n from any population with mean μ
and finite standard deviation σ. When n is large, the sampling
distribution of the sample mean x̄ is approximately Normal:
$$\bar{x} \text{ is approximately } N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$$
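To see the central limit theorem at work, one can compare a Normal-approximation probability for x̄ with a simulation from a clearly non-Normal population. The sketch below assumes an exponential population with mean 1 and standard deviation 1, a sample size of 50, and a threshold of 1.2; all of these, and both function names, are illustrative choices rather than course values.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def clt_tail_probability(mu, sigma, n, threshold):
    """Normal approximation to P(sample mean > threshold)."""
    return 1.0 - NormalDist(mu, sigma / sqrt(n)).cdf(threshold)


def simulated_tail_probability(n, threshold, repetitions, seed=2):
    """Estimate P(sample mean > threshold) for an exponential(1) population."""
    rng = random.Random(seed)
    hits = sum(
        fmean(rng.expovariate(1.0) for _ in range(n)) > threshold
        for _ in range(repetitions)
    )
    return hits / repetitions


# Exponential(1) population: mean 1, standard deviation 1 (strongly skewed).
approx = clt_tail_probability(mu=1.0, sigma=1.0, n=50, threshold=1.2)
simulated = simulated_tail_probability(n=50, threshold=1.2, repetitions=10_000)
print(f"Normal approximation: {approx:.3f}")
print(f"Simulation:           {simulated:.3f}")
```

The two numbers should be close but not identical, since the Normal distribution is only an approximation for finite n.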


Statistical Inference
• statistical inference: methods for drawing
conclusions about a population from sample data
• Inference about a mean under simple conditions
– We have an SRS from the population of interest.
– The variable we measure has a perfectly Normal distribution
N(μ, σ) in the population.
– We don’t know the population mean μ, but we do know the
population standard deviation σ. Our task is to infer
something about μ from the sample data.
• confidence interval: a level C confidence interval for a
parameter has two parts
– an interval calculated from the data, usually of the form
estimate ± margin of error
– a confidence level C, which gives the probability that the
interval will capture the true parameter value in repeated
samples, i.e. the success rate of the method
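The "success rate of the method" reading of confidence can be checked by simulation: build many intervals from repeated samples and count how often they capture μ. This is a minimal sketch assuming a Normal population with μ = 50 and σ = 10 and samples of size 20; those values and the function name `coverage_rate` are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_rate(mu, sigma, n, level, repetitions, seed=3):
    """Fraction of level-C z intervals (sigma known) that contain the true mean."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf((1 + level) / 2)  # critical value for level C
    margin = z_star * sigma / sqrt(n)
    captured = 0
    for _ in range(repetitions):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            captured += 1
    return captured / repetitions


# Assumed population values, chosen only for illustration.
rate = coverage_rate(mu=50.0, sigma=10.0, n=20, level=0.95, repetitions=10_000)
print(f"Share of intervals capturing mu: {rate:.3f}")
```

With a 95% confidence level, roughly 95% of the simulated intervals should contain μ.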
Statistical Inference
• confidence interval for the mean of a population: draw an
SRS of size n from a Normal population having unknown
mean μ and known standard deviation σ. A level C
confidence interval for μ is
  
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$
• the margin of error is
$$z^* \frac{\sigma}{\sqrt{n}}$$
• z* values for differing levels of confidence C can be found
on the bottom row of Table C.
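When a table is not at hand, the critical values z* can be computed from the standard Normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_star` is an assumption for this example.

```python
from statistics import NormalDist


def z_star(level):
    """Critical value z* leaving central area `level` under the standard Normal curve."""
    # Central area C leaves (1 - C) / 2 in each tail, so z* is the
    # (1 + C) / 2 quantile of N(0, 1).
    return NormalDist().inv_cdf((1 + level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"C = {level:.0%}: z* = {z_star(level):.3f}")
# C = 90%: z* = 1.645
# C = 95%: z* = 1.960
# C = 99%: z* = 2.576
```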
NAEP Quantitative Scores
The NAEP survey includes a short test of quantitative skills,
covering mainly basic arithmetic and the ability to apply it to
realistic problems. Scores on the test range from 0 to 500, with
higher scores indicating greater numerical abilities. It is
known that NAEP scores have standard deviation σ = 60. In a
recent year, 840 men 21 to 25 years of age were in the NAEP
sample. Their mean quantitative score was 272.
On the basis of this sample, estimate the mean score μ in the
population of all 9.5 million young men of these ages with
95% confidence.
$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}} = 272 \pm 1.960 \left(\frac{60}{\sqrt{840}}\right)$$
$$272 - 1.960 \left(\frac{60}{\sqrt{840}}\right) = 267.9424$$
$$272 + 1.960 \left(\frac{60}{\sqrt{840}}\right) = 276.0576$$
Using a sample of 840 men aged 21 to 25, we are 95%
confident that the mean quantitative skills score on the
NAEP exam for this population is between 267.9424
and 276.0576, with a margin of error of 4.0576.
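The NAEP calculation can be reproduced in a few lines. This is a sketch, with the function name `z_confidence_interval` chosen for the example. It computes z* from the standard Normal distribution instead of using the tabled 1.960, so results may differ from the hand calculation in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, level):
    """Level-C interval for a population mean when sigma is known.

    Returns (lower, upper, margin_of_error).
    """
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin


# Values from the NAEP example: x-bar = 272, sigma = 60, n = 840, C = 95%.
low, high, margin = z_confidence_interval(xbar=272, sigma=60, n=840, level=0.95)
print(f"95% CI: ({low:.2f}, {high:.2f}), margin of error {margin:.2f}")
```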
Margins of Error
• there is a trade-off between margin of error and confidence
level.
– to obtain a smaller margin of error for the same data you must
be willing to accept lower confidence
– it is easier to pin down μ when σ is small
– increasing the sample size reduces the margin of error
• choosing a sample size for a desired margin of error
– the confidence interval for the mean of a Normal population
will have a specified margin of error m when the sample size is
$$n = \left(\frac{z^* \sigma}{m}\right)^2$$
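The sample-size formula is easy to apply in code. The sketch below rounds the result up to the next whole number, which is a common convention added here and not something stated on the slide. The function name `required_sample_size` and the target margin of 3 points are assumptions for illustration; σ = 60 is taken from the NAEP example.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, margin, level):
    """Smallest n whose level-C margin of error is at most `margin` (sigma known)."""
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    # n = (z* sigma / m)^2, rounded up because n must be a whole number
    # (the rounding step is an assumption, not from the slide).
    return ceil((z_star * sigma / margin) ** 2)


# Hypothetical target: margin of error 3 points at 95% confidence, sigma = 60.
n_needed = required_sample_size(sigma=60, margin=3, level=0.95)
print(n_needed)  # 1537
```

Halving the margin of error requires about four times the sample size, because n grows with the square of 1/m.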
Tests of Significance
• The goal of a test of significance is to assess the evidence
provided by data about some claim, called a null
hypothesis, concerning a parameter of the population
– an outcome that would rarely happen if a claim were true is good
evidence that the claim is not true
• this is based on the idea of a counterexample from logic: a
single instance in which a statement is untrue is enough to
show that it is unreliable and should be considered false
• examples, no matter how numerous, demonstrate truth only
in those instances; they do not demonstrate the truth of the
statement in all circumstances
– tests of significance start with an SRS from an exactly Normal
population with standard deviation σ known to us
Stating Hypotheses
• null hypothesis (H0): the statement being tested
statistically
– the test is designed to assess the strength of the evidence against
the null hypothesis
– usually the null hypothesis is a statement of “no effect” or “no
difference”
• alternative hypothesis (Ha): the claim about the population
that we are trying to find evidence for
– one sided: we are interested in whether the parameter is greater
than the “no effect” level, or in whether it is less, but not both
– two sided: we are interested in whether the parameter differs
from the “no effect” level in either direction
The Hypotheses for Means

Null: H0: μ = μ0

One sided alternatives
Ha: μ > μ0
Ha: μ < μ0
Two sided alternative
Ha: μ ≠ μ0

Test Statistic
• The test is based on a statistic that compares the value of
the parameter stated in the null hypothesis with an estimate
of the parameter from the sample data. The estimate is
usually the same one used in a confidence interval for the
parameter.
• Large values of the test statistic indicate that the estimate is
far from the parameter value specified by H0. These values
give evidence against H0. The alternative hypothesis
determines which directions count against H0.
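• the z test statistic is given by:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
– measures how far the sample data diverge from the null
hypothesis, in units of the standard deviation of x̄.
– the probability of a divergence at least this large, computed
assuming H0 is true, is given by a p-value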
P-values & Statistical Significance
• p-value: the probability, computed assuming that H0 is true, that
the test statistic would take a value as extreme or more extreme
than that actually observed. Exact p-values can be found from
Table A or with software
– the smaller the p-value the stronger the evidence against H0 provided by
the data
– large p-values fail to give evidence against H0.
– Ha: μ > μ0: the P-value is the probability of getting a value as large or larger than the
observed test statistic (z) value.
– Ha: μ < μ0: the P-value is the probability of getting a value as small or smaller than
the observed test statistic (z) value.
– Ha: μ ≠ μ0: the P-value is two times the probability of getting a value as large or
larger than the absolute value of the observed test statistic (z) value.
• significance level (α): a fixed value of the p-value that we
consider to be decisive (critical values are in Table C), i.e. a
predetermined level of evidence required to reject H0.
[Figure: scale of evidence against H0 by p-value, running from no evidence, through weak evidence near 0.10 and some evidence near 0.05, to good evidence and varying degrees of strong evidence]
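The three p-value rules above can be written as one small function. This is a sketch: the function names, the labels "greater", "less" and "two-sided", and the example value z = 1.5 are assumptions made for illustration, not part of the course material.

```python
from statistics import NormalDist


def z_test_p_value(z, alternative):
    """P-value for an observed z statistic under the stated alternative.

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    std_normal = NormalDist()  # N(0, 1)
    if alternative == "greater":
        return 1.0 - std_normal.cdf(z)
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


def is_significant(p_value, alpha):
    """True when the result is significant at level alpha (p-value <= alpha)."""
    return p_value <= alpha


# Hypothetical observed statistic, chosen only for illustration.
for alt in ("greater", "less", "two-sided"):
    p = z_test_p_value(1.5, alt)
    print(f"{alt:>9}: P-value = {p:.4f}, significant at 0.05: {is_significant(p, 0.05)}")
```

The same z value gives different p-values depending on which directions the alternative hypothesis counts as evidence against H0.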
Suppose we know that for any cola, the sweetness loss scores
vary from taster to taster according to a Normal distribution
with standard deviation σ = 1.
The mean μ for all tasters measures loss of sweetness.
The sweetness losses for a new cola, as measured by 10
trained tasters, yield an average sweetness loss of
x̄ = 1.02. Do the data provide sufficient evidence that the
new cola lost sweetness in storage?
The null hypothesis is no average sweetness loss occurs,
while the alternative hypothesis (that which we want to show
is likely to be true) is that an average sweetness loss does
occur.
H0: μ = 0
Ha: μ > 0
This is considered a one-sided test because we are interested
only in determining if the cola lost sweetness (gaining
sweetness is of no consequence in this study).
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{1.02 - 0}{1 / \sqrt{10}} = 3.23$$
For test statistic z = 3.23 and alternative hypothesis
Ha: μ > 0, the P-value would be:
P-value = P(Z > 3.23) = 1 – 0.9994 = 0.0006
If H0 is true, there is only a 0.0006 (0.06%) chance that we
would see results at least as extreme as those in the sample.
Since we saw results that are unlikely if H0 is true, we
have evidence against H0 and in favor of Ha, and we
say:
“Using a sample of 10 trained tasters, we have extremely strong
evidence (P = 0.0006) to reject the null hypothesis that there is
no loss of sweetness in cola.”
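The cola test can be checked with software in place of Table A. The sketch below uses the example's values (x̄ = 1.02, μ0 = 0, σ = 1, n = 10); the function name `one_sample_z_test_greater` is an assumption for this illustration. It keeps the unrounded z when computing the p-value, so the result can differ slightly from the table lookup at z = 3.23.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test_greater(xbar, mu0, sigma, n):
    """One-sided z test of H0: mu = mu0 against Ha: mu > mu0 (sigma known).

    Returns (z, p_value).
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)  # area to the right of z under N(0, 1)
    return z, p_value


# Values from the cola example.
z, p_value = one_sample_z_test_greater(xbar=1.02, mu0=0.0, sigma=1.0, n=10)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```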
Next Week Class Five
To Be Completed Before Class Five:
Chapter 11: problems 24, 32, 36
Chapter 14: problems 36, 38, 50
Quiz 2
Read Chapters 15 & 17