A Broad Overview of Key
Statistical Concepts
An Overview of Our Review
• Populations and samples
• Parameters and statistics
• Confidence intervals
• Hypothesis testing
Populations and Samples
… and Parameters and Statistics
Populations and Parameters
• A population is any large collection of
objects or individuals, such as people,
students, or trees about which information is
desired.
• A parameter is any summary number, like
an average or percentage, that describes the
entire population.
Parameters
• Examples include the population mean μ, the
population variance σ², and the population
proportion p.
• 99.999999999999….% of the time, we
don’t (...or can’t) know the real value of a
population parameter.
• Best we can do is estimate the parameter!
Samples and Statistics
• A sample is a representative group drawn
from the population.
• A statistic is any summary number, like an
average or percentage, that describes the
sample.
Statistics
• Examples include the sample mean x̄, the
sample variance s², and the sample
proportion p̂ (“p-hat”).
• Because samples are manageable in size, we
can determine the value of statistics.
• We use the known statistic to learn about
the unknown parameter.
Example: Smoking at PSU?
• Population: 42,000 PSU students. What proportion
smoke regularly?
• Sample: 987 PSU students. 43% reported
smoking regularly.
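The difference between the unknown parameter and the known statistic can be made concrete with a small simulation. This is only a sketch: the population and sample sizes come from the slide, but the "true" proportion of 0.40 is a made-up value chosen for illustration (in practice it is exactly the number we cannot see), and the names `TRUE_PROPORTION`, `p`, and `p_hat` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

POPULATION_SIZE = 42_000   # from the slide
SAMPLE_SIZE = 987          # from the slide
TRUE_PROPORTION = 0.40     # hypothetical; normally unknown

# Build a made-up population: True means "smokes regularly".
n_smokers = round(POPULATION_SIZE * TRUE_PROPORTION)
population = [True] * n_smokers + [False] * (POPULATION_SIZE - n_smokers)

# Parameter: a summary of the entire population.
p = sum(population) / len(population)

# Statistic: a summary of a simple random sample (drawn without replacement).
sample = random.sample(population, SAMPLE_SIZE)
p_hat = sum(sample) / len(sample)

print(f"parameter p     = {p:.3f}")
print(f"statistic p-hat = {p_hat:.3f}")
```

Rerunning with a different seed gives a different p-hat but the same p, which is why a statistic is used to learn about a parameter rather than treated as equal to it.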
Example: Grade inflation?
• Population: 5 million college students. Is the
average GPA 2.7?
• Sample: 100 college students. How likely is it
that 100 students would have an average GPA as
large as 2.9 if the population average was 2.7?
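The "how likely" question can be turned into a number once a spread is assumed. The slide gives no standard deviation, so the value 0.6 below is purely hypothetical, and the sketch uses a normal model for the sample mean as a large-sample approximation. With a different assumed standard deviation the probability changes.

```python
from math import sqrt
from statistics import NormalDist

POP_MEAN = 2.7      # population average under the "if" (from the slide)
SAMPLE_MEAN = 2.9   # sample average in the question (from the slide)
N = 100             # sample size (from the slide)
ASSUMED_SD = 0.6    # hypothetical: not given in the slide

# Standard error of the mean of N independent observations.
standard_error = ASSUMED_SD / sqrt(N)

# Approximate sampling distribution of the sample mean if the
# population average really is 2.7.
sampling_dist = NormalDist(mu=POP_MEAN, sigma=standard_error)

# Probability of a sample average of 2.9 or larger.
prob = 1 - sampling_dist.cdf(SAMPLE_MEAN)
print(f"P(sample mean >= {SAMPLE_MEAN}) is about {prob:.5f}")
```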
Two ways to learn
about a population parameter
• Confidence intervals estimate parameters.
– We can be 95% confident that the proportion of
Penn State students who have a tattoo is between
5.1% and 15.3%.
• Hypothesis tests test the value of parameters.
– There is enough statistical evidence to conclude
that the mean normal body temperature of adults
is lower than 98.6 degrees F.
Confidence Intervals
A Review of Concepts
The situation
• Want to estimate the actual population mean
μ.
• But can only get x̄, the sample mean.
• Find a range of values, L < μ < U, that we
can be really confident contains μ.
• This range of values is called a “confidence
interval.”
Confidence Intervals
for Proportions in Newspapers
• ABC News Poll, May 16-20, 2001
• 69% of 1,027 U.S. adults think using a hand-held
cell phone while driving a car should be illegal
• The “margin of error” is 3%.
• The “confidence interval” is 69% ± 3%.
• We can be really confident that between 66% and
72% of all U.S. adults think using a hand-held cell
phone while driving a car should be illegal.
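The 3% margin of error can be roughly reproduced from the two numbers in the poll. The sketch below assumes a 95% confidence level and the usual large-sample (normal) formula for a proportion; the slide does not say how the pollster actually computed the margin, so treat this as an illustration.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.69        # sample proportion (from the slide)
n = 1027            # sample size (from the slide)
confidence = 0.95   # assumption: the slide does not state the level

# Critical value of the standard normal distribution (about 1.96).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error = critical value x estimated standard error.
margin = z * sqrt(p_hat * (1 - p_hat) / n)

lower = p_hat - margin
upper = p_hat + margin
print(f"margin of error: {margin:.1%}")
print(f"interval: {lower:.1%} to {upper:.1%}")
```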
General Form of
Most Confidence Intervals
• Sample estimate ± margin of error
• Lower limit L = estimate - margin of error
• Upper limit U = estimate + margin of error
• Then, we’re confident that the population
value is somewhere between L and U.
T-interval for Mean μ
Formula in notation:
$\bar{x} \pm t_{\alpha/2,\,n-1} \left( \frac{s}{\sqrt{n}} \right)$
Formula in English:
Sample mean ± (t × estimated standard error)
where “t” comes from the t distribution, and
depends on the confidence level 1-α and the sample
size through the degrees of freedom “n-1”.
Length of Confidence Interval
• Want confidence interval to be as narrow as
possible.
• Length = Upper Limit - Lower Limit
How is the length of the CI affected?
$\bar{x} \pm t_{\alpha/2,\,n-1} \left( \frac{s}{\sqrt{n}} \right)$
• As sample mean increases…
• As the standard deviation decreases…
• As we decrease the confidence level…
• As we increase sample size …
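The four questions above can be explored numerically. The sketch below is an illustration, not the exact t-interval: it swaps in the normal critical value z for t (a large-sample stand-in), and the baseline values s = 1.0 and n = 50 are made up. The helper `ci_length` is a hypothetical name. Note that the sample mean does not appear in the length at all; it only shifts the interval.

```python
from math import sqrt
from statistics import NormalDist

def ci_length(s, n, confidence=0.95):
    """Length (upper minus lower) of x-bar +/- z * s / sqrt(n).

    Uses the normal critical value z in place of t.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * s / sqrt(n)

baseline = ci_length(s=1.0, n=50)                      # made-up baseline
smaller_sd = ci_length(s=0.5, n=50)                    # smaller standard deviation
lower_conf = ci_length(s=1.0, n=50, confidence=0.90)   # lower confidence level
larger_n = ci_length(s=1.0, n=200)                     # four times the sample size

print(f"baseline:            {baseline:.3f}")
print(f"smaller std dev:     {smaller_sd:.3f}")
print(f"lower confidence:    {lower_conf:.3f}")
print(f"larger sample size:  {larger_n:.3f}")
```

In this sketch each of the last three changes narrows the interval, and quadrupling the sample size halves the length because n enters through its square root.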
T-Interval for Mean in Minitab
One-Sample T: TEMP

Variable    N    Mean   StDev  SE Mean         95.0% CI
TEMP      130   98.27   0.778   0.0682   (98.14,98.41)
We can be 95% confident that the average normal body
temperature of adults is between 98.1 and 98.4 degrees
Fahrenheit.
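The interval in the Minitab output can be approximately rebuilt from the printed summary values (N, Mean, StDev) using the t-interval formula. The sketch below is not how Minitab computes it; to stay within the Python standard library it gets the t critical value by numerically integrating the t density and bisecting, and the helper names `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical. Because the inputs are rounded, the endpoints may differ from Minitab's in the last digit.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(prob, df):
    """Value t with P(T <= t) = prob, by bisection (assumes prob > 0.5)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary values as printed in the Minitab output.
n, x_bar, s = 130, 98.27, 0.778
confidence = 0.95

t_star = t_quantile(1 - (1 - confidence) / 2, n - 1)  # critical value
se = s / sqrt(n)                                      # estimated standard error
lower = x_bar - t_star * se
upper = x_bar + t_star * se

print(f"t* = {t_star:.4f}, SE = {se:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```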
Hypothesis Testing
A Review of Concepts
General Idea of
Hypothesis Testing
• Make an initial assumption.
• Collect evidence (data).
• Based on the available evidence, decide
whether or not the initial assumption is
reasonable.
Example: Normal Body Temperature
• Population: many, many adults. Is average adult
body temperature 98.6 degrees? Or is it lower?
• Sample: 130 adults. Average body temperature
of 130 sampled adults is 98.27 degrees.
Making the Decision
• It is either likely or unlikely that we would
collect the evidence we did given the initial
assumption.
• (Note: “Likely” or “unlikely” is measured
by calculating a probability!)
• If it is likely, then we “do not reject” our
initial assumption. There is not enough
evidence to do otherwise.
Making the Decision (cont’d)
• If it is unlikely, then:
– either our initial assumption is correct and we
experienced an unusual event
– or our initial assumption is incorrect
• In statistics, if it is unlikely, we decide to
“reject” our initial assumption.
Idea of Hypothesis Testing:
Criminal Trial Analogy
• First, state 2 hypotheses, the null hypothesis
(“H0”) and the alternative hypothesis (“HA”)
– H0: Defendant is not guilty.
– HA: Defendant is guilty.
Criminal Trial Analogy
(continued)
• Then, collect evidence, such as
fingerprints, blood spots, hair samples, carpet
fibers, shoe prints, ransom notes,
handwriting samples, etc.
• In statistics, the data are the evidence.
Criminal Trial Analogy
(continued)
• Then, make an initial assumption.
– Defendant is innocent until proven guilty.
• In statistics, we always assume the null
hypothesis is true.
Criminal Trial Analogy
(continued)
• Then, make a decision based on the
available evidence.
– If there is sufficient evidence (“beyond a
reasonable doubt”), reject the null hypothesis.
(Behave as if defendant is guilty.)
– If there is not enough evidence, do not reject
the null hypothesis. (Behave as if defendant is
not guilty.)
Very Important Point
• Neither decision entails proving the null
hypothesis or the alternative hypothesis.
• We merely state there is enough evidence
to behave one way or the other.
• This is also always true in statistics! No
matter what decision we make, there is
always a chance we made an error.
Errors in Criminal Trials
                           Truth
Jury decision     Not guilty     Guilty
Not guilty        OK             ERROR
Guilty            ERROR          OK
Errors in Hypothesis Testing
                                    Truth
Decision              Null hypothesis     Alternative hypothesis
Do not reject null    OK                  TYPE II ERROR
Reject null           TYPE I ERROR        OK
Definitions: Types of Errors
• Type I error: The null hypothesis is
rejected when it is true.
• Type II error: The null hypothesis is not
rejected when it is false.
• There is always a chance of making one of
these errors. But, a good scientific study
will minimize the chance of doing so!
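The chance of a Type I error can be seen in a simulation where the null hypothesis is true by construction. This is a sketch under stated assumptions: the data are drawn from a normal distribution, the population standard deviation of 0.78 is treated as known (a hypothetical value), and a lower-tailed z-test with a 0.05 cutoff stands in for the t-test. Roughly 5% of the simulated studies should reject the true null.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)  # fixed seed so the sketch is reproducible

MU_0 = 98.6     # null value; here it is also the true mean
SIGMA = 0.78    # hypothetical known population standard deviation
N = 130         # sample size per simulated study
ALPHA = 0.05    # cutoff for "unlikely"
TRIALS = 2000   # number of simulated studies

# Lower-tail critical value of the standard normal (about -1.645).
z_crit = NormalDist().inv_cdf(ALPHA)

rejections = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    sample_mean = sum(sample) / N
    z = (sample_mean - MU_0) / (SIGMA / sqrt(N))
    if z < z_crit:
        rejections += 1  # the null is true, so this is a Type I error

type_i_rate = rejections / TRIALS
print(f"Type I error rate in simulation: {type_i_rate:.3f}")
```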
Example: Normal Body Temperature
• Specify hypotheses.
– H0: μ = 98.6 degrees
– HA: μ < 98.6 degrees
• Make initial assumption: μ = 98.6 degrees
• Collect data: Average body temp of 130
sampled adults is 98.27 degrees. How likely
is it that a sample of 130 adults would have
an average body temp as low as 98.27 if the
average body temp of population was 98.6?
Using the p-value
to make the decision
• The p-value represents how likely we
would be to observe such an extreme
sample if the null hypothesis were true.
• The p-value is a probability, so it is a
number between 0 and 1.
• Close to 0 means “unlikely.”
• So if p-value is “small,” (typically, less than
0.05), then reject the null hypothesis.
Example (continued)
The p-value can easily be obtained from
statistical software like MINITAB.
One-Sample T: TEMP
Test of mu = 98.6 vs mu < 98.6

Var       N    Mean   StDev  SE Mean       T       P
TEMP    130   98.27   0.778   0.0682   -4.79   0.000
(Generally, the p-value is labeled as “P”)
Example (continued)
• The p-value, <0.0001, indicates that, if the
average body temperature in the population
is 98.6 degrees, it is unlikely that a sample
of 130 adults would have an average body
temperature as extreme as 98.27 degrees.
• Decision: Reject the null hypothesis.
• Conclude that the average body temperature
is lower than 98.6 degrees.
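The test statistic and p-value can be approximately recomputed from the printed summary values. The sketch below is not Minitab's procedure; it uses only the Python standard library and integrates the t density numerically, with hypothetical helper names `t_pdf` and `t_cdf`. Because the printed mean and standard deviation are rounded, the T computed here comes out near -4.84 rather than the -4.79 Minitab reports; the conclusion is the same.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

# Summary values as printed in the Minitab output.
n, x_bar, s = 130, 98.27, 0.778
mu_0 = 98.6    # value of the mean under the null hypothesis
alpha = 0.05   # typical cutoff for a "small" p-value

se = s / sqrt(n)               # estimated standard error
t_stat = (x_bar - mu_0) / se   # how many standard errors below 98.6

# HA is "mu < 98.6", so the p-value is the lower-tail probability.
p_value = t_cdf(t_stat, n - 1)

decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"T = {t_stat:.2f}, p-value = {p_value:.2g}, decision: {decision}")
```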
What type of error
might we have made?
• Type I error here is claiming that average
body temp is lower than 98.6 when in fact it
really isn’t.
• Type II error here is failing to claim that
the average body temp is lower than 98.6
when it is.
• We rejected the null hypothesis, i.e. claimed
body temp is lower than 98.6, so we may
have made a Type I error.