Last Powerpoint

You are about to take the
AP Stats test and…
MAKE SURE YOU HAVE…

MULTIPLE PENCILS…
1 OR 2 CALCULATORS…
EXTRA BATTERIES…

EAT A FULL MEAL BEFOREHAND!


Statistics is about…
Models
A model is an attempt
to represent reality…
…but we know it’s not
perfect.

Models
“All models are
wrong, but some are
useful.”
(George Box)
Models
Common models:
• Regression lines
• Simulations
AND MORE…
THINK about Models
Avoid confusion …
• known vs unknowable
• samples vs populations
• statistics vs parameters
THINK about Models
Probability Models:
• Normal Model
• Geometric & Binomial
• t-Models
• Chi-Square Models
The calculator…
… can’t TELL what it
all means.
YOU have to do that, too!
Remember …
Answers are not
numbers, answers
are sentences in
context.
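The 68, 95, 99.7 Rule

In a Normal distribution, about 68%, 95%, and 99.7% of the values fall within 1, 2, and 3 standard deviations of the mean.
It only works for Normal distributions.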
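A quick check of where the rule's numbers come from, assuming Python 3.8+ for `statistics.NormalDist`. The helper name `within_k_sd` is just for illustration; it computes the exact Normal shares that the rule rounds to 68%, 95%, and 99.7%.

```python
from statistics import NormalDist

def within_k_sd(k: float) -> float:
    """Proportion of a Normal model lying within k standard deviations of its mean."""
    model = NormalDist(mu=0, sigma=1)  # standard Normal; any mean/SD gives the same shares
    return model.cdf(k) - model.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```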
When describing Univariate
Data (1 variable)



Shape!
Center!
Spread!
Range is…



A way to measure the “spread” of data: Range = Max - Min
It is a single number, such as 40 (not
30-70).
Variance, standard deviation, IQR are
other ways to measure the “spread” of
data.
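A small sketch of the single-number spread measures, using a made-up data set whose range is 40. The quartiles here use the median-of-halves convention (an assumption; software may use other quartile rules).

```python
from statistics import median, stdev, variance

data = [30, 42, 45, 50, 55, 58, 61, 70]  # hypothetical data set

def spread_summary(values):
    """Return several single-number measures of spread."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                  # values below the median
    upper = xs[half + len(xs) % 2:]    # values above the median (median excluded if n is odd)
    return {
        "range": xs[-1] - xs[0],               # max - min: one number, not "30-70"
        "variance": variance(xs),              # sample variance (divides by n - 1)
        "std_dev": stdev(xs),                  # square root of the sample variance
        "iqr": median(upper) - median(lower),  # Q3 - Q1
    }

print(spread_summary(data))
```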
Mean or Median is…


A way to measure the location of data
(center)
Each is a single number.
Adding a constant to every
value in a data set…


Changes the central location of the data
(such as mean or median)
Does NOT change the spread of the
data (such as standard deviation,
variance, IQR)
Multiplying every value in a data set
by a constant…


Changes the central location of the data
(such as the mean or the median)
DOES CHANGE the spread of the data
(standard deviation and IQR are multiplied by the absolute value of the constant,
and variance by its square)
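The two slides above can be checked directly. This sketch uses a hypothetical data set, adds 10 to every value, then multiplies every value by 3, and reports the center and spread each time.

```python
from statistics import mean, median, stdev

data = [2, 4, 4, 5, 7, 9]  # hypothetical data set

def describe(xs):
    """Return (mean, median, standard deviation) of a list of numbers."""
    return mean(xs), median(xs), stdev(xs)

shifted = [x + 10 for x in data]  # add the constant 10 to every value
scaled = [3 * x for x in data]    # multiply every value by the constant 3

for label, xs in (("original", data), ("+10", shifted), ("x3", scaled)):
    m, med, s = describe(xs)
    print(f"{label:>8}: mean={m:.3f} median={med:.3f} sd={s:.3f}")
```

Adding 10 moves the mean and median up by 10 but leaves the SD alone; multiplying by 3 triples all three.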
When describing bivariate
quantitative data…




Form!
Strength!
Direction!
This is when describing a scatterplot,
linear regression, or the like...
A residual is…


The vertical distance from a point to the LSRL (least-squares regression line)
It is calculated as



Observed Y – Predicted Y
All points ABOVE the LSRL have positive
residuals!
All points BELOW the LSRL have
negative residuals!
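A minimal sketch of residuals, assuming a small hypothetical data set. The LSRL is computed from the usual least-squares formulas (the helper names `lsrl` and `residuals` are illustrative), and each residual is observed y minus predicted y.

```python
from statistics import mean

def lsrl(xs, ys):
    """Least-squares regression line: returns (a, b) for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Residual = observed y - predicted y for each point."""
    a, b = lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(lsrl(xs, ys))       # intercept about 2.2, slope 0.6
print(residuals(xs, ys))  # positive = point above the line, negative = below
```

The residuals from an LSRL always add up to zero, which is a handy check on hand calculations.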
An Exponential Model is best
fit when…


The graph of log y vs. x is linear
The explanatory variable x is in the exponent
$y = ab^{x}$
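A sketch of fitting the exponential model by transforming the data, assuming hypothetical values that follow $y = 2 \cdot 3^{x}$ exactly. Regressing log y on x gives log y = log a + x log b, so undoing the logs recovers a and b.

```python
import math
from statistics import mean

def lsrl(xs, ys):
    """Least-squares intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

def fit_exponential(xs, ys):
    """Fit y = a * b**x by regressing log10(y) on x."""
    intercept, slope = lsrl(xs, [math.log10(y) for y in ys])
    # log y = log a + x * log b, so a = 10**intercept and b = 10**slope
    return 10 ** intercept, 10 ** slope

xs = [0, 1, 2, 3, 4]
ys = [2, 6, 18, 54, 162]  # hypothetical data following y = 2 * 3**x
print(fit_exponential(xs, ys))
```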
A Power Model is best fit
when…


The graph of log y vs. log x is linear
The explanatory variable x is in the base
$y = ax^{b}$
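The power model works the same way, except both variables are logged: log y = log a + b log x. This sketch assumes hypothetical data following $y = 5x^{2}$ exactly.

```python
import math
from statistics import mean

def lsrl(xs, ys):
    """Least-squares intercept and slope."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b

def fit_power(xs, ys):
    """Fit y = a * x**b by regressing log10(y) on log10(x)."""
    intercept, slope = lsrl([math.log10(x) for x in xs], [math.log10(y) for y in ys])
    # log y = log a + b * log x, so a = 10**intercept and the slope is the exponent b
    return 10 ** intercept, slope

xs = [1, 2, 3, 4, 5]
ys = [5, 20, 45, 80, 125]  # hypothetical data following y = 5 * x**2
print(fit_power(xs, ys))
```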
Interpreting Slope…

“For every [unit] increase in [x], we
expect an [? unit] increase in [y].”
Interpreting Y Intercept

“When [x] is 0 [units], we expect [y] to
be [? units].”
You will see the word
“COMPARE” at least once…


So COMPARE!!!
Don’t list attributes… use comparative
words!
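Lurking Variable (Common
Response)
[Diagram: a lurking variable z influences both x and y, so x and y move together even if x does not cause y]
Confounding Variable
[Diagram: x and another variable z both influence y, so their effects on y cannot be separated]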
“Correlation does not imply
causation!”

Be careful with the word “cause”. The
only way to establish causation is a
properly designed experiment.
r is called the…

“Correlation Coefficient”

Or just… “Correlation”

It measures the strength and direction of
a linear relationship (it is not meaningful for
nonlinear relationships).
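A sketch of r computed as the average product of z-scores, $r = \frac{1}{n-1}\sum z_x z_y$, on hypothetical data. The second example is a perfect curve where r comes out to 0, which is why r says nothing useful about nonlinear relationships.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = (1 / (n - 1)) * sum of z_x * z_y."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

print(correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))    # moderate positive linear association
print(correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # perfect curve (y = x**2), yet r = 0
```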
r² (r-squared) is called…

“Coefficient of Determination”
And is interpreted as “the % of the
variation in [y] that is explained by the least-squares regression line of [y] on [x]”
It can also be thought of as

explained variation / total variation:
$r^2 = \frac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$
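A sketch confirming that explained variation divided by total variation equals 1 minus (residual variation / total variation), on hypothetical data. The function name `r_squared` is illustrative.

```python
from statistics import mean

def r_squared(xs, ys):
    """Return r^2 two ways: explained/total and 1 - unexplained/total."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    predicted = [a + b * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)                       # total variation in y
    explained = sum((p - y_bar) ** 2 for p in predicted)            # variation the LSRL accounts for
    unexplained = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # leftover (residual) variation
    return explained / total, 1 - unexplained / total

xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
print(r_squared(xs, ys))  # both versions agree (about 0.6)
```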
If r² is 0.64


Then r = 0.8 OR r = -0.8
Figure it out by looking at the direction
of the scatterplot (the sign of the slope)!
Simpson’s Paradox is…

When an association that holds within each of
two (or more) groups reverses direction once
the groups' data are combined.
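A small illustration with made-up counts (not real data): Treatment A has the higher success rate within each group, yet the lower rate once the groups are pooled, because A was used mostly on the harder severe cases.

```python
# Hypothetical counts (successes, patients), made up to show the reversal
results = {
    "Treatment A": {"mild cases": (8, 10), "severe cases": (30, 100)},
    "Treatment B": {"mild cases": (70, 100), "severe cases": (2, 10)},
}

def rate(successes, total):
    """Success rate for one group."""
    return successes / total

def combined_rate(groups):
    """Success rate after pooling all groups together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(n for _, n in groups.values())
    return successes / total

for treatment, groups in results.items():
    by_group = {g: round(rate(*counts), 3) for g, counts in groups.items()}
    print(treatment, by_group, "combined:", round(combined_rate(groups), 3))
```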
Placebo Effect

Subjects respond to a dummy treatment (such as a sugar pill)
simply because they expect it to make them feel better
Double Blind

Neither the subjects nor the experimenters who
interact with them and measure the response know which treatment each
subject is receiving
Disjoint

The two events can't occur at the same time: P(A and B) = 0
Mutually Exclusive

Another name for disjoint: the events can't both occur
Checks for Independence

P(A and B) = P(A)·P(B)

Or

P(B) = P(B|A)
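Both checks can be tried on a standard 52-card deck, using exact fractions. In this sketch, "heart" and "face card" pass both checks (independent), while "heart" and "red card" fail them. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 equally likely cards

def prob(event):
    """Exact probability that a randomly drawn card satisfies `event`."""
    return Fraction(sum(1 for card in DECK if event(card)), len(DECK))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in {"J", "Q", "K"}

def is_red(card):
    return card[1] in {"hearts", "diamonds"}

def both(e1, e2):
    return lambda card: e1(card) and e2(card)

def independent(e1, e2):
    """Check 1: P(A and B) == P(A) * P(B)."""
    return prob(both(e1, e2)) == prob(e1) * prob(e2)

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(both(event, given)) / prob(given)

print(independent(is_heart, is_face), conditional(is_face, is_heart), prob(is_face))
print(independent(is_heart, is_red), conditional(is_red, is_heart), prob(is_red))
```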
Rules for Means and Variances
of Discrete Random Variables

(p. 420)
$\mu_{a+bX} = a + b\mu_X$

$\sigma^2_{a+bX} = b^2\sigma^2_X$
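A sketch that computes the mean and variance of a hypothetical discrete random variable X, builds Y = a + bX, and confirms both rules with exact fractions.

```python
from fractions import Fraction as F

def mean_and_variance(dist):
    """Mean and variance of a discrete random variable given as {value: probability}."""
    assert sum(dist.values()) == 1, "probabilities must add to 1"
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var

# Hypothetical random variable X
X = {0: F(1, 10), 1: F(3, 10), 2: F(4, 10), 3: F(2, 10)}
a, b = 5, 2
Y = {a + b * x: p for x, p in X.items()}  # Y = a + bX keeps the same probabilities

mu_x, var_x = mean_and_variance(X)
mu_y, var_y = mean_and_variance(Y)
print(mu_x, var_x)  # 17/10 81/100
print(mu_y, var_y)  # 42/5 81/25
```

The constant a shifts the mean but drops out of the variance; b multiplies the mean and multiplies the variance by b².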
Central Limit Theorem

As sample size increases,



the shape of the sampling distribution of the sample mean gets
more and more Normal, regardless of the
shape of the parent distribution.
The center (mean) of the sampling
distribution stays exactly the same: it equals the population mean μ.
The variability in the sampling distribution
(standard deviation σ/√n) decreases.
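A simulation sketch of the three facts above. The parent population is assumed (for illustration only) to be exponential with mean 1 and SD 1, which is strongly right-skewed; the seed and sample sizes are arbitrary choices.

```python
import random
from statistics import mean, stdev

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a right-skewed parent (exponential, mean 1, SD 1)."""
    return [mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

def skewness(xs):
    """Simple moment-based skewness: near 0 for a symmetric, Normal-looking shape."""
    m, s = mean(xs), stdev(xs)
    return mean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(2024)
for n in (2, 10, 50):
    means = sample_means(n, 2000, rng)
    print(f"n={n:>2}: center={mean(means):.3f}  sd={stdev(means):.3f}  "
          f"sigma/sqrt(n)={1 / n ** 0.5:.3f}  skew={skewness(means):.2f}")
```

The center stays near 1, the SD tracks σ/√n, and the skewness shrinks toward 0 as n grows.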
Law of Large Numbers

As sample size increases, the sample mean,
x̄, tends to get closer and closer to the population mean,
μ.
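A sketch of the Law of Large Numbers with a simulated fair die (population mean μ = 3.5). The running value of x̄ wanders early on and settles near 3.5 as rolls accumulate; the seed is arbitrary.

```python
import random

def running_means(num_rolls, rng):
    """Sample mean x-bar after each roll of a fair six-sided die."""
    total = 0
    means = []
    for i in range(1, num_rolls + 1):
        total += rng.randint(1, 6)
        means.append(total / i)
    return means

means = running_means(100_000, random.Random(7))
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} rolls: x-bar = {means[n - 1]:.4f}")
```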
DON’T FORGET
TO…
Use the proper
NOTATION.
P(x>2)=….
Notation is communication
a, b, n, p, q, r, s, t, x, y, z, E, H, P, π,
μ, σ all have special
meanings…
…and “hats” or “bars” change
those meanings.
You are not free to substitute
another letter even though it looks
like algebra.

4 Requirements of a Binomial
Setting are…




Independence
Success/Failure for each trial
Equal probability of success for each
trial
Fixed number of trials
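Once the four conditions hold, binomial probabilities follow from $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$. This sketch uses a hypothetical setting of 10 trials with success probability 0.3; `math.comb` needs Python 3.8+.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): fixed n independent trials, same p each time."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """P(X <= k)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# Hypothetical setting: 10 independent trials, each with P(success) = 0.3
print(binomial_pmf(3, 10, 0.3))  # P(X = 3)
print(binomial_cdf(3, 10, 0.3))  # P(X <= 3)
```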
The only difference between
binomial setting and geometric
setting is…

Geometric does not have a fixed number
of trials; it counts trials until the first
success…
The mean of a geometric
distribution is…



1/p
(this is not on your formula sheet, but
you should know it)
To calculate a geometric probability,
use a tree diagram, or $P(X = k) = (1-p)^{k-1}p$
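Following the tree diagram, the first success on trial k means k - 1 failure branches followed by one success branch. This sketch assumes a hypothetical wait for the first 6 on a fair die (p = 1/6) and uses exact fractions.

```python
from fractions import Fraction

def geometric_pmf(k, p):
    """P(first success on trial k): (k - 1) failures along the tree, then one success."""
    return (1 - p) ** (k - 1) * p

def geometric_more_than(k, p):
    """P(X > k): the first k trials are all failures."""
    return (1 - p) ** k

p = Fraction(1, 6)  # hypothetical: waiting for the first 6 on a fair die
print(geometric_pmf(3, p))        # 25/216
print(geometric_more_than(3, p))  # 125/216
print(1 / p)                      # mean number of trials, 1/p = 6
```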
Undercoverage Bias

When some groups are systematically
left out of the sampling process (like
people without phones in a phone
survey)
Voluntary Response Bias

When a sample consists of volunteers
(like calling in to a radio survey)
Nonresponse Bias

When an individual chosen for the sample can't be contacted
or refuses to cooperate

The “P” in your “PHANTOMS” is
“Define the parameter”.
So define your parameter
(either p or μ) as
specifically as possible.
The grader of your test…

Doesn’t know what “PANIC” and
“PHANTOMS” are… they are simply for
your own organization.
Your hypotheses must…



Be about the PARAMETERS.
Why make a hypothesis about the
sample?
(x̄ and p̂ should never appear in
the hypotheses.)
When you do inference…

You hypothesize about what the value
of a single number is (that number is
usually μ or p).
What IS an Assumption?
an underlying hypothesis about
the situation required by the
mathematical justification for the
statistical method.
WE WILL PROBABLY
NEVER KNOW
IF AN ASSUMPTION IS TRUE.


Draw the graph if you are
given the data set! An
outlier/skewness can
dramatically affect your
Test Statistic.


Your sample should be an SRS of the
POPULATION OF INTEREST. Be
specific.
If the sample is randomly selected and
unbiased, then we can generalize the
findings to the population.

We rarely know the population standard
deviation, so the only time you do z-tests will probably be when using
proportions.
Don’t Forget…

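Degrees of Freedom! (for all t-statistics
and Chi-Square statistics)

***(r-1)(c-1) for Chi-Square Tests for
Homogeneity and Chi-Square Tests for
Independence
***(number of categories) - 1 for Chi-Square Goodness-of-Fit Tests
***n-2 for LinReg T Test
***n-1 for one-sample and matched-pairs t procedures
***For two-sample t procedures, use the calculator's df, or conservatively the smaller of n1-1 and n2-1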
Can you find and interpret
critical values?




1-sided or 2-sided?
Draw a picture!
Degrees of freedom?
Critical values are interpreted as the
number of std. devs. away from the
mean of the sampling distribution a result must be, usually in order to reject a
hypothesis
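For z procedures, critical values come straight from the inverse Normal function. This sketch assumes Python 3.8+ for `statistics.NormalDist.inv_cdf`; t* values instead need a t table, the calculator's invT, or a library such as SciPy (`scipy.stats.t.ppf`), since they also depend on degrees of freedom.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def z_star(confidence):
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return STANDARD_NORMAL.inv_cdf(1 - (1 - confidence) / 2)

def z_upper_critical(alpha):
    """One-sided (upper-tail) critical value for significance level alpha."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: z* = {z_star(c):.3f}")
print(f"one-sided alpha = 0.05: z = {z_upper_critical(0.05):.3f}")
```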
Your Calculations in
PHANTOMS should…


Include the equation and the values you are
plugging in…
Normal curves will earn you a PLUS (but
don't draw a Normal curve for Chi-Square distributions; they are ALWAYS
skewed right).
Interpretation of Confidence
Interval:

“We are 95% confident that the interval _________
captures the true _____.”
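A sketch of a one-proportion z interval, $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$, assuming hypothetical data of 540 "yes" answers out of 1000.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """p-hat +/- z* * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = one_prop_z_interval(540, 1000)  # hypothetical sample
print(f"({low:.3f}, {high:.3f})")
```

The interval is about (0.509, 0.571), so the sentence would read: “We are 95% confident that the interval from 0.509 to 0.571 captures the true proportion of …” (with the context filled in).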
Interpretation of Confidence
Level

“If we repeated this sampling and
calculation process many times, about 95% of
all calculated intervals would correctly
contain ______.”
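The confidence-level sentence can be checked by brute force. This sketch assumes a hypothetical true proportion of 0.4, draws many samples of size 200, builds a 95% interval from each, and counts how often the intervals capture 0.4.

```python
import random
from math import sqrt
from statistics import NormalDist

def one_prop_interval(successes, n, confidence=0.95):
    """One-proportion z interval."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage_rate(true_p, n, reps, rng, confidence=0.95):
    """Fraction of simulated intervals that capture the true proportion."""
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < true_p for _ in range(n))  # one simulated sample
        low, high = one_prop_interval(successes, n, confidence)
        hits += low <= true_p <= high
    return hits / reps

rate = coverage_rate(true_p=0.4, n=200, reps=2000, rng=random.Random(11))
print(rate)  # close to 0.95
```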
Interpretation of the P-Value

“A p-value of ______ indicates that if
H0 is true, then we would obtain a
sample statistic as extreme as (or more extreme than) ours
about ____% of the
time due to random chance alone.”
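A sketch of where the p-value in that sentence comes from, for a one-proportion z test. The data are hypothetical (540 successes in 1000 trials, H0: p = 0.5, Ha: p > 0.5); the standard error uses p0 because the calculation assumes H0 is true.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative="greater"):
    """z statistic and P-value for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    std = NormalDist()
    if alternative == "greater":
        p_value = 1 - std.cdf(z)
    elif alternative == "less":
        p_value = std.cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - std.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

z, p = one_prop_z_test(540, 1000, 0.5)
print(f"z = {z:.3f}, P-value = {p:.4f}")
```

Here the P-value is about 0.0057: if H0 is true, a sample proportion of 0.54 or higher would occur only about 0.57% of the time by random chance alone.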
The communication rubric:
Your conclusion should be …
Clear,
Concise,
Complete, and
in Context!
You will be given an essay
question that asks you to
describe an experiment…

Be thorough!



Your answer must include Repetition,
Randomization, Control, and Comparison.
Don't abbreviate “R.A.” (random assignment). Spell it out! Otherwise they
will assume you don't know what you are
doing, but you do!
SAY WHAT YOU ARE COMPARING!
When describing an
experiment or simulation…

Simply stating one of these things is not
enough…




“use a random number generator…”
“use a table or random digit table…”
“use a coin to randomly select…”
YOU MUST DESCRIBE YOUR
PROCEDURE IN DETAIL
Describing random assignment
using a RDT…



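Label the subjects with digits (e.g., 01, 02, …)
Peel off digits from the table in groups of that size, skipping repeats and unused labels
Assign subjects to treatments in the order their labels appear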
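A sketch of Label, Peel, Assign in code, where a string of digits stands in for a row of a random digit table. The function name and the two-treatment setup are assumptions for illustration; the fixed-digit example mirrors reading a table by hand.

```python
import random

def assign_with_digit_table(subjects, group_size, digits):
    """Label -> Peel -> Assign, mimicking a random digit table (RDT).

    Label: subject i gets label i (1, 2, ..., N), using as many digits as N has.
    Peel: read the digit string in chunks of that width, skipping labels that are
          out of range or already chosen.
    Assign: the first `group_size` subjects chosen get Treatment 1; the rest get Treatment 2.
    """
    width = len(str(len(subjects)))
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        if 1 <= label <= len(subjects) and label not in chosen:
            chosen.append(label)
            if len(chosen) == group_size:
                break
    if len(chosen) < group_size:
        raise ValueError("ran out of digits before filling the group")
    treatment_1 = [subjects[label - 1] for label in chosen]
    treatment_2 = [s for i, s in enumerate(subjects, start=1) if i not in chosen]
    return treatment_1, treatment_2

# By hand: labels 1-5, digits 8 0 3 5 1 -> skip 8 and 0, choose 3 (Cal) and 5 (Eve)
print(assign_with_digit_table(["Ann", "Ben", "Cal", "Dee", "Eve"], 2, "80351"))

# 20 hypothetical subjects, digits generated by Python's random module
rng = random.Random(5)
subjects = [f"S{i}" for i in range(1, 21)]
digits = "".join(str(rng.randrange(10)) for _ in range(400))
t1, t2 = assign_with_digit_table(subjects, 10, digits)
print(t1, t2)
```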
If you are asked to simulate
something…

Include a stopping rule as well
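A sketch of a simulation with a clear stopping rule, using a hypothetical scenario: each cereal box holds one of 3 equally likely prizes, and one trial stops as soon as all 3 prizes have appeared. The whole simulation stops after a fixed number of trials.

```python
import random
from statistics import mean

def boxes_until_all_prizes(num_prizes, rng):
    """One trial: open boxes until every prize has appeared (the stopping rule)."""
    collected = set()
    boxes = 0
    while len(collected) < num_prizes:            # stop as soon as the set is complete
        collected.add(rng.randrange(num_prizes))  # each prize equally likely (assumption)
        boxes += 1
    return boxes

def simulate(num_prizes, reps, rng):
    """Repeat the trial `reps` times and record how many boxes each took."""
    return [boxes_until_all_prizes(num_prizes, rng) for _ in range(reps)]

results = simulate(num_prizes=3, reps=10_000, rng=random.Random(1))
print("average boxes:", mean(results))
```

For 3 equally likely prizes, the theoretical average is 3(1 + 1/2 + 1/3) = 5.5 boxes, so the simulated average should land close to that.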
Type 1 error is…

Rejecting the null hypothesis when it is actually true.
P(Type 1 error) = α
Type 2 error is…


Failing to reject the null hypothesis when it is actually false.
P(Type 2 error) = β = 1 - Power
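A simulation sketch showing that when H0 really is true, a test at level α rejects it (a Type 1 error) about α of the time. It assumes a one-sided z test with known σ and hypothetical values μ0 = 100, σ = 15, n = 25.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejects_h0(sample, mu0, sigma, alpha):
    """One-sided z test of H0: mu = mu0 vs Ha: mu > mu0 (sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    return z > NormalDist().inv_cdf(1 - alpha)

def type1_error_rate(mu0, sigma, n, alpha, reps, rng):
    """Simulate samples from a population where H0 is TRUE; count false rejections."""
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, mu0, sigma, alpha)
    return rejections / reps

rate = type1_error_rate(mu0=100, sigma=15, n=25, alpha=0.05, reps=4000, rng=random.Random(3))
print(rate)  # close to alpha = 0.05
```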
How will each of these affect
the power of a test?…







Using a larger alpha?
Using a larger sample size?
Using a smaller sigma?
Choosing an alternative that is further
from the hypothesized (null) mean?
Increasing the probability of a Type 1 error (α)?
Decreasing the probability of a Type 2 error (β)?
……They all will Increase the Power!!!
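A sketch of the power of a one-sided z test, $1 - \Phi\left(z_{1-\alpha} - \frac{\mu_a - \mu_0}{\sigma/\sqrt{n}}\right)$, under hypothetical baseline values. Each change in the list raises the power; the last item on the slide holds by definition, since power = 1 - β.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_z(mu0, mu_a, sigma, n, alpha):
    """Power of a one-sided z test of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu_a."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha)           # reject H0 when z > z_crit
    shift = (mu_a - mu0) / (sigma / sqrt(n))  # distance from mu0 to mu_a in standard errors
    return 1 - std.cdf(z_crit - shift)        # P(z > z_crit) when the true mean is mu_a

base = {"mu0": 100, "mu_a": 105, "sigma": 15, "n": 25, "alpha": 0.05}  # hypothetical values
changes = {
    "larger alpha": {"alpha": 0.10},
    "larger n": {"n": 100},
    "smaller sigma": {"sigma": 10},
    "alternative further from mu0": {"mu_a": 110},
}
print(f"baseline: {power_upper_z(**base):.3f}")
for label, change in changes.items():
    print(f"{label}: {power_upper_z(**{**base, **change}):.3f}")
```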
Medians…


Are resistant!
Means are not!
IQR…

Is a measure of spread!
The 1.5 IQR Rule…

Add 1.5 × IQR to Q3 and subtract it from Q1!!!

Not from the median… Values beyond these fences are outliers.
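A sketch of the 1.5 × IQR fences, assuming hypothetical quartiles Q1 = 20 and Q3 = 30 (so IQR = 10 and the fences are 5 and 45). Note that the median never appears in the calculation.

```python
def outlier_fences(q1, q3):
    """Lower and upper fences from the 1.5 x IQR rule, built from Q1 and Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Values below the lower fence or above the upper fence."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

print(outlier_fences(20, 30))                                # (5.0, 45.0)
print(flag_outliers([3, 12, 20, 24, 30, 41, 52], 20, 30))   # [3, 52]
```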
A Five Number Summary is


Min, Q1, Med, Q3, Max
0%, 25%, 50%, 75%, 100%
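A sketch of the five-number summary. Quartiles here are medians of the lower and upper halves, leaving out the overall median when n is odd; this is one common textbook/calculator convention (an assumption), and some software uses other quartile rules. It needs at least 2 values.

```python
from statistics import median

def five_number_summary(values):
    """(Min, Q1, Median, Q3, Max) using median-of-halves quartiles."""
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                # below the median
    upper = xs[half + len(xs) % 2:]  # above the median (median excluded if n is odd)
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([30, 42, 45, 50, 55, 58, 61, 70]))  # even n, hypothetical data
print(five_number_summary([7, 1, 3, 5, 2, 6, 4]))             # odd n, hypothetical data
```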
Variance is

Std. dev squared!
Show all your work, even for
little calculations!


AND WRITE NEATLY!
Even show your work if the answer is

2+2 = 4
If you erase on your scantron
sheet, erase thoroughly!
Please use…



All 90 minutes of each section…
It can't hurt you and you can't go
anywhere…
Once you think you are done, go back
and try to find at least 1 problem to
change/correct/improve.
The Investigative Task…


Will be difficult, but it will be difficult for
everybody.
Even if you can get a 2 on it, you are
ahead of the game.
If you don’t know an answer
to part A of a question…


Say “Suppose Part A is 3.6”… just so
you can go on to part B…
You can still get full credit for Part B if
Part A is incorrect.
Every answer…

Should be in context
If you want to review one
more thing….


Review mean and variance of a discrete
random variable (last pages in your
review packet)
The answers and explanations are
included in the packet.
And the most important thing
I can tell you…
READ CAREFULLY and
ANSWER THE QUESTION!

THEY LIKE TO ASK MULTIPLE
QUESTIONS AT ONCE… SO ANSWER
COMPLETELY!