AP Test Prep Flashcards

Explain Standard Deviation
Explain LSRL “b”
Why use a control group?
Explain LSRL “a”
Explain a P-value
Explain LSRL “SEb”
Goal of Blocking
Benefit of Blocking
Explain LSRL “ŷ”
Explain LSRL “s”
SRS
For every one-unit increase in the x-axis
variable (context), the y-axis variable
(context) is estimated to
increase/decrease by ____ units
(context).
Standard Deviation measures spread
by giving the “typical” or “average”
distance that the observations
(context) are away from their (context)
mean
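Here is a minimal Python sketch of the idea. It assumes the sample standard deviation with n − 1 in the denominator, which is the value the TI-84 reports as Sx. The data values are hypothetical.

```python
import math

def sample_sd(values):
    """Typical distance of the observations from their mean (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5
print(round(sample_sd(data), 3))  # 2.138
```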
When the x-axis variable (context) is
zero, the y-axis variable (context) is
estimated to be ____ (context).
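You can check the slope and intercept cards with a short Python sketch. It assumes the standard least-squares formulas b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The hours/score data are hypothetical.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least-squares regression line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope: predicted change in y per one-unit increase in x
    a = y_bar - b * x_bar  # intercept: predicted y when x = 0
    return a, b

hours = [1, 2, 3, 4, 5]        # hypothetical x: hours studied
scores = [52, 55, 61, 64, 68]  # hypothetical y: test score
a, b = lsrl(hours, scores)
print(f"predicted score = {a:.1f} + {b:.1f}(hours)")  # predicted score = 47.7 + 4.1(hours)
```

With this made-up data, the cards read as follows. For every one-hour increase in study time, the predicted score increases by about 4.1 points. A student who studies 0 hours is predicted to score about 47.7.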
A control group gives the researchers a
comparison group to be used to evaluate
the effectiveness of the treatment(s).
(context)
(gauge the effect of the treatment
compared to no treatment at all)
SEb estimates the standard deviation of
the sampling distribution of the slope b for predicting the y-axis variable (context) from the x-axis
variable (context).
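This sketch shows how SEb can be computed. It assumes the usual formula SEb = s / √Σ(x − x̄)², where s is the standard deviation of the residuals (n − 2 degrees of freedom). The data are hypothetical.

```python
import math

def slope_standard_error(xs, ys):
    """SE of the LSRL slope: s / sqrt(sum of (x - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))  # residual SD, df = n - 2
    return s / math.sqrt(sxx)

hours = [1, 2, 3, 4, 5]        # hypothetical x
scores = [52, 55, 61, 64, 68]  # hypothetical y
print(round(slope_standard_error(hours, scores), 3))  # 0.252
```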
Assuming that the null hypothesis is true (context),
the p-value is the probability of
observing a statistic (or difference in
statistics) (context) as extreme as or more extreme
than the one actually observed.
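The p-value is a tail area. This stdlib-only Python sketch computes it with `math.erfc`, assuming the test statistic is approximately standard normal when H0 is true. t procedures would need a t distribution instead.

```python
import math

def normal_p_value(z, alternative="two-sided"):
    """P-value for a z statistic, assuming it is standard normal when H0 is true."""
    if alternative == "greater":
        return 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    if alternative == "less":
        return 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return math.erfc(abs(z) / math.sqrt(2))        # P(|Z| >= |z|), both tails

print(round(normal_p_value(1.96), 4))            # 0.05
print(round(normal_p_value(-1.645, "less"), 3))  # 0.05
```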
y is the “estimated” or “predicted” yvalue (context) for a given x-value
(context)
A SRS is a sample taken in such a way
that every set of n individuals has an
equal chance to be the sample actually
selected.
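Python's `random.sample` draws without replacement, so every set of n individuals is equally likely to be chosen. That matches the SRS definition. Here is a sketch with a hypothetical roster.

```python
import random

population = [f"student_{i}" for i in range(1, 201)]  # hypothetical roster of 200
random.seed(1)  # fixed seed only so the example is reproducible
srs = random.sample(population, 10)  # every set of 10 students is equally likely
print(srs)
```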
The goal of blocking is to create groups
of homogeneous experimental units.
The benefit of blocking is the reduction
of variation within the experimental
units. (context)
The value s = ___ is the standard
deviation of the residuals. It measures
a typical distance between the actual y-values (context) and their predicted y-values (context).
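This sketch computes ŷ, the residuals, and s. It assumes hypothetical hours/score data whose LSRL works out to ŷ = 47.7 + 4.1x. Note the n − 2 in the denominator of s.

```python
import math

hours = [1, 2, 3, 4, 5]        # hypothetical x
scores = [52, 55, 61, 64, 68]  # hypothetical y
a, b = 47.7, 4.1               # LSRL fitted to this data: y-hat = 47.7 + 4.1x

predicted = [a + b * x for x in hours]                          # y-hat for each x
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]  # y - y-hat
s = math.sqrt(sum(r ** 2 for r in residuals) / (len(hours) - 2))
print([round(r, 1) for r in residuals])  # [0.2, -0.9, 1.0, -0.1, -0.2]
print(round(s, 3))                       # 0.796
```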
Sampling Techniques
2 Random Variables (Formulas)
Bias
Is a Linear Model Appropriate?
*Interpreting a Residual Plot*
Central Limit Theorem
The Meaning of 95% Confident
Experimental Designs
Interpreting r²
1 Random Variable (Formulas)
Binomial Distribution (Conditions)
Total Mean of 2 RVs: $\mu_T = \mu_X + \mu_Y$
Total Stdev of 2 Independent RVs: $\sigma_T = \sqrt{\sigma_X^2 + \sigma_Y^2}$
Total Stdev of 2 Dependent RVs:
Cannot be determined because it depends on how strongly
they are correlated.
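The independent-RV formulas can be checked on two fair dice. Dice rolls are independent, so the formulas should agree with a brute-force enumeration of all 36 equally likely outcomes. This is a small illustrative sketch.

```python
import math
from itertools import product

faces = [1, 2, 3, 4, 5, 6]
mu_x = sum(faces) / 6                            # mean of one die: 3.5
var_x = sum((f - mu_x) ** 2 for f in faces) / 6  # variance of one die: 35/12

# Formulas from the card (X and Y are independent fair dice)
mu_t = mu_x + mu_x
sigma_t = math.sqrt(var_x + var_x)

# Brute-force check: T = X + Y over all 36 equally likely outcomes
totals = [x + y for x, y in product(faces, faces)]
mu_brute = sum(totals) / 36
sigma_brute = math.sqrt(sum((t - mu_brute) ** 2 for t in totals) / 36)
print(mu_t, round(sigma_t, 3))          # 7.0 2.415
print(mu_brute, round(sigma_brute, 3))  # 7.0 2.415
```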
1. SRS– Number the entire population, draw numbers from a
hat (every set of n individuals has equal chance of selection)
2. Stratified – Split the population into homogeneous groups,
select a SRS from each group.
3. Voluntary Response – People choose themselves by
responding to a general appeal.
4. Multistage – Select successively smaller groups within the
population in stages, resulting in a sample consisting of
clusters of individuals.
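The stratified technique (item 2) can be sketched in Python. The strata and their members are hypothetical, and taking the same number from each stratum is an assumption. Proportional allocation is another common choice.

```python
import random

# Hypothetical population split into homogeneous strata (grade levels)
strata = {
    "freshmen": [f"F{i}" for i in range(1, 101)],
    "sophomores": [f"S{i}" for i in range(1, 81)],
    "juniors": [f"J{i}" for i in range(1, 61)],
}

random.seed(2)  # reproducible example only
stratified_sample = {
    name: random.sample(members, 5)  # an SRS of 5 from each stratum
    for name, members in strata.items()
}
for name, chosen in stratified_sample.items():
    print(name, chosen)
```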
1. No Clear Pattern – Particularly check that there is
not a curved pattern.
2. No increasing or decreasing spread (no fan shape) – If the spread
changes, predictions are less reliable where the spread is large.
3. Are the residuals small? (Notice the units)
4. No clear outliers (large residuals) or influential
observations (pulling the LSRL up or down)
5. Is r close to 1 or -1 (equivalently, is r² close to 1)? The closer the better!
If so, the LSRL is a good model for the data!
The systematic favoring of certain
outcomes from flawed sample selection,
poor question wording, undercoverage,
nonresponse, etc.
Bias deals with the center of a sampling
distribution being “off”!
The method used to produce this interval will
capture the true population mean/proportion in
95% of all possible samples of this same size from
this same population.
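You can check the “method” interpretation by simulation. This sketch assumes a normal population with known σ, so the simple z interval x̄ ± 1.96σ/√n applies. The population values are hypothetical. About 95% of the intervals should capture the true mean.

```python
import math
import random

random.seed(3)             # reproducible example only
mu, sigma, n = 50, 10, 25  # hypothetical normal population and sample size
z_star = 1.96              # critical value for 95% confidence
reps = 10_000

captured = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    margin = z_star * sigma / math.sqrt(n)  # sigma treated as known
    if x_bar - margin <= mu <= x_bar + margin:
        captured += 1

coverage = captured / reps
print(coverage)  # close to 0.95
```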
1. If the population distribution is normal, the sampling
distribution of the sample mean will also be normal, with the same mean as
the population. Additionally, as n increases, the
sampling distribution’s standard deviation ($\sigma/\sqrt{n}$) will decrease.
2. If the population distribution is not normal, the
sampling distribution of the sample mean will become more and more normal
as n increases. The sampling distribution will have the
same mean as the population, and as n increases, the
sampling distribution’s standard deviation ($\sigma/\sqrt{n}$) will decrease.
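Here is a simulation sketch of case 2. It assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1. Then σ/√n should be about 0.5 for n = 4 and about 0.2 for n = 25, while the mean of the sample means stays near 1.

```python
import random
import statistics

random.seed(4)  # reproducible example only

def sample_means(n, reps=5000):
    """Means of many samples of size n from a right-skewed exponential population."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

results = {}
for n in (4, 25):
    means = sample_means(n)
    results[n] = (statistics.fmean(means), statistics.stdev(means))
    print(n, [round(v, 3) for v in results[n]])  # n, [mean of means, SD of means]
```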
r² = ____ means that ___% of the variation in y
(context) is explained by the LSRL of y (context)
on x (context).
Or
r² = ____ means that ___% of the variation in y
(context) is explained by using the linear regression
model with x (context) as the explanatory variable.
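One way to see where the percentage comes from is r² = 1 − SSE/SST. That is the fraction of the total variation in y that the line accounts for. This sketch assumes hypothetical data whose LSRL is ŷ = 47.7 + 4.1x.

```python
hours = [1, 2, 3, 4, 5]        # hypothetical x
scores = [52, 55, 61, 64, 68]  # hypothetical y
a, b = 47.7, 4.1               # LSRL for this data

y_bar = sum(scores) / len(scores)
sst = sum((y - y_bar) ** 2 for y in scores)                       # total variation in y
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, scores))  # variation left unexplained
r_squared = 1 - sse / sst
print(f"r^2 = {r_squared:.3f}: about {100 * r_squared:.1f}% of the variation "
      "in score is explained by the LSRL of score on hours")
```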
1. CRD (Completely Randomized Design) – All
experimental units are allocated at random among all
treatments
2. RBD (Randomized Block Design) – Experimental units
are put into homogeneous blocks. The random
assignment of the units to the treatments is carried out
separately within each block.
3. Matched Pairs – A form of blocking in which each subject
receives both treatments in a random order or the subjects are
matched in pairs as closely as possible and one subject in each pair
receives each treatment.
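Random assignment for the CRD and RBD designs can be sketched in Python. The units, treatments, and blocks below are hypothetical. The sketch also assumes equal group sizes, which are dealt out from a shuffled list.

```python
import random

random.seed(5)  # reproducible example only
units = [f"plot_{i}" for i in range(1, 13)]  # hypothetical experimental units
treatments = ["A", "B", "C"]

# CRD: shuffle all units, then deal them out to the treatments in turn
shuffled = random.sample(units, len(units))
crd = {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

# RBD: randomize separately within each (hypothetical) block
blocks = {"sunny": units[:6], "shady": units[6:]}
rbd = {}
for block_name, block_units in blocks.items():
    mixed = random.sample(block_units, len(block_units))
    rbd[block_name] = {t: mixed[i::len(treatments)] for i, t in enumerate(treatments)}

print(crd)
print(rbd)
```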
1. Two Outcomes: Success & Failure
2. Fixed Number of Trials (n)
3. Fixed Probability of Success for Each Trial (p)
4. Trials are Independent
Mean (Expected Value): $\mu_X = \sum x_i p_i$
(Multiply & add across the table)
Standard Deviation: $\sigma_X = \sqrt{\sum (x_i - \mu_X)^2 p_i}$
Square root of the sum of: (Each x value – the mean)²(its probability)
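The “multiply & add across the table” recipe translates directly into Python. Here is a sketch with a hypothetical probability table.

```python
import math

# Hypothetical probability distribution table
x_values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

mean = sum(x * p for x, p in zip(x_values, probs))                    # multiply & add across
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, probs))  # (x - mean)^2 * probability
sd = math.sqrt(variance)                                              # square root at the end
print(round(mean, 2), round(sd, 3))  # 1.7 0.9
```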
Binomial Distribution (Mean & Standard Deviation)
Outlier Rule
Binomial Distribution (Calculator Usage)
What is an Outlier?
Type I Error, Type II Error, & Power
Interpret r
Interpret a Z-score
P(At Least 1)
Two Events are Independent If…
Linear Transformations
Upper Bound = Q3 + 1.5(IQR)
Lower Bound = Q1 – 1.5(IQR)
IQR = Q3 – Q1
For one-variable data:
An outlier is any value that falls more
than 1.5 × IQR above Q3 or below Q1.
Regression Outlier:
Any data point that has a “large”
residual
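Here is a Python sketch of the 1.5 × IQR rule. It assumes the quartile convention commonly used with the TI-84: Q1 and Q3 are the medians of the lower and upper halves, and the overall median is excluded when n is odd. Software using other quartile methods may give slightly different fences. The data are hypothetical.

```python
def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def iqr_outliers(data):
    """Flag values beyond 1.5*IQR from the quartiles (quartiles = medians of the halves)."""
    vals = sorted(data)
    n = len(vals)
    lower_half = vals[: n // 2]
    upper_half = vals[(n + 1) // 2 :]  # overall median excluded when n is odd
    q1, q3 = median(lower_half), median(upper_half)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in vals if v < low or v > high], (low, high)

outliers, fences = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(outliers, fences)  # [30] (-4.5, 15.5)
```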
 x  np
Standard Deviation:  x  np(1  p)
Mean:
Exactly 5:
At Most 5:
Less Than 5:
At Least 5:
More Than 5:
P(X = 5) = Binompdf(n, p, 5)
P(x  5) = Binomcdf(n, p, 5)
P(X < 5) = Binomcdf(n, p, 4)
P(x  5) = 1 – Binomcdf(n, p, 4)
P(X > 5) = 1 – Binomcdf(n, p, 5)
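The calculator commands can be mirrored in Python. The functions `binompdf` and `binomcdf` below are hand-written helpers named after the calculator commands. They are not a library API. The values of n and p are hypothetical.

```python
import math

def binompdf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p); mirrors the calculator's binompdf."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomcdf(n, p, k):
    """P(X <= k); mirrors the calculator's binomcdf."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

n, p = 10, 0.5  # hypothetical
print(binompdf(n, p, 5))                  # P(X = 5) = 0.24609375
print(binomcdf(n, p, 5))                  # P(X <= 5)
print(binomcdf(n, p, 4))                  # P(X < 5)
print(1 - binomcdf(n, p, 4))              # P(X >= 5)
print(1 - binomcdf(n, p, 5))              # P(X > 5)
print(n * p, math.sqrt(n * p * (1 - p)))  # mean and standard deviation
```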
Correlation measures the strength and direction of
the linear relationship between x and y.
• r is always between -1 and 1.
• Close to zero = very weak
• Close to 1 or -1 = stronger
• Exactly 1 or -1 = perfectly straight line
• Positive r = positive correlation
• Negative r = negative correlation
1. Type I Error: H0 is innocent (true), but due to
unfortunate sample selection that did not represent
the population well, we mistakenly reject H0.
2. Type II Error: H0 is guilty (should be rejected), but
due to unfortunate sample selection (which did not
represent the population well), we fail to reject H0.
3. Power: Probability of rejecting H0 when H0 should
be rejected. (Rejecting correctly)
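Here is a Python sketch of how r is computed, assuming the textbook form r = (1/(n − 1)) Σ z_x·z_y with sample means and sample standard deviations. The data are hypothetical. Flipping the sign of y keeps the strength but reverses the direction.

```python
import statistics

def correlation(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y, using sample means and standard deviations."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

hours = [1, 2, 3, 4, 5]        # hypothetical x
scores = [52, 55, 61, 64, 68]  # hypothetical y
print(round(correlation(hours, scores), 3))                # 0.994
print(round(correlation(hours, [-y for y in scores]), 3))  # -0.994
```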
P(At least 1) = 1 – P(None)
Ex. P(Get a statmaster on any one test) = 0.02
Tests are independent, and there are 14 chapter tests.
P(Get at least 1 statmaster) = 1 – P(None)
$= 1 - (0.98)^{14}$
$\approx 0.246$
Adding “a” to every member of a data set adds “a”
to the measures of center, but does not change the
measures of spread.
Multiplying every member of a data set by “a”
multiplies the measures of center by “a” and
multiplies the measures of spread by |a|.
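Here is a quick numeric check of both rules, using a hypothetical Celsius-to-Fahrenheit conversion (multiply by 1.8, then add 32) and a multiplication by a negative number. The mean becomes 1.8·20 + 32 = 68. The standard deviation is multiplied by 1.8, and the +32 does not affect it. Multiplying by −2 doubles the spread.

```python
import statistics

temps_c = [10, 15, 20, 25, 30]             # hypothetical data
temps_f = [1.8 * c + 32 for c in temps_c]  # multiply by 1.8, then add 32
flipped = [-2 * c for c in temps_c]        # multiply by a negative number

print(statistics.mean(temps_c), statistics.stdev(temps_c))
print(statistics.mean(temps_f), statistics.stdev(temps_f))  # center shifted and scaled; spread only scaled
print(statistics.stdev(flipped))                            # spread scaled by |-2| = 2
```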
$z = \frac{\text{statistic} - \text{mean}}{\text{stdev}}$
A z-score describes how many standard
deviations a value or statistic ($x$, $\bar{x}$, $\hat{p}$) falls
away from the population mean. The further
the z-score is from zero, the more
“surprising” the value of the statistic is.
P(A and B) = P(A) · P(B)
Or
P(B) = P(B|A)
Unbiased Estimator
Why Large Samples Give More Trustworthy Results… (When collected appropriately)
Does the Sample Represent the Population Well?
Describe the Distribution OR Compare the Distributions
Experiment or Observational Study?
Does ___ CAUSE ___?
How to Set Up a Simulation…
What is a Residual?
Extrapolation
SOCS
When collected appropriately, large samples
yield more precise (less variable) results than small
samples because in a large sample the values
of the response tend to average out, so the statistic
tends to land closer to the true population parameter.
The data is collected in such a way that there
is no systematic tendency to over- or
underestimate the true value of the
population parameter. (The mean of the
sampling distribution equals the true value of
the parameter being estimated.)
SOCS!
Shape, Outliers, Center, Spread
Only discuss outliers if obvious outliers are
present. Otherwise, you will get full credit for SCS (Shape, Center, Spread)!
If it says “Compare”
YOU MUST USE comparison words like “Is
greater than” or “Is less than” for Center & Spread
Association is NOT Causation!
An observed association, no matter how
strong, is not evidence of causation. Only a
well-designed, controlled experiment can lead
to conclusions of cause and effect.
Residual = y − ŷ
A residual measures the difference between
the actual (observed) y-value in a scatterplot
and the y-value that is predicted by the
LSRL for any given value of x.
In the Calculator: L3 = L2 – Y1(L1)
Shape – Skewed Left (Mean < Median)
Skewed Right (Mean > Median)
Fairly Symmetric (Mean ≈ Median)
Outliers – Only discuss them if they are obvious
Center – Mean or Median (whichever is easier)
Spread – Range, IQR, or Standard Deviation
(whichever is easier)
Yes, if:
They have a large, random sample taken
from the same population we hope to
draw conclusions about.
A study is an experiment ONLY if
they IMPOSE a treatment upon the
experimental units.
In an observational study we make no
attempt to influence the results.
1. Assign digits to represent the outcomes/responses
2. Scheme: Will you use Table B, RandInt? How
many numbers will you read at a time? Skip
Numbers? Skip Repeats?
3. Recording the Data: What are you counting?
When do you stop?
4. Repeat Many Trials
5. Report the Results of the Simulation
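The five steps can be sketched in Python for the statmaster example. The digit assignment is an assumption chosen to match p = 0.02: in a randInt-style draw from 1 to 100, the values 1 and 2 mean “statmaster”. The estimate should land near the exact answer 1 − (0.98)^14 ≈ 0.246.

```python
import random

random.seed(6)        # reproducible example only
trials = 10_000       # step 4: repeat many trials
tests_per_trial = 14  # 14 independent chapter tests per trial

hits = 0
for _ in range(trials):
    # steps 1-2: draw integers 1-100; 1-2 = statmaster (p = 0.02), 3-100 = no statmaster
    got_one = any(random.randint(1, 100) <= 2 for _ in range(tests_per_trial))
    # step 3: record whether at least one statmaster occurred in this trial
    hits += got_one
estimate = hits / trials
print(estimate, 1 - 0.98 ** 14)  # step 5: simulated vs. exact probability
```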
Using an LSRL to predict outside the
domain of the explanatory variable.
(Can lead to ridiculous conclusions if the
current linear trend does not continue)
Carrying out a Two-Sided Test from a CI
Matched Pairs t-test: Phrasing Hints, H0 and Ha, Conclusion
Two Sample t-test: Phrasing Hints, H0 and Ha, Conclusion
Complementary Events
Binomial Distribution Conditions
Key Phrase: MEAN DIFFERENCE
Ho: μDiff = 0
Ha: μDiff < 0, > 0, ≠0
μDiff = The mean difference in __ for all __.
We do/do not have enough evidence at the
0.05 level to conclude that the mean
difference in __ for all __ is ___.
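Here is a sketch of the matched-pairs test statistic. The key step is reducing each pair to one difference and then treating the differences as a single sample: t = (x̄_Diff − 0) / (s_Diff/√n) with n − 1 degrees of freedom. The before/after data are hypothetical. The p-value would then come from a t distribution with n − 1 df (for example, the calculator's T-Test), which the standard library does not provide.

```python
import math
import statistics

before = [10, 12, 9, 11, 13, 8, 10, 12]  # hypothetical paired measurements
after = [12, 11, 12, 15, 14, 8, 12, 15]

diffs = [a - b for a, b in zip(after, before)]  # one difference per subject
n = len(diffs)
x_bar_diff = statistics.mean(diffs)
s_diff = statistics.stdev(diffs)
t = (x_bar_diff - 0) / (s_diff / math.sqrt(n))  # H0: mu_Diff = 0
print(f"mean diff = {x_bar_diff}, t = {t:.3f}, df = {n - 1}")
```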
We do/do not have enough evidence to
reject H0: μ = ? in favor of Ha: μ ≠ ? at
the α = 0.05 level (1 – Confidence Level)
because ? falls inside/outside the 95% CI
(or whatever Confidence Level was used).
Outside the CI: reject H0. Inside the CI: fail to reject H0.
2 Disjoint Events whose union is the sample space ($A$ and $A^c$).
Ex: Boy/Girl,
Rain/Not Rain,
Draw at least one heart / Draw NO hearts
Key Phrase: DIFFERENCE IN THE MEANS
Ho: μ1 = μ2 OR μ1 - μ2 = 0
Ha: μ1 < μ2, μ1 > μ2, or μ1 ≠ μ2
μ1 − μ2 = The difference in the mean __ for all __.
We do/do not have enough evidence at the
0.05 level to conclude that the difference in
the mean __ for all __ is ___.
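Here is a sketch of the two-sample t statistic. It assumes the unpooled standard error √(s₁²/n₁ + s₂²/n₂) and the conservative df = min(n₁, n₂) − 1. Calculators typically report a larger Welch df instead. Both groups are hypothetical.

```python
import math
import statistics

group1 = [5, 7, 6, 8, 9]  # hypothetical sample from population 1
group2 = [3, 4, 5, 4, 6]  # hypothetical sample from population 2

n1, n2 = len(group1), len(group2)
x1, x2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

se = math.sqrt(v1 / n1 + v2 / n2)  # unpooled standard error of x1 - x2
t = (x1 - x2 - 0) / se             # H0: mu1 - mu2 = 0
df_conservative = min(n1, n2) - 1  # conservative df; calculators use a larger Welch df
print(f"t = {t:.3f}, conservative df = {df_conservative}")
```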
1. Two Outcomes: Success / Failure
2. Fixed # of trials/observations (n)
3. Probability of Success is the same
for all trials/observations (p)
4. The “n” trials/observations are
independent.