AP Statistics
Written Interpretations and Templates
*** Note: All conclusions and interpretations must be connected to the context of the problem.
Interpretation of R-Squared (use R-Squared when asked about strength/reliability of the
model) (When finding r from R-Squared on a computer printout, remember to check the sign: r takes the sign of the slope)
_____% of the variability in _____ can be explained by the linear association between _____
and _____. (y, y, x)
Interpretation of r (correlation coefficient)
This is the correlation and has strength and direction. You must address both of these. (r is
NOT a %... do not confuse r with R-Squared).
Interpretation of Slope
The model predicts that _____ (y) will change by approximately _____ (the slope) as _____ (x)
increases by 1 (units) on average.
Residual
Actual – predicted OR observed – expected
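As a rough illustration of the R-Squared, r, slope, and residual templates above, the sketch below fits a least-squares line by hand. The data set (hours studied and test score) and all variable names are hypothetical, made up only for this example.

```python
import math

# Hypothetical data: hours studied (x) and test score (y).
x = [1, 2, 3, 4, 5]
y = [52, 60, 63, 71, 79]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# If only R-Squared is known, r is its square root with the sign of the slope.
r_from_r_squared = math.copysign(math.sqrt(r_squared), slope)

# Residual = actual - predicted, one per data point.
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

print(f"{r_squared:.1%} of the variability in score can be explained by "
      "the linear association between score and hours studied.")
print(f"The model predicts that score will change by approximately "
      f"{slope:.1f} points as hours studied increases by 1 hour, on average.")
```

The residuals of a least-squares line always sum to zero (up to rounding), which is a quick check on the arithmetic.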
Confidence Interval
I am _____% confident that the true _____ (mean/proportion) is between _____ and _____.
Confidence Level
If I repeated this process again and again, about _____% of the resulting intervals would
capture the true _____ (mean/proportion).
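To make the confidence interval template concrete, here is a minimal sketch of a one-proportion z-interval. The counts (120 successes out of 200) are hypothetical, and the sketch assumes the usual conditions for a z-interval are met.

```python
import math
from statistics import NormalDist

successes = 120    # hypothetical number of successes in the sample
n = 200            # hypothetical sample size
confidence = 0.95

p_hat = successes / n
# Critical value z* that leaves (1 - confidence) / 2 in each tail.
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z_star * standard_error

lower = p_hat - margin_of_error
upper = p_hat + margin_of_error

print(f"I am {confidence:.0%} confident that the true proportion "
      f"is between {lower:.3f} and {upper:.3f}.")
```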
Concluding a Statistical Test (Inference)
Two sentences… the first states the p-value and whether or not you reject the null (is the p-value small? <0.05). The second sentence states the conclusion in the context of the problem.
α-level
• Also known as the significance level
• Compare to the p-value to determine whether to reject… if the p-value < α then you
REJECT the null.
• The probability of a Type I error
Type I Error – You reject the null and the null is TRUE
Type II Error – You fail to reject the null and the null is FALSE
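The comparison of the p-value to α can be sketched as follows for a one-sided one-proportion z-test. The hypotheses (H0: p = 0.5, Ha: p > 0.5) and the sample counts are hypothetical; the printed sentence would still need the context of the actual problem.

```python
import math
from statistics import NormalDist

p0 = 0.5           # hypothetical null value of the proportion
successes = 120    # hypothetical sample results
n = 200
alpha = 0.05

p_hat = successes / n
# Test statistic: the standard error uses the null value p0.
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
# One-sided (upper-tail) p-value.
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < alpha

decision = "reject" if reject_null else "fail to reject"
print(f"Since the p-value of {p_value:.4f} is "
      f"{'less' if reject_null else 'not less'} than {alpha}, "
      f"I {decision} the null hypothesis.")
```

If the null is rejected here and the null is actually true, that is a Type I error; failing to reject a false null would be a Type II error.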
How do you increase the power of a test?
• Increase sample size
• Increase α
• Remember that increasing the power of the test by increasing alpha also increases the
probability of a Type I error
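The effect of sample size and α on power can be checked numerically. The sketch below uses a Normal approximation for a one-sided one-proportion z-test; the function name `power_one_sided` and the default values p0 = 0.5 and p_true = 0.6 are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def power_one_sided(n, alpha, p0=0.5, p_true=0.6):
    """Approximate power of a z-test of H0: p = p0 versus Ha: p > p0."""
    # Smallest sample proportion that leads to rejecting H0.
    se_null = math.sqrt(p0 * (1 - p0) / n)
    critical_p_hat = p0 + STANDARD_NORMAL.inv_cdf(1 - alpha) * se_null
    # Sampling distribution of p-hat when the true proportion is p_true.
    se_true = math.sqrt(p_true * (1 - p_true) / n)
    return 1 - STANDARD_NORMAL.cdf((critical_p_hat - p_true) / se_true)

for n, alpha in [(50, 0.05), (200, 0.05), (50, 0.10)]:
    print(f"n={n}, alpha={alpha}: power is about {power_one_sided(n, alpha):.3f}")
```

Under these assumptions, both the larger sample and the larger α give higher power than the baseline (n = 50, α = 0.05).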
Summarizing/Comparing Distributions
• Summarize or compare using CUSS (Center, Unusual features, Shape, Spread) or SOCS (Shape, Outliers, Center, Spread)
• Use real numerical values for center and spread
• Do NOT say Normal for shape – say unimodal and symmetric instead
Transforming Data – addition or subtraction of a constant changes measures of center by that
constant but measures of spread will not change. Multiplication and Division by a factor will
change measures of center and spread.
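A quick numerical check of the transformation rules, using a small made-up data set (the values and the constants 10 and 3 are arbitrary):

```python
import statistics

data = [2, 4, 4, 6, 9]              # hypothetical data
shifted = [v + 10 for v in data]    # add a constant
scaled = [v * 3 for v in data]      # multiply by a factor

for label, values in [("original", data), ("shifted", shifted), ("scaled", scaled)]:
    center = statistics.mean(values)
    spread = statistics.stdev(values)
    print(f"{label}: mean = {center:.2f}, standard deviation = {spread:.2f}")
```

The shifted data has a mean 10 larger with the same standard deviation, while the scaled data has both the mean and the standard deviation multiplied by 3.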
Graphical Displays
• Categorical – Bar Graphs, Segmented Bar Graphs, and Pie Charts
• Quantitative – Histograms, Boxplots, Dotplots for univariate data and scatterplots for
bivariate data
z-score – measures the number of standard deviations a value is from the mean
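As a small sketch, the z-score is the distance from the mean divided by the standard deviation. The helper name `z_score` and the numbers used are hypothetical.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / standard_deviation

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
z = z_score(85, 70, 10)
print(f"z = {z}")   # a positive z means the value is above the mean
```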
Outliers (for boxplots): Q3 + 1.5IQR and Q1 – 1.5IQR (Upper and Lower Fences) – If a value
falls outside the fences, it is an outlier.
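The fence rule can be sketched as below. Quartile conventions differ between textbooks, calculators, and software; this sketch assumes the "median of each half" method (excluding the overall median when the count is odd), and the data set is made up.

```python
import statistics

def fences(values):
    """Return (lower fence, upper fence) using median-of-halves quartiles."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    # Skip the middle value when the number of values is odd.
    upper_half = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    q1 = statistics.median(lower_half)
    q3 = statistics.median(upper_half)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 7, 8, 9, 10, 12, 30]   # hypothetical data
lower_fence, upper_fence = fences(data)
outliers = [v for v in data if v < lower_fence or v > upper_fence]
print(f"fences: {lower_fence} and {upper_fence}; outliers: {outliers}")
```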
Describing Linear Association
• Strength, Form, Direction
• Correlation quantifies the strength of the linear association (ranges between -1 and 1
inclusive, with zero meaning no linear association)
Leverage – a point has leverage if it is extreme in the x-direction (far away from the mean of
x’s)
Influential – a point is influential if removing it changes the model (slope)
*** Note: A point can have leverage and not be influential – if it fits the pattern (it will increase
the correlation)
Correlation does not imply causation
Median and IQR are resistant to extreme values
• Therefore we use Median and IQR to summarize a skewed distribution NOT mean and
standard deviation
• We use mean and standard deviation to summarize a distribution that is unimodal and
symmetric (not skewed)
Is the model APPROPRIATE?
Look at the residual plot – scatter/no pattern means it is appropriate
As a back-up, you can also look at the scatterplot of the data—is there a strong linear
association?
Population – Parameters
Sample – Statistics
***Review the symbols used for each
Samples
• SRS
• Cluster
• Stratified
• Systematic
• Convenience
• Multistage
Note: Stratified and Cluster are similar in design… both begin by separating the population into
groups. Strata are homogeneous groups, while each cluster should be heterogeneous (resembling the
whole population)… a cluster sample will then randomly select one or more of the groups (in their
entirety) and a stratified sample will randomly select a set number from each group.
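The difference between the two designs can be sketched with a toy population. The four groups "A" through "D", the member labels, and the seed are all hypothetical.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable

# Hypothetical population: four groups of five members each.
groups = {name: [f"{name}{i}" for i in range(1, 6)] for name in "ABCD"}

# Stratified sample: randomly select a set number (here 2) from EVERY group.
stratified = [person
              for members in groups.values()
              for person in rng.sample(members, 2)]

# Cluster sample: randomly select ONE group and take all of its members.
chosen_group = rng.choice(list(groups))
cluster = list(groups[chosen_group])

print("stratified:", stratified)
print("cluster:", cluster)
```

The stratified sample contains members of every group, while the cluster sample contains every member of a single group.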
Bias (you can never eliminate bias only reduce it – we randomize in order to reduce bias)
• Undercoverage – you do not reach the population (ex: call people during the day)
• Response – something in your survey design changes people’s responses
o How the question is worded
o Who is doing the survey
• Nonresponse – you reach the people but they don’t respond for some reason
• Voluntary Response – phone-in polls/internet polls
Experimental Design
• Explanatory Variable (x)
• Response Variable (y)
• You can conclude cause-and-effect ONLY with a controlled, properly designed
experiment (NOT with an observational study)
• An observational study may suggest an association but there may be a lurking variable
• Factors
• Levels
• Treatments – all combinations of the levels of the factors
• Subjects/experimental units
• Matched Pairs – example: before and after
• Single-Blind and Double-Blind
• Confounding
• Placebo Effect – only in humans
Statistical Significance – when our observed difference is too large for us to conclude that it’s
due to random chance alone
Four principles of a properly designed experiment
• Randomization – random assignment of subjects to treatment groups
• Control – control the sources of variations (NOT control group)
• Replication
• Blocking – not essential but may improve our design if we are concerned about variation
across groups
Law of Large Numbers – as trials increase, the observed relative frequency approaches the true
probability (ex: tossing a coin 5 times vs. 500 times: 5 tosses may give all tails, but 500 tosses
will be close to 50-50)
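A short simulation can illustrate the coin-toss example. The seed and the helper name `proportion_heads` are arbitrary choices, and the exact proportions depend on the random number generator.

```python
import random

rng = random.Random(0)   # fixed seed so the run is repeatable

def proportion_heads(tosses):
    """Simulate fair coin tosses and return the observed proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(tosses))
    return heads / tosses

for tosses in (5, 500, 50_000):
    print(f"{tosses} tosses: proportion of heads = {proportion_heads(tosses):.4f}")
```

With only 5 tosses the proportion can be far from 0.5, while with tens of thousands of tosses it should sit very close to 0.5.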
See Formula Sheet for Probability Formulas
Formal Definition of Independence
P(A|B) = P(A)
Mutually Exclusive/Disjoint
P(A∩B) = 0 ; the two events cannot happen simultaneously
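Both definitions can be checked by listing a sample space. The sketch below uses two fair dice; the events A, B, and C are hypothetical examples chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for two fair dice.
outcomes = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] % 2 == 0}   # first die is even
B = {o for o in outcomes if sum(o) == 7}     # sum is 7
C = {o for o in outcomes if sum(o) == 2}     # sum is 2

# Independence: P(A|B) = P(A and B) / P(B) should equal P(A).
p_a_given_b = prob(A & B) / prob(B)
independent = p_a_given_b == prob(A)

# Mutually exclusive: the intersection has probability 0.
disjoint = prob(B & C) == 0

print(f"P(A|B) = {p_a_given_b}, P(A) = {prob(A)}, independent: {independent}")
print(f"P(B and C) = {prob(B & C)}, disjoint: {disjoint}")
```

Note that B and C are disjoint but not independent: knowing the sum is 2 makes a sum of 7 impossible.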
Expected Value (Mean or E(X)) – see formula sheet
• All probabilities should add up to one in a probability model
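As a sketch of the expected value calculation, the probability model below is hypothetical (values 0, 1, 2 with probabilities 1/2, 3/10, 1/5). Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Hypothetical probability model: value -> probability.
model = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}

# A valid probability model has probabilities that add up to one.
total_probability = sum(model.values())

# E(X) = sum of (value * probability).
expected_value = sum(x * p for x, p in model.items())

# Var(X) = sum of (value - E(X))^2 * probability.
variance = sum((x - expected_value) ** 2 * p for x, p in model.items())

print(f"total = {total_probability}, E(X) = {expected_value}, Var(X) = {variance}")
```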
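VARIANCES ADD!!! (for independent random variables, even when subtracting: Var(X ± Y) = Var(X) + Var(Y); standard deviations do NOT add)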
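The rule can be verified exactly for two independent fair dice X and Y by listing every outcome. The helper name `variance` is a hypothetical choice, and the rule shown relies on X and Y being independent.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(values):
    """Variance of a list of equally likely outcomes."""
    values = list(values)
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / len(values)

var_x = variance(faces)                                          # one die
var_sum = variance(x + y for x, y in product(faces, repeat=2))   # X + Y
var_diff = variance(x - y for x, y in product(faces, repeat=2))  # X - Y

print(f"Var(X) = {var_x}, Var(X + Y) = {var_sum}, Var(X - Y) = {var_diff}")
```

Both the sum and the difference have variance Var(X) + Var(Y), so variances add even when the random variables are subtracted.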
Bernoulli Trials
• Independent
• Success/Failure
• Probability of Success is constant
Geometric (waiting)
• The probability that the first success happens on the nth trial
• Expected value = 1/p (not on formula sheet)
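A minimal sketch of the geometric model: the first success on trial n requires n - 1 failures followed by one success. The function name `geometric_pmf` and the value p = 0.2 are hypothetical.

```python
def geometric_pmf(p, n):
    """Probability that the first success happens on trial n."""
    return (1 - p) ** (n - 1) * p

p = 0.2                                   # hypothetical probability of success
prob_first_success_on_3 = geometric_pmf(p, 3)
expected_trials = 1 / p                   # expected value = 1/p

print(f"P(first success on trial 3) = {prob_first_success_on_3:.3f}")
print(f"Expected number of trials until the first success = {expected_trials}")
```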
Binomial Model
• Parameters are n and p
• Probability of k successes in n trials
• Bernoulli Trials
• Use Binompdf(n,p,k) or Binomcdf(n,p,k)
• Binompdf(n,p,k) – finds the probability of exactly k successes
• Binomcdf(n,p,k) – finds the probability of 0 through k successes (at most k)
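For readers without the calculator at hand, the two calculator commands can be imitated in a few lines. The helper names `binom_pdf` and `binom_cdf` are hypothetical stand-ins that copy the calculator's (n, p, k) argument order; the example values n = 10, p = 0.5, k = 3 are arbitrary.

```python
import math

def binom_pdf(n, p, k):
    """Probability of exactly k successes in n Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """Probability of 0 through k successes (at most k) in n trials."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

exactly_3 = binom_pdf(10, 0.5, 3)
at_most_3 = binom_cdf(10, 0.5, 3)
at_least_4 = 1 - at_most_3   # complement rule for "at least" questions

print(f"P(X = 3) = {exactly_3:.4f}")
print(f"P(X <= 3) = {at_most_3:.4f}")
print(f"P(X >= 4) = {at_least_4:.4f}")
```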