AP Statistics Written Interpretations and Templates

*** Note: All conclusions and interpretations must be connected to the context of the problem.

Interpretation of R-Squared (use R-Squared when asked about the strength/reliability of the model)
_____% of the variability in _____ (y) can be explained by the linear association between _____ (y) and _____ (x).
(If you take the square root of R-Squared from a computer printout to find r, remember to check the sign: r has the same sign as the slope.)

Interpretation of r (correlation coefficient)
r is the correlation. It has both strength and direction, and you must address both. (r is NOT a percentage; do not confuse r with R-Squared.)

Interpretation of Slope
The model predicts that, on average, _____ (y) will change by approximately _____ (the slope) for each increase of 1 (unit) in _____ (x).

Residual
Actual – predicted OR observed – expected

Confidence Interval
I am _____% confident that the true _____ (mean/proportion) is between _____ and _____.

Confidence Level
If I repeated this process again and again, about _____% of the resulting intervals would capture the true _____ (mean/proportion).

Concluding a Statistical Test (Inference)
Write two sentences. The first states the p-value and whether or not you reject the null (is the p-value small, for example < 0.05?). The second states the conclusion in the context of the problem.

α-level
• Also known as the significance level
• Compare it to the p-value to decide whether to reject: if the p-value < α, then you REJECT the null
• It is the probability of a Type I error

Type I Error – You reject the null and the null is TRUE
Type II Error – You fail to reject the null and the null is FALSE

How do you increase the power of a test?
• Increase sample size
• Increase α
• Remember that increasing the power of the test by increasing α also increases the probability of a Type I error

Summarizing/Comparing Distributions
• Summarize or compare using CUSS or SOCS
• Use real numerical values for center and spread
• Do NOT say "Normal" for shape; say "unimodal and symmetric"

Transforming Data – adding or subtracting a constant changes measures of center by that constant, but measures of spread do not change. Multiplying or dividing by a factor changes both measures of center and measures of spread.

Graphical Displays
• Categorical – Bar Graphs, Segmented Bar Graphs, and Pie Charts
• Quantitative – Histograms, Boxplots, and Dotplots for univariate data; Scatterplots for bivariate data

z-score – measures the number of standard deviations a value is from the mean

Outliers (for boxplots): Q3 + 1.5IQR and Q1 – 1.5IQR (Upper and Lower Fences) – if a value falls outside the fences, it is an outlier.

Describing Linear Association
• Strength, Form, Direction
• Correlation quantifies the strength of the linear association (it ranges between -1 and 1 inclusive, with zero meaning no linear association)

Leverage – a point has leverage if it is extreme in the x-direction (far away from the mean of the x's)
Influential – a point is influential if removing it changes the model (slope)
*** Note: A point can have leverage and not be influential if it fits the pattern (in that case it will strengthen the correlation)

Correlation does not imply causation

Median and IQR are resistant to extreme values
• Therefore we use median and IQR to summarize a skewed distribution, NOT mean and standard deviation
• We use mean and standard deviation to summarize a distribution that is unimodal and symmetric (not skewed)

Is the model APPROPRIATE?
Look at the residual plot: random scatter with no pattern means the linear model is appropriate. As a back-up, you can also look at the scatterplot of the data: is there a strong linear association?
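The slope, r, R-Squared, and residual templates above all come from the same least-squares fit, so a small worked example may help tie them together. This is only a sketch in Python: the function name least_squares and the data (hours studied vs. quiz score) are made up for illustration and are not part of the original sheet.

```python
from math import sqrt

def least_squares(xs, ys):
    """Return slope, intercept, r, and residuals for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)  # correlation: carries strength AND direction
    # Residual = actual - predicted
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return slope, intercept, r, residuals

# Hypothetical data: hours studied (x) and quiz score (y)
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

slope, intercept, r, residuals = least_squares(hours, scores)
r_squared = r ** 2

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"r = {r:.3f}, R-Squared = {r_squared:.0%}")
print("residuals:", [round(e, 2) for e in residuals])
```

With this made-up data, the templates would read: "The model predicts that quiz score will change by approximately 0.6 points for each increase of 1 hour studied, on average," and "60% of the variability in quiz score can be explained by the linear association between quiz score and hours studied." Plotting the residuals against x and looking for a pattern is the check described under "Is the model APPROPRIATE?".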
Population – Parameters
Sample – Statistics
*** Review the symbols used for each

Samples
• SRS
• Cluster
• Stratified
• Systematic
• Convenience
• Multistage
Note: Stratified and Cluster are similar in design: both begin by separating the population into groups. Strata are homogeneous groups (members of a stratum are alike), and a stratified sample selects a set number from each group. Clusters are heterogeneous groups (each one resembles the whole population), and a cluster sample randomly selects one or more of the groups and uses them in their entirety.

Bias (you can never eliminate bias, only reduce it; we randomize in order to reduce bias)
• Undercoverage – you do not reach part of the population (ex: calling people during the day)
• Response – something in your survey design changes people's responses
  o How the question is worded
  o Who is doing the survey
• Nonresponse – you reach the people but they don't respond for some reason
• Voluntary Response – phone-in polls/internet polls

Experimental Design
• Explanatory Variable (x)
• Response Variable (y)
• You can conclude cause-and-effect ONLY with a controlled, properly designed experiment (NOT with an observational study)
• An observational study may suggest an association, but there may be a lurking variable
• Factors
• Levels
• Treatments – all combinations of the levels of the factors
• Subjects/experimental units
• Matched Pairs – example: before and after
• Single-Blind and Double-Blind
• Confounding
• Placebo Effect – only in humans

Statistical Significance – when our observed difference is too large for us to believe that it is due to random chance alone

Four principles of a properly designed experiment
• Randomization – random assignment of subjects/experimental units to treatment groups
• Control – control the sources of variation (this does NOT mean a control group)
• Replication
• Blocking – not essential, but may improve our design if we are concerned about variation across groups

Law of Large Numbers – as the number of trials increases, the observed relative frequency approaches the true probability (ex: tossing a coin 5 times vs. 500 times: 5 tosses may give all tails, but 500 tosses will come close to 50-50)

See Formula Sheet for Probability Formulas

Formal Definition of Independence
P(A|B) = P(A)

Mutually Exclusive/Disjoint
P(A∩B) = 0; the two events cannot happen simultaneously

Expected Value (Mean or E(X)) – see formula sheet
• All probabilities should add up to one in a probability model

VARIANCES ADD!!! (for independent random variables; standard deviations do not add)

Bernoulli Trials
• Independent
• Success/Failure
• Probability of success is constant

Geometric (waiting)
• The probability that the first success happens on the nth trial
• Expected value = 1/p (not on formula sheet)

Binomial Model
• Parameters are n and p
• Probability of k successes in n trials
• Bernoulli Trials
• Use Binompdf(n,p,k) or Binomcdf(n,p,k)
• Binompdf(n,p,k) – finds the probability of exactly k successes
• Binomcdf(n,p,k) – finds the probability of 0 through k successes (at most k)
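To see what the calculator commands above compute, here is a minimal Python sketch. The function names binom_pdf, binom_cdf, and geom_pdf are hypothetical stand-ins written for this illustration (they are not calculator or library functions); the argument order (n, p, k) simply mirrors the sheet, and the coin-flip numbers are made up.

```python
from math import comb

def binom_pdf(n, p, k):
    """P(exactly k successes in n Bernoulli trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(0 through k successes), i.e. at most k successes in n trials."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

def geom_pdf(p, n):
    """P(first success happens on the nth trial): n - 1 failures, then a success."""
    return (1 - p) ** (n - 1) * p

# Illustrative example: 10 tosses of a fair coin, "success" = heads
exactly_3 = binom_pdf(10, 0.5, 3)   # exactly 3 heads
at_most_3 = binom_cdf(10, 0.5, 3)   # 0, 1, 2, or 3 heads
first_on_3rd = geom_pdf(0.5, 3)     # first head arrives on the 3rd toss
expected_wait = 1 / 0.5             # geometric expected value = 1/p

print(exactly_3, at_most_3, first_on_3rd, expected_wait)
```

Summing the pdf over 0 through k is exactly the difference between the two commands: "exactly k" calls for the pdf, while "at most k" calls for the cdf. For "at least k", one option is 1 - binom_cdf(n, p, k - 1).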