
AP Statistics Chapter Reminders page 1 of 16

Chapter 2 Reminders

- A categorical variable places individuals in a category.
- A quantitative variable has a numerical value that measures something. Quantitative data should have units.
- Just because a value is a number, don't assume that it is a quantitative variable.
- A statistic is a numerical summary of data. Know the difference between statistics and data.
- Know the meaning of univariate and bivariate analysis.

Chapter 3 Reminders

- ALWAYS make a picture.
- Know how to create and interpret these graphs: bar charts, pie charts, contingency tables (also called two-way tables), segmented bar charts.
- Two-way tables: Know how to find marginal distributions and conditional distributions.

  Example: A group of students were asked if they preferred the number 2 or the number 5. Then they were asked if they preferred the color blue or the color green. The results are given below.

             Blue   Green
        2     18      7
        5      6     15

  The marginal distribution for color preference is:

        Blue   Green
         24     22

  The marginal distribution for number preference is:

         2      5
         25     21

  The conditional distribution of those who chose blue is:

        Number preference
         2      5
         18     6

  The conditional distribution of those who did not choose blue is:

        Number preference
         2      5
         7      15

- Don't confuse similar sounding percentages/proportions: The proportion of American men who are US Senators is very small. The proportion of US Senators who are American men is very large.

Chapters 4 – 5 Reminders

- When describing a distribution always mention:
  - shape (symmetric, right skewed, left skewed, uniform, bimodal, multimodal, ...)
  - center (median or mean)
  - spread (range, IQR, or standard deviation)
  - any unusual characteristics (gaps, clusters, possible outliers, ...)
- Know how to create and interpret these graphs: dotplots, stemplots (also called stem-and-leaf plots), histograms, relative frequency histograms, boxplots, cumulative frequency graphs (also called ogives). **Title and label all graphs**
- Know how to find the mean and median when given a frequency table.
- In a skewed distribution, the mean is farther out in the tail than the median.
- Know how to locate the mode(s) on a histogram.
- Know how to find the first quartile (Q1) and the third quartile (Q3).
  - Interquartile range: IQR = Q3 - Q1
  - Outliers are observations less than Q1 - (1.5)(IQR) or greater than Q3 + (1.5)(IQR).
  - The five number summary is: min, Q1, median, Q3, max.
- Know how to find the variance and standard deviation of a set of data:

  $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

- Typically, use the mean and standard deviation when the distribution is relatively mound shaped. Use the five number summary when the distribution is skewed.
- Know which measures are resistant to extreme values.

Chapter 6 Reminders

- A density curve is a curve that
  - is always on or above the horizontal axis
  - has an area of exactly 1 underneath it
- Normal distributions are denoted as N(μ, σ), where μ = population mean and σ = population standard deviation.
- 68-95-99.7 rule (also known as the Empirical Rule): **True for Normal distributions only.**
  - 68% of observations fall within 1σ of μ
  - 95% of observations fall within 2σ of μ
  - 99.7% of observations fall within 3σ of μ
- In a normal distribution the points of inflection are at μ - σ and μ + σ.
- The larger σ, the flatter the curve. The smaller σ, the taller the curve.
- normalcdf(lowerbound, upperbound) gives the proportion of observations between those two points on the standard normal curve N(0,1).
- normalcdf(lowerbound, upperbound, mean, standard deviation) gives the proportion of observations between those two points for any normal curve. **Notice the c. We do not use the other one (normalpdf).**
- Standardized values: $z = \frac{x - \mu}{\sigma}$
- Know how to use the standard normal table.
- Know how to find the percentile for an observation value.
- Know how to get an observation when given a proportion of values under the curve.

  Example: Find the 90th percentile for the normal distribution with mean 50 and standard deviation 6.
  - Find z using the table backwards or invNorm(proportion to the left): invNorm(.9) ≈ 1.28
  - Plug μ, σ, and z into the formula to find x:

    $$z = \frac{x - \mu}{\sigma} \quad\Rightarrow\quad 1.28 = \frac{x - 50}{6} \quad\Rightarrow\quad x = 57.68$$

- Know how to use a normal probability plot to determine if a set of data is likely to have come from a normal distribution.

Chapters 7 – 10 Reminders

- A response variable measures an outcome of a study (y).
- An explanatory variable attempts to explain the observed outcomes (x).
- Scatterplots: **Title, label and mark axes**
  - If there is an explanatory variable it always goes on the x axis.
  - Two variables are positively associated when large values of one go with large values of the other.
  - Two variables are negatively associated when large values of one go with small values of the other.
- When describing a scatterplot mention: strength (strong, weak, ...), direction (positive or negative), form (linear, exponential, …), and unusual features.
- The correlation coefficient, r
  - measures the strength of the linear relationship between two quantitative variables, x and y.
  - has no unit of measure.
  - is always between -1 and 1.
  - does not change when the unit of measure changes or if the variables are exchanged.
- Correlation …
  - is strongly affected by extreme observations
  - helps establish association but not causation
  - is not a complete description of two-variable data
  - Don't say 'correlation' unless you mean r.
- The coefficient of determination, $r^2$, is the fraction of the variation in the values of y that is explained by the linear relationship with x.
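
The properties of r listed above can be checked numerically. The following is a minimal Python sketch, not a calculator procedure: the height and weight data and the helper name `correlation` are made up for illustration. It computes r as the sum of products of standardized values divided by n - 1, then confirms that r does not change when the variables are exchanged or when the units change.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    products = ((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical data, invented for this sketch.
heights_in = [60, 62, 65, 68, 70, 72]
weights_lb = [110, 120, 140, 155, 160, 180]

r = correlation(heights_in, weights_lb)

# Exchanging x and y leaves r unchanged.
r_swapped = correlation(weights_lb, heights_in)

# Changing units (inches to centimeters) leaves r unchanged.
heights_cm = [2.54 * h for h in heights_in]
r_cm = correlation(heights_cm, weights_lb)

# Coefficient of determination.
r_squared = r ** 2

print(round(r, 4), round(r_swapped, 4), round(r_cm, 4), round(r_squared, 4))
```

The three printed values of r agree, and r squared is the fraction of variation in weight accounted for by the linear relationship with height in this made-up data set.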
- Know how to graph a scatterplot on the calculator.
- Know how to use the calculator to find the LSRL when given the data.
  - Stat-Calc-#8 (x-list, y-list) (L1 and L2 are the default lists)
  - Turn Diagnostic On to get r in the output (use the catalog to find Diagnostic On)
- Find LSRL formulas on the formula sheet.
- The center of gravity $(\bar{x}, \bar{y})$ is always on the LSRL.
- Residuals
  - Residual = observed y - predicted y = $y - \hat{y}$
  - The sum of the residuals is always 0.
  - The sum of the squared residuals is always smaller than it would be for any other line.
- Residual plot: a scatterplot of the x (explanatory variable) values and the residuals.
- Residual plots of "good" models
  - have close to the same number of positive and negative values
  - are scattered with no pattern
  - have small residuals
- An outlier in this context is a point whose residual is an outlier compared to the other residuals (a point that falls outside the general pattern of the plot).
- An influential point is a point for which the slope of the LSRL changes a good bit if it is removed (usually far in the horizontal direction).
- Know how to interpret the information in an LSRL computer printout.
- Exponential model: fit a line to (x, log y)

  $$\log \hat{y} = b_1 x + b_0 \quad\Rightarrow\quad \hat{y} = 10^{b_0} \cdot 10^{b_1 x}$$

- Power functions: fit a line to (log x, log y)

  $$\log \hat{y} = b_1 \log x + b_0 \quad\Rightarrow\quad \hat{y} = 10^{b_0} \cdot x^{b_1}$$

  You'd have to have a lot of POWER to pick up two logs! Ha ha ha!!
- A lurking variable is a variable that has an important effect on the relationship among the variables in a study but is not included among the variables studied.
- Extrapolation is the practice of using the regression equation to predict outside the domain of the explanatory values that were used to form the line. **It is not recommended.**
- Association does not imply causation.
- Causation: Changes in x cause changes in y.

Chapter 11 Reminders

- Simulations: You must include the following:
  1. State the problem or describe the experiment
  2. State assumptions (usually something about probabilities of outcomes and each trial being independent)
  3. Explain the process in detail (include digit assignment, any ignored digits, stopping rule, what is counted, replacement issues)
  4. Simulate "many" times
  5. State conclusions

Chapter 12 Reminders

- Types of samples:
  - Simple Random Sample (SRS): subjects are selected without replacement, every individual has an equal chance of being chosen, and every subgroup of a given size has an equal chance of being the subgroup chosen
  - Voluntary response sample: people choose themselves by responding
  - Convenience sample: chooses the people easiest to reach
  - Probability sample: gives each member of the population a known chance (>0) to be selected
  - Stratified random sample: divide the population into groups of similar individuals, then choose a separate SRS from each group and combine them to form the full sample
  - Multistage cluster sample: Example: 1. choose counties from all the counties in the US, 2. choose towns in each chosen county, 3. choose subdivisions within each town, 4. choose households within each subdivision
  - Quota sample: subjects are chosen by category (age, gender, …) according to known demographic information
  - Systematic sample: Example: Choose #1, #51, #101, …
- A sampling frame is the actual list of possible subjects. Ideally, the sampling frame should include everyone in the population.
- The placebo effect occurs when subjects have some type of different response (improvement) that is not due to the treatment itself. Just thinking they are receiving a treatment may cause some improvement.
- A census is a method of collecting data from all members of the population.
- A study is biased if it systematically favors certain outcomes.
- Types of bias:
  - Undercoverage bias: when some groups are left out of the sampling process
  - Nonresponse bias: when someone refuses to participate
  - Response bias: people not giving reliable responses
  - Measurement bias: the way measurements are taken favors particular results

Chapter 13 Reminders

- Observational studies observe individuals or measure variables of interest but do not attempt to influence responses.
- Experiments
  - impose some type of treatment
  - are the only source of fully convincing data when trying to determine cause and effect
- Principles of experimental design:
  1. Control of effects of lurking variables
  2. Randomization
  3. Replication
- Types of experimental design
  - block design: similar to the stratified sampling design
  - matched pairs design: data from two samples are paired, differences are found, one sample t-procedures are used
- Terms
  - factor: the explanatory variables
  - treatment: the specific experimental condition
  - experimental units: what the treatment is imposed on
  - subject: human experimental units
  - double blind: anyone working directly with the units (and obviously the units themselves) is unaware of which group (control or treatment) the units are in
  - confounding: two variables are confounded if we can't separately identify their effects on the response variable

Chapter 14 – 15 Reminders

- Random does not mean haphazard.
- Permutations (order matters): $P(n, r) = \frac{n!}{(n - r)!}$
- Combinations (order does not matter): $C(n, r) = \frac{n!}{r!\,(n - r)!}$
- Probability (when all outcomes are equally likely): $P(\text{event}) = \frac{\text{number of favorable outcomes}}{\text{number of outcomes}}$
- Tree diagrams: Example: Roll a die then toss a coin. For a fair die and a fair coin, P(1,H) = (1/6)(1/2) = 1/12 and P(1|H) = 1/6.
- Terms:
  - complement: A' or $A^c$ denote the complement of A
  - union: 'or', '∪'
  - intersection: 'and', '∩'
  - disjoint (mutually exclusive): can't occur together
  - conditional event: P(B|A) means the probability of B given A
  - independent: A and B are independent if P(A) = P(A|B) = P(A|B')
  - sample space: set of all possible outcomes
- P(A ∩ B) = P(A) * P(B) if and only if A and B are independent.
- $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
- Two-way table probabilities: AP Statistics students were asked to select their favorite from each of the following lists: {mountains, beach} and {fall, spring}. The results are described below:

                Mountains   Beach
      Fall          2         1
      Spring        4         2

  - P(fall) = 3/9 = 1/3
  - P(fall | mountains) = 2/6 = 1/3 (Completely ignore the beach column.)
  - P(mountains | fall) = 2/3 (Completely ignore the spring row.)

Chapter 16 Reminders

- mean of a random variable X: $\mu_X$ (population mean)
- mean of several actual values of X: $\bar{x}$ (sample mean)
- The mean is also called the expected value.
- Mean and variance of a discrete random variable:

      X             x1   x2   …   xn
      Probability   p1   p2   …   pn

  $$\mu_X = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n$$

  $$\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \cdots + (x_n - \mu_X)^2 p_n$$

- Law of Large Numbers: As the number of observations increases, $\bar{x}$ approaches $\mu_X$ (and stays that close).
- Rules for means:
  - $\mu_{a + bX} = a + b\mu_X$
  - $\mu_{X + Y} = \mu_X + \mu_Y$
- Rules for variances:
  - $\sigma^2_{a + bX} = b^2 \sigma^2_X$
  - (X and Y must be independent): $\sigma^2_{X + Y} = \sigma^2_X + \sigma^2_Y$ and $\sigma^2_{X - Y} = \sigma^2_X + \sigma^2_Y$
- Standard deviations do not add, variances do (even with subtraction).

Chapter 17 Reminders

- The Binomial Setting
  - B: Binary outcomes - just two possibilities, "success" and "failure"
  - I: Independence - the n observations are independent
  - N: Number of observations is fixed
  - S: Same probability of a success for each trial
- Binomial distribution: B(n,p) where n = number of trials, p = probability of a success
- The variable of interest, X, is the number of successes in the n trials.
- The probability that X = k, P(X=k), is obtained by binompdf(n, p, k)
- The probability that X ≤ k, P(X ≤ k), is obtained by binomcdf(n, p, k)
- Mean of a Binomial Random Variable: μ = np
- Standard deviation of a Binomial Random Variable: $\sigma = \sqrt{np(1 - p)}$
- The Geometric Setting
  1. Binary outcomes - just two possibilities, "success" and "failure"
  2. Independence - the observations are independent
  3. Same probability of a success for each trial
- The variable of interest, X, is the number of trials necessary to get the first success.
- The probability that X = k, P(X=k), is obtained by geometpdf(p, k)
- The probability that X ≤ k, P(X≤k), is obtained by geometcdf(p, k)
- Mean of a Geometric Random Variable: $\mu = \frac{1}{p}$

Chapter 18 Reminders

- A parameter describes a population.
- A statistic is a number obtained from a sample.
- A sampling distribution of a statistic is the distribution of values taken by the statistic in many samples of the same size from the same population.
- A statistic used to estimate a parameter is unbiased if the mean of its sampling distribution equals the parameter.
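
The idea of an unbiased statistic can be illustrated with a short Python sketch. It assumes a binomial setting in which the sample proportion is (number of successes)/n, so its exact sampling distribution follows from the binomial probabilities. The helper names `binom_pmf` and `sampling_distribution_of_p_hat` and the values n = 25, p = 0.3 are illustrative choices, not part of the reminders; `binom_pmf(n, p, k)` plays the role of binompdf(n, p, k).

```python
from math import comb


def binom_pmf(n, p, k):
    """P(X = k) for a binomial random variable, like binompdf(n, p, k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sampling_distribution_of_p_hat(n, p):
    """Map each possible sample proportion k/n to its probability."""
    return {k / n: binom_pmf(n, p, k) for k in range(n + 1)}


# Illustrative values, chosen for this sketch.
n, p = 25, 0.3
dist = sampling_distribution_of_p_hat(n, p)

# The probabilities of a distribution must add to 1.
total_prob = sum(dist.values())

# Mean of the sampling distribution: sum of value * probability.
mean_p_hat = sum(value * prob for value, prob in dist.items())

print(round(total_prob, 6), round(mean_p_hat, 6))
```

The mean of the sampling distribution comes out equal to p (up to rounding), which is what it means for the sample proportion to be an unbiased estimator of the population proportion.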
- p represents a population proportion
- p̂ represents a sample proportion
- The sampling distribution of p̂
  - is close to normal when n is large (np ≥ 10, n(1-p) ≥ 10)
  - has mean = p and standard deviation = $\sqrt{\frac{p(1 - p)}{n}}$
- μ represents a population mean
- x̄ represents a sample mean
- The sampling distribution of x̄
  - is normal if X has a normal distribution
  - is close to normal if n ≥ 30, regardless of the distribution of X
  - has mean = μ and standard deviation = $\frac{\sigma}{\sqrt{n}}$
- The Central Limit Theorem: As the sample size increases, the sampling distribution of x̄ approaches a normal distribution, regardless of the distribution of X.

Chapter 19 – 25 Reminders

Inference Overview

- A confidence interval is a method of estimating a parameter.
- Two parts to a confidence interval: the interval and the confidence level (denoted by C)
- Form of a confidence interval:
  - estimate ± margin of error
  - estimate ± (# of standard deviations on either side)(standard deviation)
- Margin of error decreases
  - when n increases or
  - when confidence level decreases
- Know how to find the sample size necessary for a given margin of error and a given confidence level.
- Confidence intervals are used to estimate a parameter.
- Significance tests are used to assess evidence for a particular claim.
- A significance test does the following: Suppose the null hypothesis is true. With that assumption, is our sample outcome unusual?
- A p-value is the probability, assuming the null hypothesis is true, that we would get by chance a result at least as extreme as our sample result.
- Small p-values give evidence against $H_0$.
- Large p-values fail to give evidence against $H_0$; they do not give evidence of anything.
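
To make the definition of a p-value concrete, here is a minimal Python sketch. It uses an exact binomial calculation rather than any of the z or t procedures in these reminders, and the scenario (20 coin tosses, 15 heads, null hypothesis p = 0.5, one-sided alternative p > 0.5) is a hypothetical example. The helper names are illustrative.

```python
from math import comb


def binom_pmf(n, p, k):
    """P(X = k) for a binomial random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(n, p0, observed):
    """P(X >= observed) computed as if the null value p0 were true."""
    return sum(binom_pmf(n, p0, k) for k in range(observed, n + 1))


# Hypothetical experiment: 15 heads in 20 tosses, testing H0: p = 0.5.
p_value = upper_tail_p_value(20, 0.5, 15)
print(round(p_value, 4))  # 0.0207
```

A result at least this extreme would happen by chance only about 2% of the time if the coin were fair, so this small p-value gives evidence against the null hypothesis.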
- A significance level, α, is sometimes used as a decisive boundary between rejecting $H_0$ and failing to reject $H_0$ (α = .1, α = .05 and α = .01 are typical values).
- Statistical significance does not mean 'important'; it means 'not likely to occur by chance'.
- Inference (significance tests and confidence intervals) is based on the laws of probability.
- Randomization ensures the probability laws apply.
- Type I Error: the null hypothesis is true and it is rejected. Probability of a Type I error = α (the significance level)
- Type II Error: the null hypothesis is false, but not rejected. Probability of a Type II error = β (can be computed if you have a specific alternative in mind)
- The Power of the test is the probability that the null hypothesis is rejected given that it is false. Power of the test = 1 – β

Quantitative Data - When population standard deviation, σ, is known: one sample z interval or one sample z test

- Confidence Interval: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
- Test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$, where $\mu_0$ is the mean stated in $H_0$
- Assumptions:
  - SRS
  - Pop. is normal OR n ≥ 30
  - Pop. size ≥ 10n

When we use s instead of σ in the test statistic, we get a t-statistic instead of a z-statistic.

t-distributions
- are similar in shape to normal distributions
- have a larger variance than the normal distribution
- approach a normal curve as the degrees of freedom increase

Quantitative Data - When population standard deviation, σ, is NOT known: one sample t interval or one sample t test

- Confidence Interval: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$, d.f. = n - 1
- Test statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, d.f. = n - 1
- Assumptions:
  - SRS
  - Pop. is normal OR n ≥ 30
  - Pop. size ≥ 10n

The t statistic for comparing two means,

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

does not actually have a t-distribution. It is close if we estimate the degrees of freedom with a complicated formula (that is what the calculator does), or we could use the conservative estimate of min{n1 – 1, n2 – 1}.

Quantitative Data - When comparing two means (with population standard deviations NOT known): two sample t interval or two sample t test

- Confidence Interval: $(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, d.f. = min{n1 – 1, n2 – 1}
- Test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, d.f. = min{n1 – 1, n2 – 1}
- Assumptions:
  - 2 SRSs
  - distinct populations
  - Independent samples
  - Normal populations OR n1 + n2 ≥ 40
  - Pop. 1 ≥ 10n1
  - Pop. 2 ≥ 10n2

Categorical Data - one proportion z interval or one proportion z test

- Confidence Interval: $\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$
  - Assumptions: SRS; Pop. size ≥ 10n; $n\hat{p} \ge 10$ AND $n(1 - \hat{p}) \ge 10$
- Test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}}$
  - Assumptions: SRS; Pop. size ≥ 10n; $np_0 \ge 10$ AND $n(1 - p_0) \ge 10$

Choosing a sample size for a specific margin of error

- margin of error = $z^* \sqrt{\frac{p(1 - p)}{n}}$
- Since we don't know p, we use a guess from a previous study or the conservative guess of 0.5.

Categorical Data – When comparing two proportions: two proportion z interval or two proportion z test

- Confidence Interval: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
- Test statistic: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $\hat{p}$ = overall combined proportion
- Assumptions:
  - 2 SRSs
  - 2 distinct populations
  - Population 1 ≥ 10n1
  - Population 2 ≥ 10n2

Chapter 26 Reminders

- Chi-square test for goodness of fit
  - used to see how well an observed distribution fits a hypothesized distribution
  - Can be done on the calculator if OBSERVED values are in L1 and EXPECTED values are in L2
  - Some calculators have the GOF test under STATS → TESTS, others have it under the PROGRAMS menu
- Chi-square test for independence (same process as the test for homogeneity)
  - used to determine if two categorical variables recorded for ONE SAMPLE are independent (or 'associated' … or 'related')
  - Can be done on the calculator if the TWO WAY TABLE is entered as a MATRIX
  - Expected counts do not need to be entered
  - Use the Chi-square test under STATS → TESTS
- Chi-square test for homogeneity (same process as the test for independence)
  - used to determine if a distribution is the same across the categories for different groups (TWO different SAMPLES)
  - Can be done on the calculator if the TWO WAY TABLE is entered as a MATRIX
  - Expected counts do not need to be entered
  - Use the Chi-square test under STATS → TESTS

Chi-square test for Goodness of Fit

- Test statistic: $\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$, df = # of categories - 1
- Assumptions:
  - Random selection
  - (All) expected values ≥ 5

Chi-square test for Independence and Chi-square test for Homogeneity

- Test statistic: $\chi^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}}$, df = (# of rows - 1)(# of columns - 1)
- Assumptions:
  - Random selection (necessary only if generalizing results)
  - (All) expected values ≥ 5
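
The goodness of fit statistic can also be computed by hand or in a few lines of code. The Python sketch below is an illustration only: the function name `chi_square_gof` and the die-roll counts are made up, and the p-value step (a chi-square table or the calculator) is left out. It applies the test statistic and df formula above and checks the expected-count condition.

```python
def chi_square_gof(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness of fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        raise ValueError("all expected values should be at least 5")
    # Sum of (obs - exp)^2 / exp over all categories.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return statistic, df


# Hypothetical data: 120 rolls of a die, testing whether it is fair.
observed = [22, 17, 20, 26, 22, 13]
expected = [20] * 6  # a fair die gives 120 / 6 = 20 per face

statistic, df = chi_square_gof(observed, expected)
print(round(statistic, 2), df)  # 5.1 5
```

With the statistic and df in hand, the p-value would be read from a chi-square table or found on the calculator.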
