Final Review
Dr. Joseph Brennan
Math 148, BU

Part I

Review the Vocabulary of Design for potential multiple-choice questions.

Observational Studies: A study based on observing individuals and measuring responses without attempting to influence those responses. We reviewed three major types.

Because of potential hidden confounding factors, an observational study can establish only a correlation between the treatment and the response, not a cause-and-effect relationship.

Correlation ≠ Causation!

CAUSATION can be established only from well-designed experimental studies, which have better control over the confounding factors. A randomized controlled experiment has the following principles:
1. Control
2. Randomization
3. Repetition
4. Blinding

Part II

A qualitative variable places an individual into one of several groups or categories.
A quantitative variable takes numerical values for which arithmetic operations (such as adding and averaging) make sense.

There are 3 main types of histogram:
Density histogram: displays percents (or proportions) per unit width in the vertical direction.
Frequency histogram: the height of each bar is equal to the actual count of observations in the class interval.
Relative frequency histogram: the height of each bar is equal to the proportion or percentage of observations in the class interval.

The shape of a distribution:
Number of modes: Unimodal, Bimodal, Multi-modal.
Symmetry and Skew.

The center of a distribution:
Mode: The value that occurs most frequently in the data.
Mean: The numerical center of the data.
Median: The midpoint of a distribution.

The spread of a distribution:
Standard Deviation: s measures the spread about the mean. Among the measures of center, the standard deviation is paired only with the mean.
\[ s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Interquartile Range: Q3 − Q1, where Q3 is the third quartile (75th percentile) and Q1 is the first quartile (25th percentile).

Effects of a General Linear Transformation

If we act upon a data set by any linear transformation x_new = bx + a, the changes in center and spread are:

Mean:                 x̄_new = b·x̄ + a
Median:               x̃_new = b·x̃ + a
1st Quartile:         Q1,new = b·Q1 + a
3rd Quartile:         Q3,new = b·Q3 + a
Standard Deviation:   s_new = |b| · s
Interquartile Range:  IQR_new = |b| · IQR

where | · | denotes absolute value. (The quartile formulas are for b > 0; if b < 0 the two quartiles trade places.)

iClicker

Which of the following statements is false?
A) If a data histogram has a long right tail, then the median is less than the average.
B) An observational study cannot establish causation due to uncontrolled confounders.
C) The mean and standard deviation are not robust to outliers.
D) The sum of the column areas in a frequency histogram is one or 100%.
E) All of the above are true.

Answer: D. The sum of the column areas in a density histogram is one (or 100%), while the sum of the column heights in a relative frequency histogram is one (or 100%). In a frequency histogram the column heights are counts.

Normal Curves

Properties of the (standard) normal curve:
Symmetric about zero,
Unimodal,
The mean, median, and mode are equal,
Bell-shaped,
The mean µ = 0 and the standard deviation σ = 1,
The area under the whole normal curve is 100% (or 1, if you use decimals).

Empirical Rule

Figure: Normal curve and percentage of observations under it.

z-Score: The transformation of data into standard units, used for the normal approximation:
\[ z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}} \]

iClicker

Bias in statistical measurements
A) generally varies from measurement to measurement.
B) is a systematic error in all measurements and affects all measurements the same way.
C) is caused by random measurement errors.
D) can generally be ignored because of its random nature.
E) can be readily detected by looking at the measurements themselves.

Answer: B

Measurement Errors

1. Chance Error: Chance error is present in every measurement and is difficult to determine. Chance errors are random and are as likely to undervalue a measurement as to overvalue it. If n measurements have been taken with a mean of x̄ and a standard deviation of s, then we say: "The next measurement will be x̄, give or take s."
2. Outliers: Observations more than 3 standard deviations from the mean are considered extreme and are treated as potential outliers.
3. Bias (Systematic Error): A phenomenon affecting all measurements the same way, pushing them in the same direction. Bias, unlike chance error, is not detectable through multiple measurements.

Correlation Analysis

Correlation Analysis: The establishment of an association (or correlation) between two variables and the assessment of its strength.
Independent Variable: The variable that is expected to influence the other variable.
Dependent Variable: The variable that is expected to be influenced by the other variable.

Variables with a linear relationship have two classifications:
Positive Association: both variables increase or decrease together.
Negative Association: when one of the variables increases, the other decreases.

The Correlation Coefficient

The correlation coefficient, r, is a descriptive statistic which measures the direction and strength of the linear relationship between two quantitative variables.

Suppose that we have data on variables x and y for n individuals. Let the mean of the x values be x̄ and let the mean of the y values be ȳ. Let the standard deviation of the x values be s_x and let the standard deviation of the y values be s_y.
The sample correlation coefficient r between x and y is computed as
\[ r = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = \frac{1}{n} \sum_{i=1}^{n} z_{x,i} \cdot z_{y,i} \tag{1} \]
where
\[ z_{x,i} = \frac{x_i - \bar{x}}{s_x} \quad \text{and} \quad z_{y,i} = \frac{y_i - \bar{y}}{s_y} \]
are the z-scores for x_i and y_i, respectively.

Properties of the Correlation Coefficient

(1) The sign of the correlation coefficient r indicates the direction of the relationship between the variables.
(2) The correlation coefficient is just a number; it has no units of measurement.
(3) The correlation r is always a number between −1 and 1. The closer r is to 1 or −1, the stronger the linear association between x and y.
(4) Correlation only measures the strength of a LINEAR relationship between two variables. Correlation DOES NOT describe curved relationships between variables, no matter how strong they are!
(5) The correlation coefficient is NOT resistant to outliers.
(6) The correlation coefficient r is symmetric: the correlation between x and y equals the correlation between y and x.

Interpreting the Correlation Coefficient

Value of |r|    Interpretation
0.0 - 0.2       Very weak to negligible correlation
0.2 - 0.4       Weak, low correlation (not very significant)
0.4 - 0.7       Moderate correlation
0.7 - 0.9       Strong, high correlation
0.9 - 1.0       Very strong correlation

The correlation coefficient only measures the strength of LINEAR relationships.

To guess r accurately by eye, construct a scatter diagram in which the vertical standard deviations cover the same distance on the page as the horizontal standard deviations.

A coefficient r = 0.80 does NOT mean that 80% of the points are tightly clustered around a line, NOR does it indicate twice as much linearity as r = 0.40.

Regression Analysis

Regression Analysis: The creation of a mathematical model or formula that relates the values of one variable to the values of the other.
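Looking back at formula (1), the correlation coefficient can be checked numerically. The following is a minimal sketch, assuming the standard deviation is computed with the 1/n convention; the function names `mean`, `sd`, and `correlation` and the data values are illustrative and not part of the course material.

```python
import math

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with the 1/n convention (an assumption of this sketch)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def correlation(xs, ys):
    """Average of the products of z-scores, as in formula (1)."""
    mx, my = mean(xs), mean(ys)
    sx, sy = sd(xs), sd(ys)
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / len(xs)

# Hypothetical data, chosen only to illustrate the computation.
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]
r = correlation(xs, ys)
```

Swapping the two arguments leaves the result unchanged, which is property (6), and a data set lying exactly on a line with positive slope gives r = 1.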
Assume that y and x are the dependent and independent variables of a study. Denote:
ŷ: the predicted (by regression) value of y for a given x,
r: the correlation coefficient between x and y,
ȳ and s_y: the average and standard deviation of the dependent (response) variable y,
x̄ and s_x: the average and standard deviation of the independent (explanatory) variable x.

Regression Line: Regressing y on x
\[ \hat{y} = \bar{y} + r \cdot s_y \cdot \frac{x - \bar{x}}{s_x} \]
\[ \hat{y} = \bar{y} + r \cdot s_y \cdot z_x \]
\[ \hat{z}_y = r \cdot z_x \]

Regression Analysis

The best use of the regression line is to estimate the AVERAGE value of y for a given value of x.

The correlation coefficient r measures the amount of scattering of points about the regression line.

Regression Effect describes the tendency of individuals with extreme values to retest closer to the mean. On average, the top group will score lower on a second experiment and the bottom group will score higher on a second experiment.

The Regression Fallacy is the mistake of conjecturing a cause for an extreme value becoming more average, when the change is only the regression effect.

iClicker

A university has made a statistical analysis of the relationship between Math SAT scores and first-year GPAs for students who complete the first year. The correlation coefficient is calculated as r = 0.6.

                      SATM Score    GPA
Mean                  550           2.6
Standard Deviation    80            0.6

If someone enters with an SATM score of 650, what would you estimate their first-year GPA to be?
A) 4.0
B) 3.3
C) 3.0
D) 2.8
E) 2.6

iClicker Solution

As we are predicting GPA, we let GPA be y and SATM be x. To find our regression line we first find the slope
\[ m = r \cdot \frac{s_y}{s_x} = (0.6) \cdot \frac{0.6}{80} = 0.0045 \]
Then the y-intercept
\[ b = \bar{y} - m \cdot \bar{x} = 2.6 - (0.0045) \cdot 550 = 0.125 \]
Finally we obtain our regression line
\[ \hat{y} = 0.0045x + 0.125 \]
We are given an x value of 650 and the resulting prediction is
\[ \hat{y} = 0.0045 \cdot (650) + 0.125 = 3.05 \approx 3.0 \]
which is choice C.

RMS Error

The R.M.S. Error measures how far a typical point will be from the regression line.

Taking the spread of points in a small vertical strip of the scatter plot, the R.M.S. Error is similar to the standard deviation and the regression line is similar to the mean.
\[ \text{R.M.S. Error} = \sqrt{1 - r^2} \cdot s_y \]

Homoscedastic and Heteroscedastic Relationships

There are two broad classes of data with a linear relationship:

Homoscedastic: Scatter diagrams which form a true football shape. The standard deviations of the y observations in vertical strips are approximately equal. We may take the standard deviation within each strip to be the R.M.S. Error.

Heteroscedastic: Scatter diagrams with unequal vertical strip standard deviations. The R.M.S. Error in this case gives an average error across all the vertical strips. For a given x, the R.M.S. Error should not be used as an estimate of the standard deviation of the corresponding y-values.

Probability

Probability theory deals with studies where the outcomes are not known for sure in advance. Usually there are many possible outcomes for a study; we just do not know which particular outcome we will observe.

Sample Space: The set of all possible outcomes of a study. The sample space of a study is denoted by S.

Every repetition of a study, or trial, produces a single outcome. Usually an outcome is computed from the values of the response variables.

An event is a set of outcomes from the sample space S. We say that an event has occurred if ANY of the outcomes that constitute it occurs.

Special Events

Certain Event: An event which is guaranteed to happen at every repetition of the experiment. A certain event is equal to the sample space.
Impossible Event: An event which can never occur.
Mathematically, an impossible event is written as ∅, the empty set.
Opposite Event: An event Ā is opposite to A if it happens whenever A does not happen.

Mutually Exclusive Events

The set of common outcomes is an event which is called the intersection of events A and B. We will denote the intersection of events A and B by "A and B".

Mutually Exclusive Events: Two events A and B are mutually exclusive if they cannot both happen at the same time. The intersection "A and B" is the empty set and P(A and B) = 0.

Independent Events

Two events A and B are independent of each other if knowing that one event occurs does not change the probability that the other event occurs.

For independent events A and B, P(A and B) = P(A) · P(B).

It can be established that events A and B are independent if P(A) = P(A|B).

Union: The union of events A and B is the event which happens when either event A or event B or both happen. The union of two events is expressed as "A or B".

Rules of Probability

The following rules simplify many probability computations:

Rule 1: The probability P(A) of any event A satisfies 0 ≤ P(A) ≤ 1. In other words, the probability of any event is between 0 and 1.

Rule 2: If S is the sample space for an experiment, then P(S) = 1. The probability of a certain event is 1.

Rule 3: The probability of an impossible event is 0. Hence, P(∅) = 0.

Rule 4: If events A and B are independent, then the probability that they both happen is the product of their probabilities:
P(A and B) = P(A) · P(B)

Rule 5: For any events A and B, the probability of their union is equal to the sum of their individual probabilities minus the probability of their intersection:
P(A or B) = P(A) + P(B) − P(A and B)
Subtracting the probability of the intersection is needed to avoid double counting. A special case: if A and B are disjoint events, then the probability of their union is the sum of the individual probabilities:
P(A or B) = P(A) + P(B)

Rule 6: For any event A,
P(Ā) = 1 − P(A)

Conditional Probability

Let A and B be events. If the events are not independent, then the occurrence of B alters the probability that A will occur.

The conditional probability of event A given that event B has happened is denoted P(A|B):
\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]

We will consider two important sampling schemes:
Sampling with Replacement: Independent Events
Sampling without Replacement: Dependent Events

When the events A1, A2, ..., An are not necessarily independent:
P(A1 and A2 and A3 and A4 and ...) = P(A1) · P(A2|A1) · P(A3|A1 and A2) · P(A4|A1 and A2 and A3) · ...

iClicker

Consider a standard 52-card deck. What is the probability of being dealt a card that is either red or even?
A) 0%
B) 27%
C) 58%
D) 69%
E) 75%

The even cards are 2, 4, 6, 8, and 10 (5 per suit or 20 per deck).
\[ P(\text{Red or Even}) = P(R) + P(E) - P(R \text{ and } E) = \frac{26}{52} + \frac{20}{52} - \frac{10}{52} = \frac{36}{52} \approx 69\% \]
Answer: D

iClicker

Consider a standard 52-card deck. What is the probability of being dealt (without replacement) three cards with at least one being a King?
A) 0%
B) 14%
C) 22%
D) 78%
E) 84%

Let K be the event where at least one card is a King. The opposite event K̄ is the event where none are Kings. There are 4 Kings per deck:
\[ P(K) = 1 - P(\bar{K}) = 1 - \frac{48}{52} \cdot \frac{47}{51} \cdot \frac{46}{50} \approx 22\% \]
Answer: C

Permutations and Combinations

ORDER

In order to count the number of possible ways to choose, without replacement, k objects from a collection of n distinct objects, we must be specific as to whether we acknowledge order.

A permutation is a choice where order matters.
A combination is a choice where order does not matter.

The only difference between a permutation and a combination is order. This leads to very similar counting formulas:
\[ {}_nP_k = \frac{n!}{(n-k)!} \qquad \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!} \]

Recall: When all outcomes are equally likely, an event E in the sample space S has probability
\[ P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S} \]

The Law of Averages and Random Variables

Law of Averages: If an experiment is independently repeated a large number of times, the percentage of occurrences of a specific event E will be close to the theoretical probability of the event, but off by some amount: the chance error.

Random Variable: An unknown quantity subject to random change. Often a random variable will be an unknown numerical result of a study.

A random variable has a numerical sample space where each outcome has an assigned probability. The assigned probabilities are not necessarily equal.

Any random variable X, discrete or continuous, can be described with
A probability distribution.
A mean and standard deviation.

Discrete Random Variable: σ

Standard Deviation: The standard deviation σ of a discrete random variable is found with the aid of µ:
\[ \sigma = \sqrt{(x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \dots + (x_k - \mu)^2 p_k} = \sqrt{\sum_{i=1}^{k} (x_i - \mu)^2 p_i} \]

When there are just two numbers, x1 and x2, in the distribution of X, the distribution's standard deviation σ can be computed by using the following short-cut formula:
\[ \sigma = |x_1 - x_2| \sqrt{p_1 p_2} \]
where p_i is the probability of x_i.

Box Models

Box Model: A model framing a statistical question as drawing tickets (with or without replacement) from a box. The tickets are labeled with numerical values linked to a random variable.

The expected value of a random variable is the average of the tickets in the box.
The standard deviation of a random variable is the standard deviation of the tickets.

The Sum of n Independent Outcomes

When the same experiment is repeated independently n times, the following is true for the sum of outcomes:
The expected value of the sum of n independent outcomes of an experiment: nµ
The standard error of the sum of n independent outcomes of an experiment: √n · σ

The second part of the above rule is called the Square Root Law.

Note that the above rule is true for any sequence of independent repetitions of the same random variable, discrete or continuous!

The Binomial Setting and Distribution

1. There is a fixed number n of repeated trials.
2. The trials are independent. In other words, the outcome of any particular trial is not influenced by previous outcomes.
3. The outcome of every trial falls into one of just two categories, which for convenience we call success and failure.
4. The probability of a success, call it p, is the same for each trial.
5. It is the total number of successes that is of interest, not their order of occurrence.

Let X denote the number of successes under the binomial setting. The probabilities of values of X are computed as
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \dots, n. \tag{2} \]

Binomial Distribution and Normal Curves

Let X be a binomial random variable with parameters n (number of trials) and p (probability of success in each trial). Then the mean and standard deviation of X are
\[ \mu = np, \qquad \sigma = \sqrt{np(1 - p)}. \]
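Formula (2) and the mean and standard deviation above are easy to evaluate directly. The sketch below is one way to do so in Python; the function names `binomial_pmf` and `binomial_mean_sd` are illustrative, and the numbers (10 tosses of a fair coin) are chosen only as an example.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial count, following formula (2)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of a binomial count."""
    return n * p, math.sqrt(n * p * (1 - p))

# Illustrative values: exactly 4 heads in 10 tosses of a fair coin.
prob_four_heads = binomial_pmf(4, 10, 0.5)   # 210 / 1024
mu, sigma = binomial_mean_sd(10, 0.5)        # 5.0 and sqrt(2.5)

# The probabilities of k = 0, 1, ..., n add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
```

`math.comb` requires Python 3.8 or later; on older versions the binomial coefficient can be built from `math.factorial`.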
NORMAL APPROXIMATION for BINOMIAL COUNTS

When n is large, the distribution of X is approximately normal, with mean np and standard deviation \( \sqrt{np(1 - p)} \).

As a rule, we will use this approximation for values of n and p that satisfy np ≥ 10 and n(1 − p) ≥ 10.

iClicker

The binomial formula can be used to calculate
A) The probability of getting exactly 4 heads in 10 tosses of a coin.
B) The probability of having exactly 4 diamond cards among the top 10 cards in a well shuffled standard deck of cards.
C) The probability of rolling exactly 7 even numbers in 10 die rolls.
D) Both (A) and (B).
E) Both (A) and (C).

Answer: E

iClicker

What is the probability of rolling exactly four 1's in ten die rolls?
A) 0%
B) 5%
C) 10%
D) 16%
E) 50%

We have n = 10 independent trials, each with success probability 1/6, and we want exactly k = 4 successes. Note that \( \binom{10}{4} = 210 \):
\[ P(X = 4) = \binom{10}{4} \cdot \left(\frac{1}{6}\right)^4 \cdot \left(\frac{5}{6}\right)^6 \approx 5\% \]
Answer: B

The Central Limit Theorem (CLT)

The Central Limit Theorem: When drawing at random with replacement from a box, the probability histogram for the sum will approximately follow the normal curve, even if the contents of the box do not. The larger the number of draws, the better the normal approximation.

The sample size n should be at least 30 (n ≥ 30) before the normal approximation can be used.

For symmetric population distributions, the distribution of x̄ is usually normal-like even at n = 10 or more.

For very skewed population distributions, larger values of n may be needed to overcome the skewness.

Parameters & Statistics

Parameter: A numerical fact about a population.
Statistic: A numerical fact about a sample.

An investigator knows a statistic and wants to know a parameter.
Probability Methods: Sampling techniques which implement an objective chance process to choose subjects from the population, leaving no discretion to the interviewer. It is possible to compute the chance that any particular individual in the population will get into the sample.

Simple Random Sampling: A sampling technique where every individual is equally likely to be selected and drawing for the sample is performed without replacement.

We discussed three types of bias which arise in sampling; be familiar with these terms for true/false and multiple-choice questions.

Variable Type

Given n draws with replacement from a box with
Mean µ (average for quantitative and percent for qualitative).
Standard Deviation σ.

                   Sum       Average    Number    Percent
Expected Value:    n × µ     µ          n × µ     µ
Standard Error:    √n × σ    σ/√n       √n × σ    σ/√n

iClicker

A box contains four tickets with a 0 and six tickets with a 1. A sample of 100 draws is made with replacement from the box. There are fifty-six 1's among the draws.

The chance error is ____ and the standard error is ____.
A) −4 and 5.
B) −4 and 0.05.
C) 6 and 5.
D) 6 and 0.05.
E) −6 and 5.

The average ticket value is µ = 0.6. After 100 draws the expected value is 60 but the observed value is 56.
chance error = observed − expected = −4
The standard deviation of the box is \( \sigma = \sqrt{0.6 \cdot 0.4} \approx 0.5 \). The standard error for the number is
\[ \sqrt{n} \times \sigma = 10 \times 0.5 = 5 \]
Answer: A

The Correction Factor

When drawing without replacement, to get the exact SE you must multiply by the correction factor:
\[ \sqrt{\frac{\text{number of objects in box} - \text{number of draws}}{\text{number of objects in box} - 1}} \]

When the number of tickets in the box is large relative to the number of draws, the correction factor is nearly one.

Normal Curve for SE for Averages and Percentages

Suppose 1,000 draws are made with replacement from a box whose average ticket value is 200. The standard error for averages is found to be 10. There is about a 68% chance for the average of the 1,000 draws to be in the range 190 to 210.

Suppose 1,000 draws are made with replacement from a 0-1 box whose percent of 1's is 15%. The standard error for percent is found to be 0.5%. There is about a 68% chance for the percentage of 1's among the 1,000 draws to be in the range 14.5% to 15.5%.

Tests of Significance

Confidence Intervals: Intervals on the number line which are used to estimate the population parameter (µ) from the sample statistic (x̄).

Tests of Significance: Tests intended to assess the evidence provided by the data in favor of some claim about a population parameter (µ).

A significance test is a formal procedure which uses the data to choose between two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha).

Null Hypothesis: The basic or primary statement about the parameter (µ).

If we reject H0 when in fact H0 is true, this is a TYPE I error.
If we do not reject H0 when in fact Ha is true, this is a TYPE II error.

One Sample z-Test for µ

One Sample z-Test for µ: A test to determine the validity of a statement concerning the mean µ based upon a single sample.

STEP 1: State the hypotheses.
H0: µ = µ0
As a default, the alternative Ha is two-sided (Ha: µ ≠ µ0). A problem may specify whether Ha is left-sided or right-sided.

STEP 2: Choose the significance level α. Assume α = 0.05 unless otherwise stated.

STEP 3: Calculate the test statistic.
\[ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

STEP 4: Compute the P-value.
The formula for the P-value depends on the alternative hypothesis. Recall that the P-value is the probability, computed assuming H0 is true, that the test statistic would take a value as extreme as or more extreme than the value actually observed.

STEP 5: Make a decision:
Reject H0 if P-value < α.
Do not reject H0 if P-value ≥ α.

STEP 6: State the conclusion in terms of the alternative hypothesis.
If you rejected H0, say "there is enough evidence at the α level that [state your alternative hypothesis in words here]".
If you did not reject H0, say "there is not enough evidence at the α level to say that [state your alternative hypothesis in words here]".

Confidence Intervals Chart

To find a confidence interval for the population mean µ with confidence level C from a sample of size n with mean x̄ and sample standard deviation s:

(1) If the population standard deviation σ is known, and either the population distribution is normal or the sample size is large (n ≥ 30):
\[ \left[ \bar{x} - \frac{z_C \times \sigma}{\sqrt{n}}, \; \bar{x} + \frac{z_C \times \sigma}{\sqrt{n}} \right] \]

(2) If the population standard deviation σ is unknown, then

Case 1: (n < 30 and population distribution is normal)
\[ \left[ \bar{x} - \frac{t_C^{n-1} \times s}{\sqrt{n-1}}, \; \bar{x} + \frac{t_C^{n-1} \times s}{\sqrt{n-1}} \right] \]

Case 2: (n ≥ 30 and population distribution is normal)
\[ \left[ \bar{x} - \frac{z_C \times s}{\sqrt{n}}, \; \bar{x} + \frac{z_C \times s}{\sqrt{n}} \right] \]

Chart for Tests of Significance for µ

We have two types of test statistic for the null hypothesis (H0: µ = µ0).

(1) If the sample size is large (n > 30), then
\[ \text{test statistic} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
and use the normal table to calculate a P-value.

(2) If the sample size is small (n ≤ 30) and the population distribution is roughly normal, then
\[ \text{test statistic} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]
and use the t-table with n − 1 degrees of freedom to calculate a P-value.

iClicker

Other things being equal, which of the following P-values is best for the null hypothesis?
A) 0.1%
B) 4%
C) 17%
D) 32%
E) 48%

Answer: E

iClicker

An automobile manufacturer has claimed that their car averages 30 mpg (miles per gallon) on the highway. A simple random sample of 17 such cars yields an average gas mileage of 26 mpg with a standard deviation of 8 mpg. Set up a suitable hypothesis test to determine whether the difference between our sample and the manufacturer's claim is real or chance error.
A) The results are highly statistically significant.
B) The results are statistically significant.
C) The results are not statistically significant.

(H0: µ = 30) (Ha: µ ≠ 30)

As we do not know the population standard deviation and the sample size is under 30, we use the t-distribution with 16 degrees of freedom.
\[ \text{Test Statistic:} \quad \frac{26 - 30}{8 / \sqrt{16}} = -2 \]
\[ \text{P-Value:} \quad 2 \cdot P(t_{16} \geq 2) > 2 \cdot 0.025 = 0.05 \]
Therefore, the results are not statistically significant.
Answer: C

iClicker

An automobile manufacturer has claimed that his car averages 30 mpg (miles per gallon) on the highway. A simple random sample of 17 such cars yields an average gas mileage of 26 mpg with a standard deviation of 8 mpg. Calculate a 99% confidence interval for the average gas mileage for this type of automobile.
A) [25, 27]
B) [23, 29]
C) [21.8, 30.2]
D) [20.2, 31.8]
E) [19.4, 33.2]

We are still using the t-distribution with 16 degrees of freedom, but we are interested in a 99% confidence interval, that is, the 99/2 + 50 = 99.5th percentile:
\[ t^{16}_{0.005} = 2.921 \]
The 99% confidence interval:
\[ \left[ 26 - \frac{2.921 \cdot 8}{\sqrt{16}}, \; 26 + \frac{2.921 \cdot 8}{\sqrt{16}} \right] = [20.2, 31.8] \]
Answer: D
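
The arithmetic in the two gas mileage questions can be reproduced in a few lines. This is a minimal sketch, assuming the √(n − 1) divisor that appears in the small-sample confidence interval chart and in the worked solutions; the function names `t_statistic` and `t_interval` are illustrative, and the critical value 2.921 is simply the table value quoted above rather than something the code computes.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """Small-sample test statistic, using the sqrt(n - 1) divisor (assumed convention)."""
    return (sample_mean - mu0) / (s / math.sqrt(n - 1))

def t_interval(sample_mean, s, n, t_crit):
    """Confidence interval: sample mean plus or minus t_crit * s / sqrt(n - 1)."""
    margin = t_crit * s / math.sqrt(n - 1)
    return sample_mean - margin, sample_mean + margin

# Gas mileage example: n = 17 cars, sample mean 26 mpg, SD 8 mpg, claimed mean 30 mpg.
t = t_statistic(26, 30, 8, 17)
low, high = t_interval(26, 8, 17, 2.921)  # 2.921 is the t-table value for 16 df at 99%

print(round(t, 2), round(low, 1), round(high, 1))  # prints -2.0 20.2 31.8
```

The interval contains the claimed 30 mpg, which agrees with the test's conclusion that the difference is not statistically significant.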