Final Review
Dr. Joseph Brennan
Math 148, BU
Part I
Review the Vocabulary of Design for potential multiple-choice questions.
Observational Studies: A study based on observing individuals and
measuring responses that does not attempt to influence responses. We
reviewed three major types.
Because of potential hidden confounding factors, an observational study
may only establish a correlation between the treatment and the
response, but not a cause-effect relationship.
Correlation ≠ Causation!
CAUSATION can be established only from well-designed experimental
studies which have better control over the confounding factors. A
randomized controlled experiment has the following principles:
1 Control;
2 Randomization;
3 Repetition;
4 Blinding.
Part II
A qualitative variable places an individual into one of several groups or
categories.
A quantitative variable takes numerical values for which arithmetic
operations (such as adding and averaging) make sense.
There are three main types of histograms:
Density histogram displays percents (or proportions) per unit width
in the vertical direction.
In a frequency histogram the height of each bar is equal to the
actual count of observations in the class interval.
In a relative frequency histogram the height of each bar is equal to
the proportion or percentage of observations in the class interval.
The shape of a distribution:
Number of modes: Unimodal, Bimodal, Multi-modal.
Symmetry and Skew.
The center of a distribution:
Mode: The value that occurs most frequently in a given data set.
Mean: The numerical center of data.
Median: The midpoint of a distribution.
The spread of a distribution:
Standard Deviation: s measures the spread about the mean. The
standard deviation is connected to only the mean among center
measures.
$$s = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Interquartile Range: Q3 − Q1 where Q3 is the third quartile (75th
percentile) and Q1 is the first quartile (25th percentile).
Effects of a General Linear Transformation
If we act upon a data set by any linear transformation
xnew = bx + a,
the changes in center and spread are:
Mean:                 x̄new = bx̄ + a
Median:               x̃new = bx̃ + a
1st Quartile:         Q1,new = bQ1 + a
3rd Quartile:         Q3,new = bQ3 + a
Standard Deviation:   snew = |b| · s
Interquartile Range:  IQRnew = |b| · IQR
where | · | denotes absolute value. The quartile formulas assume b > 0; when b < 0 the order of the data reverses, so Q1,new = bQ3 + a and Q3,new = bQ1 + a.
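The transformation rules can be checked numerically. The following is a minimal sketch, assuming made-up Celsius readings converted to Fahrenheit (b = 1.8, a = 32); the data and the helper name `summarize` are illustrative and not part of the course material.

```python
import math
from statistics import mean, median, pstdev, quantiles

def summarize(data):
    """Return the center and spread measures discussed above.

    pstdev uses the 1/n form of the standard deviation.
    """
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean(data),
        "median": median(data),
        "sd": pstdev(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
    }

# Hypothetical data: temperatures in degrees Celsius.
celsius = [10.0, 12.0, 15.0, 20.0, 23.0, 30.0, 31.0]

# Linear transformation x_new = b*x + a (Celsius to Fahrenheit, b > 0).
b, a = 1.8, 32.0
fahrenheit = [b * x + a for x in celsius]

before = summarize(celsius)
after = summarize(fahrenheit)

# Center measures and quartiles pick up both b and a.
for key in ("mean", "median", "q1", "q3"):
    print(key, after[key], b * before[key] + a)

# Spread measures are only scaled by |b|; the shift a drops out.
for key in ("sd", "iqr"):
    print(key, after[key], abs(b) * before[key])
```

Each printed pair should agree up to floating-point rounding.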
iClicker
Which of the following statements is false?
A) If a data histogram has a long right tail, then the median is less than
the average.
B) An observational study cannot establish causation due to uncontrolled
confounders.
C) The mean and standard deviation are not robust to outliers.
D) The sum of the column areas in a frequency histogram is one or 100%.
E) All of the above are true.
Answer: D. The sum of the column areas in a density histogram is one or
100%. In a frequency histogram the column heights sum to the number of
observations; it is in a relative frequency histogram that the column
heights sum to one or 100%.
Normal Curves
Properties of the (standard) normal curve:
Symmetric about zero,
Unimodal,
The mean, median, and mode are equal,
Bell-shaped,
The mean µ = 0 and the standard deviation σ = 1,
The area under the whole normal curve is 100% (or 1, if you use decimals).
Empirical Rule
Figure: Normal curve and percentage of observations under it.
z-Score: The transformation of data into standard units, used for the
normal approximation:
$$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$$
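As a quick illustration of standard units and the empirical rule, here is a small sketch using Python's standard-library `statistics.NormalDist`. The exam score, mean, and SD in the example are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def z_score(observation, mean, sd):
    """Convert an observation to standard units."""
    return (observation - mean) / sd

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Hypothetical example: a score of 85 on an exam with mean 70 and SD 10.
z = z_score(85, 70, 10)
print("z =", z)

# Empirical rule: about 68%, 95%, and 99.7% within 1, 2, and 3 SDs.
for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.1f}%")
```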
iClicker
Bias in statistical measurements
A) generally varies from measurement to measurement.
B) is a systematic error in all measurements and affects all measurements
the same way.
C) is caused by random measurement errors.
D) can generally be ignored because of its random nature.
E) can be readily detected by looking at the measurements themselves.
Answer: B
Measurement Errors
1. Chance Error: Chance error is present in every measurement
and is difficult to determine. Chance errors are random and are as
likely to undervalue a measurement as to overvalue it.
If n measurements have been taken with a mean of x̄ and a
standard deviation of s, then we say:
"The next measurement will be x̄ give or take s."
2. Outliers: Observations more than 3 standard deviations from the mean
are considered to be extreme and are treated as potential outliers.
3. Bias (Systematic Error): A phenomenon affecting all
measurements the same way, pushing them in the same direction.
Bias, unlike chance error, is not detectable through multiple
measurements.
Correlation Analysis
Correlation Analysis: The establishment of an association (or correlation)
between two variables and the assessment of its strength.
Independent Variable: The variable that is expected to influence
the other variable.
Dependent Variable: The variable that is expected to be influenced
by the other variable.
Variables with a linear relationship have two classifications:
Positive Association: both variables increase or decrease
simultaneously.
Negative Association: when one of the variables increases the other
decreases.
The Correlation Coefficient
The correlation coefficient, r , is a descriptive statistic which measures the
direction and strength of the linear relationship between two
quantitative variables.
Suppose that we have data on variables x and y for n individuals.
Let the mean of x values be x̄ and let the mean of y values be ȳ .
Let the standard deviation of x values be sx and let the standard
deviation of y values be sy .
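The sample correlation coefficient r between x and y is computed as
$$r = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) = \frac{1}{n} \sum_{i=1}^{n} z_{x,i} \cdot z_{y,i} \qquad (1)$$
where $z_{x,i} = \frac{x_i - \bar{x}}{s_x}$ and $z_{y,i} = \frac{y_i - \bar{y}}{s_y}$ are the z-scores for $x_i$ and $y_i$,
respectively.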
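Formula (1) translates almost line for line into code. The sketch below assumes the 1/n form of the standard deviation (Python's `pstdev`), matching the definition of s used earlier in this review; the five data points are invented for illustration.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Correlation coefficient as the average product of z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = pstdev(xs), pstdev(ys)  # 1/n standard deviations
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]
    return sum(zx * zy for zx, zy in zip(z_x, z_y)) / n

# Hypothetical data for five individuals.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
print("r =", round(r, 4))

# r is symmetric, and flipping the sign of y flips the sign of r.
print(correlation(ys, xs), correlation(xs, [-y for y in ys]))
```

For these points r works out to 6/√60, roughly 0.77, a strong positive linear association.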
Properties of the Correlation Coefficient
(1) The sign of the correlation coefficient r indicates the direction of the
relationship between the variables.
(2) The correlation coefficient is just a number; it has no units of
measurement.
(3) The correlation r is always a number between −1 and 1. The closer
r is to 1 or −1, the stronger the linear association between x and y.
(4) Correlation only measures the strength of a LINEAR relationship
between two variables. Correlation DOES NOT describe curved
relationships between variables, no matter how strong they are!
(5) The correlation coefficient is NOT resistant to outliers.
(6) The correlation coefficient r is symmetric.
Interpreting the Correlation Coefficient
Value of |r|    Interpretation
0.0 - 0.2       Very weak to negligible correlation
0.2 - 0.4       Weak, low correlation (not very significant)
0.4 - 0.7       Moderate correlation
0.7 - 0.9       Strong, high correlation
0.9 - 1.0       Very strong correlation
The correlation coefficient only measures the strength of
LINEAR relationships.
In order to accurately visually guess r , construct a scatter diagram
such that the vertical standard deviations cover the same distance on
the page as the horizontal standard deviations.
A coefficient r = 0.80 does NOT mean that 80% of the points are
tightly clustered around a line, NOR does it indicate twice as much
linearity as r = 0.40.
Regression Analysis
Regression Analysis: The creation of a mathematical model or formula
that relates the values of one variable to the values of the other.
Assume that y and x are the dependent and independent variables of a
study. Denote:
ŷ to be the predicted (by regression) value of y for a given x,
r to be the correlation coefficient between x and y ,
ȳ and sy the average and standard deviation for the dependent
(response) variable y ,
x̄ and sx the average and standard deviation for the independent
(explanatory) variable x.
Regression Line: Regressing y on x
$$\hat{y} = \bar{y} + r \cdot s_y \cdot \frac{x - \bar{x}}{s_x}$$
$$\hat{y} = \bar{y} + r \cdot s_y \cdot z_x$$
$$\hat{z}_y = r \cdot z_x$$
Regression Analysis
The best use of the regression line is to estimate the AVERAGE value of
y for a given value of x.
The correlation coefficient r measures the amount of scattering of points
about the regression line.
Regression Effect describes the tendency of individuals with extreme
values to retest closer to the mean.
On average the top group will score lower on a second experiment and on
average the bottom group will score higher on a second experiment.
The Regression Fallacy is the mistake of conjecturing a special cause
for extreme values becoming more average, when the regression effect
alone accounts for the change.
iClicker
A university has made a statistical analysis of the relationship between
Math SAT scores and first year GPA’s for students who complete the first
year. The correlation coefficient is calculated as r = 0.6.
                     SATM Score   GPA
Mean                 550          2.6
Standard Deviation   80           0.6
If someone enters with an SATM score of 650, what would you estimate
their first-year GPA to be?
A) 4.0
B) 3.3
C) 3.0
D) 2.8
E) 2.6
iClicker Solution
As we are predicting GPA, we let GPA be y and SATM x.
To find our regression line we first find the slope
$$m = r \cdot \frac{s_y}{s_x} = (0.6) \cdot \frac{0.6}{80} = 0.0045$$
Then the y-intercept
$$b = \bar{y} - m \cdot \bar{x} = 2.6 - (0.0045) \cdot 550 = 0.125$$
Finally we obtain our regression line
ŷ = 0.0045x + 0.125
We are given an x value of 650 and the resulting prediction:
ŷ = 0.0045 · (650) + 0.125 = 3.05 ≈ 3.0
Answer: C
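The same calculation can be scripted. This is a minimal sketch of the slope-intercept form of the regression line using the SATM/GPA summary statistics from the question; the function name `regression_line` is an illustrative choice, not something from the course.

```python
def regression_line(x_bar, s_x, y_bar, s_y, r):
    """Return (slope, intercept) for regressing y on x."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Summary statistics from the iClicker question.
slope, intercept = regression_line(x_bar=550, s_x=80, y_bar=2.6, s_y=0.6, r=0.6)
predicted_gpa = slope * 650 + intercept

# Equivalent route through standard units: z_y_hat = r * z_x.
z_x = (650 - 550) / 80
predicted_via_z = 2.6 + (0.6 * z_x) * 0.6

print("slope:", slope)
print("intercept:", intercept)
print("predicted GPA:", predicted_gpa, predicted_via_z)
```

Both routes give a predicted GPA of about 3.05, which is closest to choice C (3.0).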
RMS Error
The R.M.S. Error measures how far a typical point will be from the
regression line.
Taking the spread of points in a small vertical strip of the scatter plot, the
R.M.S. Error is similar to the standard deviation and the regression line is
similar to the mean.
$$\text{R.M.S. Error} = \sqrt{1 - r^2} \cdot s_y$$
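A short sketch of the R.M.S. error formula follows. As an illustration it reuses r = 0.6 and s_y = 0.6 from the SATM/GPA question earlier in this review; applying the formula to that example is my own extension, not part of the slides.

```python
import math

def rms_error(r, s_y):
    """R.M.S. error of the regression line for predicting y from x."""
    return math.sqrt(1 - r ** 2) * s_y

# SATM/GPA example: r = 0.6, SD of GPA = 0.6.
gpa_rms = rms_error(0.6, 0.6)
print("R.M.S. error:", gpa_rms)

# Extremes: a perfect linear relationship leaves no error,
# and r = 0 leaves the full standard deviation of y.
print(rms_error(1.0, 0.6), rms_error(0.0, 0.6))
```

Here a typical GPA prediction would be off by roughly 0.48 grade points.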
Homoscedastic and Heteroscedastic Relationships
There are two broad classes of data with a linear relationship:
Homoscedastic: Scatter diagrams which form a true football
shape. The standard deviations of the y observations in vertical strips
are approximately equal.
We may assume that the standard deviation within each strip is the
R.M.S. Error.
Heteroscedastic: Scatter diagrams with unequal vertical strip
standard deviations. The R.M.S. Error in this case gives an average error
across all the vertical strips.
For a given x, the R.M.S. Error should not be used as an estimate
of the standard deviation of the corresponding y-values.
Probability
Probability theory deals with studies where the outcomes are not known
for sure in advance. Usually there are many possible outcomes for a study;
we just do not know which particular outcome we will observe.
Sample Space: The set of all possible outcomes of a study. The sample
space of a study is denoted by S.
Every repetition of a study, or a trial, produces a single outcome. Usually
an outcome is computed from the values of the response variables.
An event is a set of outcomes from the sample space S. We say that an
event has occurred if ANY of the outcomes that constitute it occur.
Special Events
Certain Event: An event which is guaranteed to happen at every
repetition of the experiment. A certain event is equal to the sample
space.
Impossible Event: An event which never can occur. Mathematically an
impossible event is written as ∅, the empty set.
Opposite Event: An event Ā is opposite to A if it happens whenever A
does not happen.
Mutually Exclusive Events
The set of outcomes common to events A and B is itself an event, called
the intersection of A and B. We will denote the intersection of events A
and B by
A and B
Mutually Exclusive Events: Two events A and B are mutually exclusive
if they cannot both happen at the same time. The intersection A and B is
the empty set and P(A and B) = 0.
Independent Events
Two events A and B are independent of each other if knowing that one
event occurs does not change the probability that the other event occurs.
For independent events A and B, P(A and B) = P(A) · P(B).
It can be established that events A and B are independent if
P(A) = P(A|B).
Union: The union of events A and B is the event which happens when
either event A or event B or both happen. The union of two events is
expressed as
A or B
Rules of Probability
The following rules simplify many probability computations:
Rule 1: The probability P(A) of any event A satisfies
0 ≤ P(A) ≤ 1.
In other words, the probability of any event is between 0 and 1.
Rule 2: If S is the sample space for an experiment, then P(S) = 1.
The probability of a certain event is 1.
Rule 3: The probability of an impossible event is 0. Hence,
P(∅) = 0.
Rule 4: If events A and B are independent, then the probability
that they both happen is the product of their probabilities:
P(A and B) = P(A) · P(B)
Rules of Probability
Rule 5: For any events A and B, the probability of their union is
equal to the sum of their individual probabilities minus the probability
of their intersection:
P(A or B) = P(A) + P(B) − P(A and B)
Subtracting the probability of the intersection is needed to avoid
double counting.
A special case: if A and B are disjoint events, then the probability of
their union is the sum of individual probabilities:
P(A or B) = P(A) + P(B)
Rule 6: For any event A
P(Ā) = 1 − P(A)
Conditional Probability
Let A and B be events. If the events are not independent, then the
occurrence of B alters the probability that A will occur. The conditional
probability of event A given that event B has happened is denoted
P(A|B).
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$
We will consider two important sampling schemes:
Sampling with Replacement: Independent Events
Sampling without Replacement: Dependent Events
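The events A1, A2, ..., An are not necessarily independent.
$$P(A_1 \text{ and } A_2 \text{ and } A_3 \text{ and } A_4 \text{ and } \ldots) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \text{ and } A_2) \cdot P(A_4 \mid A_1 \text{ and } A_2 \text{ and } A_3) \cdots$$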
iClicker
Consider a standard 52-card deck. What is the probability of being
dealt a card that is either red or even?
A) 0%
B) 27%
C) 58%
D) 69%
E) 75%
The even cards are 2, 4, 6, 8, and 10 (5 per suit or 20 per deck).
$$P(\text{Red or Even}) = P(R) + P(E) - P(R \text{ and } E) = \frac{26}{52} + \frac{20}{52} - \frac{10}{52} = \frac{36}{52} \approx 69\%$$
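The addition rule in this solution can be verified by brute force. The sketch below builds a 52-card deck and counts outcomes exactly with `fractions.Fraction`; the deck representation and helper names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# Ranks and suits of a standard deck; each suit maps to its color.
ranks = ["A", 2, 3, 4, 5, 6, 7, 8, 9, 10, "J", "Q", "K"]
suits = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of an event when every card is equally likely."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

def is_red(card):
    return suits[card[1]] == "red"

def is_even(card):
    return card[0] in (2, 4, 6, 8, 10)

p_red = prob(is_red)
p_even = prob(is_even)
p_both = prob(lambda c: is_red(c) and is_even(c))
p_union = prob(lambda c: is_red(c) or is_even(c))

# Direct count versus the addition rule P(R) + P(E) - P(R and E).
print(p_union, p_red + p_even - p_both, float(p_union))
```

Both routes give 36/52 = 9/13, about 69% (choice D).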
iClicker
Consider a standard 52-card deck. What is the probability of being
dealt (without replacement) three cards with at least one being a King?
A) 0%
B) 14%
C) 22%
D) 78%
E) 84%
Let K be the event where at least one card is a King. The opposite event
K̄ is the event where none are Kings. There are 4 Kings per deck:
$$P(K) = 1 - P(\bar{K}) = 1 - \frac{48}{52} \cdot \frac{47}{51} \cdot \frac{46}{50} \approx 22\%$$
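The complement calculation can be cross-checked two ways: with the multiplication rule for draws without replacement, and by counting three-card hands. This is a small sketch using exact fractions.

```python
from fractions import Fraction
from math import comb

# Multiplication rule: no King on the 1st, 2nd, and 3rd draw in turn.
p_no_king = Fraction(48, 52) * Fraction(47, 51) * Fraction(46, 50)
p_at_least_one = 1 - p_no_king

# Counting check: hands with no King out of all three-card hands.
p_check = 1 - Fraction(comb(48, 3), comb(52, 3))

print(p_at_least_one, float(p_at_least_one))
print(p_at_least_one == p_check)
```

The two approaches agree, giving roughly 22% (choice C).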
Permutations and Combinations
In order to count the number of possible ways to choose, without
replacement, k objects from a collection of n distinct objects we must be
specific as to whether we acknowledge ORDER.
A permutation is a choice where order matters.
A combination is a choice where order does not matter.
The only difference between a permutation and a combination is order.
This leads to very similar counting formulas:
$$_nP_k = \frac{n!}{(n-k)!} \qquad \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}$$
Recall: An event E in a sample space S of equally likely outcomes has
probability
$$P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S}$$
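Python's standard library exposes both counting formulas directly (`math.perm` and `math.comb`, available from Python 3.8). The example of choosing 3 objects from 5 is an arbitrary illustration.

```python
from math import comb, factorial, perm

n, k = 5, 3

# Ordered choices: n! / (n - k)!
n_permutations = perm(n, k)

# Unordered choices: n! / (k! * (n - k)!)
n_combinations = comb(n, k)

print(n_permutations, n_combinations)

# Each combination can be ordered in k! ways, which links the two counts.
print(n_permutations == n_combinations * factorial(k))
```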
The Law of Averages and Random Variables
Law of Averages: If an experiment is independently repeated a large
number of times, the percentage of occurrences of a specific event E will
be the theoretical probability of the event occurring, but off by some
amount - the chance error.
Random Variable: An unknown quantity subject to random change. Often a
random variable will be an unknown numerical result of a study.
A random variable has a numerical sample space where each outcome has
an assigned probability. The assigned probabilities are not necessarily
equal.
Any random variable X, discrete or continuous, can be described with:
A probability distribution.
A mean and standard deviation.
Discrete Random Variable: σ
Standard Deviation: The standard deviation σ of a discrete
random variable is found with the aid of µ:
$$\sigma = \sqrt{(x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \ldots + (x_k - \mu)^2 p_k} = \sqrt{\sum_{i=1}^{k} (x_i - \mu)^2 p_i}$$
When there are just two numbers, x1 and x2, in the distribution of X the
distribution's standard deviation, σ, can be computed by using the
following short-cut formula:
$$\sigma = |x_1 - x_2| \sqrt{p_1 p_2}$$
where pi is the probability of xi.
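The general formula and the two-value short-cut can be compared numerically. The following sketch uses a hypothetical 0-1 distribution with P(1) = 0.6 and a second made-up two-value distribution; the function names are illustrative.

```python
import math

def discrete_mean(values, probs):
    """Mean of a discrete random variable."""
    return sum(x * p for x, p in zip(values, probs))

def discrete_sd(values, probs):
    """Standard deviation from the general formula."""
    mu = discrete_mean(values, probs)
    return math.sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))

def two_value_sd(x1, x2, p1, p2):
    """Short-cut formula, valid only when X takes exactly two values."""
    return abs(x1 - x2) * math.sqrt(p1 * p2)

# Hypothetical 0-1 variable with P(X = 1) = 0.6.
general = discrete_sd([0, 1], [0.4, 0.6])
shortcut = two_value_sd(0, 1, 0.4, 0.6)
print(general, shortcut)

# A second made-up example: X is -2 with chance 0.25, or 6 with chance 0.75.
print(discrete_sd([-2, 6], [0.25, 0.75]), two_value_sd(-2, 6, 0.25, 0.75))
```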
Box Models
Box Model: A model framing a statistical question as drawing tickets
(with or without replacement) from a box. The tickets are to be labeled
with numerical values linked to a random variable.
The expected value of a random variable is the average of the tickets
occupying the box model.
The standard deviation of a random variable is the standard deviation
of the tickets.
The Sum of n Independent Outcomes
When the same experiment is repeated independently n times, the
following is true for the sum of outcomes:
The expected value of the sum of n independent outcomes of an
experiment: nµ
The standard error of the sum of n independent outcomes of an
experiment: √n · σ
The second part of the above rule is called the Square Root Law.
Note that the above rule is true for any sequence of independent random
variables with the same distribution, discrete or continuous!
The Binomial Setting and Distribution
1. There is a fixed number n of repeated trials.
2. The trials are independent. In other words, the outcome of any
particular trial is not influenced by previous outcomes.
3. The outcome of every trial falls into one of just two categories, which
for convenience we call success and failure.
4. The probability of a success, call it p, is the same for each trial.
5. It is the total number of successes that is of interest, not their order
of occurrence.
Let X denote the number of successes under the binomial setting. The
probabilities of values of X are computed as
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n. \qquad (2)$$
Binomial Distribution and Normal Curves
Let X be a binomial random variable with parameters n (number of trials)
and p (probability of success in each trial). Then the mean and standard
deviation of X are
$$\mu = np, \qquad \sigma = \sqrt{np(1 - p)}.$$
NORMAL APPROXIMATION for BINOMIAL COUNTS
When n is large, the distribution of X is approximately normal.
X is approximately normal with mean np and standard deviation
$\sqrt{np(1 - p)}$.
As a rule, we will use this approximation for values of n and p that satisfy
np ≥ 10 and n(1 − p) ≥ 10.
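To see how good the approximation is, one can compare an exact binomial probability with the normal-curve value. The sketch below assumes n = 100 tosses of a fair coin (a made-up example satisfying the rule of thumb) and uses a continuity correction of 0.5, which is a common refinement that is not stated in this review.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for a binomial count X."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# Hypothetical setting: 100 tosses of a fair coin.
n, p = 100, 0.5
assert n * p >= 10 and n * (1 - p) >= 10  # rule of thumb from the text

mu = n * p
sigma = sqrt(n * p * (1 - p))

exact = binomial_cdf(55, n, p)
# Continuity correction (an assumption of this sketch): use 55.5, not 55.
approx = NormalDist(mu, sigma).cdf(55.5)

print("mean:", mu, "SD:", sigma)
print("exact P(X <= 55):", exact)
print("normal approximation:", approx)
```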
iClicker
The binomial formula can be used to calculate
A) The probability of getting exactly 4 heads in 10 tosses of a coin.
B) The probability of having exactly 4 diamond cards among the top 10
cards in a well shuffled standard deck of cards.
C) The probability of rolling exactly 7 even numbers in 10 die rolls.
D) Both (A) and (B).
E) Both (A) and (C ).
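Answer: E. Choice (B) fails the binomial setting because the cards are
dealt without replacement, so the trials are not independent.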
iClicker
What is the probability of rolling exactly four 1's in ten die rolls?
A) 0%
B) 5%
C) 10%
D) 16%
E) 50%
We have n = 10 independent trials with success having probability 1/6,
from which we want exactly k = 4 successes. Note that $\binom{10}{4} = 210$:
$$P(X = 4) = \binom{10}{4} \cdot \left(\frac{1}{6}\right)^4 \cdot \left(\frac{5}{6}\right)^6 \approx 5\%$$
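The binomial formula is easy to evaluate exactly. This is a minimal sketch of the die-roll calculation using `fractions.Fraction` to avoid rounding until the end; the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Exactly four 1's in ten rolls of a fair die.
p_four_ones = binomial_pmf(4, 10, Fraction(1, 6))
print(comb(10, 4))
print(float(p_four_ones))

# Sanity check: the probabilities over k = 0..10 add up to 1.
print(sum(binomial_pmf(k, 10, Fraction(1, 6)) for k in range(11)))
```

The exact value is a little over 5%, matching choice B.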
The Central Limit Theorem (CLT)
The Central Limit Theorem: When drawing at random with
replacement from a box, the probability histogram for the sum will
approximately follow the normal curve, even if the contents of the box do
not. The larger the number of draws, the better the normal approximation.
The sample size n should be at least 30 (n ≥ 30) before the normal
approximation can be used.
For symmetric population distributions the distribution of x̄ is usually
normal-like even at n = 10 or more.
For very skewed population distributions larger values of n may be
needed to overcome the skewness.
Parameters & Statistics
Parameter: A numerical fact about a population.
Statistic: A numerical fact about a sample.
An investigator knows a statistic and wants to know a parameter.
Probability Methods: Sampling techniques which implement an
objective chance process to choose subjects from the population, leaving
no discretion to the interviewer.
It is possible to compute the chance that any particular individual in
the population will get into the sample.
Simple Random Sampling: A sampling technique where selection of
individuals is equally likely and drawing for the sample is performed
without replacement.
We discussed three types of bias which arise in sampling; be familiar with
these terms for true/false and multiple choice questions.
Given n draws with replacement from a box with
Mean µ (average for quantitative and percent for qualitative).
Standard Deviation σ.

Variable Type      Sum       Average   Number    Percent
Expected Value:    n × µ     µ         n × µ     µ
Standard Error:    √n × σ    σ/√n      √n × σ    σ/√n
iClicker
A box contains four tickets with a 0 and six tickets with a 1. A sample of
100 draws are made with replacement from the box. There are fifty-six 1's
among the draws. The chance error is ____ and the standard error is ____.
A) −4 and 5.
B) −4 and 0.05.
C) 6 and 5.
D) 6 and 0.05.
E) −6 and 5.
The average ticket value is µ = 0.6. After 100 draws the expected value is
60 but the observed value is 56.
chance error = observed − expected = −4
The standard deviation of the box is $\sigma = \sqrt{0.6 \cdot 0.4} \approx 0.5$. The standard
error for number is
$$\sqrt{n} \times \sigma = 10 \times 0.5 = 5$$
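The box-model bookkeeping for this question can be sketched in a few lines. The helper `box_summary` below is an illustrative name; it computes the expected value and standard error for both the sum (number) and the average (percent) of draws made with replacement.

```python
import math

def box_summary(tickets, n_draws):
    """Expected values and standard errors for draws with replacement."""
    size = len(tickets)
    mu = sum(tickets) / size
    sigma = math.sqrt(sum((t - mu) ** 2 for t in tickets) / size)
    return {
        "mu": mu,
        "sigma": sigma,
        "ev_sum": n_draws * mu,
        "se_sum": math.sqrt(n_draws) * sigma,
        "ev_avg": mu,
        "se_avg": sigma / math.sqrt(n_draws),
    }

# Box from the question: four 0's and six 1's, 100 draws, 56 ones observed.
box = [0] * 4 + [1] * 6
summary = box_summary(box, 100)
chance_error = 56 - summary["ev_sum"]

print("expected number of 1's:", summary["ev_sum"])
print("chance error:", chance_error)
print("SE for number:", summary["se_sum"])
print("SE for percent:", 100 * summary["se_avg"], "%")
```

Without rounding σ to 0.5, the SE for the number of 1's comes out just under 5, consistent with choice A.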
The Correction Factor
When drawing without replacement, to get the exact SE you must
multiply the SE for drawing with replacement by the correction factor:
$$\sqrt{\frac{\text{number of objects in box} - \text{number of draws}}{\text{number of objects in box} - 1}}$$
When the number of tickets in the box is large relative to the number of
draws, the correction factor is nearly one.
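A brief sketch shows how the correction factor behaves as the box grows relative to the sample. The box sizes and draw counts below are made-up values for illustration.

```python
import math

def correction_factor(box_size, n_draws):
    """Factor applied to the with-replacement SE when drawing without replacement."""
    return math.sqrt((box_size - n_draws) / (box_size - 1))

# Hypothetical cases: 100 draws from boxes of increasing size.
for box_size in (200, 1_000, 10_000, 1_000_000):
    print(box_size, round(correction_factor(box_size, 100), 4))

# Edge cases: one draw needs no correction; drawing the whole box leaves no error.
print(correction_factor(100, 1), correction_factor(100, 100))
```

With a box of 10,000 tickets and 100 draws the factor is already above 0.99, so the with-replacement SE is a close approximation.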
Normal Curve for SE for Averages and Percentages
Suppose 1,000 draws are made with replacement from a box whose
average ticket value is 200. The standard error for averages is found to
be 10.
There is about a 68% chance for the average of the 1,000 draws to
be in the range 190 to 210.
Suppose 1,000 draws are made with replacement from a 0-1 box whose
percent of 1's was 15%. The standard error for percent is found to be
0.5%.
There is about a 68% chance for the percentage of successful draws
of the 1,000 draws to be in the range 14.5% to 15.5%.
Tests of Significance
Confidence Intervals: Intervals on the number line which are used
to estimate the population parameter (µ) from the sample statistic
(x̄).
Tests of Significance: Tests intending to assess the evidence
provided by the data in favor of some claim about a population
parameter (µ).
A significance test is a formal procedure which uses the data to choose
between two competing hypotheses, the null hypothesis and the
alternative hypothesis.
Null Hypothesis: The basic or primary statement about the parameter
(µ).
If we reject H0 when in fact H0 is true, this is TYPE I error.
If we do not reject H0 when in fact Ha is true, this is TYPE II error.
One Sample z - Test for µ
One Sample z-Test for µ: A test to determine the validity of a
statement concerning the mean µ based upon a single sample.
STEP 1: State the hypotheses.
H0: µ = µ0
As a default, the alternative Ha is two-sided. A problem may specify
whether Ha is left-sided or right-sided.
STEP 2: Choose the significance level α.
Assume α = 0.05 unless otherwise stated.
STEP 3: Calculate the test statistic.
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
One Sample z - Test for µ
STEP 4: Compute the P - value. The formula for the P-value
depends on the alternative hypothesis.
Recall that the P-value is the probability, computed assuming H0 is
true, that the test statistic would take a value at least as extreme as
the one actually observed.
STEP 5: Make a decision:
Reject H0 if P-value < α.
Do not reject H0 if P-value ≥ α.
STEP 6: State the conclusion in terms of the alternative
hypothesis.
If you rejected H0, say "there is enough evidence at the α level to say
that [state your alternative hypothesis in words here]".
If you did not reject H0, say "there is not enough evidence at the α level
to say that [state your alternative hypothesis in words here]".
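The six steps can be sketched as a small function. This is an illustrative implementation under stated assumptions, not the course's official procedure: the function name, the `alternative` keyword values, and the sample numbers (x̄ = 52, µ0 = 50, σ = 10, n = 100) are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu_0, sigma, n, alternative="two-sided", alpha=0.05):
    """Return (z, p_value, reject_h0) for H0: mu = mu_0 with known sigma."""
    # STEP 3: test statistic.
    z = (x_bar - mu_0) / (sigma / sqrt(n))

    # STEP 4: the P-value formula depends on the alternative hypothesis.
    std_normal = NormalDist()
    if alternative == "two-sided":
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    elif alternative == "left":
        p_value = std_normal.cdf(z)
    elif alternative == "right":
        p_value = 1 - std_normal.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'left', or 'right'")

    # STEP 5: decision at significance level alpha.
    return z, p_value, p_value < alpha

# Hypothetical data: sample mean 52 from n = 100, testing mu_0 = 50, sigma = 10.
z, p_value, reject = one_sample_z_test(52, 50, 10, 100)
print("z =", z)
print("P-value =", round(p_value, 4))
print("reject H0" if reject else "do not reject H0")
```

In this made-up case z = 2 and the two-sided P-value is about 4.6%, so H0 would be rejected at α = 0.05 but not at α = 0.01.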
Confidence Intervals Chart
To find a confidence interval for the population mean µ with confidence
level C from a sample of size n with mean x̄ and sample standard
deviation s:
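(1) If the population standard deviation σ is known, and either the
population distribution is normal or the sample size is large (n ≥ 30):
$$\left[\bar{x} - \frac{z_C \times \sigma}{\sqrt{n}},\ \bar{x} + \frac{z_C \times \sigma}{\sqrt{n}}\right]$$
(2) If the population standard deviation σ is unknown, then
Case 1: (n < 30 and population distribution is normal)
$$\left[\bar{x} - \frac{t_C^{n-1} \times s}{\sqrt{n-1}},\ \bar{x} + \frac{t_C^{n-1} \times s}{\sqrt{n-1}}\right]$$
Case 2: (n ≥ 30 and population distribution is normal)
$$\left[\bar{x} - \frac{z_C \times s}{\sqrt{n}},\ \bar{x} + \frac{z_C \times s}{\sqrt{n}}\right]$$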
Chart for Tests of Significance for µ
We have two types of test statistic for the null hypothesis (H0: µ = µ0).
(1) If the sample size is large (n > 30), then
$$\text{test statistic} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
and use the normal table to calculate a P-value.
(2) If the sample size is small (n ≤ 30) and the population distribution is
roughly normal, then
$$\text{test statistic} = \frac{\bar{x} - \mu_0}{s / \sqrt{n-1}}$$
and use the t-table with n − 1 degrees of freedom to calculate a P-value.
iClicker
Other things being equal, which of the following P-values is best for the
null hypothesis?
A) 0.1%
B) 4%
C) 17%
D) 32%
E) 48%
Answer: E. The larger the P-value, the weaker the evidence against the
null hypothesis.
iClicker
An automobile manufacturer has claimed that their car averages 30 mpg
(miles per gallon) on the highway. A simple random sample of 17 such cars
yields an average gas mileage of 26 mpg with a standard deviation of 8
mpg. Set up a suitable hypothesis test to determine whether the difference
between our sample and the manufacturer’s claim is real or chance error.
A) The results are highly statistically significant.
B) The results are statistically significant.
C) The results are not statistically significant.
H0: µ = 30
Ha: µ ≠ 30
As we do not know the population standard deviation and the sample size
is under 30, we use the t-distribution with 16 degrees of freedom.
$$\text{Test Statistic} = \frac{26 - 30}{8 / \sqrt{16}} = -2$$
$$\text{P-value} = 2 \cdot P(t^{16} \geq 2) > 2 \cdot 0.025 = 5\%$$
Therefore, the results are not statistically significant.
Answer: C
iClicker
An automobile manufacturer has claimed that his car averages 30 mpg
(miles per gallon) on the highway. A simple random sample of 17 such
cars yields an average gas mileage of 26 mpg with a standard deviation of
8 mpg. Calculate a 99% confidence interval for the average gas mileage
for this type of automobile.
A) [25, 27]
B) [23, 29]
C) [21.8, 30.2]
D) [20.2, 31.8]
E) [19.4, 33.2]
We are still using t with 16 degrees of freedom, but we are interested in
a 99% confidence interval, that is, the 99/2 + 50 = 99.5th percentile:
$$t^{16}_{0.005} = 2.921$$
The 99% confidence interval:
$$\left[26 - \frac{2.921 \cdot 8}{\sqrt{16}},\ 26 + \frac{2.921 \cdot 8}{\sqrt{16}}\right] = [20.2, 31.8]$$
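The arithmetic for the gas-mileage example can be reproduced as follows. This sketch takes the critical value 2.921 from the solution above rather than computing it (Python's standard library has no t-distribution), and it follows the review's convention of dividing s by √(n − 1) for small samples. The function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Small-sample confidence interval using s / sqrt(n - 1) as the SE."""
    margin = t_crit * s / sqrt(n - 1)
    return x_bar - margin, x_bar + margin

# Gas-mileage sample: n = 17 cars, mean 26 mpg, SD 8 mpg.
n, x_bar, s = 17, 26, 8

# Test statistic for H0: mu = 30, with the same standard error.
t_stat = (x_bar - 30) / (s / sqrt(n - 1))

# 99% interval with the tabled critical value for 16 degrees of freedom.
low, high = t_interval(x_bar, s, n, t_crit=2.921)

print("t statistic:", t_stat)
print("99% CI:", round(low, 1), "to", round(high, 1))
```

The interval rounds to [20.2, 31.8], choice D. Since it contains the claimed 30 mpg, it agrees with the earlier finding that the difference is not statistically significant.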