Midterm Exam III Review
Dr. Joseph Brennan
Math 148, BU
Permutations and Combinations
ORDER
In order to count the number of possible ways to choose, without
replacement, k objects from a collection of n distinct objects, we must be
specific as to whether order matters.
A permutation is a choice where order matters.
A combination is a choice where order does not matter.
The only difference between a permutation and a combination is order.
This leads to very similar counting formulas:
$$ {}_nP_k = \frac{n!}{(n-k)!} \qquad \binom{n}{k} = \frac{n!}{k! \cdot (n-k)!} $$
Recall: An event E in the sample space S has probability
$$ P(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } S} $$
iClicker
A graduate class is comprised of 2 overachieving undergraduates, 7 first
year graduates, 4 second year graduates, and 1 third year graduate. If class
begins with random presentations by three students, what is the probability
that an undergraduate and two first year graduates are chosen?
(A) 0%
(B) 6%
(C) 11%
(D) 19%
(E) 27%
$$ \frac{\binom{2}{1} \times \binom{7}{2}}{\binom{14}{3}} \approx 11\% $$
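The count for this question can be checked numerically. The following is a minimal sketch using `math.comb` from the Python standard library; the variable names are illustrative and not from the slides.

```python
from math import comb

# Class roster from the question: 2 undergraduates, 7 first-year,
# 4 second-year, and 1 third-year graduate (14 students in total).
total_students = 2 + 7 + 4 + 1

# Favorable outcomes: 1 of the 2 undergraduates and 2 of the 7 first-years.
favorable = comb(2, 1) * comb(7, 2)

# All ways to choose 3 presenters when order does not matter.
possible = comb(total_students, 3)

probability = favorable / possible
print(f"{favorable}/{possible} = {probability:.3f}")
```

Combinations are used because the question only asks which students present, not the order in which they present.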
Law of Averages
Law of Averages: If an experiment is independently repeated a
large number of times, the percentage of occurrences of a specific event E
will be the theoretical probability of the event occurring, but off by some
amount: the chance error.
iClicker
There are two hospitals: in Hospital A, 120 babies are born every day;
in Hospital B, 12 babies are born every day.
On average, the ratio of baby boys to baby girls born every day in each
hospital is 50/50. However, one day, in one of those hospitals, twice as
many baby girls were born as baby boys. In which hospital was it more
likely to happen?
(A) Hospital A
(B) Hospital B
(C) Hospital C
(D) Equally Likely
Hospital B. The probability of a random deviation of a particular size (from
the population mean) decreases as the sample size increases.
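The comparison can be made concrete with exact binomial probabilities. This is a sketch under two assumptions that are not stated on the slide: each birth is independently a girl with probability 1/2, and "twice as many girls as boys" is read as exactly twice as many. The function name is hypothetical.

```python
from math import comb

def prob_twice_as_many_girls(births: int) -> float:
    """Chance that exactly 2/3 of the births are girls, assuming each birth
    is independently a girl with probability 1/2."""
    girls = 2 * births // 3  # twice as many girls as boys
    return comb(births, girls) / 2 ** births

p_a = prob_twice_as_many_girls(120)  # Hospital A: 80 girls, 40 boys
p_b = prob_twice_as_many_girls(12)   # Hospital B: 8 girls, 4 boys
print(f"Hospital A: {p_a:.6f}, Hospital B: {p_b:.4f}")
```

The probability for the small hospital is far larger than for the large one.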
Random Variable
Random Variable: An unknown quantity subject to random change. Often a
random variable will be an unknown numerical result of a study.
A random variable has a numerical sample space where each outcome has
an assigned probability. The assigned probabilities are not necessarily
equal.
Any random variable X , discrete or continuous, can be described with
A probability distribution.
A mean and standard deviation.
Random Variables
The distribution of a discrete random variable X is summarized in the
distribution table:
Value of X     $x_1$   $x_2$   $x_3$   ...   $x_k$
Probability    $p_1$   $p_2$   $p_3$   ...   $p_k$
The symbols $x_i$ represent the distinct possible values of X and $p_i$ is the
probability associated with $x_i$.
$$ p_1 + p_2 + \ldots + p_k = 1 \quad \text{(or 100\%)} $$
3 Coins Probability Histogram
Discrete Random Variable: µ
Mean:
The mean µ of a discrete random variable is found by
multiplying each possible value by its probability and adding together all
the products:
$$ \mu = x_1 p_1 + x_2 p_2 + \ldots + x_k p_k = \sum_{i=1}^{k} x_i p_i $$
Discrete Random Variable: σ
Standard Deviation: The standard deviation σ of a discrete
random variable is found with the aid of µ:
$$ \sigma = \sqrt{(x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \ldots + (x_k - \mu)^2 p_k} = \sqrt{\sum_{i=1}^{k} (x_i - \mu)^2 p_i} $$
When there are just two numbers, $x_1$ and $x_2$, in the distribution of X, the
distribution's standard deviation σ can be computed by using the
following short-cut formula:
$$ \sigma = |x_1 - x_2| \sqrt{p_1 p_2} $$
where $p_i$ is the probability of $x_i$.
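Both the general formula and the two-value short-cut can be checked side by side. The sketch below uses a hypothetical two-value distribution (X = 1 with probability 0.3, X = 0 with probability 0.7); the helper name `mean_and_sd` is illustrative.

```python
from math import sqrt

def mean_and_sd(values, probs):
    """Mean and standard deviation of a discrete random variable."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, sqrt(var)

# Hypothetical two-value distribution: X = 1 with probability 0.3, X = 0 with 0.7.
values, probs = [1, 0], [0.3, 0.7]
mu, sigma = mean_and_sd(values, probs)

# Short-cut formula for two values: |x1 - x2| * sqrt(p1 * p2).
shortcut = abs(values[0] - values[1]) * sqrt(probs[0] * probs[1])
print(mu, sigma, shortcut)
```

The general formula and the short-cut agree, as they should for any two-value distribution.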
Box Models
Box Model: A model framing a statistical question as drawing tickets
(with or without replacement) from a box. The tickets are to be labeled
with numerical values linked to a random variable.
The expected value of a random variable is the average of the tickets
occupying the box model.
The standard deviation of a random variable is the standard deviation
of the tickets.
iClicker
Did you know there are 4-, 8-, 10-, 12-, and 20-sided dice? Create a box model
for rolling a 4-sided die. What is the standard deviation of the box?
(A) 1.11
(B) 1.65
(C) 2
(D) 2.21
(E) 2.5
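Solution: First find the mean:
$$ \mu = 1 \times \frac{1}{4} + 2 \times \frac{1}{4} + 3 \times \frac{1}{4} + 4 \times \frac{1}{4} = 2.5 $$
Then find the standard deviation:
$$ \sigma = \sqrt{(1 - 2.5)^2 \times \frac{1}{4} + (2 - 2.5)^2 \times \frac{1}{4} + (3 - 2.5)^2 \times \frac{1}{4} + (4 - 2.5)^2 \times \frac{1}{4}} = \sqrt{1.25} \approx 1.118 $$
This corresponds to choice (A) 1.11.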
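The same box can be built directly as a list of tickets. This is a minimal sketch using the standard `statistics` module; `pstdev` is the population standard deviation, which is the SD of the box.

```python
from statistics import mean, pstdev

# Box model for one roll of a fair 4-sided die: one ticket per face.
box = [1, 2, 3, 4]

mu = mean(box)       # average of the tickets
sigma = pstdev(box)  # population SD of the tickets (divides by 4, not 3)
print(mu, round(sigma, 3))
```

Using `stdev` instead of `pstdev` would divide by 3 and give a different, larger number, so the population version is the one that matches the box model.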
The Sum of n Independent Outcomes
When the same experiment is repeated independently n times, the
following is true for the sum of outcomes:
The expected value of the sum of n independent outcomes of an
experiment:
$$ n\mu $$
The standard error of the sum of n independent outcomes of an
experiment:
$$ \sqrt{n}\,\sigma $$
The second part of the above rule is called the Square Root Law.
Note that the above rule is true for any sequence of independent,
identically distributed random variables, discrete or continuous!
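For a small number of draws the rule can be verified by listing every possible sequence of outcomes. The sketch below assumes a 4-sided die box and n = 3 draws, both chosen only for illustration.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

box = [1, 2, 3, 4]  # tickets for a fair 4-sided die (illustrative box)
n = 3               # number of independent draws with replacement

mu, sigma = mean(box), pstdev(box)

# Every equally likely sequence of n draws, reduced to its sum.
sums = [sum(draws) for draws in product(box, repeat=n)]

print(mean(sums), n * mu)              # expected value of the sum
print(pstdev(sums), sqrt(n) * sigma)   # standard error of the sum
```

Each printed pair agrees up to floating-point rounding: the spread of the sum grows like the square root of n, not like n.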
The Binomial Setting
1. There is a fixed number n of repeated trials.
2. The trials are independent. In other words, the outcome of any
   particular trial is not influenced by previous outcomes.
3. The outcome of every trial falls into one of just two categories, which
   for convenience we call success and failure.
4. The probability of a success, call it p, is the same for each trial.
5. It is the total number of successes that is of interest, not their order
   of occurrence.
NOTE: The Binomial Setting can be framed as a box model with only 1’s
and 0’s where draws are performed with replacement.
The Binomial Distribution
Let X denote the number of successes under the binomial setting. Then X
is a random variable which may take values 0, 1, 2, 3, ..., n. In particular,
X = 0 means no successes in n trials. Only failures were observed.
X = n means the outcomes of all n trials are successes.
X = 5 means 5 successes in n trials.
It turns out that X has a special discrete distribution which is called the
binomial distribution. The probabilities of values of X are computed as
$$ P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, 1, 2, \ldots, n. \tag{1} $$
So the binomial distribution is a probability distribution of a random
variable X which has 2 parameters: p (probability of success) and n (the
number of trials).
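The binomial formula translates directly into code. The following is a sketch with a hypothetical helper `binomial_pmf`, applied to the number of heads in three tosses of a fair coin (an example chosen for illustration).

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Illustrative case: number of heads in n = 3 tosses of a fair coin.
n, p = 3, 0.5
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(distribution)
```

The probabilities for k = 0, 1, ..., n add up to 1, as any distribution table must.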
Binomial Mean and Standard Deviation
Let X be a binomial random variable with parameters n (number of trials)
and p (probability of success in each trial). Then the mean and standard
deviation of X are
$$ \mu = np, \qquad \sigma = \sqrt{np(1 - p)}. $$
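These formulas agree with the general definitions of mean and standard deviation for a discrete random variable. The sketch below checks this for n = 20 and p = 0.3, values assumed purely for illustration.

```python
from math import comb, sqrt

n, p = 20, 0.3  # illustrative parameters, not from the slides

# Binomial probabilities P(X = k) for k = 0, ..., n.
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# Mean and SD computed directly from the definition for a discrete variable.
mu = sum(k * pk for k, pk in enumerate(pmf))
sigma = sqrt(sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf)))

print(mu, n * p)
print(sigma, sqrt(n * p * (1 - p)))
```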
Binomial Distribution and Normal Curves
NORMAL APPROXIMATION for BINOMIAL COUNTS
Let X be a random variable which has a binomial distribution with
parameters n and p. When n is large, the distribution of X is
approximately normal.
X is approximately normal with mean $np$ and standard deviation
$\sqrt{np(1 - p)}$.
As a rule, we will use this approximation for values of n and p that satisfy
np ≥ 10 and n(1 − p) ≥ 10.
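The quality of the approximation can be checked against the exact binomial probability. This sketch assumes n = 100 and p = 0.5 and uses `statistics.NormalDist` (Python 3.8+); the half-unit continuity correction is an added assumption that the slides do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.5  # illustrative; np = n(1 - p) = 50, so the rule of thumb holds
mu, sigma = n * p, sqrt(n * p * (1 - p))

# Exact binomial probability P(X <= 55).
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(56))

# Normal approximation; the +0.5 continuity correction is an added assumption.
approx = NormalDist(mu, sigma).cdf(55.5)

print(round(exact, 4), round(approx, 4))
```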
The Central Limit Theorem (CLT)
The Central Limit Theorem: When drawing at random with
replacement from a box, the probability histogram for the sum will
approximately follow the normal curve, even if the contents of the box do
not. The larger the number of draws, the better the normal approximation.
The sample size n should be at least 30 (n ≥ 30) before the normal
approximation can be used.
For symmetric population distributions, the distribution of x̄ is usually
normal-like once n is 10 or more.
For very skewed population distributions, larger values of n may be
needed to overcome the skewness.
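The theorem can be illustrated without simulation by computing the exact distribution of a sum. The sketch below assumes a box with the six faces of a fair die (a flat, non-normal box) and n = 30 draws, then measures the chance that the sum falls within one SE of its expected value.

```python
from math import sqrt
from statistics import mean, pstdev

box = [1, 2, 3, 4, 5, 6]  # illustrative box: one roll of a fair die
n = 30                    # number of draws with replacement

# Exact distribution of the sum, built one draw at a time.
dist = {0: 1.0}
for _ in range(n):
    step = {}
    for total, prob in dist.items():
        for ticket in box:
            step[total + ticket] = step.get(total + ticket, 0.0) + prob / len(box)
    dist = step

ev = n * mean(box)
se = sqrt(n) * pstdev(box)

# Chance that the sum lands within one SE of its expected value.
within_one_se = sum(prob for total, prob in dist.items() if abs(total - ev) <= se)
print(round(ev, 1), round(se, 2), round(within_one_se, 3))
```

Although the box itself is flat, the chance comes out close to the roughly 68% that the normal curve assigns to the range within one SE.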
Parameters & Statistics
Parameter: A numerical fact about a population.
Statistic: A numerical fact about a sample.
An investigator knows a statistic and wants to know a parameter.
Probability Methods: Sampling techniques that implement an
objective chance process to choose subjects from the population, leaving
no discretion to the interviewer.
It is possible to compute the chance that any particular individual in
the population will get into the sample.
Simple Random Sampling: A sampling technique where selection of
individuals is equally likely and drawing for the sample is performed
without replacement.
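A simple random sample is easy to draw in code. This sketch assumes a hypothetical population of 1,000 ID numbers and uses `random.sample` from the standard library, which draws without replacement; the seed is fixed only for reproducibility.

```python
import random

# Hypothetical population of 1,000 individuals, labeled by ID number.
population = list(range(1, 1001))

rng = random.Random(148)  # fixed seed only so the sketch is reproducible

# random.sample draws without replacement, each individual equally likely.
sample = rng.sample(population, 25)
print(sorted(sample))
```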
Bias
Sampling Bias: A bias in which a sample is collected in such a way that
some members of the intended population are less likely to be included
than others.
The bias can lead to an over/underrepresentation of the
corresponding parameter in the population.
Almost every sample in practice is biased because it is practically
impossible to ensure a perfectly random sample.
Non-response Bias: A bias that results when respondents differ in
meaningful ways from nonrespondents.
Respondents and nonrespondents can differ in ways beyond their
willingness to answer a questionnaire.
Quota Sampling: A sampling method in which interviewers are
assigned a fixed quota of subjects to interview.
Variable Type
Given n draws with replacement from a box with
Mean µ (average for quantitative and percent for qualitative).
Standard Deviation σ.
                   Quantitative              Qualitative
                   Sum         Average       Number      Percent
Expected Value:    n × µ       µ             n × µ       µ
Standard Error:    √n × σ      σ / √n        √n × σ      σ / √n
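The expected value and standard error formulas for the sum and the average can be wrapped in one small helper. This is a sketch; the function name `ev_and_se` and the example box (mean 28, SD 5, 100 draws) are assumptions for illustration.

```python
from math import sqrt

def ev_and_se(n: int, mu: float, sigma: float) -> dict:
    """EV and SE for the sum and the average of n draws with replacement."""
    return {
        "sum": (n * mu, sqrt(n) * sigma),
        "average": (mu, sigma / sqrt(n)),
    }

# Illustrative box: mean 28 and SD 5, with n = 100 draws.
results = ev_and_se(100, 28, 5)
print(results)
```

For a 0-1 box the same two entries give the number and, after multiplying by 100%, the percent.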
The Correction Factor
When drawing without replacement, to get the exact SE you must
multiply by the correction factor:
$$ \sqrt{\frac{\text{number of objects in box} - \text{number of draws}}{\text{number of objects in box} - 1}} $$
When the number of tickets in the box is large relative to the number of
draws, the correction factor is nearly one.
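A short sketch of the correction factor shows how quickly it approaches one. The function name and the box sizes below are hypothetical.

```python
from math import sqrt

def correction_factor(box_size: int, draws: int) -> float:
    """Multiplier that converts the with-replacement SE to the exact
    without-replacement SE."""
    return sqrt((box_size - draws) / (box_size - 1))

print(correction_factor(10_000, 100))  # large box: close to 1
print(correction_factor(200, 100))     # half the box drawn: noticeably below 1
```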
Normal Curve for SE for Averages and Percentages
Suppose 1,000 draws are made with replacement from a box whose
average ticket value is 200. The standard error for averages is found to
be 10.
There is about a 68% chance for the average of the 1,000 draws to
be in the range 190 to 210.
Suppose 1,000 draws are made with replacement from a 0-1 box whose
percent of 1's is 15%. The SD of the box is √(0.15 × 0.85) ≈ 0.357, so the
standard error for percent is 0.357/√1000 ≈ 1.1%.
There is about a 68% chance for the percentage of successful draws
among the 1,000 draws to be in the range 13.9% to 16.1%.
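The 68% figure is the area under the normal curve within one SE of the expected value. A minimal sketch for the first example (expected value 200, SE 10), using `statistics.NormalDist`:

```python
from statistics import NormalDist

ev, se = 200, 10  # expected value and SE for the average, from the first example

# Normal curve centered at the expected value with spread equal to the SE.
curve = NormalDist(ev, se)

# Chance that the average lands within one SE of the expected value.
chance = curve.cdf(ev + se) - curve.cdf(ev - se)
print(f"{chance:.1%}")
```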
iClicker
Historically, Five Star Pizza has recorded Thursday night drivers as having
an average delivery time of 28 minutes with a standard deviation of 5
minutes.
If the manager were to take a sample of 100 orders at random on
Thursday nights spread over a year, what is the chance that the average
delivery time is greater than 30 minutes?
(A) 0%
(B) 2%
(C) 5%
(D) 10%
(E) 40%
Solution: The box has µ = 28 and σ = 5.
$$ EV = 28, \qquad SE = \frac{5}{\sqrt{100}} = 0.5 $$
We can answer this question using the normal curve:
$$ P(\bar{X} > 30) = 1 - P\left(Z < \frac{30 - 28}{0.5}\right) = 1 - P(Z < 4) \approx 0 $$
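The calculation for the delivery-time question can be reproduced as follows. This is a sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 28, 5, 100  # box mean, box SD, and sample size from the question

ev = mu                # expected value of the sample average
se = sigma / sqrt(n)   # standard error of the sample average

z = (30 - ev) / se
chance = 1 - NormalDist().cdf(z)
print(z, chance)
```

An average of 30 minutes is 4 standard errors above the expected value, so the chance is effectively zero.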
iClicker
Five Star Pizza has not been keeping track of their Monday night delivery
time and the manager has been receiving complaints. The manager takes
a simple random sample of 40 deliveries and finds a sample mean
of 36 minutes and a sample standard deviation of 12 minutes.
What is the 90% confidence interval for the average Monday night delivery
time?
(A) [35,37]
(B) [34,38]
(C) [32,40]
(D) [30,42]
(E) [28,44]
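Solution: Begin by finding the z-score for 90% confidence. The middle 90% of
the normal curve ends at the percentile
(90/2 + 50) = 95
with z-score 1.645.
The margin of error is
$$ m = z_C \times SE = 1.645 \times \frac{12}{\sqrt{40}} \approx 3.1 $$
$$ [\bar{x} - m, \bar{x} + m] \approx [32.9, 39.1] $$
which is closest to choice (C) [32,40].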
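The confidence interval for the Monday night deliveries can be computed step by step. This sketch uses `statistics.NormalDist.inv_cdf` to look up the z-score and, as is usual for a large sample, lets the sample SD stand in for the unknown SD of the box.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sample_sd = 40, 36, 12
confidence = 0.90

# z-score leaving (1 - confidence) / 2 in each tail, i.e. the 95th percentile.
z = NormalDist().inv_cdf(0.5 + confidence / 2)

se = sample_sd / sqrt(n)   # sample SD stands in for the unknown box SD
margin = z * se

interval = (sample_mean - margin, sample_mean + margin)
print(round(z, 3), round(margin, 2), interval)
```

The computed interval lies inside [32, 40], the closest of the listed choices.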