MATH 203
ICY SMOOTH COURSE PACK
WINTER 2017
www.sleepingpolarbear.ca
CREATED IN AN IGLOO BY PEREGRINE FALCONS
Table of Contents
Describing & Summarizing Data
Review Questions & Solutions
Intro to Probability
Review Questions & Solutions
Counting Methods
Conditional Probability
Review Questions & Solutions
Discrete Random Variables
Review Questions & Solutions
Binomial Probabilities
Hypergeometric Probabilities
Review Questions & Solutions
Normal Distribution
Review Questions & Solutions
Central Limit Theorem
Sampling Distributions Special Topic
Final Exam Dec. 2009 w/Solutions
A Picture of a Bird
Basic Definitions
• A distribution is a group of numbers that are being interpreted.
o For example, the following is a distribution: [11, 13, 19, 23, 34, 47, 61]
o Synonyms: Data, Data Set.
• A value is a specific number from our distribution
o For example, 11 is a value from our distribution.
o Synonyms: Score, Observation
• Data summary involves taking an entire distribution (for example, the GPAs of 200 randomly selected
McGill students) and summarizing this distribution with just a few different values.
o The purpose of data summary is to describe the whole set of scores to someone with these few
specific values, so that, without reading the entire data set, they can have a pretty good idea of what
it looks like.
o The two main ways data are summarized are measures of central tendency and measures of
variation.
Sample vs Population
• Population refers to the entire group that we are interested in measuring with respect to the variable in
question.
• Sample refers to a subset of this group of interest.
• For example, we may be interested in the IQ of current McGill undergrads.
o The population would be all McGill undergrad students. All 27,000 of them (according to
Wikipedia).
o A sample would be, say, 100 randomly selected students from the entire undergrad student body.
• The population mean is denoted as \( \mu \).
• The population standard deviation is denoted as \( \sigma \).
• The sample mean is denoted as \( \bar{X} \).
• The sample standard deviation is denoted as \( S \).
o The population and sample mean are calculated the same way:
\[ \bar{X} = \frac{\sum X}{N} \qquad \mu = \frac{\sum X}{N} \]
o There is a minor difference in how sample and population standard deviation are calculated:
\[ S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \qquad \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \]
You do not have to constantly worry about whether we are dealing with a sample or population in each
example:
(1) Whether we are dealing with a sample or population, everything is calculated the same way, with the
exception of standard deviation.
(2) In any given formula, you can replace \( \bar{X} \) and \( S \) with \( \mu \) and \( \sigma \) or vice versa. This will always be fine.
• For example, as we will soon see, the formula for calculating the Z-score of a specific value from
a distribution is as follows:
\[ \text{Population: } Z(X) = \frac{X - \mu}{\sigma} \quad \longleftrightarrow \quad \text{Sample: } Z(X) = \frac{X - \bar{X}}{S} \]
• Therefore, even if you did not know whether you were dealing with a population or a sample,
you would still get the exact same result when calculating the Z-score for a particular value of X.
(3) Unless explicitly stated that we are dealing with a population, you can safely assume we are dealing with
a sample and calculate standard deviation accordingly.
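The one place where the sample/population distinction changes the arithmetic can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module, applied to the example distribution from Basic Definitions; the variable names are illustrative only.

```python
import statistics

# The example distribution from "Basic Definitions"
data = [11, 13, 19, 23, 34, 47, 61]

mean = statistics.mean(data)      # same formula for a sample or a population
s = statistics.stdev(data)        # sample standard deviation: divides by N - 1
sigma = statistics.pstdev(data)   # population standard deviation: divides by N

print(f"mean = {mean:.3f}, S = {s:.3f}, sigma = {sigma:.3f}")
```

Because S divides by N − 1 rather than N, it always comes out slightly larger than σ for the same numbers.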
Measures of Central Tendency
• Measures of central tendency let us know where most of our values are centered or clustered around.
• The three most common ones are mean, median and mode.
Mean
• The Mean or Average is obtained by dividing the sum of all values in our distribution (\( \sum X \)) by the number
of values in our distribution (\( N \)).
\[ \text{Sample: } \bar{X} = \frac{\sum X}{N} \qquad \text{Population: } \mu = \frac{\sum X}{N} \]
Median
• The Median is the middle value in our distribution. The median is greater than half the values and less than half
the values in our distribution.
o If we have an odd number of values (N = 5, for example), the median will be an actual value from
our distribution (the 3rd value in this case).
o If we have an even number of values (N = 6, for example), the median will be the average of the
two middle values (the average of the 3rd and 4th values in this case).
• The median is also known as the 50th percentile.
o A value’s percentile is the percentage of values which it is greater than. The median is greater than
half, or 50%, of the values in its distribution (and smaller than the other half).
Example: Find the median in the following distribution: 29, 22, 23, 56, 37, 28, 33
➢ First, arrange all values in ascending order:
[22, 23, 28, 29, 33, 37, 56]
➢ Next, calculate the sample size and find the rank of the median:
Sample Size = N = 7
\[ \text{Median Rank} = K = \frac{N + 1}{2} = \frac{7 + 1}{2} = 4 \]
➢ Finally, find the Kth value in our distribution:
[22, 23, 28, 29, 33, 37, 56]
The 4th value in our distribution, starting from the smallest, is 29.
Median = 29
• Suppose we had an even number of values in our distribution. Let’s add 77 as the final value:
[22, 23, 28, 29, 33, 37, 56, 77]
Sample Size = N = 8
\[ \text{Median Rank} = K = \frac{N + 1}{2} = \frac{8 + 1}{2} = 4.5 \]
• In this case, K = 4.5 tells us that to find the median we must take the average of the 4th and 5th values: 29
and 33.
\[ \text{Median} = \frac{29 + 33}{2} = 31 \]
That’s it!
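The rank procedure above can be sketched in a few lines of Python. The helper name `median_by_rank` is hypothetical and chosen only for illustration; the two data sets are the ones used in the example.

```python
def median_by_rank(values):
    """Median via the rank K = (N + 1) / 2 applied to the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    k = (n + 1) / 2                   # 1-based rank of the median
    if k.is_integer():                # odd N: the Kth value itself
        return ordered[int(k) - 1]
    lower = ordered[int(k) - 1]       # even N: K ends in .5, so average the
    upper = ordered[int(k)]           # values on either side of rank K
    return (lower + upper) / 2

seven_values = [29, 22, 23, 56, 37, 28, 33]
eight_values = seven_values + [77]

print(median_by_rank(seven_values))   # 29
print(median_by_rank(eight_values))   # 31.0
```

The standard library's `statistics.median` should give the same results; the version here just spells out the rank step.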
Mean vs Median
• When there are extreme outliers (values that are significantly less than or greater than most other values),
the median is often preferred as a measure of central tendency.
• This is because the mean is affected by outliers but the median is largely not.
• For example, suppose we were interested in the salaries of students in their first year out of McGill.
• We randomly sample 10 such students, and their salaries (in thousands) are as follows:
[32, 36, 38, 44, 47, 48, 55, 65, 77, 675]
o Median Salary in this group = 47.5; Mean Salary = 111.7
o 9 of 10 students have salaries between $32,000 and $77,000
o The Median ($47,500) in this case is therefore a pretty accurate measure of central tendency.
o The Mean ($111,700) in this case is a very misleading measure of central tendency.
o The extreme outlier of $675,000 (a student who got rich starting her own business) has “pulled the
mean upwards”.
The mean is sensitive to outliers; the median is largely unaffected by outliers.
• If we were to replace the student whose salary was $675,000 with one whose salary was $90,000:
[32, 36, 38, 44, 47, 48, 55, 65, 77, 90]
o The Median would still = 47.5
o The Mean would now = 53.2
o The Median has stayed the same at $47,500, while the Mean has fallen from $111,700 to $53,200!
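To see the effect of the outlier directly, here is a short sketch that recomputes both measures for the two salary lists using the standard `statistics` module. The variable names are illustrative.

```python
import statistics

salaries = [32, 36, 38, 44, 47, 48, 55, 65, 77, 675]   # in thousands
replaced = salaries[:-1] + [90]                         # swap the 675 outlier for 90

for label, data in [("with outlier", salaries), ("outlier replaced", replaced)]:
    median = statistics.median(data)
    mean = statistics.mean(data)
    print(f"{label}: median = {median}, mean = {mean}")
```

Only the mean moves when the largest value changes; the median depends only on the two middle values.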
Mode
• The Mode is the most common value in our distribution.
• For example, consider the following data set: [14, 16, 23, 27, 27, 32, 35, 35, 35, 43, 68]
o The most common value in this distribution is 35, which occurs three times.
o Therefore, Mode = 35.
Measures of Variation
• Measures of variation let us know the general spread within our distribution.
• In other words, they indicate how far apart values tend to be from one another: whether they are relatively
close together (17, 17, 18, 19, 21) or far apart (98, 225, 436, 879, 7473).
Standard Deviation & Variance
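• The standard deviation tells us, roughly, the average distance of each value from the mean.
• The variance is equal to the standard deviation squared.
\[ \text{Sample Standard Deviation} = S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}} \]
\[ \text{Population Standard Deviation} = \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} \]
\[ \text{Sample Variance} = S^2 = \frac{\sum (X - \bar{X})^2}{N - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} \]
\[ \text{Population Variance} = \sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} \]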
Example: Calculate the mean and standard deviation for the following data set: [22, 25, 27, 28, 43]
*Always assume we are dealing with a sample unless explicitly stated otherwise.
\[ \sum X = 22 + 25 + 27 + 28 + 43 = 145 \]
\[ \sum X^2 = 22^2 + 25^2 + 27^2 + 28^2 + 43^2 = 4471 \]
\[ \bar{X} = \frac{\sum X}{N} = \frac{145}{5} = 29 \]
\[ S^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1} = \frac{4471 - \frac{(145)^2}{5}}{4} = 66.5 \]
\[ S = \sqrt{S^2} = \sqrt{66.5} = 8.155 \]
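As a cross-check, the sketch below computes the sample variance both from the definition and from the shortcut formula. It uses the values that appear in the sums above (22, 25, 27, 28, 43); the variable names are illustrative.

```python
import math

data = [22, 25, 27, 28, 43]
n = len(data)

sum_x = sum(data)                      # sum of X
sum_x2 = sum(x * x for x in data)      # sum of X squared
mean = sum_x / n

# Definitional form: sum of squared deviations from the mean, over N - 1
var_definition = sum((x - mean) ** 2 for x in data) / (n - 1)

# Shortcut form: (sum of X^2 - (sum of X)^2 / N) over N - 1
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

s = math.sqrt(var_shortcut)
print(sum_x, sum_x2, mean, var_definition, var_shortcut, round(s, 3))
```

Both forms give the same variance; the shortcut is simply easier to evaluate by hand because it avoids computing each deviation.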
Range
• The Range is another measure of variation.
• The range tells us the difference between the largest and smallest values:
Range = MAX – MIN
• Obviously, the greater the range, the greater the level of variation.
Example: Calculate the range in the following data set: [22, 23, 28, 29, 33, 37, 56, 77]
Range = MAX – MIN = 77 – 22 = 55
Percentiles
• A score’s percentile score or ranking is the proportion of values that it is greater than in a distribution.
• If you were to write the LSAT and score in the 85th percentile, this would mean that you did better than 85%
and worse than 15% of people who wrote that particular version of the LSAT.
• In any data set:
QL = Lower Quartile = 25th percentile
M = Middle Quartile = Median = 50th percentile
QU = Upper Quartile = 75th percentile
Z-Scores
• In any data set, the Z-Score of a particular value tells us the distance of that value from the mean.
• This distance is expressed in units of standard deviations.
• The sign (positive or negative) tells us whether this value is greater or less than the mean.
• The absolute value tells us how many standard deviations above or below the mean.
We use the following equation to calculate the Z-score of a particular value:
\[ \text{Population: } Z(X) = \frac{X - \mu}{\sigma} \qquad \text{Sample: } Z(X) = \frac{X - \bar{X}}{S} \]
Suppose that for Canadian adult males, average height is 70 inches, standard deviation is 3 inches:
\( \mu \) = 70”, \( \sigma \) = 3”.
• 73” is 3” (1 standard deviation) above the mean. The Z-Score of 73” is equal to 1: Z(73) = 1
• 64” is 6” (2 standard deviations) below the mean. The Z-Score of 64” is equal to -2: Z(64) = -2
• 70” is 0” (0 standard deviations) away from the mean. The Z-Score of 70” is equal to 0: Z(70) = 0
\[ Z(73) = \frac{73 - 70}{3} = 1 \]
\[ Z(64) = \frac{64 - 70}{3} = -2 \]
\[ Z(70) = \frac{70 - 70}{3} = 0 \]
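The Z-score formula translates directly into a one-line function. This is a minimal sketch; the function name `z_score` and its parameter names are illustrative, and the heights are the ones from the example above.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Heights in inches, with mean 70 and standard deviation 3
for height in (73, 64, 70):
    print(height, z_score(height, mean=70, sd=3))
```

The same function works for a sample or a population; only the mean and standard deviation passed in differ.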
The Empirical Rule
• When a distribution is (a) unimodal and (b) not significantly skewed, it generally looks something like this:
[Figure: a mound-shaped, symmetric curve; horizontal axis labelled X]
• We refer to this sort of distribution as mound-shaped, symmetric, and, as we will see later
on, normally distributed.
• For such distributions, when N is sufficiently large, the empirical rule tells us that:
(1) Approximately 68% of values are within 1 standard deviation of the mean
o 68% of scores fall within the following interval: \( \bar{X} \pm S \rightarrow [\bar{X} - S, \bar{X} + S] \)
o In other words, 68% of scores have a Z-score between -1 and 1
(2) Approximately 95% of values are within 2 standard deviations of the mean
o 95% of scores fall within the following interval: \( \bar{X} \pm 2S \rightarrow [\bar{X} - 2S, \bar{X} + 2S] \)
o In other words, 95% of scores have a Z-score between -2 and 2
(3) Approximately 99.7% of values are within 3 standard deviations of the mean
o 99.7% of scores fall within the following interval: \( \bar{X} \pm 3S \rightarrow [\bar{X} - 3S, \bar{X} + 3S] \)
o In other words, 99.7% of scores have a Z-score between -3 and 3.
Example: In a random sample of 200 McGill students, the average number of alcoholic beverages
consumed during winter break was 35 with a standard deviation of 10. How many of these students do
you estimate to have had between 25 and 45 drinks during winter break?
\[ Z(25) = \frac{25 - 35}{10} = -1 \]
\[ Z(45) = \frac{45 - 35}{10} = 1 \]
According to the empirical rule, we expect 68% of values to be within 1 standard deviation of the mean.
Therefore, we expect 68% of 200 = 0.68(200) = 136 students to have had between 25 and 45 drinks.
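The 68% figure is a rounded value. As a rough check, the sketch below compares the empirical-rule estimate with what an exactly normal model would give, using `statistics.NormalDist` from the Python standard library. Treating the drink counts as exactly normal is an assumption made here for illustration only.

```python
from statistics import NormalDist

n_students = 200
mean, sd = 35, 10

# Empirical rule: roughly 68% of values lie within 1 standard deviation
rule_estimate = 0.68 * n_students

# Assumption: the data follow an exactly normal model with this mean and SD
model = NormalDist(mu=mean, sigma=sd)
p_within = model.cdf(45) - model.cdf(25)     # P(25 < X < 45)
model_estimate = p_within * n_students

print(rule_estimate, round(p_within, 4), round(model_estimate, 1))
```

The two estimates differ by less than one student, which is why the rounded 68% is good enough for this kind of question.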
Graphical Representations of Data
Suppose that these are the final exam grades for 100 students in Math 203:
37 44 47 47 49 53 55 55 55 57
57 60 60 61 62 62 63 64 65 65
65 66 66 67 67 68 68 68 68 68
69 69 69 70 70 70 70 71 71 71
71 72 72 72 72 72 73 73 73 73
73 73 73 73 73 73 73 74 74 75
75 75 75 75 76 76 76 76 76 76
77 77 77 77 78 78 78 79 79 80
81 81 81 82 82 83 83 84 84 85
85 86 86 86 87 88 92 94 94 96
Dot Plot
• The horizontal axis represents the different possible values in our distribution.
o In this case, the horizontal axis represents the grade achieved on the exam.
• For the sake of this graph, grades have each been rounded to the nearest multiple of 5.
o In other words, 72 counts as an instance of 70; 88 counts as an instance of 90.
• When data values repeat, dots are placed above one another, forming a pile at that numerical location.
• As you can see, this dot plot shows that a significant percentage of students scored in the 70s on their exam.
• Another thing to note about the dot plot is that it sounds funny when you say it out loud.
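A dot plot of this kind can be approximated in plain text. The sketch below rounds each of the 100 grades to the nearest multiple of 5 and prints one dot per student, with the piles running sideways rather than upwards. The helper name `nearest_five` is hypothetical, and this is only one possible way to draw the plot.

```python
from collections import Counter

grades = [
    37, 44, 47, 47, 49, 53, 55, 55, 55, 57,
    57, 60, 60, 61, 62, 62, 63, 64, 65, 65,
    65, 66, 66, 67, 67, 68, 68, 68, 68, 68,
    69, 69, 69, 70, 70, 70, 70, 71, 71, 71,
    71, 72, 72, 72, 72, 72, 73, 73, 73, 73,
    73, 73, 73, 73, 73, 73, 73, 74, 74, 75,
    75, 75, 75, 75, 76, 76, 76, 76, 76, 76,
    77, 77, 77, 77, 78, 78, 78, 79, 79, 80,
    81, 81, 81, 82, 82, 83, 83, 84, 84, 85,
    85, 86, 86, 86, 87, 88, 92, 94, 94, 96,
]

def nearest_five(grade):
    """Round a whole-number grade to the nearest multiple of 5 (72 -> 70, 88 -> 90)."""
    return 5 * round(grade / 5)

counts = Counter(nearest_five(g) for g in grades)

# One row per rounded grade, one dot per student with that rounded grade
for value in sorted(counts):
    print(f"{value:3d} | {'.' * counts[value]}")
```

Whole-number grades never fall exactly halfway between two multiples of 5, so the rounding here is unambiguous.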
INTRO TO PROBABILITY
Some Notation to Start
\( A \): event A happens
\( P(A) \): probability that event A happens
\( A^C \): event A does not happen
\( P(A^C) \): probability that event A does not happen
The probability of something happening means the “chances” or “likelihood” of it happening
The probability of something happening is always somewhere between 0 and 1
0 = 0%: Impossible. It will never, ever happen.
1 = 100%: Guaranteed. It will happen every single time.
“Percentage/Proportion of” and “Probability” mean the same thing
60% of McGill students are female
P(Female) = 0.6
Probability a randomly selected McGill student is female is 60%
The probability of some event happening + the probability of that event NOT happening = 1
• In other words, \( P(A) + P(A^C) = 1 \)
• By re-arranging the terms, we also get:
\( P(A) = 1 - P(A^C) \)
\( P(A^C) = 1 - P(A) \)
the probability of something happening = 1 – the probability of it not happening
the probability of something not happening = 1 – the probability of it happening
P(Rain Tomorrow) + P(No Rain Tomorrow) = 1
P(Ben Affleck had eggs for breakfast today) + P(Ben Affleck didn’t have eggs for breakfast today) = 1
• We do not need to know either probability in order to know that their sum is equal to 1.
• This is because…
The sum of the probabilities of all possible outcomes in a scenario always equals 1.
• When we roll a die, for example, there are six possible outcomes: it lands on 1, 2, 3, 4, 5 or 6.
• The probability it lands on each number = \( \frac{1}{6} \)
• Therefore: \( P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 6 \cdot \frac{1}{6} = 1 \)
Example: 53% of McGill students are from Montreal, 12% are from Toronto, and 10% are from elsewhere in
Canada. What % of McGill students are from outside of Canada?
P(MTL) + P(TO) + P(Elsewhere) + P(Outside) = 1
P(Outside) = 1 – P(MTL) – P(TO) – P(Elsewhere) = 1 – 0.53 -0.12 – 0.1 = 0.25
25% of McGill students come from outside of Canada.
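The "everything sums to 1" rule can be sketched with exact fractions from Python's standard `fractions` module. The dictionary name and keys are illustrative; the percentages are the ones from the example.

```python
from fractions import Fraction

# Shares of McGill students by origin, from the example above
known = {
    "Montreal": Fraction(53, 100),
    "Toronto": Fraction(12, 100),
    "Elsewhere in Canada": Fraction(10, 100),
}

# All outcomes together must sum to 1, so the remaining share is the complement
p_outside = 1 - sum(known.values())

print(p_outside, float(p_outside))
```

Using `Fraction` rather than floats avoids small rounding errors when subtracting decimals such as 0.53 and 0.12.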
HERE IS THE MOST IMPORTANT/BASIC DEFINITION OF PROBABILITY:
\[ \text{Probability of event "A" happening} = \frac{\text{\# of ways "A" can happen}}{\text{total \# of possible outcomes}} \]
• Example: A jar with 4 red and 6 black marbles. A marble is chosen at random. What is the probability of it
being red?
4 red marbles → 4 ways of choosing a red marble → # ways A can happen = 4
4 red + 6 black marbles → 10 total marbles to choose from → total # of possible outcomes = 10
P(Red) = 4/10
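The counting definition can be written as a small function that takes a list of equally likely outcomes. This is a sketch under that equal-likelihood assumption; the names `probability` and `jar` are illustrative.

```python
from fractions import Fraction

jar = ["red"] * 4 + ["black"] * 6      # 4 red and 6 black marbles

def probability(event, outcomes):
    """(# of outcomes where the event happens) / (total # of outcomes)."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_red = probability(lambda marble: marble == "red", jar)
print(p_red)    # Fraction reduces 4/10 to 2/5
```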
P(B|A) = probability that B happens/is true, given (taking into account) that A happened/is true
• For example, let’s say that 70% of all McGill students like lollipops but only 40% of ECON students like
lollipops.
• Let A = McGill student likes lollipops & let B = Student is in ECON
P(A) = 0.7: The probability that a randomly selected McGill student likes lollipops is 0.7
P(A|B) = 0.4: The probability that a randomly selected McGill student likes lollipops, taking into account
that he/she is in ECON, is 0.4
The Two Big Rules
Rule 1: The probability of A and B both happening/being true = P(A∩B) = P(A) • P(B|A)
Time to go on a slight tangent (be back to Rule 1 in a moment…)
Statistical Independence
• Event A and Event B are statistically independent if knowing that A happened/is true does not affect the
probability of B happening/being true.
• For example, let’s run an experiment: flip a coin and roll a die
• A = coin lands on heads, B = die lands on 4
• Knowing whether or not the coin lands on heads does not affect the probability that the die lands on 4:
\( P(B) = \frac{1}{6} \)
\( P(B|A) = \frac{1}{6} \)
• Therefore, the coin landing on heads and the die landing on 4 are statistically independent.
If A and B are statistically independent, then P(B|A) = P(B)
Therefore, if A and B are independent, then P(A∩B) = P(A) • P(B)
A bit later we will use the above equation when testing for statistical independence.
OK back to Rule 1…
Example #1: I randomly select 2 cards from a 52 card deck: A, 2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, each of spades
♠, hearts ♥, diamonds ♦ and clubs ♣. What is the probability that they are both diamonds?
Let A = 1st card is a diamond, and B = 2nd card is a diamond
\( P(\text{1st is a diamond}) = P(A) = \frac{13}{52} \)
\( P(\text{2nd is a diamond given that 1st is a diamond}) = P(B|A) = \frac{12}{51} \)
A & B are not independent (you do not need to state this when answering such questions)
\( P(\text{1st card is a diamond AND 2nd card is a diamond}) = P(A) \cdot P(B|A) = \left(\frac{13}{52}\right)\left(\frac{12}{51}\right) = 0.0588 \)
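Rule 1 can be checked by brute force: list every ordered pair of two different cards and count the pairs in which both are diamonds. The sketch below does this with `itertools.permutations`; the card representation is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(rank, suit) for suit in suits for rank in ranks]   # 52 cards

# Rule 1: P(A and B) = P(A) * P(B|A)
p_rule = Fraction(13, 52) * Fraction(12, 51)

# Brute-force check: every ordered draw of two different cards is equally likely
pairs = list(permutations(deck, 2))
both_diamonds = sum(
    1 for first, second in pairs
    if first[1] == "diamonds" and second[1] == "diamonds"
)
p_count = Fraction(both_diamonds, len(pairs))

print(p_rule, p_count, round(float(p_rule), 4))
```

Both approaches give the same fraction, since 13 × 12 of the 52 × 51 ordered pairs consist of two diamonds.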
Example #2: I flip a coin 3 times. What is the probability that it lands on heads all 3 times?
A = heads first time, B = heads second time, C = heads third time
\( P(\text{1st is heads}) = \frac{1}{2} \)
\( P(\text{2nd is heads given that 1st was heads}) = P(\text{2nd is heads}) = \frac{1}{2} \)
\( P(\text{3rd is heads given that 1st \& 2nd were heads}) = P(\text{3rd is heads}) = \frac{1}{2} \)
A, B, & C are independent
Therefore:
\( P(\text{coin lands on heads the 1st \& 2nd \& 3rd time}) = P(A) \cdot P(B) \cdot P(C) = \left(\frac{1}{2}\right)^3 = \frac{1}{8} \)
Rule 2: The probability of A or B happening/being true = 𝐏(𝐀 ∪ 𝐁) = 𝐏(𝐀) + 𝐏(𝐁) − 𝐏(𝐀 ∩ 𝐁)
Time to go on another slight tangent (be back to Rule 2 in a moment…)
Mutually Exclusive (Disjoint)
• A & B are mutually exclusive (disjoint) if A & B cannot both be true at the same time
• If we flip a coin, it cannot both land on heads AND land on tails
• If we roll a die, it cannot both land on 1 to 3 AND 4 to 6
If A and B are mutually exclusive, then: 𝐏(𝐀 ∩ 𝐁) = 𝟎, and therefore 𝐏(𝐀 ∪ 𝐁) = 𝐏(𝐀) + 𝐏(𝐁)
OK back to rule 2…
Example #1: What is the probability a die will land on 3 or 4?
P(3) = 1/6, P(4) = 1/6
Landing on 3 & landing on 4 are mutually exclusive
P(3 ∩ 4) = 0
Therefore, P(3 or 4) = P(3) + P(4)
And finally: P(3 or 4) = P(3) + P(4) = 1/6 + 1/6 = 1/3 or 0.333
Example #2: 60% of McGill students own a Mac and 70% own an iPhone. 40% own a Mac and an iPhone.
What proportion of McGill students own a Mac or an iPhone?
Owning a Mac and owning an iPhone are not mutually exclusive
P(Mac) + P(iPhone) – P(Mac ∩ iPhone) = 0.6 + 0.7 – 0.4 = 0.9
90% of McGill students own a Mac or an iPhone.
Example #3: We flip a coin and roll a die. What is the probability that the coin lands on heads or the die lands
on 4?
The coin landing on heads and the die landing on 4 are NOT mutually exclusive events
o It is possible the coin lands on heads and the die lands on 4
o Therefore, \( P(H \cup 4) = P(H) + P(4) - P(H \cap 4) \)
The coin landing on heads and the die landing on 4 ARE independent events
o Knowing whether the coin has landed on heads or tails does not affect the probability that the die will land
on 4.
o Therefore, \( P(H \cap 4) = P(H) \cdot P(4) = \frac{1}{2} \cdot \frac{1}{6} = \frac{1}{12} \)
Solution: \( P(\text{Heads or } 4) = \frac{1}{2} + \frac{1}{6} - \frac{1}{12} = \frac{7}{12} \) or 0.583
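Both rules in this example can be verified by listing the 12 equally likely (coin, die) outcomes and counting. The following is a minimal sketch; the helper name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# 12 equally likely outcomes: ('H', 1), ('H', 2), ..., ('T', 6)
outcomes = list(product("HT", range(1, 7)))

def prob(event):
    """Fraction of outcomes for which the event is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_heads = prob(lambda o: o[0] == "H")
p_four = prob(lambda o: o[1] == 4)
p_both = prob(lambda o: o[0] == "H" and o[1] == 4)
p_either = prob(lambda o: o[0] == "H" or o[1] == 4)

# Rule 2 (addition) and the independence product should both hold
print(p_either, p_heads + p_four - p_both, p_both == p_heads * p_four)
```

Counting directly gives 7 favourable outcomes out of 12: the six outcomes with heads, plus tails with a 4.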
Sample Space
The sample space of an experiment is a list of all possible outcomes in that experiment
For instance, if we were to flip a coin and then observe which side the coin has landed on (experiment), then
there would be two possible outcomes to this experiment: heads or tails (sample space).
Sample Space: {H, T}
Experiment - Flip a Coin Twice (or Flip Two Coins):
1st coin    2nd coin
Heads       Heads
Heads       Tails
Tails       Heads
Tails       Tails
We can simply write the sample space out as follows: {HH HT TH TT}
Experiment - Flip a coin three times:
1st coin    2nd coin    3rd coin
Heads       Heads       Heads
Heads       Heads       Tails
Heads       Tails       Heads
Heads       Tails       Tails
Tails       Heads       Heads
Tails       Heads       Tails
Tails       Tails       Heads
Tails       Tails       Tails
Sample Space: {HHH HHT HTH HTT THH THT TTH TTT}
Experiment - Roll 2 Dice:
11 12 13 14 15 16
21 22 23 24 25 26
31 32 33 34 35 36
41 42 43 44 45 46
51 52 53 54 55 56
61 62 63 64 65 66
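Sample spaces like these can be generated rather than written out by hand. The sketch below uses `itertools.product` to enumerate the three experiments above; the variable names are illustrative.

```python
from itertools import product

# Each outcome is written as a short string, e.g. "HT" or "36"
two_coins = ["".join(flips) for flips in product("HT", repeat=2)]
three_coins = ["".join(flips) for flips in product("HT", repeat=3)]
two_dice = [f"{first}{second}" for first, second in product(range(1, 7), repeat=2)]

print(two_coins)        # ['HH', 'HT', 'TH', 'TT']
print(three_coins)
print(len(two_dice))    # 36
```

The number of outcomes multiplies with each step: 2 × 2 for two coins, 2 × 2 × 2 for three coins, and 6 × 6 for two dice.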
Sample Midterm Winter 2016 – Question 3
Consider the following experiment: A ball is drawn from an urn containing 2 red balls, 2 white balls and
4 blue balls. If the ball drawn is white, a fair coin is flipped and the outcome is recorded. If the ball is
blue, a card is drawn from a standard (52 card) deck and the suit (club ♣, diamond ♦, heart ♥ and spade
♠, present in equal proportions in the deck) is recorded. If the ball is red, a fair six-sided die is rolled and
the top number is recorded.
(a) List the sample points in this experiment. What is the probability of each sample point?
• Let’s write out some quick notes first:
o 8 balls {2 red, 2 white, 4 blue}
o 4 suits {♣ ♥ ♠ ♦} each equally common
o 2 sides to a coin {heads, tails}
• Now let’s make this table:
When Ball is Red
Ball     Die      Probability
Red      1        2/8 • 1/6 = 2/48
Red      2        2/8 • 1/6 = 2/48
Red      3        2/8 • 1/6 = 2/48
Red      4        2/8 • 1/6 = 2/48
Red      5        2/8 • 1/6 = 2/48
Red      6        2/8 • 1/6 = 2/48
When Ball is Blue
Ball     Suit     Probability
Blue     ♣        4/8 • 1/4 = 4/32
Blue     ♥        4/8 • 1/4 = 4/32
Blue     ♠        4/8 • 1/4 = 4/32
Blue     ♦        4/8 • 1/4 = 4/32
When Ball is White
Ball     Coin     Probability
White    Heads    2/8 • 1/2 = 2/16
White    Tails    2/8 • 1/2 = 2/16
➢ Note that:
\[ \frac{2}{48} = \frac{1}{24} \qquad \frac{4}{32} = \frac{1}{8} \qquad \frac{2}{16} = \frac{1}{8} \]
• Make sure all probabilities in our table add up to one:
\[ 6 \cdot \frac{1}{24} + 4 \cdot \frac{1}{8} + 2 \cdot \frac{1}{8} = 1 \]
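The table can also be built programmatically, which makes the final "adds up to one" check automatic. This is a sketch only: the dictionary names and the spelled-out suit labels are illustrative choices, and each follow-up outcome is assumed equally likely, as the question states.

```python
from fractions import Fraction

# P(ball colour): 2 red, 2 white and 4 blue balls out of 8
balls = {"Red": Fraction(2, 8), "White": Fraction(2, 8), "Blue": Fraction(4, 8)}

# What gets recorded after each colour is drawn
follow_up = {
    "Red": [1, 2, 3, 4, 5, 6],                        # die faces
    "White": ["Heads", "Tails"],                      # coin sides
    "Blue": ["club", "heart", "spade", "diamond"],    # card suits
}

# P(ball and outcome) = P(ball) * P(outcome | ball)
sample_points = {
    (ball, outcome): p_ball * Fraction(1, len(follow_up[ball]))
    for ball, p_ball in balls.items()
    for outcome in follow_up[ball]
}

for point, p in sample_points.items():
    print(point, p)
print("total:", sum(sample_points.values()))
```

There are 6 + 2 + 4 = 12 sample points in all, and their probabilities sum to 1.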