Last lecture summary
• Bias, Bessel's correction
• MAD
• Normal distribution
• Empirical rule
Standard deviation – empirical rule
The nature of the normal distribution
• In laboratory experiments, results vary due to several
factors: imprecise weighing of reagents, imprecise
pipetting, nonhomogeneous suspensions of cells or
membranes ...
• Similarly, variation in a clinical value might be caused by
many genetic and environmental factors.
• These random factors are independent and they tend to
offset each other.
• Variation among values will approximate a Gaussian
distribution
• when there are many independent sources of variation
• when individual sources add up to get the final result
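A minimal simulation of the point above, assuming each source of variation is a small independent error drawn uniformly from [-1, 1]. The error model, the number of sources and the sample size are illustrative choices, not taken from the lecture. If the summed results are roughly Gaussian, about 68% of them should fall within one standard deviation of the mean and about 95% within two.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def simulated_result(n_sources=50):
    # Assumption: each source adds an independent error in [-1, 1].
    return sum(random.uniform(-1.0, 1.0) for _ in range(n_sources))

results = [simulated_result() for _ in range(10_000)]
mean = statistics.fmean(results)
sd = statistics.stdev(results)

# Share of simulated results within 1 and 2 standard deviations of the mean.
within_1sd = sum(abs(r - mean) <= sd for r in results) / len(results)
within_2sd = sum(abs(r - mean) <= 2 * sd for r in results) / len(results)

print(f"within 1 s.d.: {within_1sd:.3f}")
print(f"within 2 s.d.: {within_2sd:.3f}")
```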
New stuff
STANDARD NORMAL DISTRIBUTION
Who is more popular?
s.d. = 36
Z = -3.53
s.d. = 60
Z = -2.57
Standardizing
Formula
Z = (x − μ) / σ
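The formula as a short Python sketch. The function name `z_score` is an illustrative choice; the numbers are the height figures this lecture uses (mean 173 cm, standard deviation 5 cm).

```python
def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mean) / sd

print(z_score(180, 173, 5))  # 1.4
print(z_score(168, 173, 5))  # -1.0
```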
Quiz
• What does a negative Z-score mean?
1. The original value is negative.
2. The original value is less than mean.
3. The original value is less than 0.
4. The original value minus the mean is negative.
Quiz II
• If we standardize a distribution by converting every value
to a Z-score, what will be the new mean of this
standardized distribution?
• If we standardize a distribution by converting every value
to a Z-score, what will be the new standard deviation of
this standardized distribution?
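One way to check both questions numerically is to standardize a small data set and look at the result. The values below are hypothetical heights made up for the illustration; whatever the data, the standardized values should have a mean of about 0 and a standard deviation of about 1.

```python
import statistics

# Hypothetical heights in cm (not data from the lecture).
values = [160.0, 168.0, 171.0, 173.0, 175.0, 179.0, 185.0]

mean = statistics.fmean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)

# Convert every value to its Z-score.
z_values = [(v - mean) / sd for v in values]

new_mean = statistics.fmean(z_values)
new_sd = statistics.stdev(z_values)
print(f"new mean = {new_mean:.6f}, new s.d. = {new_sd:.6f}")
```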
Standard normal distribution
N(μ, σ) → N(0, 1)
Z – number of standard deviations away from the mean
If the Z-value is 1, how many percent are less than that value?
ca 84 %
[Figure: standard normal curve, Z axis from −3 to +3]
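The 84% figure can be checked with the standard normal cumulative distribution function, which can be written with the error function from Python's `math` module. The helper name `phi` is an illustrative choice.

```python
import math

def phi(z):
    # Proportion of a standard normal distribution lying below z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(phi(1.0) * 100, 1))  # 84.1
print(round(phi(0.0) * 100, 1))  # 50.0
```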
Proportion of human heights
x̄ = 173 cm
s = 5 cm
[Figure: normal curve of heights, axis marked from −2 to +2 standard deviations]
Quiz
• Approximately what proportion of people is shorter than
168 cm?
x̄ = 173 cm, s = 5 cm
Answer: 16%
[Figure: normal curve, height axis 163, 168, 173, 178, 183 cm]
Quiz
• Approximately what proportion of people is taller than
183 cm?
x̄ = 173 cm, s = 5 cm
Answer: 2.5%
[Figure: normal curve, height axis 163, 168, 173, 178, 183 cm]
Quiz
• Approximately what proportion of people is between 163
cm and 178 cm tall?
x̄ = 173 cm, s = 5 cm
Answer: 81.5%
[Figure: normal curve, height axis 163, 168, 173, 178, 183 cm]
Quiz
• Approximately what proportion of people is shorter than
180 cm?
x̄ = 173 cm, s = 5 cm
Answer: ca 91.5%
[Figure: normal curve, height axis 163, 168, 173, 178, 183 cm]
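The quiz answers above are approximate. As a cross-check, the same four proportions can be computed from the normal model with `statistics.NormalDist` from the Python standard library, assuming heights are exactly N(173, 5). The exact values differ slightly from the rounded quiz answers.

```python
from statistics import NormalDist

heights = NormalDist(mu=173, sigma=5)  # assumed model for the height data

below_168 = heights.cdf(168)                          # shorter than 168 cm
above_183 = 1.0 - heights.cdf(183)                    # taller than 183 cm
between_163_178 = heights.cdf(178) - heights.cdf(163)
below_180 = heights.cdf(180)                          # shorter than 180 cm

print(round(below_168 * 100, 1))        # 15.9
print(round(above_183 * 100, 1))        # 2.3
print(round(between_163_178 * 100, 1))  # 81.9
print(round(below_180 * 100, 1))        # 91.9
```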
Quiz
• What is the probability of randomly selecting a height in
the sample that is more than 5 standard deviations above the
mean?
1. 0.01
2. 0.3
3. 0.8
4. 0.99
Quiz
• What is the probability of randomly selecting a height in
the sample that is more than 5 standard deviations below the
mean?
1. 0.01
2. 0.3
3. 0.8
4. 0.99
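For both tail questions, the probability under an exactly normal model can be computed directly. The sketch below uses the complementary error function `math.erfc`, which stays accurate for very small tail areas. The result is far smaller than any of the listed options, so the smallest option is the nearest one.

```python
import math

def upper_tail(z):
    # P(Z > z) for a standard normal variable; by symmetry this equals P(Z < -z).
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_beyond_5sd = upper_tail(5.0)
print(f"{p_beyond_5sd:.2e}")  # 2.87e-07
```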
Quiz
• What proportion of the data lies more than 2 standard
deviations below or more than 2 standard deviations above
the mean for a normal distribution?
[Figure: 95% within ±2 standard deviations, 2.5% in each tail]
Z-table
What is the proportion less than the point with the Z-score −2.75?
Use Z-table
What proportion of people is shorter than 180 cm?
x̄ = 173 cm
s = 5 cm
Z = (180 − 173) / 5 = 7 / 5 = 1.4
Z-value of 1.4 corresponds to 91.92%.
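A Z-table lookup can be imitated in Python. The sketch below rounds Z to two decimals and the proportion to four, as printed Z-tables usually do; `z_table_lookup` is an illustrative name, not a library function.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

def z_table_lookup(z):
    # Proportion "less than" z, rounded the way a printed Z-table is.
    return round(std_normal.cdf(round(z, 2)), 4)

print(z_table_lookup(1.40))   # 0.9192
print(z_table_lookup(-2.75))  # 0.003
```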
Quiz – height data
• n = 1000, x̄ = 173, s = 5.0
• What proportion of people is shorter than you?
• Z = (?? − 173) / 5 = ??, proportion = see Z-table
• What proportion of people is taller than you?
• Z = (?? − 173) / 5 = ??, proportion = 1 − see Z-table
• Table gives a value “less than”.
• Note that “greater than x” is the same as “less than −x”.
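The two ways of getting a "taller than" proportion can be compared in a few lines. The height of 178 cm is a hypothetical example value; the mean and standard deviation are those of the quiz.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

my_height = 178.0                # hypothetical example value
z = (my_height - 173.0) / 5.0    # Z-score of that height

shorter = std_normal.cdf(z)              # what the table gives: "less than"
taller = 1.0 - std_normal.cdf(z)         # complement of the table value
taller_by_symmetry = std_normal.cdf(-z)  # "greater than z" = "less than -z"

print(round(shorter, 4), round(taller, 4), round(taller_by_symmetry, 4))
```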
Quiz – height data
• n = 1000, x̄ = 173, s = 5.0
• What proportion of people lie between you (height a) and you (height b)?
• Z_a = (a − 173) / 5 = ??, Z_b = (b − 173) / 5 = ??,
proportion = proportion(Z_a) − proportion(Z_b)
• How tall should you be to be among the tallest 5% of
people?
• Z-score = 1.645, 173 + 1.645 × 5.0 ≈ 181 cm
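Both parts of this quiz can be sketched with `statistics.NormalDist`. The helper `proportion_between` is an illustrative name; it sorts the two heights so the result is positive whichever person is taller. The top-5% cutoff uses the inverse of the cumulative distribution function instead of reading the Z-table backwards.

```python
from statistics import NormalDist

heights = NormalDist(mu=173, sigma=5)  # assumed model for the height data

def proportion_between(a, b):
    # Proportion of people whose height lies between a and b (in cm).
    low, high = sorted((a, b))
    return heights.cdf(high) - heights.cdf(low)

# Z-score that leaves 95% of the distribution below it.
z_95 = NormalDist().inv_cdf(0.95)
cutoff = 173 + z_95 * 5.0

print(round(proportion_between(168, 178), 4))  # 0.6827
print(round(z_95, 3))                          # 1.645
print(round(cutoff))                           # 181
```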
An intriguing fact
𝑥 = 173 cm
𝑠 = 5 cm
DISTRIBUTION, DISTRIBUTION, ARE YOU NORMAL?
Life expectancy data – histogram
[Figure: histogram of life expectancy; x-axis: life expectancy, y-axis: frequency]
Making conclusions from a histogram
• What can you tell about life expectancy data?
• how many modes?
• where is the mode?
• symmetric, left skewed or right skewed?
• outliers – yes or no?
• Where is the mode, the median, the mean?
Five-number summary
Min. 47.79
Q1 64.67
Median 73.24
Q3 76.65
Max. 83.39
mean = 69.9
Median − Q1 ≈ 8.5 > Q3 − Median ≈ 3.5
Median − Min. ≈ 25.4 > Max. − Median ≈ 10.2
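The comparison of distances around the median can be written out as a small check. The numbers are those of the five-number summary above; the rule used here (longer lower side and mean below median suggest left skew) is a common heuristic, not a formal test.

```python
# Five-number summary and mean of the life expectancy data.
minimum, q1, median, q3, maximum = 47.79, 64.67, 73.24, 76.65, 83.39
mean = 69.9

lower_box = median - q1             # about 8.6
upper_box = q3 - median             # about 3.4
lower_range = median - minimum      # about 25.5
upper_range = maximum - median      # about 10.2

# Heuristic: the lower side is longer and the mean is below the median.
looks_left_skewed = (
    lower_box > upper_box and lower_range > upper_range and mean < median
)
print(looks_left_skewed)  # True
```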
Lognormal distribution
• Frazier et al. measured the ability of the drug isoprenaline
to relax the bladder muscle.
• The results are expressed as the EC50, which is the
concentration required to relax the bladder halfway
between its minimum and maximum possible relaxation.
Geometric mean
Mean of the EC50 values: x̄ = 1 333 nM
Mean of the log10 values: x̄ = 2.71
Geometric mean: 10^2.71 = 513 nM
Geometric mean – transform all values to their logarithms,
calculate the mean of the logarithms, transform this mean
back to the units of original data (antilog)
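The three steps can be sketched in Python. The function below uses base-10 logarithms, as the slide does; the three EC50 values are hypothetical and only show the mechanics. The first line reproduces the antilog step from the lecture (mean of the logarithms 2.71).

```python
import math

def geometric_mean(values):
    # 1. transform to logarithms, 2. average them, 3. take the antilog.
    logs = [math.log10(v) for v in values]
    mean_log = sum(logs) / len(logs)
    return 10 ** mean_log

print(round(10 ** 2.71))  # 513, the antilog step from the lecture

ec50_values = [100.0, 500.0, 2500.0]  # hypothetical EC50 values in nM
print(round(geometric_mean(ec50_values)))  # 500
```

The arithmetic mean of the same three hypothetical values is about 1033 nM, which shows how strongly a single large value pulls the ordinary mean upwards.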
The nature of the lognormal distribution
• Lognormal distributions arise when multiple random
factors are multiplied together to determine the value.
• A typical example: cancer (cell division is multiplicative)
• Lognormal distributions are very common in many
scientific fields.
• Drug potency is lognormal
• To analyse lognormal data, do not use methods that
assume the Gaussian distribution. You will get misleading
results (e.g., nonexistent outliers).
• A better way is to convert the data to logarithms and analyse
the converted values.
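A small simulation of both points, assuming each value is the product of 30 independent random factors between 0.8 and 1.25. The factor range, the number of factors and the sample size are illustrative choices. The raw values come out right skewed (mean well above the median), while their logarithms are roughly symmetric (mean close to the median).

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def product_of_factors(n_factors=30):
    # Assumption: the value is built up by multiplying independent factors.
    value = 1.0
    for _ in range(n_factors):
        value *= random.uniform(0.8, 1.25)
    return value

values = [product_of_factors() for _ in range(5_000)]
logs = [math.log10(v) for v in values]

raw_mean, raw_median = statistics.fmean(values), statistics.median(values)
log_mean, log_median = statistics.fmean(logs), statistics.median(logs)

print(f"raw:  mean = {raw_mean:.3f}, median = {raw_median:.3f}")
print(f"logs: mean = {log_mean:.3f}, median = {log_median:.3f}")
```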