Problem: Diagnosing Spina Bifida
The procedure of amniocentesis involves drawing a sample of the amniotic fluid that surrounds an unborn child in its mother’s womb.
A high concentration of alpha fetoprotein can indicate the condition spina bifida.
The concentration of alpha fetoprotein tends to increase with the size of the foetus.
Amniocentesis results in miscarriage in 1% of cases.
Preliminary tests involve measuring the level of alpha fetoprotein in the mother’s urine.
C6, L1, S1
Problem: Diagnosing Spina Bifida
For mothers with normal foetuses, the mean level of alpha fetoprotein is 15.73 moles/litre with a standard deviation of 0.72 moles/litre.
For mothers carrying foetuses with spina bifida, the mean is 23.05 moles/litre and the standard deviation is 4.08 moles/litre.
In both groups the alpha fetoprotein level appears to be approximately Normally distributed.
[Figure: the two Normal curves, centred at 15.73 (normal foetuses) and 23.05 (spina bifida)]
C6, L1, S2
Problem: Diagnosing Spina Bifida
To operate a diagnostic test for spina bifida,
set a threshold concentration of alpha
fetoprotein, T, say.
If the alpha fetoprotein level is below T, then
the foetus is diagnosed as not having spina
bifida.
If the level is above T, then further testing is
required.
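The threshold rule above is a simple decision function. A minimal sketch, assuming levels are given in moles/litre; the name `diagnose` and its two return labels are hypothetical, not from the course materials:

```python
def diagnose(afp_level: float, threshold: float) -> str:
    """Screening rule from the slide: below T -> no spina bifida, else refer on."""
    if afp_level < threshold:
        return "no spina bifida"
    # Above T (a level exactly equal to T has probability 0 for a continuous variable).
    return "further testing"


# Illustrative urine AFP readings (moles/litre); not real patient data.
for level in (15.2, 17.9, 24.0):
    print(level, diagnose(level, threshold=17.80))
```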
C6, L1, S3
Problem: Diagnosing Spina Bifida
If T was set at 17.80 moles/litre:
What is the probability that a foetus with spina
bifida is correctly diagnosed?
What is the probability that a foetus not suffering
from spina bifida is correctly diagnosed?
If they wanted to ensure that 99% of foetuses with
spina bifida were correctly diagnosed, at what level
should they set T ? What are the implications of
setting T at this level?
C6, L1, S4
Chapter 6
Continuous Random Variables
If a random variable, X, can take any value in
some interval of the real line it is called a
continuous random variable.
E.g. Hg levels, height, weight, alpha fetoprotein concentration, cell radius, etc. (i.e. usually measurements)
C6, L1, S5
The Standardized Histogram
§6.1
pages 231-233
Example: Dietary Carbohydrate in the Workforce
The average daily intake of carbohydrate in the
diet of 5929 people.
C6, L1, S6
The Standardized Histogram
The histogram of the carbohydrate intake data:
[Figure: standardized histogram of carbohydrate intake; Carbohydrate (g/day) from 0 to 800 on the horizontal axis, density from 0 to 0.004 on the vertical axis]
• Is unimodal (modal class 200 – 225 g/day)
C6, L1, S7
The Standardized Histogram
• Skewed to larger values (skewed right)
C6, L1, S8
The Standardized Histogram
• Has huge variability (the highest consumers take in more than 10 times as much as the lowest consumers)
C6, L1, S9
The Standardized Histogram
Area between a = 225 and b = 375 shaded
Shaded area = 0.483 (corresponds to 48.3% of observations)
[Figure: standardized histogram of carbohydrate intake (g/day, 0 to 800) with the bars from 225 to 375 shaded]
C6, L1, S10
The Standardized Histogram
The standardized histogram sets the height of each rectangle (bar) to the relative frequency, or proportion, divided by the class width, so that
Area = Estimated Probability.
C6, L1, S11
The Standardized Histogram
i.e. the area of the ith rectangle tells us what proportion of the data lie in the ith class interval.
C6, L1, S12
The Standardized Histogram
For a standardized histogram:
• The vertical scale is relative frequency / interval width (the density scale).
• The total area under the histogram = 1.
• The proportion of the data between a and b is the area under the histogram between a and b (exactly so when a and b are class boundaries).
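The density-scale rule can be checked on a small made-up data set (not the 5929-person carbohydrate data). A minimal sketch; the helper names `standardized_histogram` and `area_between` are hypothetical, and classes are taken as half-open intervals [lo, hi):

```python
def standardized_histogram(data, edges):
    """Bar heights on the density scale: (relative frequency) / (class width).

    Classes are half-open [lo, hi); every value is assumed to lie in
    [edges[0], edges[-1]).
    """
    n = len(data)
    heights = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        count = sum(1 for x in data if lo <= x < hi)
        heights.append((count / n) / (hi - lo))
    return heights


def area_between(heights, edges, a, b):
    """Sum of bar areas (height * width) for classes lying inside [a, b]."""
    return sum(h * (hi - lo)
               for h, lo, hi in zip(heights, edges[:-1], edges[1:])
               if a <= lo and hi <= b)


# Made-up intakes (g/day), standing in for the real data set.
data = [150, 210, 215, 230, 240, 260, 300, 350, 420, 610]
edges = [100, 200, 300, 400, 500, 600, 700]
heights = standardized_histogram(data, edges)
print(heights)
print(area_between(heights, edges, 100, 700))  # total area, 1 up to rounding
print(area_between(heights, edges, 200, 400))  # proportion of data in [200, 400)
```

Because each height is a proportion divided by a width, multiplying back by the width recovers the proportion, which is why the areas add to 1.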
C6, L1, S13
The Standardized Histogram
With approximating curve
[Figure: standardized histogram of carbohydrate intake (g/day, 0 to 800) with a smooth approximating density curve]
C6, L1, S14
The Standardized Histogram
Area under the curve between a = 225 and b = 375 shaded
Shaded area = 0.486 (cf. area = 0.483 for the histogram)
[Figure: approximating density curve for carbohydrate intake (g/day, 0 to 800) with the area from 225 to 375 shaded]
This area is calculated to be 0.486 and is very close to the proportion of people who had a carbohydrate intake between 225 and 375 g/day.
C6, L1, S15
Radius of Malignant Tumor Cells
In JMP select Histogram Options > Density
Axis to create a standardized histogram
The histogram on the left is for cell radii of malignant
tumor fine needle aspirations in the breast cancer study
from your 2nd assignment.
X = radius of a randomly selected malignant tumor cell
We estimate that
P(14 < X < 15) = 0.10,
or a 10% chance.
C6, L1, S16
AFP Levels in Spina Bifida Cases
In JMP select Histogram Options > Density
Axis to create a standardized histogram
The histogram on the left shows AFP levels found in the urine of mothers carrying a fetus with spina bifida.
X = AFP level of a randomly selected mother carrying a fetus with spina bifida.
We estimate that
P(22.5 < X < 25) = 2.5 × 0.10 = 0.25 (bar width × bar height),
or a 25% chance.
C6, L1, S17
Smooth Density Curves
Take a standardized histogram, decrease the
width of the class intervals and increase the
number of observations. Then the top of the
histogram tends to a smooth curve.
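This convergence can be imitated by simulation. A rough sketch, assuming simulated Normal values with the spina bifida AFP parameters (23.05, 4.08) stand in for real AFP data; the class [23, 24), the seed, and the helper name `density_in_bin` are arbitrary, hypothetical choices. As n grows, the bar height settles toward the average height of the Normal curve over that class:

```python
import random
from statistics import NormalDist


def density_in_bin(n, lo, hi, mu=23.05, sigma=4.08, seed=1):
    """Standardized-histogram height of the class [lo, hi) for n simulated values."""
    rng = random.Random(seed)
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    count = sum(1 for x in sample if lo <= x < hi)
    return (count / n) / (hi - lo)


curve = NormalDist(23.05, 4.08)
# Average height of the smooth curve over the same class, for comparison.
target = curve.cdf(24.0) - curve.cdf(23.0)
for n in (100, 10_000, 100_000):
    print(f"n={n:>7}: bar height {density_in_bin(n, 23.0, 24.0):.4f}  curve {target:.4f}")
```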
C6, L1, S18
Histograms → Density Curves as Sample Size Increases (AFP Levels)
[Figure: standardized histograms of AFP levels for n = 100, 500, 10,000, 100,000 and 1,000,000, with Density (0 to 0.10) plotted against values from 10 to 40]
C6, L1, S19
Properties of the Probability Density Function (p.d.f.)
1. $f(x) \ge 0$ (i.e. the p.d.f. curve never goes below the x-axis)
2. $P(a \le X \le b)$ = area from a to b beneath the p.d.f. curve
3. Total area under the p.d.f. curve = 1
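The three properties can be checked numerically for one example p.d.f., here the Normal(23.05, 4.08) AFP model, using Python's `statistics.NormalDist` and a midpoint-rule approximation of area. A sketch only: `area_under` is a hypothetical helper, and the grid check of property 1 is illustrative, not a proof:

```python
from statistics import NormalDist


def area_under(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the area under f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


afp = NormalDist(mu=23.05, sigma=4.08)   # example p.d.f.: AFP, spina bifida group

# Property 1: the curve never dips below the x-axis (checked on a grid).
print(all(afp.pdf(x / 10) >= 0 for x in range(0, 500)))
# Property 2: P(19 <= X <= 25) is the area from 19 to 25 under the curve.
print(round(area_under(afp.pdf, 19, 25), 6), round(afp.cdf(25) - afp.cdf(19), 6))
# Property 3: the total area is 1 (integrating over mu +/- 10 sigma).
print(round(area_under(afp.pdf, 23.05 - 40.8, 23.05 + 40.8), 6))
```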
C6, L1, S20
Endpoints of Intervals
For a continuous random variable, X, endpoints of intervals are unimportant:
$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b)$
= area from a to b between the p.d.f. curve and the x-axis.
(Including or excluding the endpoints does not change the area: a single point has zero width, so $P(X = a) = 0$.)
C6, L1, S21
The Normal Distribution
The limiting smooth, bell-shaped, symmetric curve is called the Normal p.d.f. curve.
It is symmetric about the mean, so Mean = Median, with 50% of the area on each side of the mean.
If a random variable, X, has a Normal distribution with mean $\mu$ and standard deviation $\sigma$, we write:
$X \sim \text{Normal}(\mu, \sigma)$
($\mu$ and $\sigma$ are the parameters.)
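The slides describe the Normal p.d.f. by its shape. For reference, its standard formula is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$. The sketch below codes that formula directly (`normal_pdf` is a hypothetical helper), compares it with Python's `statistics.NormalDist`, and checks symmetry and mean = median numerically:

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Normal p.d.f. written out from its formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


mu, sigma = 23.05, 4.08
dist = NormalDist(mu, sigma)
print(normal_pdf(20.0, mu, sigma), dist.pdf(20.0))                   # same value
print(normal_pdf(mu - 3, mu, sigma), normal_pdf(mu + 3, mu, sigma))  # symmetry
print(dist.cdf(mu))                                                   # 0.5: mean = median
```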
C6, L1, S22
The Normal Distribution
• The Normal distribution is important because:
– it fits a lot of data reasonably well;
– it can be used to approximate other distributions;
– it is important in statistical inference (see later
work).
C6, L1, S23
The Normal Distribution
• A Normal distribution is determined entirely by $\mu$ and $\sigma$.
(a) Changing $\mu$ shifts the curve along the axis without changing its shape.
C6, L1, S24
The Normal Distribution
A Normal distribution is determined entirely by $\mu$ and $\sigma$.
(b) Increasing $\sigma$ increases the spread and flattens the curve.
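Effects (a) and (b) can be seen numerically. The sketch below uses Python's `statistics.NormalDist` to tabulate the peak height and the middle-95% range for three (μ, σ) settings taken from the spina bifida example. Moving μ from 15.73 to 23.05 with σ fixed moves the range but keeps the peak height. Increasing σ to 4.08 widens the range and lowers the peak:

```python
from statistics import NormalDist

# (mu, sigma) pairs: shift mu with sigma fixed, then increase sigma with mu fixed.
settings = [(15.73, 0.72), (23.05, 0.72), (23.05, 4.08)]
for mu, sigma in settings:
    d = NormalDist(mu, sigma)
    # The peak is at x = mu, with height 1 / (sigma * sqrt(2 * pi)).
    print(f"mu={mu:5.2f}  sigma={sigma:4.2f}  peak height={d.pdf(mu):.4f}  "
          f"middle 95% from {d.inv_cdf(0.025):.2f} to {d.inv_cdf(0.975):.2f}")
```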
C6, L1, S25
Spina Bifida Example
Let X be the AFP level found in the urine of a mother carrying a foetus with spina bifida.
We will assume that the AFP level is Normally distributed with mean $\mu = 23.05$ moles/L and standard deviation $\sigma = 4.08$ moles/L.
[Figure: AFP Levels for Mothers Carrying Spina Bifida Foetus]
C6, L1, S26
Spina Bifida Example (Empirical Rule)
Approximately 68% of mothers in this population will have AFP levels within 1 standard deviation of the mean,
i.e. between 23.05 - 4.08 and 23.05 + 4.08,
i.e. between 18.97 and 27.13.
C6, L1, S27
Spina Bifida Example (Empirical Rule)
Approximately 95% of mothers in this population will have AFP levels within 2 standard deviations of the mean,
i.e. between 23.05 - 2 × 4.08 and 23.05 + 2 × 4.08,
i.e. between 14.89 and 31.21.
C6, L1, S28
Spina Bifida Example (Empirical Rule)
Approximately 99.73% of mothers in this population will have AFP levels within 3 standard deviations of the mean,
i.e. between 23.05 - 3 × 4.08 and 23.05 + 3 × 4.08,
i.e. between 10.81 and 35.29.
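The three interval calculations follow one pattern, $\mu \pm k\sigma$ for k = 1, 2, 3. A minimal sketch; `empirical_interval` is a hypothetical helper name:

```python
def empirical_interval(mu, sigma, k):
    """Interval mu - k*sigma to mu + k*sigma used by the Empirical Rule."""
    return mu - k * sigma, mu + k * sigma


mu, sigma = 23.05, 4.08   # AFP, spina bifida group (moles/litre)
for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    lo, hi = empirical_interval(mu, sigma, k)
    print(f"about {share:>5} of mothers between {lo:.2f} and {hi:.2f}")
```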
C6, L1, S29
The Normal Distribution
For the Normal Distribution:
A random observation has approximately:
– a 68% chance of falling within $1\sigma$ of $\mu$;
– a 95% chance of falling within $2\sigma$ of $\mu$;
– a 99.7% chance of falling within $3\sigma$ of $\mu$.
Or:
In a Normal distribution, approximately:
– 68% of observations are within $1\sigma$ of $\mu$;
– 95% of observations are within $2\sigma$ of $\mu$;
– 99.7% of observations are within $3\sigma$ of $\mu$.
C6, L1, S30
The Normal Distribution
Probabilities and numbers of standard deviations
Shaded area = 0.683: a 68% chance of falling between $\mu - \sigma$ and $\mu + \sigma$
Shaded area = 0.954: a 95% chance of falling between $\mu - 2\sigma$ and $\mu + 2\sigma$
Shaded area = 0.997: a 99.7% chance of falling between $\mu - 3\sigma$ and $\mu + 3\sigma$
[Figure: three Normal curves with the areas within 1, 2 and 3 standard deviations of the mean shaded]
C6, L1, S31
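The shaded areas can be reproduced from the standard Normal distribution. The interval from $\mu - k\sigma$ to $\mu + k\sigma$ standardizes to the interval from $-k$ to $k$, so the same three areas hold for every Normal distribution. A sketch using Python's `statistics.NormalDist`:

```python
from statistics import NormalDist

Z = NormalDist()   # standard Normal: mu = 0, sigma = 1
# P(mu - k*sigma < X < mu + k*sigma) equals P(-k < Z < k) for any Normal X.
areas = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.3f}")
```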
Problem: Diagnosing Spina Bifida
For mothers with normal foetuses, the mean level of
alpha fetoprotein is 15.73 moles/litre with a
standard deviation of 0.72 moles/litre.
For mothers carrying foetuses with spina bifida, the
mean is 23.05 and the standard deviation is 4.08.
In both groups the distribution of alpha fetoprotein
appears to be approximately Normally distributed.
Given this information, we want to be able to find probabilities associated with these distributions for either group.
For example, we might like to find P(X > 17.8) or P(19 < X < 25), etc.
[Figure: the two Normal curves, centred at 15.73 and 23.05]
C6, L1, S32
Obtaining Probabilities
Normal distribution probabilities can be obtained from any statistical package by giving the mean and standard deviation of the distribution.
– Most tables give the value of $P(X \le x)$,
– i.e. cumulative or lower-tail probabilities.
[Figure: Normal curve with the area to the left of x shaded, Area = $P(X \le x)$]
C6, L1, S33
Obtaining Probabilities
Basic method for obtaining probabilities
1. Sketch a Normal curve, marking on the mean
and values of interest.
2. Shade the area under the curve corresponding
to the required probability.
3. Convert all values to their z-scores, $z = (x - \mu)/\sigma$.
4. Obtain the desired probability using the Normal table inside the front cover of your text, or better yet use JMP.
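Steps 3 and 4 can be sketched in code, with Python's `statistics.NormalDist` standing in for the printed table or JMP. Steps 1 and 2 (the sketch and shading) are not coded. The helper names `z_score` and `prob_between` are hypothetical. The example computes P(19 < X < 25) for each group and cross-checks the z-score route against a direct calculation on the original scale:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal (mu = 0, sigma = 1): what the table tabulates


def z_score(x, mu, sigma):
    """Step 3: number of standard deviations x lies above (+) or below (-) mu."""
    return (x - mu) / sigma


def prob_between(a, b, mu, sigma):
    """Step 4: P(a < X < b) for X ~ Normal(mu, sigma), read off the Z scale."""
    return Z.cdf(z_score(b, mu, sigma)) - Z.cdf(z_score(a, mu, sigma))


groups = {"normal foetus": (15.73, 0.72), "spina bifida": (23.05, 4.08)}
results = {}
for name, (mu, sigma) in groups.items():
    results[name] = prob_between(19, 25, mu, sigma)
    # Cross-check: software can also work on the original scale directly.
    direct = NormalDist(mu, sigma).cdf(25) - NormalDist(mu, sigma).cdf(19)
    print(f"{name}: P(19 < X < 25) = {results[name]:.4f} (direct: {direct:.4f})")
```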
C6, L1, S34
Standard Normal Distribution
GO TO NOTES ON STANDARD
NORMAL DISTRIBUTION
C6, L1, S35
Original problem: Diagnosing Spina Bifida
[Figure: the two Normal curves, centred at 15.73 and 23.05]
Recall:
• For normal foetuses $\mu = 15.73$ and $\sigma = 0.72$, and for foetuses with spina bifida $\mu = 23.05$ and $\sigma = 4.08$.
• Assume the threshold for detecting spina bifida is set at 17.8.
– (A foetus would be diagnosed as not having spina bifida if the fetoprotein level is below 17.8.)
C6, L1, S36
Original problem: Diagnosing Spina Bifida
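a) What is the probability that a foetus not suffering from spina bifida is correctly diagnosed?
Let X be the level of fetoprotein in a normal foetus.
$X \sim \text{Normal}(15.73, 0.72)$. What is $P(X < 17.8)$?
$P(X < 17.8) = P(Z < \text{z-score for } 17.8)$
z-score = (17.8 - 15.73)/0.72 = 2.07/0.72 = 2.88
$P(Z < 2.88) = 0.9980$
[Figure: Normal curve centred at 15.73 with the area below 17.8 shaded, and the standard Normal curve centred at 0 with the area below 2.88 shaded]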
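A quick check of part (a) with Python's `statistics.NormalDist`, standing in for the table (it is not part of the course materials). The difference is 17.8 - 15.73 = 2.07, so z = 2.07/0.72 = 2.875. The probability that a foetus without spina bifida is correctly cleared is usually called the specificity of the test:

```python
from statistics import NormalDist

mu_normal, sd_normal = 15.73, 0.72   # normal foetuses (moles/litre)
T = 17.8                             # diagnostic threshold

z = (T - mu_normal) / sd_normal      # (17.8 - 15.73) / 0.72 = 2.07 / 0.72
specificity = NormalDist().cdf(z)    # P(Z < z) = P(X < T)
print(f"z = {z:.3f}, P(X < {T}) = {specificity:.4f}")
```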
C6, L1, S37
Original problem: Diagnosing Spina Bifida
b) What is the probability that a foetus with
spina bifida is correctly diagnosed?
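Let Y be the level of fetoprotein in a spina bifida foetus. $Y \sim \text{Normal}(23.05, 4.08)$
$P(Y > 17.8) = P(Z > \text{z-score for } Y = 17.8)$
z-score = (17.8 - 23.05)/4.08 = -5.25/4.08 = -1.29
$P(Z > -1.29) = 1 - P(Z < -1.29) = 1 - 0.099 = 0.901$
[Figure: Normal curve centred at 23.05 with the area above 17.8 shaded, and the standard Normal curve centred at 0 with the area above -1.29 shaded]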
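Part (b) can be checked the same way with Python's `statistics.NormalDist`. The sketch computes P(Y > 17.8) two ways: through the z-score and directly on the original scale. The two agree, at about 0.901:

```python
from statistics import NormalDist

mu_sb, sd_sb = 23.05, 4.08   # foetuses with spina bifida (moles/litre)
T = 17.8                     # diagnostic threshold

z = (T - mu_sb) / sd_sb                        # -5.25 / 4.08
sensitivity = 1 - NormalDist().cdf(z)          # P(Z > z) = 1 - P(Z < z)
direct = 1 - NormalDist(mu_sb, sd_sb).cdf(T)   # same thing without standardizing
print(round(z, 2), round(sensitivity, 3), round(direct, 3))
```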
C6, L1, S38
Original problem: Diagnosing Spina Bifida
If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T?
Let $Y \sim \text{Normal}(23.05, 4.08)$. Find a value T so that
$P(Y > T) = 0.9900$, or equivalently $P(Y < T) = 0.0100$.
This probability, $P(Y > T)$, is called the sensitivity of the test.
First find the z-score associated with T by finding z so that $P(Z < z) = 0.0100$.
From the Normal table we find $P(Z < -2.33) \approx 0.0100$, thus
$T = \mu + z\sigma = 23.05 - 2.33 \times 4.08 = 13.54$.
Setting T = 13.54 ensures that 99% of foetuses with spina bifida will be identified.
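The same threshold can be found with an inverse-c.d.f. call, here Python's `statistics.NormalDist.inv_cdf`. The exact 1st percentile of Z is about -2.326, so software gives T ≈ 13.56 rather than the table-based 13.54. The sketch also evaluates the normal-foetus model at this T. Under the assumed Normal models, only about 0.1% of normal foetuses fall below T, so almost all of them would be sent for further testing:

```python
from statistics import NormalDist

spina_bifida = NormalDist(23.05, 4.08)
normal = NormalDist(15.73, 0.72)

# T with P(Y < T) = 0.01, i.e. sensitivity P(Y > T) = 0.99.
T = spina_bifida.inv_cdf(0.01)
# Table version: round z to -2.33 first.
T_table = 23.05 + (-2.33) * 4.08

# Implication: share of normal foetuses below T (correctly cleared).
specificity_at_T = normal.cdf(T)
print(round(T, 2), round(T_table, 2), round(specificity_at_T, 4))
```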
C6, L1, S39
Standard Normal Probabilities in JMP
• Normal Probability Calculator.JMP from Tutorials
section of course website.
• Here it is ready to calculate probabilities for the standard Normal distribution ($\mu = 0$, $\sigma = 1$).
C6, L1, S40
Arbitrary Normal Probabilities in JMP
• Change the mean and standard deviation columns to contain the desired values. For mothers carrying a foetus with spina bifida: $X \sim N(23.05, 4.08)$, i.e. $\mu = 23.05$ moles/litre and $\sigma = 4.08$ moles/litre.
Here we have found P(X < 17.8).
C6, L1, S41