Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Problem: Diagnosing Spina Bifida The procedure of amniocentesis involves drawing a sample of the amniotic fluid that surrounds an unborn child in its mother’s womb. High concentration of alpha fetoprotein can indicate the condition spina bifida. Concentration of alpha fetoprotein tends to increase with the size of the foetus. Amniocentesis results in miscarriage for 1%. Preliminary tests involve measuring the level of alpha fetoprotein in the mother’s urine. C6, L1, S1 Problem: Diagnosing Spina Bifida For mothers with normal foetuses, the mean level of alpha fetoprotein is 15.73 moles/litre with a standard deviation of 0.72 moles/litre. For mothers carrying foetuses with spina bifida, the mean is 23.05 and the standard deviation is 4.08. In both groups the distribution of alpha fetoprotein appears to be approximately Normally distributed. 15.73 23.05 C6, L1, S2 Problem: Diagnosing Spina Bifida To operate a diagnostic test for spina bifida, set a threshold concentration of alpha fetoprotein, T, say. If the alpha fetoprotein level is below T, then the foetus is diagnosed as not having spina bifida. If the level is above T, then further testing is required. C6, L1, S3 Problem: Diagnosing Spina Bifida If T was set at 17.80 moles/litre: What is the probability that a foetus with spina bifida is correctly diagnosed? What is the probability that a foetus not suffering from spina bifida is correctly diagnosed? If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T ? What are the implications of setting T at this level? C6, L1, S4 Chapter 6 Continuous Random Variables If a random variable, X, can take any value in some interval of the real line it is called a continuous random variable. Eg Hg levels, height, weight, alpha fetoprotein concentration, cell radius, etc. (i.e. usually ‘measures’ ) C6, L1, S5 The Standardized Histogram §6.1 pages 231-233 Example: Dietary Carbohydrate in the Workforce The average daily intake of carbohydrate in the diet of 5929 people. C6, L1, S6 The Standardized Histogram The histogram of the data shows the carbohydrate intake: .004 .002 .000 0 200 400 600 Carbohydrate (g/day) 800 • Is unimodal (modal class 200 – 225 g/day) C6, L1, S7 The Standardized Histogram The histogram of the data shows the carbohydrate intake: .004 .002 .000 0 200 400 600 Carbohydrate (g/day) 800 • Skewed to larger values (skewed right) C6, L1, S8 The Standardized Histogram The histogram of the data shows the carbohydrate intake: .004 .002 .000 0 200 400 600 Carbohydrate (g/day) 800 • Has huge variability (highest consumers more than 10 times that of lowest consumers) C6, L1, S9 The Standardized Histogram Area between a = 225 and b = 375 shaded Shaded area = 0.483 (Corresponds to 48.3% of observations) .004 .002 .000 0 2 25 600 800 375 C6, L1, S10 The Standardized Histogram The standardized histogram adjusts the height of the rectangle or bar to relative freq. or proportion divided by width so that Area = Estimated Probability. Shaded area = 0.483 (Corresponds to 48.3% of observations) .004 .002 .000 0 225 375 60 0 80 0 C6, L1, S11 The Standardized Histogram i.e. The area of the ith rectangle tells us what proportion of the data lie in the ith class interval. Shaded area = 0.483 (Corresponds to 48.3% of observations) .004 .002 .000 0 225 375 60 0 80 0 C6, L1, S12 The Standardized Histogram For a standardized histogram: The vertical scale is : Relative frequency / interval width (density scale) Total area under the histogram = 1 The proportion of the data between a and b is the area under the histogram between a and b. C6, L1, S13 The Standardized Histogram With approximating curve . 004 . 002 0 200 400 600 800 Carbohydrate (g/day) C6, L1, S14 The Standardized Histogram Area between a = 225 and b = 375 shaded Shaded area = .486 .04 (cf. area = .483 for histogram) .002 0 225 375 600 800 This area is calculated to be 0.486 and is very close to the proportion of people who had carbohydrate intake of between 225 and 375 g/day. C6, L1, S15 Radius of Maliginant Tumor Cells In JMP select Histogram Options > Density Axis to create a standardized histogram The histogram on the left is for cell radii of malignant tumor fine needle aspirations in the breast cancer study from your 2nd assignment. X = radius of a randomly selected malignant tumor cell We estimate that, P(14 < X < 15) = .10 or a 10% chance C6, L1, S16 AFP Levels in Spina Bifida Cases In JMP select Histogram Options > Density Axis to create a standardized histogram The histogram on the left is AFP levels found in the urine of mothers carrying a fetus with spina bifida. X = AFP level of random select mother carrying fetus with spina bifida. We estimate that, P(22.5 < X < 25) = 2.5 X .10 = .25 or a 25% chance C6, L1, S17 Smooth Density Curves Take a standardized histogram, decrease the width of the class intervals and increase the number of observations. Then the top of the histogram tends to a smooth curve. C6, L1, S18 Histogram Density Curves as sample size increases! (AFP Levels) n = 500 n = 100 n = 10,000 n = 100,000 n = 1,000,000 0.08 0.05 Density 0.10 0.03 10 20 30 40 C6, L1, S19 Properties of the Probability Density Function (p.d.f.) 1. f(x) 0 (i.e. the p.d.f. curve stays above the x-axis) 2. P(a X b) = area from a to b beneath the p.d.f curve 3. Area under the p.d.f. curve = 1 C6, L1, S20 Endpoints of Intervals For a continuous random variable, X, endpoints of intervals are unimportant. P(a X b) = P(a < X b) = P(a X < b) = P(a < X < b) = area from a to b between the p.d.f. curve and the x-axis. (Inclusion or exclusion of the endpoints will not change the area.) C6, L1, S21 The Normal Distribution Limiting smooth bell shaped symmetric curve is called the Normal p.d.f. curve. Is symmetric about the mean. Mean = Median 50% 50% Mean If a random variable, X, has a Normal distribution with a mean and a standard deviation we write: X ~ Normal ( , ) parameters C6, L1, S22 The Normal Distribution • The Normal distribution is important because: – it fits a lot of data reasonably well; – it can be used to approximate other distributions; – it is important in statistical inference (see later work). C6, L1, S23 The Normal Distribution • A Normal distribution is solely determined by and . (a) Changing Shifts the curve along the axis C6, L1, S24 The Normal Distribution A Normal distribution is solely determined by and . (b) Increasing Increases the spread and flattens the curve C6, L1, S25 Spina Bifida Example Let X be the AFP level found in the urine of mother carrying a foetus with spina bifida. We will assume that the AFP level is normally distributed with a mean of = 23.05 moles/L and a standard deviation of = 4.08 moles/L . AFP Levels for Mothers Carrying Spina Bifida Foetus C6, L1, S26 Spina Bifida Example (Empirical Rule) Approximately 68 % of mothers in this population will have a AFP levels within 1 standard deviation of the mean. i.e., approximately 68 % of mothers in this population will have AFP levels between 23.05 – 4.08 and 23.05 + 4.08 = between 18.97 and 27.13 C6, L1, S27 Spina Bifida Example (Empirical Rule) Approximately 95 % of mothers in this population will have a AFP levels within 2 standard deviation of the mean. i.e., approximately 95 % of mothers in this population will have AFP levels between 23.05 - 2 4.08 and 23.05 + 2 4.08 = between 14.89 and 31.21 C6, L1, S28 Spina Bifida Example (Empirical Rule) Approximately 99.73 % of mothers in this population will have a AFP levels within 3 standard deviation of the mean. i.e., approximately 99.73% of mothers in this population will have AFP levels between 23.05 - 3 4.08 and 23.05 - 3 4.08 = between 10.81 and 35.29 C6, L1, S29 The Normal Distribution For the Normal Distribution: A random observation has approximately: – 68% chance of falling within 1 of ; – 95% chance of falling within 2 of ; – 99.7% chance of falling within 3 of . Or: In a Normal distribution, approximately: – 68% of observations are within 1 of ; – 95% of observations are within 2 of ; – 99.7% of observations are within 3 of . C6, L1, S30 The Normal Distribution Probabilities and numbers of standard deviations Shaded area = 0.683 - + 68% chance of falling between - and + Shaded area = 0.954 - 2 + 2 95% chance of falling between - 2 and + 2 Shaded area = 0.997 - 3 + 3 99.7% chance of falling between - 3 and + 3 C6, L1, S31 Problem: Diagnosing Spina Bifida For mothers with normal foetuses, the mean level of alpha fetoprotein is 15.73 moles/litre with a standard deviation of 0.72 moles/litre. For mothers carrying foetuses with spina bifida, the mean is 23.05 and the standard deviation is 4.08. In both groups the distribution of alpha fetoprotein appears to be approximately Normally distributed. Given this For example weinformation might like to want>to17.8) be able or to find:we P(X find probabilities P(19 <with X < these 25) etc… associated distributions. for either group. 15.73 23.05 C6, L1, S32 Obtaining Probabilities Normal distribution probabilities can be obtained from all statistical packages by giving the mean and standard deviation of the distribution. – Most tables give the value of P(X x). – i.e., cumulative or lower tail probabilities. OR Area = P(X x) x C6, L1, S33 Obtaining Probabilities Basic method for obtaining probabilities 1. Sketch a Normal curve, marking on the mean and values of interest. 2. Shade the area under the curve corresponding to the required probability. 3. Convert all values to their z-scores 4. Obtain the desired probability using the normal table in the front inside cover of your text, or better yet use JMP. C6, L1, S34 Standard Normal Distribution GO TO NOTES ON STANDARD NORMAL DISTRIBUTION C6, L1, S35 Original problem: Diagnosing Spina Bifida 15.73 23.05 Recall: • For normal foetuses =15.73, = 0.72 and for foetuses with spina bifida = 23.05 and = 4.08. • Assume the threshold for detecting spina bifida is set at 17.8. – (A foetus would be diagnosed as not having spina bifida if the fetoprotein level is below 17.8) C6, L1, S36 Original problem: Diagnosing Spina Bifida 15.73 23.05 a) What is the probability that a foetus not suffering from spina bifida is correctly diagnosed? Let X be level of fetoprotein in normal foetus X ~ Normal (15.73, 0.72) What is P(X < 17.8)? P(X < 17.8) = P(Z < z-score for 17.8) z-score = (17.8 – 15.73)/.72 = 1.13/.72 = 1.57 P(X < 1.57) = .9420 0 15.73 1.57 C6, L1, S37 17.8 Original problem: Diagnosing Spina Bifida 15.73 23.05 b) What is the probability that a foetus with spina bifida is correctly diagnosed? Let Y be the level of fetoprotein in a spina bifida foetus. Y ~ Normal (23.05, 4.08) P(Y > 17.8) = P(Z > z-score for Y = 17.8) P(Z > -1.29) = 1 -=P(Z < -1.29) z-score (17.8 – 23.05)/4.08 = 1 – 0.099 = -5.25/4.08 = -1.29 = 0.901 17.8 -1.29 023.05 C6, L1, S38 Original problem: Diagnosing Spina Bifida 15.73 23.05 If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T ? Find a value T so that if First theensures z-score associated with T by T = find 13.54 From Normal Table we 99% find of foetuses finding z so that with spina bifida will be identified. Y ~ Normal (23.05, 4.08) P(Z < -2.33) = .0100 thus P(Z < z) = .0100 TThis = +probability have x z = 23.05 – 4.08 x 2.33 = 13.54 we will is called the sensitivity P(Y > T) = .9900 or P(Y < T) = .0100 C6, L1, S39 Standard Normal Probabilities in JMP • Normal Probability Calculator.JMP from Tutorials section of course website. • Here it is ready to calculate probabilities for the standard normal distribution. ( = 0, = 1) C6, L1, S40 Arbitrary Normal Probabilities in JMP • Change the mean and standard deviation columns to contain the desired values. For mothers carrying foetus with spina bifida: X ~ N(23.05,4.08), i.e. =23.05 moles/liter & =4.08 moles/liter Here we have found P(X < 17.8) C6, L1, S41