Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Section 5 Part 1: Distributions for Numeric Data Normal Distribution Probability Review Section 4 covered these Probability topics – Addition rule for mutually exclusive events – Joint, Marginal and Conditional probability for two non-mutually exclusive events – Identifying Independent events – Screening test measures as examples of conditional probabilities – Application of Bayes’ Theorem to calculate PPV and NPV PubH 6414 Section 5 Part 1 2 Section 5 Part 1 Outline • • • • Probability Distributions Normal Distribution Standard Normal (Z) Distribution R functions for normal distribution probabilities – pnorm – qnorm PubH 6414 Section 5 Part 1 3 Probability Distributions • Any characteristic that can be measured or categorized is called a variable. • If the variable can assume a number of different values such that any particular outcome is determined by chance it is called a random variable. • Every random variable has a corresponding probability distribution. • Probability distribution: The value the data takes on and how often it takes on those values. PubH 6414 Section 5 Part 1 4 Random Nominal Variable • Blood type is a random nominal variable • The blood type of a randomly selected individual is unknown but the distribution of blood types in the population can be described Distribution of Blood types in the US Probability 50% 40% 30% 20% 10% 0% O A B AB Blood Type PubH 6414 Section 5 Part 1 5 Probability Distributions for Numerical Data • Continuous Data can take on any value within the range of possible values so describing the distribution of continuous data in a table is not very practical. • One solution is a histogram. PubH 6414 Section 5 Part 1 6 Numerical Data: Duration of Eruptions for Old Faithful PubH 6414 Section 5 Part 1 7 Histogram PubH 6414 Section 5 Part 1 8 Probability Density Curve • Definition: a probability density curve is a curve describing a continuous probability distribution • Unlike a histogram of continuous data, the probability density curve is a smooth line. • Thought experiment: Imagine a histogram with the width of the intervals getting smaller and smaller and the sample size getting larger and larger. PubH 6414 Section 5 Part 1 9 The Histogram and the Probability Density Curve PubH 6414 Section 5 Part 1 10 0 .1 Percentage .2 .3 .4 The Histogram and the Probability Density Curve 80 100 120 140 Systolic BP (mmHg) 160 180 The Probability Density Curve for BP values in the entire population of men – there are no bars because the population is infinite. PubH 6414 Section 5 Part 1 11 Area under a Probability Density Curve • The probability density is a smooth idealized curve that shows the shape of the distribution of a random variable in the population • The total area under a probability density curve = 1.0 • The probability density curve in the systolic blood pressure example has the bell-shape of a normal distribution. Not all probability density curves are bell shaped. PubH 6414 Section 5 Part 1 12 Shapes of Probability Density Curves – Symmetric – Right Skewed – Left Skewed – Unimodal – Bimodal – Multimodal PubH 6414 Section 5 Part 1 13 Normal Distribution • The Normal Distribution is also called the Gaussian Distribution after Karl Friedrich Gauss, a German mathematician (1777 – 1855) • Characteristics of any Normal Distribution – Bell-shaped curve – Unimodal – peak is at the mean – Symmetric about the mean – Mean = Median = Mode – ‘Tails’ of the curve extend to infinity in both directions PubH 6414 Section 5 Part 1 14 Normal Distribution PubH 6414 Section 5 Part 1 15 Symbol Notation A convention in statistics notation is to use Roman letters for sample statistics and Greek letters for population parameters. Since the density curve describes the population, Greek letters are used for the mean (mu) and SD (sigma) Symbol Mean Standard Deviation Sample X s PubH 6414 Section 5 Part 1 Density Curve µ σ 16 Describing Normal Distributions • Every Normal distribution is uniquely described by it’s mean (µ) and standard deviation (σ) • The Notation for a normal distribution is N(µ, σ) • N(125, 4) refers to a normal distribution with mean = 125 and standard deviation = 4. What is the variance (σ2) for this distribution? ________ PubH 6414 Section 5 Part 1 17 Normal density with mean=5 and σ =1 Two normal densities with different mean values and same σ Two normal densities with different σ and the same mean PubH 6414 Section 5 Part 1 18 The 68-95-99.7 Approximation for all Normal Distributions Regardless of the mean and standard deviation of the normal distribution: • 68% of the observations fall within one standard deviation of the mean • 95% of the observations fall within approximately* two standard deviations of the mean • 99.7% of the observations fall within three standard deviations of the mean * 95% of the observations fall within 1.96 SD of the mean PubH 6414 Section 5 Part 1 19 Distributions of Blood Pressure .4 µ= 125 mmHG σ = 14 mmHG .3 68% .2 95% 99.7% .1 0 83 97 111 125 139 153 167 The 68-95-99.7 rule applied to the distribution of systolic blood pressure in men. PubH 6414 Section 5 Part 1 20 Calculating Probabilities from a Normal Distribution Curve • The total area under the curve = 1.0 which is the total probability • Areas for intervals under the curve can be interpreted as probability • What is the probability that a man has blood pressure between 111 and 139 mmHg? PubH 6414 Section 5 Part 1 21 Areas under the Curve • What if you wanted to find the probability of a man having SBP < 105 mmHg? The 68-95-99.7 rule can’t be used to find this area under the curve! 83 97 111 125 139 153 167 SBP in mmHg PubH 6414 Section 5 Part 1 22 Calculating the Areas under the Curve • Use integral Calculus with formula on slide 15. • Table A-2 in the text is a table of areas under the standard normal curve – the normal distribution with mean = 0 and standard deviation = 1 • The pnorm function in R can be used to find the area under a normal distribution density curve. PubH 6414 Section 5 Part 1 23 Using pnorm in R To find P(X< x*): > pnorm(x*, µ, σ) To find P(X>x*): > 1- pnorm(x*, µ, σ) PubH 6414 Section 5 Part 1 24 Using pnorm in R • What is the probability that a randomly selected man has SBP < 105 mmHg? Interpretation? > pnorm(105,125,14) [1] 0.07656373 PubH 6414 Section 5 Part 1 25 Areas under the Curve • What if you wanted to find the probability of a man having SBP > 150? 83 97 111 125 139 153 167 SBP in mmHg PubH 6414 Section 5 Part 1 26 Using pnorm in R • What is the probability that a man has SBP > 150 mmHg? Interpretation? > 1-pnorm(150,125,14) [1] 0.03707277 PubH 6414 Section 5 Part 1 > 27 Areas under the Curve • What if you wanted to find the probability of a man having SBP between 115 and 135? 83 97 111 125 139 153 167 SBP in mmHg PubH 6414 Section 5 Part 1 28 Using pnorm in R • What is the probability that a man has SBP between 115 and 135 mmHg? Interpretation??? > pnorm(135,125,14)-pnorm(115,125,14) [1] 0.5249495 PubH 6414 Section 5 Part 1 29 Standard Normal Distribution • The Standard Normal Distribution is the normal distribution with − x2 1 – Mean = 0 f ( x) = e 2 2π – Standard deviation = 1 Z ~ N(0,1) PubH 6414 Section 5 Part 1 30 Formula for Z-score Z= X −µ σ The value of Z tells us how many standard deviations above or below the mean the observed values falls. PubH 6414 Section 5 Part 1 31 Standard Normal Scores • Z = 1: The observation lies one SD above the mean • Z = 2: The observation is two SD above the mean • Z = -1: The observation lies 1 SD below the mean • Z = -2: The observation lies 2 SD below the mean What does Z = 0 represent? PubH 6414 Section 5 Part 1 32 Standard Normal Scores Recall: Male Systolic Blood Pressure; X ~ N(125, 14) For SBP = 105 mmHg – what is the Z-score? Area > 1.79 Interpretation? PubH 6414 Section 5 Part 1 33 Standard Normal Scores What is the probability of a Z score less than -1.43? Interpretation? > pnorm(-1.43,0,1) [1] 0.07635851 Notice this is the same as: What is the probability that selected Area a > randomly 1.79 man has SBP < 105 mmHg? > pnorm(105,125,14) [1] 0.07656373 PubH 6414 Section 5 Part 1 34 Standard Normal Scores Recall: Male Systolic Blood Pressure; X ~ N(125, 14) For SBP = 150 mmHg – what is the Z-score? Area > 1.79 Interpretation? PubH 6414 Section 5 Part 1 35 Standard Normal Scores What is the probability of a Z score greater than 1.79? Interpretation? > 1-pnorm(1.79,0,1) [1] 0.03672696 Notice this is the same as: What is the probability that a man has SBP > 150 mmHg? > 1-pnorm(150,125,14) [1] 0.03707277 PubH 6414 Section 5 Part 1 36 Z scores – why do it? – z-scores are used to find probabilities from standard normal tables such as Table A-2 in the text appendix – The z-score represents the number of standard deviations an observation is from the mean which can be useful in understanding and visualizing the data. – Probabilities from the standard normal distribution are used in confidence intervals and hypothesis tests PubH 6414 Section 5 Part 1 37 The Inverse problem • What if you instead of finding the area for a z-score you want to know the z-score for a specified area? • For example, the top 10% of the midterms scores are how many standard deviations above the mean? PubH 6414 Section 5 Part 1 38 Inverse problem: Midterm Score • Find a z* value such that the probability of obtaining a larger z score = 0.10. What is this z score? PubH 6414 Section 5 Part 1 39 Using qnorm in R The qnorm finds lower percentiles. qnorm(lower percentile, µ, σ) PubH 6414 Section 5 Part 1 40 Inverse problem: Midterm Score • Find a z* value such that the probability of obtaining a larger z score = 0.10. > qnorm(.90,0,1) [1] 1.281552 What is this z score? PubH 6414 Section 5 Part 1 41 Inverse problem: Midterm Score The top 10% of the midterms scores are how many standard deviations above the mean? PubH 6414 Section 5 Part 1 42 Inverse problem: Serum Sodium The lowest 25% of the serum sodium levels in healthy adults are how many standard deviations below the mean? PubH 6414 Section 5 Part 1 43 Inverse problem: Serum Sodium • Find a z-value such that the probability of obtaining a smaller z score = 0.25 >qnorm(0.25,0,1) [1] -0.6744898 What is this z-score? PubH 6414 Section 5 Part 1 44 Inverse Problem: Serum Sodium What Serum sodium (mEq/L) level corresponds to 0.67 standard deviations below the mean? Serum sodium ~ N(141,3) PubH 6414 Section 5 Part 1 45 Inverse Problem: Serum Sodium What Serum sodium (mEq/L) level corresponds to 0.67 standard deviations below the mean? Serum sodium ~ N(141,3) > qnorm(0.25,141,3) [1] 138.9765 PubH 6414 Section 5 Part 1 46