Download PROBABILITY AND PROBABILITY DISTRIBUTIONS

Theoretical Probability Distributions PhD Özgür Tosun THEORETICAL PROBABILITY DISTRIBUTIONS probability is a measure of chance probability distributions help us to study the probabilities associated with outcomes of the variable under study probability theory is the foundation for statistical inference. A probability distribution is a device for indicating the values that a random variable may have THEORETICAL PROBABILITY DISTRIBUTIONS Several theoretical probability distributions are important in biostatistics: I) Binomial II) Poisson III)Normal Discrete probability distributions: Variable takes only integer values. Continuous probability distribution: Variable has values measured on a continuous scale. THE BINOMIAL DISTRIBUTION: •Variable outcomes has only binary/dichotomous (male – female; diseased – not diseased; positive – negative) denoted A and B. •The probability of A is denoted by p. P(A) = p and P(B)= 1-p THE BINOMIAL DISTRIBUTION: •When an experiment is repeated n times, p remains constant (outcome is independent from one trial to another) Such a variable is said to BINOMIAL DISTRIBUTION. follow Daniel Bernoulli a The question is: What is the probability that outcome A occurs x times? Or What proportion of n outcomes will be A? The probability of x outcomes in a group of size n, if each outcome has probability p and is independent from all outcomes is given by Binomial Probability Function: n  x P(A  x)    p (1  p)n-x  x Example For families with 5 children each, what is the probability that i) There will be one male child?  5 P(A  1)    0.501 (1  0.50) 51  0.16 1  Among families with 5 children each, 0.16 have one male child. ii) There will be at least one male children? P(A  1)  P(A  1)  P(A  2)  P(A  3)  P(A  4)  P(A  5) 5      0.5A (1  0.5) 5A A1  A   1 - P(A  0) 5 5   1 -   0.5 0 (1 - 0.5)5  0  1 - 0.03  0.97 Using the probabilities associated with possible outcomes, we can draw a probability distribution for the event under study: ,4 ,3 PROBABILITY ,2 ,1 0,0 ,00 1,00 2,00 3,00 NO. OF MALE CHILDREN 4,00 5,00 Example: Among men with localized prostate tumor and a PSA<10, the 5-year survival is known to be 0.8. We can use Binimial Distribution to calculate the probability that any particular number (A), out of n, will survive 5 years. For example for a new series of 6 such men: Non will survive 5 years : P(A=0)=0,000064 Only 1 will survive 5 years : P(A=1)=0,0015 2 will survive 5 years : P(A=2)=0,015 3 will survive 5 years : P(A=3)=0,082 4 will sıurvive 5 years : P(A=4)=0,246 5 will survive 5 years : P(A=5)=0,393 All will survive 5 years : P(A=6)=0,262 ,5 ,4 ,3 PROBABILITY ,2 ,1 0,0 0 1 2 3 4 NO. SURVIVING 5 YEARS Binomial Distribution for n=6 and p=0.8 5 6 THE POISSON DISTRIBUTION: Like the Binomial, Poisson distribution is a discrete distribution applicable when the outcome is the “number of times an event occurs”. Instead of the probability of an outcome, if average number of occurrence of the event is given, associated probabilities can be calculated by using the Poisson Distribution Function which is defined as:   lambda P( X  A)  Simeon D. Poisson (1781-1840) A  e A! Example. If the average number of hospitalizations for a group of patients is calculated as 3.22, the probability that a patient in the group has zero hospitalizations is 3.220 e 3.22 P(A  0)   0.04 0! The probability that a patient has exactly one hospitalization is 3.22 e P(A  1)  1! 1 3.22  0.129 The probability that a patient will be hospitalized more than 3 times, since the upper limit is unknown, is calculated as P(A>3)=1-P(A3) EXAMPLE Normal Distribution Abraham de Moivre (1667-1754) Karl F. Gauss (1777-1855) NORMAL DISTRIBUTION Normal (Gaussian) distribution is the most famous probability distribution of continuous variables. The two parameters of the normal distribution are the mean (μ) and the standard deviation (σ). The graph has a familiar bell-shaped curve. The function of normal distribution curve is as follows: 1 f ( x)  e  2 1  xi       2   2    xi   The normal distribution is completely defined by the mean and standard deviation of a set of quantitative data:  The mean determines the location of the curve on the x axis of a graph  The standard deviation determines the height of the curve on the y axis There are an infinite number of normal distributions- one for every possible combination of a mean and standard deviation Examples of Normal Distributions Pr(X) on the y-axis refers to either frequency or probability. Examples of Normal Distributions 25 Frequency Frequency 20 15 10 5 0 // 55 Mean 60 65 70 75 80 85 90 Heart Rate (BPM) Many (but not all) continuous variables are approximately normally distributed. Generally, as sample size increases, the shape of a frequency distribution becomes more normally distributed. Frequency of occurrence Mode, Median, Mean When data are normally distributed, the mode, median, and mean are identical and are located at the center of the distribution. Quantitative variables may also have a skewed distribution: When distributions are skewed, they have more extreme values in one direction than the other, resulting in a long tail on one side of the distribution. The direction of the tail determines whether a distribution is positively or negatively skewed. A positively skewed distribution has a long tail on the right, or positive side of the curve. A negatively skewed distribution has the tail on the left, or negative side of the curve. For a normally distributed variable: ~68.3% of the observations lie between the mean and  1 standard deviation ~95.4% lie between the mean and  2 standard deviations ~99.7% lie between the mean and  3 standard deviations Mode, Median, Mean 68.3 % 95.4 % 99.7 % 3 2 1  1 2 3 68.26% 95.44% 99.74%   3   2          2   3 P(     x     )  0.6826 P(   2  x    2 )  0.9544 P(   3  x    3 )  0.9974 Sex HR Sex HR Sex HR Sex HR Sex HR Sex HR Sex HR F 55 M 66 F 70 M 73 F 77 M 79 F 82 M 57 F M 59 F 67 F 70 M 73 F 77 M 79 M 82 67 M 70 M 73 F 77 M 79 F 83 F 61 F 68 M 70 M 73 M 77 F 80 M 83 M 61 F 68 F 71 F 74 M 77 F 80 M 83 M 62 F 68 F 71 F 74 F 78 M 80 F 84 M 62 M 68 M 71 F 74 F 78 F 81 F 84 F 63 F 69 M 71 M 74 F 78 F 81 M 85 F 64 M 69 F 72 F 75 F 78 F 81 F 86 M 64 M 69 M 72 F 75 M 78 M 81 F 86 M 64 M 69 F 73 M 75 M 78 F 82 M 89 M 66 F 70 M 73 M 76 M 79 F 82 M 89 25 Mean HR = 74.0 bpm SD = 7.5 bpm Mean  1SD = 74.0  7.5 = 66.5-81.5 bpm Mean  2SD = 74.0  15.0 = 59.0-89.0 bpm 20 Frequency For the heart rate data for 84 adults: 15 10 5 0 // 55 60 65 70 75 80 Heart Rate (BPM) 85 90 Mean  3SD = 74.0  22.5 = 51.5-96.5 bpm HR Data: • 57/84 (67.9%) subjects are between mean ± 1SD • 82/84 (97.6%) are between mean ± 2SD • 84/84 (100%) are between mean ± 3SD 100 +3 SD 95 90 +2 SD Heart rate (bpm) 85 + 1SD 80 75 Mean 70 -1 SD 65 -2 SD 60 55 -3 SD 50 45 0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72 76 80 84 Subject number The “normal” range in medical measurements is the central 95% of the values for a reference population, and is usually determined from large samples representative of the population. The central 95% is approximately the mean  2 sd* Some examples of established reference ranges are: Serum “Normal” range fasting glucose sodium triglycerides 70-110 mg/dL 135-146 mEq/L 35-160 mg/dL Note: The value is actually 1.96 sd but for convenience this is usually rounded to 2 sd. The Standard Normal Distribution A normal distribution with a mean of 0, and standard deviation of 1 The distribution is also called the z distribution Any normal distribution can be converted to the standard normal distribution using the z transformation. Each value in a distribution is converted to the number of standard deviations the value is from the mean. The transformed value is called a z score. Formula for the z transformation x  z  Once the data are transformed to z-scores, the standard normal distribution can be used to determine areas under the curve for any normal distribution. Example of a z-transformation If the population mean heart rate is 74 bpm, and the standard deviation is 7.5, the z score for an individual with HR = 80 bpm is: x  80   6 z    0.8    The individual’s HR of 80 bpm is 0.8 standard deviations above the mean. The z-value can be looked up in a table for the standard normal distribution to determine the lower and upper areas defined by a z-score of 0.8 (the areas are the lower 78.8% and upper 21.2%) Using Table .0082 is the area under N(0,1) left of z = -2.40 .0080 is the area under N(0,1) left of z = -2.41 (…) 0.0069 is the area under N(0,1) left of z = -2.46 The standard Normal distribution Because all Normal distributions share the same properties, we can standardize our data to transform any Normal curve N (, ) into the standard Normal curve N (0,1). N(64.5, 2.5) N(0,1) => x z Standardized height (no units) For each x we calculate a new value, z (called a z-score). The total area under the normal distribution curve is 1: 90% of the area is between ± 1.645 sd 95% of the area is between ± 1.960 sd 99% of the area is between ± 2.575 sd Area = 90% Area = 95% Area = 99% -2.575 -1.645 -1.960 0 +1.645 +2.575 +1.960 The Normal Distribution & Confidence Intervals 90% of the area is between ± 1.645 sd 95% of the area is between ± 1.960 sd 99% of the area is between ± 2.575 sd These are the most commonly used areas for defining Confidence Intervals which are used in inferential statistics to estimate population values from sample data If a certain interval is a 95% confidence interval, then we can say that if we repeated the procedure of drawing random samples and computing confidence intervals over and over again, 95% of those confidence intervals include the true value from the population. Birth weight (xi) 3200 3450 2980 4100 2900 3500 : 3400 =3300 ; =600 Zi= xi  3300 600 -0.167 0.25 -0.533 1.333 -0.667 0.333 : 0.167 =0 ; =1.0 If it is known that the birth weights of infants are normally distributed with a mean of 3300gr and a standard deviation of 600gr, what is the probability that a randomly selected infant will weigh less than 3000gr? zi = xi - m s 3000 - 3300 = = -0.5 600 P(Xi £ 3000) = P(Zi £ -0.5) = 0.31 More than 3000gr? Ans: 0.19+0.50=0.69 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359 0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753 0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141 0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517 0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879 0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224 0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549 0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852 0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133 0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389 1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621 1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830 1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015 1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177 1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319 1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441 1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545 1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633 1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706 1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767 2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817 2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857 2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890 2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916 2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936 2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952 2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964 2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974 2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981 2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986 3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990 Area between 0 and z If the mean and the standard deviation of the BMI of adult women are 24 and 6 units respectively, what proportion of women will have BMI>30 (what proportion of women will be clssified as obese)? zi = xi - m s 30 - 24 = = 1.0 6 P(Xi ³ 30) = P(Zi ³ 1.0) = 0.16 16% of the adult women will be classified as obese. 18  22 P( x  18)  P( z  )  P( z  1) 4  0,5  0,3413  0,1587 Example • In a normal curve with mean = 30, s = 5, what is the proportion of scores below 27? 27  30 Z 27   0.6 5 Smaller portion of a Z of 0.6 is 0.2743 Mean to Z equals 0.2257 and -4 -3 -2 -1 0 27 1 2 3 4 0.5 - 0.2257 = 0.2743 Portion  27% Example • In a normal curve with mean = 30, s = 5, what is the proportion of scores fall between 26 and 26  30 35? Z 26  .3413 .2881 5  0.8 35  30 Z 35  1 5 Mean to a Z of 0.8 is 0.2881 Mean to a Z of 1 is 0.3413 0.2881 + 0.3413 = 0.6294 -4 -3 -2 -1 26 0 1 2 3 4 Portion = 62.94% or  63% Example • The Stanford-Binet IQ test has a mean of 100 and a SD of 15, how many people (out of 1000 ) have IQs between 120 and 140? Z140 140  100   2.66 15 Mean to a Z of 2.66 is 0.4961 .4082 Z120 120  100   1.33 15 Mean to a Z of 1.33 is 0.4082 .4961 0.4961 - 0.4082 = 0.0879 Portion = 8.79% or  9% -4 -3 -2 -1 0 1 120 2 3 140 4 0.0879 * 1000 = 87.9 or  88 people When the numbers are on the same side of the mean: subtract - =

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download PROBABILITY AND PROBABILITY DISTRIBUTIONS