THE NORMAL DISTRIBUTION
Chapter 6

Prob. Model for a Continuous RV
Continuous RV’s typically involve the measurement of attributes such as length, weight, time and temperature (intervals). (Discrete RV: counting.)
Graphical form: smooth curve.
Ex. Symmetrical distributions: Normal, t-dist.
This curve is denoted by f(x) and is called a probability density function (pdf).

Empirical Rule (pg 117)
A variable having an approximately bell-shaped distribution should have:
1. Approximately 68% of the observations within one SD of the mean.
2. Approximately 95% of the observations within two SD of the mean.
3. Almost all of the observations (99.7%) within three SD of the mean.

The areas under a pdf correspond to probabilities for X.
The area A under f(x) between a and b is the probability that X assumes a value between a and b. That is,
$$A = P(a \le X \le b)$$
Note: When X is a continuous RV, P(X = a) = 0. Hence P(a ≤ X ≤ b) = P(a < X < b).

A pdf for a continuous RV must satisfy:
1. f(x) is non-negative (f(x) ≥ 0).
2. The total area under the curve representing f(x) equals 1. That is,
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

The Normal Dist’n (pg 115)
- One of the most useful and frequently encountered continuous RV’s.
- Symmetrical about its mean μ.
- Its spread is determined by the value of its standard deviation σ.
Note: The normal dist’n with a mean of μ and a SD of σ is denoted by N(μ, σ).

A RV X with mean μ and SD σ is normally dist’d if its pdf is given by
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty$$
where π = 3.14159... and e = 2.71828...
Notation: X ~ N(μ, σ)
Read: "X (a RV) is normally dist’ed with mean μ and SD σ".
Ex. X ~ N(100, 15): Mean = 100, Variance = 15² (or 225), SD = 15.

The Standard Normal Dist’n
The standard normal dist’n is denoted as Z ~ N(0, 1).
Mean = μ = 0
SD = σ = 1.0
To calculate probabilities we need calculus or tables. Always use the tables.

Probability calculations with normal dist’ns
Use the fact that if X ~ N(μ, σ) then
$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1),$$
the standardized normal RV. (Read ‘Z is normal 0,1’.)
• Only Z (the standard normal dist’n) is tabulated: Table B2 in your book.
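
The standardization above can also be checked numerically. The following is a minimal sketch, not part of the course materials: it assumes only Python's standard library and uses the identity Φ(z) = ½(1 + erf(z/√2)) for the N(0,1) cumulative area. The function names `normal_cdf` and `within_k_sd` are illustrative.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area to the left of x under N(mu, sigma), found by standardizing first."""
    z = (x - mu) / sigma                      # Z = (X - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def within_k_sd(k):
    """P(-k < Z < k): fraction of a normal population within k SD of the mean."""
    return normal_cdf(k) - normal_cdf(-k)

# Empirical rule: areas within 1, 2 and 3 SD of the mean
for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")   # 0.6827, 0.9545, 0.9973

# X ~ N(100, 15): P(X <= 130) equals P(Z <= 2) after standardizing
print(f"P(X <= 130) = {normal_cdf(130, mu=100, sigma=15):.4f}")
```

The three printed areas match the 68%, 95% and "almost all" figures of the empirical rule. The last line shows that a probability for X ~ N(100, 15) reduces to a probability for Z.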
(A different, easier-to-use table will be provided.)
In words: Z equals the distance from X to μ, measured in standard deviations.
Note: The Z transformation is sometimes referred to as the z-score.

Areas Under the Standard Normal N(0,1) Curve
Basic Properties:
1. The total area under the N(0,1) curve is 1.
2. The N(0,1) curve is symmetric around 0.
3. Most of the area under the N(0,1) curve lies between -3 and 3.

Using the standard normal table:
Ex. Determine the area under the N(0,1) curve that
a. lies to the left of 2.11
P(Z < 2.11) = P(Z ≤ 2.11) = 0.9826
b. lies to the right of -1.25
P(Z > -1.25) = 1 - P(Z ≤ -1.25) = 1 - 0.1056 = 0.8944
c. lies between -0.5 and 2.47 inclusive.
P(-0.5 ≤ Z ≤ 2.47) = P(Z ≤ 2.47) - P(Z < -0.5) = 0.9932 - 0.3085 = 0.6847

Finding the z-score for a specified area:
Ex. Determine the z-score having an area of
a. 0.025 to its right
Same as: area to its left is 0.975.
From table: 0.975 = P(Z ≤ 1.96)
Hence, the z-score is 1.96.
b. 0.05 to its left
There is no 0.05 in the table, but
0.0495 = P(Z ≤ -1.65)
0.0505 = P(Z ≤ -1.64)
Hence (using interpolation) the z-score is -1.645.

Notation: z_α denotes the z-score having area α (alpha) to its right under the N(0,1) curve.
From above: z_0.025 = 1.96
What is z_0.05? z_0.05 = 1.645 (because of symmetry and part (b) above).

Working with Normally Distributed Variables
To determine a percentage or probability for a normally dist’ed variable:
Steps:
1. Sketch the normal curve.
2. Shade the region of interest and mark the delimiting x-values.
3. Compute the z-scores for the delimiting x-values found in (2).
4. Use the table provided to obtain the area under the N(0,1) curve.

Ex. Each year, thousands of college seniors take the Graduate Record Examination (GRE). The scores are transformed so that they have a mean of 500 and a SD of 100. Furthermore, the scores are known to be normally dist’ed. Determine the percentage of students that score:
a. between 350 and 600 inclusive.
μ = 500 and σ = 100
P(350 ≤ X ≤ 600) = ? (Shaded area.)
To use the Normal tables, first transform the normal RV X into the standard normal RV Z:
$$\frac{350 - 500}{100} \le \frac{X - 500}{100} \le \frac{600 - 500}{100}$$
or -1.5 ≤ Z ≤ 1
Hence P(350 ≤ X ≤ 600) = P(-1.5 ≤ Z ≤ 1) = P(Z ≤ 1) - P(Z ≤ -1.5) = 0.8413 - 0.0668 = 0.7745
This represents the area under the N(0,1) curve over the interval from -1.5 to 1.
b. 375 or greater.
P(X ≥ 375) = P(Z ≥ (375 - 500)/100) = P(Z ≥ -1.25) = 1 - P(Z < -1.25) = 1 - 0.1056 = 0.8944
c. below 750.
P(X < 750) = P(Z < (750 - 500)/100) = P(Z < 2.5) = 0.9938
d. between 300 and 450.
P(300 < X < 450) = P(-2 < Z < -0.5) = P(Z < -0.5) - P(Z < -2) = 0.3085 - 0.0228 = 0.2857
e. between 587 and 650.
P(587 < X < 650) = P(0.87 < Z < 1.5) = P(Z < 1.5) - P(Z < 0.87) = 0.9332 - 0.8078 = 0.1254
f. exactly equal to 680.
P(X = 680) = P(Z = 1.8) = 0.0
g. What score is exceeded by exactly 5%? (95th percentile.)
P(Z < a) = 0.95 → a = 1.645 (Excel: norminv(0.95, 0, 1))
1.645 = (x - 500)/100 → x = 1.645 × 100 + 500 = 164.5 + 500 = 664.5

Determining if the Distribution is Normal (pg 124)
To assess whether a variable is normally dist’ed:
• Large sample → look at the histogram. Bell shaped?
• Small/large sample → look at the normal probability plot. Fairly linear?

Normal probability plot: Scatter plot with the observed values on the horizontal axis and the normal scores on the vertical axis.
Normal scores: the observations we would expect to get for a variable having a N(0,1) dist’n.
→ If the plot is roughly linear, then accept as reasonable that the variable is approx. normal.

Ex. The serum bilirubin levels of 9 patients admitted to a hospital are as follows:
20.5 26.6 14.8 23.4 21.3 22.9 12.7 19.2 15.2
a. Construct a normal probability plot for these data.
First, arrange the data in ascending order and obtain the normal scores.
Normal scores? Sample size = 9. Idealized sample of size 9 from a N(0,1)?

Cumulative prob.:    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
Normal scores:     -1.28  -0.84  -0.52  -0.25   0.00   0.25   0.52   0.84   1.28
Sorted obser.:
                    12.7   14.8   15.2   19.2   20.5   21.3   22.9   23.4   26.6
b. Assess the normality of serum bilirubin level.
The normal probability plot in (a) looks fairly linear, suggesting that serum bilirubin is approximately normally dist’ed.
c. Outliers in the data?
No, none of the data values seem to fall outside the overall pattern of the plot.

Sampling Distribution
Recall:
Descriptive measures of a population are called parameters. Example: μ, σ², p.
Descriptive measures calculated from a sample are called statistics. Example: X̄, S², p̂.
The distribution of a statistic is called the sampling distribution of the statistic.

The Sampling Distribution of the Sample Mean (pg 125)
Question: What is the dist’n of X̄?
Idea: From repeated sampling
Sample 1 of size n → x̄_1
Sample 2 of size n → x̄_2
...
Sample m of size n → x̄_m
Questions:
What is the dist’n of X̄?
What is the mean of X̄?
What is the SD of X̄?

Central Limit Theorem (CLT)
If relatively large samples of size n are drawn from any population, the sampling dist’n of X̄ is approximately normal.
- If the popul’n dist’n is normal, the sampling dist’n of X̄ will be exactly normally dist’d.
- If the population dist’n is non-normal, the sampling dist’n of X̄ will be, for large samples (n ≥ 30), approximately normally dist’d (by the CLT).
Mean and SD of X̄?

The Mean and SD of X̄
The mean of X̄, for samples of size n, is equal to the mean of the original popul’n. That is:
$$\mu_{\bar{X}} = \mu$$
The SD of X̄, for samples of size n, equals the SD of the parent popul’n divided by the square root of the sample size. That is:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$
The SD of a statistic is called the standard error of the statistic; $\sigma_{\bar{X}}$ is called the standard error of the mean.

Example 1
Suppose it is known that the response time of healthy subjects to a particular stimulus is normally distributed with a mean of 15 seconds and a standard deviation of 4 seconds.
a. What are the mean and SD of the parent popul’n?
X: Response time in seconds of healthy subjects to the particular stimulus
μ = 15 seconds
σ = 4 seconds
→ X ~ N(15, 4)
b.
If 5 healthy subjects are randomly selected and the average response time to the stimulus is calculated, what are the mean and SD of the sample mean?
Mean: $\mu_{\bar{X}} = \mu = 15$ seconds
SD: $\sigma_{\bar{X}} = \sigma/\sqrt{n} = 4/\sqrt{5} = 1.79$ seconds
X̄ ~ N(15, 1.79)
c. Plot the two distributions found in (a) original population and (b) average of 5 observations.
d. Find the probability that a randomly selected subject will have a response time of 17 seconds or more.
P[X > 17] = ?
P[X > 17] = P[Z > (17 - 15)/4] = P[Z > 0.50] = 1 - P[Z ≤ 0.50] = 1 - 0.6915 = 0.3085
e. Find the probability that a random sample of 5 subjects will have a mean response time of 17 seconds or more.
P[X̄ > 17] = ?
In this case, the transformation needed to standardize the normal RV is:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
P[X̄ > 17] = P[Z > (17 - 15)/(4/√5)] = P[Z > 1.12] = 1.0 - P[Z ≤ 1.12] = 1.0 - 0.8686 = 0.1314

Example 2
The mean and SD of the total cholesterol value for a certain population are 200 and 20 mg/100 ml, respectively. If 45 individuals are selected at random from this population and their average total cholesterol value is calculated,
a. Is it reasonable to assume a normal dist’n for the sample mean? Why or why not?
Yes, since the sample size n = 45 is large (n ≥ 30), so the CLT applies. Hence X̄ is approximately N(200, 20/√45).
b. Find the probability that the sample mean of the 45 total cholesterol levels will be between 190 and 205 mg/100 ml.
P[190 < X̄ < 205] = ?
P[190 < X̄ < 205] = P[(190 - 200)/(20/√45) < Z < (205 - 200)/(20/√45)]
= P[-3.35 < Z < 1.68]
= P[Z < 1.68] - P[Z < -3.35]
= 0.9535 - 0.0004 = 0.9531
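
Examples 1 and 2 can be reproduced without tables. The sketch below is an illustration only: it assumes Python 3.8+ and its standard-library `statistics.NormalDist`, and the helper name `sample_mean_dist` is hypothetical. It builds the sampling dist’n of the mean as N(μ, σ/√n) and reads the probabilities off its cumulative distribution function.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_dist(mu, sigma, n):
    """Sampling dist'n of the mean for samples of size n: N(mu, sigma / sqrt(n))."""
    return NormalDist(mu, sigma / sqrt(n))

# Example 1: response time, population N(15, 4), samples of size 5
pop = NormalDist(15, 4)
xbar = sample_mean_dist(15, 4, 5)
p_single = 1 - pop.cdf(17)     # part (d): one subject at 17 s or more
p_mean = 1 - xbar.cdf(17)      # part (e): mean of 5 subjects at 17 s or more

# Example 2: cholesterol, mu = 200, sigma = 20, n = 45 (normal by the CLT)
chol = sample_mean_dist(200, 20, 45)
p_between = chol.cdf(205) - chol.cdf(190)   # part (b)

print(f"standard error (Ex. 1): {xbar.stdev:.2f}")
print(f"P[X > 17]            = {p_single:.4f}")
print(f"P[Xbar > 17]         = {p_mean:.4f}")
print(f"P[190 < Xbar < 205]  = {p_between:.4f}")
```

The results should agree with the table-based answers to about three decimal places. Small differences in the last digit are expected because the tables round z to two decimals before the lookup.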