THE NORMAL DISTRIBUTION

WHAT IS A NORMALLY DISTRIBUTED DATA SET?

For example, what is the normal height for 18-year-old males in Manila? Say, 5½ ft! How can we say that 5½ ft is the normal height for them?

MODE = 5½ ft: "I can find a great many 18-year-old males around who are about 5½ ft tall!"
MEDIAN = 5½ ft: "The shortest guy I have found is about 5 ft, and the tallest is 6 ft. 5½ ft is right in the middle. Not too small, not too tall!"
MEAN = 5½ ft: "Well, people say 5½ ft is just the average height for them."

Now, can you describe the number of 18-year-old guys who are shorter or taller than the normal 5½ ft height, if you count them? "The numbers decrease smoothly as I count the guys who are either shorter or taller than the normal 5½ ft height!"

Therefore, in Statistics, a NORMALLY DISTRIBUTED data set means:
MODE = MEDIAN = MEAN, and
the frequencies decrease smoothly for both lower and higher values than the normal value.

HISTOGRAM PATTERN OF NORMALLY DISTRIBUTED DATA

[Figures: HISTOGRAMS IN ACTUAL CASES (with 9 interval classes and with 27 interval classes) and THE NORMAL DISTRIBUTION curve. Horizontal axis: interval classes. Vertical axis: frequency. In each figure, Median = Mode = Mean sits at the center.]

BASIC PROPERTIES
The normal distribution is BELL-SHAPED, due to its smoothly decreasing frequency pattern.
An "exactly" normal data set IS IMPOSSIBLE! Real data can only be approximately normal.
The normal distribution is used as a SUBSTITUTE for any approximately normal data.

ONE NORMAL TO REPRESENT ALL THE OTHERS!

TAKE NOTE! Imagine that you have a set of ORIGINAL DATA, with its own MEAN and ST.DEV.
Then, using the z-score formula

$Z = \frac{X - \bar{X}}{s}$, where $\bar{X}$ is the mean and $s$ is the standard deviation,

you convert all your original data values (the complete list) into z-scores. Of course, you have also computed the mean and the standard deviation!

You now have a CONVERTED DATA SET (all z-score values). Computing the mean and standard deviation of the converted data, you will get MEAN = 0 and ST.DEV. = 1.

IN SHORT: If you convert your entire data set into z-scores, you will have a new data set (all z-scores) having mean = 0 and standard deviation = 1! And this works for every kind of data set (as long as its standard deviation is not zero).

ONE STANDARD NORMAL?
REMEMBER THIS: ALL NORMALLY DISTRIBUTED DATA SETS, WHEN CONVERTED INTO Z-SCORES, TURN INTO THE STANDARD NORMAL, with MEAN = 0 and ST.DEV. = 1.

A CLOSER VIEW OF THE STANDARD NORMAL BELL CURVE

The standard normal is derived from a given normal data set by converting its data values into z-scores. Hence, the horizontal axis of the standard normal is the axis of the z-scores.

Recall! For the standard normal, the mean = 0. So the point Z = 0 must be marked as the point of symmetry of the bell curve and its maximal point.

In the standard normal, the vertical axis is not important! (For the usual frequency histograms, it is the axis for the class frequencies.)

What serves as the relative frequency in the standard normal is the % AREA enclosed by the DESIGNATED INTERVAL on the z-score axis, UNDER THE BELL CURVE.

TAKE NOTE! This AREA is the same as the RELATIVE FREQUENCY of the interval!

We have a special TABLE for the areas under the standard normal bell curve. (Later!) From this Table, we will see that the interval Z = -1.5 to Z = -0.7 under the bell curve encloses an area of 0.1752, or 17.52%.

Lastly, the TOTAL AREA enclosed by the standard normal bell curve is 1.00 (or 100%).
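The two claims above (z-scores always have mean 0 and standard deviation 1, and the interval Z = -1.5 to Z = -0.7 encloses an area of about 0.1752) can be checked with a short sketch. The height values below are made up for illustration, the helper name to_z_scores is hypothetical, and the population standard deviation is an assumption, since the slides do not say which version of the standard deviation is meant.

```python
from statistics import NormalDist, mean, pstdev


def to_z_scores(data):
    """Convert every value X into Z = (X - mean) / st.dev."""
    m = mean(data)
    s = pstdev(data)  # assumption: population st.dev. (the slides do not specify)
    return [(x - m) / s for x in data]


# Hypothetical heights in feet, invented only for this illustration.
heights = [5.0, 5.2, 5.4, 5.5, 5.5, 5.5, 5.6, 5.8, 6.0]
z = to_z_scores(heights)

# The converted data set should have mean 0 and st.dev. 1
# (up to tiny floating-point rounding errors).
z_mean = mean(z)
z_sd = pstdev(z)
print(z_mean, z_sd)

# Area under the standard normal bell curve between Z = -1.5 and Z = -0.7,
# computed as (left-tail area at -0.7) minus (left-tail area at -1.5).
std_normal = NormalDist(mu=0, sigma=1)
area = std_normal.cdf(-0.7) - std_normal.cdf(-1.5)
print(round(area, 4))
```

NormalDist.cdf plays the role of the printed Table here: it returns the left-tail area for a given z-value.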
[Figure: standard normal bell curve with the interval from Z = -1.5 to Z = -0.7 shaded. Horizontal axis: z-score axis (converted data values), with 0 at the center.]

HOW TO READ THE STANDARD NORMAL TABLE

We use a standard normal distribution TABLE OF CUMULATIVE AREAS. Does it give us all possible areas under the standard normal bell curve?

Usually, it only gives the left-tail area under the standard normal from a specified z-value!

[Figure: the blue-shaded section of the standard normal, to the left of z, is what the 'left-tail from a specified z-value' refers to.]

Other areas can be obtained from the left-tail area by subtraction. (Coming shortly.)

THE PROBABILITY NOTATION P(EVENT) FOR THE AREAS

NOTE! The area under the standard normal is the relative frequency (%) of the interval (of z-values) enclosing it. But the relative frequency is the probability (after appropriate restatement). So, the notation P(Event) is also used to denote area under the standard normal.

(For now, we will use a rectangle in place of the standard normal bell curve for easy visualization.)

The shaded area under the standard normal to the left of 1.12 will be denoted as P(Z<1.12).
Read the symbol P(??) as: "the area under the interval (??)".
The interval Z<1.12 consists of all z-values less than 1.12.
The value of P(Z<1.12) is what the Table will give us!

For the other possible areas (total AREA = 1):
P(Z>-0.75) = 1 – P(Z<-0.75)
P(-1.23<Z<0.95) = P(Z<0.95) – P(Z<-1.23)

HOW TO FIND P(Z<1.27) IN THE STANDARD NORMAL TABLE

To find the value of P(Z<1.27) in the Table of Cumulative Areas:
First, find '1.2' in the very first column (the 'z' column).
Then, along the row where you found '1.2', find the value under the column '0.07'.
Therefore, P(Z<1.27) = 0.8980

EXAMPLE 1.
Draw the specified section of the standard normal and find its area.

NOTE! No need to place the specified z-values precisely! Just place them correctly relative to the middle point z = 0, and to each other!

A. Z<-1.04
Area = P(Z<-1.04) = 0.1492

B. Z>0.82
Area = P(Z>0.82) = 1 – P(Z<0.82) = 1 – 0.7939 = 0.2061

C. 1.25<Z<2.08
Area = P(1.25<Z<2.08) = P(Z<2.08) – P(Z<1.25) = 0.9812 – 0.8944 = 0.0868

EXAMPLE 2. Find the Z-value (?) such that:

A. P(Z<?) = 0.29
The left-tail area is 0.29. Find the z-value in the Table whose left-tail area is nearest to 0.29 = 0.2900 (Table areas have 4 decimal places).
? = -0.55: P(Z<-0.55) = 0.2912 (nearer! It differs from 0.2900 by 0.0012)
? = -0.56: P(Z<-0.56) = 0.2877 (differs from 0.2900 by 0.0023)
So, ? = -0.55.

B. P(Z>?) = 0.45
The right-tail area is 0.45, so the left-tail area is 1 – 0.45 = 0.55. Find the z-value in the Table whose left-tail area is nearest to 0.55 = 0.5500 (Table areas have 4 decimal places).
? = 0.12: P(Z<0.12) = 0.5478
? = 0.13: P(Z<0.13) = 0.5517 (nearer!)
So, ? = 0.13.

EXAMPLE 3. It was found that a certain type of storage battery lasts an average of 3.0 years, with a standard deviation of 0.5 years. If the battery lives are normally distributed:

A. Find the percentage of batteries that will last less than 3.7 years.
THINK! The battery lives (in years) are normal, with MEAN = 3.0 and ST.DEV. = 0.5. We want the batteries lasting less than 3.7 years. To use the standard normal, we just convert the data value to a z-score using the formula:
X = 3.7: $Z = \frac{X - \bar{X}}{s} = \frac{3.7 - 3.0}{0.5} = 1.4$
[Figure: bell curve with a years axis (3.0, 3.7) above a z-score axis (0, 1.4); the section left of 3.7 years is shaded.]
The shaded section of the standard normal, expressed in probability notation, is:
P(Z<1.4) = 0.9192 or 91.92%

B. Find the percentage of batteries that will last at least 2.1 years.
MEAN = 3.0, ST.DEV. = 0.5
Convert the data value to a z-score:
X = 2.1: $Z = \frac{2.1 - 3.0}{0.5} = -1.8$
[Figure: bell curve with a years axis (2.1, 3.0) above a z-score axis (-1.8, 0); the section right of 2.1 years is shaded.]
Shaded section: P(Z>-1.8) = 1 – P(Z<-1.8) = 1 – 0.0359 = 0.9641 or 96.41%

C. Find the percentage of batteries that will last around 3.5 to 3.8 years.
MEAN = 3.0, ST.DEV. = 0.5
Convert the data values to z-scores:
X = 3.5: $Z = \frac{3.5 - 3.0}{0.5} = 1$
X = 3.8: $Z = \frac{3.8 - 3.0}{0.5} = 1.6$
[Figure: bell curve with a years axis (3.0, 3.5, 3.8) above a z-score axis (0, 1, 1.6); the section between 3.5 and 3.8 years is shaded.]
Shaded section: P(1<Z<1.6) = P(Z<1.6) – P(Z<1) = 0.9452 – 0.8413 = 0.1039 or 10.39%

EXAMPLE 4. The height of miniature poodles is normally distributed with a mean of 30 cm and a standard deviation of 4.1 cm.

A. Find the percentage of miniature poodles which are taller than 35 cm.
MEAN = 30, ST.DEV. = 4.1
Convert the data value to a z-score:
X = 35: $Z = \frac{35 - 30}{4.1} = 1.22$
[Figure: bell curve with a Height (cm) axis (30, 35) above a z-score axis (0, 1.22); the section right of 35 cm is shaded.]
Shaded section: P(Z>1.22) = 1 – P(Z<1.22) = 1 – 0.8888 = 0.1112 or 11.12%

B. Is it possible to find a miniature poodle which is shorter than 18 cm?
MEAN = 30, ST.DEV. = 4.1
Convert the data value to a z-score:
X = 18: $Z = \frac{18 - 30}{4.1} = -2.93$
[Figure: bell curve with a Height (cm) axis (18, 30) above a z-score axis (-2.93, 0); the section left of 18 cm is shaded.]
Shaded section: P(Z<-2.93) = 0.0017 or 0.17%
0.17% is less than a 1% chance! So, it's almost impossible to find such a poodle!

C. Find the height range of the tallest 10% of all miniature poodles.
MEAN = 30, ST.DEV. = 4.1
The tallest 10% belong to the shaded right-tail from some Z-value (?) whose area is 0.10 (= 10%). Therefore, the left-tail area at this Z-value (?) is 0.90.
From the Table, find that Z-value: ? = 1.28
Convert the Z-value back to a data value X using:
$X = \bar{X} + Zs = 30 + (1.28)(4.1) = 35.25$ cm
[Figure: bell curve with a Height (cm) axis (30, 35.25) above a z-score axis (0, ?); left-tail area 0.90, right-tail area 0.10.]
The tallest 10% starts at 35.25 cm. The height range for the tallest 10% is 35.25 cm and above.
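
The Table lookups in Examples 3 and 4 can be reproduced with a minimal sketch, assuming Python's standard-library NormalDist stands in for the printed Table. The helper names (z_score, left_tail, right_tail, between) are hypothetical, chosen only for this illustration. Z-values are rounded to 2 decimal places to mimic the Table, so the results agree with the Table values to 4 decimal places.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0, sigma=1)  # the standard normal: mean 0, st.dev. 1


def z_score(x, mean, sd):
    """Z = (X - mean) / st.dev., rounded to 2 decimals like the Table's z-values."""
    return round((x - mean) / sd, 2)


def left_tail(z):
    """P(Z < z): the area the Table gives directly."""
    return STD_NORMAL.cdf(z)


def right_tail(z):
    """P(Z > z) = 1 - P(Z < z)."""
    return 1 - STD_NORMAL.cdf(z)


def between(a, b):
    """P(a < Z < b) = P(Z < b) - P(Z < a)."""
    return STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)


# EXAMPLE 3: battery lives, MEAN = 3.0, ST.DEV. = 0.5
battery_a = left_tail(z_score(3.7, 3.0, 0.5))    # Table: 0.9192
battery_b = right_tail(z_score(2.1, 3.0, 0.5))   # Table: 0.9641
battery_c = between(z_score(3.5, 3.0, 0.5),
                    z_score(3.8, 3.0, 0.5))      # Table: 0.1039

# EXAMPLE 4: poodle heights, MEAN = 30, ST.DEV. = 4.1
poodle_a = right_tail(z_score(35, 30, 4.1))      # Table: 0.1112
poodle_b = left_tail(z_score(18, 30, 4.1))       # Table: 0.0017

# Reverse lookup for the tallest 10%: find the z-value with left-tail area 0.90,
# then convert it back to a height with X = mean + Z * st.dev.
z_cut = round(STD_NORMAL.inv_cdf(0.90), 2)
poodle_c = 30 + z_cut * 4.1

for name, value in [("3A", battery_a), ("3B", battery_b), ("3C", battery_c),
                    ("4A", poodle_a), ("4B", poodle_b)]:
    print(name, round(value, 4))
print("4C", z_cut, round(poodle_c, 2))
```

Note that inv_cdf returns the exact z-value for a given left-tail area, and rounding it to 2 decimals is only an approximation of the "nearest area in the Table" rule used in Example 2; the two usually agree, but this is not guaranteed in every case.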