Normal Distribution Lecture Notes
Professor Richard Blecksmith
[email protected]
Dept. of Mathematical Sciences
Northern Illinois University
Math 101 Website: http://math.niu.edu/~richard/Math101
Section 2 Website: http://math.niu.edu/~richard/Math101/fall06

1. Normal Distribution Curve

[Figure: normal curve centered at µ. 34% of the data lies between µ − σ and µ, and 34% between µ and µ + σ; 13.5% lies between µ − 2σ and µ − σ, and 13.5% between µ + σ and µ + 2σ; 2.5% lies in each tail beyond µ − 2σ and µ + 2σ.]

In a normal distribution
• Fact 1. Center = mean = median
• Fact 2. The data lies equally distributed on each side of the center.
– 50% of the data lies to the left of µ and
– 50% of the data lies to the right of µ.

2. The 68 – 95 – 99 Rule
• Fact 3.
– 68% of the data lies within 1 standard deviation of the mean
– 95% of the data lies within 2 standard deviations of the mean
– 99% of the data lies within 3 standard deviations of the mean (more precisely, about 99.7%)

3. Standardizing Data
Given normally distributed data with mean µ and standard deviation σ. If x is a data point, we wish to know:
• how many standard deviations is x to the right (or left) of the center?
• That is, x = µ + z · σ. Solve for z.
µ + z · σ = x
z · σ = x − µ
z = (x − µ)/σ

4. The z–Rule
Original Data Value: x
Standardized Data Value: z = (x − µ)/σ
• A negative value of z represents a data point to the left of the center
• A positive value of z represents a data point to the right of the center

5. Example from Text (page 51)
The lifetimes of 20,000 flashlight batteries are normally distributed, with a mean of µ = 370 days and a standard deviation of σ = 30 days.
1. What percentage of the batteries are expected to last more than 340 days?
Solution: z = (x − µ)/σ = (340 − 370)/30 = −1.00
• Look up z = 1 in the chart.
• (The negative means that this value occurs one standard deviation to the left of the center µ.)
• The corresponding P value is 34.1%. This is the percentage of the data between µ − 1.00σ and µ.

6. Draw the picture
[Figure: normal curve with 34.1% of the area between µ − 1.00σ and µ, and 50% to the right of µ.]
The batteries that last more than 340 days are those between µ − 1.00σ and µ together with those to the right of µ.
The answer is 34.1 + 50 = 84.1%.

7. Question 2
2. How many batteries can be expected to last less than 325 days?
Solution: Work with percentages.
• z = (x − µ)/σ = (325 − 370)/30 = −1.50
• Look up z = 1.5 in the chart.
• The corresponding P value is 43.3%.

8. Draw the picture
[Figure: normal curve with 43.3% of the area between µ − 1.50σ and µ.]
• Fifty percent of the data lies to the left of the center.
• Since 43.3% lies between µ − 1.50σ and the center µ,
• the percentage to the left of µ − 1.50σ is 50.0 − 43.3 = 6.7%
The final answer is: 6.7 percent of 20,000 = .067 × 20,000 = 1340

9. SAT Example
• In 2001 a total of 1,276,320 college-bound students took the SAT exam.
• The mean and standard deviation of the test scores were µ = 506 and σ = 111.
• 68% of the students fall within 1 standard deviation of the mean, that is in the range µ − σ = 506 − 111 = 395 to µ + σ = 506 + 111 = 617.
• 95% of the students fall within 2 standard deviations of the mean, that is in the range µ − 2σ = 506 − 222 = 284 to µ + 2σ = 506 + 222 = 728.
• Where is the cutoff between the first and second quartile?

10. SAT Example Cont'd
• We want P = 25%.
• The (3-digit) chart shows the z-value corresponding to P = .25 is z = .675.
• This means that 25% of the data lies more than .675 standard deviations to the left of µ.
• Another 25% lies between µ − .675σ and µ itself.
• So the first quartile occurs at
• Q1 = µ − .675σ = 506 − (.675)111 = 431
• It turns out the actual value of Q1 was 430.
• The third quartile occurs at
• Q3 = µ + .675σ = 506 + (.675)111 = 581

11. Draw the Picture
[Figure: 2001 SAT Scores. Normal curve divided into four regions of 25% each by Q1 = 431 (at µ − 0.675σ), µ = 506, and Q3 = 581 (at µ + 0.675σ).]

I.4 Sampling Lecture Notes

12. Statistical Thinking
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
– H. G. Wells, author of “War of the Worlds”
Definition: Statistics is the science of collecting, analyzing, and interpreting data in such a way that the conclusions can be objectively evaluated.

13. Three Phases of Statistics
• Collect the data
• Analyze the data
– order the data
– graphical displays
– numerical calculations (such as mean and standard deviation)
• Interpret the results
– use proper statistical techniques to substantiate or refute hypothesized statements
– match data to the appropriate technique
– determine whether the proper assumptions are satisfied

14. Two types of statistics
• Descriptive statistics
– summarize and describe a characteristic for some group
• Inferential statistics
– estimate, infer, predict, or conclude something about a larger group

15. Examples
Descriptive          Inferential
Batting Average      Polls
Yards Per Carry      Medical Studies
Test Scores          Market Surveys

16. Two types of data
• Quantitative data
– values recorded on a natural numerical scale
• Qualitative data
– classified into categories

17. Quantitative Data
• Weight of subjects in medical sample
• Height of buildings in Chicago
• Daily temperatures at Antarctica Weather Station

18. Qualitative Data
• Gender of subjects in medical sample
• Political affiliation of respondents in a poll survey
• Class (fresh, soph, jr, sr) of Math 101 students

19. Vocabulary
• The population is the entire set of objects (people or things) under consideration.
• A sample is a subset of the population that is available for the analysis.
• A bias is a favoring of certain outcomes over others.
• A census collects data from each member of the population.
• A statistic is a statement of numerical information about a sample.
• A parameter is a statement of numerical information about a population.

20. Census versus Sample
Would you use a census or a sample to determine the following:
• Project the winner of an election
• Calculate a baseball player’s batting average
• Predict whether it will rain tomorrow
• Test whether the soup is too salty
• Calculate Shaq’s free throw average
• Use a market study to determine a new flavor of toothpaste
• Report the Dow Jones Average
• Generalize a medical study to other groups
• Find the average score on the first test

21. Dealing with bias
Bias in some form occurs in the collecting of most, if not all, sets of data. The bias may come from
• the portion of the population surveyed
• the phrasing of the questions

22. Examples
• “Dewey defeats Truman” projection of Chicago Tribune based on 1948 telephone poll
• “Are you in favor of Illinois banning cell phones in cars? Dial *91 on your cellular phone to vote.”
• “Do you feel budget cuts are more important than humanitarian programs that would need to be cut to obtain a balanced budget?”

23. Methods for Choosing Samples
• Judgement Sample
– Use the opinion of person(s) deemed qualified to choose members of the sample.
– Example: to investigate study habits of athletes, ask their coaches and teachers.
• Simple Random Selection
– Use random numbers to select the sample.
– Page 315 Random Digit Table: 72985547555515086461
• Stratified Sampling
– Divide the population into relatively homogeneous groups, draw a sample from each group, and take their union.

24. Goals of a good sample
• from the correct population
• chosen in an unbiased way
• large enough to reflect total population

25. Normal Distribution of Random Events
Toss a coin 100 times and count the number of heads. How many heads would you expect?
• about 50
• exactly 50
It does not seem reasonable that the count will be exactly 50. We would not be surprised if the number of heads turned out to be 48 or 51 or even 55. We would be surprised to see 80 heads, and would begin to suspect that the coin was not fair.

26. Coin Toss Data
Experiment: A coin is tossed n = 100 times.
The experiment is repeated 1000 times. Here are the results:

27. Frequency Table: No. of Heads
Heads  Freq      Heads  Freq      Heads  Freq
1      0         45     54        58     27
...    0         46     49        59     19
34     0         47     54        60     11
35     2         48     66        61     11
36     2         49     89        62     5
37     2         50     70        63     4
38     2         51     77        64     2
39     5         52     85        65     0
40     14        53     62        66     0
41     16        54     57        67     1
42     25        55     52        68     0
43     30        56     40        ...    0
44     31        57     36        100    0

28. Mean and Standard Deviation
mean = 50.296
stand dev = 5.100

29. Coin Toss Histogram
[Figure: histogram of the number of heads; horizontal axis marked at 30, 40, 50, 60, 70.]

30. Sampling Distributions
If we could examine all possible samples of size n of a population, then the frequency distribution of the means of these samples is (approximately) normally distributed.
• µ = the mean over the entire population
• σ = the standard deviation over the entire population
• x̄ = the mean of the sampling distribution
• σ_x̄ = the standard deviation of the sampling distribution

31. Two Rules
Rule 1. x̄ = µ
Rule 2. σ_x̄ = σ/√n
We are assuming in Rule 2 that the size of the entire population is much larger than the sample size n.
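
The coin toss experiment can be used to check the two rules. The following is a minimal sketch, assuming Python's standard library and a pseudo-random generator standing in for a fair coin; the function name simulate_head_counts and the seed are illustrative and not part of the notes. Each toss is scored 1 for a head and 0 for a tail, so the population has µ = 0.5 and σ = 0.5. The number of heads in n tosses is n times the sample mean, so the rules predict a mean count of n · µ and a standard deviation of n · σ/√n.

```python
import math
import random
import statistics


def simulate_head_counts(n_tosses=100, n_repeats=1000, seed=0):
    """Toss a simulated fair coin n_tosses times, count heads, repeat n_repeats times."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    return [
        sum(rng.random() < 0.5 for _ in range(n_tosses))
        for _ in range(n_repeats)
    ]


counts = simulate_head_counts()
sim_mean = statistics.mean(counts)
sim_sd = statistics.pstdev(counts)

# Population of single tosses: 1 = head, 0 = tail.
mu, sigma, n = 0.5, 0.5, 100
predicted_mean_heads = n * mu                 # Rule 1, scaled from a mean to a count
predicted_sd_heads = n * sigma / math.sqrt(n)  # Rule 2, scaled from a mean to a count

print(f"simulated mean = {sim_mean:.3f}, predicted = {predicted_mean_heads}")
print(f"simulated SD   = {sim_sd:.3f}, predicted = {predicted_sd_heads}")
```

The predicted values are 50 heads with a standard deviation of 5, which is close to the recorded data in section 28 (mean 50.296, standard deviation 5.100). The simulated values change with the seed but should land near the same numbers.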
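
The z-rule calculations in the battery example (sections 5 to 8) and the SAT quartiles (sections 9 to 11) can also be checked numerically. This is a sketch only, assuming Python 3.8 or later, where statistics.NormalDist takes the place of the printed chart; the helper names z_score and chart_p are hypothetical and chosen for illustration.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_score(x, mu, sigma):
    """The z-rule: how many standard deviations x lies from the center."""
    return (x - mu) / sigma


def chart_p(z):
    """Fraction of the data between the center and z (the chart's P value)."""
    return STANDARD.cdf(abs(z)) - 0.5


# Battery example: mu = 370 days, sigma = 30 days, 20,000 batteries.
mu, sigma, total = 370, 30, 20_000

z1 = z_score(340, mu, sigma)            # -1.0
more_than_340 = chart_p(z1) + 0.5       # area from mu - 1.00 sigma to mu, plus right half

z2 = z_score(325, mu, sigma)            # -1.5
less_than_325 = 0.5 - chart_p(z2)       # left half minus area from mu - 1.50 sigma to mu

print(f"last more than 340 days: {more_than_340:.1%}")
print(f"last less than 325 days: {less_than_325:.1%}, "
      f"about {round(less_than_325 * total)} batteries")

# SAT example: quartiles of a normal distribution with mu = 506, sigma = 111.
sat = NormalDist(506, 111)
q1 = sat.inv_cdf(0.25)
q3 = sat.inv_cdf(0.75)
print(f"Q1 = {q1:.0f}, Q3 = {q3:.0f}")
```

The percentages agree with the chart values of 84.1% and 6.7%. The battery count comes out near 1336 rather than 1340, because the notes round the percentage to 6.7% before multiplying by 20,000. The quartiles round to 431 and 581, matching section 10.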