Basic Quantitative Methods in the Social Sciences (AKA Intro Stats) 02-250-01
Lecture 4

A Quick Review
• The entire area under the normal curve can be treated as a proportion of 1.00
• A proportion of .50 lies to the left of the mean, and a proportion of .50 lies to the right of the mean

Area Under the Normal Distribution and Z-Scores
[Figure: normal distribution with z-score points of reference]

Properties of Area Under the Normal Distribution
• Because the normal curve is bell-shaped, the proportion of scores between consecutive whole z-scores is not equal
• For example, .3413 of the scores lie between z = 0 (the mean) and z = 1 (or -1), while only .1359 of the scores lie between z = 1 and z = 2 (or -1 and -2)

Properties of Area Under the Normal Distribution
[Figure: normal curve marked at z = -3, -2, -1, 0, +1, +2, +3; areas from left to right are .0013 (below -3), .0215, .1359, .3413, .3413, .1359, .0215, .0013 (above +3)]

Properties of Area Under the Normal Distribution
Z-scores*     Proportion under the curve
-1 to +1      .6826 (.3413 + .3413)
-2 to +2      .9544
-3 to +3      .9974
-4 to +4      1.0000 (to four decimals from the table; a tiny fraction lies beyond ±4)
*Z-scores are expressed in standard deviation units; e.g., a z-score of -1 represents one standard deviation below (to the left of) the mean

Normal Distribution Example
• A study of 2500 University of Windsor students showed that the amount of sleep lost (in hours) in the week before writing a statistics exam was normally distributed with $\mu = 7.79$ and $\sigma = 1.75$ (don't worry, this isn't real data!)
• This distribution is shown with the abscissa (x-axis) marked in raw-score and z-score units:

Normal Distribution Example
[Figure: the same normal curve with the x-axis marked in both units: X = 2.54, 4.29, 6.04, 7.79, 9.54, 11.29, 13.04 corresponding to z = -3, -2, -1, 0, +1, +2, +3; areas as in the previous figure]

Example cont.
• From this diagram, 34.13% of U of W students lost between 6.04 and 7.79 hours of sleep in the week before a stats test (between z = -1 and z = 0)
• 13.59% of students lost between 9.54 and 11.29 hours of sleep in that week (between z = +1 and z = +2)
• 49.87% of students lost between 2.54 and 7.79 hours of sleep (between z = -3 and z = 0): .0215 + .1359 + .3413 = .4987 = 49.87%

Properties of Area Under the Normal Distribution
• The symbol $z_\alpha$ denotes the z-score that has area $\alpha$ (alpha) to its right under the normal curve
• The proportion of area under the curve between the mean and a z-score can be found with a table (Table E.10, Howell, p. 452) and a little math
• In this example, we want the area between the mean and z = 0.20:
• Look under the column "mean to z" at z = 0.20
• The proportion is .0793
• Therefore, .0793 (almost 8%) of the scores fall between the mean and the score that has a z-score of 0.20

Example cont.
• The area under the curve between the mean and z = 0.20 is therefore .0793:
[Figure: normal curve with .0793 shaded between z = 0 and z = 0.20, and .4207 in the tail beyond z = 0.20]

Example cont.
• Since half of the normal distribution has an area of .5000, the area beyond z = .20 is .5000 minus the area from the mean to z = .20:
• Area beyond z = .20 = .5000 - .0793 = .4207
• (Note: the "smaller portion" column of the table gives the same value, .4207)

Example cont.
• Because the normal curve is symmetrical, the area between the mean and z = -.20 equals the area between the mean and z = +.20:
[Figure: normal curve with .0793 on each side of the mean between z = -0.20 and z = +0.20, and .4207 in each tail]

Normal Distribution Table
• Table E.10 has 3 columns: Mean to z, Larger portion, Smaller portion
[Table E.10 excerpts illustrating the Mean to z, Larger Portion, and Smaller Portion columns]

A Couple of Notes
• 1) Always report proportions (area under the curve) to four decimal places.
  This means that an area reported as a percentage will have two decimal places (e.g., .7943 = 79.43%)
• 2) When using Table E.10, be careful not to confuse z = .20 with z = .02 (a common mistake)
• 3) Remember that a negative z value has the same proportion under the curve (between it and the mean) as the positive z value, because the normal distribution is symmetrical
• 4) When working on z-score problems, it is highly recommended that you draw a normal distribution and plot the mean, X, and their corresponding z-scores

Another Example!
• We often want to know the area between two scores, as in this example:
• Assume that the marks in this class are normally distributed with $\mu = 69.5$ and $\sigma = 7.4$. What proportion of students have marks between 50 and 80?

Example: Area Between 2 Scores
1) Calculate the z-scores for the X values (50 and 80):
   $z = (50 - 69.5)/7.4 = -19.5/7.4 \approx -2.64$
   $z = (80 - 69.5)/7.4 = 10.5/7.4 \approx 1.42$
2) Find the proportion between the mean and each z-score (consult Table E.10):
   For z = -2.64, the proportion between the mean and z is .4959
   For z = 1.42, the proportion between the mean and z is .4222
Example: Area Between 2 Scores
3) Add these proportions together to find the answer (this works because 50 and 80 fall on opposite sides of the mean): .4959 + .4222 = .9181
• This means that 91.81% of students have Stats marks between 50 and 80

Smaller and Larger Portions
• Smaller portion = proportion in the tail
• Larger portion = proportion in the body
• Using the same data ($\mu = 69.5$ and $\sigma = 7.4$), we can calculate areas using the Smaller and Larger Portion columns of the normal distribution table:
• Find the proportion of students who have stats marks of less than 80.6
• $z = (80.6 - 69.5)/7.4 = +1.50$

Larger Portion
• Area below z = +1.50 = .9332
• This means that 93.32% of students had a mark of 80.6 or less in this class

Smaller Portion
• Find the proportion of students who have marks of 76.93 or better:
• $z = (76.93 - 69.5)/7.4 \approx 1.00$
• Area in the smaller portion = .1587
• This means that 15.87% of students in this class had a mark of 76.93 or better

Converting Back to X
• Assume $\mu = 30$ and $\sigma = 5$. What raw scores correspond to z = -1.00 and z = +1.50?
• $z = \frac{X - \mu}{\sigma}$, therefore $X = \mu + z\sigma$
• $X = 30 + (-1.00)(5) = 25$
• $X = 30 + (1.50)(5) = 37.5$

Proportion
• What proportion of scores lie between z = -1.00 and z = +1.50?
• Area from the mean to z = -1.00 = .3413
• Area from the mean to z = +1.50 = .4332
• Add them together to get the proportion that lies between these two z-scores: .3413 + .4332 = .7745

Finding the Number of Observations
• If we know the sample size (e.g., n = 212), we can calculate how many people lie between z = -1.00 and z = +1.50:
• Area between z = -1.00 and z = +1.50 = .7745 (from the previous slide)
• Multiply the proportion by n: (.7745)(212) = 164.19, or approximately 164 people

And a Little More
• Finally, we can find a z-score from the table if we know the proportion of scores (i.e., we can work backwards):
• Suppose the birth weight of newborns is normally distributed with $\mu = 7.73$ and $\sigma = 0.83$
• What birth weight identifies the top (heaviest) 10% of newborns?

Example cont.
• Look in Table E.10 for the z-score that cuts off the top proportion of .1000; because this is a tail area, look in the smaller portion column
[Figure: normal curve with the top .1000 shaded beyond an unknown z]

Example cont.
• In the smaller portion column, z = 1.28 has an area of .1003 and z = 1.29 has an area of .0985
• Which do we pick? The one whose area is closest to .1000: z = 1.28

Example cont.
• Now solve for X: $X = \mu + z\sigma$
• $X = 7.73 + (1.28)(0.83) \approx 7.73 + 1.06 = 8.79$
• So any birth weight of 8.79 pounds or more is in the top 10% of birth weights

Probability
• Everything that can possibly happen has some likelihood of happening; probability is a measure of that likelihood
• Probability: the quantitative expression of the likelihood of occurrence

Probability
• Probability is a ratio of frequencies
• The numerator (top) is the frequency of the outcome of interest
• The denominator (bottom) is the frequency of all possible outcomes

Coin Toss Example
• If a fair* coin is tossed in the air, it can land on either heads or tails
• This means a coin toss has 2 possible outcomes
• To find the probability of tossing a fair coin and having it land on heads, we calculate as follows:
*Note: fair means a normal coin, one that is not weighted to favour either side

Coin Toss
• $p = \frac{\text{frequency of the outcome of interest}}{\text{frequency of all possible outcomes}}$
• For a coin toss, this is 1/2
• The probability of the coin landing on heads is p(heads) = 1/2, or p(heads) = .5

Another Example
• Suppose there are 90 students in a class: 59 are women and 31 are men
• If one student is chosen at random, the probability of choosing a woman is p(woman) = 59/90

More Probability
• If the entire class were women (i.e., there were no male students), the probability of choosing a woman would be 90/90
• If the entire class were men, the probability of choosing a woman would be 0/90

More Probability
• As a numerical value, a probability can range from 0.00 to 1.00
• The numerator can range from a minimum of 0 to a maximum equal to the denominator
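The ratio-of-frequencies definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the course materials: the function name `probability` is hypothetical, and the standard-library `fractions.Fraction` type is used so each result can be shown both as a fraction and as a decimal. The range check mirrors the rule that the numerator runs from 0 up to the denominator.

```python
from fractions import Fraction


def probability(frequency_of_interest, frequency_of_all_outcomes):
    """Hypothetical helper: probability as a ratio of frequencies."""
    if frequency_of_all_outcomes <= 0:
        raise ValueError("there must be at least one possible outcome")
    if not 0 <= frequency_of_interest <= frequency_of_all_outcomes:
        # The numerator runs from 0 up to the denominator, so p stays in [0, 1]
        raise ValueError("numerator must lie between 0 and the denominator")
    return Fraction(frequency_of_interest, frequency_of_all_outcomes)


p_heads = probability(1, 2)        # fair coin: 1 outcome of interest out of 2
p_woman = probability(59, 90)      # 59 women in a class of 90
p_all_women = probability(90, 90)  # every student is a woman
p_no_women = probability(0, 90)    # every student is a man

print(p_heads, float(p_heads))                 # 1/2 0.5
print(p_woman, round(float(p_woman), 4))       # 59/90 0.6556
print(float(p_all_women), float(p_no_women))   # 1.0 0.0
```

Using `Fraction` keeps the exact ratio (59/90) available; converting with `float` and rounding to four decimals gives the decimal form used in these notes.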
Express Yourself!
• Probability can be expressed as a fraction, e.g., p(woman) = 59/90
• Or as a decimal fraction: p(woman) = .6556
• Probabilities are not usually expressed as percentages (e.g., 65.56%), although they often are in the popular media

Probability cont.
• Even if we do not know the actual observed frequencies (e.g., the number of women), probabilities can be determined theoretically
• For example, without throwing a die, we can deduce the probability of landing on a 5

Die Example cont.
• We know the die has 6 sides, so there are 6 possible outcomes
• We are interested in only one side (the 5), so the probability of landing on a 5 is p(5) = 1/6 = .1667

Probability and the Normal Distribution
• The normal distribution can be thought of as a probability distribution. Here's how:
• We know (from Table E.10) the proportion of scores that fall above or below a given z-score
• If you were to randomly pick a score from a normally distributed set of scores, what is the probability that you would pick a score with a z-score of .40 or greater?

Probability and the Normal Distribution
• The proportion of scores above or below a given z-score is the same as the probability of selecting a score above or below that z-score
• E.g., the probability of selecting a score from a normal distribution that has a z-score of .40 or greater is .3446 (the smaller portion for z = .40)

Example #1
• Suppose people's scores on a personality test are normally distributed with a mean of 50 and a population standard deviation of 10.
• If you were to pick a person completely at random, what is the probability that you would pick someone with a score on this personality test that is higher than 60?

Example #1
• Step #1: Write down what you know: X = 60, $\mu = 50$, $\sigma = 10$
• Step #2: What do you want to find?
  $p(X > 60)$
• Step #3: Draw the normal distribution, write in the mean, standard deviation, and X, and shade the area you are looking for

Example #1, Step #3
[Figure: normal curve with the X axis marked 20, 30, 40, 50, 60, 70, 80; the area above X = 60 is shaded]

Example #1
• Step #4: Calculate the z-score: $z = \frac{X - \mu}{\sigma} = \frac{60 - 50}{10} = \frac{10}{10} = 1.00$
• Step #5: Use Table E.10 to find the probability of selecting a score in your shaded area
  Here we want $p(X > 60)$, or $p(z > 1.00)$
  Look up the smaller portion for z = 1.00: $p(z > 1.00) = .1587$

Example #1
• Step #6: Interpret: the probability of picking someone at random who has a personality test score higher than 60 is .1587

Example #2
• The length of time spent waiting in line to buy movie tickets is normally distributed with a mean of 12 minutes and a population standard deviation of 3 minutes.
• If you go to see a movie, what is the probability that you will wait in line to buy tickets for between 7.5 and 15 minutes?

Example #2
• Step #1: Write down what you know: $X_1 = 7.5$, $X_2 = 15$, $\mu = 12$, $\sigma = 3$
• Step #2: What do you want to find? $p(7.5 < X < 15)$
• Step #3: Draw the normal distribution, write in the mean, standard deviation, and both X scores, and shade the area you are looking for

Example #2, Step #3
[Figure: normal curve with the X axis marked 3, 6, 7.5, 9, 12, 15, 18, 21; the area between X = 7.5 and X = 15 is shaded]

Example #2
• Step #4: Calculate the z-scores:
  $z_{X_1} = \frac{7.5 - 12}{3} = \frac{-4.5}{3} = -1.50$
  $z_{X_2} = \frac{15 - 12}{3} = \frac{3}{3} = 1.00$
• Step #5: Use Table E.10 to find the probability of selecting a score in your shaded area
  Here we want $p(7.5 < X < 15)$, or $p(-1.50 < z < 1.00)$
  Look up the mean to z for z = 1.00: .3413
  Look up the mean to z for z = -1.50: .4332

Example #2
• Add the two areas together! (Each represents the area from the mean to z, and they lie on opposite sides of the mean, so together they make up the whole shaded area): .3413 + .4332 = .7745
• $p(-1.50 < z < 1.00) = .7745$

Example #2
• Step #6: Interpret: the probability of waiting in line to buy movie tickets for between 7.5 and 15 minutes is .7745. (Note: this means that, in the long run, you would wait in line for between 7.5 and 15 minutes about 77.45% of the time.)
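The table lookups in Examples #1 and #2 can be checked in code. The sketch below is an assumption, not the course's method: it stands in for Table E.10 by computing the "mean to z" and "smaller portion" columns from the standard normal cumulative distribution function in Python's standard-library `statistics.NormalDist` (Python 3.8+). The helper names `z_score`, `mean_to_z`, and `smaller_portion` are hypothetical. Z-scores are rounded to two decimals and areas to four, as in the notes.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1), standing in for Table E.10
STANDARD_NORMAL = NormalDist(mu=0, sigma=1)


def z_score(x, mu, sigma):
    """z = (X - mu) / sigma, rounded to two decimals as in the table."""
    return round((x - mu) / sigma, 2)


def mean_to_z(z):
    """'Mean to z' column: area between the mean and z (same for +z and -z)."""
    return round(STANDARD_NORMAL.cdf(abs(z)) - 0.5, 4)


def smaller_portion(z):
    """'Smaller portion' column: area in the tail beyond |z|."""
    return round(1 - STANDARD_NORMAL.cdf(abs(z)), 4)


# Example #1: mu = 50, sigma = 10; p(X > 60) is the smaller portion beyond z = 1.00
z1 = z_score(60, 50, 10)
p_example_1 = smaller_portion(z1)

# Example #2: mu = 12, sigma = 3; 7.5 and 15 lie on opposite sides of the mean,
# so the two "mean to z" areas are added together
z_low = z_score(7.5, 12, 3)
z_high = z_score(15, 12, 3)
p_example_2 = round(mean_to_z(z_low) + mean_to_z(z_high), 4)

# Cross-check without the table: use the raw-score distribution directly
wait = NormalDist(mu=12, sigma=3)
p_direct = round(wait.cdf(15) - wait.cdf(7.5), 4)

print(z1, p_example_1)             # 1.0 0.1587
print(z_low, z_high, p_example_2)  # -1.5 1.0 0.7745
print(p_direct)                    # 0.7745
```

Adding two "mean to z" areas only gives the area between the scores when they fall on opposite sides of the mean; the direct `cdf` difference works in either case, which is why it is included as a cross-check.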