Week 8, Part I
Using the Standard Normal Curve
The Standard Normal
Distribution


We have learned about the standard normal
distribution: a normal curve with a mean of zero and
a standard deviation of 1.
How can you convert any normal distribution into a
standard normal distribution?
The Standard Normal
Distribution

To convert any normal distribution into a standard
normal distribution, convert the scores to z-scores:



First, subtract the mean from each score.
Then, divide by the standard deviation.
If you perform this procedure on every score in a normal
distribution, the result is a standard normal distribution.
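The two-step conversion can be sketched in a few lines of Python (the raw scores here are hypothetical):

```python
from statistics import mean, stdev

# Hypothetical raw scores
scores = [82, 90, 75, 88, 95, 70]

mu = mean(scores)      # step 1 subtracts this mean...
sd = stdev(scores)     # ...step 2 divides by this standard deviation
z_scores = [(x - mu) / sd for x in scores]

# The resulting z-scores have mean 0 and standard deviation 1
```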
The Standard Normal
Distribution

What is so great about a standard normal
distribution?
The Standard Normal
Distribution

The standard normal distribution can be
easily used to determine the probability
associated with a particular range of values
on a normally distributed variable.
The Standard Normal
Distribution

How does it work?



By “partitioning” the area under the normal curve into sub-areas
corresponding to values within and outside the range of values.
The area within the range of values corresponds to the probability
of having values within that range.
The area outside the range of values corresponds to the probability
of having values outside that range.
The Standard Normal
Distribution

For example, the shaded area below
corresponds to the probability of obtaining a
value that is from 0 to 1 standard deviations
above the mean.
The Standard Normal
Distribution

That probability, it turns out, is .3413.


How do I know?
I look it up in a Normal Distribution Table
(Statistical Table B in the back of the book!)
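The table entry can also be reproduced in code; this sketch uses Python's statistics.NormalDist (available in Python 3.8+) in place of Statistical Table B:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

# P(0 < z < 1): area between the mean and one SD above it
p = std_normal.cdf(1.0) - std_normal.cdf(0.0)
print(round(p, 4))  # 0.3413
```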
The Standard Normal
Distribution

What is the probability of having a value from
0 to 1.96 standard deviations above the
mean?
The Standard Normal
Distribution


Answer: .4750
Next question: What is the probability of
having a value that is more than 1.96
standard deviations above the mean?
The Standard Normal
Distribution

There are two ways to get this answer.

First, since we just saw that the probability of a z-score between 0 and +1.96 is .4750
and we know that the overall probability of a z-score that is greater than 0 is .5000,
we can answer the question using subtraction:


.5000 - .4750 = .0250.
Second, we can look up the answer in the Normal Distribution Table provided by the
book. The column “C” entry next to 1.96 is .0250.
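Both routes give the same number. In code (with statistics.NormalDist standing in for the printed table):

```python
from statistics import NormalDist

z = NormalDist()

# Route 1: subtract the 0-to-1.96 area from the upper half (.5000)
route1 = 0.5 - (z.cdf(1.96) - z.cdf(0.0))

# Route 2: the upper-tail area beyond 1.96 directly (the "Column C" value)
route2 = 1.0 - z.cdf(1.96)

print(round(route1, 4), round(route2, 4))  # 0.025 0.025
```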
The Standard Normal
Distribution

One important extension of this “partitioning” procedure is that it
applies equally to areas to the left of the mean (i.e., to ranges of
scores below the mean).


What is the probability of having a value between 0 and 1.96 standard
deviations below the mean (i.e., the probability of having a z-score between
0 and -1.96)?
Answer: the same as the probability of having a z-score between 0 and
+1.96: .475.
The Standard Normal
Distribution

A second important extension of this “partitioning” procedure
is that it applies to ranges that include the mean. Of
particular interest, we can quickly obtain the probability that
an observation falls within a certain distance Z from the
mean. To do this, we simply add: p(from 0 to Z)+p(from –Z
to 0) = 2*p(from 0 to Z).
The Standard Normal
Distribution

What is the probability that an observation falls within 1 standard
deviation of the mean?



First, translate the question: what is the probability of having a z-score from -1.00 to +1.00?
Answer: 2*P(0<z<+1.00) = 2*.3413 = .6826.
This is the familiar result that 68% of the observations in a normally
distributed population fall within one standard deviation of the mean.
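The doubling step can be checked directly; the table-based .6826 differs from the exact value in the last digit only because .3413 is itself rounded:

```python
from statistics import NormalDist

z = NormalDist()
p_within_1sd = z.cdf(1.0) - z.cdf(-1.0)
print(round(p_within_1sd, 4))  # 0.6827 (vs. 2 * .3413 = .6826 from the table)
```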
The Standard Normal
Distribution

Given the answer to the previous question (.6826),
you should be able to say what is the probability that
an observation falls more than one standard
deviation from the mean. Well?
The Standard Normal
Distribution

Since the area under the normal curve equals 1 and
the probability of having a z-score from -1 to
+1=.6826, the probability of having |z|>1.00 is
given by 1 - .6826 = .3174.
The Standard Normal
Distribution

Let’s take another example. What is the probability
that an observation from a normal distribution falls
within 1.96 standard deviations of the mean?
The Standard Normal
Distribution

First, translate the question: What is the probability
of having a z-score between -1.96 and +1.96?

P(-1.96<z<+1.96) = ?
The Standard Normal
Distribution

Then, solve the problem:


P(-1.96<z<+1.96) = 2*P(0<z<+1.96)=2*P(-1.96<z<0)
= 2*.4750 = .9500.
There is a “95% probability” that an observation from a
standard normal distribution falls within 1.96 standard
deviations of the mean.

(That is why 1.96 is such a special number.)
The Standard Normal
Distribution

Based on what we have done so far, what is the
probability that an observation from a normal
distribution falls more than 1.96 standard deviations
from the mean?
The Standard Normal
Distribution

Since P(-1.96<z<+1.96) = .9500 and the total area under
the curve equals 1, the answer is 1 - .9500 = .0500: five
percent of observations fall more than 1.96 standard
deviations from the mean.
The Standard Normal
Distribution

Another extension: even though the table only lists probabilities for
ranges bounded by either 0 or infinity, you can use subtraction to obtain
probabilities associated with ranges not bounded by either of these.


What is the probability that an observation from a normal distribution falls
between 1 and 1.96 standard deviations above the mean?
Translation: p(1.00<z<1.96) = ?
The Standard Normal
Distribution

p(1.00<z<1.96) = p(0<z<1.96) – p(0<z<1.00)
= .4750 - .3413 = .1337.
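The subtraction of sub-areas can be verified the same way:

```python
from statistics import NormalDist

z = NormalDist()

# p(1.00 < z < 1.96) as a difference of two table-style areas
p = (z.cdf(1.96) - z.cdf(0.0)) - (z.cdf(1.00) - z.cdf(0.0))
# equivalently: z.cdf(1.96) - z.cdf(1.00)
print(round(p, 4))  # 0.1337
```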
The Standard Normal
Distribution

This can be seen graphically:
Partitioning in reverse:
finding critical values of z


So far we have seen that you can easily
identify probabilities under the normal
curve that correspond to ranges of z-scores
by looking them up in the
Normal Distribution Table.
We can also perform the same
procedure in reverse.
Partitioning in reverse:
finding critical values of z

Rather than start with a range of z
scores and determine an associated
probability, we can start with a given
probability and determine the z-scores
that define the associated range. We
call these z-scores “critical values” of z,
denoted by Z.
Partitioning in reverse:
finding critical values of z

For example, we might have the following
question: what is the value of z that defines
a distance above the mean in which only
1% of the observations fall?


How many standard deviations above the mean
do only 1% of observations fall?
What is the value, Z, of z for which p(z > Z) = .01?
Partitioning in reverse:
finding critical values of z

To answer this type of question, we also
consult the Normal Distribution Table. But
now we look in Column C rather than
Column A and we find the value for Z
corresponding to the entry nearest .01.

Turns out that the answer is +2.33.
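The reverse lookup is an inverse-CDF computation; NormalDist.inv_cdf (Python 3.8+) plays the role of scanning Column C for the nearest entry:

```python
from statistics import NormalDist

# Z such that P(z > Z) = .01, i.e. P(z < Z) = .99
Z = NormalDist().inv_cdf(0.99)
print(round(Z, 2))  # 2.33
```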
Partitioning in reverse:
finding critical values of z


Note that we have to take the sign of Z into account.
Once again, due to the symmetry of the normal
distribution, the fact that the area under the normal curve
equals 1.00, and the additive property of probabilities, if +2.33
is the value of Z above which 1% of the observations fall,
then the following must also be true:





+2.33 is the value of Z below which 99% of the observations fall.
-2.33 is the value of Z below which 1% of the observations fall.
-2.33 is the value of Z above which 99% of the observations fall.
±2.33 is the value of Z that demarcates a symmetrical range around
the mean outside which 2% of all observations fall.
±2.33 is the value of Z that demarcates a symmetrical range around
the mean within which 98% of all observations fall.
Partitioning in reverse:
finding critical values of z

What is the Z that demarcates a range about the
mean beyond which only 5% of observations fall?


To answer this, first divide the probability of falling
outside the range by 2, since the observations can fall in
either tail of the distribution. .05/2 = .025.
Next, find .025 in Column C of the Normal Distribution
Table. What is the corresponding Z score?
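In code, the two-tailed lookup amounts to: halve the outside probability, then take the inverse CDF of 1 minus that half:

```python
from statistics import NormalDist

alpha = 0.05                            # two-tailed: .025 in each tail
Z = NormalDist().inv_cdf(1 - alpha / 2)  # the table's Column C lookup, reversed
print(round(Z, 2))  # 1.96
```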
Some terminology



We refer to the probability of falling outside a
particular range of z-scores as α (alpha).
We use Zα to denote the critical value of Z for a
particular probability, α.
We refer to the area that is outside the range of z-scores as the “critical region.”
Some terminology

We distinguish between “one-tailed α” and “two-tailed α.”

One-tailed α means the entire critical region is contained in one of the tails (either positive or
negative) of the distribution.
Two-tailed α means the critical region is equally divided between both tails (positive and negative)
of the distribution.
Remember: the Normal Distribution Table presents one-tailed α’s in Column C. To look
up a Zα corresponding to a particular two-tailed α, first divide the two-tailed α by 2.

E.g., to determine Zα for two-tailed α = .01, find the Zα for one-tailed α = .005. Looking up .005 in
Column C, the closest entry corresponds to 2.58 in Column A. The probability that an observation
is at least 2.58 standard deviations above the mean is .005. 2.58 is the critical value of Z defining a
two-tailed critical region of .01, i.e., the value of z that defines a symmetrical range about the
mean beyond which 1 percent of the observations fall.
Remember!


The first step when working with normal
distributions is to convert the original scores (“x-scores”) into z-scores. In order to use the Normal
Distribution Table you must have z-scores.
Once you convert x-scores into z-scores, you can
turn a question about an x-score into a question
about a z-score. Then you can answer the
question by partitioning the area under the
standard normal curve.
Remember!


When doing these types of problems,
always start by drawing a normal curve,
labeling the appropriate Zα values, and
shading the appropriate area. That helps
you intuitively understand what you are
looking for.
Let’s go through an example: chapter 6,
exercise 16.
Week 8, Part II
Sampling Distributions and the
Central Limit Theorem
Sampling distributions



We have seen how the area under the
standard normal curve can be partitioned in
order to determine the probability that an
observation from a normal distribution falls
within or outside particular ranges of values.
How is this useful for making statistical
inferences?
To see how, we must learn about the concept
of a sampling distribution.
Sampling distributions


Recall that we use statistics to make inferences about a
population based on a sample. For example, we might be
interested in estimating a mean in the population (e.g., mean
age in a population of children) based on a sample from that
population.
By convention, we use Greek letters to denote population
characteristics:



μx denotes the population mean on the variable x.
σx denotes the population standard deviation on the variable x.
An estimate of the mean age in the population based on one
sample – a “point estimate” – is only an estimate. We use
Roman letters to denote sample characteristics:

x̄ denotes the sample mean on the variable x.
sx denotes the sample standard deviation on the variable x.
Sampling distributions



We usually have only one sample from which to estimate μx. But
imagine that we could take multiple samples of a given size, N.

Say that x is age and μx in our population of children is 4.5.

It would not be unreasonable, were we to take 5 samples of 6
individual children, to obtain the sample means shown below
(see Ritchey, p. 192).
Note that none of these means (none of these “point estimates”)
give us the true mean, but each of them is pretty close to the true
mean.
In fact, we are more likely to get point estimates that are close to
the mean than point estimates that are far from the mean, but in
some cases we nonetheless will get point estimates far from the
true mean.

From the population for this example, we might get an
estimate of 0.4.
x̄1 = 4.0
x̄2 = 5.5
x̄3 = 4.3
x̄4 = 5.3
x̄5 = 4.7
Sampling distributions

Now imagine that rather than just five samples of 6 individuals from
our population of children, we took a very large number of samples.
For example, say we took 10,000 samples and each time we wrote
down the sample mean.




We would have 10,000 sample means, which would themselves form a
frequency distribution.
Just as we can convert any frequency distribution into a probability
distribution, we can convert this frequency distribution of sample means
into a probability distribution of sample means.
Now say we took every possible sample of 6 individuals from our
population and computed the sample mean for each. These sample
means would also form a distribution.
We call the probability distribution of the point estimates of some
population parameter (such as μx) drawn from all the possible samples
of a given size a sampling distribution.

If our parameter of interest is μx, then we are interested in the sampling
distribution of sample means.
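A sampling distribution can be approximated by brute force. This sketch assumes a hypothetical normally distributed population with mean 4.5 (as in the example above) and repeats the draw-a-sample-of-6 step 10,000 times:

```python
import random
from statistics import mean

random.seed(42)                      # for reproducibility
MU, SIGMA, N = 4.5, 1.0, 6           # hypothetical population parameters

# Draw 10,000 samples of size 6 and keep each sample mean
sample_means = [
    mean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(10_000)
]

# The 10,000 means form their own (empirical) distribution,
# centered near the true population mean of 4.5
print(round(mean(sample_means), 2))
```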
The Central Limit Theorem


Because sampling distributions involve a very large number of
samples, they are often hypothetical. But statisticians have studied
the properties of sampling distributions by taking large numbers of
samples of given sizes from populations with known means and other
parameters of interest.
They discovered that sampling distributions have some very important
properties, which are captured by the “Central Limit Theorem”:
1. μx̄ = μx
2. σx̄ = σx / √N
3. The distribution is normal (for N > 120) or approximately
normal (for N < 120).
The Central Limit Theorem
1. μx̄ = μx
2. σx̄ = σx / √N
3. The distribution is normal (for N > 120) or approximately
normal (for N < 120).

In other words:
1. The mean of the sampling distribution of sample means is the population
mean.
2. The standard deviation of the sampling distribution of sample means,
which we call the standard error, is the population standard deviation
divided by the square root of the sample size.
3. The sampling distribution of sample means is normally distributed if N is
greater than 120. If N is less than 120, it is almost normally distributed.
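The first two claims can be checked by simulation (a sketch with a hypothetical population, not a proof):

```python
import random
from statistics import mean, pstdev

random.seed(7)
MU, SIGMA, N = 10.0, 2.0, 25          # hypothetical population and sample size

# 5,000 sample means of samples of size 25
means = [mean(random.gauss(MU, SIGMA) for _ in range(N)) for _ in range(5_000)]

print(round(mean(means), 1))    # close to MU (claim 1)
print(round(pstdev(means), 1))  # close to SIGMA / sqrt(N) = 0.4 (claim 2)
```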
The Central Limit Theorem

The fact that the standard error is the population standard deviation
divided by the square root of the sample size implies that the larger
the sample size, the closer (on average) a point estimate of the
population mean will be to the true population mean.

To see this, look at figure 7-4 on p. 198 of Ritchey.

This confirms the basic intuition that larger samples are better than
smaller samples for estimating population parameters.

Since we don’t usually know the population standard deviation
(otherwise, why would we need to look at a sample?) we approximate
the standard error using the sample standard deviation:
sx̄ = sx / √(n − 1)
The Central Limit Theorem


The most useful aspect of the central limit theorem is the third point:
the sampling distribution of sample means is (at least approximately)
normal.
This property holds regardless of the shape of the population
distribution of x.

To see this – and to get a better intuitive understanding of sampling
distributions – run the demo at:
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
The Central Limit Theorem

What this means is that you can use the normal distribution to make
inferences about the population parameters of variables even if the
original variables are not normally distributed!

We do this by partitioning the area under the normal curve in a
manner similar to what we’ve already been doing.
The t distribution

In fact, we use a slightly different distribution called the t distribution.

To convert x scores to t scores, we use the same formula as we do to
convert them to z scores:
xx
tx =
sx

The t distribution is very similar to the normal distribution


They are identical for sample sizes of 120 or greater
For smaller sample sizes they are flatter (see Figure 7-7 in Ritchey, p.201)
The t distribution



The t distribution varies by sample size – what we
call “degrees of freedom” (N-1)
We use the t distribution table (Statistical Table C) in a
way similar to one of the ways we use the normal
distribution table:
We identify critical values of t (Tα) corresponding to
particular critical regions (probabilities of falling
outside the range defined by Tα).
The t distribution

To use the t distribution table, we start with a pre-determined α; for
example, α = .05.

We then determine the critical value of t (Tα) associated with the
corresponding degrees of freedom.

Remember: df = n − 1 for the t distribution.
We must also keep in mind whether we are dealing with a one-tailed
or a two-tailed α.
Proportions


When our x variable is a dichotomous variable, we
can think of the mean of x within a population as the
proportion of the population falling in one of the
categories (call it “success”).
(Review) This follows logically if we code the variable
as a dummy variable, assigning a 1 to the category
denoting success and 0 to the category denoting
failure.


For example, say we convert the variable “gender” (1 for
women, 2 for men) into a dummy variable called “woman”
equalling 1 for women, 0 for men.
The mean of this variable in a population of interest will give
us the proportion of women, which is equivalent to the
probability that a randomly drawn individual is a woman.
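The dummy-coding trick in miniature (hypothetical data):

```python
# Hypothetical gender codes: 1 = woman, 2 = man
gender = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]

# Recode as a dummy variable: 1 = woman ("success"), 0 = man ("failure")
woman = [1 if g == 1 else 0 for g in gender]

# The mean of the dummy variable IS the proportion of women
proportion_women = sum(woman) / len(woman)
print(proportion_women)  # 0.5
```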
Proportions

It also follows from the characteristics of
dummy variables that the population
standard deviation is given by:
σP = √(PQ / N) = √(P(1 − P) / N)
where P denotes the proportion of successes
(probability of success) and Q denotes the
proportion of failures (probability of failure).
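The formula in code, with hypothetical values of P and N:

```python
P = 0.5          # hypothetical proportion of successes
Q = 1 - P        # proportion of failures
N = 100          # hypothetical sample size

# Standard deviation of a population proportion: sqrt(PQ / N)
sigma_P = (P * Q / N) ** 0.5
print(sigma_P)   # 0.05
```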