Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lecture 5 Selected material from: Ch. 7 Random variables and probability distributions Ch. 8 Sampling variability and sampling distributions Random variables A random variable is a numerical variable whose value depends on the outcome of a chance experiment. Discrete random variables have values that are isolated points on the number line. Possible values of a discrete random variable Continuous random variables have possible values that include an entire interval on the number line. Possible values of a continuous random variable 2 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Discrete probability distributions The probability distribution of a discrete random variable x gives the probability associated with each possible x value. The probabilities pi for a discrete variable x with k possible values (xi, i = 1,…,k) must satisfy 0 ≤ pi ≤ 1 for each i p1 + p2 + ... + pk = 1 The probability P(x in A) of any event A is found by summing the pi for the outcomes xi making up A. 3 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Hot tub example A store sells electric and gas hot tubs; 40% of all customers buy electric and 60% buy gas. Assume customers arrive independently to the store. What is the probability that out of the next four customers, the first three buy gas and the last one buys an electric hot tub? Find the distribution of the number of customers out of 4 that buy electric hot tubs. 4 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Hot tub example A company sells electric and gas hot tubs; 40% of all customers buy electric and 60% buy gas. What is the probability that out of the next four customers, the first three buy gas and the last one buys an electric hot tub? Solution G = customer buys gas E = customer buys electric Need P(GGGE). Assume customers are independent. P(GGGE) = P(G)P(G)P(G)P(E) = (.6)(.6)(.6)(.4) = .0864. There is a 8.64% chance of observing this sequence of purchases. 5 Hot tub example Find the distribution of the number of customers out of 4 that buy electric hot tubs; 40% buy electric, 60% gas. Solution • X = number of customers out of 4 that buy electric hot tubs. • X = 0, 1, 2, 3, or 4. • P(X = 0) = P(GGGG) = P(G)P(G)P(G)P(G) = (.6)(.6)(.6)(.6) = .1296, using independence and that 60% buy gas hot tubs. • P(X = 1) = P(EGGG) + P(GEGG) + P(GGEG) + P(GGGE) = 4(.4)(.6)(.6)(.6) noting that the order does not matter, = 4(.0864) using the previous calculation, = .3456. • P(X = 2) = 6(.4)(.4)(.6)(.6) noting that there are 6 ways to choose 2 objects out of 4 [double-check that you can write out these sequences yourself], = .3456 again. 6 Hot tub example Find the distribution of the number of customers out of 4 that buy electric hot tubs; 40% buy electric, 60% gas. Solution continued • P(X = 3) = 4(.4)(.4)(.4)(.6) = .1536. • P(X = 4) = (.4)(.4)(.4)(.4) = .0256. • Summarizing gives the following table: X 0 1 2 3 4 P(X) .1296 .3456 .3456 .1536 .0256 Double check that the P(X) sum to 1! 7 Hot tub example The distribution of the number of customers out of 4 that buy electric hot tubs is given by the following table. Find the probability that out of 4 customers at least 2 buy electric hot tubs. X 0 1 2 3 4 P(X) .1296 .3456 .3456 .1536 .0256 Solution P(X 2) = P(X = 2) + P(X = 3) = P(X = 4) = .3456 + .1536 + .0256 = .5248 There is a 52.5% chance that at least two out of four customers buy electric hot tubs. 8 Continuous Probability Distributions If one constructs histograms of the actual amount of water in collections of 1 liter bottles of water from a factory, they might observe something like the following. Sample 1: Water contents from 100 bottles measured to the nearest .1 milliliter (mL). 9 Sample 2: Water contents from 100 bottles measured to the nearest .01 mL. Limiting curve as the accuracy of water content increases © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Probability distributions for continuous random variables A probability distribution for a continuous random variable x is specified by a function f(x) called the density function, which satisfies 1. f(x) 0 2. The total area under the density curve is equal to 1: f ( x)dx 1. 10 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Probabilities: The probability that x falls in any particular interval is the area under the density curve that lies above the interval. a P(x < a) a P(x < a) Notice that for a continuous random variable x, P(x = a) = 0 for any specific value a because the “area above a point” under the curve is a line segment and hence has no area. This means P(x < a) = P(x ≤ a). 11 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. More examples of probabilities a b P(a < x < b) a b P(a < x < b) P(a < x < b) 12 a b P(a < x < b) = P(a ≤ x < b) = P(a < x ≤ b) = P(a ≤ x ≤ b) © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Method of probability calculation The probability that a continuous random variable x lies between a lower limit a and an upper limit b is P(a < x < b) = (area to the left of b) – (area to the left of a) = P(x < b) – P(x < a) = a 13 b b a © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: You’re late Define a continuous random variable x by x = the number of minutes by which the Professor starts the class late. Suppose that x has a probability distribution with density function .25 2 x 6 f(x) 0 otherwise The graph looks like .25 So this professor is always late. f(x) 1 14 2 3 4 5 6 7 x = number of minutes © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: You’re late Find the probability that the professor is between 3 and 4.5 minutes late. .25 1 2 3 4 5 6 7 The probability is represented by the orange shaded area in the graph. Since that shaded area is a rectangle, area = (base)(height)=(1.5)(.25) = .375. 15 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Mean, standard deviation The mean value of a random variable x, denoted by μx or μ, describes where the probability distribution of x is centered. Larger mean The standard deviation of a random variable x, denoted by σx or σ, describes variability in the probability distribution. Smaller standard deviation 16 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Discrete random variables The mean of a discrete random variable X is calculated as : x p(x). all values of x The variance is : σ 2X 2 (x μ) p(x) and the standard deviation all values of x (sd) is X σ 2X . 17 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Mean, variance of a linear function If x is a random variable with mean x 2 and variance X and a and b are numerical constants, the random variable y defined by y = a + bx is called a linear function of the random variable x. The mean of y = a + bx is y = a + bx = a + bx The variance of y is y abx b X 2 2 2 2 From which it follows that the standard deviation of y is y a bx b x | | Absolute value 18 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Car sales A car dealership would like to evaluate its average daily costs (and standard deviation). The daily cost depends directly on how many salesmen are working on a given day. Specifically the cost of doing business on a day involves a fixed cost of $255 plus an additional cost of $110 for each salesman working. 30% of the time just one salesman is working, 40% of the time two salesmen are working, 20% of the time three salesmen are working 10% of the time four salesmen are working. How do we solve this problem? 19 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Car sales Specifically the cost of doing business on a day involves a fixed cost of $255 plus an additional cost of $110 for each salesman working. 30% of the time just one salesman is working, 40% of the time two salesmen are working, 20% of the time three salesmen are working 10% of the time four salesmen are working. Sounds like there are two variables: cost and number of salesmen working and that these two are related to each other. This is a linear function problem! 20 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Car sales 30% of the time just one salesman is working, 40% of the time two salesmen are working, 20% of the time three salesmen are working 10% of the time four salesmen are working. This is a given distribution, so it must be the x variable. Define x to be the number of salesmen working. x 1 2 3 4 p(x) 0.3 0.4 0.2 0.1 Specifically the cost of doing business on a day involves a fixed cost of $255 plus an additional cost of $110 for each salesman working. y must be the cost since it depends on x, specifically: y = 255 + 110 x 21 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Car sales First, calculate the mean of x using the formula for discrete random variables. x 1 2 3 4 p(x) xp(x) 0.3 0.3 0.4 0.8 0.2 0.6 0.1 0.4 2.1 x 2.1 On average 2.1 salesmen are working on a given day. According to the linear function formula the mean of y = 255 + 110x is y 255110 x 255 110 x 255 110(2.1) $486 22 The average daily cost for the dealership is $486. © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Car sales Calculate the variance of x using the formula for discrete random variables. 2 x p(x) (x-) p(x) 1 0.3 0.3630 2 0.4 0.0040 3 0.2 0.1620 4 0.1 0.3610 0.8900 0.89 2 x 0.89 0.9434 x Use the linear function to obtain the standard deviation of y = 255 + 110x: 2 (1 1 0 ) 2 2x (1 1 0 ) 2 (0 .89 ) 1 07 6 9 25 5 11 0 X 255 + 110x 2 55 1 10 X 1 1 0 x 1 1 0(0 .9 4 3 4 ) 1 0 3 .7 7 255 + 110x The typical daily cost for the dealership is $486 +/‐ $103.77. 23 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Binomial distribution Properties of a Binomial experiment 1. It consists of a fixed number n of observations called trials. 2. Each trial can result in one of only two mutually exclusive outcomes labeled success (S) and failure (F). 3. Outcomes of different trials are independent. 4. The probability that a trial results in S is the same for each trial. Example: Flip a coin 10 times and count the number of times heads appears, that is a Binomial random variable X ~ Bin(10,1/2). A binomial random variable X is defined as the number of successes out of n trials: X takes on the values 0, 1, 2, …., n. 24 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Binomial distribution Let n = number of independent trials in a binomial experiment πp = constant probability that any particular trial results in a success. Then P(x) = P(x successes among n trials) = n! x (1 ) xn-x x!(n x)! Example: X ~ Bin(10,1/2) X can only be 0, 1, 2, …, up to 10 heads. The expected number of heads out of 10 tosses is 10(1/2)=5. • n! = n (n‐1) ... 2 1. Use 0! = 1. • E(x) = n • Var(x) = n(1 ‐ ) 25 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Geometric distribution Suppose an experiment consists of a sequence of trials with the same conditions as for a binomial experiment: The trials are independent. Each trial can result in one of two possible outcomes, success and failure. The probability of success is the same for all trials. x = number of trials until the first success is observed (including the success trial). The probability distribution of x is called the geometric probability distribution: 26 p(x) = (1 – )x-1 … Example: What is the chance that a coin has to be flipped three times before a head appears? X~Geo(1/2), P(X = 3)= (1/2)2(1/2)=1/8 =.125. x = 1, 2, 3, © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Normal distribution Two numbers determine a normal distribution 1. Mean µ 2. Standard deviation Probability density function f ( x) Cumulative distribution function F ( x) 27 1 2 2 e 1 2 2 ( ) x 2 , x x f (u )du Warning! Some texts write N(µ, 2), some N(µ, ). Normal Distributions = 1 Normal Distributions = 0 -6 -4 -2 0 2 4 6 8 -4 28 -2 0 2 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. 4 Standard normal distribution A normal distribution with mean 0 and standard deviation 1 is called the standard (or standardized) normal distribution. Symmetric around 0. Half of the area (0.50) lies below 0 and half above 0. 29 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Normal tables based on a standard Normal distribution P(z < ‐1.96) = .0250 Bottom half of the Normal table (negative z values) 30 z* -3.8 -3.7 -3.6 -3.5 -3.4 -3.3 -3.2 -3.1 -3.0 -2.9 -2.8 -2.7 -2.6 -2.5 -2.4 -2.3 -2.2 -2.1 -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 -0.0 0.00 0.0001 0.0001 0.0002 0.0002 0.0003 0.0005 0.0007 0.0010 0.0013 0.0019 0.0026 0.0035 0.0047 0.0062 0.0082 0.0107 0.0139 0.0179 0.0228 0.0287 0.0359 0.0446 0.0548 0.0668 0.0808 0.0968 0.1151 0.1357 0.1587 0.1841 0.2119 0.2420 0.2743 0.3085 0.3446 0.3821 0.4207 0.4602 0.5000 0.01 0.0001 0.0001 0.0002 0.0002 0.0003 0.0005 0.0007 0.0010 0.0013 0.0019 0.0025 0.0034 0.0046 0.0061 0.0080 0.0104 0.0136 0.0174 0.0222 0.0281 0.0351 0.0436 0.0537 0.0655 0.0793 0.0951 0.1131 0.1335 0.1562 0.1814 0.2090 0.2389 0.2709 0.3050 0.3409 0.3783 0.4168 0.4562 0.4960 0.02 0.0001 0.0001 0.0001 0.0002 0.0003 0.0005 0.0006 0.0009 0.0013 0.0018 0.0024 0.0033 0.0044 0.0059 0.0078 0.0102 0.0132 0.0170 0.0217 0.0274 0.0344 0.0427 0.0526 0.0643 0.0778 0.0934 0.1112 0.1314 0.1539 0.1788 0.2061 0.2358 0.2676 0.3015 0.3372 0.3745 0.4129 0.4522 0.4920 0.03 0.0001 0.0001 0.0001 0.0002 0.0003 0.0004 0.0006 0.0009 0.0012 0.0017 0.0023 0.0032 0.0043 0.0057 0.0075 0.0099 0.0129 0.0166 0.0212 0.0268 0.0336 0.0418 0.0516 0.0630 0.0764 0.0918 0.1093 0.1292 0.1515 0.1762 0.2033 0.2327 0.2643 0.2981 0.3336 0.3707 0.4090 0.4483 0.4880 0.04 0.0001 0.0001 0.0001 0.0002 0.0003 0.0004 0.0006 0.0008 0.0012 0.0016 0.0023 0.0031 0.0041 0.0055 0.0073 0.0096 0.0125 0.0162 0.0207 0.0262 0.0329 0.0409 0.0505 0.0618 0.0749 0.0901 0.1075 0.1271 0.1492 0.1736 0.2005 0.2296 0.2611 0.2946 0.3300 0.3669 0.4052 0.4443 0.4840 0.05 0.0001 0.0001 0.0001 0.0002 0.0003 0.0004 0.0006 0.0008 0.0011 0.0016 0.0022 0.0030 0.0040 0.0054 0.0071 0.0094 0.0122 0.0158 0.0202 0.0256 0.0322 0.0401 0.0495 0.0606 0.0735 0.0885 0.1056 0.1251 0.1469 0.1711 0.1977 0.2266 0.2578 0.2912 0.3264 0.3632 0.4013 0.4404 0.4801 0.06 0.0001 0.0001 0.0001 0.0002 0.0003 0.0004 0.0006 0.0008 0.0011 0.0015 0.0021 0.0029 0.0039 0.0052 0.0069 0.0091 0.0119 0.0154 0.0197 0.0250 0.0314 0.0392 0.0485 0.0594 0.0721 0.0869 0.1038 0.1230 0.1446 0.1685 0.1949 0.2236 0.2546 0.2877 0.3228 0.3594 0.3974 0.4364 0.4761 0.07 0.0001 0.0001 0.0001 0.0002 0.0003 0.0004 0.0005 0.0008 0.0011 0.0015 0.0021 0.0028 0.0038 0.0051 0.0068 0.0089 0.0116 0.0150 0.0192 0.0244 0.0307 0.0384 0.0475 0.0582 0.0708 0.0853 0.1020 0.1210 0.1423 0.1660 0.1922 0.2206 0.2514 0.2843 0.3192 0.3557 0.3936 0.4325 0.4721 0.08 0.0001 0.0001 0.0001 0.0002 0.0003 0.0004 0.0005 0.0007 0.0010 0.0014 0.0020 0.0027 0.0037 0.0049 0.0066 0.0087 0.0113 0.0146 0.0188 0.0239 0.0301 0.0375 0.0465 0.0571 0.0694 0.0838 0.1003 0.1190 0.1401 0.1635 0.1894 0.2177 0.2483 0.2810 0.3156 0.3520 0.3897 0.4286 0.4681 0.09 0.0001 0.0001 0.0001 0.0002 0.0002 0.0003 0.0005 0.0007 0.0010 0.0014 0.0019 0.0026 0.0036 0.0048 0.0064 0.0084 0.0110 0.0143 0.0183 0.0233 0.0294 0.0367 0.0455 0.0559 0.0681 0.0823 0.0985 0.1170 0.1379 0.1611 0.1867 0.2148 0.2451 0.2776 0.3121 0.3483 0.3859 0.4247 0.4641 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Normal tables Top half of the Normal table (positive z values) 31 z* 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 0.00 0.5000 0.5398 0.5793 0.6179 0.6554 0.6915 0.7257 0.7580 0.7881 0.8159 0.8413 0.8643 0.8849 0.9032 0.9192 0.9332 0.9452 0.9554 0.9641 0.9713 0.9772 0.9821 0.9861 0.9893 0.9918 0.9938 0.9953 0.9965 0.9974 0.9981 0.9987 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.01 0.5040 0.5438 0.5832 0.6217 0.6591 0.6950 0.7291 0.7611 0.7910 0.8186 0.8438 0.8665 0.8869 0.9049 0.9207 0.9345 0.9463 0.9564 0.9649 0.9719 0.9778 0.9826 0.9864 0.9896 0.9920 0.9940 0.9955 0.9966 0.9975 0.9982 0.9987 0.9991 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.02 0.5080 0.5478 0.5871 0.6255 0.6628 0.6985 0.7324 0.7642 0.7939 0.8212 0.8461 0.8686 0.8888 0.9066 0.9222 0.9357 0.9474 0.9573 0.9656 0.9726 0.9783 0.9830 0.9868 0.9898 0.9922 0.9941 0.9956 0.9967 0.9976 0.9982 0.9987 0.9991 0.9994 0.9995 0.9997 0.9998 0.9999 0.9999 0.9999 0.03 0.5120 0.5517 0.5910 0.6293 0.6664 0.7019 0.7357 0.7673 0.7967 0.8238 0.8485 0.8708 0.8907 0.9082 0.9236 0.9370 0.9484 0.9582 0.9664 0.9732 0.9788 0.9834 0.9871 0.9901 0.9925 0.9943 0.9957 0.9968 0.9977 0.9983 0.9988 0.9991 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.04 0.5160 0.5557 0.5948 0.6331 0.6700 0.7054 0.7389 0.7704 0.7995 0.8264 0.8508 0.8729 0.8925 0.9099 0.9251 0.9382 0.9495 0.9591 0.9671 0.9738 0.9793 0.9838 0.9875 0.9904 0.9927 0.9945 0.9959 0.9969 0.9977 0.9984 0.9988 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.05 0.5199 0.5596 0.5987 0.6368 0.6736 0.7088 0.7422 0.7734 0.8023 0.8289 0.8531 0.8749 0.8944 0.9115 0.9265 0.9394 0.9505 0.9599 0.9678 0.9744 0.9798 0.9842 0.9878 0.9906 0.9929 0.9946 0.9960 0.9970 0.9978 0.9984 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.06 0.5239 0.5636 0.6026 0.6406 0.6772 0.7123 0.7454 0.7764 0.8051 0.8315 0.8554 0.8770 0.8962 0.9131 0.9279 0.9406 0.9515 0.9608 0.9686 0.9750 0.9803 0.9846 0.9881 0.9909 0.9931 0.9948 0.9961 0.9971 0.9979 0.9985 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.07 0.5279 0.5675 0.6064 0.6443 0.6808 0.7157 0.7486 0.7794 0.8078 0.8340 0.8577 0.8790 0.8980 0.9147 0.9292 0.9418 0.9525 0.9616 0.9693 0.9756 0.9808 0.9850 0.9884 0.9911 0.9932 0.9949 0.9962 0.9972 0.9979 0.9985 0.9989 0.9992 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.08 0.5319 0.5714 0.6103 0.6480 0.6844 0.7190 0.7517 0.7823 0.8106 0.8365 0.8599 0.8810 0.8997 0.9162 0.9306 0.9429 0.9535 0.9625 0.9699 0.9761 0.9812 0.9854 0.9887 0.9913 0.9934 0.9951 0.9963 0.9973 0.9980 0.9986 0.9990 0.9993 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.09 0.5359 0.5753 0.6141 0.6517 0.6879 0.7224 0.7549 0.7852 0.8133 0.8389 0.8621 0.8830 0.9015 0.9177 0.9319 0.9441 0.9545 0.9633 0.9706 0.9767 0.9817 0.9857 0.9890 0.9916 0.9936 0.9952 0.9964 0.9974 0.9981 0.9986 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.9999 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Calculations using the standard normal distribution P(z < 1.83) = 0.9664 32 P(z > 1.83) = 1 – P(z < 1.83) = 1 – 0.9664 = 0.0336 Symmetry: P(z < ‐1.83) = P(z > 1.83) = 0.0336 Intervals: P(‐1.83 < z < 1.83) = P(z < 1.83) ‐ P(z < ‐1.83) = 0.9664 – 0.0336 = 0.9328 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Finding quantiles Using the standard Normal tables, in each of the following, find the z values that satisfy : The point z with 98% of the observations falling below it. The closest entry in the table to 0.9800 is 0.9798 corresponding to a z value of 2.05. 33 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Normal tables need .98 .9798 is the closest probability to .98. Row # + Column # = 2.0 + .05 = 2.05 is the answer 34 z* 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 0.00 0.5000 0.5398 0.5793 0.6179 0.6554 0.6915 0.7257 0.7580 0.7881 0.8159 0.8413 0.8643 0.8849 0.9032 0.9192 0.9332 0.9452 0.9554 0.9641 0.9713 0.9772 0.9821 0.9861 0.9893 0.9918 0.9938 0.9953 0.9965 0.9974 0.9981 0.9987 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.01 0.5040 0.5438 0.5832 0.6217 0.6591 0.6950 0.7291 0.7611 0.7910 0.8186 0.8438 0.8665 0.8869 0.9049 0.9207 0.9345 0.9463 0.9564 0.9649 0.9719 0.9778 0.9826 0.9864 0.9896 0.9920 0.9940 0.9955 0.9966 0.9975 0.9982 0.9987 0.9991 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.02 0.5080 0.5478 0.5871 0.6255 0.6628 0.6985 0.7324 0.7642 0.7939 0.8212 0.8461 0.8686 0.8888 0.9066 0.9222 0.9357 0.9474 0.9573 0.9656 0.9726 0.9783 0.9830 0.9868 0.9898 0.9922 0.9941 0.9956 0.9967 0.9976 0.9982 0.9987 0.9991 0.9994 0.9995 0.9997 0.9998 0.9999 0.9999 0.9999 0.03 0.5120 0.5517 0.5910 0.6293 0.6664 0.7019 0.7357 0.7673 0.7967 0.8238 0.8485 0.8708 0.8907 0.9082 0.9236 0.9370 0.9484 0.9582 0.9664 0.9732 0.9788 0.9834 0.9871 0.9901 0.9925 0.9943 0.9957 0.9968 0.9977 0.9983 0.9988 0.9991 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.04 0.5160 0.5557 0.5948 0.6331 0.6700 0.7054 0.7389 0.7704 0.7995 0.8264 0.8508 0.8729 0.8925 0.9099 0.9251 0.9382 0.9495 0.9591 0.9671 0.9738 0.9793 0.9838 0.9875 0.9904 0.9927 0.9945 0.9959 0.9969 0.9977 0.9984 0.9988 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.05 0.5199 0.5596 0.5987 0.6368 0.6736 0.7088 0.7422 0.7734 0.8023 0.8289 0.8531 0.8749 0.8944 0.9115 0.9265 0.9394 0.9505 0.9599 0.9678 0.9744 0.9798 0.9842 0.9878 0.9906 0.9929 0.9946 0.9960 0.9970 0.9978 0.9984 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.06 0.5239 0.5636 0.6026 0.6406 0.6772 0.7123 0.7454 0.7764 0.8051 0.8315 0.8554 0.8770 0.8962 0.9131 0.9279 0.9406 0.9515 0.9608 0.9686 0.9750 0.9803 0.9846 0.9881 0.9909 0.9931 0.9948 0.9961 0.9971 0.9979 0.9985 0.9989 0.9992 0.9994 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.07 0.5279 0.5675 0.6064 0.6443 0.6808 0.7157 0.7486 0.7794 0.8078 0.8340 0.8577 0.8790 0.8980 0.9147 0.9292 0.9418 0.9525 0.9616 0.9693 0.9756 0.9808 0.9850 0.9884 0.9911 0.9932 0.9949 0.9962 0.9972 0.9979 0.9985 0.9989 0.9992 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.08 0.5319 0.5714 0.6103 0.6480 0.6844 0.7190 0.7517 0.7823 0.8106 0.8365 0.8599 0.8810 0.8997 0.9162 0.9306 0.9429 0.9535 0.9625 0.9699 0.9761 0.9812 0.9854 0.9887 0.9913 0.9934 0.9951 0.9963 0.9973 0.9980 0.9986 0.9990 0.9993 0.9995 0.9996 0.9997 0.9998 0.9999 0.9999 0.9999 0.09 0.5359 0.5753 0.6141 0.6517 0.6879 0.7224 0.7549 0.7852 0.8133 0.8389 0.8621 0.8830 0.9015 0.9177 0.9319 0.9441 0.9545 0.9633 0.9706 0.9767 0.9817 0.9857 0.9890 0.9916 0.9936 0.9952 0.9964 0.9974 0.9981 0.9986 0.9990 0.9993 0.9995 0.9997 0.9998 0.9998 0.9999 0.9999 0.9999 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Standardization = true population mean = true population standard deviation The formula x z gives the number of standard deviations that x is from the mean. • Analog to the Z‐score (Ch. 4), which used the sample mean and standard deviation. • Z then follows the standard normal distribution N(0,1). 35 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Finding normal probabilities To calculate probabilities for any Normal distribution, standardize the relevant values and then use the table of z curve areas. More specifically, if x is a variable whose behavior is described by a Normal distribution with mean μ and standard deviation σ, then x b * P ( x b) P P( z b ) b * where z ~ N(0,1) and b is the number you should look up in the standard Normal table. 36 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example : Picante sauce A company produces “20 ounce” jars of picante sauce. Suppose the “20 ounce” jars follow a N(20.2,0.1252) distribution curve. What proportion of the jars are under‐filled, defined as having less than 20 ounces of sauce? Looking up ‐1.60 on the standard normal table yields P(Z ‐1.60)= 0.0548. 5.48% of the jars contain less than 20 ounces of sauce. Solution z 37 x 2 0 2 0 .2 1 .6 0 0 .1 2 5 Example: Picante sauce 99% of the jars contain more than what amount of sauce? Recall, = 20.2 ounces and = 0.125 ounces. Solution: When we try to look up 0.0100 in the body of the table we do not find this value. We do find the following z - 2 .4 - 2 .3 - 2 .2 0 .0 0 .0 0 0 .0 1 0 .0 1 0 82 07 39 0 .0 0 .0 0 0 .0 1 0 .0 1 1 80 05 35 0 .0 0 .0 0 0 .0 1 0 .0 1 2 78 02 32 0 .0 0 .0 0 0 .0 0 0 .0 1 3 75 99 29 0 .0 0 .0 0 0 .0 0 0 .0 1 4 73 96 25 0 .0 0 .0 0 0 .0 0 0 .0 1 5 71 94 22 The entry closest to 0.0100 is 0.0099 the z value ‐2.33. Since the relationship between the real scale (x) and the z scale is z=(x‐ )/ we solve for x getting x = + z x = 20.2+(‐2.33)(0.125) = 19.90875 = 19.91. 99% of all “20 oz” jars contain more than 19.91 ounces (oz). 38 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Properties of the sampling distribution of the sample mean 1 n X X i , where X 1 , X 2 ,..., X n independent with mean , variance 2 . n i 1 Rule 1 : X Central Limit Theorem Rule 2 : X n Rule 3 : When X i normally distributed for i 1,...,n, then X also normal. Rule 4 : When X i not normally distributed but n large ( 30) then, X also normal. Symmetric distributions 39 Population n =4 n=9 n = 16 Skewed distributions Population n=4 n=10 n=30 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Central Limit Theorem If n is large (> 30) or the population distribution is normal, the standardized variable x X x z X n has approximately a standard normal distribution, N(0,1). 40 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Cereal A food company sells “18 ounce” boxes of cereal. Let x denote the actual amount of cereal in a box of cereal. Suppose that x is normally distributed with µ = 18.03 ounces and = 0.05. a) What proportion of the boxes will contain less than 18 ounces? Solution 18 18.03 P(x 18) P z 0.05 P(z 0.60) 0.2743 Note! You did not need the central limit theorem here. 41 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Example: Cereal b) A case consists of 24 boxes of cereal. What is the probability that the mean amount of cereal across the 24 boxes is less than 18 ounces? Now you need the central limit theorem! 18 18.03 P(x 18) P z 0.05 24 P(z 2.94) 0.0016 This chance is much less than for a single box (p=0.27). 42 © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Properties of the sampling distribution of p Let p be the proportion of successes in a random sample of size n from a population whose proportion of successes is . Denote the mean of p by p and the standard deviation by p. Then the following rules hold. Rule 1: p (1 ) Rule 2: p n Rule 3: When n is large and is not too near 0 or 1, the sampling distribution of p is approximately Normal. Informal Rule If both np ≥ 10 and n(1‐p) 10, then it is safe to use a Normal approximation. 43 Example: Defectives If the true proportion of defectives produced by a certain manufacturing process is 0.08 and a sample of 400 is chosen, what is the probability that the proportion of defectives in the sample is greater than 0.10? Solution: Since n = 400(0.08) = 32 > 10 and n(1‐) = 400(0.92) = 368 > 10, it is reasonable to use the normal approximation. p 0.08 p z p p p (1 ) 0.08(1 0.08) 0.013565 n 400 0.10 0.08 1.47 0.013565 P( p 0.1) P( z 1.47) 1 P( z 1.47) 1 0.9292 (from the standard normal table) 0.0708 44 There is a 7.1% chance that the proportion of defectives exceeds 0.10. © 2008 Brooks/Cole, a division of Thomson Learning, Inc. Tutorial 5a: Consulting times The amount of time spent by a statistical consultant with a client is a random variable having a Normal distribution with mean 60 minutes and standard deviation 10 minutes. Use the Normal tables to help answer the questions below. 1.) What is the probability that more than 45 minutes is spent with the consultant? 2.) What amount of time is exceeded by only 10% of all clients at a meeting? 3.) If the consultant assesses a fixed charge of 10 euros (overhead) and then 50 euros per hour, what is the average cost of a meeting? 45 Tutorial 5b: Playing catch Sophie is a dog that loves to play catch. Unfortunately she is not very good and only catches 10% of all the balls thrown to her. 1.) Let x be the number of times you have to throw a ball to Sophie before she catches it. What is the distribution of x? Calculate the probability that it takes two throws before Sophie catches the ball, i.e., she misses the first but catches the second ball. 2.) Suppose you will throw the ball five times to Sophie and let x be the number of times out of five that she catches the ball. What is the distribution of x? What is the chance she catches none of the five balls? Does this result make intuitive sense? Why? 46