Random Variable
• Quantitative (numeric): Discrete or Continuous; Interval or Ratio scale
• Qualitative (categorical): Nominal or Ordinal scale

SUMMARIZING NUMERIC DATA
• Simple Frequency Table
• Grouped Frequency Table
• Histogram
• Frequency Polygon
• Cumulative Frequency Distribution

Measures of Central Location
• Arithmetic Mean
• Median
• Mode

Mean for grouped data:
population mean: $\mu = \dfrac{\sum fm}{N}$
sample mean: $\bar{x} = \dfrac{\sum fm}{n}$

Median for grouped data:
$\text{median} = L + c\,\dfrac{n/2 - CF}{f_{med}}$

Mode for grouped data:
$\text{mode} = L + c\,\dfrac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}}$

Here f = class frequency, m = class midpoint, L = lower limit of the median (or modal) class, c = class width, and CF = cumulative frequency of the immediately preceding class. In the mode formula, $f_m$, $f_{m-1}$ and $f_{m+1}$ are the frequencies of the modal class and of the classes immediately before and after it.

Measures of Dispersion (Variability)
• Range
• Variance and Standard Deviation
• Coefficient of Variation
• Non-central Locations: Inter-fractile Ranges

Standard Deviation
$s = \sqrt{\dfrac{n\sum x^2 - (\sum x)^2}{n(n-1)}}$ (ungrouped data)
$s = \sqrt{\dfrac{n\sum fm^2 - (\sum fm)^2}{n(n-1)}}$ (grouped data)

Coefficient of variation:
$CV = \dfrac{s}{\bar{x}}\,(100)\%$

Empirical Rule: for a symmetric, bell-shaped distribution, about 68% of observations lie within 1s of the mean, 95% within 2s, and 99.7% within 3s.

The Relative Positions of the Mean, Median, and Mode:
• Symmetric distribution (zero skewness): Mean = Median = Mode
• Positively skewed: Mean > Median > Mode
• Negatively skewed: Mean < Median < Mode

Non-Central Location Measures (Fractiles or Quantiles)
• Quartiles
• Sextiles
• Octiles
• Deciles
• Percentiles

Calculating Quartiles for Grouped Data
The jth quartile for grouped data is given by:
$Q_j = L + c\,\dfrac{jn/4 - CF}{f_{Q_j}}$
n = sample size
L = lower limit of the jth quartile class
c = class width
CF = less-than cumulative frequency of the immediately preceding class
$f_{Q_j}$ = frequency of the jth quartile class

Example
A sample of 20 randomly selected hospitals in the US revealed the following daily charges (in $) for a semiprivate room.
153 159 142 146 141 140 130 148 142 163
134 151 122 167 137 152 143 168 159 141
1.1 Using class intervals of width 10 units, construct a less-than cumulative frequency distribution of the above data. Let 120 units be the lower limit of the smallest class.
1.2 Draw a less-than ogive and use it to estimate the 80th percentile.
1.3 For the grouped data of question 1.1 above, calculate:
1.3.1 The mean, median and mode.
1.3.2 The interquartile range.
1.3.3 The coefficient of variation. Interpret the result obtained.

Solution
1.1
Class        Freq, f   <cum freq, F
120 < 130    1         1
130 < 140    3         4
140 < 150    8         12
150 < 160    5         17
160 < 170    3         20
             ∑ = 20

1.2
[Figure: less-than ogive; x-axis: Upper Class Limit (100 to 180), y-axis: cum Frequency (0 to 25)]
Reading the ogive at a cumulative frequency of 0.8 × 20 = 16 gives: 80th percentile = 158

1.3.1
Class        Freq, f   <cum freq, F   midpt, m   fm
120 < 130    1         1              125        125
130 < 140    3         4              135        405
140 < 150    8         12             145        1160
150 < 160    5         17             155        775
160 < 170    3         20             165        495
             ∑ = 20                              ∑ = 2960

$\bar{x} = \dfrac{\sum fm}{\sum f} = \dfrac{2960}{20} = 148$
$x_{med} = L_{med} + c\,\dfrac{n/2 - CF}{f_{med}} = 140 + 10 \times \dfrac{10 - 4}{8} = 147.5$
$x_{mode} = L_{mode} + c\,\dfrac{f_m - f_{m-1}}{2f_m - f_{m-1} - f_{m+1}} = 140 + 10 \times \dfrac{8 - 3}{16 - 3 - 5} = 146.3$

1.3.2
$Q_3 = 150 + 10 \times \dfrac{15 - 12}{5} = 156$
$Q_1 = 140 + 10 \times \dfrac{5 - 4}{8} = 141.3$
$IQR = Q_3 - Q_1 = 156 - 141.3 = 14.7$

1.3.3
Class        Midpt, m   fm         fm²
120 < 130    125        125        15625
130 < 140    135        405        54675
140 < 150    145        1160       168200
150 < 160    155        775        120125
160 < 170    165        495        81675
                        ∑ = 2960   ∑ = 440300

CV = standard deviation / mean
$s = \sqrt{\dfrac{\sum fm^2 - (\sum fm)^2/n}{n - 1}} = \sqrt{\dfrac{440300 - 2960^2/20}{19}} = 10.8$
CV = 10.8/148 = 0.073 ≡ 7.3% → the data are clustered fairly tightly around the mean.

BASIC PROBABILITY CONCEPTS
• Random Experiment
• Sample Space
• Event
• Collectively Exhaustive Events
• Dependent Events
• Independent Events
• Marginal Probability
• Joint Probability: P(A∩B) = P(B∩A)
• Conditional Probability: P(A|B) = P(A∩B)/P(B); P(B|A) = P(A∩B)/P(A)

Complement Rule: P(A') = 1 - P(A) or P(A) = 1 - P(A')
Special Multiplication Rule (independent events): P(A and B) = P(A)P(B) = P(B)P(A)
General Multiplication Rule: P(A and B) = P(A∩B) = P(A)P(B|A) or P(A and B) = P(A∩B) = P(B)P(A|B)
Special Addition Rule (mutually exclusive events): P(A or B) = P(A) + P(B)
General Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)

Example
A company manufactures a total of 8000 motorcycles a month in three plants A, B and C. Of these, plant A manufactures 4000, and plant B manufactures 3000.
At plant A, 85 out of 100 motorcycles are of standard quality or better. At plant B, 65 out of 100 motorcycles are of standard quality or better, and at plant C, 60 out of 100 motorcycles are of standard quality or better. The quality controller randomly selects a motorcycle and finds it to be of substandard quality. Calculate the probability that it has come from plant B.

Solution
P(B|substd) = No. of substd items from B / Total no. of substd items
Plant C manufactures 8000 - 4000 - 3000 = 1000 motorcycles.
No. of substd items from A = 4000 x (100 - 85)/100 = 40 x 15 = 600
No. of substd items from B = 3000 x (100 - 65)/100 = 30 x 35 = 1050
No. of substd items from C = 1000 x (100 - 60)/100 = 10 x 40 = 400
Total number of substd items = 600 + 1050 + 400 = 2050
P(B|substd) = 1050/2050 = 0.512

PROBABILITY DISTRIBUTIONS
• Properties
• Discrete distributions
• Normal distributions

Binomial Probability Distribution
$P(x) = \dfrac{n!}{x!\,(n-x)!}\,\pi^x (1-\pi)^{n-x}$

Example
According to a leading newspaper, the largest cellular phone service in the US has about 36 million subscribers out of a total of 180 million cell phone users. If six cell phone users are randomly selected, what is the probability that at least two of them subscribe to this service?

Solution
n = 6; $\pi = 36/180 = 0.2$
$P(x \ge 2) = 1 - P(0) - P(1)$
$P(0) = \dfrac{6!}{0!\,(6-0)!}(0.2)^0 (1-0.2)^6 = 0.262$
$P(1) = \dfrac{6!}{1!\,(6-1)!}(0.2)^1 (1-0.2)^5 = 0.393$
$P(x \ge 2) = 1 - 0.262 - 0.393 = 0.345$

Poisson Probability Distribution
$P(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$

Example
Customers arrive randomly and independently at a service point at an average rate of 30 per hour.
1. Calculate the probability that exactly 20 customers arrive at the service point during any given hour.
2. Calculate the probability that during any 5-minute period at least 3 customers arrive at the service point.

Solution
1. λ = 30/hr
$P(20) = \dfrac{30^{20} e^{-30}}{20!} = 0.0134$
2. λ = 30 per 60 min = 2.5 per 5 min
$P(x \ge 3) = 1 - P(0) - P(1) - P(2)$
$P(0) = \dfrac{2.5^0 e^{-2.5}}{0!} = 0.0821$
$P(1) = \dfrac{2.5^1 e^{-2.5}}{1!} = 0.2052$
$P(2) = \dfrac{2.5^2 e^{-2.5}}{2!} = 0.2565$
→ $P(x \ge 3) = 1 - 0.0821 - 0.2052 - 0.2565 = 0.456$
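The binomial and Poisson probabilities in the two examples above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `binomial_pmf` and `poisson_pmf` are illustrative and do not come from the notes.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, pi):
    """P(x) = n!/(x!(n-x)!) * pi^x * (1-pi)^(n-x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!."""
    return lam**x * exp(-lam) / factorial(x)


# Cell phone example: n = 6 users, pi = 36/180 = 0.2, at least two subscribers.
p_at_least_two = 1 - binomial_pmf(0, 6, 0.2) - binomial_pmf(1, 6, 0.2)

# Service point example, part 1: exactly 20 arrivals in an hour (lambda = 30).
p_exactly_20 = poisson_pmf(20, 30)

# Part 2: at least 3 arrivals in 5 minutes (lambda = 30 * 5/60 = 2.5).
p_at_least_three = 1 - sum(poisson_pmf(k, 2.5) for k in range(3))

print(f"Binomial P(x >= 2) = {p_at_least_two:.3f}")
print(f"Poisson  P(20)     = {p_exactly_20:.4f}")
print(f"Poisson  P(x >= 3) = {p_at_least_three:.3f}")
```

Using the complement (1 minus the lower-tail probabilities) mirrors the hand calculation and avoids summing an open-ended upper tail.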
Normal probability distribution
• Theoretically, the curve extends to infinity in both directions
• The normal curve is symmetrical
• Mean, median, and mode are equal

Standard normal or z-distribution: $\mu = 0$, $\sigma^2 = 1$
$z = \dfrac{x - \mu}{\sigma}$

[Figure: normal distribution curve, f(x) against x]

Area under the standard normal curve between 0 and z
z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Example
Six hundred candidates wrote an entrance test for admission to a management course. The marks obtained by the candidates were found to be normally distributed with a mean of 132 marks and a standard deviation of 18 marks.
1. How many candidates scored between 140 and 160 marks?
2. If the top 60 performers were given confirmed admission, calculate the minimum mark (to the nearest integer) above which a candidate would be guaranteed admission.

Solution
1. $z = \dfrac{x - \mu}{\sigma}$
$z_1 = (140 - 132)/18 = 0.4444$ → $P_1 \approx 0.172$
$z_2 = (160 - 132)/18 = 1.5556$ → $P_2 \approx 0.440$
→ P(140 < X < 160) ≈ 0.440 - 0.172 = 0.268
→ 0.268 x 600 candidates ≈ 161 candidates
2. Let $x_c$ denote the minimum mark. 60/600 = 0.1 = 10%.
$z_c = \dfrac{x_c - \mu}{\sigma}$
$P(0 < z < z_c) = 0.50 - 0.10 = 0.4$ → $z_c = 1.28$
$\dfrac{x_c - 132}{18} = 1.28$ → $x_c = 132 + 1.28 \times 18 \approx 155$

HYPOTHESIS TESTING
• What is a Hypothesis?
• What is Hypothesis Testing?
Basic Terms
• Null hypothesis
• Alternative hypothesis
• Level of significance
• Type I error
• Type II error
• Critical value
• Test statistic
• Rejection area
• Acceptance area
• One-tailed test
• Two-tailed test

Five-Step Procedure for Hypothesis Testing
Step 1: State the null and alternative hypotheses
Step 2: Determine the critical value associated with the level of significance
Step 3: Identify and calculate the test statistic
Step 4: Formulate and apply the decision rule
Step 5: Draw a conclusion

Testing a Single Population Mean
Large sample (n > 30). Test statistic: $z_{test} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Small sample (n ≤ 30). Test statistic: $t_{test} = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

Critical values of t (p = area in one tail)
df\p   0.4        0.25       0.1        0.05       0.025     0.01       0.005      0.0005
1      0.32492    1          3.077684   6.313752   12.7062   31.82052   63.65674   636.6192
2      0.288675   0.816497   1.885618   2.919986   4.30265   6.96456    9.92484    31.5991
3      0.276671   0.764892   1.637744   2.353363   3.18245   4.5407     5.84091    12.924
4      0.270722   0.740697   1.533206   2.131847   2.77645   3.74695    4.60409    8.6103
5      0.267181   0.726687   1.475884   2.015048   2.57058   3.36493    4.03214    6.8688
6      0.264835   0.717558   1.439756   1.94318    2.44691   3.14267    3.70743    5.9588
7      0.263167   0.711142   1.414924   1.894579   2.36462   2.99795    3.49948    5.4079
8      0.261921   0.706387   1.396815   1.859548   2.306     2.89646    3.35539    5.0413
9      0.260955   0.702722   1.383029   1.833113   2.26216   2.82144    3.24984    4.7809
10     0.260185   0.699812   1.372184   1.812461   2.22814   2.76377    3.16927    4.5869
11     0.259556   0.697445   1.36343    1.795885   2.20099   2.71808    3.10581    4.437
12     0.259033   0.695483   1.356217   1.782288   2.17881   2.681      3.05454    4.3178
13     0.258591   0.693829   1.350171   1.770933   2.16037   2.65031    3.01228    4.2208
14     0.258213   0.692417   1.34503    1.76131    2.14479   2.62449    2.97684    4.1405
15     0.257885   0.691197   1.340606   1.75305    2.13145   2.60248    2.94671    4.0728
16     0.257599   0.690132   1.336757   1.745884   2.11991   2.58349    2.92078    4.015
17     0.257347   0.689195   1.333379   1.739607   2.10982   2.56693    2.89823    3.9651
18     0.257123   0.688364   1.330391   1.734064   2.10092   2.55238    2.87844    3.9216
19     0.256923   0.687621   1.327728   1.729133   2.09302   2.53948    2.86093    3.8834
20     0.256743   0.686954   1.325341   1.724718   2.08596   2.52798    2.84534    3.8495
21     0.25658    0.686352   1.323188   1.720743   2.07961   2.51765    2.83136    3.8193
22     0.256432   0.685805   1.321237   1.717144   2.07387   2.50832    2.81876    3.7921
23     0.256297   0.685306   1.31946    1.713872   2.06866   2.49987    2.80734    3.7676
24     0.256173   0.68485    1.317836   1.710882   2.0639    2.49216    2.79694    3.7454
25     0.25606    0.68443    1.316345   1.708141   2.05954   2.48511    2.78744    3.7251
26     0.255955   0.684043   1.314972   1.705618   2.05553   2.47863    2.77871    3.7066
27     0.255858   0.683685   1.313703   1.703288   2.05183   2.47266    2.77068    3.6896
28     0.255768   0.683353   1.312527   1.701131   2.04841   2.46714    2.76326    3.6739

Testing a Single Population Proportion
Large sample (n > 30). Test statistic: $z_{test} = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$
Small sample (n ≤ 30). Test statistic: $t_{test} = \dfrac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$

Tests Involving Two Sample Means
Small sample sizes
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Degrees of freedom = $n_1 + n_2 - 2$

Example
Students are trained using two different formats for an accounting program. A random sample of 10 students is trained using format 1, and the numbers of errors in a prototype examination are as follows:
11 8 8 3 7 5 9 5 1 3
Another random sample of 12 students is trained using format 2, and the numbers of errors in the same examination are:
10 11 9 7 2 11 12 3 6 7 8 12
Investigate at the 10% level of significance whether there is a difference between the mean numbers of errors for the two formats.

Solution
$H_0: \mu_1 = \mu_2$ (The two formats are effectively equal)
$H_1: \mu_1 \ne \mu_2$ (The two formats are effectively not equal)
$\bar{x}_1 = 6.000$ and $\bar{x}_2 = 8.167$
$s_1 = 3.127$ and $s_2 = 3.326$
Pooled variance: $\dfrac{9(3.127)^2 + 11(3.326)^2}{20} = 10.484$
$t = \dfrac{6.000 - 8.167}{\sqrt{10.484\left(\dfrac{1}{10} + \dfrac{1}{12}\right)}} = -1.563$
$df = 10 + 12 - 2 = 20$
$t_{crit} = \pm 1.725$
Since $t$ falls in the acceptance region, we conclude that there is no difference in the mean errors.
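The pooled two-sample t statistic for the training-format example can be reproduced with a short script. This is a sketch using only the Python standard library; the function name `pooled_t` is illustrative, and the critical value 1.725 is read from the t-table (df = 20, one-tail area 0.05) rather than computed.

```python
from math import sqrt
from statistics import mean, variance  # variance() is the sample variance (n - 1)


def pooled_t(sample1, sample2):
    """Return (t, df) for the pooled-variance test of H0: mu1 = mu2."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


format1 = [11, 8, 8, 3, 7, 5, 9, 5, 1, 3]
format2 = [10, 11, 9, 7, 2, 11, 12, 3, 6, 7, 8, 12]

t_stat, df = pooled_t(format1, format2)
T_CRIT = 1.725  # assumption: taken from the t-table, df = 20, two-tailed 10% level

print(f"t = {t_stat:.3f}, df = {df}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```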
Tests Involving Two Sample Means
Large sample sizes
$z_{test} = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

Example
A union representing workers at a large industrial concern accused management of paying discriminatory wages to the workers in two production facilities, A and B. It claimed that workers in facility A were being paid differently from those in facility B. The company investigates the claim by examining the pay of 70 workers from each production facility. The results were as follows.

                Facility A   Facility B
Mean salary     $455.00      $463.00
Std deviation   $10.00       $13.00

What conclusion did the company reach? Investigate at the 5% level of significance.

Solution
$H_0: \mu_A = \mu_B$
$H_1: \mu_A \ne \mu_B$ → two-tailed test
$n_A, n_B > 30$ → z test. α = 5% → $z_{crit} = \pm 1.96$
$z_{test} = \dfrac{\bar{x}_A - \bar{x}_B}{\sqrt{s_A^2/n_A + s_B^2/n_B}} = \dfrac{455 - 463}{\sqrt{100/70 + 169/70}} = -4.081$
Since $|z_{test}| > |z_{crit}|$, reject H0 → sufficient statistical evidence to suggest a significant difference in the salaries.

Tests Involving Two Sample Proportions
$z_{test} = \dfrac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}\,\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
$\bar{p} = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$, $\bar{q} = 1 - \bar{p}$

Example
Surveys were conducted in two major cities "A" and "B" to ascertain viewer habits regarding a popular television channel. In city "A", 1000 people were interviewed and 680 said they viewed the channel. In city "B", 600 people were interviewed and 444 said they viewed the channel. Investigate, at the 5% level of significance, whether there is a significant difference between the viewing habits in the two cities.

Solution
$H_0: \pi_A = \pi_B$
$H_1: \pi_A \ne \pi_B$ → two-tailed test; α = 5% → $z_{crit} = \pm 1.96$
$\bar{p} = \dfrac{p_A n_A + p_B n_B}{n_A + n_B} = \dfrac{680 + 444}{1000 + 600} = 0.7025$, $\bar{q} = 1 - \bar{p} = 0.2975$
$z_{test} = \dfrac{p_A - p_B}{\sqrt{\bar{p}\,\bar{q}\,(1/n_A + 1/n_B)}} = \dfrac{680/1000 - 444/600}{\sqrt{0.7025 \times 0.2975 \times (1/1000 + 1/600)}} = -2.54$
Since $|z_{test}| > |z_{crit}|$, reject H0 at the 5% level of significance → sufficient statistical evidence to suggest a significant difference in the viewing habits.

Chi-square Applications
Major Characteristics:
• positively skewed
• non-negative
• family of chi-square distributions

H0: There is no difference between the observed and expected frequencies.
H1: There is a difference between the observed and the expected frequencies.
Test statistic: $\chi^2_{stat} = \sum \dfrac{(f_o - f_e)^2}{f_e}$
The critical value is a chi-square value with (k-1) degrees of freedom, where k is the number of categories.

Critical values of chi-square (area = area in the right tail)
df\area  0.995    0.99     0.975    0.95     0.90     0.75     0.5      0.25     0.10     0.05     0.025    0.01     0.005
1        0.00004  0.00016  0.00098  0.00393  0.01579  0.10153  0.45494  1.3233   2.70554  3.84146  5.02389  6.6349   7.87944
2        0.01003  0.0201   0.05064  0.10259  0.21072  0.57536  1.38629  2.77259  4.60517  5.99146  7.37776  9.21034  10.5966
3        0.07172  0.11483  0.2158   0.35185  0.58437  1.21253  2.36597  4.10834  6.25139  7.81473  9.3484   11.3449  12.8382
4        0.20699  0.29711  0.48442  0.71072  1.06362  1.92256  3.35669  5.38527  7.77944  9.48773  11.1433  13.2767  14.8603
5        0.41174  0.5543   0.83121  1.14548  1.61031  2.6746   4.35146  6.62568  9.23636  11.0705  12.8325  15.0863  16.7496
6        0.67573  0.87209  1.23734  1.63538  2.20413  3.4546   5.34812  7.8408   10.6446  12.5916  14.4494  16.8119  18.5476
7        0.98926  1.23904  1.68987  2.16735  2.83311  4.25485  6.34581  9.03715  12.017   14.0671  16.0128  18.4753  20.2777
8        1.34441  1.6465   2.17973  2.73264  3.48954  5.07064  7.34412  10.2189  13.3616  15.5073  17.5346  20.0902  21.955
9        1.73493  2.0879   2.70039  3.32511  4.16816  5.89883  8.34283  11.3888  14.6837  16.919   19.0228  21.666   23.5894
10       2.15586  2.55821  3.24697  3.9403   4.86518  6.7372   9.34182  12.5489  15.9872  18.307   20.4832  23.2093  25.1882
11       2.60322  3.05348  3.81575  4.57481  5.57778  7.58414  10.341   13.7007  17.275   19.6751  21.9201  24.725   26.7569
12       3.07382  3.57057  4.40379  5.22603  6.3038   8.43842  11.3403  14.8454  18.5494  21.0261  23.3367  26.217   28.2995
13       3.56503  4.10692  5.00875  5.89186  7.0415   9.29907  12.3398  15.9839  19.8119  22.362   24.7356  27.6883  29.8195
14       4.07467  4.66043  5.62873  6.57063  7.78953  10.1653  13.3393  17.1169  21.0641  23.6848  26.119   29.1412  31.3194
15       4.60092  5.22935  6.26214  7.26094  8.54676  11.0365  14.3389  18.2451  22.3071  24.9958  27.4884  30.5779  32.8013
16       5.14221  5.81221  6.90766  7.96165  9.31224  11.9122  15.3385  19.3689  23.5418  26.2962  28.8454  31.9999  34.2672
17       5.69722  6.40776  7.56419  8.67176  10.0852  12.7919  16.3382  20.4887  24.769   27.5871  30.191   33.4087  35.7185
18       6.2648   7.01491  8.23075  9.39046  10.8649  13.6753  17.3379  21.6049  25.9894  28.8693  31.5264  34.8053  37.1565
19       6.84397  7.63273  8.90652  10.117   11.6509  14.562   18.3377  22.7178  27.2036  30.1435  32.8523  36.1909  38.5823
20       7.43384  8.2604   9.59078  10.8508  12.4426  15.4518  19.3374  23.8277  28.412   31.4104  34.1696  37.5662  39.9969

Example
A certain drug is claimed to be effective in curing the common cold. In a clinical trial involving 500 patients having the common cold, 250 were given the drug and the rest were given sugar pills. The patients' reactions to the treatment are recorded in the table below.

              Helped   Harmed   No Effect   Total
Drug          150      30       70          250
Sugar Pills   130      40       80          250
Total         280      70       150         500

On the basis of the above data, can it be concluded, at the 5% significance level, that there is a significant difference in the effect of the drug and sugar pills?

Solution
H0: No significant difference in effect of drug and sugar pills.
H1: There is a significant difference in effect of drug and sugar pills.
α = 0.05, df = (2-1)(3-1) = 2 → $\chi^2_{crit} = 5.991$
Expected frequencies: $f_e$ = (row total x column total)/grand total, e.g. 250 x 280/500 = 140.

f_o    f_e    f_o - f_e   (f_o - f_e)²/f_e
150    140    10          0.7143
30     35     -5          0.7143
70     75     -5          0.3333
130    140    -10         0.7143
40     35     5           0.7143
80     75     5           0.3333
                          ∑ = 3.524

$\chi^2_{calc} = 3.524 < \chi^2_{crit} = 5.991$
Hence do not reject H0 at α = 0.05 → insufficient statistical evidence to suggest that there is a significant difference between drug and sugar pills.

LINEAR REGRESSION AND CORRELATION
• Correlation analysis
• Scatterplot
• Correlation coefficient
• Dependent and independent variables
• The coefficient of determination
• Linear regression equation

Correlation Coefficient Formula:
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$
The coefficient of determination = $r^2$

The regression equation: Y' = a + bX
Y' = average predicted value of Y for any X.
a = Y-intercept = estimated Y value when X = 0
b = slope of the line.
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$a = \dfrac{\sum y}{n} - b\,\dfrac{\sum x}{n}$

Example
The following data relate to the training periods and average weekly sales of seven randomly selected salesmen in a large company.

Salesman   Training (hours)   Ave weekly sales ($'000)
A          20                 44
B          5                  22
C          10                 35
D          13                 32
E          12                 27
F          8                  26
G          15                 35

1. Calculate the correlation coefficient. Comment on the value obtained.
2. Determine the coefficient of determination and interpret the value obtained.
3. Assuming a linear relation between the variables in the given data, obtain the regression equation connecting the variables.
4. Estimate the weekly sales of a salesman who had 22 h of training. Is the result reliable? Explain.

Solution
1. Let x denote training period (in hours) and let y denote sales (in $'000).

x     y     x²     y²     xy
20    44    400    1936   880
5     22    25     484    110
10    35    100    1225   350
13    32    169    1024   416
12    27    144    729    324
8     26    64     676    208
15    35    225    1225   525
∑ 83  221   1127   7299   2813

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = \dfrac{7 \times 2813 - 83 \times 221}{\sqrt{(7 \times 1127 - 83^2)(7 \times 7299 - 221^2)}} = 0.9$
→ strong positive linear relationship between x and y.
2. $r^2 = 0.81$ → 81% of the variation in Y is explained by the variation in X. The remaining 19% is due to other factors.
3. $b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{7 \times 2813 - 83 \times 221}{7 \times 1127 - 83^2} = 1.35$
$a = \bar{y} - b\bar{x} = 221/7 - 1.35 \times 83/7 = 15.56$
→ y = 15.56 + 1.35x
4. When x = 22 hours, y = 15.56 + 1.35 x 22 = 45.3 x $1000 = $45300.
No. The regression equation is valid only in the domain 5 ≤ x ≤ 20, so an estimate at x = 22 is an extrapolation.

TIME SERIES AND FORECASTING
Components
• The Secular Trend (T)
• The Cyclical Variation (C)
• The Seasonal Variation (S)
• The Irregular Variation (I)

Multiplicative Model: Y = T.C.S.I
The linear trend equation: T = a + bt

Seasonal Indices
• Moving average
• Centred moving average
• Ratio to centred moving average
• Adjusted seasonal average
• Deseasonalizing a series

Example
The following table gives the quarterly healthcare claims (in R millions) for the period 2008 to 2010.

Year   Q1     Q2     Q3     Q4
2008   14.0   15.6   21.5   18.3
2009   13.1   14.7   24.8   19.4
2010   14.4   17.3   25.6   15.8

1. Represent the above data in a time series plot.
2. Calculate the quarterly seasonal indices for healthcare claims using the ratio-to-moving-average method. Interpret the results.
3. Derive a trend line using the method of least squares.
4. Estimate the seasonally adjusted trend value of healthcare claims for the third quarter of 2011.

Solution
1.
[Figure: time series plot, "Quarterly Healthcare Claims (in Rm) for the period 2008 - 2010"; x-axis: quarters 2008 Q1 to 2010 Q4, y-axis: Claims (Rm), 10.0 to 30.0]

2.
Season    Data (Rm)   4MA (Rm)   Centred 4MA (Rm)   Unadj. SI (%)
2008 Q1   14.0                   -                  -
2008 Q2   15.6                   -                  -
                      17.350
2008 Q3   21.5                   17.238             124.7
                      17.125
2008 Q4   18.3                   17.013             107.6
                      16.900
2009 Q1   13.1                   17.313             75.7
                      17.725
2009 Q2   14.7                   17.863             82.3
                      18.000
2009 Q3   24.8                   18.163             136.5
                      18.325
2009 Q4   19.4                   18.650             104.0
                      18.975
2010 Q1   14.4                   19.075             75.5
                      19.175
2010 Q2   17.3                   18.725             92.4
                      18.275
2010 Q3   25.6                   -                  -
2010 Q4   15.8                   -                  -

          Q1     Q2     Q3      Q4
2008      -      -      124.7   107.6
2009      75.7   82.3   136.6   104.0
2010      75.6   92.4   -       -
Mean SI   75.7   87.4   130.7   105.8
Adj. SI   75.7   87.5   130.9   106.0

The seasonal influences are as follows:
Q1: substantial decrease of 24.3%
Q2: decrease of 12.5%
Q3: substantial increase of 30.9%
Q4: increase of 6.0%

3.
t        T           t²        tT
1        14.0        1         14.0
2        15.6        4         31.2
3        21.5        9         64.5
4        18.3        16        73.2
5        13.1        25        65.5
6        14.7        36        88.2
7        24.8        49        173.6
8        19.4        64        155.2
9        14.4        81        129.6
10       17.3        100       173.0
11       25.6        121       281.6
12       15.8        144       189.6
∑ = 78   ∑ = 214.5   ∑ = 650   ∑ = 1439.2

$b = \dfrac{n\sum tT - \sum t \sum T}{n\sum t^2 - (\sum t)^2} = \dfrac{12 \times 1439.2 - 78 \times 214.5}{12 \times 650 - 78^2} = 0.31$
$a = \dfrac{\sum T}{n} - b\,\dfrac{\sum t}{n} = \dfrac{214.5}{12} - 0.31 \times \dfrac{78}{12} = 15.9$
T(t) = 15.9 + 0.31t

4. Adjusted estimate for Q3 of 2011 (t = 15):
Y(2011, Q3) = T(15) x 1.309 = (15.9 + 0.31 x 15) x 1.309 = 26.9 ≡ R26.9m

STATISTICAL DECISION THEORY
Components to Decision-Making Situation
• Decision alternatives or acts
• States of nature
• Payoffs

Decision Making Without Probabilities
• Maximin Strategy
• Maximax Strategy
• Minimax Regret Strategy

Decision Making with Probabilities
• Payoff table
• Expected Payoff or Expected Monetary Value (EMV)

Decision Trees
• Decision nodes
• Event nodes
• Tree Structure
• EMV calculations

Example
A large corporation arranged to use an ocean liner as a floating hotel for its annual convention. The shipping company had to make a decision whether or not to lease the ship. If leased, the company would get a flat fee and an additional percentage of profits from the convention, which could attract as many as 50000 people. The company's analysts estimated that if the ship were leased there would be a 50% chance of realizing a profit of $700000, a 30% chance of making a profit of $800000, a 15% chance of making a profit of $900000 and a 5% chance of making a profit of $1m. If the ship were not leased, it could be used for its usual voyage over the convention duration. In this case there would be a 90% probability of making a profit of $750000 and a 10% probability that profits would be $780000.
The company has one additional option. If the ship were leased, and it became clear within the first few days of the convention that the profits were going to be in the $700000 range, the company could choose to promote the convention on its own by offering participants discounts on the ocean liner's cruises. The company's analysts believe that if this action were chosen there would be a 60% chance that profits would increase to $740000 and a 40% chance that the promotion would fail, lowering profits to $680000.
4.1 Draw a decision tree to depict the above problem.
4.2 What decision should the shipping company take? Show all working.
4.1 Decision tree
Root decision: Lease or Do not lease
• Do not lease → event node A
  - 0.9 → $750000
  - 0.1 → $780000
• Lease → event node B
  - 0.5 → decision node C (profits in the $700000 range)
      Do not promote → $700000
      Promote → event node D
        0.6 → $740000
        0.4 → $680000
  - 0.3 → $800000
  - 0.15 → $900000
  - 0.05 → $1000000

4.2 EMV = max[EMV(A), EMV(B)]
EMV(A) = $780000 x 0.1 + $750000 x 0.9 = $753000
EMV(B) = $1000000 x 0.05 + $900000 x 0.15 + $800000 x 0.3 + 0.5 x EMV(C) = $425000 + 0.5 x EMV(C)
EMV(C) = max[$700000, EMV(D)] = max[$700000, $680000 x 0.4 + $740000 x 0.6] = $716000 → promote
Hence EMV(B) = $425000 + $716000 x 0.5 = $783000
→ EMV = $783000
Decision: Lease, and then promote the convention if profits from the lease are in the $700000 range.
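The rollback in 4.2 can be checked by evaluating the tree from the leaves towards the root. The sketch below uses plain Python; the helper `emv` and the variable names are illustrative and simply mirror nodes A to D of the example.

```python
def emv(outcomes):
    """Expected monetary value of an event node given (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)


# Event node D: result of promoting the convention.
emv_d = emv([(0.6, 740_000), (0.4, 680_000)])

# Decision node C: keep the $700000 profit or promote, whichever is larger.
emv_c = max(700_000, emv_d)

# Event node B: lease the ship (the 0.5 branch leads to decision node C).
emv_b = emv([(0.5, emv_c), (0.3, 800_000), (0.15, 900_000), (0.05, 1_000_000)])

# Event node A: do not lease, run the usual voyage.
emv_a = emv([(0.9, 750_000), (0.1, 780_000)])

decision = "lease" if emv_b > emv_a else "do not lease"
print(f"EMV(A) = {emv_a:,.0f}, EMV(B) = {emv_b:,.0f}, EMV(C) = {emv_c:,.0f}")
print("Decision:", decision)
```

Taking `max` at decision nodes and a probability-weighted sum at event nodes is the same rule the hand calculation applies at each step.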