Lecture Outlines for Applied Business Statistics
Rene Leo E. Ordonez, Southern Oregon University
Summer 2010

Note: The problems below are based on the text Doing Statistics for Business with Excel, 2nd Edition, by Pelosi and Sandifer. The problems in each section are courtesy of the same text.

Contents
Reference to Excel and Minitab (Statistical functions)
Introduction
Probability Distributions
Sampling Distributions and Confidence Intervals
Hypothesis Testing: An Introduction
Inferences: One Population (Hypothesis Testing)
Comparing Two Populations
Improving and Managing Quality
Experimental Design and Analysis of Variance (ANOVA)
Analysis of Qualitative Data (Chi-square)
Regression and Correlation
Sample Midterm Exam
Sample Final Exam

REFERENCE TO EXCEL 2007 AND MINITAB (by statistical procedure)

Descriptive Statistics (mean, median, etc.)
  Excel 2007: Data > Data Analysis > Descriptive Statistics
  Minitab: Stat > Basic Statistics > Display Descriptive Statistics

Confidence Interval Estimates
  Mean
    Excel 2007: Data > Data Analysis > Descriptive Statistics
    Minitab: Stat > Basic Statistics > 1 Sample t (or 1 Sample z)
  Proportion
    Excel 2007: NONE
    Minitab: Stat > Basic Statistics > 1 Proportion

One Population Hypothesis Test
  Mean
    Excel 2007: Data > Data Analysis > Descriptive Statistics
    Minitab: Stat > Basic Statistics > 1 Sample t (or 1 Sample z)
  Proportion
    Excel 2007: NONE
    Minitab: Stat > Basic Statistics > 1 Proportion

Two Populations Hypothesis Test
  Means of 2 Dependent Samples
    Excel 2007: Data > Data Analysis > t-Test: Paired Two Sample for Means
    Minitab: Stat > Basic Statistics > Paired t
  Means of 2 Independent Samples (small samples, equal vars.)
    Excel 2007: Data > Data Analysis > t-Test: Two Sample Assuming Equal Variances
    Minitab: Stat > Basic Statistics > 2 Sample t
  Means of 2 Independent Samples (small samples, unequal vars.)
    Excel 2007: Data > Data Analysis > t-Test: Two Sample Assuming Unequal Variances
    Minitab: Stat > Basic Statistics > 2 Sample t
  Means of 2 Independent Samples (large samples)
    Excel 2007: Data > Data Analysis > z-Test: Two Sample for Means
    Minitab: Stat > Basic Statistics > 2 Sample z
  Variances of 2 Populations
    Excel 2007: Data > Data Analysis > F-Test Two-Sample for Variances
    Minitab: Stat > Basic Statistics > 2 Variances
  Proportion of 2 Populations
    Excel 2007: NONE
    Minitab: Stat > Basic Statistics > 2 Proportions

Analysis of Variance
  One Factor (use for comparing 2 or more population means)
    Excel 2007: Data > Data Analysis > Anova: Single Factor
    Minitab: Stat > ANOVA > Oneway Unstacked (or Stacked)
  Two Factor With Replication
    Excel 2007: Data > Data Analysis > Anova: Two-Factor With Replication
    Minitab: Stat > ANOVA > Twoway
  Two Factor Without Replication
    Excel 2007: Data > Data Analysis > Anova: Two-Factor Without Replication
    Minitab: Stat > ANOVA > Twoway
  Interaction Effect Plot
    Excel 2007: NONE
    Minitab: Stat > ANOVA > Interactions Plot

Chi-square Analysis
  Goodness of Fit Test
    Excel 2007: NONE
    Minitab: NONE
  Comparing Proportions of Two or More Groups; Testing Independence of Two Nominal Variables
    Excel 2007: NONE
    Minitab: Stat > Tables > Cross Tabulation (for raw data)
             Stat > Tables > Chisquare Test (for summarized data)

Comparing Variances of Two Populations
  Excel 2007: Data > Data Analysis > F-Test Two-Sample for Variances
  Minitab: Stat > Basic Statistics > 2 Variances

Regression and Correlation Analysis
  Excel 2007: Data > Data Analysis > Regression
  Minitab: Stat > Regression (Fitted Line Plot, Regression, Residual Plots)

STATISTICAL PROCESSING USING MINITAB
A free 30-day trial copy of the full commercial version of Minitab can be downloaded from www.minitab.com

Basic Statistics (Confidence Interval Estimation, Hypothesis Testing)
- Use for generating means, standard deviation, etc.
- Use for testing a population mean (n ≥ 30 or σ known)
- Use for testing a population mean (n < 30, σ unknown, and normal)
- Use for testing a population proportion (approximation to the binomial)
- Use for comparing means of two INDEPENDENT samples
- Use for comparing means of two DEPENDENT samples
- Use for comparing proportions of two populations
- Use for testing variances (or standard deviations) of two groups

One-way and Two-way ANOVA
- Use for One-way ANOVA (procedure for comparing means of two or more independent samples)
- Use for two-way ANOVA with replication
- Use for generating an interactions plot for two-way ANOVA with replication

Chisquare Tests
- Use for performing a Chisquare test using RAW data (procedure for testing whether two qualitative variables are independent)
- Use for performing a Chisquare test using TABULATED data

Regression and Correlation Analysis
- Use for generating output for regression and correlation analysis (for simple and multiple models)

Lecture Notes to Accompany BA 282: Applied Business Statistics. Ordonez, School of Business, SOU.

STATISTICAL PROCESSING USING EXCEL
Data > Data Analysis
- Use for One-way ANOVA
- Use for two-way ANOVA with replication
- Use for generating means, standard deviation, etc.
- Use for testing variances (or standard deviations) of two groups
- Use for regression and correlation analysis
- Use for comparing means of two DEPENDENT samples
- Use for comparing means of two INDEPENDENT samples (n < 30 and equal variances)
- Use for comparing means of two INDEPENDENT samples (n < 30 and unequal variances)
- Use for comparing means of two INDEPENDENT samples (n ≥ 30)

Important Note: If you don't see the Data Analysis option under the DATA tab, you have to add it in. Here are the steps:
1. Click on the Microsoft Office button (upper left corner), then select Excel Options
2. Click Add-Ins, then click Analysis ToolPak VBA, then Go
3. Select Analysis ToolPak and Analysis ToolPak VBA, then click OK

INTRODUCTION

1. What is statistics?
A science that deals with rules and procedures that govern how to:
- collect
- summarize
- describe
- interpret
data

2. Why study statistics?
Decisions! Decisions! Decisions!

3. The importance of understanding probability
Some 'real life' examples (it's just a game!)

Monty Hall Dilemma
Suppose you're on a game show, and you're given the choice of three doors. Behind one door is a car, behind the others, goats. You pick a door, say number 1, and the host, who knows what's behind the doors, opens another door, say number 3, which has a goat. He says to you, "Do you want to pick door number 2?" Is it to your advantage to switch your choice of doors? (Craig F. Whitaker, Columbia, MD)

Three Shell Game
Operator: Step right up, folks. See if you can guess which shell the pea is under. Double your money if you win.
After playing the game a while, Mr. Mark decided he couldn't win more than once out of three.
Operator: Don't leave, Mac. I'll give you a break. Pick any shell. I'll turn over an empty one. Then the pea has to be under one of the other two, so your chances of winning go way up.

4. Ways of assigning (determining) probabilities
- Subjective: describes an individual's personal judgement about how likely a particular event is to occur. It is not based on any precise computation but is often a reasonable assessment by a knowledgeable person.
- Relative: relative probability is another term for proportion; it is the value calculated by dividing the number of times an event occurs (x) by the total number of times an experiment is carried out (n):
  $p(x) = \frac{x}{n}$
- Objective (classical): probability based on the symmetry of games of chance or similar situations. For example:
    Coin tossing experiment: P(head)
    Die tossing experiment: P("one")
    Monty Hall Dilemma: P(win)

5. Important statistical terms and concepts (KNOW THESE DEFINITIONS AND SYMBOLS!)

Population vs. sample
- population: any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about.
- sample: a group of items selected from a population. Conclusions about the population are drawn by studying the sample.

Parameter vs. statistic
- parameter: a numeric characteristic of a population
- statistic: a numeric characteristic of a sample. It is used to estimate an unknown population parameter.
Parameters are often assigned Greek letters (e.g. μ, σ, π), whereas statistics are assigned Roman letters (e.g. x̄, s, p).

Common measures of central tendency: mean, median, mode
Common measures of dispersion: range, variance, standard deviation

Common symbols used in statistics (Important Symbols: Must Know)

                                    POPULATION      SAMPLE
                                    (Parameters)    (Statistics)
Size                                N               n
One Population
  Mean                              μ               x̄
  Variance                          σ²              s²
  Standard deviation                σ               s
  Proportion                        π               p
Two Populations
  Comparing Means                   μ1 vs μ2        x̄1 vs x̄2
  Comparing Proportions             π1 vs π2        p1 vs p2
  Comparing Variances               σ²1 vs σ²2      s²1 vs s²2
  Comparing Standard Deviations     σ1 vs σ2        s1 vs s2

PROBABILITY DISTRIBUTIONS

1. Definitions
- probability: likelihood or chance of an event occurring
- experiment: any process or study which results in the collection of data, the outcome of which is unknown
- random variable: an outcome of an experiment. It need not be a number; for example, the outcome when a coin is tossed can be 'heads' or 'tails'. However, we often want to represent outcomes as numbers. Usually denoted by the letter "X".
Examples:
  Toss a coin 5 times (experiment), observe the number of heads (variable)
  Randomly select 20 students (sample), record each student's GPA (variable)

2. Random variables
- discrete (185): usually involves counting (e.g. number of defectives, number of correct answers, etc.). If a random variable can take only a finite number of distinct values, then it must be discrete.
  In the coin tossing experiment above, the random variable is "number of heads": x = {0, 1, 2, 3, 4, 5}
- continuous (186): usually involves something that is measured. A continuous random variable can take any value within an interval, so it has an infinite number of possible values. Examples include height, weight, the amount of sugar in an orange, and the time required to run a mile.
  In the student sampling above, the random variable is GPA: x = {0 to 4.0}

3. Common Discrete Probability Distributions
- Uniform
- Binomial (191). The trials must meet the following requirements:
    a) the total number of trials (n) is fixed in advance;
    b) there are just two outcomes of each trial: success and failure;
    c) the outcomes of all the trials are statistically independent;
    d) all the trials have the same probability of success
  Example: coin tossing
- Hypergeometric (200)
    a) each trial has just two outcomes: success and failure;
    b) the outcomes of all the trials are statistically dependent;
    c) the probability of success changes from trial to trial
- Poisson (203): typically, a Poisson random variable is a count of the number of events that occur in a certain time interval or spatial area. For example, the number of cars passing a fixed point in a 5 minute interval; the number of calls received by a switchboard during a given period of time.

4. Common Continuous Probability Distributions
- Uniform (219)
  [Figure: uniform density f(x) over the interval from A to B]
- Exponential (not covered here, but will be introduced and covered in BA 380 - Operations Management)
- Normal (223 to 228)

5. The Normal Distribution (223 to 228)
Characteristics:
- bell-shaped
- mean = median = mode
- area underneath the curve equals 1
- symmetric about the mean (left side is mirror-image of right side)
- area left of mean = 0.50 = area right of mean
- asymptotic

6. How to find areas (probabilities) using the Normal table (229 to 234)
A MUST UNDERSTAND CONCEPT!

Standard Normal Distribution (table entries are cumulative areas, P(Z < z))

 z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986

EXCEL FUNCTIONS FOR NORMAL DISTRIBUTION
IMPORTANT: LEARN HOW TO USE THESE FUNCTIONS!

Exercises in Using the Standard Normal Table
Use Excel's =NORMSDIST(z) function or the Standard Normal Distribution table to answer the problems below.

1) Draw the normal distribution, and shade and find the areas (probabilities) of the following expressions:
e.g. P(Z > 0) = ?
a) P(Z < 1.0)
b) P(Z > 1.0)
c) P(Z < -1.0)
d) P(-1.0 < Z < 1.0)
e) P(1.0 < Z < 2.5)
f) P(Z > 2.65)

2) Given probabilities and their respective probability expressions, draw the normal distribution, shade the areas and find the corresponding z values. Use Excel's =NORMSINV(area) function or the Standard Normal Distribution table to answer the problems below.
a) P(Z < z) = 0.95
b) P(Z > z) = 0.95
c) P(Z < z) = 0.25
d) P(Z > z) = 0.25

Learning it! Exercises

The amount of money spent by students for textbooks in a semester is a normally distributed random variable with a mean of $235 and a standard deviation of $15.
(a) Sketch the normal distribution that describes the amount of money spent on textbooks in a semester.
(b) What is the probability that a student spends between $220 and $250 in any semester?
(c) What percentage of students spend more than $270 on textbooks in any semester?
(d) What percentage of students spend less than $225 in a semester?

The actual amount of a certain brand of orange juice in a container marked half gallon is a normally distributed random variable with a mean of 65 oz. and a standard deviation of 0.35 oz.
(a) What percentage of the containers contain more than 64.5 oz?
(b) What percentage of the containers contain between 64 and 66 oz?
(c) If federal law says that 98% of all the containers must be at or above the labeled weight, does this brand of orange juice meet the requirement?

The size of a gift/specialty store in a regional super mall is a normally distributed random variable with a mean of 8,500 sq ft and a standard deviation of 260 sq ft. What is the probability that a randomly selected gift/specialty store in a regional super mall is:
a) more than 8,000 sq ft?
b) between 8,300 and 9,000 sq ft?
c) less than 9,500 sq ft?

SAMPLING DISTRIBUTIONS AND CONFIDENCE INTERVALS

1. The Distribution of the Sample Mean (x̄) and the Central Limit Theorem (266 to 269)

Central Limit Theorem Definition (271): When randomly sampling from a population, the distribution of the sample mean (x̄) is:
- approximately normal regardless of the original population distribution, so long as the sample is large (at least 30; this sample size restriction is not required if the population is normal to begin with)
- with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of n:
  $\mu_{\bar{x}} = \mu$
  $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

2. Confidence Intervals

Use: For estimating unknown population parameters

Definition of confidence interval (290)
An interval estimate with an associated probability (the confidence level) that the interval contains the true population parameter, e.g.
  $P(L \le \mu \le U) = 1 - \alpha$

Components of a confidence interval: point estimate and margin of error

a. Point estimate (290)
- A single number that is calculated from sample data
- Is used to estimate a population parameter, e.g. the sample mean is a point estimate of the population mean, and the sample proportion is a point estimate of the population proportion

                                    POPULATION      SAMPLE
                                    (Parameters)    (Point estimators, a.k.a. statistics)
Size                                N               n
One Population
  Mean                              μ               x̄
  Variance                          σ²              s²
  Standard deviation                σ               s
  Proportion                        π               p
Two Populations
  Comparing Means                   μ1 vs μ2        x̄1 vs x̄2
  Comparing Proportions             π1 vs π2        p1 vs p2
  Comparing Variances               σ²1 vs σ²2      s²1 vs s²2
  Comparing Standard Deviations     σ1 vs σ2        s1 vs s2

b. Margin of error (e)
- when added to and subtracted from the point estimator, gives the upper and lower limits for the range of values where the population parameter could be found
- is affected (or determined) by:
    confidence level
    sample size
    population variability

Interpreting the confidence interval (294 - 295)
Say that you computed a 95% confidence interval estimate for the mean of a certain population as 3.2 to 3.5.
- correct interpretation: "We are 95% confident that the interval 3.2 to 3.5 contains the true population mean"
- incorrect interpretation: "There is a 95% chance that the population mean is in the interval 3.2 to 3.5"

3. Computing a confidence interval for the population mean (μ)

z-dist for large samples or σ known (290-297)
  $C.I. = \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$

t-dist for small samples and σ unknown (298-304)
  $C.I. = \bar{x} \pm t_{\alpha/2,\, n-1} \left( \frac{s}{\sqrt{n}} \right)$

4. Computing a confidence interval for qualitative data (the population proportion (π)) (305-310)
  $C.I. = p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$

5. Sample Size Calculations (311-313)

For estimating a population mean
  $n = \frac{z^2 \sigma^2}{e^2}$

For estimating a population proportion (using the Normal distribution as an approximation to the Binomial distribution)
  $n = \frac{z^2 \, p(1-p)}{e^2}$

Factors affecting the sample size requirement
(1) confidence level
(2) variability of the population
(3) acceptable level of margin of error

CONFIDENCE INTERVAL ESTIMATION
A single POPULATION PARAMETER (Mean and Proportion)

Population Mean (μ): Is the population standard deviation (σ) known?
- Yes: Use z-distribution (assume Normal if n < 30)
- No: Is n ≥ 30?
    - Yes: Use z-distribution
    - No: Use t-distribution (assume Normal pop'n)
  z-distribution:
    $C.I. = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right)$
    with the finite population correction factor: $C.I. = \bar{x} \pm Z \left( \frac{\sigma}{\sqrt{n}} \right) \sqrt{\frac{N-n}{N-1}}$
  t-distribution:
    $C.I. = \bar{x} \pm t \left( \frac{s}{\sqrt{n}} \right)$
    with the finite population correction factor: $C.I. = \bar{x} \pm t \left( \frac{s}{\sqrt{n}} \right) \sqrt{\frac{N-n}{N-1}}$

Population Proportion (π): Are np and n(1-p) ≥ 5?
- Yes: Use the Normal Distribution as an Approximation to the Binomial Distribution
    $C.I. = p \pm Z \sqrt{\frac{p(1-p)}{n}}$
    with the finite population correction factor: $C.I. = p \pm Z \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$

CONFIDENCE INTERVAL ESTIMATION

Confidence Interval for μ
a. σ Known
  $C.I. = \bar{x} \pm z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$
b. σ Unknown
  $C.I. = \bar{x} \pm t_{\alpha/2,\, n-1} \left( \frac{s}{\sqrt{n}} \right)$

Confidence Interval for π
  $C.I. = p \pm z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$

SAMPLE SIZE (n) DETERMINATION

For estimating μ
  $n = \frac{z^2 \sigma^2}{e^2}$
For estimating π
  $n = \frac{z^2 \, p(1-p)}{e^2}$

STUDENT'S t-DISTRIBUTION TABLE
[Table not reproduced in this transcript]

Learning it! Exercises (by hand and using Excel)

7.6 A manufacturer of pain reliever claims that it takes an average of 12.75 minutes for a person to be relieved of headache pain after taking its pain reliever.
The time to relief is normally distributed with a standard deviation of 0.5 minutes. A sample of 12 people is taken and the data are shown here:

12.9  13.2  12.7  13.1  13.0  13.1
13.0  12.6  13.1  13.0  13.1  12.8

a. Find the sample mean
b. Find the standard error of the sample mean
c. If the manufacturer claims that the mean is 12.75 minutes, find the z-score of the sample mean
d. What do you think of the manufacturer's claim based on the z-score? (translation: if the claim made by the manufacturer is correct, how likely is it to observe a sample mean at least as large as your answer to (a)?)

7.8 The U-Male-It Hardware chain has 10 different stores within a certain geographical region. The dollar value of customer sales is normally distributed with an average sale of $35.25 and a standard deviation of $2.50. You have recently been hired to manage one of these stores, and under your leadership the average sale based on a sample of 100 customers is $36.50. You are very proud of the increased average sale and point this out to senior management.
a. Find the z-score for $36.50
b. Based on this z-score, is your pride justified? (translation: is the difference observed between the original average sale of $35.25 and the sample average sale of $36.50 significant, or is the difference mainly attributable to chance?)

7.16 A national grocery chain is considering opening a store at a particular location. To be sure that enough traffic goes by that location, the grocery chain took a sample of vehicles crossing the intersection on 40 days.
The results are shown in the table below:

Number of Cars Crossing Location per Day
1431  1540  1293  1340  1302  1700  1533  1402  1255  1840
1272  1467  1377  1642  1572  1220  1450  1139  1520  1477
1483  1227  1227  1515  1529  1684  1257  1242  1588  1782
1238  1350  1535  1491  1276  1367  1533  1513  1420  1375

a. Find a 95% confidence interval for the average number of cars that pass this location on a daily basis. The standard deviation is assumed to be 165 cars.
b. The company has decided to open a store at this location only if there is a daily average of at least 1400 cars passing this location. Based on your confidence interval, would you advise the company to open a store at this location? Explain why or why not.

7.20 The police department is concerned about the ability of officers to identify drunk drivers on the road. Before instituting a new training program they take a sample of 28 arrests and record the level of alcohol in the blood at the time of the arrest. Assume that the level of alcohol in the blood is normally distributed.
The data are shown below:

 92  127  204  209   93  256  182
141  108  184  173  151  173  253
105  133  194  159  153  147  133
101  150  209  207  133  180  252

(a) Find a 90% confidence interval for the average alcohol level in the blood at the time of arrest
(b) Find a 95% confidence interval for the average alcohol level in the blood at the time of arrest

Excel Solution
Data > Data Analysis > Descriptive Statistics
[Screenshot callouts: Descriptive statistics; Margin of Error]

Minitab Solution
Stat > Basic Statistics > 1-Sample t > Options

7.21 A large amusement park has recently added 5 new rides, including a large roller coaster called Mind Eraser. Management is concerned about the waiting times on the new roller coaster. A random sample of 10 people is selected and the time (in minutes) that each person waits to ride the Mind Eraser is recorded and shown below:

43  80  48  61  74  66  54  72  58  68

(a) Find a 95% confidence interval for the average waiting time for the Mind Eraser, assuming that the waiting time is normally distributed.
(b) The park management thinks that if customers have to wait more than 60 minutes for a ride, then the park should increase the staff to reduce the waiting time. Based on your confidence interval, does the park need to increase the staff? Explain why or why not.
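As an alternative to the Excel and Minitab procedures, the t-based interval formula from the confidence interval section can be sketched in a few lines of Python. This is a minimal illustration and not part of the original notes: the function name t_confidence_interval is hypothetical, and the critical value 2.262 (t for 95% confidence with 9 degrees of freedom) is assumed to have been read from a Student's t table rather than computed. The data are the waiting times from exercise 7.21.

```python
import math
import statistics

# Waiting times (minutes) from exercise 7.21
waits = [43, 80, 48, 61, 74, 66, 54, 72, 58, 68]

# Assumption: critical value t_{alpha/2, n-1} for 95% confidence and
# n - 1 = 9 degrees of freedom, taken from a Student's t table.
T_CRIT_95_DF9 = 2.262


def t_confidence_interval(data, t_crit):
    """Return (lower, upper) limits of x_bar +/- t * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)   # point estimate of the population mean
    s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


lower, upper = t_confidence_interval(waits, T_CRIT_95_DF9)
print(f"sample mean = {statistics.mean(waits):.1f} minutes")
print(f"95% C.I.    = {lower:.2f} to {upper:.2f} minutes")
```

The critical value is hard-coded so the sketch needs only the standard library; if SciPy happens to be installed, scipy.stats.t.ppf(0.975, df=9) should give the same value to three decimals.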
7.25 I asked 100 imaginary friends (only to avoid the time and cost of data collection) the following question: "Do you regularly watch MTV's Beavis and Butthead?" Of the 100 friends, 35 of them answered yes.
(a) Calculate a 95% confidence interval for the viewership of this show.
(b) MTV is considering canceling the show if less than one-third of the population regularly watches the show. Based on this information, what will MTV do?

Minitab Solution
Stat > Basic Statistics > 1-Proportion
(No Excel procedure)

7.31 How many stores must be sampled for the woman who wants to buy a ranch to be 95% confident that the error in estimating the average fat content per pound in steaks sold in the Portland, Maine area is at most 0.05 oz? The standard deviation of fat content is known to be 0.30 oz.

7.32 How many months must be sampled for analysts to be 99% confident that the error in estimating the average monthly price of peanut butter is at most $0.02? Assume the standard deviation is $0.035.

7.42 In an effort to improve the quality of the CD players that your company makes, you have started to sample the component parts that you purchase from an outside supplier. You will accept the shipment of parts only if there are fewer than 1% defectives in the shipment.
Recognizing that you cannot test the entire shipment (or population), you select a sample of 25 components to test. You find 3 defectives in the sample.
(a) Find a 90 percent confidence interval for the proportion of components in the population that are defective.
(b) Based on your confidence interval, should you accept the shipment? Why or why not?

7.44 A hotel is studying the proportion of rooms that are not ready when customers check in to the hotel.
(a) How many rooms must be in the sample for the hotel to be 95% confident that the margin of error is at most 1%?
(c) How many rooms must be in the sample for the hotel to be 95% confident that the margin of error is at most 3%?

Parametric Hypothesis Testing

One Population
- Testing a Mean (μ): z test, t test
- Testing a Proportion (π): z-test
- Testing a Variance (σ²): F test

Comparing Two Populations
- Comparing Two Means (μ1 vs μ2)
    Dependent Samples: reverts back to One Population (t test)
    Independent Samples: z-test, t test
        Equal variances t-test (pooled t-test)
        Unequal variances t-test
- Comparing Two Proportions (π1 vs π2): z-test
- Comparing Two Variances (σ²1 vs σ²2): F test

Comparing Two or More Populations
- Comparing Means of 2 or More Groups (μ's): ANOVA F test
- Comparing Proportions of 2 or More Groups (π's): Chi-square test

HYPOTHESIS TESTING: AN INTRODUCTION

1. What is a hypothesis test? (327)
- a hypothesis is an idea, an assumption, or a theory about the behavior of one or more variables in one or more populations
- a hypothesis test is a statistical procedure that involves formulating a hypothesis and using sample data (n) to decide on the validity of the hypothesis, i.e. whether the sample is consistent with the hypothesis (in which case you believe the hypothesis) or inconsistent with the hypothesis (in which case you choose not to believe it, or to reject it)
- important! in statistical testing, regardless of the specific hypothesis that you are testing, the basic procedure is the same! Your understanding of the concepts introduced in this chapter is crucial for the remaining chapters!

2. Steps in performing a hypothesis test: (328-332)
Step 1: Set up the null and alternative hypotheses
Step 2: Identify the significance level (α) for determining the critical value
Step 3: Identify the appropriate distribution
Step 4: Collect the sample data (for determining the computed value)
Step 5: Compare the computed value to the critical value (or the p-value to the significance level)
Step 6: Make a statistical conclusion (reject the null or fail to reject the null)
Step 7: Make a managerial conclusion (usually a statistical test is conducted to assist in a decision-making process)

3. Null vs. Alternative Hypotheses and decision rule (329)
Important things to remember about H0 and H1:
- H0: null hypothesis and H1: alternate hypothesis
- H0 and H1 are mutually exclusive and collectively exhaustive
- H0 is always presumed to be true
- H1 has the burden of proof
- a random sample (n) is used to "reject H0" or to "fail to reject H0"
- If we conclude 'do not reject H0', this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of H1. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
- equality is always part of H0 (e.g. "=", "≥", "≤").
- “≠”, “<”, and “>” are always part of H1

Structure:
H0: null hypothesis
H1: alternate hypothesis (can also be written as HA)
Reject H0 if: {evaluative condition} is true

4. Setting up the null and alternative hypotheses (is the test two-tailed (non-directional) or one-tailed (directional)?) and establishing the rejection region (333)
- Identify the parameter being tested (μ, π, σ²)
- Determine how many populations are included in the test
- Is “the claim” the null hypothesis or the alternate hypothesis?
  - In actual practice, the status quo is set up as H0
  - If the claim is “boastful”, the claim is set up as H1 (we apply the Missouri rule: “show me”). Remember, H1 has the burden of proof
- In problem solving, look for key words and convert them into symbols (see examples below)

Some Examples

Keywords | Inequality Symbol | Part of
Larger (or more) than | > | H1
Smaller (or less) | < | H1
No more than | ≤ | H0
At least | ≥ | H0
Has increased | > | H1
Is there difference? | ≠ | H1
Has not changed | = | H0
Has “improved”, “is better than”, “is more effective” | See note below | H1

Note: The direction of a test involving claims that use the words “has improved”, “is better than”, and the like depends on the variable being measured. For instance, if the variable is the time for a certain medication to take effect, the words “better”, “improve”, or “more effective” are translated as “<” (less than, i.e. faster relief). On the other hand, if the variable is a test score, the same words are translated as “>” (greater than, i.e. higher test scores).

The equality (≤, ≥, =) is always part of the null hypothesis.

5.
Two types of error in hypothesis testing: Type 1 (α) vs. Type 2 (β) (330-333)

Statistical definitions:
- Type 1 (α): the probability of rejecting a TRUE H0
- Type 2 (β): the probability of failing to reject (or “accepting”) a FALSE H0

Statistical Conclusion | True Condition: H0 TRUE | True Condition: H0 FALSE
Reject H0 | Type 1 (α) | Correct
Fail to reject H0 | Correct | Type 2 (β)

More on Type 1 (α): in addition to its definition as “the probability of rejecting a TRUE H0”, it is also:
- known as the significance level of a test (or simply, the significance level)
- usually between 0.01 and 0.10 (which level is “best”? see next subsection)
- used to generate the critical value for a test
- an area at the tail end of a distribution; this area is known as the reject H0 region (or the rejection region)
- The critical value marks the boundary between the reject H0 and fail to reject H0 regions

[Figure: sampling distribution on the z axis, centered at 0]

Which should be avoided: a Type 1 or a Type 2 error?
- For a given sample size (n), there is a trade-off between Type 1 and Type 2 errors; that is, decreasing one will increase the other
- To decrease both types at the same time, a larger sample must be taken
- However, because of the cost, time, and practicality of sampling, we often need to choose between Type 1 and Type 2 errors. Which should we decrease? That depends on the cost associated with each type of error

EXAMPLES: In each of the examples below, the Type 1 and Type 2 errors are defined in nonstatistical terms. Can you identify the “cost” associated with each type of error? For instance, in criminal cases, the cost associated with a Type 1 error (that is, a jury convicting an innocent person) is the freedom, or worse yet, the life of the accused. Now compare this to the cost of a Type 2 error. As a society, which do we consider worse?
Justice system - criminal and civil cases
H0: Innocent
H1: Guilty

Statistical Conclusion | True Condition: Innocent | True Condition: Guilty
Reject H0 (Guilty) | Type 1 (α): conclude that the accused is guilty when in fact the accused is innocent | Correct
Fail to reject H0 (Not Guilty) | Correct | Type 2 (β): conclude that the accused is not guilty when in fact the accused is guilty

Business - quality control situations - process monitoring
H0: Process is in control
H1: Process is not in control

Statistical Conclusion | True Condition: Process OK | True Condition: Process Not OK
Reject H0 (process not OK) | Type 1 (α): conclude that the process is not in control when in fact it is | Correct
Fail to reject H0 (process OK) | Correct | Type 2 (β): conclude that the process is OK when in fact it is not

Business - quality control situations - quality assurance
H0: Lot of shipment is good
H1: Lot of shipment is not good

Statistical Conclusion | True Condition: Good Lot | True Condition: Not Good Lot
Reject H0 (shipment is not good) | Type 1 (α): conclude that the lot is not good when in fact it is (producer’s risk) | Correct
Fail to reject H0 (shipment is good) | Correct | Type 2 (β): conclude that the shipment is good when in fact it is not (consumer’s risk)

6. P-values (339)
The probability value (p-value) of a statistical hypothesis test is the probability of getting a value of the test statistic as extreme as or more extreme than the one observed, by chance alone, if the null hypothesis H0 is true (see example below). It is not the probability that H0 is true; it is the smallest significance level at which the observed sample would lead to rejecting H0. When used as a decision rule in hypothesis testing, the p-value is compared to the significance level (α).
If the p-value is smaller, the conclusion is to reject the null hypothesis (or, we say that the result “is significant”). Here is the decision rule using the p-value; it applies to ALL forms of hypothesis tests!

H0: null hypothesis
H1: alternate hypothesis
Reject H0 if: p-value < α
Remember this very important rule!

Important interpretation! Small p-values suggest that the null hypothesis is unlikely to be true. The smaller the p-value, the more convincing the rejection of the null hypothesis. It indicates the strength of the evidence for, say, rejecting the null hypothesis H0, rather than simply concluding “reject H0” or “do not reject H0”.

P-value example: Hypothesis Major Concepts; Hypothesis One Population Determining Appropriate Test; Hypothesis Testing Major Concepts_Pvalues

TESTING A POPULATION MEAN (μ)

Two-tailed: H0: μ = value; H1: μ ≠ value. Reject H0 if: $|Z| > Z_{\alpha/2}$ or $|t| > t_{\alpha/2,\,n-1}$
Left-tailed: H0: μ ≥ value; H1: μ < value. Reject H0 if: $Z < -Z_{\alpha}$ or $t < -t_{\alpha,\,n-1}$
Right-tailed: H0: μ ≤ value; H1: μ > value. Reject H0 if: $Z > Z_{\alpha}$ or $t > t_{\alpha,\,n-1}$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

TESTING A POPULATION PROPORTION (π)

Two-tailed: H0: π = value; H1: π ≠ value. Reject H0 if: $|Z| > Z_{\alpha/2}$
Left-tailed: H0: π ≥ value; H1: π < value. Reject H0 if: $Z < -Z_{\alpha}$
Right-tailed: H0: π ≤ value; H1: π > value. Reject H0 if: $Z > Z_{\alpha}$

$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$$

TESTING A POPULATION VARIANCE (σ²)

Two-tailed: H0: σ² = value; H1: σ² ≠ value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ or $\chi^2 > \chi^2_{\alpha/2,\,n-1}$
Left-tailed: H0: σ² ≥ value; H1: σ² < value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
Right-tailed: H0: σ² ≤ value; H1: σ² > value. Reject H0 if: $\chi^2 > \chi^2_{\alpha,\,n-1}$

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Learning it!
Exercises

Setting up the Hypotheses and Determining Type I and II Levels

For items 8.7 to 8.27 below, do the following:
(a) State the null and alternative hypotheses
(b) State the consequence of a Type I error
(c) State the consequence of a Type II error
(d) Suggest a value for α, and justify your choice

8.7 Administrators at a small college are concerned that part-time evening students may not be familiar with all the services of the College. They wish to offer an orientation program to these students but recognize that most of the part-time students work during the day and are generally very busy. The administrators do not want to prepare an elaborate presentation if only a handful of part-time students will attend. Hence, they will conduct the orientation if more than 25% of the part-time students are interested in attending.

8.8 A company CEO is thinking about setting up an on-site day-care program for its employees. The CEO has stated that she will do so only if more than 80% of the employees favor such a decision. Set up the null and alternative hypotheses.

8.9 In an attempt to improve quality, many manufacturers are developing partnerships with their suppliers. A local fast-food burger outfit has partnered with its supplier of potatoes. The burger outfit buys potatoes in bags that weigh 20 lbs. It wishes to set up the null and alternative hypotheses to test if the bags do weigh on average 20 lbs.

8.10 You are a connoisseur of chocolate chip cookies and you do not think that Nabisco’s claim that every bag of Chips Ahoy cookies has 1000 chocolate morsels is correct. Set up the null and alternative hypotheses to test this claim.

8.11 Antilock brake systems (ABS) have been hailed as a revolutionary safety feature. A study by the National Traffic Safety Administration looked at fatal accidents.
The claim is that cars with ABS are in fewer fatal crashes than those without.

8.12 A college placement office wonders whether there is a difference between the average salary of engineering graduates and business school graduates.

8.13 Your new television has a 1-year warranty. You are given the option to buy a 3-year warranty and wonder if it is worth it. You wish to test the hypothesis that the average time before a problem occurs is more than 3 years.

8.14 M&M/Mars claims that at least 20% of the M&M’s in each package are the new blue color.

8.15 A computer center is arguing for more computers in the lab for students at a midsize college. The computer center at a university claims that the average amount of time that students spend on-line has increased from last year’s average of 1 hour per day.

8.16 It seems like you spend more money on groceries during the summer months, when you eat more ice cream and drink more fluids. You know that you spend an average of $25 per week on groceries during the winter months. Set up the null and alternative hypotheses to decide if, on average, you spend more than this amount per week during the summer months.

8.17 M&M/Mars claims that at least 20% of the M&M’s in each package are the new blue color. Set up the null and alternative hypotheses to test this claim.

8.18 The computer center at a university claims that the average amount of time that students spend on-line has increased from last year’s average of 1 hour per day. Set up the null and alternative hypotheses to test this claim.
(Note: for the following problems, use Excel’s Data Analysis Tools to generate the descriptive statistics to minimize hand calculation.)

Problem 8.1 The School Committee members of a midsize New England city agreed that a strict discipline code had caused an increase in the number of student suspensions. The number of suspensions for a sample of schools in this city for the period September 1992 to February 1993 is shown below:

School | Number of Suspensions
Central | 245
MCDI | 1
Chestnut | 65
Duggan | 133
Kennedy | 97
Forest Park | 149
Puttnam | 1024
Kiley | 56
Central Academy | 254
Commerce | 114
Bridge | 7

The average number of suspensions for the previous year was 130.5, with a population standard deviation of 158.2.
(a) Set up the null and alternative hypotheses to test if the average number of suspensions has changed.
(b) Test your hypothesis using a significance level of 0.05.
(c) Find the p-value.
(d) Display the data to see if it is reasonable to assume that the underlying population distribution is normal.
(e) Based on the p-value, what can you conclude about the average number of suspensions?
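As a cross-check on the Minitab or Excel output, the one-sample z test in Problem 8.1 can be sketched in Python. This is not part of the original notes. It assumes the 11 suspension counts are read in the order listed, takes the hypothesized mean (130.5) and the population standard deviation (158.2) from the problem statement, and runs a two-tailed test because the question asks whether the average has changed. The function name `one_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z(data, mu0, sigma):
    """Return (z, two-tailed p-value) for H0: mu = mu0 when sigma is known."""
    n = len(data)
    z = (mean(data) - mu0) / (sigma / sqrt(n))
    # Two-tailed p-value: area in both tails beyond |z|
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Suspension counts from Problem 8.1 (order assumed from the table)
suspensions = [245, 1, 65, 133, 97, 149, 1024, 56, 254, 114, 7]
alpha = 0.05

z, p_value = one_sample_z(suspensions, mu0=130.5, sigma=158.2)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
# Decision rule from the notes: reject H0 if p-value < alpha
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The decision line applies the rule "Reject H0 if p-value < α" stated earlier in the notes; the computed statistic can equally be compared with the critical value Z(α/2).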
Minitab Solution
Stat > Basic Statistics > 1-Sample Z
[Screenshot: 1-Sample Z dialog. Callouts: raw data; population standard deviation; hypothesized mean; direction of test (alternative hypothesis)]

Minitab Output
[Screenshot: Minitab output. Callouts: confidence interval for the true mean; computed statistic (compare to the critical statistic); p-value of the test (compare to the significance level)]

Problem 8.2 The Educational Testing Service (ETS) designs and administers the SAT exams. Recently the format of the exam changed, and the claim has been made that the new exam can be completed in an average of 120 minutes. A sample of 50 new exam times yielded an average of 115 minutes. The standard deviation is assumed to be 2 minutes.
(a) Set up the null and the alternative hypotheses to test if the average time to complete the exam has changed from 120 minutes.
(b) Test your hypothesis using a significance level of 0.05.
(c) Find the p-value.
(d) Based on the p-value, what can you conclude about the average time to complete the new exam?

Problem 8.28 A major manufacturer of glue products thinks it has found a way to make glue adhere longer than the current average of 90 days. The manufacturer wishes to see whether the glue products made this way have an average time to failure greater than 90 days. A sample of 30 tubes of new glue yields an average of 93 days before failing. The failure time is normally distributed with a standard deviation of 3 days.
(a) Set up the null and the alternative hypotheses to test whether the average time to failure is greater than 90 days.
(b) Test your hypothesis using a significance level of 0.05.
(c) Find the p-value.
(d) Based on the p-value, what can you conclude about the average time to failure for the new product?

INFERENCES: ONE POPULATION (HYPOTHESIS TESTING)

1. Testing the Mean (μ)
- z-distribution for large samples or σ known (334)
- t-distribution for small samples and σ unknown (341)
2. Testing the Population Variance (σ²)
- χ² distribution (not covered)
3. Testing the Population Proportion (π)
- z-distribution (approximation to the Binomial distribution) (349)
4. Hypothesis Testing using Minitab and Excel

One Population
- Testing a Mean (μ): z test or t test
- Testing a Proportion (π): z test
- Testing a Variance (σ²): χ² test

TESTING A POPULATION MEAN (μ)

Two-tailed: H0: μ = value; H1: μ ≠ value. Reject H0 if: $|Z| > Z_{\alpha/2}$ or $|t| > t_{\alpha/2,\,n-1}$
Left-tailed: H0: μ ≥ value; H1: μ < value. Reject H0 if: $Z < -Z_{\alpha}$ or $t < -t_{\alpha,\,n-1}$
Right-tailed: H0: μ ≤ value; H1: μ > value. Reject H0 if: $Z > Z_{\alpha}$ or $t > t_{\alpha,\,n-1}$

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

TESTING A POPULATION PROPORTION (π)

Two-tailed: H0: π = value; H1: π ≠ value. Reject H0 if: $|Z| > Z_{\alpha/2}$
Left-tailed: H0: π ≥ value; H1: π < value. Reject H0 if: $Z < -Z_{\alpha}$
Right-tailed: H0: π ≤ value; H1: π > value. Reject H0 if: $Z > Z_{\alpha}$

$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}}$$

TESTING A POPULATION VARIANCE (σ²)

Two-tailed: H0: σ² = value; H1: σ² ≠ value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ or $\chi^2 > \chi^2_{\alpha/2,\,n-1}$
Left-tailed: H0: σ² ≥ value; H1: σ² < value. Reject H0 if: $\chi^2 < \chi^2_{1-\alpha,\,n-1}$
Right-tailed: H0: σ² ≤ value; H1: σ² > value. Reject H0 if: $\chi^2 > \chi^2_{\alpha,\,n-1}$

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

Learning it!
Exercises

Problem 9.1 The cost of common goods and services in 5 cities is shown in the table below (USA Today):

City | Aspirin (100) | Fast Food (hamburger, fries, soft drink) | Woman’s Haircut/Blow Dry | Toothpaste (6.4 oz)
Los Angeles | $7.69 | $4.15 | $20.11 | $2.42
Tokyo | $35.93 | $7.62 | $76.24 | $4.24
London | $9.69 | $5.80 | $44.35 | $3.63
Sydney | $7.43 | $4.53 | $29.93 | $2.08
Mexico City | $1.16 | $3.63 | $17.94 | $1.08

a. You have just returned from a business trip and you lost your receipt for the aspirin you purchased, but you would like to be reimbursed by your company (since you had taken the aspirin after a stressful business meeting!). You guesstimate a cost of $10.00. Your boss claims that the average cost of aspirin is less than $10.00. Using these data, can you “prove” your boss wrong? Conduct the necessary hypothesis test. Assume that all costs are normally distributed.
b. Based on these data, is there enough evidence to support your submitting a cost of $10.00 for the fast-food meal on your trip?
c. If you remove Tokyo from the data set, do your answers to parts (a) and (b) change? What does this tell you about the effect of outliers on the hypothesis test of µ when you have a small sample?

Problem 9.2 The marketing material for a New England ski resort advertises that they can make snow whenever the temperature is 32°F or below. To demonstrate how often this happens, their material includes a line graph of the weekly average temperatures (see graph in text). The data that generated the graph are shown below:

Week | Temperature
1 | 18
2 | 19
3 | 24
4 | 35
5 | 33
6 | 14
7 | 22
8 | 20
9 | 23
10 | 33
11 | 27
12 | 23
13 | 30
14 | 35

Is there enough evidence for the ski resort to claim that the average weekly temperature is less than 32°F? Assume that the average weekly temperature is normally distributed.
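A minimal Python sketch of the small-sample t test for Problem 9.2 follows; it is not part of the original notes. It assumes the 14 weekly temperatures are read in week order from the table, sets up the left-tailed test H0: μ ≥ 32 versus H1: μ < 32, and hard-codes the critical value 1.771 as the usual t-table entry for α = 0.05 with 13 degrees of freedom. The function name `one_sample_t` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t test."""
    n = len(data)
    # stdev() is the sample standard deviation s (divides by n - 1)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1

# Weekly average temperatures from Problem 9.2 (weeks 1 to 14, order assumed)
temps = [18, 19, 24, 35, 33, 14, 22, 20, 23, 33, 27, 23, 30, 35]

t_stat, df = one_sample_t(temps, mu0=32)
T_CRIT = 1.771  # t-table value for alpha = 0.05 (one tail), df = 13

print(f"t = {t_stat:.3f}, df = {df}")
# Left-tailed rule from the notes: reject H0 if t < -t(alpha, n - 1)
print("Reject H0" if t_stat < -T_CRIT else "Fail to reject H0")
```

The same function can be reused for the other small-sample problems in this section by changing the data, the hypothesized mean, and the tail of the rejection rule.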
Problem 9.4 If you like shopping for the best deal on long-distance phone services, then you’ll enjoy sorting through offers from 10 different marketers vying to be your energy supplier. Residents of 16 communities will be the first in Massachusetts to wade into the coming nationwide experiment in deregulation of the natural gas industry. The average consumer uses 1232 therms of natural gas, for which the average cost has been $520.24. The table shows proposed costs to deliver 1232 therms of gas from 10 competitors:

Company | Cost ($)
All Energy Marketing Co | 478.66
Broad Street/Energy One | 450.24
Global Energy Services | 468.16
Green Mountain Energy Partners | 471.24
KBC Energy Services | 435.53
Louis Dreyfus Energy Services | 472.24
National Fuel Resources | 468.22
NorAm Energy | 442.20
WEPCO Gas | 443.52
Western Gas Resources | 457.81

Problem 9.5 Computer centers at universities and colleges are certainly aware of the increased number of Web surfers. To begin to understand the demands that will be made on the computer center resources, one school studied 25 children in grades 7 to 12. The number of hours that these children spend on the Internet in 1 week is shown here:

5.0 4.6 5.9 4.4 4.9
5.1 5.7 4.0 3.8 5.6
6.7 4.1 5.5 5.5 6.7
5.2 5.4 5.0 6.7 4.8
5.8 3.6 5.4 4.1 4.8

Is there enough evidence to indicate that children spend more than an average of 5 hours per week Web surfing? Assume that the time spent Web surfing is normally distributed.
9.6 A company that sells mail-order computer systems has been planning inventory and staffing based on an assumption that the variance of their weekly sales is 180 ($1000²). The weekly sales are normally distributed. The company selects 15 weeks at random from the past year and obtains the data (in thousands of dollars) shown below:

Weekly Sales: 191 222 222 223 223 225 227 228 229 232 234 234 236 244 253

a) What is the sample variance for these data?
b) Set up the hypotheses to test whether the population variance is different from 180.
c) At the 0.05 level of significance, what can you conclude about the company’s assumption?

9.7 In manufacturing, the amount of material that is wasted or lost during a process is very important. In preparing financial estimates, a company assumes that the percent material lost for its new process has a variance of 10 (%²). After the new process has been running for a month and appears to be stable, the cost analyst looks at the percent material lost and finds the following data:

Daily Loss: 10 12 12 13 14 14 18 19 19 20

a) What is the sample variance for these data?
b) Set up the hypotheses to test whether the actual variance is greater than the value the company has been assuming. Assume that the daily loss is normally distributed.
c) At the 0.05 level of significance, what can you conclude?

9.10 Companies are increasingly concerned about employees playing video games at work. In addition to reducing productivity, this habit slows down networks and uses valuable storage space.
A recent article stated that 80% of all employees play video games at work at least once a week. A large company that employs many engineers wonders if its employees are as bad as the article claims. If they are, the company will install software that detects and removes video games from the network. The company surveys (anonymously) 100 employees and finds that 85 of the employees surveyed have played video games at work in the past week.
a. Set up the null and alternative hypotheses to test whether the proportion of the company’s employees that play video games is greater than the proportion stated in the article.
b. At the 0.05 level of significance, test the hypotheses.
c. What do you recommend that the company do?

9.11 An alumni office is interested in serving their alumni better in order to encourage more donations to the college. A survey of 200 alumni was conducted to determine whether half-day training sessions offered on the campus were of interest. If more than 75% of the alumni were interested, the college would start a program. The survey showed that 160 of the alumni surveyed were interested in such a program.
a. Set up the null and alternative hypotheses to test whether the college should implement the program.
b. At the 0.05 level of significance, test the hypotheses.
c. What do you recommend that the college do?

9.12 A company that makes computer keyboards has specifications that allow it to produce a product that has a maximum of 3% defective. The company has been receiving more customer complaints than usual. A sample of 50 keyboards has 2 defectives.
a.
Set up the null and alternative hypotheses to test whether the proportion of defective keyboards has exceeded the amount allowed by the specifications.
b. At the 0.05 level of significance, test the hypotheses.
c. What do you recommend that the company do?

COMPARING TWO POPULATIONS

1. Comparing Means of Two Populations (μ1 vs. μ2) (365) (375) (384)
- Dependent vs. Independent Samples
- Comparing Means using Two Independent Samples
  - Large samples (z-distribution)
  - Small samples (t-distribution)
- Comparing Means using Two Dependent Samples
  - t-distribution
2. Comparing Proportions of Two Populations (π1 vs. π2) (371)
- Using the z-distribution as approximation to the Binomial
3. Comparing Variances of Two Populations (σ1² vs. σ2²) (404)
- Using the F-distribution

COMPARING TWO POPULATIONS: HYPOTHESIS TESTING

Comparing Means of Two Populations (μ1 vs. μ2)
- Dependent Samples: one-population t test on the paired differences
  $$t = \frac{\bar{d}}{s_d/\sqrt{n}}$$
  where $\bar{d}$ and $s_d$ are the mean and standard deviation of the differences
- Independent Samples: are n1 and n2 ≥ 30?
  - Yes: use z test
    $$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
  - No: are the population variances equal?
    - Yes: use pooled t test
      $$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where: } s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
    - No: use unpooled t test

Comparing Proportions of Two Populations (π1 vs. π2): are n1π1, n1(1−π1), n2π2, n2(1−π2) ≥ 5?
- Yes: use z-test
  $$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where: } \bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$
- No: use Binomial Distribution

Comparing Variances of Two Populations (σ1² vs. σ2²): use F test
$$F = \frac{s_L^2}{s_S^2}$$
where:
$s_L^2$ = larger of the two sample variances
$s_S^2$ = smaller of the two sample variances
$v_1 = n - 1$, where n is the size of the sample that has the larger variance
$v_2 = n - 1$, where n is the size of the sample that has the smaller variance

COMPARING TWO POPULATION MEANS (μ1 vs. μ2)

Two-tailed: H0: μ1 = μ2; H1: μ1 ≠ μ2. Reject H0 if: $|Z| > Z_{\alpha/2}$ or $|t| > t_{\alpha/2,\,n_1+n_2-2}$
Left-tailed: H0: μ1 ≥ μ2; H1: μ1 < μ2. Reject H0 if: $Z < -Z_{\alpha}$ or $t < -t_{\alpha,\,n_1+n_2-2}$
Right-tailed: H0: μ1 ≤ μ2; H1: μ1 > μ2. Reject H0 if: $Z > Z_{\alpha}$ or $t > t_{\alpha,\,n_1+n_2-2}$

$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \qquad t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where: } s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

COMPARING TWO POPULATION PROPORTIONS (π1 vs. π2)

Two-tailed: H0: π1 = π2; H1: π1 ≠ π2. Reject H0 if: $|Z| > Z_{\alpha/2}$
Left-tailed: H0: π1 ≥ π2; H1: π1 < π2. Reject H0 if: $Z < -Z_{\alpha}$
Right-tailed: H0: π1 ≤ π2; H1: π1 > π2. Reject H0 if: $Z > Z_{\alpha}$

$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1-\bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \quad \text{where: } \bar{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

COMPARING TWO POPULATION VARIANCES (σ1² vs. σ2²)

H0: σ1² = σ2²; H1: σ1² ≠ σ2². Reject H0 if: $F > F_{\alpha/2,\,v_1,\,v_2}$

$$F = \frac{s_L^2}{s_S^2}$$
where:
$s_L^2$ = larger of the two sample variances
$s_S^2$ = smaller of the two sample variances
$v_1 = n - 1$, where n is the size of the sample that had the larger variance
$v_2 = n - 1$, where n is the size of the sample that had the smaller variance

Learning it!

Exercises

10.1 Many studies have been done comparing consumer behavior of men and women. One such ongoing study concerns take-out food.
In particular, the study focuses on whether there is a difference in the mean number of times per month that men and women buy take-out food for dinner. The most recent results of the study are shown below:

Population | Sample Size | Sample Mean | Population Standard Deviation
Men | 34 | 25.6 | 4.2
Women | 28 | 21.2 | 3.7

Because the study has so much historical data, information is known about the population standard deviations.
a. Set up the hypotheses to test whether there is a difference in the mean number of times per month that a person buys take-out food for dinner for men and women.
b. Use the Z test with known population variances to set up and perform the test. Use a level of significance of 0.05.
c. Find the p-value for the test.
d. Do the data provide evidence that the mean number of times per month for men differs from that for women?
e. Does the choice of α in this case affect the decision?

10.2 Professional employees who work for large corporations often contend that the mean salary paid by a company differs by location in the United States. To test that claim, data were collected on financial analysts working for a large corporation at locations in New England and in the upper Midwest. Because there is an extensive history of salary data, the population standard deviations are available. The study found the following results:

Population | Sample Size | Sample Mean | Population Standard Deviation
New England | 25 | 22.3 | 1.5
Upper Midwest | 20 | 18.5 | 2.2

a) Set up the appropriate hypotheses to test whether the company’s analysts in New England were paid more, on the average, than those working in the upper Midwest.
b) Use the Z test with known population variances to set up and perform the test. Use a level of significance of 0.05.
c) Find the p-value for the test.
d) Do the data support the contention that the mean pay for analysts in New England is higher than that of analysts in the upper Midwest?

10.11 Having learned about the paired t test, you realize that you really should have used the test for the data on software price comparison. The data are repeated below:

Top Ten Business Software Packages | Computability Price ($) | PC Connection Price ($)
Windows 95 Upgrade | 88 | 95
Norton Anti-Virus | 59 | 70
McAfee ViruScan | 49 | 60
First Aid 97 Deluxe | 54 | 58
Clean Sweep III | 37 | 37
Norton Utilities | 68 | 75
Netscape Navigator | 45 | 40
MS Office Pro 97 Upgrade | 300 | 310
First Aid 97 | 32 | 35
Win Fax Pro | 95 | 95

a) Calculate the differences between the prices for each type of software package. Just looking at the differences, do you think that one company charges more than the other? Why or why not?
b) Calculate the average difference and the standard deviation of the differences.
c) Set up the hypotheses to test whether the mean difference in price between the two companies is zero.
d) Assuming that the data are normally distributed, at the 0.05 level of significance, is there a difference in the mean price of software for the two companies?
e) Did these results differ from the last time you analyzed the data? Why do you think this happened?
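Parts (a) to (d) of Problem 10.11 can be sketched in Python as a check on the Minitab and Excel paired t output. This is not part of the original notes. It assumes the first price column belongs to Computability and the second to PC Connection, computes differences as Computability minus PC Connection, and hard-codes 2.262 as the usual t-table value for α/2 = 0.025 with 9 degrees of freedom. The function name `paired_t` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Return (mean difference, sd of differences, t) for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)  # sample standard deviation of the differences
    t = d_bar / (s_d / sqrt(len(diffs)))
    return d_bar, s_d, t

# Prices from Problem 10.11 (column assignment assumed)
computability = [88, 59, 49, 54, 37, 68, 45, 300, 32, 95]
pc_connection = [95, 70, 60, 58, 37, 75, 40, 310, 35, 95]

d_bar, s_d, t_stat = paired_t(computability, pc_connection)
T_CRIT = 2.262  # t-table value for alpha/2 = 0.025, df = 9

print(f"mean difference = {d_bar:.2f}, sd = {s_d:.3f}, t = {t_stat:.3f}")
# Two-tailed rule from the notes: reject H0 if |t| > t(alpha/2, n - 1)
print("Reject H0" if abs(t_stat) > T_CRIT else "Fail to reject H0")
```

Because the test is two-tailed, reversing the order of subtraction changes only the sign of the mean difference and of t, not the conclusion.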
Minitab Solution
Stat > Basic Statistics > Paired t

Excel Solution
Data > Data Analysis > t-Test: Paired Two Sample for Means

10.12 A hospital administrator is concerned about the length of time that the nursing staff washes their hands. A recent study in health care showed that longer washing greatly reduces the spread of germs. The hospital observed the amount of time that a sample of nine nurses in the Cardiac Care Unit (CCU) washed their hands. The data were collected in such a way that the employees did not know that they were being observed. The hospital then showed the nurses an educational video on the negative effects of shortened time spent hand washing. After the video, the hospital again watched and timed the group of nurses washing their hands. The data are shown below:

Observation | Unit | Time 1 (s) | Time 2 (s)
1 | CCU | 3 | 16
2 | CCU | 2 | 7
3 | CCU | 0 | 5
4 | CCU | 5 | 8
5 | CCU | 2 | 15
6 | CCU | 0 | 15
7 | CCU | 2 | 20
8 | CCU | 3 | 16
9 | CCU | 0 | 18

a) Calculate the differences between the times for each nurse. Just looking at the differences, do you think that, on the average, they washed their hands longer the second time? Why or why not?
b) Calculate the average difference and the standard deviation of the differences.
c) Set up the hypotheses to test whether there was an increase in the average amount of time spent washing hands.
d) Assuming that the data are normally distributed, at the 0.05 level of significance, what can you conclude?
e) Can you conclude that the video caused the nurses to wash their hands longer? Why or why not?
Problem 10.7 Women who smoke suffer an increased risk of dying of breast cancer, according to a recently published study. In the study, out of 319,000 women who never smoked there were 468 deaths from breast cancer, whereas out of 120,000 smokers, there were 187 deaths.
(a) Calculate the sample proportion of women who died of breast cancer for smokers and non-smokers.
(b) Set up the hypotheses to test whether the proportion of women who die of breast cancer is higher for smokers than non-smokers.
(c) At the 0.05 significance level, can you conclude that smoking causes breast cancer? If not, what can you conclude?

Problem 10.18 Selling personal computers is big business and consumers are becoming increasingly aware of vendor reputation. A recent study of two vendors of desktop personal computers reports on the units that need repair for Dell Computer and Gateway 2000. Of 1584 computers manufactured by Dell Computer, 427 needed repair, whereas for Gateway 2000, 825 of 2662 computers needed repair.
(a) Calculate the sample proportion of computers needing repair for each company.
(b) Set up the hypotheses to test whether the proportion of computers needing repairs is different for the two companies.
(c) At the 0.05 level of significance, what can you conclude?
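The two-proportion comparisons in Problems 10.7 and 10.18 can be sketched as a pooled z test. The code below is a minimal illustration with the Python standard library, assuming the usual large-sample normal approximation; the helper names two_proportion_z and normal_cdf are hypothetical and are not part of Minitab or Excel.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (p1, p2, z) for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return p1, p2, (p1 - p2) / std_error


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Problem 10.7: smokers vs. never-smokers, one-tailed (H1: p_smokers > p_never).
p_smoke, p_never, z_smoke = two_proportion_z(187, 120_000, 468, 319_000)
p_value_smoke = 1 - normal_cdf(z_smoke)

# Problem 10.18: Dell vs. Gateway 2000, two-tailed (H1: proportions differ).
p_dell, p_gateway, z_repair = two_proportion_z(427, 1584, 825, 2662)
p_value_repair = 2 * (1 - normal_cdf(abs(z_repair)))

print(f"10.7:  p1 = {p_smoke:.5f}, p2 = {p_never:.5f}, "
      f"z = {z_smoke:.3f}, p-value = {p_value_smoke:.4f}")
print(f"10.18: p1 = {p_dell:.4f}, p2 = {p_gateway:.4f}, "
      f"z = {z_repair:.3f}, p-value = {p_value_repair:.4f}")
```

A p-value below 0.05 leads to rejecting H0. Even then, the test only shows a difference in proportions; it cannot by itself establish that smoking causes the deaths, which is the point of part (c) of Problem 10.7.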
Minitab Solution (No Excel Solution)

10.20 Consider the problem in which the Board of Realtors for Greater Bridgeport, CT, was looking at the average selling prices of homes. The data are given again below:

Population    Sample Size    Sample Mean    Sample Standard Deviation
1995          25             $151,116       $5,332
1996          25             $160,669       $6,468

a) Assuming that the populations are normally distributed, set up the hypotheses to test whether the population variances are equal at the 0.10 level of significance.
b) Was the decision to test using the pooled variance justified?

10.21 In your quest for the perfect golf clubs you made an assumption about the population variances when you tested your hypotheses. The data you collected are given below:

Population    Sample Size    Sample Mean    Sample Standard Deviation
Brand X       15             255            8.7
Brand Z       15             271            9.1

a) Set up the appropriate hypotheses to test whether the variance of Brand Z clubs is the same as the variance for Brand X.
b) Assuming that the populations are normally distributed, at the 0.10 level of significance was your decision to pool the variances a good one?
c) In general, would a difference in variation between the clubs be a factor in your purchase decision?
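The variance comparisons in Problems 10.20 and 10.21 reduce to a ratio of sample variances. The sketch below places the larger variance in the numerator, so a two-tailed test at the 0.10 level uses the upper 5% point of the F distribution. The function name variance_ratio is hypothetical, and both critical values are assumptions read from an F table (1.98 for 24 and 24 degrees of freedom, roughly 2.48 for 14 and 14).

```python
def variance_ratio(std_dev_1, std_dev_2):
    """F statistic with the larger sample variance in the numerator."""
    larger, smaller = max(std_dev_1, std_dev_2), min(std_dev_1, std_dev_2)
    return (larger / smaller) ** 2


# Problem 10.20: home prices, n = 25 in each year (df = 24, 24).
f_homes = variance_ratio(5332, 6468)
F_CRITICAL_24_24 = 1.98          # assumed table value, upper 5% point
pooling_ok_homes = f_homes <= F_CRITICAL_24_24

# Problem 10.21: golf clubs, n = 15 for each brand (df = 14, 14).
f_clubs = variance_ratio(8.7, 9.1)
F_CRITICAL_14_14 = 2.48          # assumed table value, upper 5% point
pooling_ok_clubs = f_clubs <= F_CRITICAL_14_14

print(f"Homes: F = {f_homes:.4f}, equal variances plausible: {pooling_ok_homes}")
print(f"Clubs: F = {f_clubs:.4f}, equal variances plausible: {pooling_ok_clubs}")
```

If the F statistic does not exceed the critical value, the hypothesis of equal variances is not rejected, which supports the earlier decision to pool the variances.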
Procedures for Testing Independence

                              Dependent Variable (y)
Independent Variable (x)      Quantitative                           Qualitative
Quantitative                  Regression and Correlation Analysis    Discriminant Analysis
Qualitative                   ANOVA (Oneway, Twoway)                 Chi-square

EXPERIMENTAL DESIGN AND ANOVA (ANALYSIS OF VARIANCE)

1. Definition of terms
   o Factor and response variable (428)
   o ANOVA and treatment (410)
2. Sources of Variance (411)
   Treatment or Between Groups Variation (a.k.a. explained, factor, treatment)
   Random or Within Groups Variation (a.k.a. unexplained, random, error)
3. One-way ANOVA (410)
   Review of variables
   - quantitative vs. qualitative
   - dependent vs. independent
   Using ANOVA as a procedure for comparing means of two or more groups
   Using ANOVA as a procedure for determining whether a qualitative independent variable and a quantitative dependent variable are related
4. Two-Way ANOVA with Replication, a.k.a. Two-way ANOVA with Interaction (427)
   Using ANOVA as a procedure for comparing means of two or more groups (Factor A and Factor B)
   Using ANOVA as a procedure for determining whether a qualitative independent variable and a quantitative dependent variable are related (Factor A and Factor B)
   Testing the presence of interaction between Factors A and B

ANALYSIS OF VARIANCE (ANOVA)

A. ONEWAY ANOVA
   H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_t$
   H1: The population means are not all the same
   Reject H0 if: $F > F_{\alpha, v_1, v_2}$
   Where: $v_1 = t - 1$ and $v_2 = N - t$ (t = number of treatments, N = total number of observations)

B. TWOWAY ANOVA (with replication)
   In the tests below, a and b are the numbers of levels of factors A and B, and r is the number of replications per combination.

   1. Testing for Main Effects (Factor A)
      H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_a$ (No level of factor A has an effect)
      H1: The population means are not all the same (at least 1 level has an effect)
      Reject H0 if: $MSA/MSE > F_{\alpha, v_1, v_2}$
      Where: $v_1 = a - 1$ and $v_2 = ab(r - 1)$

   2. Testing for Main Effects (Factor B)
      H0: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_b$ (No level of factor B has an effect)
      H1: The population means are not all the same (at least 1 level has an effect)
      Reject H0 if: $MSB/MSE > F_{\alpha, v_1, v_2}$
      Where: $v_1 = b - 1$ and $v_2 = ab(r - 1)$

   3. Testing for INTERACTION EFFECTS (AB)
      H0: There are NO interaction effects
      H1: At least 1 combination of factor A and B levels has an effect
      Reject H0 if: $MSAB/MSE > F_{\alpha, v_1, v_2}$
      Where: $v_1 = (a - 1)(b - 1)$ and $v_2 = ab(r - 1)$

Learning it! Exercises

14.1 A diaper company is considering 3 different filler materials for their disposable diapers. Eight diapers were tested with each of the 3 filler materials, and 24 toddlers were randomly given a diaper to wear. As the child played, fluid was injected into the diaper every 10 minutes until the product failed (leaked). The amount of fluid (in grams) at the time of failure was recorded for each diaper. The data are shown below:

Material 1    Material 2    Material 3
791           809           828
789           818           814
796           803           855
802           781           844
810           813           847
790           808           848
800           805           836
790           811           873

(a) What is the response variable and what is the factor?
(b) How many levels of the factor are being studied?
(c) Is there any difference in the average amount of fluid the diaper can hold using the three different filler materials? If so, which ones are different?
(d) What is your recommendation to the company and why?
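The one-way ANOVA F statistic for Exercise 14.1 can be computed directly from the between-groups and within-groups sums of squares. The sketch below uses only the Python standard library; the function name one_way_anova is hypothetical, and the critical value 3.467 is an assumption taken from an F table (0.05 level, 2 and 21 degrees of freedom).

```python
import statistics


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    all_values = [value for group in groups for value in group]
    grand_mean = statistics.mean(all_values)
    # Between-groups (treatment) variation: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Within-groups (error) variation: observations around their group mean.
    ss_within = sum((value - statistics.mean(g)) ** 2
                    for g in groups for value in g)
    df_between = len(groups) - 1                  # v1 = t - 1
    df_within = len(all_values) - len(groups)     # v2 = N - t
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


material_1 = [791, 789, 796, 802, 810, 790, 800, 790]
material_2 = [809, 818, 803, 781, 813, 808, 805, 811]
material_3 = [828, 814, 855, 844, 847, 848, 836, 873]

ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(
    [material_1, material_2, material_3])

F_CRITICAL = 3.467   # assumed table value: F(0.05; 2, 21)
print(f"SS between = {ss_between:.3f} (df {df_between}), "
      f"SS within = {ss_within:.3f} (df {df_within}), F = {f_stat:.3f}")
print("Reject H0" if f_stat > F_CRITICAL else "Do not reject H0")
```

A significant F only says that the three means are not all equal. Deciding which materials differ, as part (c) asks, still requires comparing the individual group means or their confidence intervals.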
MINITAB OUTPUT
Stat > ANOVA > One-way (or One-way (Unstacked))

Results for: Problem 14_1.MTW
One-way ANOVA: Grams versus Material

Analysis of Variance for Grams
Source      DF      SS      MS       F       P
Material     2    9808    4904   29.83   0.000
Error       21    3452     164
Total       23   13260

Level    N     Mean    StDev
1        8   796.00     7.50
2        8   806.00    11.12
3        8   843.00    17.70
Pooled StDev = 12.82

[Plot: Individual 95% CIs For Mean, Based on Pooled StDev; axis ticks at 800, 820, 840]

EXCEL OUTPUT
Data > Data Analysis > Anova: Single Factor

Anova: Single Factor
SUMMARY
Groups    Count    Sum     Average    Variance
Mat1      8        6368    796        56.28571
Mat2      8        6448    806        123.7143
Mat3      8        6745    843.125    314.4107

ANOVA
Source of Variation    SS          df    MS          F           P-value           F crit
Between Groups         9864.083     2    4932.042    29.92679    0.000000711968    3.466795
Within Groups          3460.875    21    164.8036
Total                  13324.96    23

14.3 Grading homework is a real problem. It takes an enormous amount of time and many students do not do a very good job or copy answers from other students or the back of the book. A teacher of elementary statistics decided to conduct a study to determine what effect grading homework had on her students’ exam scores. She taught 3 sections of Elementary Statistics and randomly assigned each class one of three conditions: (1) no homework given, (2) homework given, but not collected, and (3) homework given, collected, and graded. After the first exam, she collected the data (exam scores). They are shown in the Excel data file Homework.xls
(a) What is the response variable and what is the factor?
(b) How many levels of the factor are being studied?
(c) Is there any difference in the average exam scores under the three homework conditions? If so, which ones are different?
(d) What is your recommendation to the teacher and why?

MINITAB OUTPUT:
Results for: Problem 14_3.MTW
One-way ANOVA: C2 versus C1

Analysis of Variance for C2
Source    DF       SS      MS      F       P
C1         2   1700.4   850.2   8.91   0.001
Error     45   4295.4    95.5
Total     47   5995.8

Level     N     Mean     StDev
1        16   74.500   11.051
2        16   70.313    9.016
3        16   84.500    9.107
Pooled StDev = 9.770

[Plot: Individual 95% CIs For Mean, Based on Pooled StDev; axis ticks at 70.0, 77.0, 84.0]

14.16 The manufacturer of batteries is designing a battery to be used in a device that will be subjected to extremes in temperature. The company has a choice of 3 materials to use in the manufacturing process. An experiment is designed to study the life of the battery when it is made from materials A, B, C and is exposed to temperatures of 15, 70, and 125 degrees Fahrenheit. For each combination of material and temperature, 4 batteries are tested. The lifetimes in hours of the batteries are shown below:

              Temperature
              15F                   70F                   125F
Material A    130  155   74  180     34   40   80   75     20   70   82   58
Material B    150  188  159  126    126  122  106  115     25   70   58   45
Material C    138  110  168  160    174  120  150  139     96  104   82   83

(a) Calculate the average life for each of the material types.
(b) Calculate the average life for each of the 3 temperatures.
(c) Calculate the average life for each of the 9 treatment groups.
(d) Plot the 9 treatment means on a graph with the temperature factor on the x axis, and the life of the battery in hours on the y axis. Use a different color for each of the 3 materials and connect the averages for those of the same material.
What do you speculate about the interaction effect based on the graph?
(e) Confirm your suspicions by doing a two-way ANOVA and testing to see if there is a significant interaction effect.
(f) What materials do you recommend to this manufacturer and why?

MINITAB OUTPUT
Stat > ANOVA > Two-way

Interaction Plot
Stat > ANOVA > Interaction Plot

Excel Solution
Data > Data Analysis > Two-way ANOVA (with Replication)

14.20 A manufacturer of adhesive products designed an experiment to compare a new adhesive product to a competitor’s product. The adhesive product, or glue, is used by automobile manufacturers. The response variable was the strength of the glue measured by tensile strength in pounds per square inch (psi). The ability to adhere to oil-contaminated surfaces under different humidity conditions was studied. There were 2 levels for factor A: no oil or oil. Oil contamination was applied by hand dipping the samples in an oil solution and allowing them to air dry at room temperature for 2 hours. There were two levels for factor B: 50% humidity and 90% humidity. Three samples were tested for each of the combinations of factor A and factor B. The tensile values (psi) for the new product are shown below:

Humidity    No Oil             Oil
50%         175  100  175       43   42   44
90%          95  115   85       95  105  116

(a) Does the product behave significantly differently if the surface is oil contaminated?
(b) Does the product behave significantly differently at different humidity levels?
(c) Is there any significant interaction effect present?

Two-way ANOVA: PSI versus Surface, Humidity

Analysis of Variance for PSI
Source         DF      SS      MS       F       P
Surface         1    7500    7500   13.52   0.006
Humidity        1      85      85    0.15   0.705
Interaction     1    9747    9747   17.56   0.003
Error           8    4439     555
Total          11   21772

Surface     Mean
No oil     124.2
Oil         74.2
[Plot: Individual 95% CI for the Surface means; axis ticks at 75.0, 100.0, 125.0, 150.0]

Humidity    Mean
50%         96.5
90%        101.8
[Plot: Individual 95% CI for the Humidity means; axis ticks at 84.0, 96.0, 108.0, 120.0]

[Figure: Interaction Plot - Data Means for PSI; Mean (50 to 150) versus Humidity (50%, 90%), one line per Surface (No oil, Oil)]

Analysis of Qualitative Data (Chi-square)

1. The Chi-square test Explained (640)
2. Four Uses of the Chi-square Distribution
   (a) Testing for goodness-of-fit (641)
       is a test for comparing a theoretical distribution, such as a Normal, Poisson, etc., with the observed data from a sample
       answers the question: “does the sample come from a specified distribution?”
   (b) Testing (comparing) proportions of two or more groups (651)
   (c) Testing whether two categorical (a.k.a. nominal, qualitative, classification) variables are independent (651)
   (d) Testing the variance of a population (covered in earlier chapter)

Chi-square Concepts and Solved Problems are Available

CHI-SQUARE ($\chi^2$) DISTRIBUTION

1. Goodness-of-fit Test
   H0: The sample comes from a specified distribution
   H1: The sample does not come from a specified distribution
   Reject H0 if: $\chi^2 > \chi^2_{\alpha, (k-1)}$
   where $\chi^2 = \sum \frac{(O - E)^2}{E}$ and k = number of categories

2. Test of Independence of 2 Categorical Variables (Also used for comparing proportions of 2 or more groups)
   H0: Variables 1 and 2 are not dependent
   H1: Variables 1 and 2 are dependent
   Reject H0 if: $\chi^2 > \chi^2_{\alpha, (r-1)(c-1)}$
   where $\chi^2 = \sum \frac{(O - E)^2}{E}$, r = number of rows, and c = number of columns

3. Testing A Population Variance ($\sigma^2$)
   Test statistic for all three cases: $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$, with $\sigma^2$ set to the hypothesized value
   Two-tailed:    H0: $\sigma^2 = $ value     H1: $\sigma^2 \ne $ value    Reject H0 if: $\chi^2 > \chi^2_{\alpha/2, n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2, n-1}$
   Left-tailed:   H0: $\sigma^2 \ge $ value   H1: $\sigma^2 < $ value      Reject H0 if: $\chi^2 < \chi^2_{1-\alpha, n-1}$
   Right-tailed:  H0: $\sigma^2 \le $ value   H1: $\sigma^2 > $ value      Reject H0 if: $\chi^2 > \chi^2_{\alpha, n-1}$

Learning it! Exercises

15.1 The administration of a university has been using the following distribution to classify ages of their students:

Age Group        Estimated % of Student Population
Less than 18      2.7
18 to 19         29.9
20 to 24         53.4
Older than 24    14

A recent student survey provided the following data on age of students:

Age Group        Frequency
Less than 18       6
18 to 19         118
20 to 24         102
Older than 24     26

Set up a table that compares the expected and observed frequencies for each group. Based on the table, do you think that the data represent the established distribution? Set up the hypothesis for the Chi-square goodness of fit test. Perform the goodness of fit test at the 0.05 significance level. Based on the chi-square test, is the estimated age distribution that the university is using correct?
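The goodness-of-fit calculation for Exercise 15.1 can be sketched as follows: the expected frequency of each age group is its hypothesized proportion times the sample size, and the statistic sums (O - E)^2 / E over the groups. The function name chi_square_gof is hypothetical, and the critical value 7.815 is an assumption taken from a chi-square table (0.05 level, k - 1 = 3 degrees of freedom).

```python
def chi_square_gof(observed, proportions):
    """Return (expected frequencies, chi-square statistic)."""
    total = sum(observed)
    expected = [p * total for p in proportions]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2


# Exercise 15.1: age groups <18, 18 to 19, 20 to 24, >24.
observed = [6, 118, 102, 26]
proportions = [0.027, 0.299, 0.534, 0.14]

expected, chi2 = chi_square_gof(observed, proportions)

CHI2_CRITICAL = 7.815   # assumed table value: chi-square(0.05; df = 3)
for o, e in zip(observed, expected):
    print(f"observed = {o:4d}   expected = {e:8.3f}")
print(f"chi-square = {chi2:.3f}")
print("Reject H0" if chi2 > CHI2_CRITICAL else "Do not reject H0")
```

Printing the observed and expected counts side by side gives the comparison table the exercise asks for before the formal test is run.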
15.2 As part of a survey on the use of Office Suites Software, the company doing the polling wanted to know whether its population was uniformly distributed over the following age distribution: under 25, 25 to 44, 45 and up. The company looked at the data it had collected so far and found the following distribution:

Age Group     Number of Respondents
Under 25       73
25 to 44       61
45 and up      66
Total         200

Based on the data, do you think that the respondents are uniformly distributed over the age categories? Set up the hypothesis to test whether the data are uniformly distributed over the age categories. Find the expected frequency distribution and perform the chi-square goodness of fit test. At the 0.05 level of significance, would you say that the respondents were uniformly distributed over the age groups?

15.6 In an experiment to study the attitude of voters concerning term limitations in Congress, voters in Indiana, Ohio, and Kentucky were polled with the following results:

Opinion           Indiana    Kentucky    Ohio
Support            82        107         93
Do Not Support     97         66         74

(a) Set up the hypothesis to test whether the proportion of voters who support congressional term limits is the same for all three states.
(b) Calculate the proportion of voters that support congressional term limits for each state individually. Based on these values, do you think there is a difference in the proportions?
(c) Calculate the overall proportion of voters who support term limits for Congress.
(d) Calculate the expected frequencies for each cell and find the value of the chi-square test statistic.
(e) At the 0.05 level of significance, is there a difference in the proportion of voters who support congressional term limits among the three states?

Minitab
Stat > Tables > Chi-square Test
(No Excel Solution)

15.7 In a survey about satisfaction with local phone service, those respondents who rated their current service as excellent and those who rated Poor to Very Poor were asked to classify their current local service provider. The results are given in the table below:

                       Source (Type of Company)
Current Service        Long Distance    Local Phone    Power    Cable TV    Cellular Phone
Excellent               264              444           131      215         198
Poor to Very Poor      1394             1318           485      431         572

(a) Set up the hypothesis to test whether the proportion of people who rated their company as excellent is the same for each type of company.
(b) Calculate the proportion of people who rate their current phone service as excellent.
(c) Calculate the expected frequencies for each cell and find the value of the chi-square test statistic.
(d) If you wanted to perform the test at the 0.05 significance level, what would be the critical value of the test?
(e) At the 0.05 level of significance, is there a difference in the proportion of people who rate their local phone service as excellent among the different types of companies?
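The expected frequencies and chi-square statistic for a table such as the one in Exercise 15.7 follow one rule: each expected count is (row total × column total) / grand total. The sketch below is illustrative only. The function name chi_square_table is hypothetical, the column order (Long Distance, Local Phone, Power, Cable TV, Cellular Phone) is an assumption about how the flattened table reads, and the critical value 9.488 is an assumed chi-square table entry (0.05 level, 4 degrees of freedom).

```python
def chi_square_table(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1)
    return expected, chi2, df


# Exercise 15.7: rows are Excellent and Poor to Very Poor.
ratings = [
    [264, 444, 131, 215, 198],
    [1394, 1318, 485, 431, 572],
]

expected, chi2, df = chi_square_table(ratings)

CHI2_CRITICAL = 9.488   # assumed table value: chi-square(0.05; df = 4)
print(f"chi-square = {chi2:.3f} with df = {df}")
print("Reject H0" if chi2 > CHI2_CRITICAL else "Do not reject H0")
```

The same function applies unchanged to tests of independence, since comparing proportions across groups and testing independence of two categorical variables use the same statistic and degrees of freedom.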
Chi-Square Test: Long Distance, Local Phone, Power, CableTV, Cellular Phone

Expected counts are printed below observed counts

         Long Dis    Local Ph    Power     CableTV    Cellular    Total
1         264         444        131       215        198         1252
          380.74      404.63     141.46    148.35     176.82
2        1394        1318        485       431        572         4200
         1277.26     1357.37     474.54    497.65     593.18
Total    1658        1762        616       646        770         5452

Chi-Sq = 35.796 + 3.831 + 0.773 + 29.947 + 2.536 +
         10.671 + 1.142 + 0.230 + 8.927 + 0.756 = 94.610
DF = 4, P-Value = 0.000

15.10 A report by the Department of Justice on rape victims reports on interviews with 3721 victims. The attacks were classified by age of the victim and the relationship of the victim to the rapist. The results of the study are given here:

                  Relationship of Rapist
Age of Victim     Family    Acquaintance or Friend    Stranger
Under 12          153        167                       13
12 to 17          230        746                      172
Over 17           269       1232                      739

(a) Set up the hypotheses to test whether age of victim and relationship of rapist are independent.
(b) Calculate the expected frequencies for each cell.
(c) How many degrees of freedom will the chi-square test for independence have? Using a level of significance of 0.01, what is the critical value for the test?
(d) Calculate the value of the chi-square test statistic.
(e) Is the age of the victim independent of the relationship to the rapist?
MINITAB
Stat > Tables > Chi-square Test

Chi-Square Test: C1, C2, C3

Expected counts are printed below observed counts

         C1        C2         C3        Total
1        153        167        13        333
          58.35     191.96     82.69
2        230        746       172       1148
         201.15     661.77    285.07
3        269       1232       739       2240
         392.50    1291.27    556.24
Total    652       2145       924       3721

Chi-Sq = 153.539 +  3.246 + 58.734 +
           4.136 + 10.720 + 44.849 +
          38.857 +  2.720 + 60.050 = 376.852
DF = 4, P-Value = 0.000

15.11 A company that manufactures cardboard boxes for packaging cereals wants to determine whether the type of defect that a particular box has is related to the shift on which it was produced. It compiles the following data. In each case, if a box had multiple defects the most serious defect was recorded.

         Type of Defect
Shift    Printing    Rips/Tears    Size
1        55          60            85
2        58          63            79
3        89          63            48

(a) Set up the appropriate hypotheses for the test.
(b) Calculate the expected frequencies for each cell and calculate the value of the chi-square test statistic.
(c) How many degrees of freedom will the chi-square test for independence have?
(d) Using a level of significance of 0.01, are defect type and shift related?
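For a chi-square test with an even number of degrees of freedom, such as the 4 degrees of freedom in Exercise 15.11, the upper-tail probability (the p-value) has a simple closed form, so it can be checked without statistical software. The sketch below is a standard-library illustration; the function name chi2_upper_tail_even_df is hypothetical, and the approach is an alternative check rather than the method used by Minitab.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with an even df exceeds x), using the closed form
    exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here needs a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Exercise 15.11: rows are shifts 1 to 3; columns are Printing, Rips/Tears, Size.
defects = [[55, 60, 85], [58, 63, 79], [89, 63, 48]]
row_totals = [sum(row) for row in defects]
col_totals = [sum(col) for col in zip(*defects)]
grand_total = sum(row_totals)

chi2 = sum((obs - r * c / grand_total) ** 2 / (r * c / grand_total)
           for row, r in zip(defects, row_totals)
           for obs, c in zip(row, col_totals))
df = (len(defects) - 1) * (len(col_totals) - 1)
p_value = chi2_upper_tail_even_df(chi2, df)

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.6f}")
print("Reject H0 at 0.01" if p_value < 0.01 else "Do not reject H0 at 0.01")
```

Comparing the p-value with 0.01 gives the same decision as comparing the statistic with the table critical value for 4 degrees of freedom.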
Chi-Square Test: Printing, Rips/Tears, Size

Expected counts are printed below observed counts

         Printing    Rips/Tea    Size     Total
1        55          60          85       200
         67.33       62.00       70.67
2        58          63          79       200
         67.33       62.00       70.67
3        89          63          48       200
         67.33       62.00       70.67
Total    202         186         212      600

Chi-Sq = 2.259 + 0.065 + 2.907 +
         1.294 + 0.016 + 0.983 +
         6.972 + 0.016 + 7.270 = 21.782
DF = 4, P-Value = 0.000

Simple Linear Regression and Correlation (454-491)

Important Definition of Terms
   Test of Independence
   Variables
      Quantitative (measured)
      Qualitative (category, classification, nominal)
   Scatter plot
   Regression and correlation
   Linear vs. Curvilinear models
   Simple vs. Multiple Linear Models
   Correlation coefficient
   Coefficient of determination
   Residual (error) term
   Observed y vs. expected y

Important Symbols
   Y, X, $S_{y/x}$, $R^2$, R, a, b

Problems for Simple Linear Regression: 11.2 (p. 553)
Problems for Multiple Linear Regression: Problem 12.1, Problem 12.5, Problem 12.9

Steps in Regression/Correlation Analysis
1. Identify the response (y) and candidate predictor variables (x’s)
2. Collect y,x set of data
3. Plot each x versus y
4. From the plots in #3, select the most promising x
5. Perform Regression and Correlation Analysis
   a. Select a model (linear or nonlinear) that fits the plot and then generate the regression equation using Excel or Minitab
   b. Test the resulting model for significance using the slope ($\beta$), correlation ($\rho$), or the ANOVA tests (If resulting model is NOT significant, go back to Step 1)
   c. Test the model for appropriateness using the analysis of residuals.
      This tests whether the assumptions on the residuals are met. These assumptions are:
         Normal distribution
         Homoscedastic
         Independent
      (If selected model is not appropriate, go back to Step 5a, else proceed to Step 7)
6. If model generated in Step 5 is significant but not appropriate, choose a different model (perhaps use a curvilinear model) and repeat Step 5 until an appropriate model is found.
7. Use model for estimating:
   (1) the response variable (y)
       Point Estimate: substitute the value of x into the regression equation
       Interval Estimates:
          1. Prediction Interval Estimate
          2. Confidence Interval Estimate
   (2) the actual slope ($\beta$) of the line
       $CI = b \pm t_{\alpha/2,\, n-2}\, s_b$

Definitions of Relevant Terms

Types of Variables:
   y: response variable (a.k.a. dependent, predicted, explained)
   x: independent variable (a.k.a. predictor, explanatory)

Regression: provides a ‘best-fit’ mathematical equation for the values of the y,x variables
   expresses the relationship of y and x in equation form
   the mathematical equation may be linear or curvilinear
   linear:
      $Y = a + bX$ (Direct, linear)
      $Y = a - bX$ (Inverse, linear)
   curvilinear:
      $Y = a + bX + cX^2$ (quadratic)
      $Y = e^{-X}$ (negative exponential)
      $Y = 1/X$

Simple Linear Regression: the regression model is linear with only ONE predictor variable
   $y = b_0 + b_1 x$
Multiple Linear Regression: the regression model is linear with TWO OR MORE predictor variables
   $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k$

Correlation Analysis: measures the strength of the relationship between Y and X
   coefficient of correlation (r): number that measures both the direction and the strength of the linear relationship between y and x
      $-1 \le r \le 1$
   coefficient of determination ($r^2$): the percent of the variation in y that is explained by the regression model
      $0\% \le r^2 \le 100\%$

Simple Linear Model and Assumptions

Models
   Actual Population Model:    $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
   Estimated (sample) Model:   $\hat{y}_i = b_0 + b_1 x_i$

Assumptions on the residuals
   1) Normally distributed
   2) Homoscedastic (constant variance across all x values)
   3) Statistically independent of each other

Testing the Model for Significance

a. Using the Slope ($\beta$) Test
   H0: $\beta = 0$
   H1: $\beta \ne 0$
   Reject H0 if $|t| > t_{\alpha/2,\, n-k-1}$, where $t = \frac{b}{s_b}$

b. Using the coefficient of correlation ($\rho$) Test
   H0: $\rho = 0$
   H1: $\rho \ne 0$
   Reject H0 if $|t| > t_{\alpha/2,\, n-k-1}$, where $t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$

c. Using the ANOVA F-Test
   H0: the Model is not significant
   H1: the Model is significant
   Reject H0 if $F > F_{\alpha, v_1, v_2}$, where $F = \frac{MS_{Regression}}{MS_{Error}}$

Using the Model for Estimation/Prediction

A. Estimating the actual slope ($\beta$) of the model
   b: point estimate of the actual slope ($\beta$) of the model
   Computing a confidence interval for the actual slope of the model:
      C.I. for $\beta = b \pm t_{\alpha/2,\, n-k-1}\,(s_b)$

B. Using the model to estimate the actual value of y for a given value of x
   $\hat{y}$: point estimate of the actual value of y for a given value of x, computed by substituting the value of x into the regression equation
   Confidence Interval (C.I.): the interval that contains the actual average value of the response variable ($\mu_{y/x}$) for a specific value of x
   Prediction Interval (P.I.): the interval that contains the actual value of the response variable (Y) for a specific value of x

SIMPLE LINEAR REGRESSION: A Solved Example

EXAMPLE: A manufacturer of small electric motors uses an automatic milling machine to produce the slots in the shaft of the motors. A batch of shafts is run and then checked. All shafts in the batch that do not meet required dimensional tolerances are discarded. At the beginning of each new batch, the milling machine is readjusted since its cutter head wears slightly during the production of the batch. The manufacturer is trying to pick an optimal batch size, but in order to do this (s)he must know how the size of the batch affects the number of defective shafts in the batch. Thirty (30) batches were inspected, and the number of defectives in each batch was counted. The results are shown below:

Batchsize   Defects      Batchsize   Defects      Batchsize   Defects
100           5          200          22          300          54
125          10          225          26          325          69
125           6          225          29          350          82
125           7          225          25          350          81
150           6          250          34          350          84
150           7          250          37          375          92
175          17          250          41          375          96
175          15          250          34          375          97
200          24          275          49          400         109
200          21          300          53          400         112

INITIAL MODEL (LINEAR)

MINITAB SOLUTION

A. Plot Batchsize and Number of Defects
   GRAPH > PLOT
      Graph Variables: X: Batchsiz   Y: Defects
   [Scatter plot: defects (0 to 100) versus batchsiz (100 to 400)]

   STAT > REGRESSION > FITTED LINE PLOT
      Response (Y): Defects
      Predictor (X): Batchsiz
      Type of regression model: Linear
   [Regression Plot: Y = -47.9007 + 0.367131X, R-Sq = 95.3 %; defects versus batchsiz]

B. Generate the Regression Equation
   STAT > REGRESSION > REGRESSION
      Response: Defects
      Predictors (X): Batchsiz
      Click on Results: Select "In addition, sequential sums of…", Click OK
      Click on Storage: Select Fits, Select Residuals, Select Standardized Residuals, Click OK
      Click OK
   [Screenshot of the regression output, with callouts: Least squares regression equation; Coefficient of determination]

Generate the Residual Plots
   STAT > REGRESSION > Residual Plots
      Residuals: RESI1
      Fits: FITS1
      Click OK
   [Figure: Residual Model Diagnostics, four panels: Normal Plot of Residuals; I Chart of Residuals (3.0SL = 8.378, X = 0.000, -3.0SL = -8.378); Histogram of Residuals; Residuals vs. Fits]

EXCEL SOLUTION
Data > Data Analysis > Regression
   Input:
      Input Y range: Defects
      Input X range: Batchsiz
      Labels: <select>
   Output Range: <type in a cell address here>
   Residuals:
      Residuals: <do not select>
      Standard Residuals: <do not select>
      Residual Plots: <select>
      Line Fit Plots: <select>
   Normal Probability:
      Normal Probability Plots: <select>

Analysis of Residual Plots
   [Figures: Normal Probability Plot (defects versus Sample Percentile); batchsiz Residual Plot (Residuals versus batchsiz); batchsiz Line Fit Plot (defects and Predicted defects versus batchsiz)]

REVISED MODEL (NONLINEAR - Quadratic)

Minitab
   Delete all non-empty columns except Defects and Batchsiz
   Compute C3 = Batchsiz * Batchsiz
      Calc > Calculator
      Store result in variable: C3
      Expression: Batchsiz*Batchsiz
      Click OK
   Name C3 as "Batchsiz^2"
   STAT > REGRESSION > REGRESSION
      Response: Defects
      Predictors (X): Batchsiz Batchsiz^2
      Click on Results: Select "In addition, sequential sums of…", Click OK
      Click on Storage: Select Fits, Select Residuals, Select Standardized Residuals, Click OK
      Click OK

Regression Analysis

The regression equation is
defects = 6.90 - 0.120 batchsiz + 0.000950 batchsiz^2

Predictor      Coef          StDev         T        P
Constant       6.898         3.737          1.85    0.076
batchsiz      -0.12010       0.03148       -3.82    0.001
batchsiz^2     0.00094954    0.00006059    15.67    0.000

S = 2.423    R-Sq = 99.5%    R-Sq(adj) = 99.5%

Analysis of Variance
Source            DF      SS       MS        F          P
Regression         2    34186    17093    2911.35    0.000
Residual Error    27      159        6
Total             29    34345

Source        DF    Seq SS
batchsiz       1    32744
batchsiz^2     1     1442

[Figure: Residual Model Diagnostics, four panels: Normal Plot of Residuals; I Chart of Residuals (3.0SL = 8.287, X = 0.000, -3.0SL = -8.287); Histogram of Residuals; Residuals vs. Fits]

EXCEL SOLUTION
Create a Batchsiz^2 column (must be adjacent to Batchsiz), where Batchsiz^2 = Batchsiz * Batchsiz
Data > Data Analysis > Regression
   Input:
      Input Y range: Defects
      Input X range: highlight the Batchsiz and Batchsiz^2 range of data
      Labels: <select>
   Output Range: <type in a cell address here>
   Residuals:
      Residuals: <do not select>
      Standard Residuals: <do not select>
      Residual Plots: <select>
      Line Fit Plots: <select>
   Normal Probability:
      Normal Probability Plots: <select>

GENERATING PREDICTION AND CONFIDENCE INTERVALS FOR Y (Minitab)
Stat > Regression > Regression > Option
[Screenshots with callouts: Values for batch and batch^2 to predict defect rates; Columns where the new values for the predictor variables can be found; Predicted values for number of defects]
(Note: Excel does not have this capability)
Problem 11.2
In trying to look at the effects of shopping center expansion, the Commerce Department decided to look at the relationship between the number of shopping centers and the retail sales for different states in the same region. It collected the data for the North Central states in the US and found the following:

State           Number of Shopping Centers    Retail Sales ($ billion)
Illinois        2096                          41.8
Indiana         905                           21.4
Michigan        1018                          25.3
Minnesota       471                           13.9
Ohio            1704                          41.6
Iowa            308                           7.5
Missouri        887                           22.7
Wisconsin       625                           14.6
South Dakota    58                            1.3
North Dakota    87                            2.1
Nebraska        264                           5.7
Kansas          481                           11.6

(a) Create a scatter plot of the data.
(b) Find the regression equation relating retail sales and number of shopping centers.
(c) Plot the regression line on the same plot as the data. Do you think the line fits the data well? Why or why not?
(d) Use the regression line to predict retail sales for each state.
(e) Calculate the residuals for each state. Which state has the largest residual? Which state has the smallest? Do the residuals support your answer to part (d)?
(f) Find the standard error of the estimate.
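Parts (b), (d), (e), and (f) can also be checked outside Excel or Minitab. The following is a minimal Python sketch, not part of the original notes: it assumes the twelve data pairs listed in the problem and uses the textbook least-squares formulas written out by hand. The names `fit_line`, `centers`, `sales`, and `std_error` are illustrative only.

```python
import math

# Problem 11.2 data, in the order the states are listed in the problem.
states = ["Illinois", "Indiana", "Michigan", "Minnesota", "Ohio", "Iowa",
          "Missouri", "Wisconsin", "South Dakota", "North Dakota",
          "Nebraska", "Kansas"]
centers = [2096, 905, 1018, 471, 1704, 308, 887, 625, 58, 87, 264, 481]
sales = [41.8, 21.4, 25.3, 13.9, 41.6, 7.5, 22.7, 14.6, 1.3, 2.1, 5.7, 11.6]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(centers, sales)

# Part (d): predicted sales. Part (e): residual = actual - predicted.
predicted = [intercept + slope * c for c in centers]
residuals = [s - p for s, p in zip(sales, predicted)]

# Part (f): standard error of the estimate, sqrt(SSE / (n - 2)).
sse = sum(r ** 2 for r in residuals)
std_error = math.sqrt(sse / (len(sales) - 2))

print(f"sales = {intercept:.6f} + {slope:.6f} * centers")
for state, p, r in zip(states, predicted, residuals):
    print(f"{state:<13} predicted {p:6.2f}  residual {r:6.2f}")
print(f"standard error of the estimate = {std_error:.4f}")
```

The printed intercept, slope, and standard error can be compared with the Coefficients and Standard Error entries of the Excel SUMMARY OUTPUT for this problem.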
[Scatter plot: Retail Sales ($ billion) versus Number of Shopping Centers]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9866955
R Square             0.9735681
Adjusted R Square    0.9709249
Standard Error       2.3387601
Observations         12

ANOVA
              df    SS          MS          F          Significance F
Regression    1     2014.691    2014.691    368.330    0.000
Residual      10    54.698      5.470
Total         11    2069.389

                Coefficients    Standard Error    t Stat       P-value     Lower 95%    Upper 95%
Intercept       1.492612        1.071387          1.393158     0.193764    -0.894588    3.879812
X Variable 1    0.021517        0.001121          19.191926    0.000000    0.019019     0.024015

Problem 11.3
As part of an international study on energy consumption, data were collected on the number of cars in a country and the total travel in kilometers. The data for 12 of the countries are shown here:

Country        Total Cars (million)    Travel (billion km)
US             142.35                  3140.29
Finland        1.82                    34.66
Denmark        1.66                    30.76
Britain        21.32                   352.76
Australia      8.53                    138.22
Sweden         3.32                    53.21
Netherlands    5.53                    83.69
France         23.27                   348.2
Norway         1.59                    23.54
Italy          26.12                   367.85
Germany        43.75                   608.52
Japan          40.25                   439.30

(a) Create a scatterplot of the data. Do you think that there is a relationship between the number of kilometers traveled and the number of cars?
(b) Find the least-squares regression line for the data. Interpret the value of the slope.
(c) Does the intercept make sense for these data? Why or why not?
(d) Plot the regression line on the same plot with the data. Does the line make you feel confident about predicting travel as a function of the number of cars?
(e) Use the regression line to predict the number of kilometers for Sweden and Japan. How well do the predictions agree with the original data?
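Parts (b) and (e) can be sketched in a few lines of Python. This is an illustrative check rather than the method used in the notes, which rely on Excel; it assumes the twelve data pairs listed in the problem, and the names `data` and `predict_travel` are made up for this example.

```python
# Problem 11.3 data: country -> (total cars in millions, travel in billion km).
data = {
    "US": (142.35, 3140.29),
    "Finland": (1.82, 34.66),
    "Denmark": (1.66, 30.76),
    "Britain": (21.32, 352.76),
    "Australia": (8.53, 138.22),
    "Sweden": (3.32, 53.21),
    "Netherlands": (5.53, 83.69),
    "France": (23.27, 348.2),
    "Norway": (1.59, 23.54),
    "Italy": (26.12, 367.85),
    "Germany": (43.75, 608.52),
    "Japan": (40.25, 439.30),
}

cars = [c for c, _ in data.values()]
travel = [t for _, t in data.values()]
n = len(data)

# Least-squares slope and intercept from the textbook formulas.
cars_bar = sum(cars) / n
travel_bar = sum(travel) / n
sxx = sum((c - cars_bar) ** 2 for c in cars)
sxy = sum((c - cars_bar) * (t - travel_bar) for c, t in zip(cars, travel))
slope = sxy / sxx
intercept = travel_bar - slope * cars_bar


def predict_travel(country):
    """Predicted travel (billion km) for a country already in the data set."""
    return intercept + slope * data[country][0]


print(f"travel = {intercept:.4f} + {slope:.4f} * cars")
for country in ("Sweden", "Japan"):
    actual = data[country][1]
    print(f"{country}: predicted {predict_travel(country):.2f}, actual {actual:.2f}")
```

Comparing the two printed predictions with the actual values is one way to approach part (e), and the sign of the fitted intercept bears on part (c).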
[Scatter plot: Traveled (in billion kilometers) versus Total Cars (in million)]

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.98503096
R Square             0.97028599
Adjusted R Square    0.96731458
Standard Error       156.136088
Observations         12

ANOVA
              df    SS             MS          F           Significance F
Regression    1     7960585.694    7960586     326.5415    0.0000
Residual      10    243784.7804    24378.48
Total         11    8204370.475

                Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept       -106.2068       55.1609           -1.9254    0.0831     -229.1129    16.6992
X Variable 1    21.5814         1.1943            18.0705    0.0000     18.9204      24.2425

Problem 11.23
How much does advertising impact market penetration? To assess the impact of advertising in the tobacco industry, a study looked at the amount of money spent on advertising a particular brand of cigarettes and brand preference among adolescents and adults. The data are shown here:

                                            Brand Preferences
Brand              Advertising ($ million)    Adolescent (%)    Adult (%)
Marlboro           75                         60                23.5
Camel              43                         13.3              6.7
Newport            35                         12.7              4.8
Kool               21                         1.2               3.9
Winston            17                         1.2               3.9
Benson & Hedges    4                          1                 3.0
Salem              3                          0.3               2.5

(a) Look at the data for brand preference for adolescents and amount spent on advertising. Which variable is the dependent variable? Which is the independent variable?
(b) Create a scatter plot of advertising and adolescent brand preference. Do you think that there is a linear relationship between the two variables? Why or why not?
(c) Now create another scatter plot using adult brand preference instead. How does this plot compare to the one for adolescent brand preference? From the plots, do you think that adolescent or adult brand preference is more strongly related to advertising expenditures? Why?
(d) Find the least squares line for adolescent brand preference and advertising expenditures.
(e) Interpret the meaning of the slope and intercept of the model. Do they make sense?
(f) Use the model to predict adolescent brand preference for each brand studied. How well do the predicted values agree with the actual data?
(g) Using a 0.05 significance level, is the model significant?

[Scatter plot, Adolescent Market: Brand Preference (%) versus Advertising ($ million)]
[Scatter plot, Adult Market: Brand Preference (%) versus Advertising ($ million)]

ADOLESCENT MARKET
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.923547
R Square             0.852939
Adjusted R Square    0.823527
Standard Error       9.063086
Observations         7

ANOVA
              df    SS          MS          F           Significance F
Regression    1     2382.011    2382.011    28.99957    0.002978
Residual      5     410.6976    82.13953
Total         6     2792.709

                    Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept           -9.42472        5.365513          -1.75654    0.139344    -23.2172     4.367747
Advertising ($m)    0.786227        0.146             5.385125    0.002978    0.410923     1.161531

ADULT MARKET
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.901096
R Square             0.811974
Adjusted R Square    0.774369
Standard Error       3.536488
Observations         7

ANOVA
              df    SS          MS          F           Significance F
Regression    1     270.0463    270.0463    21.59205    0.005599
Residual      5     62.53373    12.50675
Total         6     332.58

                    Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept           -0.58794        2.093665          -0.28082    0.790098    -5.96987     4.793986
Advertising ($m)    0.264725        0.05697           4.646724    0.005599    0.118279     0.411172

Multiple Linear
Regression (560 to 595)

Problem 12.1
A group of legislators wanted to look at factors that affect the number of traffic fatalities. They collected some data for 1994 from the NTSB on the number of fatalities for 50 states and the District of Columbia, the number of licensed drivers, the number of registered vehicles, and the number of vehicle miles traveled. A portion of the data is shown on page 584. The full dataset is in traffat.xls.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.982548538
R Square             0.96540163
Adjusted R Square    0.963193224
Standard Error       154.5407481
Observations         51

ANOVA
              df    SS             MS          F           Significance F
Regression    3     31321046.9     10440349    437.1485    2.54274E-34
Residual      47    1122493.613    23882.84
Total         50    32443540.51

                           Coefficients    Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept                  51.7481659      30.43306219       1.700393    0.095666    -9.475200509    112.9715
Licensed Drivers           0.06294764      0.048829545       1.28913     0.203662    -0.035284642    0.16118
Registered Vehicles        -0.211896991    0.055989427       -3.78459    0.000436    -0.324533083    -0.09926
Vehicle Miles Travelled    0.029349954     0.003525079       8.326041    8.34E-11    0.022258416     0.036441

(a) How many independent variables are there in the model proposed? What are they?
(b) Use the computer output to write down the regression model.
(c) Interpret the coefficients of the model.
(d) Use the model to predict the number of traffic fatalities for the states shown in the data table.
(e) Compare the predicted values from the model to the actual values. Based on the plot, does the model do a good job of predicting the number of traffic fatalities?
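For parts (d) and (e), the fitted equation can be evaluated directly from the coefficients in the SUMMARY OUTPUT. The Python sketch below is only an illustration: the coefficients are copied from the output, but the function name `predict_fatalities` and the example inputs are hypothetical, since the data table and the units of its columns are not reproduced in these notes. Real inputs must use the same units as the columns of traffat.xls.

```python
# Coefficients copied from the Excel SUMMARY OUTPUT for Problem 12.1.
INTERCEPT = 51.7481659
B_DRIVERS = 0.06294764       # Licensed Drivers
B_VEHICLES = -0.211896991    # Registered Vehicles
B_MILES = 0.029349954        # Vehicle Miles Travelled


def predict_fatalities(drivers, vehicles, miles):
    """Evaluate the fitted three-variable regression equation."""
    return (INTERCEPT
            + B_DRIVERS * drivers
            + B_VEHICLES * vehicles
            + B_MILES * miles)


# Hypothetical state: made-up inputs and a made-up actual count, used only
# to show the mechanics. They are NOT taken from traffat.xls.
example_inputs = {"drivers": 1000, "vehicles": 1000, "miles": 10000}
example_actual = 200

predicted = predict_fatalities(**example_inputs)
residual = example_actual - predicted  # actual minus predicted, as in part (e)
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}")
```

Repeating the call for each state in the data table and listing the residuals gives the comparison asked for in part (e).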
Problem 12.9
In the problem about number of traffic fatalities the model was rerun, dropping the data on number of licensed drivers that had the lowest t statistic. The output is shown below:

Regression Analysis: Traffic Fata versus Registered V, Vehicle Mile

The regression equation is
Traffic Fatalities = 46.0 - 0.163 Registered Vehicles + 0.0300 Vehicle Miles Travelled

Predictor    Coef        SE Coef     T        P
Constant     46.04       30.32       1.52     0.135
Register     -0.16280    0.04132     -3.94    0.000
Vehicle      0.029996    0.003513    8.54     0.000

S = 155.6    R-Sq = 96.4%    R-Sq(adj) = 96.3%

Analysis of Variance
Source            DF    SS          MS          F         P
Regression        2     31281357    15640679    645.98    0.000
Residual Error    48    1162183     24212
Total             50    32443541

(a) Write down the equation of the new two-variable model.
(b) Compare the new model to the model with three variables. How much does the model change when number of licensed drivers is dropped?
(c) Compare the value of R² for both models. What does this make you think about the decision to drop number of licensed drivers from the model?
(d) Would you consider a two-variable model a good model? Why or why not?
(e) Based on the value of the R², would you be satisfied with this model or would you want to consider other variables?

BA 282: APPLIED BUSINESS STATISTICS
Fall 1999 Midterm Exam
Name____________________

Part 1: Multiple Choice

1. Using the Standard Normal distribution, the area between –1.5 and –2.4 is:
a. 0.9250
b. 0.0586
c. -0.0568
d. -0.9250
e. None of the above

For questions 2 to 7, select the most appropriate pair of hypotheses for each statement.

2. The average age of SOU students is more than 21.5 years.
a. H0: 21.5 H1: > 21.5
b. H0: 21.5 H1: < 21.5
c. H0: = 21.5 H1: 21.5
d. H0: 21.5 H1: > 21.5
e. H0: 21.5 H1: < 21.5

3. A new medication for headache is touted to relieve pain in less than 5 minutes.
a. H0: 5 H1: > 5
b. H0: 5 H1: < 5
c. H0: = 5 H1: 5
d. H0: 5 H1: > 5
e. H0: 5 H1: < 5

4. A CPA review program is advertised as “guaranteed to improve your CPA test scores.” Fifty graduating accounting students from a business school were randomly assigned to two groups: one group in which students didn’t participate (D) in the review program and another group that participated (P) in the review program.
a. H0: P D H1: P > D
b. H0: P D H1: P < D
c. H0: P = D H1: P D
d. H0: P D H1: P > D
e. H0: P D H1: P < D

5. A councilperson claims that there is no difference in the level of support for Measure 51 among Republican (R) and Democratic (D) voters in the Rogue Valley.
a. H0: R D H1: R > D
b. H0: R D H1: R < D
c. H0: R = D H1: R D
d. H0: R D H1: R < D
e. H0: R = D H1: R D

6. A filling machine is supposed to fill an average of 12 ounces when operating properly.
a. H0: 12 H1: > 12
b. H0: 12 H1: < 12
c. H0: = 12 H1: 12
d. H0: 12 H1: > 12
e. H0: 12 H1: < 12

7. A right-tailed test of a population mean results in a p-value that is practically zero. This means that the sample represents:
a. Weak evidence supporting the null hypothesis
b. Weak evidence supporting the alternative hypothesis
c. Strong evidence supporting the null hypothesis
d. Strong evidence supporting the alternative hypothesis

8. Of the following variations, which does not belong?
a. Common cause variation
b. Special cause variation
c. Explained variation
d. Nonrandom variation

9. A confidence interval estimate has two components: a point estimate and a margin of error. Which of these two components is affected by the confidence level?
a. Point estimate
b. Margin of Error
c.
None of the above

10. A sample statistic (e.g. sample mean, sample proportion, or sample variance) is a random variable. Which type of random variable is a sample statistic?
a. Continuous
b. Discrete

11. "The distribution of the sample means of any type of distribution will approximate the normal distribution, as the sample size increases." This sounds like the definition of the
a. Standard Normal Distribution
b. Z-distribution
c. Central Limit Theorem
d. Binomial distribution approximated by the Normal distribution
e. None of the above

12. a. b. c. d. e. Which of the following does not belong? s p x-bar All of the above belong to the same group

13. Which of the following is NOT true of a sample mean?
a. It is a point estimate
b. It is a statistic
c. It is a continuous random variable
d. All of the above (a-c) are true of a sample mean
e. None of the above (a-c) are true of a sample mean

14. The two components of a confidence interval estimate of a population parameter are:
a. Confidence Level and Margin of Error
b. Sample Size and Statistic
c. Point Estimate and Margin of Error
d. Sample Mean and Sample Proportion
e. Margin of Error and Sample Size

15. The conditions for using the Normal distribution as an approximation to the binomial distribution are that np and n(1 − p) both be at least 5.
a. True
b. False

16. Which of the following will be the benefit derived from using a larger sample size in estimating an unknown population parameter?
a. A larger margin of error
b. A smaller margin of error
c. A lower confidence level
d. A higher confidence level
e. (b) and/or (d)
f. (a) and/or (c)

17. Using the Standard Normal distribution table, find the area below the z-score of –2.50.
a. -0.4938
b. 0.4938
c. 0.9938
d. 0.0062
e. None of the above

18. Which of the following pairs of hypotheses is NOT correct?
a. H0: 3.5 H1: > 3.5
b. H0: 3.5 H1: < 3.5
c. H0: p < 0.035 H1: p > 0.035
d. All of the above are correct
e. None of the above are correct

19. For a one-tailed test of a population mean the significance level has been set at 0.01. Assume that the population standard deviation is not known, sample size is 10, and the population is normally distributed. What distribution is appropriate for performing the hypothesis test?
a. z-test
b. t-test
c. F-test
d. Binomial
e. None of the above

20. In testing the mean of a population, which of the following is a necessary condition for using a t distribution?
a. n is small
b. σ is not known
c. The population is infinite
d. All of these
e. (a) and (b) but not (c)

21. Assume that you took a sample and calculated the sample mean as 100. You then calculated the lower and upper limit of a 90 percent confidence interval for μ to be 90 and 110, respectively. What is the margin of error of the estimate?
a. 0.10
b. 90 percent
c. 20
d. 10
e. 100

22. A single value used to estimate an unknown population parameter is known as a(n)
a. Point estimate
b. Interval estimate
c. Statistic
d. Parameter
e. (a) and (c)

23. In hypothesis testing, we decide whether to reject or fail to reject the null hypothesis by comparing the computed statistic to a critical statistic. Another form of decision rule is to compare the p-value to a significance level. Which of the following is a correct decision rule?
a. Reject H0 if z > p-value
b. Reject H0 if z >
c. Reject H0 if p-value >
d. Reject H0 if p-value <
e. All of the above are correct forms of decision rule

24. Which of the following variations cannot be removed but can only be reduced by redesigning or improving the process?
a. common cause variation
b. special cause variation

25. If n = 45 and α = 0.05, then the critical value of z for testing the hypotheses H0: 3.5 and H1: > 3.5 is
a.
0.0199
b. 1.96
c. -1.96
d. -1.645
e. 1.645

26. When a null hypothesis is rejected, it is possible that:
a. A correct conclusion has been made
b. A Type II error has been made
c. A Type I error has been made
d. (a) or (b) is correct
e. (a) or (c) is correct

27. Which of the following actions will reduce the Type I and II errors simultaneously?
a. Decreasing the significance level of a test
b. Increasing the confidence level of a test
c. Decreasing Beta error
d. Increasing the sample size
e. Decreasing the sample size

28. One concludes whether to “reject” or “fail to reject” the null hypothesis based on a decision rule. The decision rule is nothing more than a comparison of a calculated value and a critical value. Which of the two is based on the significance level of a test?
a. Calculated value
b. Critical value

29. Which of the following is NOT true of a critical value?
a. It marks the boundary between the “reject H0” and the “fail to reject H0” regions
b. It is based on the significance level of a test
c. It is determined from the statistic derived from a sample
d. All of the above are true
e. None of the above are true

30. If one were to perform a hypothesis test using the following hypotheses:
H0: shipment is GOOD and H1: shipment is BAD
Which of the two types of errors is called the Producer’s risk?
a. Type I (alpha)
b. Type II (beta)
c. Both (a) and (b)
d. Neither (a) nor (b)

31. When the sample size as a proportion of the population size (n/N) gets larger, the value of the finite correction multiplier approaches which value?
a. 0
b. 1
c. None of the above

32. In statistical process control, charts are used as tools to monitor processes. All processes exhibit variability. When NOT in control, the ___________ variability is said to be present:
a. common cause variability
b. special cause variability
c.
none of the above

33. A 5-week diet program is claimed to be effective in reducing the weights of participants in the program. Skeptical about the claim, you randomly select 10 applicants and weigh each one before and after the 5-week period. This problem involves:
a. Comparison of two population proportions
b. Comparison of means of two independent samples
c. Comparison of means of two dependent samples
d. Comparison of variations of two independent samples

34. Which of the following sampling distributions would be used in comparing means of two populations, with n1 = 32, n2 = 40?
a. Z test
b. pooled t test (or equal variances t test)
c. unpooled t test (or unequal variances t test)
d. Binomial

Use for questions 35-37
You work for a market research agency and you were asked to estimate the proportion of people with personal computers who are using Windows 97 as an operating system. How many people will you need to survey for your estimate to be within 2 percentage points of the actual value and be 90 percent confident with this estimate?

35. The problem described above involves:
a. Testing a hypothesis about a population mean
b. Computing a confidence interval estimate of a population average
c. Computing a sample size to estimate a true population proportion
d. Estimating a confidence interval estimate of a true population mean
e. None of the above

36. How much is the stated margin of error?
a. 90 percent
b. 10 percent
c. 0.10
d. 2 percentage points
e. 1.645

37. Give the z-value that will be used for computing the 90 percent confidence interval estimate of the true population parameter.
a. 1.96
b. 1.32
c. 0.10
d. 2 percentage points
e. 1.645

Use for questions 38-43
C. Garr Smoke claims that no more than 5 percent of the 40-60 male group smoke cigars.
Of 2500 males of this age group you recently sampled, 200 said they smoke cigars. At the 0.05 significance level, do the data refute C. Garr Smoke’s belief?

38. The sample statistic in this problem is:
a. Population proportion
b. Population mean
c. Sample proportion
d. Sample mean
e. None of the above

39. In this problem the statement “no more than 5 percent” is:
a. The hypothesized population proportion
b. The hypothesized population mean
c. The sample proportion
d. The sample mean
e. None of the above

40. State the null and alternative hypotheses of this problem
a. H0: 5 H1: > 5
b. H0: 5 H1: < 5
c. H0: = 5 H1: 5
d. H0: 0.05 H1: > 0.05
e. H0: 0.05 H1: < 0.05

41. Identify the critical value for the test (one tail).
a. 0.0199
b. 1.96
c. -1.96
d. -1.645
e. 1.645

42. If the computed value for the test is 6.88, the p-value is
a. almost 1
b. almost 0
c. close to the significance level

43. If the resulting p-value of the test is less than the significance level, then you would conclude to
a. Reject the null hypothesis
b. Fail to reject the null hypothesis

Use for questions 44-47
A torque wrench used in the final assembly of cylinder heads requires a process average of 135 lbs-ft. The process is known to have a standard deviation of 5.0 lbs-ft. For a simple random sample of 45 sample nuts that the machine has recently tightened, the sample average is 137 lbs-ft. Using a 0.05 significance level, determine whether the machine is operating at the desired level.

44. The appropriate hypotheses for the problem are:
a. H0: 135 H1: > 135
b. H0: 135 H1: < 135
c. H0: = 135 H1: 135
d. None of the above are correct

45. The appropriate distribution for the test above is
a. t-distribution
b. z-distribution
c. F-distribution

46. The computed value is
a. -0.40
b. 0.40
c. -2.68
d. 2.68
e. None of the above

47.
Assume that the critical value for this problem is 1.645 and that another sample produced a computed value of 1.55. For this sample, your statistical conclusion is to:
a. Reject the null hypothesis and conclude that the process is operating at the desired level
b. Accept the null hypothesis and conclude that the process is operating at the desired level
c. Reject the null hypothesis and conclude that the process is not operating at the desired level
d. Accept the null hypothesis and conclude that the process is not operating at the desired level

Use for questions 48-50
A pharmaceutical company is testing two new compounds intended to reduce blood-pressure levels. The compounds are administered to two different sets of lab animals. In Group 1, 71 of 100 animals tested respond to drug 1 with lower blood-pressure levels. In Group 2, 58 of 90 animals tested respond to drug 2 with lower blood-pressure levels. The company wants to test at the .05 level whether drug 1 is more effective in reducing blood pressure levels than drug 2.

48. The problem involves which of the following procedures?
a. Comparison of two population proportions
b. Comparison of means of two populations using dependent samples
c. Comparison of means of two populations using independent samples
d. Comparison of variances of two populations

49. Using the following subscripts for the two groups: 1- drug 1; 2- drug 2, which of the following is the most appropriate pair of hypotheses?
a. H0: 1 2 H1: 1 > 2
b. H0: 1 2 H1: 1 < 2
c. H0: 1 2 H1: 1 > 2
d. H0: 1 2 H1: 1 < 2
e. H0: 1 = 2 H1: 1 2

50. (Bonus) On which day is Thanksgiving celebrated?
a. Monday
b. Tuesday
c. Wednesday
d. Thursday
e. Friday

BA 282: Applied Business Statistics
Final Exam -- Spring 1999
Name ______________________

1. Find the value of χ²_{.05,20}

2. Find the value of F_{.05,2,10}

3. In ANOVA, we will tend to not reject the null hypothesis of equal population means whenever the calculated F is:
a. small
b. large
c. equal to the critical F
d. none of the above is correct

4. In a two-way ANOVA, in the x_ijk = μ + α_i + β_j + (αβ)_ij + ε_ijk model, the term (αβ)_ij represents
a. random error in the sampling process
b. the effect that is due to factor A
c. the effect that is due to factor B
d. the interaction effect between level i of factor A and level j of factor B
e. the level of significance at which the null hypothesis is rejected

5. Which of the following is a typical source of internal secondary data for business research?
a. accounting or financial reports
b. sales information
c. production data
d. all of the above

For questions 6 to 10, refer to the plot on the right.

6. The equation for the line going through the points would take the form of:
a. Y = a + b+ c
b. Y = a - bX
c. Y = x -1
d. Y = a + bX²
e. None of these is correct

7. In this particular problem, the researcher is trying to predict:
a. Quantity demanded based on price
b. Price based on quantity demanded
c. Both price and quantity demanded
d. None of these is correct

8. If computed, the sign of b in the equation would be:
a. Either positive or negative
b. Positive
c. Negative
d. None of the above

9. The correlation coefficient of the problem, if computed, could be:
a. 1.00
b. 0
c. -1.0
d. None of the above

10.
Which of the following won't be true for the regression resulting from the data in the plot above?
a. r² = 100%
b. r = 1
c. sy.x = 0
d. ANOVA p-value = 0
e. All of the above are true

11. In multiple regression analysis, multicollinearity means:
a. High correlation between the dependent variable and the independent variables
b. A high correlation between the response variable and some independent variables
c. A condition where 2 or more of the independent variables are highly correlated with each other
d. None of the above

Use the following regression output to answer questions 12 to 15.

The regression equation is
sales = 46.5 + 52.6 ad

Predictor    Coef      Stdev    t-ratio    p
Constant     46.486    9.885    4.70       0.000
ad           52.57     10.26    5.12       0.000

s = 6.837    R-sq = 76.6%    R-sq(adj) = 73.7%

Analysis of Variance
SOURCE        DF    SS        MS        F        p
Regression    1     1226.9    1226.9    26.25    0.000
Error         8     374.0     46.7
Total         9     1600.9

12. Identify the coefficient of determination __________

13. Write the standard deviation of the slope (sb) ____________

14. Write the standard error of the estimate (sy.x) _________

15. Identify the slope ______________

16. List ONE of the 3 assumptions that underlie the simple regression model ___________

17. Suppose you wished to investigate the effect of consumption of alcohol (Y/N) and a common over-the-counter cold medicine (Y/N) on a person's reaction time. The appropriate statistical procedure for this experiment is:
a. Chi-square test of independence
b. Analysis of Variance (Two factor)
c. Analysis of Variance (One factor)
d. Regression analysis
e. Discriminant analysis

18. Which of the following is not a linear function?
a. Y = a + bX
b. Distance = Rate of speed × Travel time
c. Total Profit = profit per unit × Number units sold
d. Total Cost = Fixed Cost + Unit Variable Cost × Quantity Produced
e. All of the above are linear functions

19.
There are two main uses of a multiple linear model: 1) for slope analysis, or 2) for estimating the value of Y given values of the X's. For which of the two uses is multicollinearity not a problem? _____________

20. The _____ interval is the interval that contains the actual average value of the response variable for a specific value of X
a. Confidence
b. Prediction

21. The _____ interval is the interval that contains the actual value of the response variable for a specific value of X
a. Confidence
b. Prediction

22. Residual is also known as:
a. Error
b. Actual Y − Estimated Y
c. Observed Y − Fitted Y
d. None of the above
e. All of the above

PROBLEM 1: The Ryerson Coil Pickling manager wishes to know how the level of pickling operation (measured in tons) affects the monthly overtime expense of the plant. He collects data for the last 17 months on actual tonnage processed and overtime cost. He then performs regression analysis on the data. Using the attached output, answer questions 23 to 37.

Partial Data:
Month    Production (Tons)    Overtime Expense
1        29,668               $11,000
2        23,577               $8,000
3        27,117               $9,000
...      ...                  ...
17       19,365               $7,000

23. What is the response variable in this problem? __________

24. What is (are) the independent variable(s)? ______________

25. Give the linear regression equation generated by the 17 observations. __________

26. Using the regression equation, estimate the plant’s overtime expense for a month where 30,000 tons of steel is planned to be processed. __________

27. The correlation between the response variable and the predictor variables could be best described as:
a. Perfectly positively linear
b. Perfectly negatively linear
c. Positively correlated
d. Negatively correlated

28. For the planned production described in #26, give the 95% interval estimate for the plant’s actual overtime expense. ____________________

29.
How much of the variation in overtime expense is explained by production level? _____________

30. For every ton increase in processing level, plant overtime expense is expected to:
a. increase by $0.587
b. increase by $5.87
c. decrease by $0.587
d. decrease by $5.87
e. None of the above

31. If for the year 2000 the plant plans to process 30,000 tons of coils each month, give the 95% interval estimate of the actual average monthly overtime expense. ____________

32. What is the coefficient of correlation of this model? _____________

33. Using the slope test, set up the null and alternative hypotheses to determine whether the model you identified in #25 is significant.

34. Write the decision rule for the hypotheses in #33 ______________

35. Identify the computed and critical values corresponding to the decision rule in #34. ________, ________

36. Based on #35, is the model significant? __________

37. Compute the 95% confidence interval for the actual change in overtime expense for every ton increase in production level. _____________

PROBLEM 2: In a recent survey of winter 1999 BA 282 students, 41 students from the Medford section and the Ashland sections responded. One of the objectives of the survey was to investigate what factors could possibly affect students' success in the course's midterm exam. A regression analysis was performed in which 4 explanatory variables were included in the model. The variables were the following:

GPA – student's overall GPA to date
243GRADE – student's final grade in the prerequisite course, MA 243
243WHEN – the number of terms ago the student took MA 243
WHERE – where the student is currently taking BA 282 (coded 0 for the Ashland section, 1 for Medford)

Use the attached regression output in answering questions 38 to 49.

38. What is the dependent variable in this problem? __________

39. What are the predictor variables? ______________

40. Give the linear regression equation generated by the 41 observations. __________

41.
Give an estimated midterm grade for a student in the Medford section who earned a 3.5 in MA 243 a term ago and currently holds a 3.25 overall GPA. __________

42. How much of the variation in BA 282 midterm exam grades can be explained by the regression equation? _____________

43. For every unit increase in MA 243 grade, BA 282 midterm grade is estimated to:
a. increase by .330 units
b. decrease by .330 units
c. increase by .781 units
d. decrease by .781 units
e. None of the above

44. Compute the 80% confidence interval for the actual slope of the variable MA 243. ___________

45. What is the multiple coefficient of determination of this model? __________

46. Using the ANOVA test, set up the null and alternative hypotheses to determine whether the model is significant.

47. Write the decision rule for the hypotheses in #46 ______________

48. Identify the computed and critical values (use a 0.05 significance level) corresponding to the decision rule in #47 ________, ________

49. Based on #48, is the model significant? __________

PROBLEM 3: Given a significance level of 0.05, is there a significant difference in the average midterm grades of the students in the 3 sections of BA 282? (output attached)

50. The most appropriate testing procedure for the problem stated above is:
a. Chi-square test of independence
b. Regression and correlation
c. Discriminant analysis
d. Oneway ANOVA
e. Twoway ANOVA

51. Write the appropriate hypotheses and decision rule for comparing the average test scores of the three sections.

52. Give the computed and critical F values for carrying out the test in #51 ____________, ____________

53. At the 0.05 significance level, which of the three sections has the largest average test score (note: your answer here should be consistent with your answers to #51 and #52)?
_______

PROBLEM 4: A test was conducted to determine if the grade in MA 243, or when MA 243 was taken, has any effect on the midterm grades in BA 282. Also of interest in the survey was whether the grade in MA 243 and the time it was taken have some interaction effect on the BA 282 midterm grades. Before the ANOVA test was conducted, the raw data were recoded: MA 243 grades were reclassified into two groups (A's and Non A's), and the term in which the course was taken was also reclassified into two groups (one term ago, and two or more terms ago). Grades in the BA 282 midterm (not changed) are on a 4 to 0 scale, representing A to F letter grades. Also, since twoway ANOVA with replication requires that each cell contain equal samples, six students from each combination of MA 243 grade and term group were randomly selected to fill the cells. The resulting crosstabulation of the observations is shown below:

                 When MA 243 was Taken
MA 243 Grade     One Term Ago     Two or More Ago
A                3,4,3,2,4,3      2,3,3,2,4,3
Non A's          3,0,2,1,2,3      1,1,1,2,1,1

Use a 0.05 significance level for the following tests.

54. Write the appropriate hypotheses for testing whether MA 243 grade has an effect on the BA 282 midterm exam. ________________

55. Does MA 243 grade have a significant effect on BA 282 midterm grades? If yes, which grade category performs better? _____________

56. Write the appropriate hypotheses for testing whether when MA 243 was taken has an effect on the BA 282 midterm exam. ________________

57. Does the time when MA 243 was taken have a significant effect on BA 282 midterm grades? If yes, which time has higher average midterm grades? _____________

58. Is there a significant interaction between MA 243 grade and the time when it was taken on BA 282 midterm scores?
59 and 60: BONUS

PROBLEM 1:

MTB > Regress 'P_overt' 1 'Ton_prod';
SUBC> Predict 30000.

Regression Analysis

The regression equation is
P_overt = - 6776 + 0.587 Ton_prod

Predictor    Coef      StDev     T        P
Constant     -6776     4178      -1.62    0.126
Ton_prod     0.5868    0.1760    3.33     0.005

S = 2566    R-Sq = 42.6%    R-Sq(adj) = 38.7%

Analysis of Variance
Source            DF    SS           MS          F        P
Regression        1     73217721     73217721    11.12    0.005
Residual Error    15    98782279     6585485
Total             16    172000000

Predicted Values
Fit      StDev Fit    95.0% CI          95.0% PI
10828    1306         (8044, 13611)     (4691, 16965)

PROBLEM 2:

MTB > Regress 'MIDTERM' 4 'GPA' '243GRADE' '243WHEN' 'WHERE';
SUBC> Constant;
SUBC> Brief 2.

The regression equation is
MIDTERM = - 0.92 + 0.781 GPA + 0.330 243GRADE - 0.242 243WHEN + 0.601 WHERE

41 cases used 5 cases contain missing values

Predictor    Coef       StDev     T        P
Constant     -0.923     1.020     -0.90    0.371
GPA          0.7813     0.3569    2.19     0.035
243GRADE     0.3302     0.1977    1.67     0.104
243WHEN      -0.2421    0.1339    -1.81    0.079
WHERE        0.6011     0.3957    1.52     0.137

S = 0.9111    R-Sq = 40.7%    R-Sq(adj) = 34.1%

Analysis of Variance
Source            DF    SS         MS        F       P
Regression        4     20.5079    5.1270    6.18    0.001
Residual Error    36    29.8823    0.8301
Total             40    50.3902

Problem 3

Analysis of Variance for MIDTERM
Source     DF    SS       MS      F       P
SECTION    2     4.48     2.24    1.91    0.160
Error      42    49.17    1.17
Total      44    53.64

Level    N     Mean     StDev
MW       12    1.750    1.055
TR-A     21    2.000    1.225
TR-M     12    2.583    0.793

Pooled StDev = 1.082

[Character plot: Individual 95% CIs For Mean Based on Pooled StDev, one interval per section (MW, TR-A, TR-M), axis from 1.20 to 3.00]

Problem 4

Two-way Analysis of Variance

Analysis of Variance for MIDTERM
Source         DF    SS        MS        F        P
ma243          1     13.500    13.500    20.25    0.000
ma243whe       1     1.500     1.500     2.25     0.149
Interaction    1     0.167     0.167     0.25     0.623
Error          20    13.333    0.667
Total          23    28.500

ma243    Mean
A        3.00
B,C,D    1.50

[Character plot: Individual 95% CI for each ma243 level, axis from 1.20 to 3.00]

ma243whe    Mean
One Term    2.50
Two and     2.00

[Character plot: Individual 95% CI for each ma243whe level, axis from 1.60 to 2.80]

[Figure: Interaction Plot - Means for MIDTERM. Vertical axis: Average BA 282 Midterm Grades (1.2 to 3.2). Horizontal axis: When MA 243 Was Taken (One Term Ago, Two and More Ago). Lines: MA 243 Grade (A; B,C,D).]
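
The printed outputs are internally linked: in a regression table, T is the coefficient divided by its StDev, F is MS Regression divided by MS Error, R-Sq is SS Regression divided by SS Total, and S is the square root of MS Error. In the same way, the Problem 4 sums of squares follow directly from the crosstabulated cell data. The sketch below recomputes both as a cross-check. It is only an illustration in plain Python: the function names are hypothetical and not part of the course materials, and the cell layout assumes rows are MA 243 grade (A, Non A's) and columns are when MA 243 was taken.

```python
from math import sqrt
from statistics import fmean


def regression_checks(coef, stdev, ss_reg, ss_err, df_reg, df_err):
    """Recompute T, F, R-Sq and S from the printed regression figures."""
    ms_reg = ss_reg / df_reg
    ms_err = ss_err / df_err
    return {
        "t": coef / stdev,                    # slope t-ratio
        "F": ms_reg / ms_err,                 # ANOVA F statistic
        "r_sq": ss_reg / (ss_reg + ss_err),   # coefficient of determination
        "s": sqrt(ms_err),                    # standard error of the estimate
    }


def two_way_anova(cells):
    """Sums of squares and F ratios for a balanced two-factor layout.

    cells[i][j] holds the observations for row level i and column level j;
    every cell is assumed to contain the same number of observations.
    """
    rows, cols, n = len(cells), len(cells[0]), len(cells[0][0])
    everything = [x for row in cells for cell in row for x in cell]
    grand = fmean(everything)
    row_means = [fmean([x for cell in row for x in cell]) for row in cells]
    col_means = [fmean([x for row in cells for x in row[j]]) for j in range(cols)]

    ss_rows = cols * n * sum((m - grand) ** 2 for m in row_means)
    ss_cols = rows * n * sum((m - grand) ** 2 for m in col_means)
    ss_inter = n * sum(
        (fmean(cells[i][j]) - row_means[i] - col_means[j] + grand) ** 2
        for i in range(rows) for j in range(cols)
    )
    ss_total = sum((x - grand) ** 2 for x in everything)
    ss_error = ss_total - ss_rows - ss_cols - ss_inter
    ms_error = ss_error / (rows * cols * (n - 1))
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_inter": ss_inter,
        "ss_error": ss_error, "ss_total": ss_total,
        "F_rows": ss_rows / (rows - 1) / ms_error,
        "F_cols": ss_cols / (cols - 1) / ms_error,
        "F_inter": ss_inter / ((rows - 1) * (cols - 1)) / ms_error,
    }


# Problem 1 figures, copied from the regression output.
reg = regression_checks(coef=0.5868, stdev=0.1760,
                        ss_reg=73217721, ss_err=98782279,
                        df_reg=1, df_err=15)

# Problem 4 cell data: rows are MA 243 grade (A, Non A's),
# columns are when MA 243 was taken (one term ago, two or more ago).
cells = [
    [[3, 4, 3, 2, 4, 3], [2, 3, 3, 2, 4, 3]],
    [[3, 0, 2, 1, 2, 3], [1, 1, 1, 2, 1, 1]],
]
anova = two_way_anova(cells)

for name, table in (("regression", reg), ("two-way ANOVA", anova)):
    print(name, {key: round(value, 3) for key, value in table.items()})
```

Rounded to the precision Minitab prints, the recomputed values agree with the tables above: T = 3.33, F = 11.12, R-Sq = 42.6% and S = 2566 for Problem 1, and sums of squares of 13.500, 1.500, 0.167, 13.333 and 28.500 for Problem 4. The p-values and critical values are not reproduced here, since they require t and F distribution tables or a statistics package.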