Lecture Notes for Applied Business Statistics
A Training Program for BCBS
Professor Ahmadi, Ph.D.

Chapter 1
Glossary of Terms:
Statistics
Data
Data Set
Elements
Variable
Observations
Sample and Population
Descriptive Statistics
Statistical Inference
Qualitative and Quantitative Data
Scales of Measurement: Nominal Scale, Ordinal Scale, Interval Scale, Ratio Scale

Chapter 2
Summarizing Quantitative Data

Problem 1. Daily earnings of a sample of twelve individuals are shown below:

100, 126, 138, 142, 148, 150, 168, 182, 191, 193, 195, 199

Summarize the above data by constructing:
a. a frequency distribution
b. a cumulative frequency distribution
c. a relative frequency distribution
d. a cumulative relative frequency distribution
e. a histogram
f. an ogive

Class        Frequency   Cumulative   Relative    Cumulative Relative
                         Frequency    Frequency   Frequency
100 - 119
120 - 139
140 - 159
160 - 179
180 - 199

DOT PLOT

Problem 2. In a recent campaign, many airlines reduced their summer fares in order to gain a larger share of the market. The following data represent the prices of round-trip tickets from Atlanta to Boston for a sample of nine airlines:

120 140 140 160 160 160 160 180 180

Construct a dot plot for the above data.

STEM-AND-LEAF DISPLAY

Problem 3. The test scores of 14 individuals on their first statistics examination are shown below:

95 75 87 63 52 92 43 81 77 83 84 91 78 88

a. Construct a stem-and-leaf display for these data.
b. What does the above stem-and-leaf show?

CROSSTABULATION

Problem 4. The following is a crosstabulation of starting salaries (in $1,000's) of a sample of business school graduates by their gender.

                     Starting Salary
Gender     Less than 30   30 up to 35   35 and more   Total
Female          12             84            24         120
Male            20             48            12          80
Total           32            132            36         200

a. What general comments can be made about the distribution of starting salaries and the gender of the individuals in the sample?
b.
Compute row percentages and comment on the relationship between starting salaries and gender.

SCATTER DIAGRAM

Problem 5. The average grades of 8 students in Professor Ahmadi’s statistics class and the number of absences they had during the semester are shown below:

Student   Number of Absences (x)   Average Grade (y)
   1                1                     94
   2                2                     78
   3                2                     70
   4                1                     88
   5                3                     68
   6                4                     40
   7                8                     30
   8                3                     60

Develop a scatter diagram for the relationship between the number of absences (x) and their average grade (y).

Chapter 3 Formulas

Ungrouped Data

Mean
  Sample:      \( \bar{X} = \frac{\sum X_i}{n} \), where n = sample size
  Population:  \( \mu = \frac{\sum X_i}{N} \), where N = size of population

Interquartile Range
  Sample:      IQR = Q3 - Q1
  Population:  (Same as for sample)
  where: Q3 = third quartile (i.e., 75th percentile)
         Q1 = first quartile (i.e., 25th percentile)

Variance
  Sample:      \( S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} \)  or:  \( S^2 = \frac{\sum X_i^2 - n\bar{X}^2}{n - 1} \)
  Population:  \( \sigma^2 = \frac{\sum (X_i - \mu)^2}{N} \)  or:  \( \sigma^2 = \frac{\sum X_i^2 - N\mu^2}{N} \)

Standard Deviation
  Sample:      \( S = \sqrt{S^2} \)
  Population:  \( \sigma = \sqrt{\sigma^2} \)

Coefficient of Variation (C.V.)
  Sample:      \( \text{C.V.} = \frac{S}{\bar{X}} \times 100 \)
  Population:  \( \text{C.V.} = \frac{\sigma}{\mu} \times 100 \)

Covariance
  Sample:      \( S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \)
  Population:  \( \sigma_{XY} = \frac{\sum (X_i - \mu_X)(Y_i - \mu_Y)}{N} \)

Pearson Product Moment Correlation Coefficient
  Sample:      \( r_{XY} = \frac{S_{XY}}{S_X S_Y} \)
    where \( r_{XY} \) = sample correlation coefficient
          \( S_{XY} \) = sample covariance
          \( S_X \) = sample standard deviation of X
          \( S_Y \) = sample standard deviation of Y
  Population:  \( \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} \)
    where \( \rho_{XY} \) = population correlation coefficient
          \( \sigma_{XY} \) = population covariance
          \( \sigma_X \) = population standard deviation of X
          \( \sigma_Y \) = population standard deviation of Y

Weighted Mean
  Sample:      \( \bar{X} = \frac{\sum w_i X_i}{\sum w_i} \)
  Population:  \( \mu = \frac{\sum w_i X_i}{\sum w_i} \)
  where Xi = data value i
        wi = weight for data value i

Grouped Data

Mean
  Sample:      \( \bar{X} = \frac{\sum f_i M_i}{n} \)
  Population:  \( \mu = \frac{\sum f_i M_i}{N} \)
  where fi = frequency of class i
        Mi = midpoint of class i

Variance
  Sample:      \( S^2 = \frac{\sum f_i (M_i - \bar{X})^2}{n - 1} \)  or:  \( S^2 = \frac{\sum f_i M_i^2 - n\bar{X}^2}{n - 1} \)
  Population:  \( \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} \)  or:  \( \sigma^2 = \frac{\sum f_i M_i^2 - N\mu^2}{N} \)

Chapter 3 Measures of Location & Dispersion (Ungrouped Data)

Problem 1. Hourly earnings (in dollars) of a sample of eight employees of Ahmadi, Inc. are shown below:

Individual   Earning (X)
    1            12
    2            15
    3            15
    4            17
    5            18
    6            19
    7            22
    8            26

I. Measures of location
a.
Compute the mean and explain and show its properties.
b. Determine the median and explain its properties.
c. Determine the 70th percentile.
d. Determine the 25th percentile.
e. Find the mode.

II. Compute the following measures of dispersion for the above data:
a. Range
b. Interquartile range
c. Variance & the Standard deviation
d. Coefficient of variation
e. A sample of Chatt, Inc. employees had a mean of $21 and a standard deviation of $5. Which company shows a more dispersed data distribution?
f. Use “Descriptive Statistics” in Excel and determine all the statistical measures.

Chapter 3 Five-Number Summary

Problem 2. The weights of 12 individuals who enrolled in a fitness program are shown below:

Individual   Weight (Pounds)
    1            100
    2            105
    3            110
    4            130
    5            135
    6            138
    7            142
    8            145
    9            150
   10            170
   11            240
   12            300

a. Provide a five-number summary for the data.
b. Show the box plot for the weight data.

Chapter 3 Covariance & Coefficient of Correlation

Problem 3. The average grades of a sample of 8 students in Professor Ahmadi’s statistics class and the number of absences they had during the semester are shown below.

Student   Number of Absences (Xi)   Average Grade (Yi)
   1                1                      94
   2                2                      78
   3                2                      70
   4                1                      88
   5                3                      68
   6                4                      40
   7                8                      30
   8                3                      60
TOTAL              24                     528

a. Compute the sample covariance and interpret its meaning.
b. Compute the sample coefficient of correlation and interpret its meaning.

Chapter 3 Weighted Mean

Problem 4. The M&A Oil Company has purchased barrels of oil from several suppliers. The purchase price per barrel and the number of barrels purchased are shown below.

Supplier   Price Per Barrel ($)   Number of Barrels
   A               55                   4,000
   B               49                   3,000
   C               48                   9,000
   D               50                  20,000

Compute the weighted average price per barrel.

Chapter 3 Measures of Location & Dispersion (Grouped Data)

Problem 5.
The yearly income distribution for a sample of 30 Ahmadi, Inc. employees is shown below.

Yearly Income (In $10,000)   Frequency (fi)
         4 - 6                     2
         7 - 9                     6
        10 - 12                    7
        13 - 15                   10
        16 - 18                    5
        Totals                  n = 30

a. Compute the mean yearly income.
b. Compute the variance and the standard deviation of the sample.
c. A sample of Chatt, Inc. employees had a mean income of $132,000 with a standard deviation of $36,000. Which company shows a more dispersed income distribution?

Chapter 4 Formulas

Counting Rule for Multiple-step Experiments:
  Total number of outcomes = \( (n_1)(n_2) \cdots (n_k) \)

The number of Combinations of N objects taken n at a time:
  \( \binom{N}{n} = \frac{N!}{n!\,(N - n)!} \)

Sum of the probability of Event A and its Complement:
  \( P(A) + P(A^c) = 1.0 \)

Addition Law (the probability of the union of two events):
  \( P(A \cup B) = P(A) + P(B) - P(A \cap B) \)

Multiplication Law (the probability of the intersection of two events):
  \( P(A \cap B) = P(A)\,P(B|A) \)  or  \( P(A \cap B) = P(B)\,P(A|B) \)

Two Events A and B are Independent if:
  \( P(A|B) = P(A) \)  or  \( P(B|A) = P(B) \)

Multiplication Law for Independent Events:
  \( P(A \cap B) = P(A)\,P(B) \)

Conditional Probability:
  \( P(A|B) = \frac{P(A \cap B)}{P(B)} \)  or  \( P(B|A) = \frac{P(A \cap B)}{P(A)} \)

Bayes' Theorem in General:
  \[ P(A_i|B) = \frac{P(A_i)\,P(B|A_i)}{P(A_1)\,P(B|A_1) + P(A_2)\,P(B|A_2) + \cdots + P(A_n)\,P(B|A_n)} \]

Summary of Bayes' Theorem Calculations:

Event   Prior Probabilities   Conditional Probabilities   Joint Probabilities      Posterior Probabilities
        \( P(A_i) \)          \( P(B|A_i) \)              \( P(A_i \cap B) \)      \( P(A_i|B) \)

Chapter 4 - Basic Probability Concepts

Problem 1. Assume you have applied to two different universities (let's refer to them as universities A and B) for your graduate work. In the past, 25% of students (with similar credentials as yours) who applied to university A were accepted; while university B had accepted 35% of the applicants (Assume events are independent of each other).
a. What is the probability that you will be accepted in both universities?
b.
What is the probability that you will be accepted to at least one graduate program?
c. What is the probability that one and only one of the universities will accept you?
d. What is the probability that neither university will accept you?

Problem 2. An individual has applied to two different insurance companies for health insurance coverage. The probability that company A will approve her application is 0.63, and the probability that company B will approve her application is 0.55. The probability that both companies will approve her application is 0.3465.
a. What is the probability that company A will approve her application, given that company B has approved her application?
b. Are the approval outcomes independent events? Explain, and substantiate your answer using the probability concepts.
c. Are the approval outcomes mutually exclusive? Explain, and substantiate your answer using the probability concepts.
d. What is the probability that her application will be approved by at least one of the companies?

Chapter 4 - Conditional Probability

Problem 3. A research study investigating the relationship between smoking and heart disease in a sample of 500 individuals provided the following data:

                               Smoker   Nonsmoker   Total
Record of Heart Disease          50        40         90
No Record of Heart Disease      100       310        410
Total                           150       350        500

a. Show the joint probability table.
b. What is the probability that an individual is a smoker and has a record of heart disease?
c. Compute and interpret the marginal probabilities.
d. Given that an individual is a smoker, what is the probability that this individual has heart disease?
e. Given that an individual is a nonsmoker, what is the probability that this individual has heart disease?
f. Does the research show that heart disease and smoking are independent events? Use probabilities to justify your answer.
g. What conclusion would you draw about the relationship between smoking and heart disease?
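The joint, marginal, and conditional probabilities asked for in Problem 3 all come from dividing cell counts by the appropriate total. The following is a minimal Python sketch using the counts from the table in Problem 3; the names `counts`, `joint`, `marginal`, and `conditional` are illustrative assumptions, not part of the notes.

```python
# Cell counts from the smoking / heart disease table in Problem 3.
counts = {
    ("smoker", "record"): 50,
    ("nonsmoker", "record"): 40,
    ("smoker", "no_record"): 100,
    ("nonsmoker", "no_record"): 310,
}
total = sum(counts.values())  # 500 individuals in the sample

# Joint probabilities: each cell count divided by the grand total.
joint = {cell: n / total for cell, n in counts.items()}


def marginal(label):
    """Marginal probability: sum the joint probabilities of cells with this label."""
    return sum(p for cell, p in joint.items() if label in cell)


def conditional(event, given):
    """Conditional probability: P(event | given) = P(event and given) / P(given)."""
    both = sum(p for cell, p in joint.items() if event in cell and given in cell)
    return both / marginal(given)


print("P(smoker and record)  =", round(joint[("smoker", "record")], 4))
print("P(record)             =", round(marginal("record"), 4))
print("P(record | smoker)    =", round(conditional("record", "smoker"), 4))
print("P(record | nonsmoker) =", round(conditional("record", "nonsmoker"), 4))
```

Comparing `conditional("record", "smoker")` with `marginal("record")` is one way to apply the independence condition P(A|B) = P(A) from the Chapter 4 formulas to part f.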
Chapter 4 BAYES' THEOREM

Problem 4. When Ahmadi, Inc. sets up their drill press machine, 70% of the time it is set up correctly. It is known that if the machine is set up correctly it produces 90% acceptable parts. On the other hand, when the machine is set up incorrectly, it produces 20% acceptable parts. One item from the production is selected and is observed to be acceptable.
a. What is the probability that the machine is set up correctly? That is, we are interested in computing: P(Correct set up | Acceptable part). Let the following symbols represent the various events:
   E1 = Correct set up
   E2 = Incorrect set up
   G = Good part (i.e., Acceptable part)
   With the above notations we want to determine \( P(E_1|G) \).
b. Compute all the posterior probabilities.

Chapter 5 Formulas

Required Conditions for a Discrete Probability Function:
  \( f(x) \ge 0 \)
  \( \sum f(x) = 1 \)

Discrete Uniform Probability Function:
  \( f(x) = 1/n \), where n = the number of values the random variable may assume

Expected Value of a Discrete Random Variable:
  \( E(x) = \mu = \sum x\,f(x) \)

Variance of a Discrete Random Variable:
  \( \text{Variance}(x) = \sigma^2 = \sum (x - \mu)^2 f(x) \)

Number of Experimental Outcomes Providing Exactly x Successes in n Trials:
  \( \binom{n}{x} = \frac{n!}{x!\,(n - x)!} \)
  where \( n! = n(n - 1)(n - 2) \cdots (2)(1) \)  (Remember: 0! = 1)

Binomial Probability Function:
  \( f(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x} \), where x = 0, 1, 2, ..., n

The Mean of a Binomial Distribution:
  \( \mu = np \)

The Variance of a Binomial Distribution:
  \( \sigma^2 = np(1 - p) \)

Chapter 5 Discrete Probability Distributions

Problem 1. The manager of the university bookstore has kept records of the number of diskettes sold per day. She provided the following information regarding diskettes sales for a period of 60 days:

Number of Diskettes Sold   Number of Days
           0                     6
           1                     9
           2                    12
           3                    18
           4                    12
           5                     3

a. Identify the random variable.
b. Is the random variable discrete or continuous?
c.
Develop a probability distribution for the above data.
d. Is the above a proper probability distribution?
e. Develop a cumulative probability distribution.
f. Determine the expected number of daily sales of diskettes.
g. Determine the variance and the standard deviation.
h. If each diskette yields a net profit of 50 cents, what are the expected yearly profits from the sales of diskettes?

Chapter 5 Introduction to Binomial Distribution

Problem 2. A production process has been producing 10% defective items. A random sample of four items is selected from the production process.
a. What is the probability that the first 3 selected items are non-defective and the last item is defective?
b. If a sample of 4 items is selected, how many outcomes contain exactly 3 non-defective items?
c. What is the probability that a random sample of 4 contains exactly 3 non-defective items?
d. Determine the probability distribution for the number of non-defective items in a sample of four.
e. Determine the expected number (mean) of non-defectives in a sample of four.
f. Find the standard deviation for the number of non-defectives.

Chapter 5 POISSON PROBABILITY DISTRIBUTION

Problem 3. During the registration period, students consult their advisor for course selection. A particular advisor noted that during each half hour an average of eight students came to see him for advising.
a. What is the probability that during a half hour period exactly four students will consult him?
b. What is the probability that during a half hour period less than three students will consult him?
c. What is the probability that during an hour period ten students will consult him?
d. What is the probability that during an hour and fifteen minute period thirty students will consult him?
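The Chapter 5 formula sheet does not list the Poisson probability function, so the sketch below assumes the standard form \( f(x) = \mu^x e^{-\mu} / x! \), where \( \mu \) is the expected number of arrivals in the interval of interest. It also assumes that parts c and d ask for exactly ten and exactly thirty students, and that the mean scales with the length of the interval (8 per half hour, so 16 per hour and 20 per 75 minutes). The function name `poisson_pmf` is illustrative only.

```python
import math


def poisson_pmf(x, mu):
    """Poisson probability of exactly x occurrences when the interval mean is mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)


mean_per_half_hour = 8

# a. Exactly four students in a half hour.
p_a = poisson_pmf(4, mean_per_half_hour)

# b. Fewer than three students in a half hour: x = 0, 1, or 2.
p_b = sum(poisson_pmf(x, mean_per_half_hour) for x in range(3))

# c. Exactly ten students in one hour (two half hours, so the mean is 16).
p_c = poisson_pmf(10, 2 * mean_per_half_hour)

# d. Exactly thirty students in 75 minutes (2.5 half hours, so the mean is 20).
p_d = poisson_pmf(30, 2.5 * mean_per_half_hour)

for label, p in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"{label}. {p:.4f}")
```

The key step in parts c and d is rescaling the mean to match the interval before applying the formula; the formula itself does not change.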
Chapter 6 Formulas

Uniform Probability Density Function for a Random Variable x:
  \[ f(x) = \begin{cases} \frac{1}{b - a} & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases} \]

Mean and Variance of a Uniform Continuous Probability Distribution:
  \( \mu = \frac{a + b}{2} \)
  \( \sigma^2 = \frac{(b - a)^2}{12} \)

The Z Transformation Formula:
  \( z = \frac{x - \mu}{\sigma} \)

Solving for x using the Z transformation formula:
  \( x = \mu + Z\sigma \)

Chapter 6 - Continuous Probability Distributions

I. The Uniform Distribution

Problem 1. The driving time for an individual from her home to her work is uniformly distributed between 300 and 480 seconds.
a. Give a mathematical expression for the probability density function.
b. Compute the probability that the driving time will be less than or equal to 435 seconds.
c. Determine the probability that the driving time will be exactly 400 seconds.
d. Determine the expected driving time.
e. Determine the standard deviation of the driving time.

II. The Normal Distribution

Problem 2. Given that Z is the standard normal random variable, give the probabilities associated with the following:
a. P(Z < -2.09) = ?
b. P(Z > -0.95) = ?
c. P(-2.55 < Z < -2.33) = ?

Problem 3. Z is a standard normal variable. Find the value of Z in the following:
a. The area between -Z and zero is 0.4929. Z = ?
b. The area to the right of Z is 0.0192. Z = ?
c. The area between -Z and Z is 0.668. Z = ?

Problem 4. The weight of certain items produced is normally distributed with a mean weight of 60 ounces and a standard deviation of 8 ounces.
a. What percentage of the items will weigh between 50.4 and 72 ounces?
b. What percentage of the items will weigh between 42 and 52 ounces?
c. What percentage of the items will weigh at least 74.4 ounces?
d. What are the minimum and the maximum weights of the middle 60% of the items?
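The notes work these problems with the Z transformation and a standard normal table. As a cross-check, the same areas can be computed directly; the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) with the mean and standard deviation from Problem 4. The variable names are illustrative assumptions.

```python
from statistics import NormalDist

# Item weights in Problem 4: mean 60 ounces, standard deviation 8 ounces.
weights = NormalDist(mu=60, sigma=8)

# a. Area between 50.4 and 72 ounces (z from -1.2 to 1.5).
p_a = weights.cdf(72) - weights.cdf(50.4)

# b. Area between 42 and 52 ounces (z from -2.25 to -1.0).
p_b = weights.cdf(52) - weights.cdf(42)

# c. Area at or above 74.4 ounces (z = 1.8).
p_c = 1 - weights.cdf(74.4)

# d. The middle 60% leaves 20% in each tail, so invert the CDF at 0.20 and 0.80.
low = weights.inv_cdf(0.20)
high = weights.inv_cdf(0.80)

print(f"a. {p_a:.4f}  b. {p_b:.4f}  c. {p_c:.4f}")
print(f"d. {low:.2f} to {high:.2f} ounces")
```

Part d is the "solving for x" direction of the Z transformation, \( x = \mu + Z\sigma \): `inv_cdf` finds the Z value for the given area and rescales it in one step. Table-based answers may differ slightly in the last digit because of rounding.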
Problem 5. Sun Love grapefruit growers have determined that the diameter of their grapefruits is normally distributed with a mean of 4.5 inches and a standard deviation of 0.3 inches. (You can find the step-by-step solution to this problem in my workbook.)
a. What is the probability that a randomly selected grapefruit will have a diameter of at least 4.14 inches?
b. What percentage of grapefruits has a diameter between 4.8 and 5.04 inches?
c. Sun Love packs their largest grapefruits in a special package called "Super Pack." If 5% of all their grapefruits are packed in "Super Packs," what is the smallest diameter of the grapefruits that are in the "Super Packs"?
d. In this year's harvest, there were 111,500 grapefruits that had a diameter over 5.01 inches. How many grapefruits has Sun Love harvested this year?

Problem 6. In grading eggs, 30% are marked small, 45% are marked medium, 15% are marked large, and the rest are marked extra-large. If the average weight of the eggs is normally distributed with a mean of 3.2 ounces and a standard deviation of 0.6 ounces:
a. What are the smallest and the largest weights of the medium size eggs?
b. What is the weight of the smallest egg that will be in the extra-large category?

Chapter 7 Formulas

SAMPLING AND SAMPLING DISTRIBUTIONS

The number of different simple random samples of size n that can be selected from a finite population of size N:
  \( \frac{N!}{n!\,(N - n)!} \)
Expected Value of \( \bar{x} \)
  Finite population:    \( E(\bar{x}) = \mu \)
  Infinite population:  \( E(\bar{x}) = \mu \)
  where: \( E(\bar{x}) \) = the expected value of the random variable \( \bar{x} \)
         \( \mu \) = the population mean

Standard Deviation of the Distribution of \( \bar{x} \) Values (Standard Error of the Mean)
  Finite population:    \( \sigma_{\bar{x}} = \sqrt{\frac{N - n}{N - 1}} \left( \frac{\sigma}{\sqrt{n}} \right) \)
  Infinite population:  \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)

Z Score
  \( Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} \), where \( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \)

Expected Value of \( \bar{p} \)
  Finite population:    \( E(\bar{p}) = p \)
  Infinite population:  \( E(\bar{p}) = p \)
  where: \( E(\bar{p}) \) = the expected value of the random variable \( \bar{p} \)
         p = the population proportion

Standard Deviation of the Distribution of \( \bar{p} \) Values (Standard Error of the Proportion)
  Finite population:    \( \sigma_{\bar{p}} = \sqrt{\frac{N - n}{N - 1}} \sqrt{\frac{p(1 - p)}{n}} \)
  Infinite population:  \( \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \)

Z Score
  \( Z = \frac{\bar{p} - p}{\sigma_{\bar{p}}} \), where \( \sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}} \)

Chapter 7 SAMPLING AND SAMPLING DISTRIBUTIONS

Problem 1. Consider a population of four weights identical in appearance but weighing 2, 4, 6, and 8 grams. The mean (\( \mu \)) and the standard deviation (\( \sigma \)) of the population can be computed to be 5 and 2.236 grams respectively.

X                    2   4   6   8
\( (X - \mu)^2 \)

Samples of size two (with replacement) are drawn from this population. Show the Sampling Distribution of \( \bar{X} \). The list of all possible samples and the sample means can be shown as:

Possible Samples   Sample Means
     2 & 2              2
     2 & 4              3
     2 & 6              4
     2 & 8              5
     4 & 2              3
     4 & 4              4
     4 & 6              5
     4 & 8              6
     6 & 2              4
     6 & 4              5
     6 & 6              6
     6 & 8              7
     8 & 2              5
     8 & 4              6
     8 & 6              7
     8 & 8              8

The frequency of each mean can be shown as follows:

Possible Sample Means   Frequency
          2                 1
          3                 2
          4                 3
          5                 4
          6                 3
          7                 2
          8                 1

Chapter 7 SAMPLING DISTRIBUTION OF \( \bar{X} \)

Problem 2. The average yearly starting salary (\( \mu \)) of MBAs is $60,000 with a standard deviation (\( \sigma \)) of $16,000. A random sample of 64 MBAs is selected.
a. Show the sampling distribution of the sample means.
b. What is the probability that the sample mean will be greater than $56,000?

SAMPLING DISTRIBUTION OF \( \bar{P} \)

Problem 3. Twenty percent of the students at UTC are business majors. A random sample of 100 students is selected.
a. Show the sampling distribution of the sample proportions.
b.
What is the probability that the sample proportion (the proportion of business majors) is between 0.1 and 0.3?
c. What is the probability that the sample proportion (the proportion of business majors) is more than 0.25?

Chapter 8 Formulas

I. Interval Estimation of a Population Mean (\( \mu \))

A. When the standard deviation of the population is known:
  \( \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)
  where the standard error of the mean is \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
  and the margin of error = \( Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \)

B. When the standard deviation is unknown:
  \( \bar{x} \pm t_{\alpha/2} \frac{S}{\sqrt{n}} \)
  where the standard error of the mean is \( S_{\bar{x}} = \frac{S}{\sqrt{n}} \)
  and the margin of error = \( t_{\alpha/2} \frac{S}{\sqrt{n}} \)

Sample Size for an Interval Estimate of a Population Mean:
  \( n = \frac{(Z_{\alpha/2})^2 \sigma^2}{E^2} \), where E = the desired margin of error

II. Interval Estimation of a Population Proportion (P)
  \( \bar{p} \pm Z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \)
  where the standard error of proportion is \( S_{\bar{p}} = \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \)
  and the margin of error = \( Z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} \)

Sample Size for an Interval Estimate of a Population Proportion:
  \( n = \frac{(Z_{\alpha/2})^2\, p^*(1 - p^*)}{E^2} \)
  If the value of p* is not known and a good estimate of p* is not available, use p* = 0.50.

Chapter 8 - Interval Estimation

I. Interval Estimation of a Population Mean (\( \mu \))

A. The standard deviation of the population (\( \sigma \)) is known:

Problem 1. In order to estimate the average electric usage per month, a sample of 169 houses was selected; and their electric usage was determined.
a. Assume a population standard deviation of 260 kilowatt-hours. Determine the standard error of the mean.
b. With a 0.90 probability, what can be said about the size of the sampling error?
c. If the sample mean is 1834 KWH, what is the 90% confidence interval estimate of the population mean?

B. The standard deviation of the population (\( \sigma \)) is unknown:

Problem 2. Chattanooga Paper Company makes various types of paper products. One of their products is a 30 mils thick paper.
In order to ensure that the thickness of the paper meets the 30 mils specification, random cuts of paper are selected and the thickness of each cut is measured. A sample of 256 cuts had a mean thickness of 30.3 mils with a standard deviation of 4 mils.
a. Develop a 95% confidence interval for the thickness of the paper.
b. The company considers the production in control if the thickness does not deviate from the desired 30 mils by more than ±3%. Is the production in control? Explain.

Problem 3. The cost of a roll of camera film (35 mm, 24 exposure) in a sample of 12 cities worldwide is shown below.

City              Cost (in dollars)
Rio de Janeiro         12.14
Stockholm               7.47
Tokyo                   6.56
Moscow                  5.69
Paris                   5.62
London                  5.41
New York                4.33
Mexico City             4.00
Sydney                  3.62
Honolulu                3.43
Cairo                   3.40
Hong Kong               2.73

a. Using Excel, compute the basic descriptive statistics (the mean, the median, the mode, the standard deviation, and the standard error of the mean) for the cost of film.
b. Determine a 95% confidence interval for the population mean.

II. Interval Estimation of a Population Proportion (P)

Problem 4. Many people who bought Xbox gaming systems have complained about having received defective systems. In a sample of 1200 units sold, 18 units were defective.
a. Determine a 95% confidence interval for the percentage of defective systems.
b. If 1.5 million Xboxes were sold, determine an interval for the number of defectives.

Chapter 9 Formulas

I. Hypothesis Tests about a Population Mean (\( \mu \))

A. The standard deviation of the population (\( \sigma \)) is known:

Test Statistic: \( Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \)

Decision Rule for P-Value Approach:
  In all cases, reject Ho if P-Value \( \le \alpha \)

Decision Rule for Critical Value Approach:

                  Lower One-Tailed Test       Upper One-Tailed Test       Two-Tailed Test
Form              Ho: \( \mu \ge \mu_0 \)     Ho: \( \mu \le \mu_0 \)     Ho: \( \mu = \mu_0 \)
                  Ha: \( \mu < \mu_0 \)       Ha: \( \mu > \mu_0 \)       Ha: \( \mu \ne \mu_0 \)
Reject Ho if:     \( Z \le -Z_\alpha \)       \( Z \ge Z_\alpha \)        \( Z \le -Z_{\alpha/2} \) or \( Z \ge Z_{\alpha/2} \)

B.
The standard deviation of the population (\( \sigma \)) is unknown:

Test Statistic: \( t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \)

The decision rules are the same as those shown in Part A (above) with the t statistic substituted for the Z statistic.

CHAPTER FORMULAS (Continued)

II. Hypothesis Tests about a Population Proportion (P)

Test Statistic: \( Z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} \), where \( \sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}} \)

thus Z will have the form: \( Z = \frac{\bar{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} \)

Decision Rule for P-Value Approach:
  In all cases, reject Ho if P-Value \( \le \alpha \)

Decision Rule for Critical Value Approach:

                  Lower One-Tailed Test     Upper One-Tailed Test     Two-Tailed Test
Form              Ho: \( p \ge p_0 \)       Ho: \( p \le p_0 \)       Ho: \( p = p_0 \)
                  Ha: \( p < p_0 \)         Ha: \( p > p_0 \)         Ha: \( p \ne p_0 \)
Reject Ho if:     \( Z \le -Z_\alpha \)     \( Z \ge Z_\alpha \)      \( Z \le -Z_{\alpha/2} \) or \( Z \ge Z_{\alpha/2} \)

Chapter 9 HYPOTHESIS TESTING PROCEDURE

Assume we are interested in testing whether or not the mean of the population is 70. Then the null and the alternative hypotheses can be written as:

Ho: \( \mu = 70 \)
Ha: \( \mu \ne 70 \)

Possible hypothesis-testing errors will be:

                                                SITUATION IN THE POPULATION
DECISION                                        Ho is true (\( \mu = 70 \))   Ho is false (\( \mu \ne 70 \))
Do not reject Ho (Conclude \( \mu = 70 \))      Correct Decision              Type II Error
Reject Ho (Conclude \( \mu \ne 70 \))           Type I Error                  Correct Decision

Steps of Hypothesis Testing
Step 1: Develop the null and the alternative hypotheses.
Step 2: Specify the level of significance.
Step 3: Compute the test statistic (t or Z) from the sample data.

Rejection Rule: p-Value Approach
Step 4: Compute the p-value by using the test statistic (t or Z) from step 3.
Step 5: Reject Ho if p-value \( \le \alpha \).

Rejection Rule: Critical Value Approach
Step 4: Determine the critical value(s) of t or Z at the specified level of significance and set up the rejection rule.
Step 5: Compare the test statistic from step 3 to the critical value(s) from step 4. If the test statistic is beyond the critical value(s), reject the null hypothesis.

Chapter 9

I.
Hypothesis Tests about a Population Mean (\( \mu \))

A. THE STANDARD DEVIATION OF THE POPULATION IS KNOWN:

Problem 1. The Chamber of Commerce of a Florida gulf coast community advertises area commercial property available at a mean cost of under $40,000 per acre. A sample of 49 properties provided a sample mean of $38,000 per acre. Assume the standard deviation of the population is known to be $7000.
a. At 95% confidence, test the validity of their advertisement.
   Ho:
   Ha:
   Conclusion:
b. Compute the p-value and interpret its meaning.

B. THE STANDARD DEVIATION OF THE POPULATION IS UNKNOWN:

Problem 2. A soft drink filling machine, when in perfect adjustment, fills the bottles with 12 ounces of soft drink. A random sample of 64 bottles is selected, and the contents are measured. The sample yielded a mean content of 11.88 ounces with a standard deviation of 0.8 ounces.
a. With a 0.05 level of significance (i.e., 95% confidence), test to see if the machine is in perfect adjustment.
   Ho:
   Ha:
   Conclusion:
b. Compute the p-value and interpret its meaning.

Problem 3. Chattanooga public transportation operates a fleet of electric powered shuttle buses for downtown services. Daily mean maintenance costs have been $76 per bus. A recent random sample of 25 buses shows a sample mean maintenance cost of $83.50 per day with a sample standard deviation of $30. Management would like to determine whether or not there has been a significant increase in the mean daily maintenance cost.
a. At 95% confidence, test to determine whether or not the mean cost has increased.
   Ho:
   Ha:
   Conclusion:
b. Compute the p-value and interpret its meaning.

II. Hypothesis Tests about a Population Proportion (P)

Problem 4. A supplier claims that more than 80% of the parts it supplies meet the product specifications. In a sample of 800 parts received, 664 met the specifications.
a.
At 93.7% confidence, test the supplier's claim.
   Ho:
   Ha:
   Conclusion:
b. Compute the p-value and interpret its meaning.

Chapter 9 final examples: Your turn

1. For each of the following, read the t statistic from the table and write its value in the space provided.
a. A two-tailed test, a sample of 31 at 80% confidence  t =
b. A one-tailed test (upper tail), a sample size of 22 at 99% confidence  t =
c. A one-tailed test (lower tail), a sample size of 16 at 95% confidence  t =

2. For each of the following, read the Z statistic from the table and write its value in the space provided.
a. A two-tailed test at 85.3% confidence  Z =
b. A one-tailed test (lower tail) at 87.7% confidence  Z =
c. A one-tailed test (upper tail) at 97.61% confidence  Z =

3. The average dinner bill for one person in Chattanooga has been $24. It is believed there has been a significant increase in the average dinner prices. A sample of 36 dinner bills showed a mean of $27 with a standard deviation of $9.
a. At 95% confidence test to determine if there has been a significant increase in the average dinner prices.
   Ho:
   Ha:
   Conclusion:
b. Determine the p-value for the above and use it for the test.

4. The ACT scores of a random sample of 6 UTC students are given below.

Student   ACT Score
   1         28
   2         22
   3         18
   4         23
   5         29
   6         24

At 95% confidence test to see if the average ACT scores of UTC students is significantly different from 27.

CHAPTER 10 FORMULAS

I. Inferences About the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Known

Point Estimator of the Difference Between the Means of Two Populations:
  \( \bar{x}_1 - \bar{x}_2 \)

Standard Error of \( \bar{x}_1 - \bar{x}_2 \) (the Standard Deviation of the sampling distribution of \( \bar{x}_1 - \bar{x}_2 \)):
  \( \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)

A. Interval Estimate of the Difference Between the Means of Two Populations
  \( (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
  Margin of Error = \( Z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2} = Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)

B. Hypothesis Testing (Means), Independent Samples
  Test Statistic: \( Z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \)
  \( D_0 \) is the hypothesized difference between \( \mu_1 \) and \( \mu_2 \). In most situations, \( D_0 = 0 \).

Decision Rules for P-Value Approach:
  When using the P-Value approach, in all cases reject Ho if P-Value \( \le \alpha \)

Decision Rules for Critical Value Approach:

                  Lower one-tailed test              Upper one-tailed test              Two-tailed test
Form              Ho: \( \mu_1 - \mu_2 \ge D_0 \)    Ho: \( \mu_1 - \mu_2 \le D_0 \)    Ho: \( \mu_1 - \mu_2 = D_0 \)
                  Ha: \( \mu_1 - \mu_2 < D_0 \)      Ha: \( \mu_1 - \mu_2 > D_0 \)      Ha: \( \mu_1 - \mu_2 \ne D_0 \)
Reject Ho if:     \( Z \le -Z_\alpha \)              \( Z \ge Z_\alpha \)               \( Z \le -Z_{\alpha/2} \) or \( Z \ge Z_{\alpha/2} \)

CHAPTER FORMULAS (Continued)

II. Inferences about the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Unknown

A. Interval Estimate of the Difference Between the Means of Two Populations
  \( (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \)

B. Hypothesis Testing (Means), Independent Samples
  Test Statistic: \( t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \)

  The degrees of freedom for t are given by
  \[ df = \frac{\left( \frac{S_1^2}{n_1} + \frac{S_2^2}{n_2} \right)^2}{\frac{1}{n_1 - 1} \left( \frac{S_1^2}{n_1} \right)^2 + \frac{1}{n_2 - 1} \left( \frac{S_2^2}{n_2} \right)^2} \]
  When computing the degrees of freedom, round to the lower integer value.

  Decision rules are the same as those given above for the \( \sigma_1 \) and \( \sigma_2 \) known cases. Simply substitute t for Z.

III. Inferences About the Difference Between Two Population Means: Matched Samples

A. Interval Estimate
  \( \bar{d} \pm t_{\alpha/2} \frac{S_d}{\sqrt{n}} \)

B. Hypothesis Test
  Test statistic: \( t = \frac{\bar{d} - \mu_d}{S_d / \sqrt{n}} \), where \( S_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}} \)

CHAPTER FORMULAS (Continued)

IV. Analysis of Variance: Testing for the Equality of k Population Means

Hypotheses to be tested:
  Ho: \( \mu_1 = \mu_2 = \cdots = \mu_k \)
  Ha: Not all the population means are equal
  where \( \mu_j \) = the mean of the jth population
        k = the number of populations or treatments
        \( n_T \) = total number of observations

The General Form of the ANOVA Table - Completely Randomized Design:

Source of Variation    Sum of Squares   Degrees of Freedom   Mean Squares   F
Between Treatments     SSTR             k - 1                MSTR           MSTR / MSE
Within Treatments      SSE              nT - k               MSE
Total                  SST              nT - 1

Test Statistic: \( F = \frac{MSTR}{MSE} \)

Decision rules:
  When using the p-value approach, reject Ho if the p-value \( \le \alpha \)
  When using the critical value approach, reject Ho if \( F = \frac{MSTR}{MSE} > F_\alpha \)

CHAPTER FORMULAS (Continued)

Sample Mean for Treatment j:
  \( \bar{x}_j = \frac{\sum_{i=1}^{n_j} x_{ij}}{n_j} \)

Sample Variance for Treatment j:
  \( S_j^2 = \frac{\sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2}{n_j - 1} \)
  where \( x_{ij} \) = the value of observation i for treatment j
        \( n_j \) = the number of observations for treatment j

The Overall Sample Mean (Grand Mean):
  \( \bar{\bar{x}} = \frac{\sum_{j=1}^{k} \sum_{i=1}^{n_j} x_{ij}}{n_T} \), where \( n_T = n_1 + n_2 + \cdots + n_k \)

Mean Square due to Treatments (Between Treatments):
  \( MSTR = \frac{SSTR}{k - 1} \), where \( SSTR = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2 \)
  Therefore: \( MSTR = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1} \)

Mean Square due to Error (Within Treatments):
  \( MSE = \frac{SSE}{n_T - k} \), where \( SSE = \sum_{j=1}^{k} (n_j - 1) S_j^2 \)
  Therefore: \( MSE = \frac{\sum_{j=1}^{k} (n_j - 1) S_j^2}{n_T - k} \)
  SSE also can be computed as: \( SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \)

Total Sum of Squares:
  \( SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2 \)  or:  SST = SSTR + SSE

General Form of an Interval Estimate for a Population Mean:
  \( \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} \)

I. Inferences About the Difference Between Two Population Means: \( \sigma_1 \) and \( \sigma_2 \) Known

A. Interval Estimate of the Difference Between the Means of Two Populations

Problem 1. In order to estimate the difference between the age (in months) of computer consulting firms in the East and the West of the United States, the following information is gathered:

                                           East   West
Sample size                                 40     45
Sample mean (months)                        70     75
Population standard deviation (months)       5      7

Develop an interval estimate for the difference between the average age of the firms in the East and the West. Let \( \alpha \) = 0.03.

B.
Hypothesis Testing (Means), Independent Samples

Problem 2. Independent random samples taken at two local malls provided the following information regarding purchases of the patrons at the two malls:

                                Hamilton Place   Northgate
Sample size                     80               75
Average purchase                $43              $40
Population standard deviation   $8               $6

a. Using the critical value approach, test at 95% confidence whether there is a significant difference between the average purchases of the patrons at the two malls.

b. Compute the p-value and interpret its meaning. Use it to answer the question in part "a".

II. Inferences about the Difference Between Two Population Means: σ1 and σ2 Unknown

A. Interval Estimate of the Difference Between the Means of Two Populations

Problem 3. In order to estimate the difference between the average daily sales of two branches of a department store, the following data have been gathered.

                                        Downtown Store   North Mall Store
Sample size                             n1 = 23 days     n2 = 26 days
Sample mean (in $1,000)                 x̄1 = 37          x̄2 = 34
Sample standard deviation (in $1,000)   S1 = 4           S2 = 5

Develop a 95% confidence interval for the difference between the two population means.

B. Hypothesis Testing (Means), Independent Samples

Problem 4. Refer to Problem 3 (above). At 95% confidence, test whether the average daily sales of the Downtown Store (μ1) is significantly more than the average daily sales of the North Mall Store (μ2). Use both the critical value approach and the p-value approach.

III. Inferences About the Difference Between Two Population Means: Matched Samples

A. Interval Estimate

Problem 5.
The daily production rates of a sample of workers in a factory before and after a training program are shown below:

Worker   Before   After
1        6        10
2        10       13
3        10       9
4        8        11
5        7        9
6        11       12

Provide a 95% confidence interval for the difference between the mean production rates before and after the training program.

B. Hypothesis Test

Problem 6. Refer to Problem 5 (above). At 95% confidence, test to see if the training program was effective. That is, did the training program actually increase the production rates?

IV. Analysis of Variance: Testing for the Equality of k Population Means

Completely Randomized Design

Problem 7. Ahmadi, Inc. uses three types of advertising (radio, newspaper, and television) in three different geographical areas. The company is interested in determining whether there is a significant difference in the effectiveness among the three different methods of advertising. Sales (in $ millions) over a six-day period for the three geographical areas are shown below:

Area 1 (Radio)   Area 2 (Paper)   Area 3 (T.V.)
48               48               44
40               46               52
36               42               54
50               50               52
51               48               50
45               48               60

At 95% confidence, test to determine whether there is a significant difference in the effectiveness among the three different methods of advertising.

Problem 8. Three universities in your state have decided to administer the same comprehensive examination to the recipients of MBA degrees from the three institutions. From each institution, a random sample of MBA recipients has been selected and given the test. The following table shows the scores of the students from each university.

                         Northern University   Central University   Southern University
                         56                    62                   94
                         85                    97                   72
                         65                    91                   93
                         86                    82                   78
                         93                                         54
                                                                    77
Sample Mean (x̄j)         77.0                  83.0                 78.0
Sample Variance (s²j)    246.5                 234.0                218.8

At α = 0.01, test to see if there is any significant difference in the average scores of the students from the three universities.
Note that the sample sizes are not equal.

Problem 9. Part of an ANOVA table involving 8 groups for a study is shown below.

Source of Variation    Sum of Squares   Degrees of Freedom   Mean Square   F
Between Treatments     126              ?                    ?             ?
Within Treatments      240              ?                    ?
Total                  ?                67

a. Complete all the missing values in the above table and fill in the blanks.
b. Use α = 0.05 to determine if there is any significant difference among the means of the eight groups.

Problem 10. In a completely randomized experimental design, 11 experimental units were used for each of the 4 treatments. Part of the ANOVA table is shown below.

Source of Variation    Sum of Squares   Degrees of Freedom   Mean Square   F
Between Treatments     1500             ?                    ?             ?
Within Treatments      ?                ?                    ?
Total                  5500

a. Fill in the blanks in the above ANOVA table.
b. Use α = 0.05 to determine if there is any significant difference among the means of the four groups.

CHAPTER 11 FORMULAS

A. Interval Estimation of the Difference Between the Proportions of Two Populations

$(\bar{p}_1 - \bar{p}_2) \pm Z_{\alpha/2} \, S_{\bar{p}_1 - \bar{p}_2}$ where $S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\dfrac{\bar{p}_1 (1 - \bar{p}_1)}{n_1} + \dfrac{\bar{p}_2 (1 - \bar{p}_2)}{n_2}}$

B. Hypothesis Test about the Difference Between the Proportions of Two Populations

The Test Statistic "Z" is:

$Z = \dfrac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{S_{\bar{p}_1 - \bar{p}_2}}$ where $S_{\bar{p}_1 - \bar{p}_2} = \sqrt{\bar{p} (1 - \bar{p}) \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$

Assuming p1 = p2, the pooled proportion is computed as

$\bar{p} = \dfrac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \dfrac{X_1 + X_2}{n_1 + n_2}$

C. Goodness of Fit Test

The Test Statistic "χ²" is:

$\chi^2 = \sum_{i=1}^{k} \dfrac{(f_i - e_i)^2}{e_i}$

D. Test of Independence

The Test Statistic "χ²" is:

$\chi^2 = \sum_{i} \sum_{j} \dfrac{(f_{ij} - e_{ij})^2}{e_{ij}}$

Chapter 11

A. Interval Estimation of the Difference Between the Proportions of Two Populations

Problem 1. In a sample of 400 Democrats, 60 said that they support the president's new tax proposal, while in a sample of 500 Republicans, only 80 said they support it. Determine a 90% confidence interval estimate for the difference between the proportions of individuals in the two parties who support the proposal.

B.
Hypothesis Test about the Difference Between the Proportions of Two Populations

Problem 2. In a sample of 600 Republicans, 480 were in favor of the President's foreign policies, while in a sample of 900 Democrats, 675 were in favor of his policies.

a. At 95% confidence, test to see if there is a significant difference in the proportions of the Democrats and the Republicans who are in favor of the President's foreign policies.

b. Compute the p-value and use it to determine whether the percentage of Republicans who favored the President's foreign policies is significantly more than the percentage of Democrats.

C. Goodness of Fit Test

Problem 3. The AMA Journal reported the following frequencies of deaths due to cardiac arrest for each day of the week.

Cardiac Death by Day of the Week

Day         f
Monday      40
Tuesday     17
Wednesday   16
Thursday    29
Friday      15
Saturday    20
Sunday      17

At 95% confidence, determine whether the number of deaths is uniform over the week.

D. Test of Independence - Contingency Tables

Problem 4. Dr. Ahmadi's diet pills are supposed to cause significant weight loss. The following table shows the results of a recent study where some individuals took the diet pills and some did not.

                 No weight loss   Weight loss   Total
Diet pills       80               100           180
No diet pills    20               100           120
Total            100              200           300

With 95% confidence, test to see if losing weight is dependent on taking the diet pills.
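The test of independence statistic from the Chapter 11 formulas can be traced step by step in a few lines of code. The following is a minimal Python sketch, not part of the original notes, applied to the contingency table of Problem 4. It assumes the usual expected frequency under independence, eij = (row total)(column total)/n, and (rows − 1)(columns − 1) degrees of freedom; neither is written out in the formula sheet above. The critical value is the standard chi-square table entry for α = 0.05 with 1 degree of freedom.

```python
import math

# Observed frequencies from Problem 4
# rows: diet pills, no diet pills; columns: no weight loss, weight loss
observed = [[80, 100],
            [20, 100]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Assumed expected frequency under independence:
# e_ij = (row total)(column total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Test statistic: sum over all cells of (f_ij - e_ij)^2 / e_ij
chi_square = sum((f - e) ** 2 / e
                 for f_row, e_row in zip(observed, expected)
                 for f, e in zip(f_row, e_row))

# Assumed degrees of freedom: (rows - 1)(columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_VALUE = 3.841  # chi-square table value, alpha = 0.05, df = 1

# With exactly 1 degree of freedom the upper-tail area has a closed form,
# because a chi-square variable with df = 1 is a squared standard normal.
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, df = {df}, p-value = {p_value:.2e}")
print("Reject Ho" if chi_square >= CRITICAL_VALUE else "Do not reject Ho")
```

The closed-form p-value only holds for a 2 × 2 table; for larger tables the p-value would come from a chi-square table or from software such as Excel.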
CHAPTER 12 FORMULAS

Simple Linear Regression Model

$y = \beta_0 + \beta_1 x + \varepsilon$

Simple Linear Regression Equation

$E(y) = \beta_0 + \beta_1 x$

Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

Estimated Simple Linear Regression Equation

$\hat{y} = b_0 + b_1 x$

where
ŷ = the estimated value of the dependent variable
b1 = the slope of the line
b0 = the y-intercept

$b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$

Sum of Squares Due to Regression

$SSR = \dfrac{\left[\sum (x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum (x_i - \bar{x})^2}$

Total Sum of Squares

$SST = \sum (y_i - \bar{y})^2$

Sum of Squares Due to Error

$SSE = \sum (y_i - \hat{y}_i)^2$

Also: SST = SSR + SSE

Coefficient of Determination

$r^2 = \dfrac{SSR}{SST}$ Also: $r^2 = 1 - \dfrac{SSE}{SST}$

Sample Correlation Coefficient

$r = (\text{the sign of } b_1) \sqrt{\text{Coefficient of Determination}} = \pm \sqrt{r^2}$

where b1 = the slope of the regression equation

Mean Square Error (Estimate of σ²)

$s^2 = MSE = \dfrac{SSE}{n - 2}$

Standard Error of the Estimate

$s = \sqrt{MSE}$

t Test for Significance of the Slope of the Regression Equation

Ho: β1 = 0
Ha: β1 ≠ 0

t statistic: $t = \dfrac{b_1}{s_{b_1}}$

where $s_{b_1}$ (the Estimated Standard Deviation of b1) is: $s_{b_1} = \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$

Reject Ho if t ≤ −tα/2 or t ≥ tα/2 (degrees of freedom = n − p − 1)

F Test for Significance of the Linear Regression Model (ANOVA)

Ho: β1 = 0 (i.e., the regression model is NOT significant)
Ha: β1 ≠ 0 (the regression model IS significant)

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   Test Statistic F
Regression            SSR              p                    MSR           F = MSR/MSE
Error (Residual)      SSE              n − p − 1            MSE
Total                 SST              n − 1

Where:
p = the number of independent variables
n = the sample size

Reject Ho if the test statistic F > critical F

Confidence Interval Estimate for the Mean Value of y, that is E(yp)

$\hat{y}_p \pm t_{\alpha/2} \, s_{\hat{y}_p}$

Estimated Standard Deviation of ŷp

$s_{\hat{y}_p} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$

Remember: $s = \sqrt{MSE}$

Chapter 12
Simple (Bivariate) Linear Regression and Correlation

Problem 1. Ahmadi, Inc.
is a microcomputer producer. The following data represent Ahmadi's yearly sales volume and their advertising expenditure over a period of 8 years.

Year   (Y) Sales (in $1,000,000)   (X) Advertising (in $10,000)
1996   15                          32
1997   16                          33
1998   18                          35
1999   17                          34
2000   16                          36
2001   19                          37
2002   19                          39
2003   24                          42

a. Develop a scatter diagram of sales versus advertising.
b. Use the method of least squares to compute an estimated regression line between sales and advertising.
c. If the company's advertising expenditure is $400,000, what is the predicted sales? Give the answer in dollars.
d. What does the slope of the estimated regression line indicate?
e. Compute the coefficient of determination and fully interpret its meaning.
f. Use the F test to determine whether or not the regression model is significant. Let α = 0.05.
g. Use the t test to determine whether the slope of the regression model is significant. Let α = 0.05.
h. Explain the basic assumptions about the error term in regression.
i. Develop a 95% confidence interval for predicting the average sales for the years when $400,000 was spent on advertising.
j. Use Excel and solve the above problems.
k. Using Excel, determine the regression equation between sales and time (where 1996 = 1).

CHAPTER 13 FORMULAS

Multiple Regression Model

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon$

Multiple Regression Equation

$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$

Estimated Multiple Regression Equation

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p$

Least Squares Criterion

$\min \sum (y_i - \hat{y}_i)^2$

Relationship among SST, SSR, and SSE

SST = SSR + SSE

Multiple Coefficient of Determination

$R^2 = \dfrac{SSR}{SST}$ Also: $R^2 = 1 - \dfrac{SSE}{SST}$

Adjusted Multiple Coefficient of Determination

$R_a^2 = 1 - (1 - R^2) \dfrac{n - 1}{n - p - 1}$

Excel's ANOVA Table

ANOVA
             df          SS    MS                      F             Significance F
Regression   p           SSR   MSR = SSR/p             F = MSR/MSE
Residual     n − p − 1   SSE   MSE = SSE/(n − p − 1)
Total        n − 1       SST

F Test for Overall Significance in Multiple Regression

Ho: β1 = β2 = . . . = βp = 0 (the model is not significant)
Ha: One or more of the coefficients is not equal to zero (the model is significant)

Test Statistic: F = MSR/MSE (see Excel's ANOVA table)

When using the p-value approach, reject Ho if the p-value ≤ α.
When using the critical value approach, reject Ho if the test statistic F ≥ Fα, where Fα is based on an F distribution with p numerator degrees of freedom and (n − p − 1) denominator degrees of freedom.

t Test for Individual Significance in Multiple Regression

Ho: βi = 0
Ha: βi ≠ 0
for any parameter βi

Test Statistic: $t = \dfrac{b_i}{s_{b_i}}$

When using the p-value approach, reject Ho if the p-value ≤ α.
When using the critical value approach, reject Ho if the test statistic t ≤ −tα/2 or if t ≥ tα/2, where tα/2 is based on a t distribution with (n − p − 1) degrees of freedom.

Chapter 13
Multiple Regression and Correlation

Problem 1. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume, their advertising expenditure, and the number of individuals in the sales force over a period of 15 years:

       (Y) Sales      X1 Advertising   X2 Sales Force   X3 Time
Year   ($1,000,000)   ($10,000)        (100)
1989   15             32               10               1
1990   16             33               12               2
1991   18             35               11               3
1992   17             34               14               4
1993   16             36               16               5
1994   19             37               18               6
1995   19             39               17               7
1996   24             42               20               8
1997   25             44               25               9
1998   27             40               22               10
1999   30             45               27               11
2000   33             50               28               12
2001   38             49               30               13
2002   40             50               30               14
2003   45             55               35               15

a. Using Excel, enter the above data in a file and save the file.
Print the file as well as the results of all of the following parts.
b. Run the correlation analysis relating sales (Y) and all of the independent variables. (Do not include the column of Year.) Explain the results. Discuss the concept of multicollinearity.
c. Run the regression analysis relating sales (Y) and advertising (X1). Explain the results.
d. Run a regression analysis relating sales (Y) and two independent variables X1 and X2. Explain the results.
e. Run a regression analysis relating sales (Y) and two independent variables X1 and X3. Explain the results.
f. Using the model developed in part "e", predict sales for 2004 assuming we are planning to advertise $700,000.
g. Run a regression analysis relating sales (Y) and Time (X3). Explain the results.
h. Using the model developed in part "g", predict sales for 2008.
i. Run a regression analysis relating sales (Y) and three independent variables X1, X2, and X3. Explain the results.

Chapter 13
Multiple Regression & Correlation With Dummy Variables

Problem 2. Ahmadi, Inc. is a microcomputer producer. The following data represent Ahmadi's yearly sales volume, their advertising expenditure, and whether in a given year they used all Television advertising (X2 = 0) or used Multimedia advertising (X2 = 1).

       (Y) Sales      X1 Advertising   X2 Dummy Variable
Year   ($1,000,000)   ($10,000)        (0,1)
1989   15             32               0
1990   16             33               1
1991   18             35               1
1992   17             34               1
1993   16             36               0
1994   19             37               1
1995   19             39               0
1996   24             42               0
1997   25             44               1
1998   27             40               0
1999   30             45               1
2000   33             50               1
2001   38             49               0
2002   40             50               0
2003   45             55               1

The Regression procedure of Excel was used on the above data, and parts of the results are shown below.

a. Fill in all the blanks in the output shown below.
b. Write the estimated regression equation.
c. Using the results shown below, predict sales for the year 2004 assuming we are planning to use $700,000 for television advertising only.
d.
Using the results shown below, predict sales for the year 2004 assuming we are planning to use $700,000 for multimedia advertising.

SUMMARY OUTPUT

Multiple R           ___________?
R Square             ___________?
Adjusted R Square    ___________?
Standard Error       2.715
Observations         ___________?

ANOVA
             df             SS             MS             F              Significance F
Regression   ___________?   1243.274       ___________?   ___________?   8.59E-08
Residual     ___________?   ___________?   ___________?
Total        ___________?   ___________?

              Coefficients   Standard Error   t Stat         P-value
Intercept     -28.462401     4.285592715      ___________?   ___________?
Advertising   1.31332227     0.10113336       ___________?   ___________?
Dummy         -0.8296375     1.406029116      ___________?   ___________?

Your Turn – One Final Example
Significance of variables and other issues

Problem 3. Ahmadi, Inc. produces several models of computer printers. Data on a few variables for one of the company's printers are presented below.

Sales (Y)         Advertising (X1)   Price (X2)   Competitor's Price (X3)   Time (X4)    Rating (X5)
(In $1,000,000)   (In $1,000)        (In $100)    (In $100)                 (In Years)   (0 to 10)
1578              588                21           20                        1            4
1741              600                20           22                        2            2
2295              600                17           19                        3            4
2134              780                21           21                        4            8
2035              750                21           21                        5            6
2408              820                19           21                        6            8
2337              810                20           20                        7            8
2468              840                25           22                        8            6
2533              700                25           24                        9            8
2800              970                16           18                        10           8
2729              920                15           21                        11           6
2799              950                24           23                        12           6
3264              980                17           23                        13           6
3367              1167               19           17                        14           4
3289              800                12           18                        15           6
3453              1255               17           16                        16           6
5031              1706               17           25                        17           8
6125              1890               12           26                        18           8
6519              1996               17           28                        19           8
4586              1700               15           18                        20           10
4876              1706               21           24                        21           4
4675              1888               14           23                        22           6
3473              1300               19           24                        23           10
3669              1500               18           21                        24           8
4167              1400               24           23                        25           4

a. Enter the above data into an Excel file and save the file. Print the file and the results of all of the following parts.
b. Run a correlation analysis (among all variables) and print the results. Fully discuss the meaning of the correlation coefficients. Be sure to discuss the concept of multicollinearity.
c.
Run a regression analysis relating sales (Y) and ALL the independent variables. Fully explain the results.
d. Drop the variable(s) that at 95% confidence were not significant in part "c" and run a new regression analysis. Fully explain your results.
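The t test for individual significance from the Chapter 13 formulas is what decides which variables to drop in a problem like the one above. As a hedged illustration, the Python sketch below (standard library only, not part of the original notes) refits the dummy-variable model of Problem 2 by solving the normal equations b = (X'X)⁻¹X'y, then computes each coefficient's standard error and t statistic. The helper names (`transpose`, `matmul`, `invert`, `ols`) are hypothetical, introduced only for this example; Excel's Regression tool is the method the notes actually use. The critical value 2.179 is the t table entry for α/2 = 0.025 with n − p − 1 = 12 degrees of freedom.

```python
# Ordinary least squares via the normal equations, standard library only.
# Data: Chapter 13, Problem 2 (sales, advertising, dummy variable).

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(y, *xs):
    """Return (coefficients, standard errors, t statistics, error df)."""
    n, p = len(y), len(xs)
    X = [[1.0] + [x[i] for x in xs] for i in range(n)]  # leading 1 = intercept
    Xt = transpose(X)
    xtx_inv = invert(matmul(Xt, X))
    b = [row[0] for row in matmul(xtx_inv, matmul(Xt, [[v] for v in y]))]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    df = n - p - 1
    mse = sse / df                                   # MSE = SSE / (n - p - 1)
    se = [(mse * xtx_inv[j][j]) ** 0.5 for j in range(p + 1)]
    t = [bj / sj for bj, sj in zip(b, se)]           # t = b_i / s_bi
    return b, se, t, df

sales = [15, 16, 18, 17, 16, 19, 19, 24, 25, 27, 30, 33, 38, 40, 45]
advertising = [32, 33, 35, 34, 36, 37, 39, 42, 44, 40, 45, 50, 49, 50, 55]
dummy = [0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1]

coef, se, t_stat, df = ols(sales, advertising, dummy)
T_CRIT = 2.179  # t table value, alpha/2 = 0.025, 12 degrees of freedom

for name, b, s, t in zip(["Intercept", "Advertising", "Dummy"], coef, se, t_stat):
    verdict = "significant" if abs(t) >= T_CRIT else "not significant"
    print(f"{name:12s} b = {b:10.4f}  se = {s:8.4f}  t = {t:7.3f}  {verdict}")
```

The coefficients and standard errors it prints can be compared with the Excel summary output given for Problem 2. The same `ols` call accepts any number of independent variables, so the five-variable model of Problem 3 could be fitted the same way, with a critical value looked up for its own degrees of freedom.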