Intro Basic Statistics Part I
Introduction to Business Analytics
Dr. Barry A. Wray

Measures of Central Tendency
• Average/Arithmetic mean
• Median
• Mode

Measures of Central Tendency
Average/Arithmetic mean: the "average" value for a variable.
Population mean, denoted by the Greek letter μ (mu):
$\mu = \frac{\sum x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$
• $x_1$ = value of variable x for the first observation
• $x_2$ = value of variable x for the second observation
• $x_n$ = value of variable x for the nth observation
where N = the population size

Measures of Central Tendency
Sample mean, denoted by $\bar{x}$:
$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
where n = sample size

Measures of Central Tendency
Sample median (middle value):
$X_{(n+1)/2}$ = location of the median
where n = sample size
*First you MUST order the data from smallest to largest!

Measures of Central Tendency
Audit   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
Days   12 13 14 14 15 15 16 17 18 18 18 19 20 21 22 22 23 27 28 33
To find the mean we compute:
$\bar{x} = \frac{\sum x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{12 + 13 + \cdots + 33}{20} = \frac{385}{20} = 19.25$ days

Measures of Central Tendency
To locate the median in the same audit data we need to find:
$X_{(n+1)/2} = X_{(20+1)/2} = X_{21/2} = X_{10.5}$
When you get a location like 10.5, add the 10th and 11th ordered observations and divide by 2 to get the median:
$\frac{18 + 18}{2} = 18$ days

Measures of Central Tendency
To find the mode in the same audit data we need to find the value that occurs most frequently. First you MUST order the data!
Mode = 18 days
• Note: if there were one more 14, we would have two modes: 14 and 18

Measures of Dispersion
The Standard Deviation is a measure of the typical difference (distance) between each value for a variable and the mean of all values. Note the difference in the formulas for the population standard deviation and the sample standard deviation.

Measures of Dispersion
Population standard deviation:
$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2}$
where $\mu$ is the population mean and N is the population size
Sample standard deviation:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$
where $\bar{X}$ is the sample mean and n is the sample size

Measures of Dispersion
Because the audit data is sample data we use:
$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2} = \sqrt{\frac{(12-19.25)^2 + (13-19.25)^2 + \cdots + (33-19.25)^2}{20-1}} = \sqrt{\frac{561.75}{19}} = \sqrt{29.5658} = 5.4374$ days

Parameters and statistics
Because the audit data is sample data, we use the sample statistics to estimate the population parameters.
So what is your best guess for $\mu$? $\bar{x} = 19.25$ days
So what is your best guess for $\sigma$? $s = 5.4374$ days

Practice
• For the following 3 sample data sets, compute the mean, median, and mode. Also compute the standard deviation for each.

Practice
• Data Set #1:
Observation  Data
1            40
2            43
3            48
4            50
5            52
6            53
7            55
8            59

Practice
• Data Set #2:
Observation  Data
1            40
2            43
3            48
4            50
5            52
6            53
7            55
8            159

Practice
• Data Set #3:
Observation  Data
1            18
2            22
3            25
4            48
5            54
6            65
7            75
8            93

Measures of Association Between Two Variables
• Scatter Charts: Useful graph for analyzing the relationship between two variables.
• Covariance: Descriptive measure of the linear association between two variables.
• Sample covariance for a sample of size n with the observations $(x_1, y_1)$, $(x_2, y_2)$, and so on:
  $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
• Population covariance:
  $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$

Measures of Association Between Two Variables
• Correlation coefficient: Measures the relationship between two variables.
• Not affected by the units of measurement for x and y.
• Sample correlation coefficient denoted by $r_{xy}$:
  $r_{xy} = \frac{s_{xy}}{s_x s_y}$
• $s_{xy}$ = sample covariance = $\frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
• $s_x$ = sample standard deviation of x = $\sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$
• $s_y$ = sample standard deviation of y = $\sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$

Interpretation of Correlation Coefficient
• $-1 \le r \le +1$
r value   Relationship between the x and y variables
< 0       Negative linear
Near 0    No linear relationship
> 0       Positive linear

Data for Bottled Water Sales at Queensland Amusement Park for a Sample of 14 Summer Days

Chart Showing the Positive Linear Relation Between Sales and High Temperatures

Sample Covariance Calculations for Daily High Temperature and Bottled Water Sales at Queensland Amusement Park

Scatter Diagrams and Associated Covariance Values for Different Variable Relationships
(a) $s_{xy}$ Positive: (x and y are positively linearly related)
(b) $s_{xy}$ Approximately 0: (x and y are not linearly related)
(c) $s_{xy}$ Negative: (x and y are negatively linearly related)

Computation of Correlation Coefficient
• Illustration: To determine the sample correlation coefficient for bottled water sales at Queensland Amusement Park:
  $r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{12.8}{(4.36)(3.15)} = 0.93$
• There is a very strong linear relationship between high temperature and sales. $R^2$ is the coefficient of determination.

Example of Nonlinear Relationship Producing a Correlation Coefficient Near Zero

Calculating Covariance and Correlation Coefficient for Bottled Water Sales Using Excel
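
The slides do this calculation in Excel. As a rough cross-check, the same quantities can be sketched in Python using only the standard library. This is an illustration, not part of the original deck: the helper names `sample_covariance` and `sample_correlation` and the small `x_demo`/`y_demo` data set are hypothetical, since the Queensland table itself is not reproduced in this transcript. The audit times and the summary values 12.8, 4.36, and 3.15 are taken from the slides.

```python
import statistics

# Audit times in days, copied from the slides (a sample of 20 audits).
days = [12, 13, 14, 14, 15, 15, 16, 17, 18, 18,
        18, 19, 20, 21, 22, 22, 23, 27, 28, 33]

mean = statistics.mean(days)      # sample mean, sum(x_i) / n
median = statistics.median(days)  # average of the 10th and 11th ordered values
mode = statistics.mode(days)      # most frequent value
s = statistics.stdev(days)        # sample standard deviation (divides by n - 1)


def sample_covariance(xs, ys):
    """Hypothetical helper: sum((x - x_bar) * (y - y_bar)) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Hypothetical helper: r_xy = s_xy / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (statistics.stdev(xs) * statistics.stdev(ys))


# Correlation from the summary values quoted on the bottled water slide.
r_slides = 12.8 / (4.36 * 3.15)

# Made-up, perfectly linear data (NOT the Queensland data) to exercise the helpers.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 6, 8, 10]
r_demo = sample_correlation(x_demo, y_demo)

print(f"mean={mean}, median={median}, mode={mode}, s={s:.4f}")
print(f"r from slide values={r_slides:.2f}, r for demo data={r_demo:.2f}")
```

Because `statistics.stdev` divides by n - 1, it matches the sample formula used in the slides; `statistics.pstdev` would give the population version instead.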