MEASURES OF VARIABILITY
• Variance
  – Population variance
  – Sample variance
• Standard Deviation
  – Population standard deviation
  – Sample standard deviation
• Coefficient of Variation (CV)
  – Sample CV
  – Population CV

MEASURES OF VARIABILITY: POPULATION VARIANCE
• The population variance is the mean squared deviation from the population mean:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

• Where $\sigma^2$ stands for the population variance
• $\mu$ is the population mean
• $N$ is the total number of values in the population
• $x_i$ is the value of the i-th observation
• $\sum$ represents a summation

MEASURES OF VARIABILITY: SAMPLE VARIANCE
• The sample variance is defined as follows:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

• Where $s^2$ stands for the sample variance
• $\bar{x}$ is the sample mean
• $n$ is the total number of values in the sample
• $x_i$ is the value of the i-th observation
• $\sum$ represents a summation

MEASURES OF VARIABILITY: SAMPLE VARIANCE
• A sample of monthly advertising expenses (in 000 $) is taken. The data for five months are as follows: 2.5, 1.3, 1.4, 1.0 and 2.0. Compute the sample variance.

MEASURES OF VARIABILITY: SAMPLE VARIANCE
• Notice that the sample variance is defined as the sum of the squared deviations divided by n-1.
• The sample variance is computed to estimate the population variance.
• An unbiased estimate of the population variance is obtained by defining the sample variance as the sum of the squared deviations divided by n-1 rather than by n.
• Defining the sample variance as the mean squared deviation from the sample mean (dividing by n) tends to underestimate the population variance.

MEASURES OF VARIABILITY: SAMPLE VARIANCE
• A shortcut formula for the sample variance:

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

• Where $s^2$ is the sample variance
• $n$ is the total number of values in the sample
• $x_i$ is the value of the i-th observation
• $\sum$ represents a summation

MEASURES OF VARIABILITY: SAMPLE VARIANCE
• A sample of monthly sales (in 000 units) is taken. The data for five months are as follows: 264, 116, 165, 101 and 209.
Compute the sample variance using the shortcut formula.

MEASURES OF VARIABILITY: SAMPLE VARIANCE
• The shortcut formula for the sample variance:

$$s^2 = \frac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$$

• If you have the sum of the measurements, $\sum_{i=1}^{n} x_i$, already computed, the above formula is a shortcut because you need only compute the sum of the squares, $\sum_{i=1}^{n} x_i^2$.

MEASURES OF VARIABILITY: POPULATION/SAMPLE STANDARD DEVIATION
• The standard deviation is the positive square root of the variance:
  – Population standard deviation: $\sigma = \sqrt{\sigma^2}$
  – Sample standard deviation: $s = \sqrt{s^2}$
• Compute the standard deviations of advertising and sales.

MEASURES OF VARIABILITY: POPULATION/SAMPLE STANDARD DEVIATION
• Compute the sample standard deviation of the advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0
• Compute the sample standard deviation of the sales data: 264, 116, 165, 101 and 209

MEASURES OF VARIABILITY: POPULATION/SAMPLE CV
• The coefficient of variation is the standard deviation divided by the mean:
  – Population coefficient of variation: $CV = \sigma / \mu$
  – Sample coefficient of variation: $cv = s / \bar{x}$

MEASURES OF VARIABILITY: POPULATION/SAMPLE CV
• Compute the sample coefficient of variation of the advertising data: 2.5, 1.3, 1.4, 1.0 and 2.0
• Compute the sample coefficient of variation of the sales data: 264, 116, 165, 101 and 209

MEASURES OF ASSOCIATION
• A scatter diagram provides a graphical description of a positive/negative, linear/non-linear relationship.
• Numerical descriptions of a positive/negative, linear/non-linear relationship are obtained from:
  – Covariance
    • Population covariance
    • Sample covariance
  – Coefficient of correlation
    • Population coefficient of correlation
    • Sample coefficient of correlation

MEASURES OF ASSOCIATION: EXAMPLE
• A sample of monthly advertising and sales data is collected and shown below:

Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0

• How is the relationship between sales and advertising?
Is the relationship linear/non-linear, positive/negative, etc.?

POPULATION COVARIANCE
• The population covariance is the mean of the products of deviations from the population means:

$$COV(X,Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

• Where COV(X,Y) is the population covariance
• $\mu_x$, $\mu_y$ are the population means of X and Y respectively
• $N$ is the total number of values in the population
• $x_i$, $y_i$ are the values of the i-th observations of X and Y respectively
• $\sum$ represents a summation

SAMPLE COVARIANCE
• The sample covariance is the sum of the products of deviations from the sample means, divided by n-1:

$$cov(X,Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

• Where cov(X,Y) is the sample covariance
• $\bar{x}$, $\bar{y}$ are the sample means of X and Y respectively
• $n$ is the total number of values in the sample
• $x_i$, $y_i$ are the values of the i-th observations of X and Y respectively
• $\sum$ represents a summation

SAMPLE COVARIANCE

Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0
Mean    171                 1.64
SD      67.18258703         0.602495

Total = ____    cov = ____

POPULATION/SAMPLE COVARIANCE
• If two variables increase/decrease together, the covariance is a large positive number and the relationship is called positive.
• If the relationship is such that when one variable increases, the other decreases and vice versa, then the covariance is a large negative number and the relationship is called negative.
• If two variables are unrelated, the covariance may be a small number.
• How large is large? How small is small?

POPULATION/SAMPLE COVARIANCE
• How large is large? How small is small? A drawback of covariance is that it is usually difficult to give any guideline for how large a covariance indicates a strong relationship and how small a covariance indicates no relationship.
• The coefficient of correlation can overcome this drawback to a certain extent.
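The exercises above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names (`sample_variance`, `sample_variance_shortcut`, `sample_sd`, `sample_cv`, `sample_covariance`) are made up for this example and do not come from the slides; the data are the five months of advertising and sales values given above.

```python
import math

# Five months of data from the slides.
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]  # in 000 $
sales = [264, 116, 165, 101, 209]        # in 000 units


def mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    # Definition: sum of squared deviations from the sample mean, divided by n - 1.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def sample_variance_shortcut(xs):
    # Shortcut: (sum of squares - (sum)^2 / n) / (n - 1).
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def sample_sd(xs):
    # Positive square root of the sample variance.
    return math.sqrt(sample_variance(xs))


def sample_cv(xs):
    # Sample standard deviation divided by the sample mean.
    return sample_sd(xs) / mean(xs)


def sample_covariance(xs, ys):
    # Sum of products of deviations from the sample means, divided by n - 1.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


print(round(sample_variance(advertising), 4))        # 0.363
print(round(sample_variance_shortcut(sales), 4))     # 4513.5
print(round(sample_sd(advertising), 6))              # 0.602495
print(round(sample_sd(sales), 6))                    # 67.182587
print(round(sample_cv(advertising), 4))              # 0.3674
print(round(sample_cv(sales), 4))                    # 0.3929
print(round(sample_covariance(sales, advertising), 4))  # 39.65
```

The two standard deviations agree with the SD row of the table above. The covariance is positive, which is consistent with sales and advertising increasing together, but its size depends on the units of both variables, which is the drawback the last slide describes.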
POPULATION COEFFICIENT OF CORRELATION
• The population coefficient of correlation is the population covariance divided by the population standard deviations of X and Y:

$$\rho = \frac{COV(X,Y)}{\sigma_x \sigma_y}$$

• Where $\rho$ is the population coefficient of correlation
• COV(X,Y) is the population covariance
• $\sigma_x$, $\sigma_y$ are the population standard deviations of X and Y respectively

SAMPLE COEFFICIENT OF CORRELATION
• The sample coefficient of correlation is the sample covariance divided by the sample standard deviations of X and Y:

$$r = \frac{cov(X,Y)}{s_x s_y}$$

• Where $r$ is the sample coefficient of correlation
• cov(X,Y) is the sample covariance
• $s_x$, $s_y$ are the sample standard deviations of X and Y respectively

SAMPLE COEFFICIENT OF CORRELATION

Month   Sales (000 units)   Advertising (000 $)
1       264                 2.5
2       116                 1.3
3       165                 1.4
4       101                 1.0
5       209                 2.0
Mean    171                 1.64
SD      67.18258703         0.602495

Total = ____    cov = ____    r = ____

POPULATION/SAMPLE COEFFICIENT OF CORRELATION
• The coefficient of correlation is always between -1 and +1.
  – Values near -1 or +1 show a strong linear relationship
  – Values near 0 show no linear relationship
  – Values near +1 show a strong positive linear relationship
  – Values near -1 show a strong negative linear relationship

EXAMPLE
• Salary, expenses for cultural activities, and expenses for sports-related activities are collected from 100 households. Data for only 5 households are shown below:

Salary and expenses data for 100 households

Salary     Culture   Sports
$54,600    $1,020    $990
$57,500    $1,100    $460
$53,300    $900      $780
$43,500    $570      $860
$57,200    $900      $1,390

• How are the relationships (linear/non-linear, positive/negative) between (i) salary and culture, (ii) salary and sports, and (iii) sports and culture?
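The sample coefficient of correlation defined above can be sketched in Python as follows. This is a minimal illustration, not the method used to produce the slides: the function names are hypothetical, and the household lists contain only the five rows shown, so their correlations are computed from five observations and should not be expected to match values based on all 100 households.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def sample_sd(xs):
    # Sample standard deviation (divides by n - 1).
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def sample_covariance(xs, ys):
    # Sample covariance (divides by n - 1).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_correlation(xs, ys):
    # r = cov(X, Y) / (s_x * s_y)
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


# Monthly sales (000 units) and advertising (000 $) from the slides.
sales = [264, 116, 165, 101, 209]
advertising = [2.5, 1.3, 1.4, 1.0, 2.0]
print(round(sample_correlation(sales, advertising), 4))  # 0.9796

# The five household rows shown in the example (dollars).
salary = [54600, 57500, 53300, 43500, 57200]
culture = [1020, 1100, 900, 570, 900]
sports = [990, 460, 780, 860, 1390]
print("salary-culture:", sample_correlation(salary, culture))
print("salary-sports: ", sample_correlation(salary, sports))
print("sports-culture:", sample_correlation(sports, culture))
```

Because r divides the covariance by both standard deviations, it has no units and always lies between -1 and +1, so the sales-advertising value near +1 can be read directly as a strong positive linear relationship.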
SALARY-CULTURE
[Scatter plot: expenses for cultural activities (vertical axis) against salary (horizontal axis)]
• cov = 1094787, r = 0.5065 (positive, linear)

SPORTS-CULTURE
[Scatter plot: expenses for cultural activities (vertical axis) against expenses for sports-related activities (horizontal axis)]
• cov = -33608, r = -0.5201 (negative, linear)

SALARY-SPORTS
[Scatter plot: expenses for sports-related activities (vertical axis) against salary (horizontal axis)]
• cov = -219026, r = -0.08122 (no linear relationship)