Statistics
© Copyright 2001, Alan Marshall

Statistics
A branch of mathematics that deals with the collection and analysis of data.
Descriptive Statistics: used to analyze and describe data.
Inferential Statistics: used to make statements about the relationships between variables or expectations about future events.

Measures of Central Tendency
Arithmetic Mean
Median
Mode
Geometric Mean

Arithmetic Mean
Other names: Average, Mean
\[ \text{Mean} = \frac{\text{Sum of the measurements}}{\text{Number of measurements}} \]

Arithmetic Mean
Sample: \( \bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} \)
Population: \( \mu = \dfrac{\sum_{i=1}^{N} x_i}{N} \)
The calculation is identical; only the notation varies slightly.

Summation Notation
\( \sum\nolimits_{t=1}^{N} x_t \) versus \( \sum\limits_{t=1}^{N} x_t \)
Notice that the first form uses less vertical space on the page.
This makes accountants very happy.
The first can also be easier to fit into a line of text.

Example
Ten second-year BBA students wrote the CSC exam last month.
Their scores were: 71, 72, 88, 69, 77, 63, 91, 81, 83, 75
\[ \bar{x} = \frac{71 + 72 + 88 + 69 + 77 + 63 + 91 + 81 + 83 + 75}{10} = 77 \]

Calculating the Mean
Arithmetic mean: sum the observations and divide by the number of observations.
Example: 5%, 7%, -2%, 12%, 8%
\[ \bar{r} = \frac{\sum_{t=1}^{N} r_t}{N} = \frac{30\%}{5} = 6\% \]

Problem with the Arithmetic Mean
The arithmetic mean is incorrect for variables that are related multiplicatively, like rates of growth, rates of return and rates of change.
$1,000 at 6% for 5 years should be $1,338.23: \( (1000)(1.06)^5 = 1338.23 \)
Compounding the actual returns gives only $1,331.81:

  Starting Value   Rate of Return   Ending Value
     1000.00             5%           1050.00
     1050.00             7%           1123.50
     1123.50            -2%           1101.03
     1101.03            12%           1233.15
     1233.15             8%           1331.81

Geometric Mean
The Geometric Mean should be used for rates of change, like rates of return.
\[ \bar{r} = \left[ \prod_{t=1}^{N} (1 + r_t) \right]^{1/N} - 1 \]
\( \prod \) means: the product of these factors from 1 to N.
\[ \bar{r} = \left[ (1.05)(1.07)(0.98)(1.12)(1.08) \right]^{1/5} - 1 = (1.3318059)^{0.2} - 1 = 5.898\% \]

Geometric vs. Arithmetic Mean
The more variable the underlying data, the greater the error from using the Arithmetic mean.
The Geometric Mean is often easier to calculate:
Stock prices: 1992: $20; 1999: $40, so over 7 years \( R = (40/20)^{1/7} - 1 = 10.41\% \)

Geometric vs. Arithmetic Mean
For analysis of past performance, use the Geometric mean.
The past returns have averaged 5.898%.
To use the past returns to estimate the future expected return, use the Arithmetic mean.
The expected return is 6%.

Median and Mode
Median: Midpoint
If odd number of observations: the middle observation
If even number of observations: the average of the middle 2 observations
Mode: Most frequent

Example
Our CSC mark data was (sorted): 63, 69, 71, 72, 75, 77, 81, 83, 88, 91
The median is 76 (the average of 75 and 77).
There is no mode.

Example

     x     Deviation
    71        -6
    72        -5
    88        11
    69        -8
    77         0
    63       -14
    91        14
    81         4
    83         6
    75        -2
  Mean 77   Mean 0

The Deviation is the difference between each observation and the mean.
The sign indicates whether the observation is above (+) or below (-) the mean.
The average deviation is always zero.
If it isn't, you must have made a mistake!

Measures of Dispersion
So far, we have looked at measures of central tendency.
What about measuring the tendency of the data to vary from the centre?
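
The central-tendency examples above (the CSC scores and the five rates of return) can be checked with a few lines of code. What follows is a minimal sketch in Python, which the slides themselves do not use (they rely on Excel); all variable names are illustrative assumptions.

```python
import math
import statistics

# CSC exam scores from the Example slide.
scores = [71, 72, 88, 69, 77, 63, 91, 81, 83, 75]

mean_score = statistics.mean(scores)       # arithmetic mean
median_score = statistics.median(scores)   # average of the two middle values
deviations = [x - mean_score for x in scores]

# Rates of return from the Calculating the Mean slide.
returns = [0.05, 0.07, -0.02, 0.12, 0.08]

arithmetic_return = sum(returns) / len(returns)
growth_factor = math.prod(1 + r for r in returns)   # product of (1 + r_t)
geometric_return = growth_factor ** (1 / len(returns)) - 1

# Grow $1,000 for five periods at each constant rate.
end_at_arithmetic = 1000 * (1 + arithmetic_return) ** len(returns)
end_at_geometric = 1000 * (1 + geometric_return) ** len(returns)

print(f"mean score        : {mean_score}")
print(f"median score      : {median_score}")
print(f"sum of deviations : {sum(deviations)}")
print(f"arithmetic return : {arithmetic_return:.3%}")
print(f"geometric return  : {geometric_return:.3%}")
print(f"$1,000 at arithmetic rate: {end_at_arithmetic:.2f}")
print(f"$1,000 at geometric rate : {end_at_geometric:.2f}")
```

Only the geometric rate reproduces the ending value in the compounding table; the arithmetic rate overstates it.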
Measures of Dispersion
Range: Highest - Lowest
Variance
Standard Deviation

Example

     x     Deviation    |D|     D²
    71        -6          6      36
    72        -5          5      25
    88        11         11     121
    69        -8          8      64
    77         0          0       0
    63       -14         14     196
    91        14         14     196
    81         4          4      16
    83         6          6      36
    75        -2          2       4
  Mean 77   Mean 0     Mean 7  Sum 694

The range is 91 - 63 = 28.
The range can be extremely sensitive to outlier observations.
Suppose one of these students had a very bad day and scored 8. The range would now be 91 - 8 = 83.

Mean Absolute Deviation
The Mean Absolute Deviation (the mean of the |D| column, 7 in this example) is a measure of average dispersion that is not used very much.
It has some undesirable mathematical properties beyond the level of this course.

Mean Squared Deviation
The Mean Squared Deviation is very commonly used.
The MSD in this example is 694/10 = 69.4.
The more common name of the MSD is the VARIANCE.

Variance
Variance measures the amount of dispersion from the mean.
For Populations: \( \sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N} \)
For Samples: \( \hat{\sigma}^2 = s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} \)

Standard Deviation
Standard Deviation measures the amount of dispersion from the mean.
For Populations: \( \sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}} \)
For Samples: \( \hat{\sigma} = s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}} \)

Standard Deviation Example
Using the previous example.
The data is sample data.
\[ \hat{\sigma} = s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{694}{10 - 1}} = 8.78 \]

Interpreting the Std. Dev.
You have heard of the Bell Shaped or Normal Distribution.
The properties of the Normal Distribution are well known and give us the EMPIRICAL RULE.

Normal Distribution
[Figure: normal curve. About 2/3 of the area lies within 1 standard deviation of the mean, 95% within 2 (2.5% in each tail) and 99.7% within 3 (0.15% in each tail). Horizontal axis: Z Value, Standard Deviations from Mean, -4 to 4.]

Empirical Rule
For approximately Normally Distributed data:
Within \( 1\sigma \) of the mean: approx. 2/3
Within \( 2\sigma \) of the mean: approx. 95% (19/20)
Within \( 3\sigma \) of the mean: virtually all

Quartiles, Percentiles, etc.
The Median splits the data in half.
Quartiles split the data into quarters.
Deciles split the data into tenths.
Percentiles split the data into one-hundredths.

Rank Measures
"That was a top-half performance"
"WTG Special fund has been a top quartile performer for the past 5 years"
"Our programme accepts only students proven to be top decile performers"
"I was in the 92nd percentile on the GMAT"

Using Excel
Full Descriptive Statistics: Tools > Data Analysis > Descriptive Statistics

Measures of Association

Bivariate Statistics
So far, we have been dealing with statistics of individual variables.
We also have statistics that relate pairs of variables.

Interactions
Sometimes two variables appear related:
smoking and lung cancers
height and weight
years of education and income
engine size and gas mileage
GMAT scores and MBA GPA
house size and price

Interactions
Some of these variables would appear to be positively related and others negatively.
If these were related, we would expect to be able to derive a linear relationship:
y = a + bx
where b is the slope and a is the intercept.

Linear Relationships
We will be deriving linear relationships from bivariate (two-variable) data.
Our symbols will be:
\( y = \beta_0 + \beta_1 x + \varepsilon \) or \( \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \)
\( \hat{\beta}_1 \) = Slope
\( \hat{\beta}_0 \) = Intercept
\( \varepsilon \) = Error term

Example
Consider the following example comparing the returns of Consolidated Moose Pasture stock (CMP) and the TSE 300 Index.
The next slide shows 25 monthly returns.

Example Data

   TSE   CMP     TSE   CMP     TSE   CMP
    x     y       x     y       x     y
    3     4      -4    -3       2     4
   -1    -2      -1     0      -1     1
    2    -2       0    -2       4     3
    4     2       1     0      -2    -1
    5     3       0     0       1     2
   -3    -5      -3     1      -3    -4
   -5    -2      -3    -2       2     1
    1     2       1     3      -2    -2
    2    -1

Example
From the data, it appears that a positive relationship may exist.
Most of the time when the TSE is up, CMP is up.
Likewise, when the TSE is down, CMP is down most of the time.
Sometimes, they move in opposite directions.
Let's graph this data.

Graph of Data
[Figure: scatter plot of CMP (vertical axis) against TSE (horizontal axis), both axes running from -6 to 6.]

Example Summary Statistics
The data do appear to be positively related.
Let's derive some summary statistics about these data:

          Mean     s²      s
   TSE    0.00    7.25    2.69
   CMP    0.00    6.25    2.50

Observations
Both have means of zero and standard deviations just under 3.
However, each data point does not have simply one deviation from the mean; it deviates from both means.
Consider Points A, B, C and D on the next graph.

Graph of Data
[Figure: the same scatter plot of CMP against TSE, with four points labelled A, B, C and D.]

Implications
When points in the upper right and lower left quadrants dominate, the sum of the products of the deviations will be positive.
When points in the lower right and upper left quadrants dominate, the sum of the products of the deviations will be negative.

An Important Observation
The sum of the products of the deviations will give us the appropriate sign of the slope of our relationship.

Covariance
(Showing the formula only to demonstrate a concept)
Population: \( COV(X, Y) = \sigma_{XY} = \dfrac{\sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)}{N} \)
Sample: \( cov(X, Y) = s_{XY} = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \dfrac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{n - 1} \)

Covariance
[Figure: the scatter plot of CMP against TSE with points A, B, C and D, repeated.]

Covariance
In the same units as Variance (if both variables are in the same unit), i.e. units squared.
A very important element of measuring portfolio risk in finance.

Covariance in Excel
Tools > Data Analysis > Covariance

              Column 1   Column 2
   Column 1     7.25
   Column 2     4.875      6.25

Interpreting the Result
This gives us the variances (7.25 and 6.25) and the covariance between the variables, 4.875.
In fact, variance is simply the covariance of a variable with itself!

Using Covariance
Very useful in Finance for measuring portfolio risk.
Unfortunately, it is hard to interpret for two reasons:
What does the magnitude/size imply?
The units are confusing.

A More Useful Statistic
We can simultaneously adjust for both of these shortcomings by dividing the covariance by the two relevant standard deviations.
This operation:
Removes the impact of size and scale
Eliminates the units

Correlation
Correlation measures the sensitivity of one variable to another, but ignoring magnitude.
Range: -1 to 1
+1: Implies perfect positive co-movement
-1: Implies perfect negative co-movement
0: No relationship

Calculating Correlation
Population: \( \rho_{XY} = \dfrac{COV(X, Y)}{\sigma_X \sigma_Y} \)
Sample: \( r_{XY} = \hat{\rho}_{XY} = \dfrac{cov(X, Y)}{s_X s_Y} \)

Correlation in Excel
Tools > Data Analysis > Correlation

              Column 1    Column 2
   Column 1   1
   Column 2   0.724212    1

Interpreting the Result
The correlation of a variable with itself is 1.
The correlation between CMP and the TSE Index in this example is 0.724.
This is positive, and relatively strong.

Estimating Linear Relationships
Often the data imply that a linear relationship exists.
We can estimate this relationship using the Least Squares Method of Regression.
We will just learn to use the Excel output and interpret it.

TSE-CMP Regression Output (Abridged)
SUMMARY OUTPUT

   Regression Statistics
   Multiple R           0.724211819
   R Square             0.524482759
   Adjusted R Square    0.503808096
   Standard Error       1.76102226
   Observations         25

                  Coefficients   Standard Error   t Stat     P-value
   Intercept      0              0.352204452      0          1
   X Variable 1   0.672413793    0.133502753      5.036704   4.26E-05

Interpreting the Output
rCMP = 0 + 0.6724(rTSE) + e
Multiple R (0.724211819) is the Correlation Coefficient.
The Intercept coefficient (0) is the Intercept.
The X Variable 1 coefficient (0.672413793) is the Slope.

Where We Are Going
We will develop the use of the regression technique more fully:
Multiple explanatory variables
Some Time-Series Applications
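
The covariance, correlation and regression figures quoted above can be reproduced directly from the 25 TSE/CMP return pairs. The sketch below is in Python rather than the Excel tools the slides use, and every function and variable name is an illustrative assumption. It uses the sample (n - 1) divisor, which is what matches the quoted 7.25, 6.25 and 4.875, and it takes the slope as covariance divided by the variance of x, the standard least-squares result, which the slides do not derive.

```python
import math

# (TSE, CMP) monthly returns in percent, transcribed from the Example Data slide.
pairs = [
    (3, 4), (-4, -3), (2, 4),
    (-1, -2), (-1, 0), (-1, 1),
    (2, -2), (0, -2), (4, 3),
    (4, 2), (1, 0), (-2, -1),
    (5, 3), (0, 0), (1, 2),
    (-3, -5), (-3, 1), (-3, -4),
    (-5, -2), (-3, -2), (2, 1),
    (1, 2), (1, 3), (-2, -2),
    (2, -1),
]
tse = [x for x, _ in pairs]
cmp_returns = [y for _, y in pairs]


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sum of the products of the deviations, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    products = ((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


# Variance is the covariance of a variable with itself.
var_tse = sample_cov(tse, tse)
var_cmp = sample_cov(cmp_returns, cmp_returns)
cov_xy = sample_cov(tse, cmp_returns)

# Correlation: covariance divided by the two standard deviations.
corr = cov_xy / (math.sqrt(var_tse) * math.sqrt(var_cmp))

# Least-squares line for CMP on TSE (assumed standard formulas).
slope = cov_xy / var_tse
intercept = mean(cmp_returns) - slope * mean(tse)

print(f"var(TSE) = {var_tse}, var(CMP) = {var_cmp}, cov = {cov_xy}")
print(f"correlation = {corr:.6f}")
print(f"rCMP = {intercept:.4f} + {slope:.4f} * rTSE")
```

Squaring the correlation gives the R Square value in the regression output, since there is only one explanatory variable.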