Quantitative Methods for Economics, Finance and Management (A86050 – F86050)
Matteo Manera – [email protected]
Marzio Galeotti – [email protected]

This material is taken and adapted from Guy Judge’s Introduction to Econometrics page at the University of Portsmouth Business School: http://judgeg.myweb.port.ac.uk/ECONMET/

Statistical Review
• Statistics are tools: Statistics + Economics = Econometrics
• Descriptive statistics: methods used to summarize or describe observations
– Measures of central tendency
– Measures of variability
– Measures of association
• Inferential statistics
– Random variables and theoretical distributions
– Estimation
– Hypothesis testing

Measures of Central Tendency
• Mean, µ: the average (the balancing point of the distribution)
• Median, Md: the value of the middle point of the ordered measurements
• Mode, Mo: the most frequent value

Measures of Variability
• Knowing the variability tells us how typical the mean is of the scores as a set
• If the variability is small, the mean is a good representation of the scores
• Measures of variability are statistics that convey information about the spread or variability of a set of data
• Range: high score minus low score
• Variance: the average of the squared deviations of all the population measurements from the population mean
• Standard deviation: the square root of the variance

Skewness
• Skewed distributions are not symmetrical about their center; they are lopsided, with a longer tail on one side or the other
• The skew is the side with the longer tail
[Figure: left/negative skewed, symmetric, and right/positive skewed distributions, showing the relationships among mean, median and mode]

Correlation Coefficient
• A single number indicating the strength of association between 2 variables.
– To what extent does the association resemble a straight line?
• Pearson’s product-moment coefficient r
– Most commonly used measure of the degree of linear relationship between two variables
– Ranges from +1 to –1
– Coefficients reveal both the magnitude and the direction of the relationship
[Figure: Patterns for Linear Correlation; panels plot Grades against Hours Studying, Hours Watching TV, and Hours Gardening]
Copyright © 2000 - 2009 by Michael J. Miller. All rights reserved.

Problems with r
• Does not, by itself, demonstrate causation
– Causation implies correlation, but correlation does not imply causation
• Third (‘lurking’) variable problem
– Often difficult to uncover
• Directionality problem
– Does Variable A cause a change in Variable B, or does Variable B cause a change in Variable A?
• Unstable with small sample sizes

Inferential Statistics
• Random variables and theoretical distributions
• Estimation
• Hypothesis testing

what is a distribution?
• describes the ‘shape’ of a batch of numbers
• the characteristics of a distribution can sometimes be defined using a small number of numeric descriptors called ‘parameters’

why?
• can serve as a basis for standardized comparison of empirical distributions
• can help us estimate confidence intervals for inferential statistics
• form a basis for more advanced statistical methods
– ‘fit’ between observed distributions and certain theoretical distributions is an assumption of many statistical procedures

Normal (Gaussian) distribution
• “known” distributions (discrete vs continuous)
• “known” distributions (Student’s t; chi-square; Fisher’s F)
• Normal: continuous distribution
• Normal: tails stretch infinitely in both directions
[Figure: normal curve with µ and σ marked]
• symmetric around the mean (µ)
• maximum height at µ
• the points of inflection lie one standard deviation (σ) on either side of µ
• a single normal curve exists for any combination of µ, σ
– these are the parameters of the distribution and define it completely
• a family of bell-shaped curves can be defined for the same combination of µ, σ, but only one is the normal curve
• lots of natural phenomena in the real world approximate normal distributions, near enough that we can make use of it as a model
• e.g. height
• phenomena that emerge from a large number of uncorrelated, random events will usually approximate a normal distribution
• standard probability intervals (proportions under the curve) are defined by multiples of the standard deviation around the mean
• true of all normal curves, no matter what µ or σ happens to be
• $P(\mu - \sigma \le x \le \mu + \sigma) = .683$
• µ ± 1σ = .683
• µ ± 2σ = .955
• µ ± 3σ = .997
• 50% = µ ± 0.67σ
• 95% = µ ± 1.96σ
• 99% = µ ± 2.58σ
• the logic works backwards
• if the proportion of values within µ ± σ differs markedly from .68, the distribution is not normal

z-scores (Standardization)
• standardizing values by re-expressing them in units of the standard deviation
• measured away from the mean (where the mean is adjusted to equal 0)
$$Z_i = \frac{x_i - \bar{x}}{s}$$
• z-scores = “standard normal deviates”
• converting number sets from a normal distribution to z-scores:
– presents data in a standard form that can be easily compared to other distributions
– mean = 0
– standard deviation = 1
• z-scores often summarized in table form as a CDF (cumulative distribution function)
• can use in various ways, including determining how different proportions of a batch are distributed “under the curve”

Neanderthal stature
• population of Neanderthal skeletons
• stature estimates appear to follow an approximately normal distribution…
– mean = 163.7 cm
– sd = 5.79 cm

Quest. 1: what proportion of the population is >165 cm?
• z-score = (165 − 163.7)/5.79 ≈ .22, taken as .23 (+) for the table lookup
• using Table C-2
– upper-tail area for z = .23 is .40905
– 40.9%

Quest. 2: 98% of the population fall below what height?
• cdf(x) = .98
• can use either table
– Table C-1; look for .98
– Table C-2; look for .02
– both give you a value of 2.05 for z
• solve z-score formula for x: $x_i = Z_i \sigma + \bar{x}$
• x = 2.05 × 5.79 + 163.7 = 175.6 cm

“sampling distribution of the mean”
• we don’t know the shape of the distribution of an underlying population
• it may not be normal
• we can still make use of some properties of the normal distribution
• envision the distribution of means associated with a large number of samples…

central limit theorem
• the distribution of means derived from sets of random samples taken from any population will tend toward normality
• conformity to a normal distribution increases with the size of samples
• these means will be distributed around the mean of the population: $\mu_{\bar{x}} = \mu$
• we usually have one of these samples…
• we can’t know where it falls relative to the population mean, but we can estimate odds about how far it is likely to be…
• this depends on
– sample size
– an estimate of the population variance
• the smaller the sample and the more dispersed the population, the more likely that our sample is far from the population mean
• this is reflected in the equation used to calculate the variance of sample means: $s_{\bar{x}}^2 = \frac{\sigma^2}{n}$
• the standard deviation of sample means is the standard error of the estimate of the mean: $se = \sqrt{\frac{\sigma^2}{n}} = \sigma\sqrt{\frac{1}{n}} = \frac{\sigma}{\sqrt{n}}$
• you can use the standard error to calculate a range that contains the population mean, at a particular probability, and based on a specific sample: $\bar{x} \pm Z_\alpha \frac{s}{\sqrt{n}}$ (where Z might be 1.96 for .95 probability, for example)

example
• 50 arrow points
– mean length = 22.6 mm
– sd = 4.2 mm
• standard error: $s_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4.2}{\sqrt{50}} = .594$
• 22.6 ± 1.96 × .594
• 22.6 ± 1.16
• 95% probability that the population mean is within the range 21.4 to 23.8
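
The Neanderthal stature questions and the arrow-point interval can also be checked numerically instead of with printed tables. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the variable names are illustrative and do not come from the slides. Because it works with unrounded z-values, its results may differ slightly from the table-based figures.

```python
from math import sqrt
from statistics import NormalDist

# Neanderthal stature example: mean 163.7 cm, sd 5.79 cm (values from the slides).
stature = NormalDist(mu=163.7, sigma=5.79)

# Quest. 1: proportion of the population taller than 165 cm.
# The z-score re-expresses 165 cm in standard-deviation units.
z_165 = (165 - stature.mean) / stature.stdev
# Upper-tail area = 1 - CDF, the quantity an upper-tail table reports.
prop_above_165 = 1 - stature.cdf(165)

# Quest. 2: the height below which 98% of the population fall.
# inv_cdf solves cdf(x) = 0.98 for x, i.e. x = z * sigma + mean.
height_98 = stature.inv_cdf(0.98)

# Arrow-point example: n = 50, mean length 22.6 mm, sd 4.2 mm.
n, mean_len, sd_len = 50, 22.6, 4.2
std_error = sd_len / sqrt(n)          # standard error of the mean
z_95 = NormalDist().inv_cdf(0.975)    # two-sided 95% critical value (about 1.96)
ci_low = mean_len - z_95 * std_error
ci_high = mean_len + z_95 * std_error

print(f"z-score for 165 cm:        {z_165:.3f}")
print(f"proportion above 165 cm:   {prop_above_165:.3f}")
print(f"98th percentile of height: {height_98:.1f} cm")
print(f"standard error of mean:    {std_error:.3f} mm")
print(f"95% interval for the mean: {ci_low:.1f} to {ci_high:.1f} mm")
```

With the unrounded z-score, the proportion above 165 cm comes out near 41.1% rather than the 40.9% read from the table at z = .23; the other results agree with the slide values after rounding.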