Class 9
Where are we going from here? Introduction to Measures of Association and Multivariate Analysis

The road map
1. You will become familiar with the terms and concepts of measuring association and testing causality.
2. You will carry out and interpret a basic Ordinary Least Squares regression equation.
3. You will cement what you learned in the first part of this class by completing your presentation and project.

Going Back to Central Tendency and the Distribution of Data
• Measures of Central Tendency – mean, median, and mode
• Skew (lack of symmetry)
• Kurtosis (peakedness)
• Outliers
• Variability
• Normal distribution

Normal Distribution
• Normal distributions are symmetric, with scores more concentrated in the middle than in the tails. They are defined by two parameters: the mean (μ) and the standard deviation (σ). Many kinds of behavioral data are approximated well by the normal distribution. Many statistical tests assume a normal distribution. Most of these tests work well even if the distribution is only approximately normal, and in many cases as long as it does not deviate greatly from normality.
• Source: http://davidmlane.com/hyperstat/normal_distribution.html

Measures of Dispersion
• Range - Sensitive to outliers.
  – Largest observation minus the smallest.
• IQR - Controls for outliers.
  – Distance from the top threshold of the lower quartile to the bottom threshold of the upper quartile of a distribution.
• Standard Deviation and Variance
Source: Statistics Canada www.statcan.ca/english/edu/power/ch12/plots.htm

[Figure: Chart 5: Effective average tax rates on FDI between OECD countries, 1996 and 2001(a)]

Standard Deviation
• A measure of the “typical” distance from the mean.
• It is the square root of the variance (somewhat circularly, the variance is the standard deviation squared).
• Variance:
$$S^2 = \frac{\sum (X_i - \bar{X})^2}{N}$$
• So therefore standard deviation is:
$$S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$$

How to “follow” the formula
1. Subtract the mean from the value of each individual observation to get “deviations” from the mean.
2. Square each deviation.
3. Sum all the squared deviations.
4. Divide by the number of observations (N). This is the variance.
   • Samples have slightly different rules (divide by N-1).
5. Take the square root, and this is the standard deviation.

How to interpret “S”
• If the means of two variables within a population are close but their standard deviations differ, then the one with the larger standard deviation is more dispersed.
• There are other deviation and dispersion measures, but Standard Deviation has a special relationship to the normal curve.

Standard Deviation and the Normal Distribution

Normalizing Using St. Dev.
• Z-scores (an index using St. Dev.)
$$Z = \frac{X_i - \bar{X}}{S}$$
• Z = number of standard deviation units from the mean = z-score

Univariate, Bivariate and Multivariate
Univariate
– (Analysis of) one variable’s distribution.
– Measures of central tendency, freq. dist., st. dev., etc.
Bivariate and Multivariate
– (Analysis of) two or more variables.
• What is the relationship between variables?
  – Are they statistically different from one another?
  – Do they change together?
  – In the same direction?
  – Does one precede the other?
  – Does one cause the other?
• Two questions to ask about the relationship:
  1. What is the probability that a particular finding happened by chance? - Statistical significance
  2. How strong is the relationship between two (or more) variables? - Measures of association, correlation coefficient, regression coefficient, etc.

Measures of Association
• Chi-square
• T-test
• Correlation Coefficient
  – Also known as Pearson’s r, r, or the zero-order correlation coefficient.
  – Varies from 1.0, indicating the variables move perfectly together in the same direction, to -1.0, indicating they move perfectly in opposite directions.
  – 0.00 indicates no linear relationship.
  – For social sciences, .4 to .6 is considered sufficient, though as low as .3 may be worth looking at.
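The five steps for “following” the formula, the z-score, and Pearson’s r can be sketched in a few lines of Python. This is only an illustration: the function names and the two data series are made up for the example, and the population (divide by N) rule is used throughout. The r calculation relies on the standard identity that, with population standard deviations, r is the average product of paired z-scores.

```python
import math

def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def population_variance(values):
    """Steps 1-4: deviations, squared, summed, divided by N."""
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sum(squared_deviations) / len(values)

def population_sd(values):
    """Step 5: square root of the variance."""
    return math.sqrt(population_variance(values))

def z_scores(values):
    """Number of standard deviation units each observation lies from the mean."""
    m = mean(values)
    s = population_sd(values)
    return [(x - m) / s for x in values]

def pearson_r(xs, ys):
    """Pearson's r as the average product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical data, invented for this example only.
x = [2, 4, 4, 4, 5, 5, 7, 9]
y_same = [2 * v + 1 for v in x]       # moves perfectly with x
y_opposite = [20 - 3 * v for v in x]  # moves perfectly against x

variance = population_variance(x)     # 4.0
sd = population_sd(x)                 # 2.0
z = z_scores(x)                       # first value is -1.5
r_same = pearson_r(x, y_same)         # approximately 1.0
r_opposite = pearson_r(x, y_opposite) # approximately -1.0
```

For a sample rather than a population, the divisor in the variance would be N-1 instead of N, as noted in step 4.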

Statistical Significance
• Research and null hypotheses
  – The research hypothesis states the relationship between two variables.
  – The null hypothesis states that there is NO (or only a random) relationship between two variables.
    • H: Democracies trade more with each other than with non-democracies.
    • H0: Status as a democracy is not related to trade volume.
  – You are testing to reject H0, not to accept H.

Types of Error
                              State of Nature
Decision based on sample      H0 true                        H0 untrue
Reject H0                     Type 1 error (false alarm)     Correct
Do not reject H0              Correct                        Type 2 error

Alpha level = .05: if H0 is true, there is a 5% chance of rejecting it anyway (a Type 1 error) and a 95% chance of correctly not rejecting it.

Causality
• In establishing causality there is a dependent variable, which you are trying to explain, and one or more independent variables that are assumed to be factors in the variation of the dependent variable.
• You need a logical model to “explain” this relationship or causality.

Thinking in Models (again)
• What is a model?
  – Explains which elements relate to each other and how.
  – Describing relationships in a model
    • Covariation: the variables change together
      – Direct or positive (same direction)
      – Inverse or negative (opposite directions)
      – Nonlinear
    • False or spurious
      – Control (confounding) variables
• Are you looking for the best model or testing someone else’s?

Developing models
• Where does a model come from?
  – From your own assessment and observation of the problem, or from talking to others.
  – From the literature.
    • Elements others include or consider important
    • Definitions of these elements
    • Descriptions of the “expected” relationships among variables
    • Results and explanations
    • Sources and strategies for data
    • Suggestions of models or variations to be tested in the future

Types of Models
1. Schematic
   Capital → Econ Growth ← Labor
2. Symbolic
   a) Economic growth is a function of changes to the amount of capital (K) and changes to the amount of labor (L).
   b) G = f(K, L)

The basic linear model (equation)
You can express many relationships as the linear equation y = a + bx, where
• y is the dependent variable
• x is the independent variable
• a is a constant (the intercept)
• b is the slope of the line
• For every increase of 1 in x, y changes by an amount equal to b.
A perfectly linear relationship is one in which each unit change in x produces exactly the same change in y, e.g. a strict ad valorem tariff.
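
As a rough sketch of how a and b in y = a + bx can be estimated from data, the Python below uses the standard textbook least-squares formulas (they are not taken from these slides). The function name `fit_line`, the 10% tariff rate, and the import values are all assumptions made up for the example. Because a strict ad valorem tariff is perfectly linear, the fit should recover a slope equal to the rate and a constant of zero.

```python
def fit_line(xs, ys):
    """Estimate a and b in y = a + b*x by ordinary least squares.

    b = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    a = mean_y - b * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: a 10% ad valorem tariff applied to import values.
TARIFF_RATE = 0.10  # assumed rate, for illustration only
import_values = [100, 200, 300, 400, 500]         # x, the independent variable
duties = [TARIFF_RATE * v for v in import_values]  # y, the dependent variable

a, b = fit_line(import_values, duties)

# Predicted duty on a new import value using the fitted line.
predicted_duty = a + b * 250
```

Here b comes out at about 0.10 and a at about 0: every extra unit of import value adds the same amount of duty. With real data the points rarely fall exactly on a line, and the same formulas then give the line that minimizes the sum of squared vertical distances from the points.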