Week 1: Review of basic concepts in statistics
Handout available at http://homepages.gold.ac.uk/aphome
Trevor Thompson, 30-9-2007

Review of the following topics:
- Population vs. sample
- Measurement scales
- Plotting data
- Mean & standard deviation
- Degrees of freedom
- Transforming data
- Normal distribution
Reading: Howell (2002), ‘Statistical Methods for Psychology’, Chap 1-3.

Population vs. sample
Population: an entire collection of measurements (e.g. reaction times, IQ scores, height, or even the height of male Goldsmiths students).
Sample: a smaller subset of observations taken from the population. The sample should be drawn randomly if we want to make inferences about the population. Random assignment to groups improves validity.

In general, population parameters are written with Greek letters and sample statistics with English letters:
- Population: mean μ (mu), variance σ² (sigma squared)
- Sample: mean $\bar{X}$ (X-bar), variance s²
It is worth learning a glossary of other symbols now to avoid later confusion (e.g. Σ = ‘the sum of’).

Measurement scales
- Categorical or ‘Nominal’: e.g. male/female, or catholic/protestant/other
- Continuous:
  - Ordinal: e.g. private/sergeant/admiral
  - Interval: e.g. temperature in Celsius
  - Ratio: e.g. weight, height etc.

Plotting data
The basic rule is to select the plot which represents what you want to say in the clearest and simplest way. Avoid ‘chart junk’ (e.g. plotting in 3D where 2D would be clearer). Popular options include bar charts, histograms, pie charts etc. (see any text book). SPSS charts are discussed in the workshop.

Summary statistics
Two essential components of data are (i) the central tendency of the data and (ii) the spread of the data (e.g. standard deviation). Although the mean (central tendency) and standard deviation (spread) are most commonly used, other measures can also be useful.

Measures of central tendency
- Mode: the most frequent observation, e.g. 2 in the dataset 1, 2, 2, 3, 4, 5.
- Median: the middle number of a dataset arranged in numerical order, e.g. 2 in the dataset 0, 1, 2, 5, 1000 (the average of the middle two numbers when there is an even number of scores). It is relatively uninfluenced by outliers.
- Mean: the sum of the scores divided by the number of scores, $\bar{X} = \frac{\sum X}{N}$.

Measures of dispersion
There are several ways to measure the spread of data:
- Range (max - min)
- IQR or Inter-Quartile Range (the range of the middle 50% of scores)
- Average Deviation, Mean Absolute Deviation
- Variance: the average of the squared deviations from the mean. The variance for a population of 3 scores (-10, 0, 10) is 66.67 (200/3).
- Standard deviation: simply the square root of the variance (about 8.16 for the 3 scores above).

Calculating sample variance
The population variance (σ²) is the true variance of the population, calculated by
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
This equation is used when we have all values in a population (unusual).
However, if we apply the same calculation to a sample (dividing by N, with deviations taken from the sample mean), the result tends to be smaller than the variance of the population from which the sample was drawn. So for the sample variance (s²) we use this equation:
$s^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
The correction factor of ‘N-1’ increases the variance to be closer to the true population variance (in fact, the average of all possible sample variances exactly equals σ²).

Degrees of freedom
Why is ‘N-1’ used to calculate sample variance?
When calculating the sample variance, we first calculate the sample mean, which makes the last number in the dataset redundant, i.e. we lose a ‘degree of freedom’ (the last number is not free to vary).
E.g. M=10, sample data: 12, 9, 10, 11, 8. Once we know that the mean is 10 and that the first four numbers are 12, 9, 10 and 11, the last number has to be 8. Calculating the sample mean (10) means that we have already (implicitly) included the last number in our calculations.
If we knew and used the population mean rather than the sample mean, this would not be the case, so we could use N rather than N-1. Howell illustrates this with a worked example (and a mathematical proof can be retrieved with an internet search).
The bottom line is that whenever we have to estimate a statistic (e.g. the mean) from the sample, we lose a degree of freedom.

Transforming data
One reason we might ‘transform’ data is to convert from one scale to another, e.g. feet into inches, centigrade into Fahrenheit, raw IQ scores into standard IQ scores.
Scale conversion can usually be achieved by a simple linear transformation (multiplying/dividing by a constant and adding/subtracting a constant):
Xnew = b*Xold + c
So to convert centigrade data into Fahrenheit we would apply the following:
Fahrenheit = 1.8*Centigrade + 32 (i.e. b = 1.8 and c = 32)

The Z-transform (standardisation) is one common type of linear transform, which produces a new variable with M=0 & SD=1:
$z = \frac{X - M}{SD}$
Standardisation is useful when comparing the same dimension measured on different scales (e.g. anxiety scores measured on a VAS and on a questionnaire). After standardisation these scales could also be added together (adding two quantities on different scales is obviously problematic).

Normal Distribution
Many real-life variables (height, weight, IQ etc.) follow a symmetrical, bell-shaped distribution. A mathematical equation mimics this normal (or Gaussian) distribution.
The mathematical normal distribution is useful because its known mathematical properties give us useful information about our real-life variable (assuming our real-life variable is normally distributed).
For example, scores about 2 standard deviations (more precisely, 1.96) or more above the mean represent the extreme 2.5% of scores (calculus is used to derive this). Consequently, a person with an IQ score of 130 (M=100, SD=15) would be in roughly the top 2.5% (assuming IQ is normally distributed).

Normality is an important assumption (though more about this next week). Violations of normality generally take two forms:
- Skewness (the distribution is asymmetrical)
- Kurtosis (the tails are heavier or lighter than those of a normal distribution)
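
The calculations covered in this handout (variance with N vs. N-1, linear and z-transforms, and the tail area of the normal distribution) can be sketched in Python. This is a minimal illustration rather than part of the original handout: it assumes Python 3.8+ and uses only the standard-library `statistics` module. The helper names `linear_transform` and `z_scores` are hypothetical, and dividing by the sample SD (N-1) when standardising is an assumption.

```python
from statistics import NormalDist, mean, pvariance, stdev, variance

# Population of 3 scores from the handout: sum of squared deviations is 200.
population = [-10, 0, 10]
pop_var = pvariance(population)        # divides by N: 200/3

# Sample from the degrees-of-freedom example (M = 10).
sample = [12, 9, 10, 11, 8]
sample_mean = mean(sample)
var_divide_by_n = pvariance(sample)    # divides by N: 10/5
var_divide_by_n_minus_1 = variance(sample)  # divides by N-1: 10/4


def linear_transform(xs, b, c):
    """Apply Xnew = b*Xold + c to every score (hypothetical helper)."""
    return [b * x + c for x in xs]


def z_scores(xs):
    """Standardise scores so that M = 0 and SD = 1.

    Assumption: the sample SD (N-1 in the denominator) is used.
    """
    m = mean(xs)
    sd = stdev(xs)
    return [(x - m) / sd for x in xs]


# Centigrade to Fahrenheit with b = 1.8 and c = 32.
fahrenheit = linear_transform([0, 37, 100], b=1.8, c=32)

# Standardised version of the sample.
standardised = z_scores(sample)

# IQ example: proportion of scores above 130 when M = 100, SD = 15.
iq = NormalDist(mu=100, sigma=15)
proportion_above_130 = 1 - iq.cdf(130)   # about 0.023
cutoff_top_2_5_percent = iq.inv_cdf(0.975)  # about 1.96 SDs above the mean

print(f"population variance: {pop_var:.2f}")
print(f"sample variance (N): {var_divide_by_n}, (N-1): {var_divide_by_n_minus_1}")
print(f"proportion of IQ scores above 130: {proportion_above_130:.4f}")
```

Dividing by N-1 gives the larger of the two sample variances, as the handout describes, and the exact tail area above an IQ of 130 comes out slightly below the rounded 2.5% figure because 130 is exactly 2 (not 1.96) standard deviations above the mean.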