Central tendency, dispersion
Tools for Science

- Observation
- Measurement
- Hypothesis generation
- Hypothesis testing

Must know “what” something is before you can ask questions about “how” or “why” something happens.

Descriptive Statistics

Measures of Central Tendency
Describing the middle of a distribution

Arithmetic mean

$\bar{X}$ = sample mean = ‘X bar’

$\bar{X} = \frac{\sum X_i}{n}$ = (sum of values of all observations) / (total number of observations)

The sample mean provides an estimate of the population mean, $\mu$.

Excel demo

Median = the middle measurement in a ranked list of values; half the values are below the median and half are above (same as the mean for a symmetric distribution).

Mode = the most frequent value (most fashionable).

How measures of central tendency relate to each other

[Figure: frequency distributions of the IQ of college professors (x-axis: IQ of college professors; y-axis: frequency of occurrence), with the mean, median, and mode marked. Symmetric panels: unimodal, bimodal. Skewed panels: positive, negative.]

Geometric mean

The geometric mean is used when quantities multiply together to produce a product.

Ex. An investment earns 10% in year 1, 50% in year 2, and 30% in year 3. What is its average rate of return?

Not the arithmetic mean, because in year 1 your investment was multiplied by (not added to) 1.10, and so on.

Ask: "By what constant factor would your investment need to be multiplied each year in order to achieve the same effect as multiplying by 1.10 one year, 1.50 the next, and 1.30 the third?"

The answer is the geometric mean.
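The question posed above can be checked numerically. The following is a minimal Python sketch, assuming the three yearly returns are written as growth factors (1.10, 1.50, 1.30); the variable names are illustrative and do not come from the slides. It computes the constant yearly factor as the cube root of the product, confirms that the antilog of the mean of the logs gives the same value, and compares both with the arithmetic mean.

```python
import math

# Yearly growth factors from the example: +10%, +50%, +30%.
factors = [1.10, 1.50, 1.30]
n = len(factors)

# Constant yearly factor with the same overall effect:
# the n-th root of the product of the factors.
geometric_mean = math.prod(factors) ** (1 / n)

# Same quantity obtained as the back-transformed (antilog) mean
# of the log-transformed factors.
log_mean = sum(math.log(f) for f in factors) / n
geometric_mean_via_logs = math.exp(log_mean)

# Arithmetic mean of the factors, for comparison.
arithmetic_mean = sum(factors) / n

print(f"geometric mean factor:  {geometric_mean:.4f}")
print(f"via log transform:      {geometric_mean_via_logs:.4f}")
print(f"arithmetic mean factor: {arithmetic_mean:.4f}")
```

Subtracting 1 from the geometric mean factor gives the average yearly rate of return.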
In this case the geometric mean = $\sqrt[3]{1.10 \times 1.50 \times 1.30} = \sqrt[3]{2.145} \approx 1.290$, for an average rate of return of ~29% (not the 30% that the arithmetic mean of 10%, 50%, and 30% would give you).

See the formula on p. 28 of Zar.
Adapted from: http://www.math.toronto.edu/mathnet/

Geometric mean & log-transformed variables

Geometric mean = back-transformed mean of a logarithmically transformed variable (more on transformation later).

- Log-transform the variable and then take the mean of the transformed values ($\bar{X}_t$)
- Take the antilog of $\bar{X}_t$
- This mean will not equal the simple mean of the original values

$GM_Y = \operatorname{antilog}\left(\frac{1}{n}\sum \log Y_i\right)$

Harmonic mean = reciprocal of the simple mean of reciprocals

Measures of Dispersion

Range = difference between the minimum and maximum values; a crude measure of dispersion.

[Figure: data sets A and B plotted on a common axis from 0 to 40.] A and B have the same mean and range, but different patterns of dispersion (variability).

Deviation from the mean = difference between each point and the mean (the deviations sum to zero).

Sum of squares (sample SS) = the squared deviations of each data point from the mean, added together:

$SS = \sum (X_i - \bar{X})^2$

For the above example:
(A) 225 + 1 + 0 + 0 + 0 + 1 + 225 = 452
(B) 225 + 100 + 25 + 0 + 25 + 100 + 225 = 700

Variance of a sample = mean sum of squares of the sample:

$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$

Dividing by $n - 1$ (the degrees of freedom) gives an unbiased estimate.

- If all observations are equal, then $s^2 = 0$
- Can never be negative

Excel demo

Standard deviation = positive square root of the variance of a sample:

$s = \sqrt{s^2}$

- Same units as the original data
- Does not vary with sample size

Excel demo

Coefficient of variation = standard deviation relative to the mean:

$CV = \frac{s}{\bar{X}}$

Population parameters: Greek letters
Statistic (sample estimate): Latin letters

Unbiased: does not over- or underestimate the true parameter value
Efficient: close to the true parameter value
Consistent: the estimate becomes better as sample size (N) increases

Coding data

Adding, subtracting, multiplying, or dividing ALL numbers by a constant value to simplify computations.

- Addition & subtraction affect the mean, but not SS, variance, or SD
- Multiplication & division affect the mean, and also SS, variance, and SD
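The dispersion measures and coding rules above can be sketched in Python. The raw values for A and B are an assumption: the slides give only the squared deviations, so a common mean of 20 is assumed here to reconstruct values consistent with them. The helper names (`sum_of_squares`, `describe`) and the coding constants (100 and 10) are likewise illustrative.

```python
import statistics

# Hypothetical data sets: only the squared deviations appear in the slides,
# so a common mean of 20 is ASSUMED to turn them back into raw values.
A = [5, 19, 20, 20, 20, 21, 35]
B = [5, 10, 15, 20, 25, 30, 35]


def sum_of_squares(values):
    """Sum of squared deviations from the mean (SS)."""
    mean = statistics.mean(values)
    return sum((x - mean) ** 2 for x in values)


def describe(values):
    """Return (SS, sample variance, SD, CV), using n - 1 degrees of freedom."""
    ss = sum_of_squares(values)
    variance = ss / (len(values) - 1)
    sd = variance ** 0.5
    cv = sd / statistics.mean(values)
    return ss, variance, sd, cv


for name, data in (("A", A), ("B", B)):
    ss, variance, sd, cv = describe(data)
    print(f"{name}: range={max(data) - min(data)} SS={ss} "
          f"s2={variance:.2f} s={sd:.2f} CV={cv:.3f}")

# Coding data: adding a constant shifts the mean but leaves SS, variance,
# and SD unchanged; multiplying by a positive constant c scales the mean
# and SD by c, and SS and variance by c**2.
shifted = [x + 100 for x in A]
scaled = [x * 10 for x in A]

for name, data in (("A + 100", shifted), ("A * 10", scaled)):
    ss, variance, sd, _ = describe(data)
    print(f"{name}: mean={statistics.mean(data)} SS={ss} "
          f"s2={variance:.2f} s={sd:.2f}")
```

With these assumed values, A and B share the same mean and range, yet B has the larger sum of squares, variance, and standard deviation.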