Descriptive Characteristics of Statistical Sets

Parameters - describe characteristic features of populations (they are exact, but we cannot calculate them for an unlimited number of individuals in the population; we can only estimate them from sample data)
- represented by Greek letters ($\mu$, $\sigma$, etc.)

Statistics - describe characteristic features of samples (we calculate them from the sample data and they serve as estimates of the exact population parameters)
- represented by Latin letters ($\bar{x}$, $s$, etc.)

Descriptive Characteristics
A) Measures of Central Tendency - describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability - describe the dispersion of values around the middle in a sample or population

A) Measures of Central Tendency (describe where the majority of measurements occur)

1) The Arithmetic Mean (Average, AVG): $\mu$ (population), $\bar{x}$ (sample) - "x bar"

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: A sample of 24 from a population of butterfly wing lengths:
$x_i$ (cm): 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.5

$$n = 24 \qquad \sum x_i = 95.4\ \text{cm} \qquad \bar{x} = \frac{\sum x_i}{n} = \frac{95.4\ \text{cm}}{24} = 3.975\ \text{cm} \approx 3.98\ \text{cm}$$

The Arithmetic Mean - Properties:
- is affected by extreme values, so it should be used only for homogeneous, regular (Gaussian) distributions (to describe the middle of the population correctly)
- has the same units of measurement as the individual observations
- $\sum (x_i - \bar{x}) = 0$ (the sum of all deviations from the mean is always 0)

2) The Median: $\tilde{\mu}$ (population), $\tilde{x}$ (sample) - "x tilde"
= the middle value in an ordered set of data (there are just as many values larger than the median as there are smaller)
- if the sample size (n) is odd, there is only 1 middle value in the ordered sample data, and it is the median (its rank is an integer)
- if n is even, there are two middle values, and the median is the midpoint (mean) between them (its rank is a half-integer)
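As a rough check on the definitions above, the following Python sketch computes the mean of the listed butterfly wing lengths, verifies that the deviations from the mean sum to zero (up to floating-point error), and applies the odd/even rule for the median. The function names `arithmetic_mean` and `median` are illustrative choices and are not part of the original notes.

```python
import math

# Butterfly wing lengths (cm), copied from the example above.
wing_lengths = [
    3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0,
    4.0, 4.0, 4.1, 4.1, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.5,
]


def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    return math.fsum(values) / len(values)


def median(values):
    """Middle value of the ordered data.

    For odd n there is a single middle value; for even n the median
    is the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


x_bar = arithmetic_mean(wing_lengths)
# The deviations from the mean should cancel out (sum close to 0).
deviation_sum = math.fsum(x - x_bar for x in wing_lengths)

print(f"n = {len(wing_lengths)}")
print(f"mean = {x_bar:.3f} cm")
print(f"sum of deviations = {deviation_sum:.1e}")
print(f"median = {median(wing_lengths)} cm")
```

With n = 24 (even), the median is the mean of the 12th and 13th ordered values.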
Rank of the median: $\frac{n+1}{2}$

Example: Body weights in two species of birds in captivity:

Species A
$x_i$ (g): 34, 36, 37, 39, 40, 41, 42, 43, 79
n = 9; median: $\tilde{x} = x_5 = 40$ g; mean: $\bar{x} = 43.4$ g

Species B
$x_i$ (g): 34, 36, 37, 39, 40, 41, 42, 43, 44, 45
n = 10; median: $\tilde{x} = x_{5.5} = \frac{40 + 41}{2} = 40.5$ g; mean: $\bar{x} = 40.1$ g

The Median - Properties:
- is not affected by extreme values
- is the 50% quantile (divides the distribution curve into 2 halves)
- may be used in irregular (asymmetric) distributions (where it characterizes the middle of the set better than the average)

[Figure: distribution curves divided by the median into two halves of 50% each]

3) The Mode: $\hat{\mu}$ (population), $\hat{x}$ (sample) - "x hat"
= the most frequently occurring measurement in a set of data (the top of the distribution curve)
Properties:
- is not affected by extremes
- is not a very exact measure of the middle of the set (not often used for biological and medical data)

B) Measures of Variability - spread (dispersion) of measurements around the center of the distribution

1) The Range: $R = x_{max} - x_{min}$
- depends on the 2 extreme values of the data
- is a relatively rough measure of variability: it does not take into account any measurements between the highest and lowest value

Variability expressed in terms of deviations from the mean:
As the sum of all deviations from the mean, $\sum (x_i - \bar{x})$, is always equal to 0, summing the deviations would be useless as a measure of variability. The signs of the deviations can be eliminated by squaring them. Then we can define the sum of squares:

$$\text{population: } SS = \sum (x_i - \mu)^2 \qquad \text{sample: } SS = \sum (x_i - \bar{x})^2$$

2) The Variance: $\sigma^2$ (population), $s^2$ (sample)
= the mean sum of squares about the mean

Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample variance ("estimated variance"):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The variance is expressed in the squared units of the original measurements.
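The bird example and the variance definition above can be sketched in Python as follows. This is a minimal illustration using the standard `statistics` module; the helper names `sum_of_squares`, `sample_variance` and `summarize` are hypothetical and introduced only for this sketch.

```python
import statistics

# Body weights (g) from the two-species example above.
species_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]
species_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]


def sum_of_squares(values):
    """Sum of squared deviations from the sample mean (SS)."""
    mean = statistics.mean(values)
    return sum((x - mean) ** 2 for x in values)


def sample_variance(values):
    """Estimated variance: SS divided by n - 1."""
    return sum_of_squares(values) / (len(values) - 1)


def summarize(values):
    """Collect the descriptive measures discussed so far."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": sample_variance(values),
    }


summary_a = summarize(species_a)
summary_b = summarize(species_b)

for name, summary in (("A", summary_a), ("B", summary_b)):
    print(
        f"Species {name}: n = {summary['n']}, "
        f"mean = {summary['mean']:.1f} g, "
        f"median = {summary['median']} g, "
        f"range = {summary['range']} g, "
        f"variance = {summary['variance']:.1f} g^2"
    )
```

The single extreme value (79 g) in Species A pulls the mean well above the median and inflates the range and variance, while the median stays at the middle observation.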
3) The Standard Deviation (SD): $\sigma$ (population), $s$ (sample)
= the square root of the variance (it has the same units as the original measurements)

4) The Coefficient of Variability (relative standard deviation) - a relative measure, not dependent on the units of measurement

$$V = \frac{\sigma}{\mu} \cdot 100\ \% \qquad V = \frac{s}{\bar{x}} \cdot 100\ \% \quad \text{("estimated V")}$$

Used to compare variability in data sets whose measurements differ in magnitude (e.g. weight in mice and cows).

5) The Standard Error of the Mean (SEM, SE):
= a measure of the precision with which a sample mean $\bar{x}$ estimates the true population mean $\mu$ (the true mean value of the population is expected to lie approximately within the interval AVG ± SEM)

$$SEM = \frac{s}{\sqrt{n}}$$

- If the sample size increases, SEM decreases (the precision with which we can estimate the true mean increases)
- If the variability in the sample increases, SEM increases (as the standard deviation increases)

Example: Calculation of measures of dispersion for body weights in a sample of 7 from a population of broilers:

$x_i$ (kg)    $x_i - \bar{x}$ (kg)    $(x_i - \bar{x})^2$ (kg²)
1.2           -0.6                    0.36
1.4           -0.4                    0.16
1.6           -0.2                    0.04
1.8            0.0                    0.00
2.0            0.2                    0.04
2.2            0.4                    0.16
2.4            0.6                    0.36
Sum: 12.6      0.0                    1.12 ("sum of squares")

n = 7; $\bar{x} = \frac{12.6}{7} = 1.8$ kg

Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1.12}{6} = 0.1867\ \text{kg}^2$

Standard Deviation: $s = \sqrt{0.1867} = 0.43$ kg

Range: $R = x_7 - x_1 = 2.4 - 1.2 = 1.2$ kg

Coefficient of Variability: $V = \frac{s}{\bar{x}} = \frac{0.43\ \text{kg}}{1.8\ \text{kg}} = 0.24 = 24\ \%$

Standard Error of the Mean (SEM): $SEM = \frac{s}{\sqrt{n}} = \frac{0.43\ \text{kg}}{2.646} = 0.16$ kg

Conclusion: The true mean body weight in the broiler population is estimated to lie approximately within the interval 1.8 ± 0.16 kg.
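The broiler calculation above can be reproduced with a short Python sketch. The variable names are illustrative only. The sketch carries the unrounded standard deviation through to V and SEM, whereas the worked example rounds s to 0.43 first; the results agree to two decimal places.

```python
import math

# Body weights (kg) of the 7 broilers from the example above.
weights = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(weights)
mean = math.fsum(weights) / n

# Sum of squared deviations from the mean ("sum of squares").
ss = math.fsum((x - mean) ** 2 for x in weights)

variance = ss / (n - 1)          # estimated (sample) variance, kg^2
sd = math.sqrt(variance)         # standard deviation, kg
value_range = max(weights) - min(weights)
cv_percent = sd / mean * 100     # coefficient of variability, %
sem = sd / math.sqrt(n)          # standard error of the mean, kg

print(f"n = {n}, mean = {mean:.2f} kg")
print(f"SS = {ss:.2f} kg^2")
print(f"variance = {variance:.4f} kg^2")
print(f"SD = {sd:.2f} kg")
print(f"range = {value_range:.1f} kg")
print(f"V = {cv_percent:.0f} %")
print(f"SEM = {sem:.2f} kg")
print(f"mean ± SEM: {mean - sem:.2f} to {mean + sem:.2f} kg")
```

Because SEM divides s by the square root of n, quadrupling the sample size at the same variability would halve the SEM.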