Descriptive Characteristics of Statistical Sets

Parameters: describe characteristic features of populations. They are exact, but we cannot calculate them for the endless number of individuals in a population; we can only estimate them from sample data.
- represented by Greek letters ($\mu$, $\sigma$, etc.)

Statistics: describe characteristic features of samples. We calculate them from the sample data, and they serve as estimates of the exact population parameters.
- represented by Latin letters ($\bar{x}$, $s$, etc.)

Descriptive Characteristics
A) Measures of Central Tendency - describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability - describe the dispersion of values around the middle in a sample or population

A) Measures of Central Tendency (describe where the majority of measurements occur)

1) The Arithmetic Mean (Average, AVG): $\mu$ (population), $\bar{x}$ (sample), read "x bar"

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: A sample of 24 from a population of butterfly wing lengths:
$x_i$ (cm): 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.5
$n = 24$, $\sum x_i = 95.0$ cm

$$\bar{x} = \frac{\sum x_i}{n} = \frac{95.0\ \text{cm}}{24} = 3.96\ \text{cm}$$

The Arithmetic Mean - Properties:
- is affected by extreme values, so it should be used only in homogeneous, regular (Gaussian) distributions, where it describes the middle of the population correctly
- has the same units of measurement as the individual observations
- $\sum (x_i - \bar{x}) = 0$ (the sum of all deviations from the mean is always 0)

2) The Median: $\tilde{\mu}$ (population), $\tilde{x}$ (sample), read "x wave"
= the middle value in an ordered set of data (there are just as many values larger than the median as there are smaller)
- if the sample size (n) is odd, there is only one middle value in the ordered sample data, and it is the median (its rank is an integer)
- if n is even, there are two middle values, and the median is the midpoint (mean) between them (its rank is a half-integer)
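The definitions of the arithmetic mean and the median given so far can be sketched in a few lines of Python. The function names and the five measurements below are illustrative assumptions, not part of the lecture; the data deliberately include one extreme value to show that it pulls the mean but not the median.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their count."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data.

    For an even number of observations, return the mean of the
    two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical measurements (not from the lecture) with one extreme value.
sample = [2.0, 4.0, 6.0, 8.0, 100.0]

mean_value = arithmetic_mean(sample)
median_value = median(sample)

# The deviations from the mean cancel each other out.
deviation_sum = sum(x - mean_value for x in sample)

print(mean_value)          # 24.0, pulled upward by the extreme value 100.0
print(median_value)        # 6.0, unaffected by the extreme value
print(deviation_sum)       # 0.0
print(median(sample[:4]))  # 5.0, midpoint of 4.0 and 6.0 (even n)
```

With the extreme value present, the mean lies above four of the five observations, while the median stays in the middle of the ordered data.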
Rank of the median: $\frac{n+1}{2}$

Example: Body weights in two species of birds in captivity:

Species A
$x_i$ (g): 34, 36, 37, 39, 40, 41, 42, 43, 79
$n = 9$
median: $\tilde{x} = x_5 = 40$ g
mean: $\bar{x} = 43.4$ g

Species B
$x_i$ (g): 34, 36, 37, 39, 40, 41, 42, 43, 44, 45
$n = 10$
median: $\tilde{x} = x_{5.5} = \frac{40 + 41}{2} = 40.5$ g
mean: $\bar{x} = 40.1$ g

The Median - Properties:
- is not affected by extreme values
- is the 50% quantile (divides the distribution curve into two halves, with 50% of the values on each side)
- may be used in irregular (asymmetric) distributions, where it characterizes the middle of the set better than the average

3) The Mode: $\hat{\mu}$ (population), $\hat{x}$ (sample), read "x hat"
= the most frequently occurring measurement in a set of data (the top of the distribution curve)
Properties:
- is not affected by extremes
- is not a very exact measure of the middle of the set (not often used in biological and medical data)

B) Measures of Variability - describe the spread (dispersion) of measurements around the center of the distribution

1) The Range: $R = x_{max} - x_{min}$
- depends on the 2 extreme values of the data
- is a relatively rough measure of variability: it does not take into account any measurements between the highest and lowest value

Variability expressed in terms of deviations from the mean:
Because the sum of all deviations from the mean, $\sum (x_i - \bar{x})$, is always equal to 0, summing them would be useless as a measure of variability. The signs of the deviations are eliminated by squaring them. Then we can define the sum of squares:

population: $SS = \sum (x_i - \mu)^2$
sample: $SS = \sum (x_i - \bar{x})^2$

2) The Variance: $\sigma^2$ (population), $s^2$ (sample)
= the mean sum of squares about the mean

Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Sample variance ("estimated variance"):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

The variance has the squared units of the original measurements.
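As an illustration, the range, the sum of squares and the two variance formulas can be applied to the bird body weights from the example above. This is a sketch only: the function names are assumptions, and the lecture itself does not report variability measures for the bird data, so those values are simply computed here. The medians and means are taken from Python's standard `statistics` module for comparison with the values quoted in the example.

```python
import statistics

# Body weights (g) from the bird example.
species_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]
species_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]


def value_range(values):
    """R = x_max - x_min."""
    return max(values) - min(values)


def sum_of_squares(values, center):
    """Sum of squared deviations of the values from `center`."""
    return sum((x - center) ** 2 for x in values)


def population_variance(values):
    """Treat the values as a whole population: divide SS by N."""
    mu = sum(values) / len(values)
    return sum_of_squares(values, mu) / len(values)


def sample_variance(values):
    """Treat the values as a sample: divide SS by n - 1."""
    x_bar = sum(values) / len(values)
    return sum_of_squares(values, x_bar) / (len(values) - 1)


for name, weights in (("A", species_a), ("B", species_b)):
    print(
        f"Species {name}: "
        f"median = {statistics.median(weights)} g, "
        f"mean = {statistics.mean(weights):.1f} g, "
        f"range = {value_range(weights)} g, "
        f"sample variance = {sample_variance(weights):.1f} g^2"
    )
```

The single extreme weight of 79 g in Species A enlarges both the range and the variance, in the same way that it shifts the mean away from the median.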
3) The Standard Deviation (SD): $\sigma$ (population), $s$ (sample)
= the square root of the variance (it has the same units as the original measurements)

4) The Coefficient of Variability (relative standard deviation): a relative measure, not dependent on the units of measurement

$$V = \frac{\sigma}{\mu} \cdot 100\,\% \qquad V = \frac{s}{\bar{x}} \cdot 100\,\% \quad \text{("estimated V")}$$

Used for comparing variability in data sets whose measurements differ greatly in magnitude (e.g. weight in mice and cows).

5) The Standard Error of the Mean (SEM, SE):
= a measure of the precision with which a sample mean $\bar{x}$ estimates the true population mean $\mu$ (the true mean value of the population will lie approximately within the interval AVG ± SEM)

$$SEM = \frac{s}{\sqrt{n}}$$

• If the sample size increases, SEM decreases (the precision with which we can estimate the true mean increases)
• The more variability in the sample, the larger the SEM (it increases as the standard deviation increases)

Example: Calculation of measures of dispersion for body weights in a sample of 7 from a population of broilers:

x_i (kg)    x_i - x̄ (kg)    (x_i - x̄)² (kg²)
1.2         -0.6             0.36
1.4         -0.4             0.16
1.6         -0.2             0.04
1.8          0.0             0.00
2.0          0.2             0.04
2.2          0.4             0.16
2.4          0.6             0.36
Sum: 12.6    0.0             1.12 ("sum of squares")

$n = 7$, $\bar{x} = \frac{12.6}{7} = 1.8$ kg

Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1.12}{6} = 0.1867$ kg²

Standard Deviation: $s = \sqrt{0.1867} = 0.43$ kg

Range: $x_7 - x_1 = 2.4 - 1.2 = 1.2$ kg

Coefficient of Variability: $V = \frac{s}{\bar{x}} = \frac{0.43\ \text{kg}}{1.8\ \text{kg}} = 0.24 = 24\,\%$

Standard error of the mean (SEM): $SEM = \frac{s}{\sqrt{n}} = \frac{0.43\ \text{kg}}{2.646} = 0.16$ kg

Conclusion: The true mean value of body weights in the broiler population will lie within the interval 1.8 ± 0.16 kg (approximately).
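The broiler calculation above can be checked with a short Python sketch. It assumes the standard library `statistics` module, whose `variance` and `stdev` functions use the n - 1 denominator of the sample formulas; the variable names are illustrative choices, not part of the lecture.

```python
import math
import statistics

# Body weights (kg) of the 7 broilers from the example.
weights_kg = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(weights_kg)
mean_kg = statistics.mean(weights_kg)

# Sample variance and standard deviation (divide by n - 1).
variance_kg2 = statistics.variance(weights_kg)
sd_kg = statistics.stdev(weights_kg)

# Range: largest minus smallest observation.
range_kg = max(weights_kg) - min(weights_kg)

# Coefficient of variability: SD relative to the mean, in percent.
cv_percent = sd_kg / mean_kg * 100

# Standard error of the mean: SD divided by the square root of n.
sem_kg = sd_kg / math.sqrt(n)

print(f"mean     = {mean_kg:.2f} kg")         # 1.80 kg
print(f"variance = {variance_kg2:.4f} kg^2")  # 0.1867 kg^2
print(f"SD       = {sd_kg:.2f} kg")           # 0.43 kg
print(f"range    = {range_kg:.1f} kg")        # 1.2 kg
print(f"V        = {cv_percent:.0f} %")       # 24 %
print(f"SEM      = {sem_kg:.2f} kg")          # 0.16 kg
```

The sketch carries full floating-point precision through every step and rounds only when printing, whereas the hand calculation rounds s to 0.43 kg first; at two decimal places the results agree.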
