Descriptive Characteristics of Statistical Sets

Parameters describe characteristic features of populations. They are exact, but we cannot calculate them for a population with an endless number of individuals; we can only estimate them from sample data. Parameters are represented by Greek letters ($\mu$, $\sigma$, etc.).
Statistics describe characteristic features of samples. We calculate them from the sample data, and they serve as estimates of the exact population parameters. Statistics are represented by Latin letters ($\bar{x}$, $s$, etc.).

Descriptive Characteristics
A) Measures of Central Tendency - describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability - describe the dispersion of values around the middle in a sample or population

A) Measures of Central Tendency (describe where the majority of measurements occur)

1) The Arithmetic Mean (Average, AVG): $\mu$ (population), $\bar{x}$ (sample), read "x bar"
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Example: A sample of 24 from a population of butterfly wing lengths:
$x_i$ (cm): 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.5
n = 24, $\sum x_i = 95.4$ cm
$$\bar{x} = \frac{\sum x_i}{n} = \frac{95.4\ \text{cm}}{24} \approx 3.98\ \text{cm}$$

The Arithmetic Mean - Properties:
- is affected by extreme values, so it should be used only for homogeneous, regular (Gaussian) distributions (only then does it describe the middle of the population correctly)
- has the same units of measurement as the individual observations
- $\sum (x_i - \bar{x}) = 0$ (the sum of all deviations from the mean is always 0)

2) The Median: $\tilde{\mu}$ (population), $\tilde{x}$ (sample), read "x tilde"
= the middle value in an ordered set of data (there are as many values larger than the median as there are smaller)
- if the sample size n is odd, there is only one middle value in the ordered sample data, and it is the median (its rank is an integer)
- if n is even, there are two middle values, and the median is the midpoint (mean) between them (its rank is a half-integer)
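A minimal Python sketch of the arithmetic mean, the zero-sum property of deviations, and the odd/even median rule described above, applied to the 24 butterfly wing lengths exactly as listed (summing them gives 95.4 cm). The helper names `arithmetic_mean` and `median` are illustrative assumptions, not taken from the source.

```python
import math

# Butterfly wing lengths (cm), copied from the example above (n = 24)
wings = [3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0,
         4.0, 4.0, 4.1, 4.1, 4.1, 4.2, 4.2, 4.3, 4.3, 4.4, 4.4, 4.5]


def arithmetic_mean(values):
    """Sample mean: x-bar = (sum of x_i) / n."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the ordered data (rank (n + 1) / 2)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: a single middle value with an integer rank
        return ordered[mid]
    # Even n: midpoint of the two middle values (half-integer rank)
    return (ordered[mid - 1] + ordered[mid]) / 2


total = math.fsum(wings)
x_bar = arithmetic_mean(wings)
# Deviations from the mean sum to 0, up to floating-point rounding
deviation_sum = math.fsum(x - x_bar for x in wings)

print(f"n = {len(wings)}, sum = {total:.1f} cm, mean = {x_bar:.3f} cm")
print(f"sum of deviations = {deviation_sum:.1e}")
print(f"median = {median(wings):.2f} cm")
```

Sorting a copy keeps the caller's list unchanged; `math.fsum` is used so the printed sum of deviations reflects the mathematical zero rather than accumulated rounding.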
Rank of the median: $\frac{n + 1}{2}$

Example: Body weights in two species of birds in captivity:
Species A, $x_i$ (g): 34, 36, 37, 39, 40, 41, 42, 43, 79
n = 9; median: $\tilde{x} = x_5 = 40$ g; mean: $\bar{x} = 43.4$ g
Species B, $x_i$ (g): 34, 36, 37, 39, 40, 41, 42, 43, 44, 45
n = 10; median: $\tilde{x} = x_{5.5} = \frac{40 + 41}{2} = 40.5$ g; mean: $\bar{x} = 40.1$ g
In species A, the single extreme value (79 g) pulls the mean above eight of the nine observations, while the median still marks the middle of the data.

The Median - Properties:
- is not affected by extreme values
- is the 50% quantile (divides the distribution curve into two halves)
- may be used for irregular (asymmetric) distributions, where it characterizes the middle of the set better than the average

3) The Mode: $\hat{\mu}$ (population), $\hat{x}$ (sample), read "x hat"
= the most frequently occurring measurement in a set of data (the top of the distribution curve)
Properties:
- is not affected by extreme values
- is not a very exact measure of the middle of the set (not often used for biological and medical data)

B) Measures of Variability - describe the spread (dispersion) of measurements around the center of the distribution

1) The Range: $R = x_{\max} - x_{\min}$
- depends on the two extreme values of the data
- is a relatively rough measure of variability: it does not take into account any measurements between the highest and the lowest value

Variability expressed in terms of deviations from the mean: because the sum of all deviations from the mean, $\sum (x_i - \bar{x})$, is always 0, summing the deviations is useless as a measure of variability. The signs of the deviations are eliminated by squaring them, which defines the sum of squares:
$$SS = \sum (x_i - \mu)^2 \ \text{(population)} \qquad SS = \sum (x_i - \bar{x})^2 \ \text{(sample)}$$

2) The Variance: $\sigma^2$ (population), $s^2$ (sample) = the mean sum of squares about the mean
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
The sample variance $s^2$ is the "estimated variance": dividing the sum of squares by n - 1 rather than n makes it an unbiased estimate of $\sigma^2$.
Variance has the squared units of the original measurements.
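The sketch below applies these definitions to the two bird samples above, using Python's standard `statistics` module (`mean`, `median`, `multimode`) alongside hand-written sum-of-squares and variance helpers. The helper names `sum_of_squares`, `sample_variance` and `population_variance` are illustrative assumptions, not from the source.

```python
import statistics

# Body weights (g) of two bird species in captivity, from the example above
species_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]      # n = 9, one extreme value
species_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]  # n = 10


def sum_of_squares(values, center):
    """SS = sum of squared deviations from a given center (mu or x-bar)."""
    return sum((x - center) ** 2 for x in values)


def sample_variance(values):
    """s^2 = SS / (n - 1), the 'estimated variance'."""
    x_bar = statistics.mean(values)
    return sum_of_squares(values, x_bar) / (len(values) - 1)


def population_variance(values):
    """sigma^2 = SS / N, used when the values are the whole population."""
    mu = statistics.mean(values)
    return sum_of_squares(values, mu) / len(values)


for name, data in (("A", species_a), ("B", species_b)):
    print(f"Species {name}: mean = {statistics.mean(data):.1f} g, "
          f"median = {statistics.median(data)} g, "
          f"range = {max(data) - min(data)} g, "
          f"s^2 = {sample_variance(data):.2f} g^2")

# Every weight in species A occurs once, so every value ties as a mode
print("modes of A:", statistics.multimode(species_a))
```

The last line illustrates why the mode is rarely informative for small samples of continuous measurements: with no repeated values, `multimode` returns the whole data set.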
3) The Standard Deviation (SD): $\sigma$ (population), $s$ (sample) = the square root of the variance (it has the same units as the original measurements)

4) The Coefficient of Variability (relative standard deviation) - a relative measure that does not depend on the units of measurement:
$$V = \frac{\sigma}{\mu} \cdot 100\,\% \qquad V = \frac{s}{\bar{x}} \cdot 100\,\% \ \text{("estimated V")}$$
Used to compare the variability of data sets whose units differ in magnitude (e.g. body weight in mice and in cows).

5) The Standard Error of the Mean (SEM, SE) = a measure of the precision with which a sample mean $\bar{x}$ estimates the true population mean $\mu$
$$SEM = \frac{s}{\sqrt{n}}$$
The true population mean is likely to lie within the interval AVG ± SEM; for larger samples from an approximately normal population, this interval covers $\mu$ about 68% of the time.
• If the sample size increases, SEM decreases (the precision with which we can estimate the true mean increases).
• The more variability in the sample, the larger the SEM (because the standard deviation increases).

Example: Calculation of measures of dispersion for body weights in a sample of 7 from a population of broilers:

i     x_i (kg)   x_i - x̄ (kg)   (x_i - x̄)^2 (kg^2)
1     1.2        -0.6             0.36
2     1.4        -0.4             0.16
3     1.6        -0.2             0.04
4     1.8         0.0             0.00
5     2.0         0.2             0.04
6     2.2         0.4             0.16
7     2.4         0.6             0.36
Sum   12.6        0.0             1.12 ("sum of squares")

n = 7, $\bar{x} = \frac{12.6}{7} = 1.8$ kg
Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1.12}{6} = 0.1867\ \text{kg}^2$
Standard deviation: $s = \sqrt{0.1867} = 0.43$ kg
Range: $x_7 - x_1 = 2.4 - 1.2 = 1.2$ kg
Coefficient of variability: $V = \frac{s}{\bar{x}} = \frac{0.43\ \text{kg}}{1.8\ \text{kg}} = 0.24 = 24\,\%$
Standard error of the mean: $SEM = \frac{s}{\sqrt{n}} = \frac{0.43\ \text{kg}}{2.646} = 0.16$ kg

Conclusion: the true mean body weight of the broiler population is likely to lie within the interval 1.8 ± 0.16 kg (approximately).
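A minimal Python sketch that reproduces the broiler calculation above step by step, using only `math` and the standard `statistics` module; the variable names are illustrative. It should reproduce the hand-computed values (s² ≈ 0.1867 kg², s ≈ 0.43 kg, V ≈ 24 %, SEM ≈ 0.16 kg).

```python
import math
import statistics

# Body weights (kg) of a sample of 7 broilers, from the worked example above
weights = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(weights)
x_bar = statistics.mean(weights)                      # about 1.8 kg
ss = math.fsum((x - x_bar) ** 2 for x in weights)     # sum of squares, about 1.12 kg^2
s2 = ss / (n - 1)                                     # estimated variance (n - 1 divisor)
s = math.sqrt(s2)                                     # standard deviation, kg
data_range = max(weights) - min(weights)              # x_7 - x_1
cv = s / x_bar * 100                                  # coefficient of variability, %
sem = s / math.sqrt(n)                                # standard error of the mean, kg

print(f"mean = {x_bar:.2f} kg, SS = {ss:.2f} kg^2, s^2 = {s2:.4f} kg^2")
print(f"s = {s:.2f} kg, range = {data_range:.1f} kg, V = {cv:.0f} %")
print(f"SEM = {sem:.2f} kg -> interval {x_bar:.1f} +/- {sem:.2f} kg")
```

Because `s` is kept at full precision instead of being rounded to 0.43 first, the intermediate values differ slightly from the hand calculation, but they agree at the printed precision.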