```
Descriptive Characteristics of Statistical Sets
Parameters – describe characteristic features of populations
(exact values, but we cannot calculate them when the population contains an unlimited number of individuals; we can only estimate them from sample data)
- represented by Greek letters ($\mu$, $\sigma$, etc.)
Statistics – describe characteristic features of samples
(calculated from the sample data; they serve as estimates of the exact population parameters)
- represented by Latin letters ($\bar{x}$, $s$, etc.)
Descriptive Characteristics
A) Measures of Central Tendency
- describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability
- describe the dispersion of values around the middle in a sample or population
A) Measures of Central Tendency
(describe where the majority of measurements occur)
1) The Arithmetic Mean:
(Average – AVG)
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ (population)
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ (sample) - "x bar"
Example: A sample of 24 from a population of butterfly wing lengths:
xi (cm): 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.2,
4.2, 4.3, 4.3, 4.4, 4.4, 4.5.
n = 24
$\sum x_i = 95.0$ cm
$\bar{x} = \frac{\sum x_i}{n} = \frac{95.0\ \text{cm}}{24} = 3.96$ cm
The Arithmetic Mean – Properties:
- is affected by extreme values $\Rightarrow$ it should be used only in homogeneous, regular (Gaussian) distributions
(so that it describes the middle of the population correctly)
- has the same units of measurement as the individual observations
- $\sum (x_i - \bar{x}) = 0$ (the sum of all deviations from the mean is always 0)
2) The Median: $\tilde{\mu}$ (population), $\tilde{x}$ (sample) - "x tilde"
= the middle value in an ordered set of data
(there are just as many values larger than the median as there are smaller)
- if the sample size (n) is odd $\Rightarrow$ there is only one middle value in the ordered sample data, and it is the median (its rank is an integer)
- if n is even $\Rightarrow$ there are two middle values, and the median is the midpoint (mean) between them (its rank is a half-integer)
Rank of the median: $\frac{n+1}{2}$
Example: Body weights in two species of birds in captivity:

Species A, x_i (g)    Species B, x_i (g)
34                    34
36                    36
37                    37
39                    39
40                    40
41                    41
42                    42
43                    43
79                    44
                      45
------------------    ------------------
n = 9                 n = 10

Species A: median $\tilde{x} = x_5 = 40$ g, mean $\bar{x} = 43.4$ g
Species B: median $\tilde{x} = x_{5.5} = \frac{40 + 41}{2} = 40.5$ g, mean $\bar{x} = 40.1$ g
The Median – Properties:
- is not affected by extreme values
- is the 50% quantile (divides the area under the distribution curve into two halves)
- may be used in irregular (asymmetric) distributions (where it characterizes the middle of the set better than the average)
[Figure: two distribution curves, each divided by the median into two areas of 50%]
3) The Mode: $\hat{\mu}$ (population), $\hat{x}$ (sample) – "x hat"
= the most frequently occurring measurement in a set of data
(the top of the distribution curve)
Properties:
- is not affected by extremes
- is not a very exact measure of the middle of the set (not often used for biological and medical data)
[Figure: distribution curve with the mode, median and mean marked]
B) Measures of Variability
- spread (dispersion) of measurements around the center of
the distribution
1) The Range:
$R = x_{max} - x_{min}$
- depends only on the two extreme values in the data
- a relatively rough measure of variability: it does not take into account any measurements between the highest and lowest values.
Variability expressed in terms of deviations from the mean:
Because the sum of all deviations from the mean, $\sum (x_i - \bar{x})$, is always equal to 0, this sum would be useless as a measure of variability.
The signs of the deviations from the mean can be eliminated by squaring the deviations.
Then we can define the sum of squares:
population $SS = \sum (x_i - \mu)^2$
sample $SS = \sum (x_i - \bar{x})^2$
2) The Variance: $\sigma^2$ (population), $s^2$ (sample)
= the mean sum of squares about the mean
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance ("estimated variance"): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
The variance is expressed in the squared units of the original measurements.
3) The Standard Deviation (SD): $\sigma$ (population), $s$ (sample)
= square root of the variance
(it has the same units as the original measurements)
4) The Coefficient of Variability:
(relative standard deviation)
– a relative measure, not dependent on units of measurement
$V = \frac{\sigma}{\mu} \cdot 100\ \%$ (population)
$V = \frac{s}{\bar{x}} \cdot 100\ \%$ (sample) - "estimated V"
Used to compare variability in data sets whose values differ in magnitude (e.g. weight in mice and cows).
5) The Standard Error of the Mean (SEM, SE):
= a measure of the precision with which a sample mean $\bar{x}$ estimates the true population mean $\mu$
(the true mean of the population is expected to lie approximately within the interval $\bar{x} \pm SEM$)
$SEM = \frac{s}{\sqrt{n}}$
• If the sample size increases -> SEM decreases (the precision with which we can estimate the true mean increases)
• If the variability in the sample increases -> SEM increases (as the standard deviation increases)
Example: Calculation of measures of dispersion for body weights in a
sample of 7 from a population of broilers:
x_i (kg)    x_i - x̄ (kg)    (x_i - x̄)^2 (kg^2)
1.2         -0.6             0.36
1.4         -0.4             0.16
1.6         -0.2             0.04
1.8          0.0             0.00
2.0          0.2             0.04
2.2          0.4             0.16
2.4          0.6             0.36
--------    -------------    -------------------
Σ = 12.6    Σ = 0.0          Σ = 1.12 ("sum of squares")

n = 7
$\bar{x} = 1.8$ kg
Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1.12}{6} = 0.1867$ kg$^2$
Standard Deviation: $s = \sqrt{0.1867} = 0.43$ kg
Range: $x_7 - x_1 = 1.2$ kg
Coefficient of Variability:
$V = \frac{s}{\bar{x}} = \frac{0.43\ \text{kg}}{1.8\ \text{kg}} = 0.24 = 24\ \%$
Standard Error of the Mean (SEM):
$SEM = \frac{s}{\sqrt{n}} = \frac{0.43\ \text{kg}}{2.646} = 0.16$ kg
Conclusion:
The true mean body weight in the broiler population lies approximately within the interval $1.8 \pm 0.16$ kg.
```
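The measures defined in the notes can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function name `describe` and the variable names are illustrative assumptions, and only the standard library is used. It applies the sample formulas (divisor n - 1 for the variance, s / sqrt(n) for the SEM) to the broiler data and takes the medians of the two bird samples.

```python
import math
import statistics


def describe(values):
    """Descriptive characteristics of one sample, using the sample formulas."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squares: squared deviations from the sample mean.
    ss = sum((x - mean) ** 2 for x in values)
    variance = ss / (n - 1)          # "estimated variance" s^2
    sd = math.sqrt(variance)         # standard deviation s
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "sum_of_squares": ss,
        "variance": variance,
        "sd": sd,
        "cv_percent": sd / mean * 100,   # coefficient of variability
        "sem": sd / math.sqrt(n),        # standard error of the mean
    }


# Data copied from the examples in the notes.
broilers = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]               # kg
species_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]             # g
species_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]         # g

result = describe(broilers)
for name, value in result.items():
    print(f"{name}: {value:.4g}")

# The extreme value 79 g shifts the mean of species A but not its median.
print("median A:", statistics.median(species_a))
print("median B:", statistics.median(species_b))
```

Rounded to the precision used in the notes, the broiler results agree with the worked example (variance about 0.1867 kg², standard deviation about 0.43 kg, V about 24 %, SEM about 0.16 kg), and the medians are 40 g for species A and 40.5 g for species B.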