Descriptive Characteristics of Statistical Sets
Parameters: describe characteristic features of populations
(exact, but we cannot calculate them for an infinite number of individuals in the population; we can only estimate them from sample data)
- represented by Greek letters ($\mu$, $\sigma$, etc.)
Statistics: describe characteristic features of samples
(we calculate them from the sample data, and they serve as estimates of the exact population parameters)
- represented by Latin letters ($\bar{x}$, $s$, etc.)
Descriptive Characteristics
A) Measures of Central Tendency
- describe the middle of the range of values in a sample or population
B) Measures of Dispersion and Variability
- describe the dispersion of values around the middle in a sample or population
A) Measures of Central Tendency
(describe where the majority of measurements occur)
1) The Arithmetic Mean (Average, AVG):
$\mu$ (population), $\bar{x}$ (sample) - "x bar"
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example: A sample of 24 from a population of butterfly wing lengths:
xi (cm): 3.3, 3.5, 3.6, 3.6, 3.7, 3.8, 3.8, 3.8, 3.9, 3.9, 3.9, 4.0, 4.0, 4.0, 4.1, 4.1, 4.1, 4.2,
4.2, 4.3, 4.3, 4.4, 4.4, 4.5.
n = 24
$\sum x_i = 95.0$ cm
$\bar{x} = \frac{\sum x_i}{n} = \frac{95.0\ \text{cm}}{24} = 3.96$ cm
The Arithmetic Mean - Properties:
- is affected by extreme values, so it should be used only for homogeneous, regular (Gaussian) distributions (to describe the middle of the population correctly)
- has the same units of measurement as the individual observations
- $\sum (x_i - \bar{x}) = 0$ (the sum of all deviations from the mean is always 0)
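The two numerical properties above can be checked with a short Python sketch. The measurements below are hypothetical values chosen only for illustration; they are not taken from the lecture data.

```python
from statistics import mean

# Hypothetical measurements (cm), for illustration only.
values = [3.6, 3.8, 4.0, 4.2, 4.4]

x_bar = mean(values)
deviations = [x - x_bar for x in values]

print(f"mean = {x_bar:.2f} cm")
# The deviations cancel out, up to floating-point rounding.
print(f"sum of deviations = {abs(sum(deviations)):.10f}")

# A single extreme value pulls the mean away from the bulk of the data.
with_extreme = values + [9.0]
x_bar_extreme = mean(with_extreme)
print(f"mean with one extreme value = {x_bar_extreme:.2f} cm")
```

Only one of the six values changed the picture here, which is why the mean is recommended for regular distributions without extremes.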
2) The Median:
$\tilde{\mu}$ (population), $\tilde{x}$ (sample) - "x tilde"
= the middle value in an ordered set of data
(there are just as many values bigger than the median as there are smaller)
- if the sample size (n) is odd, there is exactly one middle value in the ordered sample, and that value is the median (its rank is an integer)
- if n is even, there are two middle values, and the median is the midpoint (mean) between them (its rank is a half-integer)
Rank of the median: $\frac{n + 1}{2}$
Example: Body weights in two species of birds in captivity:
Species A
xi (g): 34, 36, 37, 39, 40, 41, 42, 43, 79
n = 9
median: $\tilde{x} = x_5 = 40$ g
mean: $\bar{x} = 43.4$ g
Species B
xi (g): 34, 36, 37, 39, 40, 41, 42, 43, 44, 45
n = 10
median: $\tilde{x} = x_{5.5} = \frac{40 + 41}{2} = 40.5$ g
mean: $\bar{x} = 40.1$ g
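As a rough check of the example above, the following Python sketch recomputes the rank of the median, the median, and the mean for both species. The helper name `median_rank` is an illustrative choice, not something defined in the lecture.

```python
from statistics import mean, median

def median_rank(n):
    """Rank of the median in an ordered sample of size n: (n + 1) / 2."""
    return (n + 1) / 2

# Body weights (g) from the example above.
species_a = [34, 36, 37, 39, 40, 41, 42, 43, 79]
species_b = [34, 36, 37, 39, 40, 41, 42, 43, 44, 45]

for name, weights in (("A", species_a), ("B", species_b)):
    ordered = sorted(weights)
    n = len(ordered)
    print(f"Species {name}: n = {n}, rank of median = {median_rank(n)}")
    print(f"  median = {median(ordered)} g, mean = {mean(ordered):.1f} g")
```

In Species A the single extreme value (79 g) raises the mean well above the median, while in Species B the two measures nearly coincide.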
The Median - Properties:
- is not affected by extreme values
- is the 50% quantile (divides the area under the distribution curve into 2 halves)
- may be used in irregular (asymmetric) distributions, where it characterizes the middle of the set better than the mean
[Figure: distribution curves with the median dividing the area into two 50% halves]
3) The Mode:
$\hat{\mu}$ (population), $\hat{x}$ (sample) - "x hat"
= the most frequently occurring measurement in a set of data
(top of the distribution curve)
Properties:
- is not affected by extremes
- is not a very exact measure of the middle of the set (not often used in biological and medical data)
[Figure: distribution curve marking the mode, median, and mean]
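A minimal sketch of finding the mode in Python, using `statistics.multimode` from the standard library (Python 3.8 or later). The values are hypothetical and serve only to show that an extreme value leaves the mode unchanged.

```python
from statistics import multimode

# Hypothetical measurements, for illustration only; 30 is an extreme value.
values = [3, 4, 4, 5, 5, 5, 6, 7, 30]

modes_all = multimode(values)             # most frequent value(s)
modes_trimmed = multimode(values[:-1])    # same data without the extreme value

print(modes_all)
print(modes_trimmed)

# When several values tie for the highest frequency, all of them are returned.
print(multimode([1, 1, 2, 2, 3]))
```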
B) Measures of Variability
- spread (dispersion) of measurements around the center of
the distribution
1) The Range:
$R = x_{\max} - x_{\min}$
- depends only on the 2 extreme values of the data
- a relatively rough measure of variability: it does not take into account any measurements between the highest and lowest values.
Variability expressed in terms of deviations from the mean:
Because the sum of all deviations from the mean, $\sum (x_i - \bar{x})$, is always equal to 0, this sum would be useless as a measure of variability.
The method to eliminate the signs of the deviations from the mean: square the deviations.
Then we can define the sum of squares:
population: $SS = \sum (x_i - \mu)^2$
sample: $SS = \sum (x_i - \bar{x})^2$
2) The Variance:
$\sigma^2$ (population), $s^2$ (sample)
= the mean sum of squares about the mean
Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample variance ("estimated variance"): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Variance is expressed in the squared units of the original measurements.
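The difference between the two divisors (N for a population, n - 1 for a sample) can be sketched in Python as follows. The data are hypothetical, and the result is cross-checked against `pvariance` and `variance` from the standard `statistics` module.

```python
from statistics import pvariance, variance

def sum_of_squares(values):
    """Sum of squared deviations from the arithmetic mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

# Hypothetical measurements, for illustration only.
values = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

n = len(values)
ss = sum_of_squares(values)
population_variance = ss / n        # data treated as the whole population
sample_variance = ss / (n - 1)      # data treated as a sample ("estimated variance")

print(f"SS = {ss}")
print(f"population variance = {population_variance}, pvariance = {pvariance(values)}")
print(f"sample variance     = {sample_variance}, variance  = {variance(values)}")
```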
3) The Standard Deviation (SD):
$\sigma$ (population), $s$ (sample)
= square root of the variance: $\sigma = \sqrt{\sigma^2}$, $s = \sqrt{s^2}$
(it has the same units as the original measurements)
4) The Coefficient of Variability:
(relative standard deviation)
- a relative measure, not dependent on units of measurement
Population: $V = \frac{\sigma}{\mu} \cdot 100\,\%$
Sample ("estimated V"): $V = \frac{s}{\bar{x}} \cdot 100\,\%$
Used to compare variability in data sets whose values differ in magnitude or units (e.g. weight in mice and cows).
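A small Python sketch of the mice-and-cows comparison follows. Both data sets are hypothetical and were constructed so that the standard deviations differ greatly while the relative variability is the same; the function name is an illustrative choice.

```python
from statistics import mean, stdev

def coefficient_of_variability(values):
    """Estimated V = s / x_bar * 100 (%), using the sample standard deviation."""
    return stdev(values) / mean(values) * 100

# Hypothetical body weights, for illustration only.
mice_g = [18.0, 20.0, 22.0, 24.0, 26.0]
cows_kg = [540.0, 600.0, 660.0, 720.0, 780.0]

v_mice = coefficient_of_variability(mice_g)
v_cows = coefficient_of_variability(cows_kg)

print(f"mice: s = {stdev(mice_g):.2f} g,  V = {v_mice:.1f} %")
print(f"cows: s = {stdev(cows_kg):.2f} kg, V = {v_cows:.1f} %")

# Changing the unit (kg to g) rescales s and the mean equally, so V is unchanged.
v_cows_in_grams = coefficient_of_variability([w * 1000 for w in cows_kg])
```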
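5) The Standard Error of the Mean (SEM, SE):
= measure of the precision with which a sample mean $\bar{x}$ estimates the true population mean $\mu$
(the interval $\bar{x} \pm \mathrm{SEM}$ gives an approximate range for the true population mean; for roughly normal data it covers the true mean in about two out of three samples)
$\mathrm{SEM} = \frac{s}{\sqrt{n}}$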
• If the sample size increases -> SEM decreases (precision with which
we can estimate the true mean increases)
• The more variability in the sample -> SEM increases (as the standard
deviation increases)
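Both effects can be illustrated with a brief Python sketch. The standard deviations and sample sizes below are hypothetical inputs, not results from the lecture.

```python
from math import sqrt

def sem(s, n):
    """Standard error of the mean: s / sqrt(n)."""
    return s / sqrt(n)

# Hypothetical sample standard deviation (kg), for illustration only.
s = 0.43

# Larger samples give a smaller SEM; quadrupling n halves it.
for n in (7, 28, 112):
    print(f"n = {n:3d}: SEM = {sem(s, n):.3f} kg")

# More variability at the same sample size gives a larger SEM.
print(f"s = 0.86, n = 7: SEM = {sem(0.86, 7):.3f} kg")
```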
Example: Calculation of measures of dispersion for body weights in a sample of 7 from a population of broilers:

$x_i$ (kg)    $x_i - \bar{x}$ (kg)    $(x_i - \bar{x})^2$ (kg$^2$)
1.2           -0.6                    0.36
1.4           -0.4                    0.16
1.6           -0.2                    0.04
1.8            0.0                    0.00
2.0            0.2                    0.04
2.2            0.4                    0.16
2.4            0.6                    0.36
Sum: 12.6      0.0                    1.12 ("sum of squares")

n = 7
$\bar{x} = 1.8$ kg
Variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{1.12}{6} = 0.1867$ kg$^2$
Standard Deviation: $s = \sqrt{0.1867} = 0.43$ kg
Range: $x_7 - x_1 = 2.4 - 1.2 = 1.2$ kg
Coefficient of Variability: $V = \frac{s}{\bar{x}} = \frac{0.43\ \text{kg}}{1.8\ \text{kg}} = 0.24 = 24\,\%$
Standard error of the mean (SEM): $\mathrm{SEM} = \frac{s}{\sqrt{n}} = \frac{0.43\ \text{kg}}{2.646} = 0.16$ kg
Conclusion:
The true mean body weight in the broiler population is estimated to lie approximately within the interval 1.8 ± 0.16 kg.
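The whole broiler calculation can be reproduced with the following Python sketch, which uses the seven body weights from the example. Variable names are illustrative choices.

```python
from math import sqrt

# Body weights (kg) of the 7 broilers from the example above.
weights = [1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]

n = len(weights)
x_bar = sum(weights) / n
ss = sum((x - x_bar) ** 2 for x in weights)   # sum of squares
variance = ss / (n - 1)                       # sample ("estimated") variance
sd = sqrt(variance)                           # standard deviation
data_range = max(weights) - min(weights)
cv = sd / x_bar * 100                         # coefficient of variability (%)
sem = sd / sqrt(n)                            # standard error of the mean

print(f"n = {n}, mean = {x_bar:.1f} kg")
print(f"SS = {ss:.2f} kg^2, variance = {variance:.4f} kg^2, SD = {sd:.2f} kg")
print(f"range = {data_range:.1f} kg, V = {cv:.0f} %, SEM = {sem:.2f} kg")
print(f"interval: {x_bar - sem:.2f} to {x_bar + sem:.2f} kg")
```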