THEME: VARIATION ROWS. AVERAGES
THEME TOPIC: Averages and variation rows are very important in everyday medical practice and in scientific work. We use them to analyze medical establishments qualitatively, to estimate physical development (average height, average weight), and to calculate social and demographic indices. Averages are used to find the central tendency of a phenomenon and to draw conclusions about its dispersion.
SEMINAR GOAL:
LEARNING: Students must be able to find the central tendency of a phenomenon, analyze the dispersion of events, and use averages in practical activity.
EDUCATIONAL: Generalizing through averages is very important in health care. Only the examination of general processes allows conclusions to be drawn about a population.
LEARNING OBJECTIVES
Student must know:
• Practical usage of descriptive statistics;
• Main rules for building variation rows;
• Main average values and methods of their calculation;
• Practical usage of averages.
Student must be able to:
• Build and describe a variation row;
• Calculate the arithmetic mean;
• Find the standard deviation and the coefficient of variation.
GENERAL INFORMATION
The information gathered in a study can often take different forms, such as frequency data (for example, the
number of votes cast for a candidate in elections) and scale data. These data are often initially arranged or
organized in such a way that they are difficult to read and interpret.
Descriptive statistics offers us some procedures that allow us to represent data in a readable and worthwhile form.
Some of these procedures allow us to obtain a graphical representation of the data, while others allow us to obtain a
set of parameters that summarize important properties of the basic data.
Independent observations of a variable that share the same qualitative characteristic, with the same or different frequencies, can be organized (grouped) into a table or into a variation row.
Simple variation row: every value of the variable has a frequency of one (1) and the total number of observations is no more than 30.
Grouped variation row: the values of the variable have frequencies greater than one (1) and the total number of observations is more than 30. In other words, every value has its own weight within the variation row.
Every variation row has:
• Variable (x): a measurable characteristic of the data, recorded in a correct way.
• Frequency (f): shows how often each value of the variable occurs in the group.
• Number of observations (n): mathematically, n = ∑f.
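The components listed above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name build_variation_row and the sample observations are hypothetical and are not taken from the seminar tasks.

```python
from collections import Counter


def build_variation_row(observations):
    """Return a ranged variation row as (x, f) pairs sorted by x.

    x is a value of the variable, f is how often it occurs.
    """
    counts = Counter(observations)
    return sorted(counts.items())


# Illustrative values only (hypothetical, not seminar data).
observations = [3, 5, 4, 3, 5, 5, 6, 4, 5, 3]

row = build_variation_row(observations)
n = sum(f for _, f in row)  # number of observations, n = sum of f

for x, f in row:
    print(f"x = {x}, f = {f}")
print("n =", n)
```

Sorting the pairs by x gives a ranged row; leaving them in the order of first appearance would give an unranged one.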
Variation rows can be of the following kinds:
• Ranged: the values are grouped and arranged in increasing or decreasing order of their numerical value.
• Unranged: the values are grouped but not arranged in increasing or decreasing order of their numerical value.
• Interval: each value of the variable is represented by an interval.
• Non-interval: each value of the variable is represented without an interval.
• Discrete: the values are obtained by counting and are represented by whole numbers.
• Continuous: the values are results of measurement and can be fractional numbers.
Range of the row (amplitude of the row): consider a set of observations of a quantitative variable X. If we denote by Xmax the value of the highest observation in the set and by Xmin the lowest value, then the range is given by:
range = Xmax - Xmin
When observations are grouped into classes, the range is equal to the difference between the centers of the two extreme classes. Let S1 be the center of the first class and Sk the center of the last class. The range is equal to:
range = Sk - S1
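Both versions of the range can be sketched in Python as follows. The function names and the sample numbers are hypothetical, and the class center is assumed to be the midpoint of the class limits.

```python
def value_range(values):
    """Range of ungrouped observations: Xmax - Xmin."""
    return max(values) - min(values)


def grouped_range(class_limits):
    """Range of grouped observations: Sk - S1.

    class_limits is an ordered list of (lower, upper) pairs;
    the center of a class is assumed to be its midpoint.
    """
    centers = [(lower + upper) / 2 for lower, upper in class_limits]
    return centers[-1] - centers[0]


# Illustrative values only (hypothetical).
values = [12, 7, 15, 9, 10]
classes = [(10, 20), (20, 30), (30, 40)]

print(value_range(values))     # Xmax - Xmin
print(grouped_range(classes))  # Sk - S1
```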
CENTRAL LIMIT THEOREM
The central limit theorem states that, under conditions of repeated sampling from a population, the sample means of random measurements tend to possess an approximately normal distribution. This is true whether the population distribution is normal or decidedly not normal. The central limit theorem is a fundamental theorem of statistics: it prescribes that the sum of a sufficiently large number of independent, identically distributed random variables approximately follows a normal distribution.
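A small simulation can make the theorem more tangible. The sketch below is an illustration, not a proof: it assumes an exponential population (clearly not normal, with mean 1 and standard deviation 1), and the sample size, number of repeats and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable


def sample_means(draw, sample_size, repeats):
    """Draw `repeats` samples of `sample_size` values and return their means."""
    return [
        statistics.fmean([draw() for _ in range(sample_size)])
        for _ in range(repeats)
    ]


# Exponential population: skewed, so decidedly not normal.
means = sample_means(lambda: random.expovariate(1.0), sample_size=50, repeats=2000)

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Share of sample means lying within 2 standard deviations of their centre.
within_2sd = sum(abs(m - mean_of_means) <= 2 * sd_of_means for m in means) / len(means)

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:  ", round(sd_of_means, 3))
print("share within ±2 SD:  ", round(within_2sd, 3))
```

If the theorem holds for this setup, the sample means should centre near the population mean, their spread should be close to 1/√50, and roughly 95% of them should lie within ±2 standard deviations.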
NORMAL DISTRIBUTION
A simple way of organizing the data is to list all the possible values between the highest and the lowest in order, recording the frequency (f) with which each score occurs. This forms a frequency distribution. The normal distribution plays a central role in the theory of probability and its statistical applications. Many measurements, such as the size or weight of individuals, IQ, etc., approximately follow a normal distribution. The normal distribution is frequently used as an approximation, either when normality is attributed to a distribution in
the construction of a model or when a known dist
NORMAL DISTRIBUTION CURVE: A Brief History of the Normal Curve
The discovery of the normal curve, also known as the "bell-shaped" curve or the Gaussian curve, can be dated to the 17th century, when Galileo Galilei, an Italian physicist and astronomer, noted that the measurement errors in astronomical observations were very systematic and that small errors were more likely to occur than large errors. In 1778, Pierre-Simon Laplace, while working on his famous central limit theorem, noted that the sampling distribution of the sample mean approximated a normal distribution and that the larger the sample size, the closer the distribution would be to a normal distribution, no matter what the population distribution might be. Also in the 18th century, a French statistician, Abraham de Moivre, who was often asked to do statistical consulting for gamblers, found that when the number of events (e.g., coin flips) increased, the shape of the binomial distribution would approximate a symmetrical and smooth curve. However, the mathematical formula for this curve was not discovered until the 19th century, by Adrien-Marie Legendre in 1808 and Carl Friedrich Gauss in 1809. The German 10 Deutsche Mark bill had Gauss's picture on it, along with the well-known bell-shaped normal curve and its formula.
Important Properties of a Normal Curve (normal distribution curve)
This curve has the following characteristics, which are important to know:
• The mode, the mean, and the median are all at the same point on the abscissa (the horizontal axis of the curve). That is to say, mode = mean = median for a normal distribution.
• The curve is symmetrical about the point on the abscissa that denotes the mean, the mode, or the median, with equal numbers of observations above and below that point.
• 95% of the distribution falls within approximately ±2 standard deviations of the mean. This leaves the remaining 5% split into two equal parts in the two tails of the distribution (the normal distribution is symmetrical); therefore only 2.5% of the distribution falls more than 2 standard deviations above the mean, and another 2.5% falls more than 2 standard deviations below the mean.
The standard deviation is particularly useful in normal distributions, because the proportion of elements in the normal distribution (i.e., the proportion of the area under the curve) is constant for a given number of standard deviations above or below the mean of the distribution:
• approximately 68% of the distribution falls within ±1 standard deviation of the mean,
• approximately 95% of the distribution falls within ±2 standard deviations of the mean,
• approximately 99.7% of the distribution falls within ±3 standard deviations of the mean.
Because these proportions hold true for every normal distribution, they should be memorized.
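These proportions can be checked numerically. The sketch below uses the standard relation between the normal distribution and the error function, share within ±k standard deviations = erf(k/√2); the function name share_within is hypothetical.

```python
import math


def share_within(k):
    """Share of a normal distribution lying within ±k standard deviations."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"±{k} SD: {100 * share_within(k):.2f}%")
```

The computed shares are slightly above the rounded 68% and 95% figures, which is why the text says "approximately".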
Averages
Measure of central tendency
A measure of central tendency is a statistic that summarizes a set of data for a quantitative variable. More precisely, it determines a fixed value, called the central value, around which the data tend to group. We use averages to measure central tendency: an entire distribution can be characterized by one typical measure that represents all the observations. The principal measures of central tendency are:
• Arithmetic mean (used when the data follow an arithmetic progression)
• Median
• Mode
• Geometric mean (used when the data follow a geometric progression)
Arithmetic mean (µ or $\bar{X}$)
The arithmetic mean allows us to characterize the center of the frequency distribution of a quantitative variable by considering all of the observations with the same weight given to each (in contrast to the weighted arithmetic mean). It is calculated by summing the observations and then dividing by the number of observations:
$$\bar{X} = \frac{\sum x}{n}$$
where
$\bar{X}$ is the simple arithmetic mean,
n is the number of observations,
x is each observation.
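As a minimal sketch of this calculation (the function name and the pulse values are hypothetical illustration data):

```python
def arithmetic_mean(observations):
    """Simple arithmetic mean: sum of the observations divided by n."""
    return sum(observations) / len(observations)


# Illustrative pulse rates in beats per minute (hypothetical values).
pulse = [72, 68, 75, 80, 70]
mean_pulse = arithmetic_mean(pulse)
print(mean_pulse)
```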
Simple arithmetic mean: used when the frequency of every value is 1; in other words, all observations have the same importance.
Weighted arithmetic mean: used when the frequencies of the values are greater than 1; in other words, the observations do not all have the same importance. We must assign a weight to each observation depending on its importance relative to the other observations. The weighted arithmetic mean equals the sum of the observations multiplied by their weights, divided by the sum of the weights:
$$\bar{X} = \frac{\sum x \cdot f}{n}, \qquad n = \sum f$$
where
$\bar{X}$ is the weighted arithmetic mean,
n is the total number of observations,
x is each observation (value of the variable),
f is the frequency (the relative importance, or weight, of the value).
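A short sketch of the weighted mean for a grouped row, assuming the row is stored as (x, f) pairs; the function name and the values are hypothetical.

```python
def weighted_mean(row):
    """Weighted arithmetic mean of a variation row given as (x, f) pairs."""
    n = sum(f for _, f in row)                # n = sum of f
    return sum(x * f for x, f in row) / n     # sum of x*f divided by n


# Illustrative grouped row (hypothetical values): value x occurs f times.
row = [(3, 3), (4, 2), (5, 4), (6, 1)]
print(weighted_mean(row))

# The same result is obtained from the simple mean of the expanded data.
expanded = [x for x, f in row for _ in range(f)]
print(sum(expanded) / len(expanded))
```

The second calculation shows why the frequencies act as weights: each value is simply counted f times.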
The main properties of the arithmetic mean are:
• It depends on the value of all observations.
• It is simple to interpret.
• It is the most familiar and the most used measure.
• It is frequently used as an estimator of the mean of the population.
• Its value can be distorted by outliers.
• The sum of squared deviations between each observation $x_i$ of a set of data and a value α, $\sum_{i=1}^{n} (x_i - \alpha)^2$, is minimal when α equals the arithmetic mean.
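The last property can be verified with a short derivation. This is a standard calculus argument added for illustration; the notation $S(\alpha)$ for the sum of squared deviations is introduced here and is not part of the original text.

```latex
S(\alpha) = \sum_{i=1}^{n} (x_i - \alpha)^2

\frac{dS}{d\alpha} = -2 \sum_{i=1}^{n} (x_i - \alpha)
                   = -2 \left( \sum_{i=1}^{n} x_i - n\alpha \right) = 0
\quad \Longrightarrow \quad
\alpha = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{X}

\frac{d^2 S}{d\alpha^2} = 2n > 0
\quad \text{(so the stationary point is a minimum)}
```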
Median (Md or Xe)
The median is a measure of central tendency defined as the value that is in the center of a set of observations arranged in increasing or decreasing order. We then find 50% of the observations on each side of the median. The median:
• Is easy to determine because only one ordering of the data is needed.
• Is easy to understand but less used than the arithmetic mean.
• Is not influenced by outliers, which gives it an advantage over the arithmetic mean if the series really has outliers.
• Is used as an estimator of the central value of a distribution, especially when it is asymmetric or has outliers.
• Minimizes the sum of absolute deviations: $\sum |x_i - \alpha|$, taken over each observation $x_i$ of a set of data and a value α, is minimal when α equals the median.
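The definition can be sketched in Python as follows. For an even number of observations the sketch averages the two middle values; this is a common convention assumed here, not something stated in the text. The data are hypothetical.

```python
def median(observations):
    """Median: the value in the center of the ordered observations."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even n: average of the two central values (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


# Illustrative lengths of stay in days (hypothetical values).
stay = [12, 9, 15, 10, 11]
stay_with_outlier = [12, 9, 60, 10, 11]  # 15 replaced by an outlier

print(median(stay), sum(stay) / len(stay))
print(median(stay_with_outlier), sum(stay_with_outlier) / len(stay_with_outlier))
```

Comparing the two printed lines shows the property described above: the outlier moves the arithmetic mean but leaves the median unchanged.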
Mode (X0 or Mo)
The mode is a measure of central tendency. The mode of a set of observations is the value that has the highest frequency. According to this definition, a distribution can have a unique mode (a unimodal distribution). In some situations a distribution may have several modes (a bimodal, trimodal, or multimodal distribution). The mode:
• Has practical interest because it is the most represented value of a set.
• Is nevertheless a rarely used measure.
• Has a value that is little influenced by outliers.
• Has a value that is strongly influenced by sampling fluctuations; it can vary strongly from one sample to another.
• May not be unique: there can be many (or no) modes in a data set.
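A sketch of finding the mode or modes is given below. The function name is hypothetical, and the rule "no mode when every value occurs exactly once" is one possible convention assumed for the illustration.

```python
from collections import Counter


def modes(observations):
    """Return all values with the highest frequency, in increasing order.

    Assumed convention: if every value occurs only once (and there is
    more than one distinct value), the data set is treated as having no mode.
    """
    counts = Counter(observations)
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(x for x, f in counts.items() if f == top)


# Illustrative data (hypothetical values).
print(modes([3, 5, 4, 3, 5, 5, 6]))  # unimodal
print(modes([1, 1, 2, 2, 3]))        # bimodal
print(modes([1, 2, 3]))              # no mode under the assumed convention
```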
Geometric mean
The geometric mean is defined as the n-th root of the product of n non-negative numbers:
$$\bar{X}_{geom} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$
We note that the logarithm of the geometric mean of a set of positive numbers is the arithmetic mean of the logarithms of these numbers (or the weighted arithmetic mean in the case of grouped observations):
$$\log \bar{X}_{geom} = \frac{1}{n} \sum_{i=1}^{n} \log x_i$$
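The link between the two definitions can be checked with a few lines of Python. The function names and the sample values are hypothetical; the sketch requires strictly positive numbers because it uses logarithms.

```python
import math


def geometric_mean_root(values):
    """Geometric mean as the n-th root of the product."""
    return math.prod(values) ** (1 / len(values))


def geometric_mean_log(values):
    """Geometric mean via the arithmetic mean of the logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("logarithms require strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


# Illustrative values forming a geometric progression (hypothetical).
values = [1, 10, 100]
print(geometric_mean_root(values))
print(geometric_mean_log(values))
```

The logarithmic form is usually preferred for long series, since the direct product can become very large or very small.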
Dispersion
A measure of dispersion allows us to describe a set of data for a particular variable by giving an indication of the variability of the values inside the data set. The measure of dispersion completes the description given by the measure of central tendency of a distribution.
If we observe different distributions, we can say that for some of them all the data are grouped within a fairly short distance of the central value; for others the distance is much greater.
The criteria of dispersion are:
• Standard deviation
• Coefficient of variation
Standard Deviation (σ or S)
The standard deviation is a measure of dispersion. It corresponds to the positive square root of the variance, where the variance is the mean of the squared deviations of each observation from the mean of the set of observations.
It is usually denoted by σ when it relates to a population and by S when it relates to a sample. In practice, the standard deviation σ of a population is estimated by the standard deviation S of a sample from this population.
For the simple variation row:
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}}$$
For the grouped variation row:
$$\sigma = \sqrt{\frac{\sum d^2 f}{n-1}}$$
where $d = x - \bar{X}$ is the deviation of each value from the arithmetic mean, i.e. $d = x_i - \bar{X}$.
The divisor n-1 is more appropriate for small samples, whereas n may be used for large samples.
If d is replaced by $(x_i - \bar{X})$, the formulas can be rewritten as follows.
For the simple variation row:
$$\sigma = \sqrt{\frac{1}{n-1}\left(\sum x^2 - \frac{1}{n}\left(\sum x\right)^2\right)}$$
For the grouped variation row:
$$\sigma = \sqrt{\frac{\sum x^2 f - n\bar{X}^2}{n-1}}$$
Coefficient of variation
The coefficient of variation is a measure of relative dispersion. It describes the standard deviation as a percentage of the arithmetic mean. This coefficient can be used to compare the dispersions of quantitative variables that are not expressed in the same units (for example, when comparing salaries in different countries, given in different currencies), or the dispersions of variables that have very different means.
The coefficient of variation (CV) is defined as the ratio of the standard deviation to the arithmetic mean for a set of observations. This coefficient is independent of the unit of measurement used for the variable.
$$CV = \frac{S}{\bar{X}} \times 100\% \quad \text{or} \quad CV = \frac{\sigma}{\mu} \times 100\%$$
where S is the standard deviation of the sample (σ is the standard deviation of the population) and $\bar{X}$ is the arithmetic mean of the sample (µ is the mean of the population).
The conclusion is worded as follows: the standard deviation represents …% of the arithmetic mean. Comparing coefficients of variation shows which distribution is more homogeneous.
• CV < 10%: low level of dispersion
• CV = 10-20%: medium level of dispersion
• CV > 20%: high level of dispersion
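A sketch of the coefficient of variation and of the three levels listed above is given below. The function names and the height values are hypothetical; the sample standard deviation (divisor n-1) is assumed.

```python
import statistics


def coefficient_of_variation(observations):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    return statistics.stdev(observations) / statistics.fmean(observations) * 100


def dispersion_level(cv):
    """Classify a CV (in percent) using the thresholds given in the text."""
    if cv < 10:
        return "low"
    if cv <= 20:
        return "medium"
    return "high"


# Illustrative heights (hypothetical values), in centimetres and in metres.
heights_cm = [118, 121, 124, 127, 130]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(cv_cm, 2), dispersion_level(cv_cm))
print(round(cv_m, 2), dispersion_level(cv_m))
```

The two results coincide (up to rounding), which illustrates that the coefficient does not depend on the unit of measurement.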
PRACTICAL SKILLS
THEORETICAL QUESTIONS
1. Describe the notion of a «variation row» and name its components.
2. Describe the types of variation rows. What is the difference between them?
3. Methods of calculating the arithmetic mean and the weighted arithmetic mean.
4. Name the main features of the mean.
5. Standard deviation: notion and practical use.
6. Standard deviation calculation for the simple mean and the weighted mean.
7. Coefficient of variation: notion and practical use.
8. What are the median and the mode? How do we find them?
9. Dispersion: explain the term and how we use it in practice.
10. How to find the central value in an interval variation row.
11. Normal distribution: explain it.
SITUATION MODEL TASKS
TASK № 1
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Size of the population in the district served by clinic № 1: 1240, 1350, 1210, 1305, 1116.
TASK № 2
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Results of weighing 35 newborn boys (in kg): 4.0; 3.2; 3.7; 4.5; 4.4; 3.0; 4.3; 3.3; 3.2; 4.1; 3.8; 3.8; 4.2; 4.1; 4.0; 3.3; 3.1; 2.5; 2.8; 3.2; 4.2; 3.5; 2.9; 3.5; 3.2; 3.1; 5.0; 2.7; 3.1; 3.3; 3.2; 3.0; 3.0; 3.2; 3.8.
TASK № 3
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
The height (cm) of 7-year-old schoolboys in school № K.
Height (cm)      Number of boys
114-115.9        4
116-117.9        7
118-119.9        9
120-121.9        12
122-123.9        16
124-125.9        14
126-127.9        8
128-129.9        6
130-131.9        3
TASK № 4
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Number of calls per day to the ambulance and emergency medical service of city N, for each of the 12 months of a calendar year: 165, 161, 167, 164, 163, 142, 143, 137, 156, 151, 147, 149.
TASK № 5
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
The weight (kg) of first-grade girls at school № 5 m. The girls are seven years old.
Body mass (kg)   Number of girls
20.0-21.9        6
22.0-23.9        8
24.0-25.9        12
26.0-27.9        16
28.0-29.9        9
30.0-31.9        5
32.0-33.9        4
TASK № 6
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Number of patients with fractures of the lower jaw, by duration of treatment, in the emergency dental hospital L.
Duration of treatment (days)   Number of patients
38-40                          3
41-43                          6
44-46                          10
47-49                          12
50-52                          11
53-55                          6
56-58                          2
TASK № 7
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Number of children over one year examined by 34 pediatricians: 58, 62, 60, 55, 62, 63, 65, 48, 49, 52, 54, 42, 51, 59, 57, 47, 48, 48, 40, 45, 51, 60, 39, 50, 58, 40, 51, 42, 54, 49, 47, 38, 45, 45.
TASK № 8
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Duration of treatment (in days) of patients with pneumonia in the pulmonology department of hospital № 2 m.: 25, 11, 12, 13, 24, 23, 23, 24, 21, 22, 21, 14, 14, 22, 20, 20, 15, 15, 16, 20, 20, 16, 16, 20, 17, 17, 19, 19, 19, 18, 18, 18, 18, 19, 19, 17, 17, 18, 18, 19, 26.
TASK № 9
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Number of patients with burns, by length of stay in the burn department.
Treatment interval (days)   Number of patients
1-5                         1297
6-10                        718
11-15                       884
16-20                       658
21-25                       297
26-30                       200
31-35                       118
36-40                       54
41-45                       40
TASK № 10
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Duration of fever (number of days with high temperature) in 32 patients with pneumonia: 3, 8, 14, 14, 7, 6, 4, 12, 13, 3, 4, 5, 10, 11, 5, 10, 10, 11, 12, 8, 9, 7, 7, 8, 9, 9, 7, 8, 12, 6, 10, 9.
TASK № 11
Describe the variation row; calculate the mode, median, and mean; find the standard deviation; draw conclusions.
Number of patients with strokes (paralysis of the whole body), by duration of treatment, in the neurological department of hospital L.
Duration of treatment (days)   Number of patients
38-40                          5
41-43                          8
44-46                          12
47-49                          18
50-52                          15
53-55                          7
56-58                          3
RECOMMENDED LITERATURE
1. Acad. Voronenko Y.V., Prof. Ruden V.V. "Social medicine and health care organization, recommendations for the practical lessons". Danylo Halytsky Lviv National Medical University, 2004.
2. B.S. Everitt (Institute of Psychiatry, King's College, University of London). "Medical Statistics from A to Z: A Guide for Clinicians and Medical Students". Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, Sao Paulo, 2006.
3. Bowers (Honorary Lecturer, School of Medicine, University of Leeds, UK). "Medical Statistics from Scratch: An Introduction for Health Professionals". John Wiley & Sons, Ltd, 2008.
4. Neil J. Salkind, K. Rasmussen (University of Kansas). "Encyclopedia of Measurement and Statistics". SAGE Publications, Inc., 2007.
5. Y. Dodge (Honorary Professor, University of Neuchatel, Switzerland). "The Concise Encyclopedia of Statistics". Springer Science+Business Media, 2008.
Graphical structure of the theme of the practical lessons
VARIATION ROW: a row of values of a variable (x); independent observations with the same qualitative characteristic and the same or different frequencies are grouped into the row.
TYPES OF VARIATION ROW
• SIMPLE: every value has a frequency of one (1) and the total number of observations is no more than 30.
• GROUPED: the values have frequencies greater than one (1) and the total number of observations is more than 30.
CHARACTERISTICS OF VARIATION ROW
• VARIABLE (x): a separate element (value); a measurable characteristic of the data, recorded in a correct way.
• FREQUENCY (f): shows how often each value occurs in the group.
• NUMBER OF OBSERVATIONS (n): the sum of observations, n = Σf.
KINDS OF VARIATION ROWS
• RANGED: the values are grouped and arranged in increasing or decreasing order of their numerical value.
• UNRANGED: the values are grouped but not arranged in increasing or decreasing order of their numerical value.
• INTERVAL: each value is represented by an interval.
• NON-INTERVAL: each value is represented without an interval.
• DISCRETE (DISCONTINUOUS): the values are obtained by counting and are represented by whole numbers.
• CONTINUOUS: the values are results of measurement and can be fractional numbers.
GROUPED VARIATION ROW COMPILING STAGES
1. Estimate the number of groups.
2. Estimate the interval.
3. Find the limits and the middle of each group.
4. Classify the observations into groups.
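The four stages can be sketched in Python as follows. This is only one possible reading of the stages: the number of groups is passed in by the user (the text does not give a rule for choosing it), the intervals are assumed to be of equal width, and the function name and the data are hypothetical.

```python
def compile_grouped_row(observations, number_of_groups):
    """Build a grouped variation row with equal-width intervals.

    Returns a list of dicts with the group limits, the middle and the frequency f.
    """
    lowest, highest = min(observations), max(observations)
    if highest == lowest:
        raise ValueError("all observations are equal; no interval can be formed")

    # Stage 2: interval width (stage 1, the number of groups, is given).
    width = (highest - lowest) / number_of_groups

    # Stage 3: limits and middle of each group.
    groups = []
    for i in range(number_of_groups):
        lower = lowest + i * width
        upper = lower + width
        groups.append({"lower": lower, "upper": upper,
                       "middle": (lower + upper) / 2, "f": 0})

    # Stage 4: classify each observation; the maximum goes into the last group.
    for x in observations:
        index = min(int((x - lowest) / width), number_of_groups - 1)
        groups[index]["f"] += 1
    return groups


# Illustrative treatment durations in days (hypothetical values).
days = [38, 40, 41, 43, 44, 45, 47, 49, 50, 52, 53, 58]
for group in compile_grouped_row(days, number_of_groups=4):
    print(group)
```

The middle of each group can then serve as the value x when the weighted arithmetic mean or the standard deviation of the grouped row is calculated.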
PRACTICAL USAGE OF VARIATION ROWS
• To characterize a distribution
• To find averages (average values)
AVERAGES: We use averages to measure the central tendency of a phenomenon. A measure of central tendency is a statistic that summarizes a set of data for a quantitative variable. More precisely, it determines a fixed value, called the central value, around which the data tend to group.
The measures of central tendency (TYPES OF AVERAGES) are:
• MODE ($X_0$): the value of the observation that has the highest frequency.
• MEDIAN ($X_e$): the value in the center of a set of observations arranged in increasing or decreasing order.
• MEAN (ARITHMETIC MEAN) ($\bar{X}$): characterizes the center of the frequency distribution of a quantitative variable.
• GEOMETRIC MEAN ($\bar{X}_{geom}$): the n-th root of the product of n non-negative numbers.
MEAN TYPES
• SIMPLE ARITHMETIC MEAN: used when the frequency of every value is 1; in other words, all observations have the same importance.
$$\bar{X} = \frac{\sum x}{n}$$
• WEIGHTED ARITHMETIC MEAN: used when the frequencies of the values are greater than 1; in other words, the observations do not all have the same importance. We must assign a weight to each observation depending on its importance relative to the other observations. The weighted arithmetic mean equals the sum of the observations multiplied by their weights, divided by the sum of the weights.
$$\bar{X} = \frac{\sum x f}{n}$$
THE MAIN CRITERIA OF ARITHMETIC MEAN
• Depends on the value of all observations.
• Is simple to interpret.
• Is the most familiar and the most used measure.
• Is frequently used as an estimator of the mean of the population.
• Has a value that can be distorted by outliers.
• The sum of squared deviations between each observation $x_i$ of a set of data and a value α, $\sum (x_i - \alpha)^2$, is minimal when α equals the arithmetic mean.
PRACTICAL USE OF ARITHMETIC MEAN
• For finding a general characteristic of the phenomenon.
• For comparing a characteristic of the phenomenon with the average, to find a deviation (for example, comparing weight with the age standards).
Examples: average pulse rate, average blood pressure, average newborn weight, average bed occupancy, average time of treatment.
Dispersion Criteria: A measure of dispersion allows us to describe a set of data for a particular variable by giving an indication of the variability of the values inside the data set. The measure of dispersion completes the description given by the measure of central tendency of a distribution.
If we observe different distributions, we can say that for some of them all the data are grouped within a fairly short distance of the central value; for others the distance is much greater.
STANDARD DEVIATION (σ or S)
For the simple variation row:
$$\sigma = \sqrt{\frac{\sum d^2}{n-1}}$$
For the grouped variation row:
$$\sigma = \sqrt{\frac{\sum d^2 f}{n-1}}$$
where
n is the number of observations in the population (if n > 30, (n-1) can be replaced with n);
f is the frequency of each value;
$d = x - \bar{X}$ is the deviation of each value from the arithmetic mean;
x is the value of the variable.
σ corresponds to the positive square root of the variance, where the variance is the mean of the squared deviations of each observation from the mean of the set of observations. It is a characteristic of dispersion: the greater the dispersion, the greater the standard deviation σ.
COEFFICIENT OF VARIATION (CV or C)
The coefficient of variation is a measure of relative dispersion. It describes the standard deviation as a percentage of the arithmetic mean. This coefficient can be used to compare the dispersions of quantitative variables that are not expressed in the same units (for example, when comparing salaries in different countries, given in different currencies), or the dispersions of variables that have very different means. It is defined as the ratio of the standard deviation to the arithmetic mean for a set of observations. This coefficient is independent of the unit of measurement used for the variable.
$$CV = \frac{\sigma}{\bar{X}} \times 100\%$$
• CV < 10%: low level of dispersion
• CV = 10-20%: medium level of dispersion
• CV > 20%: high level of dispersion