Lect 3 Relative values

Relative Values
Statistical Terms

Mean: the average of the data
- sensitive to outlying data

Median: the middle value of the ordered data
- not sensitive to outlying data

Mode: the most commonly occurring value

Range: the difference between the largest observation and
the smallest

Interquartile range: the spread of the middle 50% of the data
- commonly used for skewed data

Standard deviation: a single number that measures how much
the observations vary around the mean

Symmetrical data: data that follow a normal distribution
- (mean = median = mode)
- report mean, standard deviation and n

Skewed data: not normally distributed
- (mean ≠ median ≠ mode)
- report median and interquartile range
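
As an illustration, the terms above can be computed with Python's standard `statistics` module. This is only a minimal sketch: the data values and the helper name `summarize` are made up for the example, and the quartiles use the module's default interpolation method, which may differ slightly from other software.

```python
import statistics

def summarize(data):
    """Return the descriptive statistics named above for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,                   # interquartile range
        "sd": statistics.stdev(data),     # sample standard deviation
    }

# Hypothetical, right-skewed sample: one outlying value (40).
sample = [2, 3, 3, 4, 5, 6, 40]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this made-up sample the single outlier pulls the mean up to 9 while the median stays at 4, which is the reason median and interquartile range are reported for skewed data.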
Measures of Frequency of Events

Incidence
- The number of new events (e.g. death or a particular
disease) that occur during a specified period of time in
a population at risk for developing the events.

Incidence Rate
- A term related to incidence that reports the number of
new events that occur over the sum of time individuals
in the population were at risk for having the event (e.g.
events/person-years).

Prevalence
- The number of persons in the population affected by a
disease at a specific time divided by the number of
persons in the population at the time.
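
The difference between an incidence rate and a prevalence can be sketched in a few lines. The function names and all counts below are hypothetical and serve only to show which quantity goes in each denominator.

```python
def incidence_rate(new_events, person_years_at_risk):
    """New events divided by the total time at risk (events per person-year)."""
    return new_events / person_years_at_risk

def prevalence(existing_cases, population):
    """Share of the population affected at a specific time."""
    return existing_cases / population

# Hypothetical cohort: 30 new cases over 1,500 person-years at risk.
rate = incidence_rate(30, 1500)
# Hypothetical survey: 120 existing cases among 4,000 persons.
prev = prevalence(120, 4000)

print(f"incidence rate: {rate * 1000:.1f} per 1,000 person-years")
print(f"prevalence: {prev:.1%}")
```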
Measures of Association

Relative risk and cohort studies
- The relative risk (or risk ratio) is defined as the
ratio of the incidence of disease in the
exposed group divided by the corresponding
incidence of disease in the unexposed group.

Odds ratio and case-control studies
- The odds ratio is defined as the odds of
exposure in the group with disease divided by
the odds of exposure in the control group.
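
Both definitions can be written against a 2x2 table. The sketch below assumes the usual cell labels a, b, c, d (an assumption of this example, not notation from the slides) and uses invented counts.

```python
def relative_risk(a, b, c, d):
    """Incidence in the exposed, a / (a + b), over incidence in the unexposed, c / (c + d)."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """Odds of exposure among the diseased, a / c, over odds of exposure among controls, b / d."""
    return (a / c) / (b / d)

# Hypothetical 2x2 table:
#               disease   no disease
# exposed         a=30       b=70
# unexposed       c=10       d=90
a, b, c, d = 30, 70, 10, 90
rr = relative_risk(a, b, c, d)
odds = odds_ratio(a, b, c, d)
print(f"relative risk = {rr:.2f}, odds ratio = {odds:.2f}")
```

The relative risk needs incidences, so it fits cohort data; the odds ratio only needs the exposure odds among cases and controls, so it can be computed from a case-control study.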
Measures of Association

Absolute risk
- The relative risk and odds ratio provide a measure of risk
compared with a standard.
- Attributable risk (or risk difference) is a measure of absolute
risk. It represents the excess risk of disease in those exposed,
taking into account the background rate of disease. The
attributable risk is defined as the difference between the
incidence rates in the exposed and non-exposed groups.
- Population attributable risk is used to describe the excess
rate of disease in the total study population of exposed and
non-exposed individuals that is attributable to the exposure.

Number needed to treat (NNT)
- The number of patients who would need to be treated to
prevent one adverse outcome is often used to present the
results of randomized trials.
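
A short sketch of the absolute measures follows. The attributable risk is computed as defined above. The formulas used for population attributable risk (incidence in the total population minus incidence in the unexposed) and for NNT (reciprocal of the absolute risk reduction) are the usual textbook definitions and are assumptions of this example, since the slides do not spell them out. All counts are invented.

```python
def attributable_risk(incidence_exposed, incidence_unexposed):
    """Excess incidence among the exposed (risk difference)."""
    return incidence_exposed - incidence_unexposed

def population_attributable_risk(incidence_total, incidence_unexposed):
    """Excess incidence in the whole population (assumed textbook formula)."""
    return incidence_total - incidence_unexposed

def number_needed_to_treat(control_event_rate, treated_event_rate):
    """Reciprocal of the absolute risk reduction (assumed textbook formula)."""
    return 1 / (control_event_rate - treated_event_rate)

# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop the disease.
ar = attributable_risk(30 / 100, 10 / 100)
par = population_attributable_risk(40 / 200, 10 / 100)

# Hypothetical trial: 20/100 events in the control arm, 15/100 in the treated arm.
nnt = number_needed_to_treat(20 / 100, 15 / 100)

print(f"attributable risk = {ar:.2f}")
print(f"population attributable risk = {par:.2f}")
print(f"NNT = {nnt:.1f}")  # in practice usually rounded up to a whole patient
```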
Relative Values
Statistical processing of data on
morbidity, mortality, lethality, etc.
yields absolute numbers, which
specify the size of the phenomena.
Although absolute numbers have a
certain informative value, their use is
limited.
Relative Values
To assess the level of a phenomenon, or to
compare a parameter over time or
with the same parameter in another territory, it is
necessary to calculate relative values
(parameters, factors), which are the result
of a ratio between statistical numbers.
The basic arithmetic operation in calculating
relative values is division.
In medical statistics the
following kinds of relative parameters
are used:
- Extensive;
- Intensive;
- Relative intensity;
- Visualization;
- Correlation.
The extensive parameter, or
parameter of distribution,
characterizes the parts of a
phenomenon (its structure); that is, it
shows what share of the total
number of all diseases (or deaths) is
made up by a particular disease
included in that total.
Using this parameter, it is possible to
determine the structure of patients
according to age, social status, etc. This
parameter is usually expressed as a
percentage, but it can also be calculated in
parts per thousand when the share of
the given disease is small and the
percentage would be a
decimal fraction rather than an integer.
The general formula for its calculation is the
following:
extensive parameter = (part × 100) / total
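
The formula can be applied to every part of a whole to obtain its structure. The sketch below uses invented diagnosis groups and counts; the function name `extensive_parameter` and the `base` argument (100 for percent, 1000 for parts per thousand) are choices made for this example.

```python
def extensive_parameter(part, total, base=100):
    """Share of the total made up by one part, per `base` units."""
    return part * base / total

# Hypothetical structure of 200 registered cases by diagnosis group.
cases = {"respiratory": 90, "circulatory": 60, "digestive": 30, "other": 20}
total = sum(cases.values())
structure = {name: extensive_parameter(n, total) for name, n in cases.items()}

for name, share in structure.items():
    print(f"{name}: {share:.1f}%")

# A small share reads better in parts per thousand: 3 of 2,000 cases.
rare_share = extensive_parameter(3, 2000, base=1000)
print(f"rare disease: {rare_share:.1f} per thousand")
```

The shares of all parts add up to 100%, which is what distinguishes a structure (extensive) parameter from a frequency (intensive) one.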




The intensive parameter characterizes the frequency
or spread of a phenomenon.
It shows how frequently the given phenomenon
occurs in the given environment.
For example, how frequently a particular
disease occurs among the population or how frequently
people die from a particular disease.
To calculate the intensive parameter, it is
necessary to know the size of the population or the
contingent.

The general formula for the calculation is the
following:
intensive parameter = (phenomenon × 100 (or 1000; 10 000; 100 000)) / environment
For example, the general mortality rate:
general mortality rate = (number of deaths during the year × 1000) / size of the population
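
As a worked example of the intensive parameter, the general mortality rate can be computed as follows. The population and death counts are invented, and the function name and `base` argument are choices made for this sketch.

```python
def intensive_parameter(phenomenon, environment, base=1000):
    """Frequency of a phenomenon per `base` units of the environment it arises in."""
    return phenomenon * base / environment

# Hypothetical town: 150 deaths during the year in a population of 20,000.
deaths, population = 150, 20_000
general_mortality = intensive_parameter(deaths, population, base=1000)
print(f"general mortality rate: {general_mortality} per 1,000 population")
```

Unlike the extensive parameter, the denominator here is the environment that produces the phenomenon (the population), not the total of the phenomenon itself.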


Parameters of relative intensity represent the
numerical ratio of two or more structures of
the same elements of the set under study.
They make it possible to determine the degree of correspondence
(predominance or reduction) of similar attributes
and are used as an auxiliary technique in those
cases where direct intensive parameters
cannot be obtained or where it is necessary to
measure the degree of disproportion in the
structure of two or more related processes.



The parameter of correlation characterizes the
relation between diverse values.
Examples include the parameters of average bed
occupancy, nurse staffing, etc.
The technique of calculating the correlation
parameter is the same as for the intensive
parameter. However, in an intensive parameter the
number that stands in the numerator is
included in the denominator, whereas in a
parameter of correlation the numerator and
denominator are different sets.


The parameter of visualization characterizes
the relation of each of the compared values to the
initial level, taken as 100. This parameter is
used for convenience of comparison, and also
when it is enough to show the direction of a process (increase or
reduction) without showing the level or the absolute numbers
of the phenomenon.
It can be used to characterize the dynamics
of phenomena, for comparison between separate
territories or between different groups of the population, and
for the construction of graphs.
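
The parameter of visualization amounts to rescaling a series so that its initial value equals 100. A minimal sketch, with an invented yearly series and a hypothetical function name:

```python
def visualization_parameters(values):
    """Express each value relative to the first one, which is taken as 100."""
    base = values[0]
    return [v * 100 / base for v in values]

# Hypothetical incidence per 1,000 population in four consecutive years.
incidence_by_year = [50, 55, 45, 60]
indices = visualization_parameters(incidence_by_year)
print(indices)  # [100.0, 110.0, 90.0, 120.0]
```

The result shows the direction and relative size of the change (plus 10%, minus 10%, plus 20% against the first year) without revealing the underlying level.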
SIMULATION
Consider a box containing chips or cards,
each of which is numbered either 0 or 1.
We want to take a sample from this box in
order to estimate the percentage of the
cards that are numbered with a 1. The
population in this case is the box of cards,
which we will call the population box. The
percentage of cards in the box that are
numbered with a 1 is the parameter π.
SIMULATION
In the Harris study the parameter π is
unknown. Here, however, in order to see
how samples behave, we will make our
model with a known percentage of cards
numbered with a 1, say π = 60%. At the
same time we will estimate π, pretending
that we don’t know its value, by examining
25 cards in the box.
SIMULATION
We take a simple random sample with replacement
of 25 cards from the box as follows. Mix the box of
cards; choose one at random; record it; replace it;
and then repeat the procedure until we have
recorded the numbers on 25 cards. Although
survey samples are not generally drawn with
replacement, our simulation simplifies the analysis
because the box remains unchanged between
draws; so, after examining each card, the chance
of drawing a card numbered 1 on the following
draw is the same as it was for the previous draw, in
this case a 60% chance.
SIMULATION
Let’s say that after drawing the 25 cards this way,
we obtain the following results, recorded in 5
rows of 5 numbers:
SIMULATION
Based on this sample of 25 draws, we want to guess
the percentage of 1’s in the box. There are 14
cards numbered 1 in the sample. This gives us a
sample percentage of p = 14/25 = 0.56 = 56%. If this is
all of the information we have about the population
box, and we want to estimate the percentage of 1’s
in the box, our best guess would be 56%. Notice
that this sample value p = 56% is 4 percentage
points below the true population value π = 60%.
We say that the random sampling error (or simply
random error) is -4%.
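
The drawing procedure and the error calculation described above can be imitated on a computer. The sketch below is one possible way to do it, assuming that a pseudo-random number below π counts as drawing a card numbered 1; the function names and the seed are arbitrary choices for this example, and a different seed gives a different sample.

```python
import random

def draw_sample(pi=0.60, n=25, rng=None):
    """Draw n cards with replacement from a box in which a fraction pi are 1s."""
    rng = rng or random.Random()
    return [1 if rng.random() < pi else 0 for _ in range(n)]

def sample_percentage(sample):
    """Fraction of cards in the sample that are numbered 1."""
    return sum(sample) / len(sample)

rng = random.Random(2024)            # fixed seed so the run can be repeated
sample = draw_sample(pi=0.60, n=25, rng=rng)
p = sample_percentage(sample)
error = p - 0.60                     # random sampling error

# Print the 25 draws in 5 rows of 5 numbers.
for row in range(5):
    print(*sample[row * 5:(row + 1) * 5])
print(f"p = {p:.0%}, random sampling error = {error:+.0%}")
```

Because every draw is made with replacement, each call to `rng.random()` has the same 60% chance of producing a 1, just as the box is unchanged between draws.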
ERROR ANALYSIS
An experiment is a procedure which
results in a measurement or
observation. The Harris poll is an
experiment which resulted in the
measurement (statistic) of 57%. An
experiment whose outcome depends
upon chance is called a random
experiment.
ERROR ANALYSIS
On repetition of such an experiment one
will typically obtain a different
measurement or observation. So, if the
Harris poll were to be repeated, the
new statistic would very likely differ
slightly from 57%. Each repetition is
called an execution or trial of the
experiment.
ERROR ANALYSIS
Suppose we made three more series of draws,
and the random sampling errors were +16%, 0%, and
+12%. The random sampling errors of the four
simulations would then average out to:
(−4% + 16% + 0% + 12%) / 4 = +6%
ERROR ANALYSIS

Note that the cancellation of the positive and
negative random errors results in a small average.
Actually with more trials, the average of the
random sampling errors tends to zero.
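
The claim that the average error shrinks with more trials can be checked by simulation. This is a rough sketch using the same box model (π = 60%, 25 draws per trial); the function names, the seed and the trial counts are arbitrary assumptions of the example.

```python
import random

def sampling_error(pi, n, rng):
    """Random sampling error of one trial: sample fraction of 1s minus pi."""
    ones = sum(1 for _ in range(n) if rng.random() < pi)
    return ones / n - pi

def mean_error(trials, pi=0.60, n=25, seed=0):
    """Average the signed random sampling errors over many repeated trials."""
    rng = random.Random(seed)
    return sum(sampling_error(pi, n, rng) for _ in range(trials)) / trials

for trials in (4, 100, 10_000):
    print(f"{trials:>6} trials: mean error = {mean_error(trials):+.4f}")
```

With only a handful of trials the mean can still be several percentage points away from zero; with thousands of trials the positive and negative errors cancel almost completely.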
ERROR ANALYSIS
So in order to measure a “typical size” of a random
sampling error, we have to ignore the signs. We
could just take the mean of the absolute values
(MA) of the random sampling errors. For the four
random sampling errors above, the MA turns out to
be
MA = (|−4%| + |16%| + |0%| + |12%|) / 4 = 32% / 4 = 8%
ERROR ANALYSIS
The MA is difficult to deal with theoretically because
the absolute value function is not differentiable at
0. So in statistics, and error analysis in general, the
root mean square (RMS) of the random sampling
errors is generally used. For the four random
sampling errors above, the RMS is
RMS = √[(4² + 16² + 0² + 12²) / 4] % = √104 % ≈ 10.2%
ERROR ANALYSIS
The RMS is a more conservative
measure of the typical size of the
random sampling errors in the
sense that MA ≤ RMS.
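
Both measures of typical size are easy to compute for the four random sampling errors quoted above (−4, 16, 0 and 12 percentage points). The function names in this sketch are illustrative.

```python
import math

def mean_absolute(errors):
    """Mean of the absolute values (MA) of the errors."""
    return sum(abs(e) for e in errors) / len(errors)

def root_mean_square(errors):
    """Square root of the mean of the squared errors (RMS)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

errors = [-4, 16, 0, 12]   # random sampling errors in percentage points
ma = mean_absolute(errors)
rms = root_mean_square(errors)
print(f"MA = {ma:.1f}%, RMS = {rms:.1f}%")
```

Squaring gives large errors more weight than taking absolute values does, which is why the RMS is never smaller than the MA.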
ERROR ANALYSIS
For a given experiment the RMS of all possible
random sampling errors is called the standard
error (SE). For example, whenever we use a
random sample of size n and its percentage p to
estimate the population percentage π, we have