Bio Statistics
PRESENTATION OF DATA
• TEXT FORM
• TABULATION
• DRAWINGS
TABULAR PRESENTATION
• A table is a systematic arrangement of
data into vertical columns and horizontal
rows.
• The process of arranging data into rows
and columns is called tabulation.
TABULATION
• Simple table
• Complex table
– Principles
• Table should be numbered
• Each table has a title: brief & self-explanatory
• Headings of columns & rows should be clear
• Data must be presented according to size, importance, chronologically, alphabetically or geographically.
• Table should not be too large
• Footnote may be given.
STATISTICAL TABLE
• THE TITLE
• THE STUB
• THE BOX-HEAD
• THE BODY
SIMPLE TABLE
• Table 1
• Population of Pakistan
Year    Population (millions)
1901    16.6
1911    19.4
1921    21.1
1931    23.6
1941    28.3
COMPOUND TABLE
• Table III
• Sex-wise fatality rate of untreated patients
Attribute    Men    Women    Total
Attacked     40     30       70
Deaths       12     8        20
%age died    30     26.7     28.6
COMPOUND TABLE
Table II
Shirt colour choices of medical students
Sex       White    Blue    Yellow    Green    Pink    Total
Male      60       125     20        10       75      290
Female    55       45      0         25       5       130
Total     115      170     20        35       80      420
Compound table
Table II
Shirt colour choices of medical students
Sex      White    Blue    Green    Pink    Yellow    Total
Boys     10       60      55       45      22        192
Girls    55       45      25       5       0         130
Total    65       105     80       50      22        322
TABLE I
Population of Punjab & Baluchistan (thousands)
          Punjab                       Baluchistan
Census    Male     Female    Total     Male    Female    Total
1961      13643    11938     25581     640     521       1161
1971      19942    17566     37508     1272    1133      2405
DATA
• Arrangement of data is based on
• Classification
• Purpose of table
• Alphabetically
• Geographically
• According to magnitude
• Historically
• Customary classes
• Progressive arrangement.
FREQUENCY DISTRIBUTION
• Is a tabular arrangement of data in which various items are arranged into classes and the no. of items falling in each class (class frequency) is stated.
• Grouped data
• Class limits
• Class interval
FREQUENCY DISTRIBUTION
• Data is split into groups called class intervals
• No. of items (frequency) is written in the adjacent column
FREQUENCY DISTRIBUTION TABLE
• TABLE II
• Age distribution of patients on Monday
Age      Number of patients
0-4      23
5-9      21
10-14    43
15-19    10
20-24    6
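A minimal sketch of how raw observations could be tallied into class intervals like those in the age table above. The raw ages are made up for illustration (the slides give only the finished table), and `class_interval` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical raw ages in years; NOT the data behind the table above.
ages = [2, 7, 11, 3, 14, 19, 22, 8, 12, 0, 13, 6, 10, 4, 17]

WIDTH = 5  # class width, matching intervals such as 0-4, 5-9, 10-14


def class_interval(age, width=WIDTH):
    """Return the label of the class interval that contains `age`."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


# Class frequency = number of items falling in each class.
frequency = Counter(class_interval(a) for a in ages)

# Print the classes in ascending order of their lower class limit.
for label in sorted(frequency, key=lambda s: int(s.split("-")[0])):
    print(label, frequency[label])
```

Each observation falls in exactly one class, so the class frequencies add up to the number of observations.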
FREQUENCY DISTRIBUTION TABLE
• TABLE II
• Weight of medical students of SZMC
Weight (Kg)     Number of students
35-39           42
40-44           35
45-49           83
50-54           70
55-59           36
60 and above    28
DESCRIPTIVE STATISTICS
• Descriptive statistics comprises those
methods concerned with collecting
and describing a set of data so as to
yield meaningful information.
STATISTICAL INFERENCE
• Statistical inference comprises those
methods concerned with the analysis
of a subset of data leading to
predictions or inferences about the
entire set of data.
ANALYSIS OF DATA
• When characteristic and frequency are both
variable
• Calculations are:
• Averages
• Percentiles
• Standard deviation,
• Standard error
• Correlation and
• Regression coefficients.
NORMAL
• Normal is not the mean or a central
value but the accepted range of
variation on either side of mean or
average.
– Normal BP is not the mean but a
range between 100 and 140 (mean
120 ± 20).
• Even higher or lower values may
occur by chance.
MEASURE OF CENTRAL LOCATION / TENDENCY
• Any measure indicating the centre of a set of
data or observations, arranged in an increasing
or decreasing order of magnitude.
• A single value which represents all the values of
the distribution in a definite way.
• Most commonly used measures of central
location are
– Mean
– Median
– Mode
MEASURE OF CENTRAL TENDENCY
“AVERAGE”
• What is the average or central value?
• How are the values dispersed around this
value?
• Degree of scatter?
• Is the distribution normal (shape of
distribution)?
AVERAGE
• Average value of a characteristic is the one
central value around which all other observations
are dispersed.
• 50% of observations lie above and
• 50% of values lie below the central value.
• Most of the normal observations lie close to
the central value
• A few of the too large or too small values lie
far away at the ends
• It helps to find which group is better off by
comparing the average of one group with
that of the other.
MEAN
• Most commonly used average.
• It is the value obtained by dividing the sum
of the values by their number, i.e.,
summing up all observations and
dividing the total by the no. of observations
MEAN
• It implies the arithmetic average or arithmetic
mean, which is obtained by summing up
all the observations and dividing by the
total number of observations, e.g.
• ESRs of 7 patients are 7, 5, 4, 6, 4, 5, 9
• Mean = (7+5+4+6+4+5+9)/7 = 40/7 = 5.71
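The ESR example can be checked with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# ESR readings of the 7 patients from the example above.
esr = [7, 5, 4, 6, 4, 5, 9]

total = sum(esr)         # sum of all observations
mean = total / len(esr)  # divide by the number of observations

print(total, round(mean, 2))
```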
MEDIAN
• When all observations are arranged in
either ascending or descending order, the
middle observation is called the median,
i.e. the mid value of the series
• Median is better indicator of central value
when one or more of the lowest or
highest observations are wide apart or
not so evenly distributed.
MEDIAN
• 83, 75, 81, 79, 71, 95, 75, 77, 84, 79, 75,
71, 73, 91, 93.
• 71, 71, 73, 75, 75, 75, 77, 79, 79, 81, 83,
84, 91, 93, 95.
• Median = 79
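As a sketch, the same steps (arrange in order, take the middle observation) can be written out in Python. With an odd number of observations there is a single middle value; for an even number, `statistics.median` from the standard library averages the two middle values.

```python
import statistics

# The 15 observations from the example above.
values = [83, 75, 81, 79, 71, 95, 75, 77, 84, 79, 75, 71, 73, 91, 93]

ordered = sorted(values)              # ascending order
middle = ordered[len(ordered) // 2]   # 8th of 15 values (odd n)

# Cross-check against the standard-library implementation.
library_median = statistics.median(values)
print(middle, library_median)
```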
MODE
• Most frequently occurring value or
observation in a series i.e. the most
common or most fashionable value.
• 85, 75, 81, 79, 71, 95, 75, 77, 75, 90,
71, 75, 79, 95, 75, 77, 84, 75, 81, 75.
• Mode = 75.
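A short sketch of finding the mode by counting how often each value occurs, using the 20 observations listed for this example.

```python
from collections import Counter

# The 20 observations from the mode example.
values = [85, 75, 81, 79, 71, 95, 75, 77, 75, 90,
          71, 75, 79, 95, 75, 77, 84, 75, 81, 75]

counts = Counter(values)
# most_common(1) returns the single most frequent value with its count.
mode, times = counts.most_common(1)[0]

print(mode, times)
```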
NORMAL DISTRIBUTION
• Normal curve
• Smooth, bell-shaped, bilaterally symmetrical
curve
• Total area = 1
• Mean, median and mode coincide (are equal).
• For the standard normal curve, mean = 0 and standard deviation = 1
• Area between X̄ ± 1 SD = 68.3%
• Area between X̄ ± 2 SD = 95.5%
• Area between X̄ ± 3 SD = 99.7%
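The areas under the normal curve can be computed rather than memorised. The sketch below uses the standard identity that the area within k SD of the mean equals erf(k/√2); the function name `area_within` is illustrative.

```python
import math


def area_within(k):
    """Fraction of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {100 * area_within(k):.2f}%")
```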
NORMAL DISTRIBUTION
[Figure: Admitted patients in SZH. X-axis: age group (0-9 to 60-69); Y-axis: no. of patients.]
POSITIVELY SKEWED
[Figure: Age-wise patients visiting SZH. X-axis: age group (0-9 to 60-69); Y-axis: no. of patients.]
NEGATIVELY SKEWED
[Figure: Age-wise patients visiting SZH. X-axis: age group (0-9 to 60-69); Y-axis: no. of patients.]
Normal distribution
[Figure: Histogram of weights of students. X-axis: weights; Y-axis: no. of students.]
NORMAL DISTRIBUTION
POSITIVELY SKEWED
NEGATIVELY SKEWED
VARIABILITY
• Biological data are variable
• Two measurements in man are variable
• Cure rates are not equal but variable
• Height of students in the same age group is not the same but variable
• Height of students in one area is not the same as in another place but variable
• Variability is essentially a normal character
• It is a biological phenomenon.
TYPES OF VARIABILITY
• Biological variability
• That occurs within certain accepted
biological limits. It occurs by chance.
– Individual variability
– Periodical variability
– Class, group or category variability
– Sampling variability or sampling error
REAL VARIABILITY
– When the difference between two readings
or observations or values of classes or
samples is more than the defined limits in
the universe, it is said to be real variability.
The cause is external factors, e.g. a
significant difference in cure rates may be
due to a better drug and not due to
chance.
Experimental variability
• Errors or differences due to materials,
methods, procedures employed in the
study or defects in the techniques involved
in the experiment.
– Observer error
– Instrumental error
– Sampling error.
MEASURES OF VARIABILITY
• How individual observations are dispersed
around the mean of a large series.
• Measures of variability of individual observations.
– Range
– Mean deviation
– Standard deviation
– Coefficient of variation
• Measures of variability of samples
– Standard error of mean
– Standard error of difference between two means
– Standard error of proportion
– Standard error of difference between two proportions
– Standard error of correlation coefficient
– Standard error of regression coefficient.
RANGE
• It is the difference between the highest
and lowest values or figures in a given
sample.
• Example: 83, 75, 81, 79, 71, 90, 75, 95, 77, 94.
• Range = 71 to 95, i.e. 95 − 71 = 24.
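A minimal sketch of the range calculation for the ten values in this example: the limits are the lowest and highest values, and the range is their difference.

```python
# The ten values from the range example.
values = [83, 75, 81, 79, 71, 90, 75, 95, 77, 94]

lowest, highest = min(values), max(values)
value_range = highest - lowest  # difference between the two extremes

print(lowest, highest, value_range)
```

Only the two extreme values enter the calculation, which is why a single unusual observation can change the range greatly.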
RANGE
• Range defines the normal limits of a biological
characteristic.
• It is the simplest measure of dispersion
• Usually employed as a measure of variability in
medical practice
• It indicates the distance between the lowest and
highest.
• It ignores all observations except two extreme
values on which it is based.
• Normal range covers observations falling in 95%
confidence limits.
MEAN DEVIATION
• It is the average of the deviations from the
arithmetic mean, ignoring their signs.
• M.D. = Σ|X − X̄| / n
• Example:
• 83, 75, 81, 79, 71, 95, 75, 77, 84, 90.
MEAN DEVIATION
Diastolic BP (X)    Mean (X̄)    Deviation from mean (X − X̄)
83                  81           2
75                  81           -6
81                  81           0
79                  81           -2
71                  81           -10
95                  81           14
75                  81           -6
77                  81           -4
84                  81           3
90                  81           9
Total 810                        56 (signs ignored)
M.D. = 56/10 = 5.6
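The table can be reproduced with a short sketch. It also shows why the signs have to be ignored: the signed deviations from the mean always add up to zero.

```python
# The ten BP readings from the mean deviation table.
bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

mean = sum(bp) / len(bp)
signed = [x - mean for x in bp]       # deviations with their signs
absolute = [abs(d) for d in signed]   # deviations ignoring signs

mean_deviation = sum(absolute) / len(bp)

print(mean, sum(signed), sum(absolute), mean_deviation)
```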
STANDARD DEVIATION
• Most frequently used measure of deviation
• "Root-mean-square deviation"
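"Root-mean-square deviation" can be read as a recipe: take each deviation from the mean, square it, average the squares, then take the square root. The sketch below applies it to the ten BP readings from the mean deviation example. Whether to divide by n or by n − 1 is not stated in the slides; both are shown, and n − 1 is the usual choice when the data are a sample.

```python
import math
import statistics

# The ten BP readings from the mean deviation example.
bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

mean = sum(bp) / len(bp)
sum_of_squares = sum((x - mean) ** 2 for x in bp)  # squared deviations

sd_population = math.sqrt(sum_of_squares / len(bp))    # divide by n
sd_sample = math.sqrt(sum_of_squares / (len(bp) - 1))  # divide by n - 1

# Cross-check against the standard library.
assert math.isclose(sd_population, statistics.pstdev(bp))
assert math.isclose(sd_sample, statistics.stdev(bp))

print(round(sd_population, 2), round(sd_sample, 2))
```

Squaring removes the minus signs without discarding them arbitrarily, which is the "algebraic fallacy" the mean deviation runs into.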
SD
WEIGHTS OF STUDENTS (Kg)
Weight      No. of students
142.5       3
145         8
147.5       15
150         45
152.5       90
155         155
157.5       194
160 (M)     195
162.5       136
165         93
167.5       42
170         16
172.5       6
175         2
M = 160, SD = 5
[Figure: Histogram of weights of students. X-axis: weight; Y-axis: no. of students.]
NORMAL DISTRIBUTION
• Range, mean ± 1 SD = 160 ± 5 = 155 to 165 cm
– 68.27% of the observations
• Range, mean ± 2 SD = 160 ± 2×5 = 150 to
170 cm
– 95.45% of the observations
• Range, mean ± 3 SD = 160 ± 3×5 = 145 to
175 cm
– 99.5% of the observations in this sample (99.73% in theory)
• 3 observations < −3 SD & 2 observations >
+3 SD fall in the remaining 0.5% group (5 of 1000).
RELATIVE DEVIATE (Z)
• Deviation from the mean in a normal
distribution or curve is called relative or
standard normal deviate.
• It is measured in terms of SD & it tells us
how much an observation is higher or
smaller than the mean in terms of SD.
• Z = (observation − mean) / SD = (X − X̄) / SD
RANGE
• Easy to understand
• Easy to calculate
• Useful as a rough measure of variation
• Value may be greatly changed by an extreme value
• Highly unstable measure of variation.
MEAN DEVIATION
• Simple to understand and interpret.
• Affected by the value of every
observation
• Less affected by absolute variation
• Not suited for any mathematical
treatment.
SD
• Affected by value of every observation
• It avoids algebraic fallacy
• Less affected by fluctuations of sampling
than other measures of dispersion
• Has a definite mathematical meaning
• Has a great practical utility in sampling
and statistical inferences.
QUESTION
• Average weight of a baby at birth is 3.05 Kg
with SD of 0.39 Kg. In a normal distribution, would you consider
a) a wt. of 4 Kg as abnormal?
b) a wt. of 2.5 Kg as normal?
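One way to approach the question is to express each weight as a relative deviate, Z = (X − mean) / SD. The sketch below assumes that "normal" means lying within mean ± 2 SD (roughly the 95% limits); that cut-off is an assumption for illustration, and `z_score` is a hypothetical helper name.

```python
MEAN_KG = 3.05  # average birth weight from the question
SD_KG = 0.39    # standard deviation from the question
LIMIT = 2       # ASSUMPTION: normal limits taken as mean ± 2 SD


def z_score(x, mean=MEAN_KG, sd=SD_KG):
    """Deviation of x from the mean, measured in units of SD."""
    return (x - mean) / sd


for weight in (4.0, 2.5):
    z = z_score(weight)
    verdict = "outside" if abs(z) > LIMIT else "within"
    print(f"{weight} Kg: Z = {z:.2f}, {verdict} mean ± {LIMIT} SD")
```

Under that assumption, 4 Kg lies more than 2 SD above the mean, while 2.5 Kg lies within 2 SD of it.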
Percentage
• Is the number of units with a certain
characteristic divided by total no. of units
multiplied by 100.
Proportion
• It is a numerical expression that compares
one part of the study unit to the whole.
RATIO
• It is a numerical expression, which
indicates the relationship in quantity ,
amount or size between two or more parts.
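The three definitions can be illustrated with the figures from the fatality table of untreated patients earlier in the slides (70 attacked, 20 deaths; 40 men and 30 women attacked). This is only a sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Figures from the sex-wise fatality table earlier in the slides.
attacked = {"men": 40, "women": 30}
deaths = {"men": 12, "women": 8}

total_attacked = sum(attacked.values())
total_deaths = sum(deaths.values())

# Proportion: one part compared to the whole.
proportion_died = total_deaths / total_attacked

# Percentage: the same proportion multiplied by 100.
percentage_died = proportion_died * 100

# Ratio: relationship between two parts (men : women attacked).
ratio_men_to_women = Fraction(attacked["men"], attacked["women"])

print(round(proportion_died, 3), round(percentage_died, 1), ratio_men_to_women)
```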
SAMPLING
• Not possible to include each & every
member
• Not possible to examine all people of a
country
• Not possible to test the efficacy of a drug on all patients
• Cooking of rice
• Collection is costly & time consuming
• Blood test
POPULATION
• Population
• Sample
• Parameter: a value calculated from a
population
– Mean (μ)
– Standard deviation (σ)
• Statistic: a value calculated from a sample
– Mean (X̄)
– Standard deviation (s)
SAMPLING
• Sample is a part of population
• Estimation of population parameters
• To test the hypothesis about the
population from which the sample was
drawn.
• Inferences are applied to the whole
population, but generalizations are valid only if
the sample is sufficiently large & is
representative of the population (unbiased).
SAMPLING
• Sampling units are the smaller parts into
which the population is broken down; they are
distinct and non-overlapping, so that each
member / element of the population
belongs to one and only one sampling
unit.
• When a list of all individuals, households,
schools or industries is drawn up, it is
called a sampling frame.
Sample
• A representative sample is the one with
which we can draw valid inference
regarding the population parameters.
• It is representative of the population under
study
• Is large enough but not too large
• The selected elements must be properly
approached, included and interviewed.
CONFIDENCE INTERVAL
• It is the interval or range of values which most
likely encompasses the true population value.
• It is the extent that a particular sample value
deviates from the population
• A range or an interval around the sample value
• Range or interval is called confidence interval.
• Upper & lower limits are called confidence limits.
C.I
• A random sample of 11 three-year-old children
was taken; the sample mean was 16 Kg, the
standard deviation 2 Kg and the standard error
0.6 Kg. Find the C.I.
STANDARD ERROR
• Standard error is the standard deviation of
the means of different samples of
population.
• Standard error of the mean
• S.E. is a measure which enables us to judge
whether the mean of a given sample is
within the set confidence limits or not, in
a population.
• S.E. = SD/√n
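Applying S.E. = SD/√n to the C.I. example (n = 11, mean 16 Kg, SD 2 Kg) gives the standard error quoted there, and a confidence interval follows as mean ± multiplier × S.E. The multiplier of 2 used below (approximately 95% limits) is an assumption, since the slides do not state the confidence level; 1.96, or a t-value for a sample this small, would give slightly different limits.

```python
import math

n = 11       # sample size from the C.I. example
mean = 16.0  # sample mean, Kg
sd = 2.0     # sample standard deviation, Kg

standard_error = sd / math.sqrt(n)  # S.E. = SD / sqrt(n)

MULTIPLIER = 2  # ASSUMPTION: roughly 95% confidence limits
lower_limit = mean - MULTIPLIER * standard_error
upper_limit = mean + MULTIPLIER * standard_error

print(round(standard_error, 1), round(lower_limit, 1), round(upper_limit, 1))
```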
SAMPLING TECHNIQUES
• SIMPLE RANDOM SAMPLING
• SYSTEMATIC SAMPLING
• STRATIFIED SAMPLING
• MULTISTAGE SAMPLING
• CLUSTER SAMPLING
• MULTIPHASE SAMPLING
• CONVENIENCE SAMPLING
• QUOTA SAMPLING
• SNOWBALL SAMPLING
Sample size
• L = 2σ/√n
• √n = 2σ/L
• n = 4σ²/L²
Example:
1. Mean pulse rate = 70, pop. standard deviation (σ) = 8 beats. Calculate sample size.
2. Mean SBP = 120, SD = 10. Calculate n.
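A sketch of the formula n = 4σ²/L², taking L to be the allowable error in the estimate of the mean. The examples do not give a value for L, so L = 2 is a made-up value used here purely for illustration; a different L gives a different n. `sample_size` is a hypothetical helper name.

```python
import math


def sample_size(sigma, allowable_error):
    """n = 4 * sigma^2 / L^2, rounded up to a whole number of subjects."""
    return math.ceil(4 * sigma ** 2 / allowable_error ** 2)


# ASSUMPTION: allowable error L is not given in the examples;
# L = 2 is a hypothetical value for illustration only.
L = 2

n_pulse = sample_size(sigma=8, allowable_error=L)   # pulse rate, sigma = 8
n_sbp = sample_size(sigma=10, allowable_error=L)    # SBP, SD = 10

print(n_pulse, n_sbp)
```

Because L is squared in the denominator, halving the allowable error multiplies the required sample size by four.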
Sample size
• Qualitative data
• n = 4pq/L²
e.g.