Introduction to
Biostatistics
Part I: Basic Concepts
Pharma Edge
Helmut Schütz
BEBAC
Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, 29 – 30 January 2011
Wikimedia Commons • 2007 Sujit Kumar • Creative Commons Attribution-ShareAlike 3.0 Unported
Introduction to Biostatistics (1/3: Basic Concepts)
Biometry, Biometrics,
and Biostatistics
Introduced in 1947 by R.A. Fisher as ‘Biometry’ and later ‘Biometrics’
‘Biometry, the active pursuit of biological
knowledge by quantitative methods.’
The International Biometric Society
‘The terms “Biometrics” and “Biometry” have been used
since early in the 20th century to refer to the field of
development of statistical and mathematical methods
applicable to data analysis problems in the biological
sciences. Recently, the term “Biometrics” has also been
used to refer to the emerging field of technology devoted
to identification of individuals […]’
‘Biostatistics’
was introduced as a new term…
Biometry, Biometrics,
and Biostatistics
Statistics. A subject which most
statisticians find difficult but in which nearly
all physicians are expert.
Biostatistician. One who has neither the
intellect for mathematics nor the commitment for
medicine but likes to dabble in both.
Medical statistician. One who will not accept that
Columbus discovered America… because he said
he was looking for India in the trial plan.
Stephen Senn
Terminology I
[Figure: panels contrasting low bias and high bias with low variance and high variance]
Terminology II
data: discrete, continuous
nominal scale: distinctness
ordinal scale: distinctness + rank order
interval scale: distinctness + rank order + interval
ratio scale: distinctness + rank order + interval + ratio
(increasing information from nominal scale to ratio scale)
Data I
Nominal scale (aka categorical)
Sex, ethnicity,…
Statistics: mode, χ² test
Transformations: equality
Ordinal scale
School grades, disease states,…
Statistics: median, percentile, sign test, Wilcoxon test
Transformations: monotonic increasing order
Data II
Interval scale
Calendar dates, temperature in °C, IQ,…
Statistics: mean, variance (standard deviation), correlation, regression, ANOVA
Transformations: linear
Ratio scale
Measures with true zero point, temperature in K,…
Statistics: all of the above, geometric and harmonic mean, coefficient of variation
Transformations: multiplicative, logarithm
Examples from PK
Ordinal scale
tmax, tlag
Statistics: median, percentile, sign test, Wilcoxon test
Transformations: monotonic increasing order
Ratio scale
AUC, Cmax, λz,…
Statistics: mean, variance (standard deviation), correlation, regression, ANOVA, geometric and harmonic mean, coefficient of variation
Transformations: multiplicative, logarithm
Bell curve – and beyond
Abraham de Moivre (1667–1754),
Pierre-Simon Laplace (1749–1827)
Central limit theorem 1733, 1812
Carl F. Gauß (1777–1855)
Normal distribution 1795
William S. Gosset, aka Student
(1876–1937)
t-distribution 1908
Ronald A. Fisher (1890–1962)
Analysis of variance 1918
Frank Wilcoxon (1892–1965)
Nonparametric tests 1945
Statistical Distributions
[Figure: density histograms of samples with n = 12, n = 48, n = 128 and n = 1024 drawn from a normal distribution with mean = 100 and CV = 30 %]
Statistical Distributions
[Figure: density histograms of samples with n = 12, n = 48, n = 128 and n = 1024 drawn from a lognormal distribution with mean = 100 and CV = 30 %]
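The sampling behind the two figures above can be imitated with a few lines of code. This is only a sketch, not the code used for the slides: the seed is arbitrary, and the conversion from mean and CV to the log-scale parameters (`lognormal_params`, a name chosen here) uses the standard lognormal identities σ² = ln(1 + CV²) and µ = ln(mean) − σ²/2.

```python
import math
import random
import statistics


def lognormal_params(mean, cv):
    """Convert mean and CV of a lognormal variable to (mu, sigma) of its logarithm."""
    sigma2 = math.log(1.0 + cv ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)


def describe(sample):
    """Return sample mean, sample SD and CV in percent."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    return m, s, 100.0 * s / m


rng = random.Random(2011)  # arbitrary seed, only for reproducibility
mean, cv = 100.0, 0.30
mu, sigma = lognormal_params(mean, cv)

for n in (12, 48, 128, 1024):
    normal = [rng.gauss(mean, cv * mean) for _ in range(n)]
    lognormal = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    m1, s1, cv1 = describe(normal)
    m2, s2, cv2 = describe(lognormal)
    print(f"n={n:5d}  normal: mean={m1:6.1f} CV={cv1:5.1f}%  "
          f"lognormal: mean={m2:6.1f} CV={cv2:5.1f}%")
```

With small n the sample estimates scatter widely around mean = 100 and CV = 30 %; they settle down as n grows.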
Statistical Distributions
Normal Distribution
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Defined by location (aka central tendency) and dispersion
Population
Location: population mean µ
Dispersion: population variance σ²
Sample
Location: sample mean x̄
Dispersion: sample variance s²
Probability = 1 within −∞ and +∞
$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\, dt$$
Statistical Distributions
Lognormal Distribution
$$f(x) = \begin{cases} \dfrac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}} & x > 0 \\ 0 & x \le 0 \end{cases}$$
Defined by location and dispersion
Population
Location: population mean µ
Dispersion: population variance σ²
Sample
Location: sample mean x̄
Dispersion: sample variance s²
Probability = 1 within 0 and +∞
$$F(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{0}^{x} \frac{1}{t}\, e^{-\frac{(\ln t-\mu)^{2}}{2\sigma^{2}}}\, dt$$
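A minimal numerical check of the two density functions, assuming nothing beyond the formulas on the slides. The parameter values (µ = 100, σ = 20 for the normal; µ = 4.6, σ = 0.2 on the log scale for the lognormal), the integration limits and the trapezoidal helper `integrate` are illustrative choices, not taken from the slides.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of the normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def lognormal_pdf(x, mu, sigma):
    """Density of the lognormal distribution; zero for x <= 0."""
    if x <= 0:
        return 0.0
    expo = -((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2)
    return math.exp(expo) / (sigma * x * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, steps=20000):
    """Trapezoidal rule on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# Finite limits chosen wide enough that the neglected tails are tiny.
area_normal = integrate(lambda x: normal_pdf(x, 100.0, 20.0), 0.0, 200.0)
area_lognormal = integrate(lambda x: lognormal_pdf(x, 4.6, 0.2), 0.0, 400.0)
print(round(area_normal, 4), round(area_lognormal, 4))
```

Both areas come out very close to 1, which is the statement "Probability = 1" on the slides.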
Statistical Distributions
[Figure: density histograms of normal distributions (panels with n = 1000 or n = 2000): mean = 100, sd = 30, CV = 30 %; mean = 120, sd = 24, CV = 20 %; mean = 110, sd = 22.09, CV = 20.08 %; mean = 100, sd = 25.5, CV = 25.5 %; mean = 100, sd = 20, CV = 20 % (two panels)]
Statistical Distributions
[Figure: density histograms of a population and of five samples (n = 36 each) drawn from it]
population (N = 1e+06): mean = 100, sd = 20.08, CV = 20.08 %
sample 1: mean = 101, sd = 15.94, CV = 15.78 %
sample 2: mean = 100.1, sd = 20.41, CV = 20.38 %
sample 3: mean = 100.2, sd = 19.61, CV = 19.57 %
sample 4: mean = 96.69, sd = 19.31, CV = 19.97 %
sample 5: mean = 102.5, sd = 19.31, CV = 18.85 %
Statistical Distributions
[Figure: density histogram of the population (N = 1e+06, mean = 100, sd = 20.03, CV = 20.03 %) and of 20 samples (n = 36) drawn from the population]
Central Limit Theorem
If samples are drawn by a random process from a population with a normal distribution, the distribution of sample means is also normal.
The mean of the distribution of sample means is identical to the mean of the ‘parent population’, the population from which the samples are drawn.
The larger the sample size that is drawn, the ‘narrower’ will be the dispersion of the distribution of sample means.
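The three statements can be checked by simulation. The sketch below assumes a normal parent population with µ = 100 and σ = 20 (the values used on the neighbouring slides); the seed, the number of repeats and the function name `sd_of_sample_means` are arbitrary. The comparison value σ/√n is the standard error of the mean, a standard result that the slide only describes qualitatively.

```python
import math
import random
import statistics

rng = random.Random(42)  # arbitrary seed
MU, SIGMA = 100.0, 20.0


def sd_of_sample_means(n, repeats=2000):
    """Draw `repeats` samples of size n; return mean and SD of the sample means."""
    means = []
    for _ in range(repeats):
        sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return statistics.fmean(means), statistics.stdev(means)


for n in (4, 36, 144):
    m, s = sd_of_sample_means(n)
    print(f"n={n:4d}  mean of means={m:7.2f}  SD of means={s:5.2f}  "
          f"sigma/sqrt(n)={SIGMA / math.sqrt(n):5.2f}")
```

The mean of the sample means stays near 100, while their dispersion shrinks roughly in proportion to 1/√n.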
Normal Distribution I
[Figure: standard normal distribution (µ = 0, σ = 1; x-axis: z, y-axis: density), four panels with shaded areas: ±1σ p 68.27%; ±2σ p 95.45%; ±3σ p 99.73%; ±4σ p 99.99%]
Normal Distribution II
[Figure: standard normal distribution (µ = 0, σ = 1; x-axis: z, y-axis: density), four panels with shaded areas: p 90% ±1.645σ; p 95% ±1.960σ; p 99% ±2.576σ; p 99.9% ±3.291σ]
Normal Distribution III
[Figure: standard normal distribution (µ = 0, σ = 1; x-axis: z, y-axis: density), three panels with shaded areas: [±1.645] 90%; [−∞, +1.645] 95%; [−1.645, +∞] 95%]
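The probabilities and multipliers shown in the three figures above can be reproduced with Python's standard library. This is a sketch for checking the numbers; the helper names `coverage` and `two_sided_z` are chosen here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def coverage(k):
    """Probability mass within +/- k sigma."""
    return Z.cdf(k) - Z.cdf(-k)


def two_sided_z(p):
    """Multiplier z such that +/- z sigma covers probability p."""
    return Z.inv_cdf(1.0 - (1.0 - p) / 2.0)


for k in (1, 2, 3, 4):
    print(f"+/-{k} sigma: p = {100 * coverage(k):.2f}%")

for p in (0.90, 0.95, 0.99, 0.999):
    print(f"p = {100 * p:g}%: +/-{two_sided_z(p):.3f} sigma")

# One-sided 95% limit: the same 1.645 that bounds the two-sided 90% interval.
print(f"one-sided 95%: {Z.inv_cdf(0.95):.3f}")
```

The last line illustrates the point of the third figure: ±1.645 encloses 90 % two-sided, while each one-sided interval bounded by 1.645 encloses 95 %.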
Confidence Interval I
If we have drawn a sample from a population, we get the sample mean x̄ and the sample standard deviation s.
Can we make a prediction about the population mean?
Yes. That’s called a Confidence Interval (CI).
If σ is known:
$$[\mu] = \bar{x} \pm z\,\frac{\sigma}{\sqrt{n}}$$
Confidence Interval II
Example from previous slides: µ 100, σ 20
Sample sizes 36, z_{0.05} 1.960
Samples’ means | Confidence Interval
101.0 | 94.47 – 107.5
100.1 | 93.57 – 106.6
100.2 | 93.67 – 106.7
96.69 | 90.16 – 103.2
102.5 | 95.97 – 109.0
But generally we don’t know σ!
Help is on the way…
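The intervals in the table follow directly from x̄ ± z·σ/√n. A short sketch, assuming the values stated on the slide (σ = 20, n = 36, two-sided 95 %); the function name `ci_known_sigma` is chosen here.

```python
import math
from statistics import NormalDist


def ci_known_sigma(xbar, sigma, n, level=0.95):
    """Two-sided confidence interval for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


for xbar in (101.0, 100.1, 100.2, 96.69, 102.5):
    lower, upper = ci_known_sigma(xbar, sigma=20.0, n=36)
    print(f"{xbar:7.2f}   {lower:6.2f} - {upper:6.2f}")
```

All five intervals have the same width because σ and n are the same for every sample; only the centre moves with the sample mean.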
Student’s t Distribution
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{x^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}, \qquad \Gamma(x) = \int_{0}^{+\infty} t^{x-1} e^{-t}\, dt$$
Depends on one parameter, the ‘degrees of freedom ν’. In the simplest case df = n – 1.
The t distribution is ‘heavy tailed’ compared to the normal distribution. Small sample sizes are penalized.
Quickly approaches the normal distribution for df ≳ 30.
Allows calculation of a CI of the sample mean based on the sample standard deviation s.
Student’s t Distribution
[Figure: density of Student’s t distribution (x-axis: x, y-axis: density), four panels: ν = 1 (n = 2); ν = 5 (n = 6); ν = 11 (n = 12); ν = 35 (n = 36)]
Confidence Interval III
Example from previous slides: µ 100, σ 20
Sample sizes 36, z_{0.05} 1.960, t_{36−1, 0.05} 2.030
Samples’ mean | stand. dev. | CI based on z | CI based on t
101.0 | 15.94 | 94.47 – 107.5 | 95.61 – 106.4
100.1 | 20.41 | 93.57 – 106.6 | 93.19 – 107.0
100.2 | 19.61 | 93.67 – 106.7 | 93.57 – 106.8
96.69 | 19.31 | 90.16 – 103.2 | 90.16 – 103.2
102.5 | 19.31 | 95.97 – 109.0 | 95.97 – 109.0
$$[\mu] = \bar{x} \pm z\,\frac{\sigma}{\sqrt{n}} \qquad\qquad [\mu] = \bar{x} \pm t\,\frac{s}{\sqrt{n}}$$
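The t-based column can be recomputed from x̄ ± t·s/√n. The sketch below takes the critical value 2.030 for 35 degrees of freedom from the slide as a constant, because Python's standard library has no t quantile function; `ci_t` is a name chosen here.

```python
import math

T_CRIT_35 = 2.030  # t value for df = 36 - 1, as tabulated on the slide


def ci_t(xbar, s, n, t_crit):
    """Two-sided confidence interval for the mean based on the sample SD."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


samples = [(101.0, 15.94), (100.1, 20.41), (100.2, 19.61),
           (96.69, 19.31), (102.5, 19.31)]

for xbar, s in samples:
    lower, upper = ci_t(xbar, s, n=36, t_crit=T_CRIT_35)
    print(f"mean={xbar:6.2f} sd={s:5.2f}   {lower:6.2f} - {upper:6.2f}")
```

Unlike the z-based intervals, the width now differs between samples because each sample contributes its own s.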
Location parameters
x = [91,72,141,119,92,124,92,101,90,145]
ranks [3, 1, 9, 7, 4.5, 8, 4.5, 6, 2, 10]
ordered [72,90,91,92,92,101,119,124,141,145]
Mode: 92 (most frequent number)
Median: 96.5 (middle value)
If n is odd: value x_{(n+1)/2}
If n is even: value (x_{n/2} + x_{n/2+1})/2 = (92+101)/2
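The ranks, mode and median of the example data can be verified with a few lines. This is an illustrative sketch; `average_ranks` is a helper written here that gives tied values the mean of their ranks, as in the slide (the two 92s share rank 4.5).

```python
import statistics

x = [91, 72, 141, 119, 92, 124, 92, 101, 90, 145]


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1   # rank of the first occurrence
        count = ordered.count(v)       # number of tied values
        ranks.append(first + (count - 1) / 2)
    return ranks


print(average_ranks(x))
print(sorted(x))
print("mode:", statistics.mode(x))
print("median:", statistics.median(x))
```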
Location parameters
Harmonic mean: 101.9516
$$\bar{x}_{harm} = \frac{n}{\frac{1}{x_1}+\frac{1}{x_2}+\cdots+\frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}}$$
Geometric mean: 104.2814
$$\bar{x}_{geom} = \sqrt[n]{\prod_{i=1}^{n} x_i} = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = e^{\frac{1}{n}\sum_{i=1}^{n}\ln x_i}$$
Arithmetic mean: 106.7
$$\bar{x}_{arithm} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1+x_2+\cdots+x_n}{n}$$
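The three means of the example data can be recomputed as a check. The sketch uses the functions of Python's `statistics` module (Python 3.8 or later for `geometric_mean`); the data are the ten values from the preceding slide.

```python
import statistics

x = [91, 72, 141, 119, 92, 124, 92, 101, 90, 145]

harmonic = statistics.harmonic_mean(x)
geometric = statistics.geometric_mean(x)
arithmetic = statistics.mean(x)

print(f"harmonic   {harmonic:.4f}")
print(f"geometric  {geometric:.4f}")
print(f"arithmetic {arithmetic:.4f}")
```

For positive data that are not all equal, the ordering harmonic < geometric < arithmetic always holds, as it does here.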
A note on the
harmonic mean
Driving at 100 km/h from A to B; distance is 100 km.
Driving back at 50 km/h.
What is the average speed for the round-trip?
75 km/h, 70.71 km/h, or 66.67 km/h?
1 h for 100 km (A→B) and 2 h for 100 km (A←B); 200 km/3 h = 66.67 km/h.
Harmonic mean for rates!
$$\bar{x}_{harm} = \frac{2}{\frac{1}{100}+\frac{1}{50}} = \frac{2}{0.01+0.02} = 66.\dot{6}$$
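The three candidate answers correspond to the arithmetic, geometric and harmonic mean of the two speeds. A small sketch comparing them with the directly computed average speed (total distance divided by total time):

```python
import statistics

speeds = [100.0, 50.0]   # km/h, outbound and return
distance = 100.0         # km, one way

arithmetic = statistics.mean(speeds)
geometric = statistics.geometric_mean(speeds)
harmonic = statistics.harmonic_mean(speeds)

# Direct calculation: total distance divided by total driving time.
total_time = sum(distance / v for v in speeds)
average_speed = 2 * distance / total_time

print(f"arithmetic {arithmetic:.2f}  geometric {geometric:.2f}  "
      f"harmonic {harmonic:.2f}  direct {average_speed:.2f}")
```

Only the harmonic mean agrees with the direct calculation.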
Location parameters
Application of any location parameter always (!) implies an underlying distributional assumption.
Median: discrete (or unknown)
Arithmetic mean: normal distribution
Geometric mean: lognormal distribution
Harmonic mean: rates
Example from above sampled from a lognormal distribution
Arithmetic mean: 106.7 (too high!)
Geometric mean: 104.3 (correct)
Location parameters
[Figure: two boxplots of the example data, one on a linear scale (axis 60 to 140) and one on a logarithmic scale (axis 4.2 to 5.0)]
Nitpicking terminology
If we are estimating parameters of a distribution, we are using Estimators; e.g., the arithmetic mean is the unbiased estimator of the central tendency of the normal distribution.
The numerical outcomes (i.e., values one gives in the report) are Estimates.
Don’t write something like
‘The point estimator was 95.34 %.’
… when it was actually a maximum likelihood estimator based on least squares means in log-scale ☺
Dispersion parameters
ordered [72,90,91,92,92,101,119,124,141,145]
Quartiles (25%, 75%): Be cautious! Different methods implemented in software…
90.00, 124.00: SAS
91.25, 122.75: S, R, M$-Excel
90.75, 128.25: Minitab, SPSS, Phoenix/WinNonlin
90.92, 125.42: Hyndman & Fan (1996)
… and many others!
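Two of the four quartile pairs can be reproduced with the two interpolation methods offered by Python's `statistics.quantiles`. This is a sketch to illustrate the warning; the mapping of the Python method names to the software packages named on the slide is inferred from the matching numbers, and the SAS and Hyndman & Fan values use other definitions that this function does not offer.

```python
import statistics

x = [72, 90, 91, 92, 92, 101, 119, 124, 141, 145]

# 'inclusive' interpolates on positions (n - 1) * p + 1.
q_incl = statistics.quantiles(x, n=4, method="inclusive")
# 'exclusive' interpolates on positions (n + 1) * p.
q_excl = statistics.quantiles(x, n=4, method="exclusive")

print("inclusive:", q_incl[0], q_incl[2])
print("exclusive:", q_excl[0], q_excl[2])
```

The same ten numbers give 91.25/122.75 or 90.75/128.25 depending on the method, so a report should always state which definition was used.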
Dispersion parameters
Standard deviation (SD) of arithmetic mean: 24.2400
$$SD_{arithm} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^{2}}$$
SD of geometric mean: 23.8000
$$SD_{geom} = e^{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\ln x_i-\ln \bar{x}_{geom}\right)^{2}}}$$
Dispersion parameters
SD of harmonic mean: 22.8519
$$SD_{harm} = \sqrt{(n-1)\sum_{i=1}^{n}\left(H_i-\bar{H}\right)^{2}}, \qquad \bar{H} = \frac{1}{n}\sum_{i=1}^{n} H_i, \qquad H_i = \frac{n-1}{\left(\sum_{j=1}^{n}\frac{1}{x_j}\right)-\frac{1}{x_i}}$$
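Read this way, H_i is the harmonic mean of the data with the i-th value left out (a jackknife), and the SD is built from the spread of these leave-one-out means. The sketch below is one reading of the slide's formula applied to the ten example values; the function name `harmonic_sd_jackknife` is chosen here.

```python
import math

x = [91, 72, 141, 119, 92, 124, 92, 101, 90, 145]


def harmonic_sd_jackknife(values):
    """SD of the harmonic mean from leave-one-out harmonic means H_i."""
    n = len(values)
    recip_sum = sum(1.0 / v for v in values)
    # H_i: harmonic mean of the n - 1 values that remain without x_i.
    h = [(n - 1) / (recip_sum - 1.0 / v) for v in values]
    h_bar = sum(h) / n
    return math.sqrt((n - 1) * sum((h_i - h_bar) ** 2 for h_i in h))


sd_harm = harmonic_sd_jackknife(x)
print(f"{sd_harm:.4f}")
```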
Dispersion parameters
Coefficient of Variation (sometimes given in percent of mean):
$$CV\% = 100 \cdot \frac{SD}{\bar{x}}$$
Population (N = 10^6), parameters
µ 100.00
σ 20.00
CV% 20.00
Sample (n=36), parameters
x̄_arithm 106.70 | SD_arithm 24.24 | CV% 22.72
x̄_geom 104.28 | SD_geom 23.80 | CV% 22.83
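The arithmetic row of the table can be recomputed from the ten example values used on the earlier location-parameter slides (an assumption, since the table header states n = 36, but the numbers 106.70 and 24.24 match those ten values). The geometric row is not recomputed here.

```python
import statistics

x = [91, 72, 141, 119, 92, 124, 92, 101, 90, 145]

mean = statistics.mean(x)
sd = statistics.stdev(x)      # sample SD with n - 1 in the denominator
cv_percent = 100.0 * sd / mean

print(f"mean={mean:.2f}  SD={sd:.2f}  CV%={cv_percent:.2f}")
```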
A remark on Variances
Whilst means and variances are additive, standard deviations (and CVs as well) are not!
sample | mean | s² | s
1 | 100 | 400 | 20
2 | 100 | 900 | 30
1+2 | (100+100)/2 | ∑s²/2 | √650 = 25.5!! (not 25??)
[Figure: density histograms of sample 1 (mean = 100, sd = 20, CV = 20 %, n = 1000), sample 2 (mean = 100, sd = 30, CV = 30 %, n = 1000) and both combined (mean = 100, sd = 25.5, CV = 25.5 %, n = 2000)]
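The arithmetic of the table in a few lines. The sketch assumes, as on the slide, two samples of equal size with the same mean, so that the combined variance is simply the average of the two variances.

```python
import math

s1, s2 = 20.0, 30.0                           # standard deviations of samples 1 and 2

pooled_variance = (s1 ** 2 + s2 ** 2) / 2     # variances are averaged: (400 + 900) / 2
pooled_sd = math.sqrt(pooled_variance)        # back to the SD scale
naive_sd = (s1 + s2) / 2                      # averaging SDs directly is wrong

print(f"pooled variance={pooled_variance:.0f}  pooled SD={pooled_sd:.1f}  "
      f"naive average of SDs={naive_sd:.1f}")
```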
Arithm. vs. geom. means
[Figure: concentration vs. time (0 to 24) plotted as arithmetic mean (95% CI) and as geometric mean (95% CI), each on a linear and on a logarithmic concentration axis]
Median and quantiles
[Figure: concentration vs. time (0 to 24) plotted as median (5%, 25%, 75%, 95% quantile), on a linear and on a logarithmic concentration axis]
Degrees of freedom…
For every estimated parameter in a statistical model one degree of freedom is ‘lost’ from the number of samples (df = n – p).
Any model becomes useless if df=0, and impossible to fit if df<0 (p>n). Example:
Linear regression: Two parameters are fit (slope, intercept); since any line is defined by two points (x1, y1 | x2, y2), at least three data points are needed (df=1).
Degrees of freedom…
[Figure: plot of y (n = 100; mean = 100, sd = 20) and plots of ys for sample # 1 to sample # 15, each with n = 12, df = 10]
Degrees of freedom…
[Figure: plot of y (n = 100; mean = 100, sd = 20) and plots of ys for sample # 1 to sample # 15, each with n = 2, df = 0]
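Why df = 0 makes a model useless can be seen with a straight line fitted by ordinary least squares. In this sketch (the function `fit_line` and the data points are made up for illustration) two points are always fitted perfectly, so no information is left to estimate the residual variance; with three points one degree of freedom remains.

```python
def fit_line(points):
    """Least-squares line through (x, y) points.

    Returns slope, intercept, residual df and residual variance
    (None when no degrees of freedom are left).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit two parameters")
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((py - (intercept + slope * px)) ** 2 for px, py in points)
    df = n - 2  # two estimated parameters: slope and intercept
    residual_variance = rss / df if df > 0 else None
    return slope, intercept, df, residual_variance


print(fit_line([(0, 1), (1, 3)]))          # n = 2: perfect fit, df = 0
print(fit_line([(0, 1), (1, 3), (2, 4)]))  # n = 3: df = 1
```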
Visualize your data!
Anscombe’s Quartet (1973)
All datasets:
mean x 9.0, s²x 10
mean y 7.5, s²y 3.75
Corr yx 0.898
Regr yx: y = 3 + 0.5x
Don’t rely solely on numerical results.
[Figure: scatter plots of the four datasets (both axes 0 to 20). Anscombe1: correct; Anscombe2: wrong model: quadratic!; Anscombe3: correct model, but biased by outlier; Anscombe4: nonsense]
Data Transformation?
[Figure: MPH, 405 subjects. Histograms (density) and normal Q-Q plots (sample quantiles vs. theoretical quantiles) of AUC [ng×h/mL] (Shapiro-Wilk p = 3.2854e-14) and of ln(AUC [ng×h/mL]) (Shapiro-Wilk p = 0.26008)]
Clearly in favor of a lognormal distribution.
Shapiro-Wilk test highly significant for normal distribution (rejected).
Data Transformation!
[Figure: MPH, 12 subjects. Histograms (density) and normal Q-Q plots (sample quantiles vs. theoretical quantiles) of AUC [ng×h/mL] and of ln(AUC [ng×h/mL]); Shapiro-Wilk p = 0.85764 and p = 0.29668]
Data set from a real study. Both tests not significant (assumed distributions not rejected).
Tests not acceptable according to GLs; log-transformation based on prior knowledge (PK)!
Data Transformation
BE testing started in the early 1980s with an acceptance range of 80% – 120% of the reference based on the normal distribution.
Was questioned in the mid 1980s
Like many biological variables AUC and Cmax do not follow a normal distribution
Negative values are impossible
The distribution is skewed to the right
Might follow a lognormal distribution
Serial dilutions in bioanalytics lead to multiplicative errors
Data Transformation: PK
$$F_T = \frac{AUC_T \cdot CL_T}{D_T}, \qquad F_R = \frac{AUC_R \cdot CL_R}{D_R}$$
$$F_{rel}(BA) = \frac{AUC_T}{AUC_R}$$
Assumption 1: D1=D2 (D1/D2=1*)
Assumption 2: CL1=CL2
Data Transformation
‘Problems’ with log-transformation
If we transform the ‘old’ acceptance limits of 80% – 120%, we get –0.2231, +0.1823.
These limits are not symmetrical around 100% any more, the maximum power is obtained at e^(0.1823–0.2231) = 96%…
Solution:
lower limit = 1 – 0.20, upper limit = 1/lower limit
ln(0.80) = –0.2231 and ln(1.25) = +0.2231.
Symmetrical around 0 in the log-domain and around 100% in the backtransformed domain (e^0 = 1).
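The logarithms quoted above are easy to verify. A minimal sketch using natural logarithms, as on the slide:

```python
import math

old_lower, old_upper = 0.80, 1.20
new_lower = 1 - 0.20
new_upper = 1 / new_lower   # 1.25

print(f"ln(0.80) = {math.log(old_lower):+.4f}   ln(1.20) = {math.log(old_upper):+.4f}")
print(f"ln(0.80) = {math.log(new_lower):+.4f}   ln(1.25) = {math.log(new_upper):+.4f}")
```

Defining the upper limit as the reciprocal of the lower one makes the two log-limits equal in magnitude by construction, since ln(1/a) = −ln(a).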
Data Transformation
‘Problems’ with log-transformation
Discussion, whether more bioinequivalent formulations will pass due to ‘5% wider’ limits
lower limit = 1 – 0.20, upper limit = 1/lower: 80.00% – 125.00% (width 45.00%)
instead of keeping the ‘old’ width
lower limit = 1 – 0.1802, upper limit = 1/lower: 81.98% – 121.98% (width 40.00%)
or even become more strict by setting
upper limit = 1 + 0.20, lower limit = 1/upper: 83.33% – 120.00% (width 36.67%)
80% – 125% was chosen for convenience (!)
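The three alternative pairs of limits and their widths can be recomputed as follows. This is a sketch; the helper names `limits_from_lower` and `limits_from_upper` are chosen here.

```python
def limits_from_lower(lower):
    """Reciprocal limits defined via the lower limit."""
    return lower, 1.0 / lower


def limits_from_upper(upper):
    """Reciprocal limits defined via the upper limit."""
    return 1.0 / upper, upper


options = {
    "lower = 1 - 0.20": limits_from_lower(1 - 0.20),
    "lower = 1 - 0.1802": limits_from_lower(1 - 0.1802),
    "upper = 1 + 0.20": limits_from_upper(1 + 0.20),
}

for label, (lower, upper) in options.items():
    width = 100 * (upper - lower)
    print(f"{label:20s} {100 * lower:6.2f}% - {100 * upper:6.2f}%  (width {width:5.2f}%)")
```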
F Distribution
Allows comparison of variances (depending on ν) of two distributions. We will need that in ANOVA.
$$F(x \mid \nu_1,\nu_2) = \begin{cases} \dfrac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \cdot \nu_1^{\frac{\nu_1}{2}}\, \nu_2^{\frac{\nu_2}{2}} \cdot \dfrac{x^{\frac{\nu_1}{2}-1}}{\left(\nu_1 x+\nu_2\right)^{\frac{\nu_1+\nu_2}{2}}} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
$$\Gamma(x) = \int_{0}^{+\infty} t^{x-1} e^{-t}\, dt$$
Note that if one of the degrees of freedom = 1, there is a relationship to the t distribution:
$$F(\nu_1 = 1, \nu_2 = \nu) = \left(t(\nu)\right)^{2}$$
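The relationship between F and t can be checked numerically on the level of densities. If T follows t(ν), then T² follows F(1, ν), which implies f_F(x) = f_t(√x)/√x for x > 0 (a standard change-of-variables result, not stated on the slide). The sketch below implements both densities with `math.gamma`; ν = 10 and the evaluation points are arbitrary.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def f_pdf(x, nu1, nu2):
    """Density of the F distribution for x > 0.

    Returns 0 for x <= 0; the value at exactly x = 0 is not evaluated
    here because it diverges for nu1 < 2.
    """
    if x <= 0:
        return 0.0
    c = math.gamma((nu1 + nu2) / 2) / (math.gamma(nu1 / 2) * math.gamma(nu2 / 2))
    return (c * nu1 ** (nu1 / 2) * nu2 ** (nu2 / 2)
            * x ** (nu1 / 2 - 1) / (nu1 * x + nu2) ** ((nu1 + nu2) / 2))


nu = 10
for x in (0.5, 1.0, 2.5, 4.0):
    lhs = f_pdf(x, 1, nu)
    rhs = t_pdf(math.sqrt(x), nu) / math.sqrt(x)
    print(f"x={x:3.1f}  F(1,{nu}) density={lhs:.6f}  from t({nu}): {rhs:.6f}")
```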
Significance tests
In statistics (as well as in science in general) it is not possible to prove something.
We can only state a hypothesis and try to reject this so-called null hypothesis by evaluating data from an experiment.
Example:
  H0: µ1 = µ2 (no difference in means; null hypothesis)
  vs.
  Ha: µ1 ≠ µ2 (different means; alternative hypothesis)
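As an illustration of how data are evaluated against H0: µ1 = µ2, the sketch below computes a pooled-variance two-sample t statistic. The slide does not name a specific test. The pooled t statistic is one common choice and assumes equal variances in both groups. The function name and the sample values are hypothetical.

```python
import math
import statistics


def pooled_t_statistic(sample1, sample2):
    """Return (t, df) for the two-sample t statistic with pooled variance.

    Assumes independent samples with equal population variances.
    """
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.fmean(sample1), statistics.fmean(sample2)
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    group1 = [1, 2, 3, 4, 5]
    group2 = [2, 3, 4, 5, 6]
    t_value, df = pooled_t_statistic(group1, group2)
    print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

H0 is rejected if |t| exceeds the critical value of the t distribution for the chosen α and the given degrees of freedom. Otherwise we fail to reject H0, which is not a proof that it is true.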
α- vs. β-Error
All formal decisions are subject to two types of error:
  Error Type I (α-Error, Risk Type I)
  Error Type II (β-Error, Risk Type II)
Example from the justice system:

Verdict                                          | Defendant innocent | Defendant guilty
Presumption of innocence not accepted (guilty)   | Error type I       | Correct
Presumption of innocence accepted (not guilty)   | Correct            | Error type II
α- vs. β-Error
… in more statistical terms:

Decision                          | Null hypothesis true | Null hypothesis false
Null hypothesis rejected          | Error type I         | Correct (Ha)
Failed to reject null hypothesis  | Correct (H0)         | Error type II

In BE-testing the null hypothesis is bioinequivalence (µ1 ≠ µ2)!

Decision                          | Null hypothesis true | Null hypothesis false
Null hypothesis rejected          | Patients’ risk       | Correct (BE)
Failed to reject null hypothesis  | Correct (not BE)     | Producer’s risk
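The two decision tables can be written as small lookup functions. This is only a sketch to make the mapping explicit; the function names are hypothetical. In the BE version the null hypothesis is bioinequivalence, so `h0_true=True` means the formulations are truly bioinequivalent.

```python
def classify_decision(h0_true: bool, h0_rejected: bool) -> str:
    """Outcome of a generic significance test (first table)."""
    if h0_rejected:
        return "Error type I" if h0_true else "Correct (Ha)"
    return "Correct (H0)" if h0_true else "Error type II"


def classify_be_decision(h0_true: bool, h0_rejected: bool) -> str:
    """Outcome in BE-testing, where H0 is bioinequivalence (second table)."""
    if h0_rejected:
        return "Patients' risk" if h0_true else "Correct (BE)"
    return "Correct (not BE)" if h0_true else "Producer's risk"


if __name__ == "__main__":
    for h0_true in (True, False):
        for h0_rejected in (True, False):
            print(h0_true, h0_rejected,
                  classify_decision(h0_true, h0_rejected),
                  classify_be_decision(h0_true, h0_rejected),
                  sep=" | ")
```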
α- vs. β-Error
α-Error: Patients’ risk to be treated with a bioinequivalent formulation (H0 falsely rejected).
BA of the test compared to reference in a particular patient is risky either below 80% or above 125%.
If we keep the risk of particular patients at 0.05 (5%), the risk of the entire population of patients (<80% and >125%) is 2×α (10%), which is expressed as:
90% CI = 1 – 2×α = 0.90
[Figure: confidence intervals on a ratio axis with ticks at 0.6, 0.8, 1, 1.25, 1.67. Two panels each show a 95% one-sided CI (particular patient); a third panel shows the 90% two-sided CI = two 95% one-sided CIs (population of patients).]
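The link between α and the confidence level, and the resulting decision, can be sketched as follows. The function names and the example intervals are hypothetical. The limits 0.80 and 1.25 are the conventional ones discussed earlier, and treating a bound that lies exactly on a limit as acceptable is an assumption of this sketch.

```python
import math


def ci_level(alpha):
    """Two-sided confidence level built from two one-sided tests at level alpha."""
    return 1.0 - 2.0 * alpha


def ci_within_limits(ci_lower, ci_upper, lower=0.80, upper=1.25):
    """True if the whole confidence interval of the T/R ratio lies within the limits.

    Bounds that coincide with a limit are accepted (assumption of this sketch).
    """
    return lower <= ci_lower and ci_upper <= upper


if __name__ == "__main__":
    print(f"alpha = 0.05 -> {100 * ci_level(0.05):.0f}% CI")
    # Hypothetical 90% confidence intervals of the T/R ratio.
    print(ci_within_limits(0.85, 1.10))  # entirely inside the limits
    print(ci_within_limits(0.78, 1.02))  # lower bound below 0.80
```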
α- vs. β-Error
β-Error: Producer’s risk to get no approval for a bioequivalent formulation (H0 falsely not rejected).
Set in study planning to ≤0.2, where power = 1 – β ≥ 80%.
If power is set to 80%, one out of five studies will fail just by chance!

Decision                          | Null hypothesis true | Null hypothesis false
Null hypothesis rejected          | α 0.05               | BE
Failed to reject null hypothesis  | not BE               | β 0.20
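The statement "one out of five" follows directly from β = 1 – power = 0.20. A minimal sketch, with hypothetical function names and a seeded simulation that treats each study of a truly bioequivalent formulation as an independent trial passing with probability equal to the power:

```python
import math
import random


def expected_failures(power, n_studies):
    """Expected number of failing studies of a truly bioequivalent formulation."""
    return (1.0 - power) * n_studies


def simulate_failure_rate(power, n_studies, seed=1):
    """Fraction of simulated studies that fail; each passes with probability `power`."""
    rng = random.Random(seed)
    failures = sum(1 for _ in range(n_studies) if rng.random() >= power)
    return failures / n_studies


if __name__ == "__main__":
    print(expected_failures(0.80, 5))
    print(simulate_failure_rate(0.80, 100_000))
```

The simulated fraction should be close to 0.20 and approaches it as the number of simulated studies grows.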
Significance test (α- vs. β)
[Figure: Significance test: α, β. The critical value (S crit.) separates the areas β and α.]
Part I: Basic Concepts
Helmut Schütz
BEBAC
Consultancy Services for
Bioequivalence and Bioavailability Studies
1070 Vienna, Austria
To bear in Remembrance...
In these matters the only certainty is
that nothing is certain.
Gaius Plinius Secundus (Pliny the Elder)
The theory of probabilities is at bottom
nothing but common sense reduced to calculus.
Pierre-Simon Laplace
It is a good morning exercise for a research scientist
to discard a pet hypothesis every day before
breakfast.
It keeps him young.
Konrad Lorenz