Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Introduction to Biostatistics Part I: Basic Concepts π ε χ ε π Pharma Edge Helmut Schütz BEBAC Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 Attribution--ShareAlike 3.0 Unported Wikimedia Commons • 2007 Sujit Kumar • Creative Commons Attribution Introduction to Biostatistics (1/3: Basic Concepts) Concepts) 1 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Biometry, Biometrics, and Biostatistics Introduced in 1947 by R.A Fisher as ‘Biometry’ and later ‘Biometrics’ ‘Biometry, the active pursuit of biological knowledge by quantitative methods.’ The International Biometric Society ‘The terms “Biometrics” and “Biometry” have been used since early in the 20th century to refer to the field of development of statistical and mathematical methods applicable to data analysis problems in the biological sciences. Recently, the term “Biometrics” has also been used to refer to the emerging field of technology devoted to identification of individuals […]’ π ε χ ε π Pharma Edge ‘Biostatistics’ was introduced as a new term… Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 2 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Biometry, Biometrics, and Biostatistics Statistics. A subject which most statisticians find difficult but in which nearly all physicians are expert. Biostatistician. One who has neither the intellect for mathematics nor the commitment for medicine but likes to dabble in both. Medical statistician. One who will not accept that Columbus discovered America… because he said he was looking for India in the trial plan. π ε χ ε π Pharma Edge Stephen Senn Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 3 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Terminology I high bias low bias low variance high variance π ε χ ε π Pharma Edge bias Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 4 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Terminology II data discrete nominal scale distictness continuous ordinal scale interval scale ratio scale distictness + distictness + distictness + rank order rank order + rank order + interval interval + ratio π ε χ ε π Pharma Edge increasing information Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 5 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data I Nominal Sex, scale (aka categorial) ethnicity,… mode, χ² test Transformations: equality Statistics: Ordinal scale School grades, disease states,… Statistics: median, percentile, sign test, Wilcoxon test Transformations: monotonic increasing order π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 6 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data II Interval scale Calendar dates, temperature in °C, IQ,… Statistics: mean, variance (standard deviation), correlation, regression, ANOVA Transformations: linear Ratio scale Measures with true zero point, temperature in K,… Statistics: π ε χ ε π Pharma Edge all of the above, geometric and harmonic mean, coefficient of variation Transformations: multiplicative, logarithm Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 7 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Examples from PK Ordinal scale tmax, tlag Statistics: median, percentile, sign test, Wilcoxon test Transformations: monotonic increasing order Ratio scale AUC, Cmax, λz,… Statistics: π ε χ ε π Pharma Edge mean, variance (standard deviation), correlation, regression, ANOVA, geometric and harmonic mean, coefficient of variation Transformations: multiplicative, logarithm Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 8 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Bell curve – and beyond π ε χ ε π Pharma Edge Abraham de Moivre (1667–1754), Pierre-Simon Laplace (1749–1827) Central limit theorem 1733, 1812 Carl F. Gauß (1777–1855) Normal distribution 1795 William S. Gosset, aka Student (1876–1937) t-distribution 1908 Ronald A. Fisher (1890–1962) Analysis of variance 1918 Frank Wilcoxon (1892–1965) Nonparametric tests 1945 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 9 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions 0.015 0.005 0.000 50 100 150 200 0 100 150 normal distr.: mean = 100 CV = 30 % normal distr.: mean = 100 CV = 30 % 200 0.015 0.010 0.000 0.000 0.005 0.010 Density 0.015 0.020 n = 48 0.005 Density 50 n = 12 0.020 0 π ε χ ε π Pharma Edge 0.010 Density 0.010 0.000 0.005 Density 0.015 0.020 normal distr.: mean = 100 CV = 30 % 0.020 normal distr.: mean = 100 CV = 30 % 0 50 100 150 200 0 n = 128 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 50 100 150 200 n = 1024 10 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions 0.015 0.005 0.000 100 150 200 250 0 100 150 n = 12 n = 48 lognormal distr.: mean = 100 CV = 30 % lognormal distr.: mean = 100 CV = 30 % 200 250 200 250 0.015 0.000 0.005 0.010 Density 0.015 0.010 0.000 0.005 Density 50 0.020 50 0.020 0 π ε χ ε π Pharma Edge 0.010 Density 0.010 0.000 0.005 Density 0.015 0.020 lognormal distr.: mean = 100 CV = 30 % 0.020 lognormal distr.: mean = 100 CV = 30 % 0 50 100 150 200 250 0 50 n = 128 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 100 150 n = 1024 11 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions Normal f ( x) = Distribution 1 e σ 2π 1 x−µ − 2 σ 2 Defined by location (aka central tendency) and dispersion Population population mean µ 2 Dispersion:population variance σ Location: Sample sample mean x Dispersion:sample variance Location: π ε χ ε π Pharma Edge Probability s2 = 1 within -∞ and +∞ F ( x ) = Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 1 σ 2π ∫ x −∞ e 1 t −µ − 2 σ 12 • 57 2 dt Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions ( ln x − µ ) 1 − 2 e 2σ f ( x ) = σ x 2π Lognormal Distribution Defined by location and dispersion 0 2 x>0 x≤0 Population population mean µ 2 Dispersion:population variance σ Location: Sample sample mean x Dispersion:sample variance Location: Probability π ε χ ε π Pharma Edge s2 = 1 within 0 and +∞ Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 1 F ( x) = σ 2π ∫ x 0 1 e t 2 ln t − µ ) ( − 2σ 2 13 • 57 dt Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions Density 50 100 150 200 0 150 normal distr.: mean = 100 sd = 30 CV = 30 % 200 Density 0.000 0.010 0.010 0.020 normal distr.: mean = 120 sd = 24 CV = 20 % 0.000 50 100 150 200 0 50 100 150 n = 1000 normal distr.: mean = 110 sd = 22.09 CV = 20.08 % normal distr.: mean = 100 sd = 25.5 CV = 25.5 % 200 0.000 0.000 0.010 Density 0.020 n = 1000 0.010 Density 100 n = 1000 0.020 0 Density 50 n = 1000 0.020 0 π ε χ ε π Pharma Edge 0.010 0.000 0.010 0.000 Density 0.020 normal distr.: mean = 100 sd = 20 CV = 20 % 0.020 normal distr.: mean = 100 sd = 20 CV = 20 % 0 50 100 150 200 0 50 n = 2000 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 100 150 200 n = 2000 14 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions Density 100 150 200 0 150 sample 2: mean = 100.1 sd = 20.41 CV = 20.38 % sample 3: mean = 100.2 sd = 19.61 CV = 19.57 % 200 Density 0.010 0.000 0.010 0.000 100 150 200 0 50 100 150 n = 36 n = 36 sample 4: mean = 96.69 sd = 19.31 CV = 19.97 % sample 5: mean = 102.5 sd = 19.31 CV = 18.85 % 200 0.000 0.000 0.010 Density 0.020 50 0.010 Density 100 n = 36 0.020 0 Density 50 N = 1e+06 0.020 50 0.020 0 π ε χ ε π Pharma Edge 0.010 0.000 0.010 0.000 Density 0.020 sample 1: mean = 101 sd = 15.94 CV = 15.78 % 0.020 population: mean = 100 sd = 20.08 CV = 20.08 % 0 50 100 150 200 0 n = 36 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 50 100 150 200 n = 36 15 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Statistical Distributions π ε χ ε π Pharma Edge 0.020 0.015 0.000 0.005 0.010 Density 0.015 0.010 0.000 0.005 Density 0.020 0.025 20 samples drawn from population 0.025 population: mean = 100 sd = 20.03 CV = 20.03 % 0 50 100 150 200 0 N = 1e+06 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 50 100 150 200 n = 36 16 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Central Limit Theorem If samples are drawn by a random process from a population with a normal distribution, distribution of sample means is also normal. The mean of the distribution of sample means is identical to the mean of the ‘parent population’ – the population from which the samples are drawn. The higher the sample size that is drawn, the ‘narrower’ will be the dispersion of the distribution of sample means. π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 17 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Normal Distribution I Standard normal distribution: µ = 0, σ = 1 0.4 0.4 Standard normal distribution: µ = 0, σ = 1 0.3 0.1 0.0 -2 0 2 4 -4 0 2 z Standard normal distribution: µ = 0, σ = 1 Standard normal distribution: µ = 0, σ = 1 4 0.2 0.3 ±4σ p 99.99% 0.0 0.0 0.1 0.2 Density 0.3 ±3σ p 99.73% 0.1 Density -2 z 0.4 0.4 -4 π ε χ ε π Pharma Edge ±2σ p 95.45% 0.2 Density 0.2 0.0 0.1 Density 0.3 ±1σ p 68.27% -4 -2 0 2 4 -4 z Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 -2 0 2 4 z 18 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Normal Distribution II Standard normal distribution: µ = 0, σ = 1 0.4 0.4 Standard normal distribution: µ = 0, σ = 1 0.3 0.1 0.0 -2 0 2 4 -4 0 2 z Standard normal distribution: µ = 0, σ = 1 Standard normal distribution: µ = 0, σ = 1 4 0.2 0.3 p 99.9% ±3.291σ 0.0 0.0 0.1 0.2 Density 0.3 p 99% ±2.576σ 0.1 Density -2 z 0.4 0.4 -4 π ε χ ε π Pharma Edge p 95% ±1.960σ 0.2 Density 0.2 0.0 0.1 Density 0.3 p 90% ±1.645σ -4 -2 0 2 4 -4 z Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 -2 0 2 4 z 19 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Normal Distribution III π ε χ ε π Pharma Edge -2 0 z 2 4 0.3 0.2 Density 0.0 0.0 -4 [-1.645, +∞] 95% 0.1 0.2 Density 0.3 [-∞, +1.645] 95% 0.1 0.2 0.0 0.1 Density 0.3 [±1.645] 90% Standard normal distribution: µ = 0, σ = 1 0.4 Standard normal distribution: µ = 0, σ = 1 0.4 0.4 Standard normal distribution: µ = 0, σ = 1 -4 -2 0 2 4 z Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 -4 -2 0 2 4 z 20 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Confidence Interval I If we have drawn a sample from a population, we get the sample mean x and the sample standard deviation s. Can we make a prediction about the population mean? Yes. That’s called a Confidence Interval (CI). If σ is known: σ [µ] = x ± z n π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 21 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Confidence Interval II from previous slides: µ 100, σ 20 Sample sizes 36, z0.05 1.960 Example Samples’ means 101.0 100.1 100.2 96.69 102.5 Confidence Interval 94.47 107.5 93.57 106.6 93.67 106.7 90.16 103.2 95.97 109.0 generally we don’t know σ ! Help is on the way… But π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 22 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Student’s t Distribution Depends ν + 1 ν +1 Γ 2 − 2 x 2 f ( x) = 1+ ν ν νπ Γ 2 on one parameter, the ‘degrees of freedom ν ’. In the Γ ( x ) = ∫ t e dt most simple case df = n – 1. The t Distribution is ‘heavy tailed’ compared to the normal distribution. Small sample sizes are penalized. Approaches quickly the normal distribution for df >≈30. Allows calculation of a CI of the sample mean based on the sample standard deviation s. +∞ x −1 − t 0 π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 23 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Student’s t Distribution Student’s t distribution: ν = 5 0.4 0.4 Student’s t distribution: ν = 1 0.3 0.1 0.0 -2 0 2 4 -4 0 2 x Student’s t distribution: ν = 11 Student’s t distribution: ν = 35 4 0.2 0.3 n = 36 0.0 0.0 0.1 0.2 Density 0.3 n = 12 0.1 Density -2 x 0.4 0.4 -4 π ε χ ε π Pharma Edge n=6 0.2 Density 0.2 0.0 0.1 Density 0.3 n=2 -4 -2 0 2 4 -4 x Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 -2 0 2 4 x 24 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Confidence Interval III from previous slides: µ 100, σ 20 Sample sizes 36, z0.05 1.960, t36-1,0.05 2.030 Example Samples’ mean stand. dev. 101.0 15.94 100.1 20.41 100.2 19.61 96.69 19.31 102.5 19.31 π ε χ ε π Pharma Edge Confidence Intervals based on z based on t 94.47 107.5 95.61 106.4 93.57 106.6 93.19 107.0 93.67 106.7 93.57 106.8 90.16 103.2 90.16 103.2 95.97 109.0 95.97 109.0 [µ] = x ± z Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 σ n s [µ] = x ± t n 25 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Location parameters x = [91,72,141,119,92,124,92,101,90,145] ranks [3, 1, 9, 7, 4.5, 8, 4.5, 6, 2, 10] ordered [72,90,91,92,92,101,119,124,141,145] Mode: 92 (most frequent number) Median: 96.5 (middle value) If n=odd: value at xn/2 If n=even: value (xn/2+xn/2+1)/2 = (92+101)/2 π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 26 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Location parameters Harmonic mean: 101.9516 n n xharm = i =n = 1 1 1 1 + ⋯ ∑ x1 x2 xi i =1 xi Geometric mean: 104.2814 x geom = i =n n ∏x i =1 i = n x1 ⋅ x2 ⋯ xn = e i =n 1 ln xi n i =1 ∑ Arithmetic π ε χ ε π Pharma Edge xarithm mean: 106.7 1 i =n x1 + x2 ⋯ xi = ∑ xi = n i =1 n Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 27 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) A note on the harmonic mean Driving at 100 km/h from A to B; distance is 100 km. Driving back at 50 km/h. What is the average speed for the round-trip? 75 km/h 70.71 km/h 66.67 km/h 1 h for 100 km (A→B) and 2 h for 100 km (A←B); 200 km/3 h = 66.67 km/h. Harmonic mean 2 2 xharm = = = 66.6ɺ 1 1 0.01 + 0.02 for rates! + π ε χ ε π Pharma Edge 100 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 50 28 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Location parameters Application of any location parameter always (!) implies an underlying distributional assumption. Median: discrete (or unknown) Arithmetic mean: normal distribution Geometric mean: lognormal distribution Harmonic mean: rates Example from above sampled from a lognormal distribution Arithmetic mean: 106.7 (too high!) Geometric mean: 104.3 (correct) π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 29 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Location parameters Boxplot 60 4.2 80 4.4 100 4.6 120 4.8 140 5.0 Boxplot π ε χ ε π Pharma Edge linear scale Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 logarithmic scale 30 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Nitpicking terminology If we are estimating parameters of a distribution, we are using Estimators; e.g., the arithmetic mean is the unbiased estimator of the central tendency of the normal distribution. The numerical outcomes (i.e., values one give in the report) are Estimates. Don’t write something like ‘The point estimator was 95.34 %.’ … when it was actually a maximum likelihood estimator based on least squares means in logscale ☺ π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 31 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Dispersion parameters ordered [72,90,91,92,92,101,119,124,141,145] Quartiles (25%, 75%): Be cautious! Different methods implemented in software… 90.00, 124.00: SAS 91.25, 122.75: S, R, M$-Excel 90.75, 128.25: Minitab, SPSS, Phoenix/WinNonlin 90.92, 125.42: Hyndman & Fan (1996) … and many others! π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 32 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Dispersion parameters Standard deviation (SD) of arithmetic mean: 24.2400 SDarithm 1 i =n 2 = ( xi − x ) ∑ n − 1 i =1 SD of geometric mean: 23.8000 i =n SDgeom = e π ε χ ε π Pharma Edge ∑( 1 ln xi −ln x geom n −1 i =1 ) 2 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 33 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Dispersion parameters SD of harmonic mean: 22.8519 SDharm = π ε χ ε π Pharma Edge i =n ( n − 1) ∑ ( H i − H ) 2 i =1 1 i =n H = ∑ Hi n i =1 n −1 Hi = i =n 1 1 ∑ − j =1 x j xi Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 34 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Dispersion parameters Coefficient of Variation (sometimes given in percent of mean): CV % = 100 ⋅ SD x Population (N=106), parameters π ε χ ε π Pharma Edge Sample (n=36), parameters µ 100.00 xarithm 106.70 σ 20.00 SDarithm 24.24 SDgeom 23.80 CV% 20.00 CV % 22.72 CV % 22.83 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 x geom 104.28 35 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) A remark on Variances Whilst means and variances are additive, standard deviations (and CVs as well) are not! sample 1 2 mean 100 100 1+2 (100+100)/2 ∑s²/2 √650=25.5!! 0 50 100 n = 1000 150 200 0.000 0.010 Density 0.000 0.010 Density 0.020 mean = 100 sd = 25.5 CV = 25.5 % 0.020 mean = 100 sd = 30 CV = 30 % 0.020 0.010 0.000 Density 400 900 25?? mean = 100 sd = 20 CV = 20 % π ε χ ε π Pharma Edge s² s 20 30 0 50 100 150 n = 1000 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 200 0 50 100 150 200 n = 2000 36 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Arithm. vs. geom. means Arithmetic mean (95% CI) Arithmetic mean (95% CI) 25 10 concentration concentration 20 15 10 1 5 0 .1 0 4 8 12 16 20 24 0 4 8 time 12 16 20 24 time Geometric mean (95% CI) Geometric mean (95% CI) 25 10 15 10 concentration concentration 20 1 5 π 0 .1 ε 0 4 8 12 16 20 24 0 4 χ time ε Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs π Pharma Edge in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 8 12 16 20 24 time 37 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Median and quantiles Median (5%, 25%, 75%, 95% quantile) 25 concentration 20 15 10 Median (5%, 25%, 75%, 95% quantile) 5 10 0 4 8 12 time π ε χ ε π Pharma Edge 16 20 24 concentration 0 1 .1 0 4 8 12 16 20 24 time Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 38 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Degrees of freedom… For every estimated parameter in a statistical model one degree of freedom is ‘lost’ from the number of samples (df = n – p). Any model becomes useless if df=0, and impossible to fit if df<0 (p>n). Example: Linear π ε χ ε π Pharma Edge regression: Two parameters are fit (slope, intercept; since any line is defined by two points (x1/y1|x2,y2) at least three data points are needed (df=1). Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 39 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Degrees of freedom… 150 200 0 50 100 150 200 150 200 0 50 100 150 sample # 6 sample # 7 150 0 50 100 150 ys 200 0 50 ys 200 0 50 ys 0 50 ys 100 0 50 100 150 200 0 50 100 150 n = 12 , df = 10 sample # 8 sample # 9 sample # 10 sample # 11 150 0 50 100 150 ys 200 0 50 ys 200 0 50 ys 0 50 ys 0 50 100 0 50 100 150 200 0 50 100 150 n = 12 , df = 10 sample # 12 sample # 13 sample # 14 sample # 15 150 200 0 50 100 150 n = 12 , df = 10 200 ys 0 50 ys 0 50 ys 0 50 ys 100 n = 12 , df = 10 200 150 n = 12 , df = 10 150 n = 12 , df = 10 150 n = 12 , df = 10 50 200 150 n = 12 , df = 10 150 n = 12 , df = 10 150 n = 12 , df = 10 50 200 150 sample # 5 150 sample # 4 150 n = 12 , df = 10 0 50 0 100 n = 12 , df = 10 150 0 50 n = 12 , df = 10 0 50 50 0 50 0 n = 100 150 0 ys 150 0 50 0 50 100 sample # 3 ys 150 ys 150 y 0 50 50 150 0 π ε χ ε π Pharma Edge sample # 2 150 sample # 1 mean = 100, sd = 20 0 50 100 150 n = 12 , df = 10 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 200 0 50 100 150 200 n = 12 , df = 10 40 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Degrees of freedom… 150 200 0 50 100 150 200 200 0 50 100 sample # 7 150 0 50 100 150 200 150 200 ys 200 0 50 ys 200 0 50 ys 0 50 ys 100 150 150 sample # 6 150 sample # 5 150 sample # 4 0 50 100 150 200 0 50 100 n = 2 , df = 0 sample # 8 sample # 9 sample # 10 sample # 11 150 0 50 100 150 ys 200 0 50 ys 200 0 50 ys 0 50 ys 100 150 n = 2 , df = 0 150 n = 2 , df = 0 150 n = 2 , df = 0 0 50 100 150 200 0 50 100 150 n = 2 , df = 0 sample # 12 sample # 13 sample # 14 sample # 15 100 150 n = 2 , df = 0 200 0 50 100 150 n = 2 , df = 0 200 ys 0 50 ys 0 50 ys 0 50 ys 50 200 150 n = 2 , df = 0 150 n = 2 , df = 0 150 n = 2 , df = 0 0 50 0 150 n = 2 , df = 0 0 50 50 100 n = 2 , df = 0 150 0 50 n = 2 , df = 0 0 50 50 0 50 0 n = 100 150 0 ys 150 0 50 0 50 100 sample # 3 ys 150 ys 150 y 0 50 50 150 0 π ε χ ε π Pharma Edge sample # 2 150 sample # 1 mean = 100, sd = 20 0 50 100 150 n = 2 , df = 0 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 200 0 50 100 150 200 n = 2 , df = 0 41 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Visualize your data! Anscombe’s Quartet (1973) All datasets: meanx 9.0, s²x 10 meany 7.5, s²y 3.75 Corryx 0.898 Regryx y = 3 + 0.5x Don’t rely solely on numerical results. π ε χ ε π Pharma Edge 20 20 correct 15 15 10 10 5 5 0 wrong model: quadratic! 0 0 5 10 15 20 0 5 Anscombe1 20 15 20 10 5 5 0 0 10 20 15 20 Anscombe3 15 20 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 nonsense 15 10 5 15 Anscombe2 correct model, but biased by outlier 0 10 0 5 10 Anscombe4 42 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data Transformation? MPH, 405 subjects 1.0 0.5 Density 0.020 0.0 0.000 0.010 50 100 150 2.5 3.5 4.0 4.5 AUC [ng×h/mL] ln(AUC [ng×h/mL]) Shapiro-Wilk p= 0.26008 Normal Q-Q Plot Normal Q-Q Plot 60 4.0 80 Sample Quantiles 100 4.5 5.0 20 3.0 40 Sample Quantiles 3.0 Shapiro-Wilk p= 3.2854e-14 120 0 Clearly in favor of a lognormal distribution. ShapiroWilk test highly significant for normal distribution (rejected). 3.5 Density 0.030 1.5 MPH, 405 subjects π ε χ ε π Pharma Edge -3 -2 -1 0 1 2 3 -3 -2 Theoretical Quantiles Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 -1 0 1 2 3 Theoretical Quantiles 43 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data Transformation! MPH, 12 subjects 1.0 0.5 Density 0.020 0.0 0.000 0.010 50 100 150 2.5 3.5 4.0 AUC [ng×h/mL] ln(AUC [ng×h/mL]) Shapiro-Wilk p= 0.85764 Normal Q-Q Plot Normal Q-Q Plot 5.0 3.8 4.0 4.5 3.6 Sample Quantiles 50 40 3.0 3.2 30 20 Sample Quantiles π ε χ ε π Pharma Edge 3.0 Shapiro-Wilk p= 0.29668 60 0 3.4 Density 0.030 1.5 MPH, 12 subjects -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 -1.5 -1.0 Theoretical Quantiles Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 -0.5 0.0 0.5 Theoretical Quantiles 1.0 1.5 Data set from a real study. Both tests not significant (assumed distributions not rejected). Tests not acceptable according to GLs; logtransformation based on prior knowledge (PK)! 44 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data Transformation BE testing started in the early 1980s with an acceptance range of 80% – 120% of the reference based on the normal distribution. Was questioned in the mid 1980s Like many biological variables AUC and Cmax do not follow a normal distribution Negative values are impossible The distribution is skewed to the right Might follow a lognormal distribution π ε χ ε π Pharma Edge Serial dilutions in bioanalytics lead to multiplicative errors Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 45 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data Transformation: PK AUCT ⋅ CLT AUCR ⋅ CLR FT = , FR = DT DR AUCT Frel ( BA) = AUCR Assumption 1: Assumption 2: π ε χ ε π Pharma Edge D1=D2 (D1/D2=1*) CL1=CL2 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 46 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data Transformation ‘Problems’ with logtransformation If π ε χ ε π Pharma Edge we transform the ‘old’ acceptance limits of 80% – 120%, we get –0.2231, +0.1823. These limits are not symetrical around 100% any more, the maximum power is obtained at e0.1823–0.2231 = 96%… Solution: lower limit = 1 – 0.20, upper limit = 1/lower limit ln(0.80) = –0.2231 and ln(1.25) = +0.2231. Symetrical around 0 in the log-domain and around 100% in the backtransformed domain (e0=1). Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 47 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Data Transformation ‘Problems’ with logtransformation Discussion, π ε χ ε π Pharma Edge whether more bioinequivalent formulations will pass due to ‘5% wider’ limits lower limit = 1 – 0.20, upper limit = 1/lower 80.00% – 125.00% (width 45.00%) instead of keeping the ‘old’ width lower limit = 1 – 0.1802, upper limit = 1/lower 81.98% – 121.98% (width 40.00%) or even become more strict by setting upper limit = 1 + 0.20, lower limit = 1/upper 83.33% – 120.00% (width 36.67%) 80% – 125% was chosen for convenience (!) Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 48 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) F Distribution Allows comparison of variances (depending onν ) of two distributions. We will need that in ν ν ANOVA. Γ + ν1 ν2 ν 1 2 ν 2 2 F ( x |ν 1 ,ν 2 ) = 0 Γ ( x) = +∞ ∫t 1 2 ν1 −1 x2 2 2 ⋅ ⋅ ν 1 ν 1 ν x + ν ν1 +2ν 2 Γ Γ ( 1 2) 2 2 x≥0 x<0 x −1 − t e dt 0 Note that if one of the degrees of freedom = 1, there is a relationshop to the t distribution: π ε χ ε π Pharma Edge F (ν 1 = 1,ν 2 = ν ) = ( t (ν ) ) 2 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 49 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Significance tests In statistics (as well as in science in general) it is not possible to prove something. We can only state a hypothesis and try to reject this so called null hypothesis by evaluating data from an experiment. Example: H0: µ1 = µ2 (no difference in means, null hypothesis) vs. Ha: µ1 ≠ µ2 (different means; alternative hypothesis) π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 50 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) α- vs. β-Error All formal decisions are subjected to two types of error: Type I (α-Error, Risk Type I) Error Type II (β -Error, Risk Type II) Example from the justice system: Error Verdict Presumption of innocence not accepted (guilty) Presumption of innocence accepted (not guilty) π ε χ ε π Pharma Edge Defendant innocent Defendant guilty Error type I Correct Correct Error type II Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 51 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) α- vs. β-Error … in more statistical terms: Decision Null hypothesis true Null hypothesis false Null hypothesis rejected Error type I Correct (Ha) Failed to reject null hypothesis Correct (H0) Error type II In BE-testing the null hypothesis is bioinequivalence (µ1 ≠ µ2)! Decision Null hypothesis rejected Failed to reject null hypothesis π ε χ ε π Pharma Edge Null hypothesis true Null hypothesis false Patients’ risk Correct (BE) Correct (not BE) Producer’s risk Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 52 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) α- vs. β-Error α-Error: Patients’ Risk to be treated with a bioinequivalent formulation (H0 falsely rejected) BA of the test compared to reference in a particular patient is risky either below 80% or above 125%. If we keep the risk of particular patients at 0.05 (5%), the risk of the entire population of patients (<80% and >125%) is 2×α (10%) is: 90% CI = 1 – 2×α = 0.90 95% one-sided CI π ε χ ε π Pharma Edge 0.6 0.8 1 1.25 1.67 particular patient 95% one-sided CI 0.6 0.8 1 1.25 1.67 particular patient Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 90% two-sided CI = two 95% one-sided 0.6 0.8 1 1.25 1.67 population of patients 53 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) α- vs. β-Error β-Error: Producer’s Risk to get no approval for a bioequivalent formulation (H0 falsely not rejected) in study planning to ≤0.2, where power = 1 – β = ≥80% If power is set to 80 % One out of five studies will fail just by chance! Set π ε χ ε π Pharma Edge α 0.05 BE not BE β 0.20 Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 54 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Significance test (α- vs. β) Significance test: α, β β π ε χ ε π Pharma Edge Scrit. α Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 55 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) Part I: Basic Concepts Helmut Schütz BEBAC π ε χ ε π Pharma Edge Consultancy Services for Bioequivalence and Bioavailability Studies 1070 Vienna, Austria [email protected] Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 56 • 57 Introduction to Biostatistics (1/3: Basic Concepts) Concepts) To bear in Remembrance... In these matters the only certainty is that nothing is certain. Gaius Plinius Secundus (Pliny the Elder) The theory of probabilities is at bottom nothing but common sense reduced to calculus. calculus Pierre-Simon Laplace It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast. It keeps him young. Konrad Lorenz π ε χ ε π Pharma Edge Biostatistics: Biostatistics: Basic concepts & applicable principles for various designs in bioequivalence studies and data analysis | Mumbai, Mumbai, 29 – 30 January 2011 57 • 57