Analysis of Stability Data with Equivalence Testing for Comparing New and Historical Processes Under Various Treatment Conditions
Ben Ahlstrom, Rick Burdick, Laura Pack, Leslie Sidor
Amgen Colorado, Quality Engineering
May 19, 2009

Agenda
1. Purpose of comparability for stability data
2. Problems with the p-value approach
3. Equivalence approach and acceptance criteria methods
4. Example

Example Data
Packaging data (Chow, Statistical Design and Analysis of Stability Studies, p. 116, Table 5.6)
– 2 package types (Bottle, Blister)
– 10 lots (5 for each package type)
– 6 time points (0 to 18 months)
[Figure: percent label claim vs. time (months) for the Blister and Bottle lots]

Comparability Analysis for Stability Data
Purpose – compare the rates of degradation, i.e., evaluate the slopes for the treatment conditions.
P-value analysis steps:
– Fit the regression lines (process*time interaction)
– Calculate the p-value for process*time
– Compare the p-value to α = 0.05
– Draw a conclusion about comparability
  • Pass (comparable) if p-value > 0.05
  • Fail (not comparable) if p-value < 0.05

P-value Analysis to Evaluate Comparability for Stability Data
Bottle vs. Blister: are the processes comparable?
[Figure: percent label claim vs. time (months), Bottle vs. Blister]

P-value Approach
Hypotheses
– H0: slopes are comparable
– HA: slopes are not comparable
If the p-value < 0.05, reject H0.
If the p-value > 0.05, fail to reject H0.
– This does not imply the slopes are comparable, only that there is not enough evidence to say they are different.

P-value Analysis to Evaluate Comparability for Stability Data
Packaging, Bottle vs. Blister: do we pass or fail the p-value test?
Pass: p = 0.8453
We compare the slopes using p-values (pass if p-value > 0.05, fail if p-value < 0.05).
[Figure: percent label claim vs. time (months), Bottle vs. Blister]

Problems with the P-value Approach
Reporting a p-value only tells us something about statistical significance.
– A statistically significant difference in slopes does not necessarily have any practical importance relative to patient safety or efficacy.
– P-values are non-informative because they do not quantify the difference in slopes in a manner that allows scientific interpretation of practical importance.
– A p-value approach provides a disincentive to collect more data and learn more about a process.

Equivalence Testing Method
1. Fit the model with all historical and new process data (including different storage conditions, orientations, SKUs, and container types).
2. Compute the difference in slopes for the desired comparison (Bottle vs. Blister).
3. Compute the 95% one-sided confidence limits around the difference observed over the time frame of interest.
4. If the confidence limits are enclosed by the equivalence acceptance criteria, conclude that the historical and new processes are comparable.

Statistical Model
Y_ijk = α_i + a_j(i) + β_i X_ijk + ε_ijk
– The parameters α_i and β_i are the overall regression parameters for the ith process.
– The random variables a_j(i) allow the intercepts to vary for each lot.
– X_ijk is the time value for process i, lot j, and time point k.
– The model can be extended to more levels.
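As a rough sketch of how a model of this form could be fit in practice (not the software used in the presentation), the snippet below uses Python's statsmodels MixedLM on a made-up stand-in for the packaging data; the column names and simulated values are assumptions, not part of the deck.

```python
# Minimal sketch (not the authors' code) of fitting the random-intercept model
# Y_ijk = alpha_i + a_j(i) + beta_i * X_ijk + e_ijk with statsmodels' MixedLM.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the packaging stability data: 2 package types,
# 5 lots each, 6 time points, percent label claim degrading over time.
rng = np.random.default_rng(0)
rows = []
for package, slope in [("Bottle", -0.29), ("Blister", -0.28)]:
    for lot in range(5):
        intercept = 102 + rng.normal(0, 0.5)          # lot-specific intercept
        for months in [0, 3, 6, 9, 12, 18]:
            y = intercept + slope * months + rng.normal(0, 0.3)
            rows.append({"package": package, "lot": f"{package}-{lot}",
                         "months": months, "pct_label": y})
df = pd.DataFrame(rows)

# Fixed effects: intercept and slope per package type (the process*time interaction);
# random effect: an intercept for each lot (lot labels are unique across packages).
fit = smf.mixedlm("pct_label ~ package * months", data=df, groups="lot").fit()
print(fit.summary())
# The 'package[T.Bottle]:months' coefficient estimates the difference in
# degradation slopes between the two package types.
```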
Statistical Equivalence Acceptance Criteria (EAC)
– The "goal post" is the space of expected historical performance.
– The "football" is the pair of 95% one-sided confidence limits around the difference between slopes over the time frame of interest.

Methods to Calculate Equivalence Acceptance Criteria (EAC)
Equivalence acceptance criteria provide a definition of practical importance.
The scientific client has the responsibility to determine that definition (based on science, safety, specifications, regulatory commitments, etc.); statistical methods can help establish a starting point for these decisions.
Three statistical methods:
– Method 1: Common cause variability
– Method 2: Excursion from product specification
– Method 3: Historic variability of slope estimates

Three Statistical Approaches for Defining EAC
– Method 1: EAC based on the common cause variability of the historic process. The acceptable difference in slopes is θ = K/T, where K reflects the lot-to-lot and measurement variability. The EAC is expressed as an average change in response per month.
– Method 2: EAC based on the product specification. The acceptable difference in slopes is θ = K/E, where E is expiry and K is the distance between the Pth lower percentile (centered at the historical mean at expiry, with P the probability of an excursion) and the lower specification limit (LSL). Requires a specification; the EAC is expressed as an average change in response per month.
– Method 3: EAC based on the historic variability of slope estimates. Requires at least 3 different lots in the historic data set; the EAC is expressed as a change in response per month.
[Figures: overlay plots of response vs. time for the historic and new processes under each method]

Comparability in Profile Data
[Figure: regression lines for reference condition A and new condition B vs. time (months)]
– Difference between intercepts (the difference at t = 0).
– Total difference between conditions at time T (intercept and slope).
– Difference in response averages attributed to the difference in slopes: B - A = δ·T.

EAC Method 1: Common Cause Variability
The criterion is based on historical performance at various conditions:
θ1 = 2·√(σ²_Lot + σ²_e) / T
– σ²_Lot is the lot-to-lot variability and σ²_e is the measurement variability.
– The multiplier 2 is aligned with other statistical limits used to separate random noise from a true signal.
– The goal post is the space of expected historical performance.

EAC Method 1: Common Cause Variability
θ1 = 2·√(σ²_Lots + σ²_Error) / T
√(σ²_Lots + σ²_Error) is unknown; replace it with a 95% upper bound on this quantity.
With T = expiry = 18 months:
θ1 = 2(2.4498)/18 = 0.2722 % per month
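As a small numeric sketch (the variable names are mine), the Method 1 goal post can be computed directly from the quantities quoted on the slide:

```python
# EAC Method 1: common cause variability of the historic process.
# theta1 = 2 * sqrt(var_lot + var_error) / T, with the square-root term replaced
# by its 95% upper confidence bound (2.4498 in the deck's example).
upper_bound_sd = 2.4498   # 95% upper bound on sqrt(sigma2_lot + sigma2_error)
T = 18                    # expiry, in months

theta1 = 2 * upper_bound_sd / T
print(f"Method 1 EAC: +/-{theta1:.4f} % label claim per month")  # about 0.2722
```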
Percent Label Claim: P-value Approach vs. Equivalence Test

                                   P-value approach   Equivalence approach
Slope (Bottle)                     -0.2892            -0.2892
Slope (Blister)                    -0.2783            -0.2783
P-value                            0.8453             NA
Slope difference over 18 months    NA                 -0.08267 to 0.1046
Goal post                          NA                 +/-0.2722
Result                             PASS               PASS

Key point: the slope estimates are the same for both approaches.
[Figure: equivalence graph of the difference in slopes with goal posts at -0.2722 and +0.2722]

EAC Method 2: Product Specification
– The maximum allowable difference in slopes for which the new and historic processes have less than a p% excursion rate at expiry.
– Typically p = 0.01, 0.025, or 0.05.
– Uses historic data.
– Relates comparability to the specification.

EAC Method 2: Product Specification
The acceptable difference in slopes is θ = K/E, where E is expiry.
[Figure: overlay plots for the historic and new processes showing the Pth lower percentile centered at the historic mean at expiry (P = probability of excursion), the Pth lower percentile centered at the new mean, the lower specification limit (LSL), and the gap K between them]

EAC Method 2: Product Specification
θ2 = K / Expiry, where K = [predicted Y at expiry - Z_(1-P)·√(σ²_Lots + σ²_Error)] - LSL
K is unknown, so replace the term in brackets with the lower one-sided (1-P)*100% individual confidence bound based on the historical data (a prediction bound).
Assume the lower specification limit (LSL) = 95 and expiry = 18 months:
θ2 = (97.403 - 95)/18 = 0.1335 % per month

EAC Method 3: Historic Slope Variability
– Uses historical data for the calculation.
– The historical data set provides n_H independent estimates of the common slope β.
– The EAC is based on the 99.5th percentile of the distribution of the difference in slope estimates arising from this variability alone.
– If the observed slope difference is consistent with this variability, equivalence is demonstrated.
[Figure: response vs. time (0 to T months) showing slope estimates from individual historic lots]

EAC Method 3: Historic Slope Variability
θ3 = 2.576 · U · √(1/n_H + 1/n_N)
– θ3 is the 99.5th percentile of the distribution of β̂_H - β̂_N.
– 2.576 is the 99.5th percentile of the standard normal distribution.
– U is a 95% upper bound on the standard error for an estimate of β based on a single lot.
θ3 = 2.576 × 0.09176 × √(1/5 + 1/5) = 0.1495

Comparison of Equivalence Acceptance Criteria

Method   Theta (EAC)   Slope difference over 18 months   Result
1        +/-0.2722     -0.08267 to 0.1046                Pass
2        +/-0.1335     -0.08267 to 0.1046                Pass
3        +/-0.1495     -0.08267 to 0.1046                Pass

– It is hard for a client to know what a difference in slopes of, say, 0.1% looks like in a table.
– Once the client sees the graph, they can get a feel for what a difference in slope means.
– They can visualize the possible range of regression lines that would still support a claim of equivalence.

Comparison of Equivalence Acceptance Criteria
[Figure: EAC based on Bottle – percent label claim vs. time (0 to 18 months) with the Bottle regression line and the Method 1, 2, and 3 EAC lines]
– Based only on historical data.
– The graph is created before data for the new process are collected.

Results by Method
– HA: show that δ is less than some amount deemed practically important.
– Equivalence is demonstrated by computing two one-sided tests (TOST).
– If the 95% lower one-sided confidence bound on δ is greater than -θ and the 95% upper one-sided confidence bound is less than θ, then equivalence is demonstrated.
– For all three methods the interval -0.08267 to 0.1046 falls inside the goal posts, so each method gives a Pass.
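For concreteness, here is a small numeric sketch (the variable names are mine) of the Method 2 and Method 3 goal posts and the TOST-style decision, using the quantities quoted on the slides:

```python
import math

# EAC Method 2: distance from the historical lower prediction bound at expiry to the LSL.
lower_pred_bound = 97.403      # lower one-sided (1-P)*100% prediction bound at expiry
LSL, expiry = 95.0, 18
theta2 = (lower_pred_bound - LSL) / expiry                    # about 0.1335 %/month

# EAC Method 3: 99.5th percentile of the distribution of the difference in slope estimates.
U, n_hist, n_new = 0.09176, 5, 5   # U = 95% upper bound on the SE of a single-lot slope
theta3 = 2.576 * U * math.sqrt(1 / n_hist + 1 / n_new)        # about 0.1495 %/month

# TOST-style decision: pass if the 95% one-sided confidence bounds on the slope
# difference delta lie inside the goal posts (-theta, +theta).
ci_lower, ci_upper = -0.08267, 0.1046
for name, theta in [("Method 2", theta2), ("Method 3", theta3)]:
    verdict = "Pass" if (ci_lower > -theta and ci_upper < theta) else "Fail"
    print(f"{name}: theta = {theta:.4f} % per month -> {verdict}")
```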
P-value Approach vs. Equivalence Approach
P-value approach:
– H0: slopes are comparable
– HA: slopes are not comparable
– Decision is based on the p-value
Equivalence approach:
– H0: slopes are not comparable
– HA: slopes are comparable
– Equivalence acceptance criteria are set a priori
– Based on interval estimates of the slope difference using a mixed regression model with random lots
Statistical convention is to have the research objective in HA.

Summary
The p-value approach to comparability has numerous issues:
– High p-values do NOT prove equivalence.
– High p-values only indicate that there is NOT enough evidence to conclude the slopes are different.
– At times it leads to ad hoc analysis requests when the p-value is small.
– P-values are sensitive to sample size.
Goal posts allow you to state equivalence:
– Industry is moving in the direction of equivalence tests.
– The approach can be extended to accelerated studies.
Move to equivalence testing for comparability.

References
Limentani, G. B., Ringo, M. C., Ye, F., Bergquist, M. L., and McSorley, E. O. (2005). Beyond the t-test: Statistical equivalence testing. Analytical Chemistry, June 2005, pages 1A-6A.
Chambers, D., Kelly, G., Limentani, G., Lister, A., Lung, K. R., and Warner, E. (2005). Analytical method equivalency: An acceptable analytical practice. Pharmaceutical Technology, Sept 2005, pages 64-80.
Richter, S. and Richter, C. (2002). A method for determining equivalence in industrial applications. Quality Engineering, 14(3), pages 375-380.
Park, D. J. and Burdick, R. K. (2004). Confidence intervals on total variance in a regression model with an unbalanced onefold nested error structure. Communications in Statistics - Theory and Methods, 33(11), pages 2735-2743.

Back-up Slides

EAC Method 2: Equal Difference Assumption
– Assumes the difference between processes is the same at controlled room temperature, at the recommended temperature, and at any other temperature.
– This assumption may not always hold; the p-value for the interaction between time, process, and temperature tests this assumption.

Comparison of Equivalence Acceptance Criteria (Bottle vs. Blister)
– Plot the regression line for the historical process; at time = 0 its value is α̂.
– Calculate ME = 1.645 · √2 · (estimated standard error of β̂).
– Plot two additional lines: their value at time = 0 is α̂, and their values at time = T are α̂ + (β̂ + ME)·T and α̂ + (β̂ - ME)·T.
[Figure: percent label claim vs. time (0 to 18 months) with the Bottle regression line and the Method 1, 2, and 3 EAC lines]
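A minimal plotting sketch of the line construction described on this backup slide, assuming the ME formula as reconstructed above; the intercept and standard error below are placeholder values (only the Bottle slope comes from the deck):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch of the backup-slide construction: the historical regression line plus two
# lines whose slopes differ from beta_hat by +/- ME, drawn over 0..T months.
alpha_hat = 102.0       # placeholder intercept at time 0
beta_hat = -0.2892      # Bottle slope reported in the deck
se_beta = 0.05          # placeholder standard error of the slope estimate
T = 18

ME = 1.645 * np.sqrt(2) * se_beta   # margin of error for a difference of two slopes

t = np.linspace(0, T, 100)
plt.plot(t, alpha_hat + beta_hat * t, label="Historical fit")
plt.plot(t, alpha_hat + (beta_hat + ME) * t, "--", label="beta_hat + ME")
plt.plot(t, alpha_hat + (beta_hat - ME) * t, "--", label="beta_hat - ME")
plt.xlabel("Time (months)")
plt.ylabel("Percent label claim")
plt.legend()
plt.show()
```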