An Approach Finding the Right Tolerance Level for Clinical Data Acceptance

Karen Walker, Walker Consulting LLC, Chandler, Arizona

ABSTRACT

Highly publicized zero-tolerance initiatives for database errors in clinical trials are discussed ad infinitum among pharmaceutical research executives. Their case is construed as a 100% audit of efficacy variables and serious adverse events to prove that the quality of the data used in the clinical trial is good. For some clinical trials the study database is too large to audit every efficacy and safety variable, so an audit that pronounces a large database acceptable without a structured approach is dubious at best. In such cases it is reasonable to employ a systematic approach for finding the right level of tolerance for clinical data acceptance. This approach must consider the following markers: tolerance limit, error rate, edit checks, manual review, sample sizes, and a ranking of sampling schemes, using statistical inference to determine how good the error rate is with regard to a specific tolerance limit. We will set an acceptable level of quality, or tolerance limit, that certifies the elements converted and stored in the database match all sources of information.

INTRODUCTION

The purpose of this presentation is to add rhyme and reason to the task of validating a clinical study database. For the scope of this discussion we will consider a typical collection of case report forms rendered to the database and verified by manual review. From a random sample of records we will calculate an error rate: how many records match the original source data, and how many records are flawed. With further evaluation of summary data points we will learn more about the data using computerized edit check routines, so errors can be corrected without affecting our data integrity. Our tolerance level is set to zero. A test of hypothesis will be employed as a systematic way to tell us whether or not to accept the clinical database. Finally, you will observe how to arrive at the same conclusion using a SAS procedure.

TOLERANCE LIMIT

The tolerance level is a value specified by clinical research executives under onerous corporate directives and regulatory requirements. The onus of proving, with documented evidence, that the clinical data are acceptable therefore falls upon Clinical Data Management. The error rate is calculated as the proportion of data items that were not communicated properly relative to the data checked; it covers obvious mismatches with source data as well as data incongruencies. For example, consider a clinical database having 100,000 records: after checking 1,000 of them, if only 10 errors are found, the estimated error rate is 10 for every 1,000 records checked, or 0.01. Let's hold on to this value by calling it "P Hat".

   Error rate = P Hat = 10/1,000 = 0.01

To satisfy our objective for tolerance, the errors must be no more than 1 per every 1,000 records checked, which is approximately 0.001, that is, a P Hat of almost zero. If the tolerance level is zero, then we need the error rate to be zero. P Hat is almost zero when we calculate it from checking 1,000 records and find only one error. How can you achieve a P Hat of almost zero when you are checking more than 1,000 records? If P Hat is the rate of one error per 1,000 records checked, then let us say that "P Knot" is the limiting value the error rate is driven toward as it approaches zero.
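To keep that arithmetic concrete, the sketch below computes P Hat from a manually reviewed sample. It is only an illustration: the table AUDIT and its MATCH flag (1 when the database value agrees with the source document, 0 when it does not) are hypothetical names, not part of the study database discussed later in this paper.

SAS™ source code:

   proc sql;
      create table phat as
         select count(*)                               as checked,   /* records compared to source   */
                sum(match = 0)                         as errors,    /* records that did not match   */
                calculated errors / calculated checked as p_hat      /* estimated error rate (P Hat) */
         from audit;
   quit;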
Seen this way, it is reasonable to consider the tolerance level as the limit of the true error rate "P" as it approaches zero. As a result, the tolerance level is better understood as the "Tolerance Limit", and we can call it "P Knot":

   P Knot = lim (P → 0) P

ERROR RATE

The number of data items converted to the database is checked against the original source information. The error rate is the ratio of mismatched observations to the total observations checked. The true error rate is denoted "P". As a subject undergoes treatment during a clinical study, case report forms are filled out and the database records are populated. The number of discrepancies between what was written on the form and what was populated in the database contributes to the true error rate. For example, the INFORMED CONSENT case report form contains these data points: the subject, the subject's initials, and the date the informed consent was signed, together with clinical study information such as the protocol number, sponsor, and IND number. If any of these data points are incomplete or missing, it counts as an error.

Note that there are also other kinds of problems that arise from the conduct of the trial itself; these cannot be described as contributors to the error rate. For instance, consider a randomized, double-blind clinical trial for a drug that generates cells, where the study measures growth of certain cells by dose treatment group. Say a subject experiences logarithmic cell growth within a specific group and then has an adverse event with a subsequent dose modification. This kind of situation can cause confusion in the clinical database, making data records appear out of sync. Keeping the data clean of this kind of confusion requires close attention, meticulous data entry, and additional peer verification.

MANUAL REVIEW

An audit of a clinical trial database is needed to ensure the correctness of the data that has been populated from source instruments such as case report forms. The clinical database is often populated using double entry verification: two people enter the same records from the source documents and their entries are compared. If the entries are consistent, the database is populated. If they are inconsistent, a third person checks the record and either creates a discrepancy where there is confusion or, if possible, corrects the inconsistency and then populates the database.

EDIT CHECKS

Queries are small programs created to compare source data points to what is in the database. Most clinical database management systems are equipped with query languages for building edit check programs. The programs are simple, built from a few basic constructs (CREATE, SELECT, FROM, WHERE). The SQL query language is available with SAS™ and is used here to show what a basic query routine looks like.

SAS™ source code:

   proc sql;
      create table informed as
         select subjects,
                (infcndt = .) as errorflg   /* 1 when the signed consent date is missing */
         from clinicaldb.infcnst;
   quit;

This query will execute under SQL running on a UNIX server, under WINDOWS, or even under MVS. The edit check looks for any clinical trial study subjects who failed to sign an informed consent by checking whether the signed date is missing. The program asks the computer to create a data table in the SAS system called INFORMED that captures all subjects and marks the subjects with missing signature dates as errors. The errors are captured in the data element ERRORFLG.
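Edit checks of the same shape can cover cross-field rules as well. As a hedged sketch only, the query below flags visit dates recorded before the signed consent date; the table VITALS, its VISITDT variable, and the join on SUBJECTS are illustrative assumptions rather than part of the study database described above, but the ERRORFLG convention is the same.

SAS™ source code:

   proc sql;
      create table consent_check as
         select a.subjects,
                (not missing(b.infcndt) and a.visitdt < b.infcndt) as errorflg
                /* 1 when a visit is dated before the informed consent was signed */
         from clinicaldb.vitals as a
              left join clinicaldb.infcnst as b
              on a.subjects = b.subjects;
   quit;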
ERRORFLG is a Boolean variable that is 1 for records with errors and 0 for records that are good. Recall that a missing informed consent date represents a discrepancy and counts as an error. If you take the total number of records on the INFORMED table and the total number of "1" values for ERRORFLG, their ratio is the error rate for the INFORMED data table.

SAMPLE SIZES

The clinical data table INFORMED is just one of many data tables found in a clinical database. A clinical database uses many components to store the case report information for safety and efficacy data. If you checked each and every record in our clinical database against the original source case report forms, you would know the true error rate. When the database is very large, however, that effort would be enormous. To avoid checking every record, we need an acceptable alternative: we can make a sound statistical inference from a portion of the observations by sampling the entire database. Recall that P Hat was estimated from 1,000 records checked out of a database of 100,000; P Hat is the rate calculated as the number of incorrect items over the number of records checked rather than over the total number of possibilities.

SAS™ source code:

   proc means data=clinicaldb noprint;
      var errorflg;
      output out=quality n=totalrec sum=error min=minerror max=maxerror stderr=se;
   run;

SAS™ gives us an easier way to find the error rate. PROC MEANS returns the total number of records, or observations, for the components of our database, the minimum and maximum errors for the component tables, the sum of the errors, and the standard error for a normal distribution. Here we check the value of the ERRORFLG variable for every table contained in CLINICALDB using SAS™ PROC MEANS (in practice, the ERRORFLG tables produced by the edit checks would first be combined into the single data set that PROC MEANS reads). PROC MEANS automatically delivers the statistics we need to show that the database has an error rate below our established tolerance level.

So far, nothing has been determined about the distribution of errors for our clinical database. However, if the database is large enough there will be errors; of that we can be sure. Since our error rate is based upon the number of incorrect items over the number of items checked, some consideration must be given to the types of data checked. Therefore, to make a fair and unbiased statistical inference about the database, a random sample is required.

APPROACH

Our objective is to prove that the clinical database is of the highest quality possible, and then to find a way to show that it is error free. You can show this by performing a test of hypothesis that supports acceptance of the database:

   If P Hat <  P Knot, the error rate is less than the tolerance limit.
   If P Hat >= P Knot, the error rate is greater than or equal to the tolerance limit (the null hypothesis, Ho).

MODIFIED APPROACH

The strength of our approach depends upon how sure we are about the errors. For a complete 100% audit, the hypothesis test gives the strongest assurance that the statistical inference is good and that all errors are found. However, the purpose of this paper is to find a systematic approach to the right tolerance level for clinical data acceptance without doing a 100% audit of the clinical database. So let us give some thought to how likely it is for the inference to break down when an estimated error rate is used.
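However the estimate is obtained, the inference above assumes that the records being audited were drawn at random. A minimal sketch of drawing such a sample in SAS™ appears below; the source table name and the sample size of 1,000 are assumptions carried over from the earlier example rather than requirements of the method.

SAS™ source code:

   proc surveyselect data=clinicaldb.crfdata   /* hypothetical pool of records to audit */
                     out=audit_sample          /* records selected for manual review    */
                     method=srs                /* simple random sampling                */
                     sampsize=1000             /* matches the 1,000-record example      */
                     seed=20240601;            /* fixed seed so the draw is repeatable  */
   run;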
Beyond the error rate and the tolerance limit, two factors affect the precision of the hypothesis test: 1) the possibility that we conclude the error rate is less than the tolerance limit and are wrong about it, and 2) the possibility that we conclude the error rate is greater than the tolerance limit and are wrong about it. Factor 1 is the alpha (α), or type I, error; factor 2 is the beta (β), or type II, error.

Because we are dealing with a large amount of data, we have some control over the type I and type II errors. For a large enough sample from a finite population, the number of errors in the sample is approximately normally distributed, and as the tolerance limit approaches zero the error rate has the following standard error, where n is the number of records checked and N is the total number of records:

   SE = sqrt( P(1 - P)/n * (N - n)/(N - 1) )

Using the standard normal distribution we have control of the type I and type II errors, so we set them to:

   alpha (α), type I  = 0.05
   beta  (β), type II = 0.05

with a z-score taken at the (1 - α) quantile of the standard normal distribution, probit(1 - α) in the program below. With the standard normal distribution we gain better control over the issues affecting our statistical inference. We now have definitions for the standard error (SE), alpha (α), beta (β), the true error rate (P), the calculated error rate (P Hat), and the tolerance limit (P Knot). Therefore the null hypothesis that the true error rate is greater than or equal to the tolerance limit can be tested against the alternative hypothesis that the true error rate is less than the tolerance limit:

   Ho: P >= P Knot   versus   Ha: P < P Knot

where the boundary for accepting the database is P Knot adjusted downward by SE times the z-score (pknot - se*zscore in the program below).

SAS PROCEDURE

SAS™ source code:

   *******************************************************************;
   ***  PURPOSE:     DATABASE ACCEPTANCE                             ;
   ***  DESCRIPTION: Reads the CLINICALDB discrepancy counts and     ;
   ***               compares them to quality standards with a       ;
   ***               tolerance limit approaching zero.               ;
   ***  INPUT:       db.quality.sas7bdat                             ;
   ***  VARIABLES:   error    = error flag total                     ;
   ***               totalrec = number of observations in CLINICALDB ;
   ***               phat     = error rate                           ;
   ***               se       = standard error                       ;
   ***               l_se     = log of the standard error            ;
   ***               pknot    = tolerance limit                      ;
   *******************************************************************;

   options ls=132 ps=60 nodate nonumber;
   libname db "~";                 **** input quality SAS data set;

   %let alpha = 0.05;
   %let beta  = 0.05;

   data pknot;                     **** Pknot approaches zero;
      input pknot;
      cards;
   .005
   .004
   .003
   .002
   .001
   .0005
   .0004
   .0003
   .0002
   .0001
   ;
   run;

   data test;
      merge db.quality pknot;
      length ho $ 50;
      l_se   = log(se);
      phat   = error/totalrec;
      zscore = probit(1-&alpha);
      lower  = pknot - se*zscore;
      loglw  = exp(log(pknot) - l_se*zscore);
      if (phat < lower) or (phat < loglw) then
         ho = "Accept the Clinical Database";
      else if phat >= pknot then
         ho = "Reject the Clinical Database";
      else
         ho = "Check the next Clinical Database observations";
   run;

   **** PRINT out ERROR RATE and FINAL DECISION ****;
   proc print data=test noobs label;
      var phat lower loglw ho;
      label phat  = 'Error Rate for the Database'
            lower = 'Lower Bound'
            loglw = 'Log-scale Lower Bound'
            ho    = 'Result of Hypothesis Test';
   run;
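The program above takes the size of the audit sample as given. As a companion sketch only, a textbook normal-approximation formula for a one-sample proportion test can suggest how many records to audit for chosen α and β; it is not prescribed by the approach above, and the p0 and p1 values below are illustrative assumptions rather than numbers from this paper.

SAS™ source code:

   %let alpha = 0.05;                  /* chance of wrongly accepting the database */
   %let beta  = 0.05;                  /* chance of wrongly rejecting the database */

   data samplesize;
      p0 = 0.001;                      /* tolerance limit being tested against     */
      p1 = 0.0005;                     /* error rate the database is hoped to have */
      za = probit(1 - &alpha);         /* z-score for the significance level       */
      zb = probit(1 - &beta);          /* z-score for the desired power            */
      n  = ceil( ((za*sqrt(p0*(1-p0)) + zb*sqrt(p1*(1-p1)))**2) / ((p0 - p1)**2) );
   run;

   proc print data=samplesize noobs;
      var p0 p1 n;
   run;

With these illustrative values the formula calls for an audit on the order of tens of thousands of records, which illustrates why this paper leans on edit checks and random sampling rather than a complete audit.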
CONCLUSION

Clearly, by factoring in the standard error for a normal distribution and allowing the tolerance limit to approach zero, a zero tolerance level can be achieved for a large clinical database. We have learned that validating a large database can be documented and proven with a statistical test of hypothesis, provided we consider that the chance of falsely reporting a clinical study database acceptable depends upon the true error rate, the tolerance limit, and the sample size. Although the tolerance level must stay fixed at zero, by setting the sample size and making full use of manual review and edit checks we can drive the error rate down with every iterative evaluation of the clinical data points.

REFERENCES

1. Rondel RD, Varley SA, Webb CF. Clinical Data Management. 2nd edition. New York, NY: John Wiley and Sons; 2000.
2. Lafler, Kirk Paul. PROC SQL: Beyond the Basics Using SAS™. SAS™ Books by Users. Cary, NC; 2004.
3. Zhang. Drug Information Journal, Volume 38, Number 4, 2004, ISSN 0092-8615, pages 371-386.

ACKNOWLEDGMENTS

I wish to thank George Fernandez, and I wish to thank God above, who makes all things possible.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

   Karen Walker
   Walker Consulting LLC
   6481 West Gary Drive
   Chandler, Arizona 85226
   (925) 202-1524
   [email protected]

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.