International Biometric Society

ESTIMATION OF THE INTER-OBSERVER VARIABILITY IN DIAGNOSTIC TRIALS: KAPPA VS. KRIPPENDORFF'S ALPHA

Antonia Zapf, Department of Medical Statistics, University Medical Center Göttingen

The reproducibility of test results is an important topic in diagnostic accuracy trials. Especially in trials of imaging agents, where the reading can be rather subjective, the inter-observer variability should be estimated and discussed. Both the corresponding European guideline [1] and the STARD (STAndards for Reporting of Diagnostic accuracy) statement [2] mention the kappa coefficient as a measure of agreement. A review of the literature shows that kappa, together with the percentage agreement, is generally used (see, for example, Leeuwenburgh et al. [3]). However, since Cohen's kappa can lead to paradoxical results (see, for example, Feinstein and Cicchetti [4]), many alternative measures have been proposed in recent years. Krippendorff's alpha, one such alternative, has the advantage of great flexibility: it can accommodate several observers, several categories, and missing values. The talk therefore compares the properties of kappa and Krippendorff's alpha. Results of a simulation study and of examples will be presented and discussed.

References:
[1] EMA, CHMP (2010). Appendix 1 to the guideline on clinical evaluation of diagnostic agents on imaging agents. Doc. Ref. EMEA/CHMP/EWP/321180/2008.
[2] Bossuyt et al. (2003). The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clinical Chemistry, 49(1):7-18.
[3] Leeuwenburgh et al. (2013). Accuracy and interobserver agreement between MR-nonexpert radiologists and MR-experts in reading MRI for suspected appendicitis. European Journal of Radiology, doi: 10.1016/j.ejrad.2013.09.022.
[4] Feinstein and Cicchetti (1990). High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 43(6):543-549.
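The paradox referenced above (Feinstein and Cicchetti [4]) can be illustrated with a small worked example. The sketch below is not from the talk itself; it uses a hypothetical 2x2 contingency table with strongly imbalanced marginals, where two observers agree on 85% of cases yet Cohen's kappa comes out low, because the chance-agreement term is inflated by the imbalance.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square inter-observer contingency table."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # observed agreement: proportion of cases on the diagonal
    p_o = sum(table[i][i] for i in range(k)) / n
    # marginal proportions for each observer
    row = [sum(table[i][j] for j in range(k)) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # chance agreement: product of the marginals, summed over categories
    p_e = sum(row[i] * col[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical table: 100 cases, observers agree on 80 positives
# and 5 negatives, so percentage agreement = 0.85.
table = [[80, 10],
         [5,   5]]
# p_e = 0.9*0.85 + 0.1*0.15 = 0.78, so kappa = 0.07/0.22, about 0.32
print(round(cohens_kappa(table), 3))
```

Despite 85% raw agreement, kappa is only about 0.32, which by common rules of thumb would be read as merely "fair" agreement.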
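The flexibility attributed to Krippendorff's alpha can likewise be made concrete. The following is a minimal sketch (not code from the talk) of alpha for nominal data, assuming the ratings are given as a units-by-observers table in which `None` marks a missing rating; unpairable units (fewer than two ratings) are simply dropped, which is how the measure handles missing values.

```python
from collections import Counter

def krippendorff_alpha_nominal(data):
    """Krippendorff's alpha for nominal data.

    data: list of units, each a list of ratings (None = missing).
    """
    # Build the coincidence matrix from all pairable values per unit:
    # each ordered pair within a unit of m ratings gets weight 1/(m-1).
    coincidences = Counter()   # (c, k) -> weighted pair count o_ck
    for unit in data:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue           # units with < 2 ratings are unpairable
        for i, c in enumerate(values):
            for j, k in enumerate(values):
                if i != j:
                    coincidences[(c, k)] += 1 / (m - 1)
    totals = Counter()         # category marginals n_c
    for (c, _k), w in coincidences.items():
        totals[c] += w
    n = sum(totals.values())
    # observed vs. expected disagreement (nominal metric: mismatch = 1)
    d_o = sum(w for (c, k), w in coincidences.items() if c != k)
    d_e = sum(totals[c] * totals[k]
              for c in totals for k in totals if c != k) / (n - 1)
    return 1 - d_o / d_e

# Hypothetical example: two observers, five cases, one missing rating.
ratings = [['a', 'a'], ['a', 'a'], ['b', 'b'], ['a', 'b'], ['a', None]]
print(round(krippendorff_alpha_nominal(ratings), 3))
```

The same function works unchanged with more than two observers per unit, which is the flexibility the abstract contrasts with Cohen's kappa.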
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014