Sampling Affects “Sensitivity”
Wesley Johnson
Department of Statistics and Graduate Group in Epidemiology
University of California, Davis

Basic Problem
We consider taking one or more samples (sub-samples) from each animal in order to determine the infection status of that particular animal.
• Each sub-sample is tested independently for the infection using a single diagnostic test.
• The volume of material in each sub-sample may vary around some mean value, or may be fixed.
• Each infected cow has its own concentration of infectious material in the sampled medium. For some infected cows the concentration may be too low for the diagnostic test to detect.
• Even if the concentration is above the detection limit, the particular sub-samples taken may not contain detectable material.

Outline
• We consider the effect of sampling on the proportion of test-positive outcomes and, ultimately, on the sensitivity and the predictive value positive (PVP) of a testing procedure.
• Expressions are presented that depend on:
  - the proportion of infected animals with a sufficient concentration of pathogens;
  - the pathogen concentration for the given animal;
  - the number of sub-samples taken;
  - the sensitivity of the diagnostic test used.
• We illustrate how different choices of these parameters affect the proportion of positive outcomes, the overall sensitivity, and the PVP of the overall procedure.
• Conclusions and References

Notation
T+ denotes a positive outcome on the test; T- denotes a negative outcome on the test.
I = infection.
Se = Pr(T+ | I), Sp = Pr(T- | no I).
Con = concentration of “infection” per unit volume of the sampled medium (e.g., blood, feces).
Vol = volume of sampled medium per sub-sample.

Assumptions
Assumption 1: If Con < c0, for some pre-specified value c0, then I is not detectable by the screening test under any circumstances.
Assumption 2 (Poisson assumptions):
• Detectable material in the medium is not “clumped”.
• The expected number of detectable units in any small volume of material is proportional to Con times that volume.
• The numbers of detectable units in separate samples are independent.

Assumptions and Notation
Assumption 3: Sub-samples are taken randomly, so the rate of detectable material per sub-sample is Con*Vol.
Together, these assumptions imply that the number of detectable units per sub-sample can be regarded as a Poisson random variable with rate R = Con*Vol. The probability of at least one detectable unit in a single sub-sample is therefore
p = 1 - exp(-R).
Finally, let Pr(Con > c0 | I) denote the proportion of infected animals with a detectable concentration.

Probability of Detection
Consider an animal with Con > c0 and given values of R (hence p) and Se. Elementary probability theory gives the proportion of tested animals with these same values that the procedure would detect as infected. A single sub-sample tests positive with probability p*Se: it must contain at least one detectable unit, and the test must then detect it. The animal is declared positive if at least one of its k independent sub-samples tests positive, so
Probability of Detection = 1 - (1 - p*Se)^k
If a single sub-sample is taken (k = 1), this reduces to
Probability of Detection = p*Se
If Con is less than or equal to c0, the probability of detection is 0.

Illustration
• Consider the problem of detecting Salmonella in cattle feces.
• A sample of fecal material is taken from a cow, and k swabs are taken from it.
• We assume that 80% of infected cows have a concentration of Salmonella high enough to be detected with the standard diagnostic test, that is, Pr(Con > c0 | I) = 0.80.
• We assume the cow exhibits a concentration of 3 “units” per gram and that a swab contains 1 gram of fecal material on average. Thus R = 3 and consequently p = 1 - exp(-3) ≈ 0.95, i.e., about 95% of swabs carry at least one detectable unit.
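The detection formula can be checked numerically. The following is a minimal Python sketch, not the author's code. It assumes the Poisson model above and assumes an animal is called positive when at least one of its k sub-samples tests positive. The function names `subsample_hit_prob` and `detection_prob` are hypothetical.

```python
import math


def subsample_hit_prob(r: float) -> float:
    """p = 1 - exp(-R): probability that a sub-sample contains at least one
    detectable unit when the unit count is Poisson with mean R = Con*Vol."""
    return 1.0 - math.exp(-r)


def detection_prob(p: float, se: float, k: int = 1) -> float:
    """1 - (1 - p*Se)^k: probability that at least one of k independent
    sub-samples tests positive, for an infected animal with Con > c0."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - p * se) ** k


# Effect of increasing k (R = 3, with p rounded to 0.95 as in the illustration)
for se in (0.95, 0.90, 0.80, 0.50):
    row = {k: round(detection_prob(0.95, se, k), 5) for k in (1, 2, 4, 8, 16)}
    print(f"Se = {se:.2f}: {row}")

# Effect of concentration (R = 1, so p is about 0.63; Se = 0.95)
p1 = subsample_hit_prob(1.0)
print(f"p = {p1:.3f}", {k: round(detection_prob(p1, 0.95, k), 4) for k in (1, 2, 4, 8)})
```

The sketch takes p as an input rather than always deriving it from R. This lets you reproduce figures that use a rounded p (0.95 or 0.63), which can differ in the third decimal from the exact 1 - exp(-R).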
With a single swab (k = 1) and p = 0.95, the probability of detection is 0.95*Se; if Se = 0.95, this is 0.902.

Effect of Increasing k
• With Se = 0.95 and k = 2 and 4, respectively, we obtain 0.99 and 0.9999.
• With Se = 0.90 and the same values of k, we obtain 0.979 and 0.9996, respectively.
• With Se = 0.80 and the same values of k, we obtain 0.942 and 0.997, respectively.
• With Se = 0.50 and k = 2, 4, 8 and 16, we obtain 0.72, 0.924, 0.994 and 0.99997.

Effect of Concentration
• Now suppose R = 1, so that p = 1 - exp(-1) ≈ 0.63, and let Se = 0.95. Then with k = 1 the probability of detection is 0.63*0.95 = 0.60.
• With k = 2, 4 and 8, we obtain 0.84, 0.974 and 0.9993, respectively.
• Of course, when the concentration is below the cutoff c0, the probability of detection is zero regardless of k.

Practical Issues
• You will never know Con or Vol exactly for a given animal. Solution: use the best available average value of Con among animals like the one in question, preferably from data obtained for this purpose.
• Vol must also be known. Data are needed to estimate the average value of Vol. The average values of Con and Vol then give R, which represents an average infected animal under a standard application of the diagnostic procedure.
• The probability we calculate is therefore the probability for an average infected cow. The probabilities for individual cows will be larger or smaller than the calculated value.

Obtaining the PVP* = Pr(I | T+)
• Here we calculate the predictive value positive (PVP*) for a given animal under the overall diagnostic procedure, including sub-sampling and a diagnostic test with given Se and Sp. In other words, we ask how likely it is that an animal that has just tested positive is actually infected.
• This is done using Bayes' formula. It requires the prevalence of infection in the population from which the animal was sampled, together with the sensitivity (Se*) and specificity (Sp*) of the overall diagnostic procedure.
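Bayes' formula for the overall procedure can be sketched in a few lines of Python. The sketch assumes two things. First, the overall sensitivity Se* is the sub-sampling detection probability multiplied by Pr(Con > c0 | I). Second, the overall specificity equals the test's own, Sp* = Sp. The function names are hypothetical, and the inputs are the Salmonella values used in this talk.

```python
def overall_sensitivity(detect_prob: float, prop_detectable: float) -> float:
    """Se* = Pr(T+ | I, Con > c0, R) * Pr(Con > c0 | I)."""
    return detect_prob * prop_detectable


def pvp(prev: float, se_star: float, sp_star: float) -> float:
    """Bayes' formula: PVP* = a / (a + b), where a = Prev*Se* is the
    proportion of true positives and b = (1 - Prev)*(1 - Sp*) the
    proportion of false positives."""
    a = prev * se_star
    b = (1.0 - prev) * (1.0 - sp_star)
    if a + b == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return a / (a + b)


# Salmonella inputs: Prev = 0.10, Se = 0.95, Sp = 0.98 (Sp* = Sp assumed),
# p = 0.95 (R = 3) and Pr(Con > c0 | I) = 0.8
for k in (1, 2, 4):
    detect = 1.0 - (1.0 - 0.95 * 0.95) ** k  # 1 - (1 - p*Se)^k
    se_star = overall_sensitivity(detect, 0.8)
    print(k, round(se_star, 3), round(pvp(0.10, se_star, 0.98), 3))
```

As k grows, Se* approaches the ceiling Pr(Con > c0 | I) = 0.8, so PVP* levels off, not climbing further.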
PVP Calculation
• The probability of detection calculated above was obtained under the presumption that the animal was infected and that its concentration, Con, of infectious material was above the cutoff.
• This probability is thus the sensitivity of the overall screening procedure among those infected animals whose Con value exceeds the cutoff, c0. So if only half of the infected animals had Con values exceeding the cutoff, the sensitivity of the overall screening procedure would be only half the value we have been calculating.
• Thus, the sensitivity of the overall screening procedure, which includes sub-sampling, is
Se* = Pr(T+ | I, Con > c0, R) * Pr(Con > c0 | I).
This assumes the animal in question has concentration Con and that the average amount of material per sub-sample is Vol, so that R = Con*Vol.

Calculating the PVP
• The specificity of the overall procedure is assumed to be the same as the specificity of the diagnostic test itself, namely Sp* = Sp.
• Let Prev be the proportion of animals with the infection in the population that was sampled.

The Formula for the PVP
• Let a = Prev*Se*, the proportion of true positive results overall.
• Let b = (1 - Prev)*(1 - Sp*), the proportion of false positive results overall.
• Then PVP* = a/(a + b).

Illustration of PVP*
• Back to the Salmonella example: let the prevalence of Salmonella in a given herd be Prev = 0.10.
• Assume that 80% of infected animals have Con values exceeding the cutoff, that is, Pr(Con > c0 | I) = 0.8.
• Assume the basic sensitivity and specificity of the diagnostic test are Se = 0.95 and Sp = 0.98, respectively.
• Assume that R = 3, so p = 0.95.

Illustration of PVP*
• Then, with k = 1 and a positive outcome, Se* = 0.902*0.8 = 0.72, a = 0.10*0.72 = 0.072, b = 0.90*0.02 = 0.018, and hence PVP* = 0.072/(0.072 + 0.018) = 0.8.
• With k = 2 and 4, we obtain Se* values of 0.792 and 0.8 and PVP* values of 0.815 and 0.816, respectively.
• Clearly, once k is large enough to bring the sensitivity among animals with Con above the cutoff close to one, there is no need to increase k further to improve PVP*.

Summary and Conclusions
• Simple probability calculations give the overall sensitivity of a screening procedure that involves sub-sampling.
• Sub-sampling has a clear effect on the sensitivity of the overall screening procedure.
• Increasing the number of sub-samples can dramatically increase the sensitivity of the screening procedure among animals whose concentrations are detectable.
• The proportion of infected animals with non-detectable concentrations can have a large effect on the overall sensitivity and, consequently, on the overall PVP*.

References
• Geng, S., Campbell, R.N., Carter, M., et al. (1983). Quality-control programs for seedborne pathogens. Plant Disease, 67, 236-242.
• Utts, J.M. and Heckard, R.F. (2002). Mind on Statistics. Duxbury Press, Belmont, CA.