Possible Roles for Reinforcement Learning in Clinical Research
S.A. Murphy
November 14, 2007

Outline
Goal: Improving Clinical Decision Support Systems Using Data
– Clinical Decision Support Systems
– Critical Decisions
– Types of Data
– Challenges
  • Incomplete, primitive, mechanistic models
  • Measures of Confidence
– Clinical Trials

[Figure: Patient Evaluation Screen with MSE Questions]

Critical Decisions
• Which treatments should be offered first?
• How long should we wait for these treatments to work?
• How long should we wait before offering a transition to a maintenance stage?
• Which treatments should be offered next?

Critical Decisions
• All of these questions relate to the formulation of a policy.
• Actions include medications, behavioral therapies, delivery mechanisms, monitoring
• Observations include biological measures, family history, severity, side effects, functionality, symptoms
• Rewards include functionality, side effects, symptoms

Types of Data
• Large Observational Data Sets
  – Actions are not manipulated by the scientist
• Clinical Trial Data
  – Actions are manipulated by the scientist
• Bench research on cells/animals/humans

Clinical Trial Data Sets
• Experimental trial data collected for research purposes
  – Scientists decide proactively which data to collect and how to collect this data
  – Use scientific knowledge to enhance the quality of the proxies for observation, reward
  – Actions are manipulated (randomized) by the scientist
  – Short Horizon (less than 5)
  – Hundreds of subjects.
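As a rough illustration of how a policy might be learned from short-horizon randomized trial data, the sketch below runs tabular Q-learning by backward induction on a simulated two-stage trial. Everything in it is an assumption made for illustration: the data are simulated, the binary observations and actions, the response probabilities, the reward model, and the function names (`simulate_trial`, `cell_means`, `q_learning`) are hypothetical and are not taken from any study in these slides.

```python
import random
from collections import defaultdict


def simulate_trial(n, seed=0):
    """Simulate a hypothetical two-stage randomized trial.

    o1: baseline observation (0/1), a1: randomized first action (0/1),
    o2: early response indicator, a2: randomized second action (0/1),
    y:  final reward. The generating model is invented for illustration.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        o1 = rng.randint(0, 1)
        a1 = rng.randint(0, 1)              # randomized by the scientist
        p_response = 0.7 if a1 == o1 else 0.3
        o2 = 1 if rng.random() < p_response else 0
        a2 = rng.randint(0, 1)              # randomized again at stage 2
        bonus = 1.0 if a2 == 1 - o2 else 0.0
        y = o2 + bonus + rng.gauss(0.0, 0.5)
        data.append((o1, a1, o2, a2, y))
    return data


def cell_means(pairs):
    """Average the values that share the same key (a tabular 'regression')."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def q_learning(data):
    """Backward induction: estimate the stage-2 Q-function, then stage 1."""
    # Stage 2: Q2(history, a2) is the mean reward in each cell.
    q2 = cell_means(((o1, a1, o2, a2), y) for o1, a1, o2, a2, y in data)

    def v2(o1, a1, o2):
        # Value of acting optimally at stage 2 (a non-smooth max).
        return max(q2[(o1, a1, o2, a)] for a in (0, 1))

    # Stage 1: Q1(o1, a1) is the mean of the estimated stage-2 value.
    q1 = cell_means(((o1, a1), v2(o1, a1, o2)) for o1, a1, o2, _, _ in data)

    policy2 = {
        (o1, a1, o2): max((0, 1), key=lambda a: q2[(o1, a1, o2, a)])
        for o1 in (0, 1) for a1 in (0, 1) for o2 in (0, 1)
    }
    policy1 = {o1: max((0, 1), key=lambda a: q1[(o1, a)]) for o1 in (0, 1)}
    return policy1, policy2


if __name__ == "__main__":
    stage1, stage2 = q_learning(simulate_trial(2000))
    print("stage-1 policy:", stage1)
    print("stage-2 policy:", stage2)
```

Because the actions are randomized at both stages, the cell means here are not distorted by unknown determinants of treatment selection; with observational data the same averages could reflect non-causal associations.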
Observational Data Sets
• Observational data collected for research purposes
  – Scientists decide proactively which data to collect and how to collect this data
  – Use scientific knowledge to enhance the quality of the proxies for observation, action, reward
  – Actions are not manipulated by the scientist
  – Moderate Horizon
  – Hundreds to thousands of subjects.

Observational Data Sets
• Clinical databases or registries (an example in the US would be the VA registries)
  – Data was not collected for research purposes
  – Use gross proxies to define observation, action, reward
  – Moderate to Long Horizon
  – Thousands to Millions of subjects

Availability of Mechanistic Models
• In many areas of RL, scientists can use mechanistic theory, e.g., physical laws, to model or simulate the interrelationships between observations and how the actions might impact the observations.
• Scientists know many of the causes of the observations, including the most important ones, and know a model for how the observations relate to one another.

Incomplete Mechanistic Models in Medical Sciences
• Scientists who want to use data on individuals to construct policies must confront the fact that non-causal “associations” occur due to the unknown causes of the observations.

Conceptual Structure in the Medical Sciences (observational data)
[Diagram: unknown causes; observations and action at Time 1; observations and action at Time 2; reward at Time 3]

Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Diagram: unknown causes (maturity / decision to join "adult" society), marked "+"; binge drinking and treatment at Time 1; binge drinking and counseling at Time 2; functionality at Time 3]

Unknown, Unobserved Causes (Incomplete Mechanistic Models)
• Problem: Non-causal associations between treatment (here counseling) and rewards are likely.
• Solutions:
  – Collect clinical trial data in which treatments are randomized. This breaks the non-causal associations yet permits causal associations.
  – Prior to applying methods to observational data, proactively brainstorm with domain experts to ascertain and measure the main determinants of treatment selection. Then take advantage of causal inference methods designed to minimize assumptions on the data.

Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Diagram: unknown causes (maturity / decision to join "adult" society), marked "+"; observations and treatment at Time 1; binge drinking and counseling at Time 2; functionality at Time 3]

Unknown, Unobserved Causes (Incomplete Mechanistic Models)
[Diagram: unknown causes (maturity / decision to join "adult" society), marked "+" and "-"; binge drinking (yes); counseling on health consequences (yes/no); binge drinking (yes/no) at Time 2; sanctions + counseling (yes/no); functionality at Time 3]

Unknown, Unobserved Causes (Incomplete Mechanistic Models)
• The problem: Even when treatments are randomized, non-causal associations occur in the data.
• Solutions:
  – Recognize that parts of the Q-function/transition probabilities cannot be informed by domain expertise, as these parts reflect non-causal associations
  – Or use methods for constructing policies that “average” over the non-causal associations between action and reward.
• I think that the importance of this second causal inference problem depends on the kind of data and how you use it.

Measures of Confidence
– Measures of confidence are essential
  • Noisy data
  • Need to know when any one of a subset of actions will yield the best rewards; that is, when there is little or no evidence otherwise.
  • It is important to minimize the number of observations that must be collected in the clinical setting

Measures of Confidence
• We would like measures of confidence for the following:
  – To compare the value of two estimated policies (both estimated using the training data).
  – To assess if there is sufficient evidence that a particular observation (e.g., output of a biological test) should be part of the policy.
  – To assess if there is sufficient evidence that a subset of the actions leads to better rewards for a given observation than the remaining actions.

Measures of Confidence
• I must both learn the policy and provide an evaluation of the policy using one data set.
• The data set is small.

Measures of Confidence
• Traditional methods for constructing measures of confidence require differentiability (if frequentist properties are desired).
• Q-functions are constructed via non-differentiable operations (e.g., maximization).
• The value of a policy is a non-differentiable function of the policy.

Clinical Trials
• Data from short-horizon clinical trials make excellent test beds for combinations of supervised/unsupervised and reinforcement learning methods.
  – Developing methods for variable selection in decision making (in addition to variable selection for prediction)
  – Model selection when the goal is learning good policies
  – Confidence intervals for the difference in value between two policies
  – Feature construction

ExTENd
• Ongoing study at U. Pennsylvania (D. Oslin)
• Goal is to learn how best to help alcohol-dependent individuals reduce alcohol consumption.

Oslin ExTENd
• Random assignment: Early Trigger for Nonresponse or Late Trigger for Nonresponse
• Under either trigger (8 wks):
  – Response: random assignment to Naltrexone or TDM + Naltrexone
  – Nonresponse: random assignment to CBI or CBI + Naltrexone

Adaptive Treatment for ADHD
• Ongoing study at the State U. of NY at Buffalo (B.
Pelham)
• Goal is to learn how best to help children with ADHD improve functioning at home and school.

ADHD Study
• Random assignment to A or B
• A. Begin low-intensity behavior modification (bemod); after 8 weeks, assess: adequate response?
  – Yes: A1. Continue, reassess monthly; randomize if deteriorate
  – No: random assignment to
    A2. Add medication; bemod remains stable but medication dose may vary
    A3. Increase intensity of bemod with adaptive modifications based on impairment
• B. Begin low dose medication; after 8 weeks, assess: adequate response?
  – Yes: B1. Continue, reassess monthly; randomize if deteriorate
  – No: random assignment to
    B2. Increase dose of medication with monthly changes as needed
    B3. Add behavioral treatment; medication dose remains stable but intensity of bemod may increase with adaptive modifications based on impairment

Studies under review
• H. Jones study of drug-addicted pregnant women (goal is to reduce cocaine/heroin use during pregnancy and thereby improve neonatal outcomes)
• J. Sacks study of parolees with substance abuse disorders (goal is to reduce recidivism and substance use)

Jones’ Study for Drug-Addicted Pregnant Women
[Diagram: sequence of random assignments; response or nonresponse assessed at 2 wks, followed by a further random assignment; arms shown are rRBT, tRBT, eRBT, aRBT]

Sacks’ Study of Adaptive Transitional Case Management
[Diagram: random assignments among Standard Services, Standard TCM, and Augmented TCM; response or nonresponse assessed at 4 wks]

Discussion
• Methods for updating the policy online as data accumulate.
• Methods for producing composite rewards.
  – High quality elicitation of functionality
• Human-Computer interface
• Improving tactics

This seminar can be found at: http://www.stat.lsa.umich.edu/~samurphy/seminars/UAlberta07.ppt
Email me with questions or if you would like a copy: [email protected]

Unknown, Unobserved Causes
• Problem: We recruit students via flyers posted in dormitories.
Associations between observations and rewards are highly likely to be nonrepresentative (due to the unknown causes).
• Solution: Sample a representative group of college students.

STAR*D
• This trial is over and the data are being analyzed (PI: J. Rush).
• One goal of the trial is to construct good treatment sequences for patients suffering from treatment-resistant depression.
www.star-d.org
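
The “Measures of Confidence” slides ask for confidence intervals for the difference in value between two policies. A minimal sketch of one naive approach follows, assuming a single-stage trial with actions randomized with probability 1/2 and two policies fixed in advance rather than learned from the same data. The simulated data, the reward model, and the names (`simulate_randomized_trial`, `ipw_contributions`, `bootstrap_ci`) are hypothetical. The percentile bootstrap is a swapped-in generic technique, not a method from these slides, and the non-differentiability noted above means it should not be trusted as-is when the policies are themselves estimated from the data.

```python
import random


def simulate_randomized_trial(n, seed=0):
    """Hypothetical single-stage trial: observation o, randomized action a, reward y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        o = rng.randint(0, 1)
        a = rng.randint(0, 1)  # randomized with probability 1/2
        y = (1.0 if a == o else 0.0) + rng.gauss(0.0, 0.5)
        rows.append((o, a, y))
    return rows


def ipw_contributions(rows, policy_a, policy_b, p_action=0.5):
    """Per-subject inverse-probability-weighted contributions to V(A) - V(B).

    A subject contributes to a policy's value only when the randomized
    action agrees with what that policy would have chosen.
    """
    return [
        y * ((a == policy_a(o)) - (a == policy_b(o))) / p_action
        for o, a, y in rows
    ]


def bootstrap_ci(values, n_boot=500, alpha=0.05, seed=1):
    """Naive percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int(alpha / 2 * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    rows = simulate_randomized_trial(1000)
    # Policy A tailors the action to the observation; policy B always picks 0.
    diffs = ipw_contributions(rows, lambda o: o, lambda o: 0)
    estimate = sum(diffs) / len(diffs)
    print("estimated value difference:", estimate)
    print("95% bootstrap interval:", bootstrap_ci(diffs))
```

In this simulated setup the tailored policy has true value 1 and the constant policy has true value 0.5, so the interval can be checked against a known difference of 0.5.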