Interfaces for Intense Information Analysis
Marti Hearst
UC Berkeley
This research funded by ARDA

Outline
• A contrast
  – Search vs. Analysis
• Goals for three user groups
  – Intelligence Analysts
  – Biomedical Researchers
  – Investigative Reporters
• Our current interface design

Search vs. Analysis
Search: Finding hay in a haystack
Analysis: Creating new hay

UIs for Search vs. Analysis
• Search:
  – A necessary but undesirable step in a larger task
  – UI should not draw attention to itself
  – UI should be very easy to use for everyone
• Analysis:
  – The larger task
  – UI can be more of a “science project”
  – But UI should have “flow”

General Goals
• Support hypothesis formation / refutation
• Flow
  – Easy creation, destruction, and cataloging of connections and coverage
  – Easy movement between multiple views
• Represent:
  – Multiple supporting clues
  – Conflicting evidence
  – Uncertainty
  – Timeliness
  – Non-monotonicity

Intelligence Analysts
• I have recently interviewed several active counter-terrorist analysts
• Great diversity in
  – Goals
  – Computing environments
• Biggest problems are social/systemic
• Many mundane IT problems as well

Mundane IT Problems
• System incompatibilities
• Data reformatting
• Data cleaning
• Documenting sources
• Archiving materials

Intelligence Analysts: Problem 1
• Look at a series of reports, images, communication patterns
• Try to build a model of what is going on
  – Follow leads
  – Compare to previous situations
• Recent problem:
  – Groups are changing their behavior patterns quickly
• Very little use of sophisticated software tools

Intelligence Analysts: Problem 2
• Given a large collection
• “Roll around” in the data
  – See what has been “touched”
    • Tools should indicate which parts of the collection have been examined and which have yet to be looked at, and by whom
  – View data in several different ways
    • Data reduction methods such as MDS, SVD, and clustering often hide important trends.
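The requirement above, that tools show which parts of a collection have been examined and by whom, can be sketched as a small bookkeeping structure. This is only an illustration under stated assumptions, not part of the presented system: the class name `CoverageTracker`, its methods, and the document and analyst identifiers are all hypothetical.

```python
from collections import defaultdict


class CoverageTracker:
    """Records which analysts have examined which documents (hypothetical sketch)."""

    def __init__(self, doc_ids):
        self.doc_ids = set(doc_ids)
        # Maps a document id to the set of analysts who have opened it.
        self.touched_by = defaultdict(set)

    def touch(self, doc_id, analyst):
        """Mark a document as examined by the given analyst."""
        if doc_id not in self.doc_ids:
            raise KeyError(f"unknown document: {doc_id}")
        self.touched_by[doc_id].add(analyst)

    def examined(self):
        """Documents that at least one analyst has looked at."""
        return {doc for doc, who in self.touched_by.items() if who}

    def unexamined(self):
        """Documents nobody has looked at yet."""
        return self.doc_ids - self.examined()

    def who_touched(self, doc_id):
        """Analysts who have looked at one document (empty set if none)."""
        return set(self.touched_by.get(doc_id, set()))


# Made-up identifiers, for illustration only.
tracker = CoverageTracker(["r1", "r2", "r3"])
tracker.touch("r1", "analyst_a")
tracker.touch("r1", "analyst_b")
tracker.touch("r3", "analyst_a")
print(sorted(tracker.unexamined()))
print(sorted(tracker.who_touched("r1")))
```

An interface could use `unexamined()` to highlight untouched regions of a collection and `who_touched()` to annotate each item with its readers.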
Intelligence Analysts: Problem 2
  – Don’t show the obvious
    • e.g., Cheney is president
  – Don’t show what you’ve already shown
  – Only show the most recent version
  – Show which info is not present
    • Changes in the usual pattern
    • Something stops happening

Intelligence Analysts: Problem 3
• Prepare a very short executive summary for the purposes of policy making
  – Really the culmination of a cascade of summaries
  – Reps from different agencies meet and “pow-wow” to form a view of the situation
  – Rarely, but crucially, must be able to refer back to original sources and reasoning process for purposes of accountability

BioInformatics Researchers

BioInformatics Example 1
• How to discover new information …
• … as opposed to discovering which statistical patterns characterize occurrence of known information.
• Method:
  – Use large text collections to gather evidence to support (or refute) hypotheses
  – Make Connections
  – Gather Evidence

Etiology Example
• Don Swanson example, 1991
• Goal: find cause of disease
  – Magnesium-migraine connection
• Given
  – medical titles and abstracts
  – a problem (incurable rare disease)
  – some medical expertise
• Find causal links among titles
  – symptoms
  – drugs
  – results

Gathering Evidence
[Figure: diagrams of terms found in the literature: stress, CCB, SCD, PA, magnesium, migraine]

Swanson’s Linking Approach
• Two of his hypotheses have received some experimental verification.
• His technique
  – Only partially automated
  – Required medical expertise

BioInformatics Example 2:
• How to find functions of genes?
  – Have the genetic sequence
  – Don’t know what it does
  – But …
    • Know which genes it coexpresses with
    • Some of these have known function
  – So … infer function based on function of coexpressed genes
• This is a problem suggested by Michael Walker and others at Incyte Pharmaceuticals

Gene Co-expression: Role in the genetic pathway
[Figure: pathway diagrams with the known genes Kall., PSA, and PAP and the unknown genes g? and h?]
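The inference in BioInformatics Example 2 above (propose a function for an unknown gene from the known functions of the genes it coexpresses with) can be sketched as a simple vote. This is a toy illustration, not the method used at Incyte: the function name `infer_function`, the gene identifiers, and the annotations below are hypothetical placeholders.

```python
from collections import Counter


def infer_function(unknown, coexpressed, known_functions):
    """Rank candidate functions for `unknown` by how many of its
    coexpressed genes are annotated with each function."""
    votes = Counter()
    for partner in coexpressed.get(unknown, []):
        # Partners with no known function contribute no votes.
        for function in known_functions.get(partner, []):
            votes[function] += 1
    return votes.most_common()


# Hypothetical placeholder data, not real gene annotations.
coexpressed = {"gene_x": ["gene_a", "gene_b", "gene_c"]}
known_functions = {
    "gene_a": ["function_1"],
    "gene_b": ["function_1", "function_2"],
    # gene_c is coexpressed but has no known function.
}

ranking = infer_function("gene_x", coexpressed, known_functions)
print(ranking)
```

A ranking like this only suggests hypotheses; it still has to be checked against the literature and in the lab.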
Other possibilities as well

Make use of the literature
• Look up what is known about the other genes.
• Different articles in different collections
• Look for commonalities
  – Similar topics indicated by Subject Descriptors
  – Similar words in titles and abstracts
adenocarcinoma, neoplasm, prostate, prostatic neoplasms, tumor markers, antibodies ...

Formulate a Hypothesis
• Hypothesis: mystery gene has to do with regulation of expression of genes leading to prostate cancer
• New tack: do some lab tests
  – See if mystery gene is similar in molecular structure to the others
  – If so, it might do some of the same things they do

Investigative Reporter Example
• Looking for trends in online literature
• Create, support, refute hypotheses

Investigative Reporter Example
• What are the current main topics?
  – Clustering
• What are the new popular terms?
  – Corpus-level statistics, Co-occurrence statistics
• How do they track with the news?
  – Contrasting collection statistics

Investigative Reporter Example
• How long after a new Star Trek series comes on the air before characters from the series appear in stories?
• How often do Klingons initiate attacks against Vulcans, vs. the converse?
  – Named-entity recognition
  – Creating a list of terms
  – Apply the list to a Subcollection
  – Create regex rules with POS information

LINDI
[Figure: LINDI interface mockup with File and Help menus and panels for Summary, Term Set (New, Merge; e.g., All terms: *; Diseases: emphysema, cancer, hypertension, …), Query, Analysis, and Document Set (e.g., All documents: *; WHO: organization = world health organization)]

Thank you!
For more information: bailando.sims.berkeley.edu/lindi.html