Medical Informatics
Ida Sim, MD, PhD
February 17, 2004
Division of General Internal Medicine, and Graduate Group in Biological and Medical Informatics, UCSF

Copyright Ida Sim, 2004. All federal and state rights reserved for all original material presented in this course through any medium, including lecture or print.

February 17, 2004: I. Sim Overview Medical Informatics

Outline
• Introduction
• Course Goals and Overview
• Computing Infrastructure for Health Care
  – data storage
  – networking

Introduction: Ida Sim, MD, PhD
• PhD in Medical Informatics, Stanford
• Assistant Professor
  – General Internal Medicine
• Associate Director for Medical Informatics
  – Program in Biological and Medical Informatics
• Interests
  – computer-assisted clinical decision-making
  – electronic scientific publication
  – economics of health information technology
  – meta-analysis, and evidence-based decision making

Informatics and Clinical Care
• Institute of Medicine (IOM) report on med errors
  – calls for electronic prescribing
  – Leapfrog initiative: financial rewards for hospitals that use e-prescribing
• IOM report on “quality chasm”
  – “A nationwide effort is needed to build a technology-based information infrastructure that would lead to the elimination of most handwritten clinical data within the next 10 years…”; asks for $1 billion for health informatics
• Rise of consumer health informatics
  – consumer may be next “driver” for health care

Informatics and Clinical Research
• Human genome findings will need to be translated into population and clinical medicine
• RCTs now a $3.6 billion business (C. Scott, 7/00)
  – in 1988, 95% of RCTs conducted by academics
  – now, over 80% conducted by industry
  – industry is seeking increased efficiency in a very fragmented and complex business
• Computers needed to help translate research results to practice
  – over 10,000 RCTs indexed in 1999 Medline

Yet...
• Only ~12% of outpatient clinics have an EMR; only 30% of hospitals have a website
• Much clinical research is still done using chart abstraction and paper forms
• Medicine and medical research is information intensive, but
  – health sector invests only 2.5% of gross revenue on information technologies (Gartner Group, 2003)
  – vs. 6% in comparable information-intensive sectors (e.g., banking)

Course Goals
• Be familiar with core concepts in medical informatics: vocabularies, interchange standards, decision support systems
• Understand key concepts about electronic medical records (EMRs) and data warehouses, and their uses for clinical research
• Understand the clinical, economic, and social context in which information technologies are being developed and deployed in health care

Context
• Few students working directly in informatics
• Desired outcome
  – that you be able to understand and converse with “tech” folks
  – that you have a better chance of recognizing and taking advantage of opportunities in
    • using informatics for your research work
    • participating in innovative informatics projects

Course Overview
• 5 Lectures
  – PowerPoint file up few days before lecture
  – class participation expected
• Guest lecture: Palo Alto Medical Foundation
  – Thurs. Mar. 4, 1:00 to 2:30 pm
• Assignments
  – 5 homeworks, no final exam
• Office “hours”: [email protected]
  – http://www.epibiostat.ucsf.edu/courses/schedule/med_informatics.html

Outline
• Introduction
• Course Goals and Overview
• Computing Infrastructure for Health Care
  – data storage
  – networking

Computing Infrastructure
[Diagram: B&T Clinic, 2004. Components shown: Logician EMR, Front Desk, HealthNet (HL-7), Medical Information Bureau, Radiology, Lab, Pharm Benefit Manager, Specialist, UniLab, Walgreens; connected via Internet, Intranet, and Phone/Paper/Fax]

Understanding the Infrastructure
• Clients and servers (the components)
• Data storage (how data is stored)
  – flat file versus relational model
• Networking (how data gets back and forth)

Client/Server Model
[Diagram: clients connected to a web server]
• Computers can be servers and/or clients
• Web server “serves” web pages to “clients,” who view these pages using a browser
  – MS Internet Explorer or Netscape Communicator

Internet Clients and Servers
[Diagram: nci.nih.gov, myhome.com, cochrane.uk, amazon.com, aol.com, and pacbell.net joined by main trunk cables; the ucsf.edu LAN (medicine, itsa); a user at home]

Data Storage
• Computers can help us
  – store, retrieve, query, compute, and report data
• For this to happen, we must describe the data in such a way that the computer
  – “understands” the data
  – can manipulate the data
    • e.g., sort them, graph them, add numbers, perform analyses
  – can retrieve the data for later use

“Describing” the Data
• The extent to which the computer can help you manage your data depends on how well you described your data to it
• In JIFE database example, did you describe your data
  – correctly: did Baby Oscar have jaundice?
    • accurate, clear, consistent, etc.
  – cleanly: with as little redundancy as possible
    • don’t want Baby Oscar’s birthdate in 3 separate places
  – sufficiently: all that is needed for later analyses
    • captured ethnicity for anticipated analysis by ethnicity?
    • what later analyses do you have in mind?
  – understandably: for humans and for computers

“Describing” Data: To Humans
• For understanding and communication
  – via a system for codifying meaning
    • English language, mathematical notation
  – making the “code” itself concrete
    • skywriting, a graph drawn on a sandy beach
    • text on paper, an oil painting, lecture on audiotape
• For later retrieval
  – a permanent or semi-permanent physical embodiment of the description
    • papers in a file cabinet, museum of runes
[Figure: the numbers 142, 108, 96, 3.9, 24]

“Describing” Data: To Computers
• For understanding and communication
  – via a data model for describing data to computers
    • akin to “German prose on paper” or “Olde English epic poetry on audiotape”
  – standard data models to choose from include
    • flat file
    • relational
    • object-oriented
• For later retrieval
  – storage as 1’s and 0’s in
    • random access memory: short term, until power off
    • permanent memory on a hard disk: longer term

Data Model Choices
• Data model should be the best that allows you to
  – do what you want to do with the data
    • query, manipulate, share, merge
  – handle the amount of data that you have
  – handle the type of data that you have
    • prose, numbers, xray images, audio files, etc.
• Standard data model choices
  – flat file: one long list of text characters
  – relational: tables of columns and rows
  – object: data arranged in conceptual groups
• Usual clinical research choice is flat file/relational
• Clinical databases are increasingly becoming relational

Flat File Model
• For understanding and communication
  – data are encoded as strings of characters
    • one character at a time, no concept of a “word” or “sentence”
  – so, computers cannot understand the meaning of data
    • “male” is just a string of 4 characters
• For storage
  – in a single file (e.g., a Word or STATA file)
  – “flat” structure: start with one baby’s data, and keep adding data baby by baby
• Like writing all your data from beginning to end onto one piece of paper and putting that paper into your file drawer

Flat File Examples
Word Text File
Carson  Jackson  1  3/2/05  J  5
Hannah  Hillary  2  1/2/05  C  2
Jonas   Oscar    1  1/1/05  J  3

STATA File
Carson,Jackson,1,3/2/05,J,5
Hannah,Hillary,2,1/2/05,C,2
Jonas,Oscar,1,1/1/05,J,3

Database Schema
• A database’s schema is a compact summary description of your database’s contents
• Database schema = description of database
  – what type of data
  – how that data is conceptually arranged
• E.g., schema for research paper
  – intro, methods, results, discussion (text)
  – tables (table) and figures (graphic)
  – pictures (image)

Flat File Data Schema
Word File
Carson  Jackson  1  3/2/05  J  5
Hannah  Hillary  2  1/2/05  C  2
Jonas   Oscar    1  1/1/05  J  3
• Which fields are
  – first name, DOB, case status, last name, exam score, gender
• Flat file schemas are implicit
  – the schema is in the mind of whoever is entering the data
  – can change from record to record
    • maybe first baby’s name is Jackson Carson and the second baby’s name is Hannah Hillary
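
To make the "implicit schema" point concrete, here is a minimal Python sketch that reads the three STATA-style records from the Flat File Examples slide. The field names in `ASSUMED_FIELDS` are hypothetical: the file itself never states them, so the reader has to supply a guess, and a different guess would parse just as happily.

```python
import csv
import io

# The three comma-separated records from the "Flat File Examples" slide.
FLAT_FILE = """Carson,Jackson,1,3/2/05,J,5
Hannah,Hillary,2,1/2/05,C,2
Jonas,Oscar,1,1/1/05,J,3
"""

# Hypothetical field names. Nothing in the file says what each column means,
# so this ordering is an assumption made by whoever reads the data.
ASSUMED_FIELDS = ["last_name", "first_name", "gender", "dob", "case_status", "exam_score"]


def read_flat_file(text, fields):
    """Attach caller-supplied field names to each comma-separated record."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(fields, row)) for row in reader]


records = read_flat_file(FLAT_FILE, ASSUMED_FIELDS)
for record in records:
    print(record)
```

Every value comes back as a plain string. The computer does not know that the last column is a score or that the fourth is a date unless the reader's code says so.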

Flat File Advantages
• Easy, just start entering data, doesn’t need any preliminary database work or knowledge
• Can do with any word processor
  – Word, WordPerfect, editor for STATA or SAS, Excel, SimpleText
• Cheap
• Can be exported to analysis programs
• Portable
  – almost all programs can read in a flat file

Flat File Disadvantages
• Description of the data isn’t clear, and may not even be understandable
  – meaning of the data items is not explicit
    • unclear that the last column is the neuropsych exam score
  – structure is not explicit
    • does last name always precede first name?
• Inefficient and prone to error for representing repeating data fields
  – e.g., if each baby has more than one neuropsych exam score

Repeating Data in Flat File Model (1)
Word Text File
Carson  Jackson  1  3/2/05  J  5
Hannah  Hillary  2  1/2/05  C  2
Jonas   Oscar    1  1/1/05  J  3
Carson  Jackson  2  3/3/05  J  4
Jonas   Oscar    1  1/3/05  J  4
Jonas   Oscar    1  1/1/05  J  3
• Jackson/Carson’s gender might change from one record to another, or...

Repeating Data in Flat File Model (2)
Word Text File
Carson  Jackson  1  3/2/05  J  5
Hannah  Hillary  2  1/2/05  C  2
Jonas   Oscar    1  1/1/05  J  3
(further exam-score columns on the slide: x 4 4 3)
• Implicit structure to repeating data
  – is the nth column always the nth neuropsych exam score?
    • can a missed exam be denoted by an X?
• Whatever data schema there is may vary from record to record

Flat File Disadvantages (cont.)
• Inefficient at finding a particular baby
  – must look at records one by one from beginning to end
  – no guarantee that you have found all the information for that baby unless you look all the way to the end
• Inefficient at manipulating data
  – to see list of male babies, must make a new file
• Difficult to share since the database itself gives no clues about what data is in each field

Summary of Flat File Data Model
Factor                   Flat File        Relational  Object
Human-understandable     Frequently Not
Computer-understandable  No
Complexity of data       Simple
Querying                 Inefficient
Manipulating             Inefficient
Amount of data           Small
Type of data             Text, Numbers
Sharing and merging      Very Difficult

When Are Flat Files Useful?
• For small, simple, “quick and dirty” databases
  – few data items, small number of records
  – one set of predictors and one set of outcomes per participant/subject
    • i.e., no repeating data fields
    • i.e., only one-to-one relations, no one-to-many
  – quick and dirty
    • for very few users (i.e., just you)
    • you’re not planning on reusing this database later
    • you’re not planning on sharing this database now or later

Flat Files in Clinical Care
• Really no reason nowadays to build a flat file system for clinical care databases
  – lots of one-to-many relationships
• Many flat file systems are leftover from early days of computerization
  – old VA system in Mumps (ANSI Standard M)
  – STOR, a pioneering system in the 1970s
    • “STOR does not store data in a relational database - it is a flat file data structure. To obtain it's data, I run queries off it and download them into FileMaker Pro or Microsoft Access or Excel and then manipulate the data into a form more easy to read for providers.” Tirzah Gonzalez, DGIM STOR analyst
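
The "must look at records one by one" point can be sketched in a few lines of Python. The six records below are transcribed from the Repeating Data in Flat File Model (1) slide, on the assumption that the slide's values read column by column; `find_records` is a hypothetical helper, not part of any real system.

```python
# Six whitespace-separated records, assumed to match the
# "Repeating Data in Flat File Model (1)" slide.
RECORDS = [
    "Carson Jackson 1 3/2/05 J 5",
    "Hannah Hillary 2 1/2/05 C 2",
    "Jonas Oscar 1 1/1/05 J 3",
    "Carson Jackson 2 3/3/05 J 4",
    "Jonas Oscar 1 1/3/05 J 4",
    "Jonas Oscar 1 1/1/05 J 3",
]


def find_records(lines, name_a, name_b):
    """Return all lines whose first two tokens match, plus how many lines were read.

    With no index and no guarantee of ordering, every line has to be read:
    stopping at the first match could miss later records for the same baby.
    """
    matches = []
    lines_read = 0
    for line in lines:
        lines_read += 1
        tokens = line.split()
        if tokens[:2] == [name_a, name_b]:
            matches.append(line)
    return matches, lines_read


oscar_records, lines_read = find_records(RECORDS, "Jonas", "Oscar")
print(len(oscar_records), "records found after reading", lines_read, "lines")
```

The same scan also shows the consistency problem: the two records found for Carson Jackson disagree in the third column, and nothing in the file flags that.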

Relational Data Model
• Data are arranged in tables made up of columns and rows
  – the columns are the types of data
    • fixed number of columns
    • each column has a unique name (e.g., FirstName)
    • has a “domain” of values that may appear in that column
      – domain=text for FirstName, domain=positive integers for age
  – the rows are the records themselves
    • there can be an arbitrary number of unique unnamed rows (i.e., the table can be arbitrarily long)

Flat File Admissions Database
Robert Lee, 000-01-001, M, 09-Jul-70,B/T Healthnet
  31-Dec-94 to 12-Jan-95, admitted to Medicine with Acute MI, discharged with Acute MI, COPD, Diabetes, CHF
  27-Mar-96 to 31-Mar-96, admitted to Medicine with COPD, discharged with Pneumonia, COPD, CHF, Diabetes
June Smith, 000-01-002,F,22-Oct-25,Medicare
  02-Feb-95 to 16-Feb-95, admitted to Surgery for Total Hip Replacement, discharged with THR, Acute MI, Diabetes
  27-Feb-95 to 20-Mar-95, admitted to Medicine with Acute MI, discharged with Acute MI,VF Arrest, Diabetes
Marissa Perez,000-01-003,F,13-Jun-57,B/T Pacificare
  19-Nov-97 to 23-Nov-97, admitted to Gyn for metrorrhagia, discharged with uterine fibroids, Diabetes

Review of Problems with Flat Files
• Implicit structure, implicit data schema
• Schema may change from record to record
• Inefficient for finding a particular admission
• Inefficient for pulling out all Acute MI admissions
• Difficult to share or to understand later
• etc.
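
The idea of a column "domain" from the Relational Data Model slide can be illustrated with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the table name `Babies` is hypothetical, and the `CHECK` constraint is one possible way to express "positive integers for age" in SQLite, not the lecture's own definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: FirstName has a text domain, Age a positive-integer domain.
conn.execute("""
CREATE TABLE Babies (
    FirstName TEXT NOT NULL,
    Age INTEGER CHECK (typeof(Age) = 'integer' AND Age > 0)
)
""")
conn.execute("INSERT INTO Babies VALUES (?, ?)", ("Oscar", 1))


def try_insert(first_name, age):
    """Return True if the row was accepted, False if the domain check rejected it."""
    try:
        conn.execute("INSERT INTO Babies VALUES (?, ?)", (first_name, age))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_negative = try_insert("Hillary", -3)     # outside the domain: not positive
accepted_text = try_insert("Jackson", "unknown")  # outside the domain: not an integer
row_count = conn.execute("SELECT COUNT(*) FROM Babies").fetchone()[0]
print(accepted_negative, accepted_text, row_count)
```

Unlike the flat file, the database itself refuses values that fall outside the declared domain, so the description of the data is enforced rather than remembered.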

Relational Admissions Database (#1)
InpatientMasterTable
ID          Name   Sex  Birthdate  Insurance
000-01-001  Lee    M    09-Jul-70  B/T Healthnet
000-01-002  Smith  F    22-Oct-25  Medicare
000-01-003  Perez  F    13-Jun-57  B/T Pacificare

AdmissionsTable
ID          Admit Service  Admit Date  Discharge Date  Admit Diagnosis  Principal Discharge Diagnosis  Secondary Discharge Diagnoses  Secondary Discharge Diagnoses
000-01-001  Med            31-Dec-94   12-Jan-95       Acute MI         Acute MI                       COPD                           Diabetes (CHF)
000-01-001  Med            27-Mar-96   31-Mar-96       COPD             Pneumonia                      COPD                           CHF (Diabetes)
000-01-002  Surg           03-Feb-95   16-Feb-95       THR              THR                            Acute MI                       Diabetes
000-01-002  Med            27-Feb-95   20-Mar-95       Acute MI         Acute MI                       VF Arrest                      Diabetes
000-01-003  Gyn            19-Nov-97   23-Nov-97       Menorrhagia      von Willebrand's               Diabetes
• Doesn’t handle secondary diagnoses very well
  – for many admissions, there are either too few or too many columns

Relational Admissions Database
InpatientMasterTable
ID          Name   Sex  Birthdate  Insurance
000-01-001  Lee    M    09-Jul-70  B/T Healthnet
000-01-002  Smith  F    22-Oct-25  Medicare
000-01-003  Perez  F    13-Jun-57  B/T Pacificare

AdmissionsTable
ID          Admit Service  Admit Date  Discharge Date  Admit Diagnosis  Principal Discharge Diagnosis
000-01-001  Med            31-Dec-94   12-Jan-95       Acute MI         Acute MI
000-01-001  Med            27-Mar-96   31-Mar-96       COPD             Pneumonia
000-01-002  Surg           03-Feb-95   16-Feb-95       THR              THR
000-01-002  Med            27-Feb-95   20-Mar-95       Acute MI         Acute MI
000-01-003  Gyn            19-Nov-97   23-Nov-97       Menorrhagia      von Willebrand's

SecondaryDischargeDiagnosisTable
ID          Admit Date  Secondary Discharge Diagnoses
000-01-001  31-Dec-94   COPD
000-01-001  31-Dec-94   Diabetes
000-01-001  31-Dec-94   CHF
000-01-001  27-Mar-96   COPD
000-01-001  27-Mar-96   CHF
000-01-001  27-Mar-96   Diabetes
000-01-002  03-Feb-95   Acute MI
000-01-002  03-Feb-95   Diabetes
000-01-002  27-Feb-95   VF Arrest
000-01-002  27-Feb-95   Diabetes
000-01-003  19-Nov-97   Diabetes

Relational Database Schema
• The schema is the names of the tables and their column names
  – InpatientMasterTable(ID,Name,Sex,Birthdate,Insurance)
  – AdmissionsTable(ID,AdmitService,AdmitDate,DischargeDate,AdmitDiagnosis,PrincipalDischargeDiagnosis)
  – SecondaryDiagnosisTable(ID,AdmitDate,SecondaryDischargeDiagnosis)
• The schema is explicitly stated
  – in a language called Structured Query Language (SQL)

Pros of Relational Model
• Database is always consistent
  – built-in prevention against insert, delete, and update errors
• Based on formal set theory
  – normalization saves storage space
  – normalization supports more efficient searching through the data
  – standard schema definition and query language available
    • SQL=Structured Query Language
• Available as reliable commercial software systems...

Cons of (Traditional) Relational Model
• Profusion of tables and keys can be confusing
  – higher organizing principles are implicit
    • e.g., a patient has only one primary diagnosis but may have several secondary diagnoses
• Inefficient at representing complex semantic relationships
  – e.g., ICU admission is a type of admission
• Unable to capture certain types of data
  – nested data
    • e.g., admit diagnosis = MITable(location,Qwave,CHFStatus)
  – images and other multimedia
  – metadata (e.g., “Exam score corrected May 2nd, 2000”)

Summary of Relational Data Model
Factor                   Flat File        Relational       Object
Human-understandable     Frequently Not   Yes
Computer-understandable  No               Yes
Complexity of data       Simple           Complex
Querying                 Inefficient      Efficient
Manipulating             Inefficient      Efficient
Amount of data           Small            Very Large
Type of data             Text, Numbers    Text, Numbers
Sharing and merging      Very Difficult   Least Difficult
• We don’t normally think in tables...
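
As a rough illustration of stating the schema explicitly in SQL, the sketch below builds the three tables named on the Relational Database Schema slide using Python's built-in `sqlite3` module and pulls out all Acute MI admissions, the query that was inefficient in the flat file. The column types and the choice of keys are assumptions added for the example; the slides give only table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table and column names follow the slide; types and keys are assumed.
conn.executescript("""
CREATE TABLE InpatientMasterTable (
    ID TEXT PRIMARY KEY, Name TEXT, Sex TEXT, Birthdate TEXT, Insurance TEXT
);
CREATE TABLE AdmissionsTable (
    ID TEXT REFERENCES InpatientMasterTable(ID),
    AdmitService TEXT, AdmitDate TEXT, DischargeDate TEXT,
    AdmitDiagnosis TEXT, PrincipalDischargeDiagnosis TEXT,
    PRIMARY KEY (ID, AdmitDate)
);
CREATE TABLE SecondaryDiagnosisTable (
    ID TEXT, AdmitDate TEXT, SecondaryDischargeDiagnosis TEXT
);
""")

conn.executemany("INSERT INTO InpatientMasterTable VALUES (?,?,?,?,?)", [
    ("000-01-001", "Lee", "M", "09-Jul-70", "B/T Healthnet"),
    ("000-01-002", "Smith", "F", "22-Oct-25", "Medicare"),
    ("000-01-003", "Perez", "F", "13-Jun-57", "B/T Pacificare"),
])
conn.executemany("INSERT INTO AdmissionsTable VALUES (?,?,?,?,?,?)", [
    ("000-01-001", "Med", "31-Dec-94", "12-Jan-95", "Acute MI", "Acute MI"),
    ("000-01-001", "Med", "27-Mar-96", "31-Mar-96", "COPD", "Pneumonia"),
    ("000-01-002", "Surg", "03-Feb-95", "16-Feb-95", "THR", "THR"),
    ("000-01-002", "Med", "27-Feb-95", "20-Mar-95", "Acute MI", "Acute MI"),
    ("000-01-003", "Gyn", "19-Nov-97", "23-Nov-97", "Menorrhagia", "von Willebrand's"),
])

# One declarative query replaces reading the whole flat file by eye.
acute_mi = conn.execute("""
    SELECT m.Name, a.AdmitDate
    FROM AdmissionsTable AS a
    JOIN InpatientMasterTable AS m ON m.ID = a.ID
    WHERE a.AdmitDiagnosis = 'Acute MI'
    ORDER BY m.Name
""").fetchall()
print(acute_mi)
```

Because each patient's name and birthdate live in exactly one row of the master table, an admission can be added or corrected without any risk of the demographic data drifting between records.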

Object Data Model
• Data arranged in conceptual groups, with prototypes and their attributes

Patient
  -name
  -gender
  -b-day
  -address
  -insurance
  -primary MD
  -etc.
Admission
  -admit date
  -discharge date
  -attending MD
  -admit, primary, secondary dx
  -etc.
Diagnosis
  -code
  -name
  -modifiers

Inheritance
• Special classes of data can be modeled efficiently

Admission
  -admit date
  -discharge date
  -attending MD
  -admit, primary, secondary dx
  -etc.
ICUAdmission (is-a Admission)
  -APACHE score
  -ICU attending MD

Pros and Cons of Object Model
• Pros: Can represent very complex data types and data relationships
  – images, audio, inheritance, procedural data (e.g., how to draw a graph of given data)
• Cons: Very complex
  – inefficient since no formal mathematical basis for storage and querying
  – more difficult to share since data is more complex
  – commercial systems are flaky

Summary of Data Models
Factor                   Flat File        Relational       Object
Human-understandable     Frequently Not   Yes              Partially
Computer-understandable  No               Yes              Yes
Complexity of data       Simple           Complex          Very Complex
Querying                 Inefficient      Efficient        Inefficient
Manipulating             Inefficient      Efficient        Inefficient
Amount of data           Small            Very Large       Large
Type of data             Text, Numbers    Text, Numbers    All
Sharing and merging      Very Difficult   Least Difficult  Rather Difficult
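
The is-a relationship on the Inheritance slide maps directly onto class inheritance in an object-oriented language. The Python sketch below is illustrative only: the attribute names paraphrase the slide, while the physician names and the APACHE score value are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Admission:
    admit_date: str
    discharge_date: str
    attending_md: str
    admit_dx: str
    primary_dx: str
    secondary_dx: List[str] = field(default_factory=list)


@dataclass
class ICUAdmission(Admission):
    """An ICU admission is-a Admission with two extra attributes."""
    apache_score: int = 0
    icu_attending_md: str = ""


# Dates and diagnoses come from the admissions example; the physician
# names and the APACHE score are hypothetical placeholders.
icu_stay = ICUAdmission(
    admit_date="27-Feb-95",
    discharge_date="20-Mar-95",
    attending_md="Dr. A (hypothetical)",
    admit_dx="Acute MI",
    primary_dx="Acute MI",
    secondary_dx=["VF Arrest", "Diabetes"],
    apache_score=25,
    icu_attending_md="Dr. B (hypothetical)",
)

print(isinstance(icu_stay, Admission), icu_stay.apache_score)
```

Anything written to work on an `Admission` also works on an `ICUAdmission`, which is the relationship the traditional relational model struggles to express. The secondary diagnoses also sit inside the admission as a list, rather than in a separate table.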

Summary of Data Model Choices
• Generally, use the RELATIONAL MODEL for storing clinical and clinical research data
• Exceptions
  – you have only one-to-one relations in your database, which you are not intending on sharing or reusing
    • use a flat file (e.g., Excel, STATA)
  – you need to store complex, multimedia data
    • consider an extended-relational database (aka object-relational)
      – database designed using the object model
      – data is stored and queried as a relational database
    • but could probably work around this using newer standard relational databases

The Model vs. The System
• Data model
  – the generic abstract structure of the information
    • domain independent, not a “product” per se
• Database management system
  – is a real-world program that you can buy
  – stores information using a data model
  – provides additional functionality

Example Database Management Systems
Data Model  Small Scale (PCs)  Large Scale (Mainframes)
Flat file   Filemaker Pro      VA system (enhanced)
Relational  Access, MySQL      Oracle, Sybase, MySQL, SQL Server, Informix
Object                         Objectivity

DBMS Features for System Selection
• Memory capacity
• Multi-user support and transaction management
• Data entry forms
• Triggers and rules
• Security
• Backup and archiving

Other DBMS Features
• Security
  – can have logins and different levels of access
    • only database administrator can change data schema
    • data entry person can only enter data into certain fields
• Backup and archiving
  – safer if this is automatically done on a regular schedule
  – standard for health care data is at least 7 years of archiving

Computing Infrastructure
[Diagram: B&T. Components shown: Logician EMR, Front Desk, HealthNet, Radiology, Medical Information Bureau, Lab, Pharm Benefit Manager, Modern U., Specialist, UniLab, Walgreens; connected via Internet, Intranet, and Phone/Paper]

HealthSystem Minnesota
• 1.6 million patient visits per year, 270,000 capitated lives, 460 physicians, 4700 employees, 31 clinics, and over $400 million in revenues (1998)
  – over 50 computer and 50 paper systems
• “Maintaining the consistency of these tables in various systems is impossible and creates enormous problems for understanding let alone improving our performance.”

Summary on Data Storage
• How a computer stores information can have serious implications for
  – data integrity
  – speed
  – ability to share data
  – security (via enhancements available to relational database management systems)
• Relational model is generally the best choice for storing clinical data
  – but making sense of multiple databases is still non-trivial

Understanding the Infrastructure
• Clients and servers (the components)
• Data storage (how data is stored)
  – flat file versus relational model
• Networking (how data gets back and forth)

Internet = Network of Networks
[Diagram: nci.nih.gov, myhome.com, cochrane.uk, amazon.com, aol.com, and pacbell.net joined by main trunk cables; a local trunk cable through Berkeley reaches the ucsf.edu LAN (medicine, itsa); from home, dial in to itsa.ucsf.edu via modem, or use a commercial Internet Service Provider (ISP) via dial-up or DSL]

Networking Media
• Copper wire (twisted pair)
  – generally not well suited to high bandwidth transmission
• Coaxial cable
  – can carry high frequencies without leak
  – cable industry has “more bandwidth by accident than the telephone people have on purpose”
• Fiber optic
  – highest bandwidth, but expensive and de novo
• Curb-to-home problem
  – only phone and coax cables now run from curb to home
  – hybrid fiber/coax cables and approaches coming

Networking Bandwidth
Connection Type      Speed (in kilobits per second, Kbps)
Phone modem          14.4, 28.8, or 56
ISDN                 64 to 128
T1                   1,000
Spread-spectrum RF   2,000
ADSL, Cable modem    6,000 to 7,000 to 10,000
Infrared             16,000
Ethernet             10,000; 100,000 on some systems
T3                   45,000
ATM                  155,000 over copper wires; 622,000 over fiberoptic
SONET                52,000 to 9,953,000
Transfer times shown on the slide for a CXR (12 Mbits) and a CT scan (5.2 Mbits): 3 min and 1.4 min; 8 sec and 3.3 sec

What Happens over Network Cables?
[Diagram: nci.nih.gov, myhome.com, cochrane.uk, amazon.com, aol.com, and pacbell.net joined by main trunk cables; the ucsf.edu LAN (medicine, itsa); a user at home]

Networking Protocols
• Protocol = grammar for machines talking to each other
  – e.g., protocol for the WWW = http
• WWW vs. Internet vs. Intranet vs. VPN
  – WWW = http-based communication on Internet
  – Intranet = network of networks restricted to within an organization (usually implies only http-based communication)
  – Virtual Private Network is an Intranet that physically uses part of the Internet
• Health-specific protocols needed (e.g., HL-7)

Significant Issue in HealthCare
• UCSF spent ~$100 million on networking in the late 1990’s
• Health-specific networking “grammars” add to complexity of infrastructure
• Many interactive services (e.g., realtime teleconsultation) would need more bandwidth than is commonly available
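
The bandwidth concern can be made concrete with a little arithmetic. The Python sketch below estimates the ideal transfer time for the two image sizes quoted on the Networking Bandwidth slide (a 12 Mbit chest x-ray and a 5.2 Mbit CT scan). It assumes 1 Mbit = 1,000 Kbits and ignores protocol overhead, latency, and shared links, so real transfers would be slower; `transfer_seconds` is a hypothetical helper name.

```python
def transfer_seconds(size_mbits, speed_kbps):
    """Ideal transfer time in seconds for a file of size_mbits at speed_kbps.

    Assumes 1 Mbit = 1,000 Kbits and a link used at its full nominal speed.
    """
    return size_mbits * 1000 / speed_kbps


IMAGES = {"CXR": 12, "CT scan": 5.2}  # sizes in Mbits, from the slide
LINKS = {"Phone modem": 56, "ISDN": 64, "T1": 1000, "Ethernet": 10000}  # Kbps, from the slide

for link, kbps in LINKS.items():
    for image, mbits in IMAGES.items():
        print(f"{image} over {link}: {transfer_seconds(mbits, kbps):.1f} s")
```

At modem or ISDN speeds a single chest x-ray takes minutes rather than seconds, which is why realtime services such as teleconsultation need far more bandwidth than was commonly available.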

Conclusions
• Computing infrastructure for health care is very complex, very fragmented, has lots of gaps, and is saddled with lots of old technology
• Clinical (and research) databases are generally more reliable and efficient if they are relational rather than flat file
• Networking involves both hardware (cable) and software (protocols); bandwidth limits wide deployment of interactive technologies

Teaching Points
• If you want computers to do “smart” things with your data (e.g., retrieve, sort, graph), you must describe that data very explicitly
  – what you don’t say the computer does not know
• Data models are standard abstract ways of describing data
• To send data back and forth, you also need very explicit “grammars” for communication
• Today = how of infrastructure; next class = what

References
• L.T. Kohn, J.M. Corrigan, M.S. Donaldson, To Err is Human: Building a Safer Health System (Washington: National Academy Press, 1999).
• Crossing the Quality Chasm: A New Health System for the 21st Century (Washington: National Academy Press, 2001).