Informatics and Information Engineering
CSE 300
Prof. Steven A. Demurjian, Sr.
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
[email protected]
http://www.engr.uconn.edu/~steve
(860) 486 - 4818
Copyright © 2008 by S. Demurjian, Storrs, CT.
Portions of these slides are being used with the permission of Dr. Ling Liu, Associate Professor, College of Computing, Georgia Tech.

Overview
  Informatics
    What is Informatics?
    What is Biomedical Informatics?
    What are Key Biomedical Informatics Challenges?
  Information Engineering
    Data vs. Information vs. Knowledge
    What is Science? What is Engineering?
    What is Information Consistency?
  Information Usage and Repositories
    How do we Store and Utilize Information?
    Role of Web in Informatics
    Sharing, Collaboration, and Security
    Databases vs. Data Mining

Informatics
  Informatics is:
    Management and Processing of Data
    From Multiple Sources/Contexts
    Involves Classification (Ontologies), Collection, Storage, Analysis, Dissemination
  Informatics is Multi-Disciplinary
    Computing (Model, Store, Process Information)
    Social Science (User Interactions, HCI)
    Statistics (Analysis)
  Informatics Can Apply to Multiple Domains:
    Business, Biology, Fine Arts, Humanities
    Pharmacology, Nursing, Medicine, etc.

What is Informatics?
  Heterogeneous Field – Interaction between People, Information and Technology
    Computer Science and Engineering
    Social Science (Human Computer Interface)
    Information Science (Data Storage, Retrieval and Mining)
  Diagram labels: Informatics; People, Information, Technology (Adapted from Shortliffe textbook)

What is Biomedical Informatics (BMI)?
  BMI is Information and its Usage Associated with the Research and Practice of Medicine, Including:
    Clinical Informatics for Patient Care
      Medical Record + Personal Health Record
    Bioinformatics for Research/Biology to Bedside
      From Genomics To Proteomics
    Public Health Informatics (State and Federal)
      Tracking Trends in Public Sector
    Clinical Research Informatics
      Deidentified Repositories and Databases
      Facilitate Epidemiological Research and Ongoing Clinical Studies (Drug Trials, Data Analysis, etc.)

What are Key BMI Focal Areas?
  T1 Research
    Transition Bench Results into Clinical Research
  Clinical Research
    Applying Clinical Research Results via Trials with Patients on Medication, Devices, Treatment Plans
  T2 Research
    Translating “Successful” Clinical Trials into Practice and the Community
  Clinical Practice
    Tracking all of the Information Associated with a Patient and his/her Care
  Integrated and Inter-Disciplinary Information Spectrum

What is Medical Informatics?
  Clinical Informatics, Pharmacy Informatics
  Public Health Informatics
  Consumer Health Informatics
  Nursing Informatics
  Systems and People Issues Intended to Improve Clinical Outcomes, Satisfaction and Efficiency
    Workflow Changes, Business Implications, Implementation, etc.
  Patient Centered – Personal Health Record and Medical Home
  Care Centered – Pay for Performance, Improving Treatment Compliance

What is Bioinformatics?
  Focused on Research Tools for T1:
    Genomic and Proteomic Tools, Evaluation Methods, Computing and Database Needs
    Information Retrieval and Manipulation of Large Distributed (caBIG) Data Sets (cabig.cancer.gov/index.asp)
    Often Requires Grid Computing
    Includes Cancer and Immunology Research
  Increasing Need to Tie These Separate Types of Systems Together = Personalized Medicine
    Biology and the Bedside (www.i2b2.org)

Where is Data/How is it Used?
  Medical and Administrative Data Found in Clinical Information Systems (CIS) Such As:
    Hospital Info. Systems
    Electronic Medical Records
    Personal Health Records
    Pharmacy, Nursing, Picture Archiving Systems
  Complex Data Storage and Retrieval – Many Different Systems
  T1 Research Increasingly Reliant on CIS
  T2 Research is Reliant on:
    End Systems for Embedding EBM (Evidence-Based Medicine) Guidelines
    Measuring Outcomes, Looking at Policy

What are Major Informatics Challenges?
  Shortage of Trained People Nationally
    Slows Adoption of Health Information Technology
    Results in Poor Planning and Coordination, Duplication of Efforts and Incomplete Evaluation
  What are Critical Needs?
    Dually Trained Clinicians or Researchers in Leadership of some Initiatives
    Connect all Folks with Informatics Roles across Institutions to Improve Efficiency
    Multi-Disciplinary: CSE, Statistics, Biology, Medicine, Nursing, Pharmacy, etc.
    Emerging Standards for Information Modeling and Exchange (www.hl7.org) based on XML

Information Engineering
  Data vs. Information vs. Knowledge
    How do we Differentiate Between them?
    Where are they used in BMI?
  Science vs. Engineering
    What is each of their Roles in Informatics?
    How can we Engineer Information?
    What is their Role in BMI?
  What is Information Engineering?
    What are the Unique Challenges and Opportunities?
    What is Available Today and Tomorrow?

From American Heritage
  Data
    Information, esp. information organized for analysis or used as the basis for a decision.
    Numerical information in a form suitable for processing by computer.
  Information
    The act of informing or the condition of being informed; communication of knowledge.
    A non-accidental signal used as an input to a computer or communications system.
  Knowledge
    The state or fact of knowing.
    The sum or range of what has been perceived, discovered, or learned.
    Specific information about something.

From Webster’s 9th Collegiate
  Data
    Factual information (e.g. statistics) used as a basis for reasoning, discussion, or calculation.
  Information
    The communication of knowledge or intelligence.
    Something (as a message, experimental data, or a picture) which justifies change in a construct (as a plan or theory) that represents physical or mental experience or another construct.
    A quantitative measure of the content of information.
  Knowledge
    The fact or condition of having information or of being learned.
    The sum of what is known: the body of truth, information, and principles acquired by mankind.

Data vs. Information vs. Knowledge
  Overlapping Definitions
  Conflicting Definitions
  Agreement on Data
  Knowledge and Information - Synonyms
  Discussion Questions:
    Equivalence of Knowledge/Information?
    How can we Distinguish them?
    Do these Three Terms Cover All Possibilities?

Data, Information, and Knowledge in BMI
  Data – Basic Level
    BP, Pulse, Temperature
    Peak Flow, Glucose Level, Biopsy Result
    X-Ray, MRI, CAT Scan
  Information – First Level of Interpretation
    BPs, Peak Flow, Glucose over Time
    Interpreting Scan (Radiologist) or Biopsy Result (Oncologist)
  Knowledge – Applying Experience towards Diagnosis
    What can Low Peak Flows over Time lead to?
    What Next Step after Positive Scan or Biopsy?
    What if Glucose Level is Yo-yoing?

From American Heritage
  Science
    The observation, identification, description, experimental investigation, and theoretical explanation of natural phenomena.
    Methodological activity, discipline, or study.
    An activity that appears to require study & method.
    Knowledge, esp. gained through experience.
  Engineering
    The application of scientific and mathematical principles to practical ends such as the design, construction, and operation of efficient and economical structures, equipment, and systems.

From Webster’s 9th Collegiate
  Science
    The state of knowing: knowledge as distinguished from ignorance or misunderstanding.
    A department of systemized knowledge as an object of study.
    A system or method reconciling practical ends with scientific laws.
  Engineering
    The application of science and mathematics by which the properties of matter and the sources of energy in nature are made useful to people in structures, machines, products, systems, and processes.

Science and Engineering in BMI
  Science
    Data/Information Collection & Analysis to Reach Hypothesis
      Patients with CHF on Lipitor have Fewer Heart Attacks than Those on Baby Aspirin
      Verify in Clinical Research/Epidemiological Study
  Engineering
    Usage of Information in Practice
      Apply Scientific Results to Medical Practice
    Image Processing used to Identify Tumors in CT and MRI Scans
      Transfer of Radiologist's Knowledge into Computer-Based (Assisted) Solution
      An Engineering Solution to Scientific Result

What is Information Engineering?
  Incorporation of an Engineering Approach and Discipline to the Generation of Information and the Promotion of the Better Use of Information and Resources
  Information Engineering Unifies and Combines:
    Software Engineering
    Database Engineering
    Security Engineering
    Performance Engineering
    Etc...
  Moral: Systems Cannot and Must Not be Engineered in a Vacuum!
  Particularly true in BMI (T1, T2, Clinical Research, and Clinical Practice)

Information Engineering is Motivated by:
  Realization that Management/Control of Information will be a Primary Concern as we Continue through the 1990s and into the 21st Century
  Currently in an Age of Information - Volume and Complexity
  Dependencies
    Critical Systems Heavily Depend on Information:
      Airline/Hotel/Auto Reservations
      Telecommunications
      Banking/ATMs
      ATM/Credit Cards at Gas Stations/Supermarkets
      Credit Bureaus Electronically Collect Information from Many Diverse Sources
      E-Tailing
      Medical Care/All Aspects of BMI

Info. Engrg. - Challenge for 21st Century
  Timely and Efficient Utilization of Information
    Significantly Impacts Productivity
    Supports and Promotes Collaboration for Competitive Advantage
  Use Information in New and Different Ways
    Collection, Synthesis, Analyses of Information
    Better Understanding of Processes, Sales, Productivity, etc.
    Dissemination of Only Relevant/Significant Information - Reduce Overload
  Implications for BMI?
    Sharing of Results – Benefit Mankind
    Ability to Research on Rare Diseases
    Are there Unknown Isolated “Cures”?

How is Information Engineered?
  Careful Thought to its Definition/Purpose & Thorough Understanding of its Intended Usage/Potential Impact
  Ensure and Maintain its Consistency
    Quality, Correctness, and Relevance
  Protect and Control its Availability (Secure Access)
    Who can Access What Information in Which Location and at What Time?
  Long-Term Persistent Storage/Recoverability
    Cost, Reusability, Longitudinal, and Cumulative Experience
  Integration of Past, Present and Future Information via Intranet and Internet Access
  What are Implications/Challenges for BMI?
    Let’s Discuss Briefly…

Towards Information Consistency
  Consistency of Information is Key!
  Consistency Gauged with respect to:
    Usage of Information
    Persistency of Information
    Integrity/Security of Information
      Allowable Values and Protection from Misuse
    Validity (Relevance) of Information
      Means Something to Someone in a Positive Way
  Discussion Questions:
    Why is Consistency Important for BMI?
    How is Consistency Attained for BMI?
    What Else Impacts Consistency in BMI?

What's Available to Support IE?
  What Can be Provided to Make the Advanced Application Design Process:
    More Complete? More Robust? More Responsive? Less Error Prone?
  Current Choices to Support Information Engineering:
    Conventional Programming Languages and Data Models
    Object-Oriented Programming Languages
    Object-Oriented DBS
    XML Databases
    Middleware and SOA (Web)
    Data Mining/Warehouses

What are Key Questions?
  Focus on Information and its Behavior
    What are Different Kinds of Information?
    How is Information Manipulated?
    Is Same Information Stored in Different Ways?
    What are Information Interdependencies?
    Will Information Persist? Long-Term DB? Versions of Information?
    What Past Info. is Needed from Legacy DBs or Applications?
    Who Needs Access to What Info. When?
    What Information is Available Across WWW?
  All of these Questions Apply to BMI!

Information Usage and Repositories
  How do we Store and Utilize Information?
    Databases
    Data Mining
  What are Key Issues?
    Information Sharing/Data Correctness
    Collaboration
      1. Among Providers and Researchers
      2. Among Providers and Patients
      3. Among Patients (Support Groups)
    Security
      1. Control of Patient Information (De-identified)
      2. Secure Exchange/Patient Ownership
      3. Establish Custom Patient Controlled Groups
  What is the Role of Web in Informatics?
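
The "Control of Patient Information (De-identified)" item above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `PatientRecord` fields, the keyed-hash pseudonym, and the choice to keep only birth year and a three-digit ZIP prefix. It is not a compliant de-identification procedure, only a picture of the idea that direct identifiers are dropped or coarsened before data is shared.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical identified record; field names are illustrative only."""
    patient_id: str
    name: str
    birth_date: date
    zip_code: str
    glucose_mg_dl: float


def deidentify(record: PatientRecord, secret_key: bytes) -> dict:
    """Return a shareable copy with direct identifiers removed or coarsened."""
    # A keyed hash gives a stable pseudonym without exposing the real ID.
    digest = hmac.new(secret_key, record.patient_id.encode("utf-8"), hashlib.sha256)
    return {
        "pseudonym": digest.hexdigest()[:16],
        "birth_year": record.birth_date.year,   # coarsened from full birth date
        "zip3": record.zip_code[:3],            # coarsened from full ZIP code
        "glucose_mg_dl": record.glucose_mg_dl,  # clinical value kept as-is
    }
```

Because the pseudonym is stable for a given key, a researcher can still follow one patient's readings over time without ever seeing the name or record number.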

The Role of a Database
  Database is a Norm in Today's and Tomorrow's Applications
  Usage of Information Tightly Linked to its Storage
  Integration of Database - Key Component
    Support Many Representations of “Same” Information
    Promotes Retrieval of Information Geared Towards User Needs and Responsibilities
  Gap Exists Between Standalone Programming Applications and Database Systems
  For BMI:
    Database (Data Warehouse) is a Key Feature
    Need for Access to Data (De-identified)
    Need to Share and Interact among Stakeholders

DBMS Architecture
  DBMS Languages
    Data Definition Language (DDL)
    Data Manipulation Language (DML)
      From Embedded Queries or DB Commands Within a Program
      “Stand-alone” Query Language
    Host Language:
      DML Specification (e.g., SQL) is Embedded in a “Host” Programming Language (e.g., Java, C++)
  DBMS Interfaces
    Menu-Based Interface
    Graphical Interface
    Forms-Based Interface
    Interface for DBA (DB Administrator)

ANSI/SPARC - Three Schema Architecture
  External Data Schema (Users’ view)
  Conceptual Data Schema (Logical Schema)
  Internal Data Schema (Physical Schema)

How are these Used for BMI?
  Internal Data Schema (Physical Schema)
    Hidden Data Representation for Storage of BMI Data in Proprietary Format
    Under the Control of DB System
  Conceptual Data Schema (Logical Schema)
    The Data Model for the BMI Application
    Access to Schema Controllable via SQL
  External Data Schema (Users’ view)
    Subsets of the Data Model for Different Users
      External View for Patients
      External View for Providers
      External View for Clinical Researchers
    Need Ability for a Patient to Control Access to his/her Own External View

Data Independence
  Ability of Application Programs to Remain Unaffected by Changes in Irrelevant Parts of the Conceptual Data Representation, Data Storage Structure and Data Access Methods
  Invisibility (Transparency) of the Details of Entire Database Organization, Storage Structure and Access Strategy to the Users
    Both Logical and Physical
  Recall Software Engineering Concepts:
    Abstraction: the Details of an Application's Components Can Be Hidden, Providing a Broad Perspective on the Design
    Representation Independence: Changes Can Be Made to the Implementation that have No Impact on the Interface and Its Users

Physical Data Independence
  The Ability to Modify the Physical Data Representation Without Causing Application Programs to Be Rewritten
  Examples:
    Transparency of the Physical Storage Organization
    Transparency of Physical Access Paths
    Numeric Data Representation and Units
    Character Data Representation
    Data Coding
    Physical Data Structure
  All of these are Vital for BMI – Particularly if we Use Standards to Achieve Application Independence

Physical Data Independence
  Physical Data Independence is a Measure of How Much the Internal Schema Can Change Without Affecting the Application Programs
  In BMI – Allows us to Plug and Play Different DBMS Platforms – Extensible and Versatile Integration

Logical Data Independence
  Transparency of the Entire Database Conceptual Organization
  As a Result:
    Transparency of Logical Access Strategy
    Addition of New Entities
    Removal of Entities
    Virtual (Derived) Data Items
    Union of Records
  Views
    Common Mechanism for Logical Data Independence
    Provide Different Logical Data Contexts to Different Users Based on Their Needs
    Update Views vs. Read-Only Views

Logical Data Independence
  Logical Data Independence is a Measure of How Much the Conceptual Schema Can Change Without Affecting the Application Programs
  For BMI – Allows us to Separate End User Applications (Patients, Providers, etc.) from DB

Classic Information System Design
  (figure slide; diagram not reproduced)

Data vs. Information
  (figure slide; diagram not reproduced)

Programming Language Systems vs. DBS
  Similarities and Differences Exist At System Level:
    Shared Resources vs. Shared Data
    Execution Granularity - Programs vs. Transactions
    Granularity Difference - Files vs. Instances
  Classic Problem of “Impedance Mismatch”
    Thin Layer of Overlap between PLS (C++, Java, etc.) and Relational Database System
  What will Future Bring?
    SQL3 with Object-Oriented Extensions
    XML Databases (Apache Xindice, Sedna, etc.)
  Diagram labels: Today: PLS and RDBS; Tomorrow?: PLS and XML DBS

What is Today’s Impedance Mismatch?
  Relational Data Organizes Information into Flat Files
    Relational Tables with Primary Key
    High Number of Tuples per Table (1000s & more)
    Limited Number of Tables (10-50) for Even Large Size Application
    Limited Linkages Among Tables (Foreign Keys)
  What Does BMI/PHR/EMR Require?
    For Each Patient, Track Multiple Dependencies
      Visits per Patient
      Tests per Patient
      Prescriptions per Patient
    Data Inherently Complex and Interdependent
    Flattened into Relational Format

The Health Care Application - Classes
  (three figure slides; class diagrams not reproduced)

The Health Care Application - Relationships
  (figure slide; diagram not reproduced)

How Does Mismatch Occur?
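
A minimal sketch of the staging step behind the mismatch, assuming a cut-down version of the deck's flat `Item` table (one row per physician and date, with flag columns saying which groups of fields are populated). The column subset, the class names `Visit`, `Prescription`, and `Test`, and the second sample row are hypothetical; the slides mention JDBC with Java, while Python's built-in `sqlite3` module is swapped in here so the example is self-contained.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prescription:
    pre_no: str
    medication: str


@dataclass
class Test:
    test_code: str
    status: str


@dataclass
class Visit:
    physician: str
    visit_date: str
    diagnosis: Optional[str]
    prescription: Optional[Prescription] = None
    test: Optional[Test] = None


def stage_visits(conn: sqlite3.Connection) -> List[Visit]:
    """Copy flat rows into nested objects (the second copy of the 'same' data)."""
    rows = conn.execute(
        "SELECT phy_name, visit_date, diagnosis, presc_flag, pre_no, "
        "medication, test_flag, test_code, status FROM item ORDER BY visit_date"
    )
    visits = []
    for phy, day, diag, presc_flag, pre_no, med, test_flag, code, status in rows:
        visit = Visit(phy, day, diag)
        if presc_flag:  # flag column decides whether the sub-object exists
            visit.prescription = Prescription(pre_no, med)
        if test_flag:
            visit.test = Test(code, status)
        visits.append(visit)
    return visits


def demo() -> List[Visit]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (phy_name TEXT, visit_date TEXT, diagnosis TEXT, "
        "presc_flag INTEGER, pre_no TEXT, medication TEXT, "
        "test_flag INTEGER, test_code TEXT, status TEXT, "
        "PRIMARY KEY (phy_name, visit_date))"
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            ("Dr. D", "2008-01-01", "Flu", 0, None, None, 0, None, None),
            # Hypothetical second row with a prescription and a test.
            ("Dr. D", "2008-02-15", "Strep", 1, "P-1", "Amoxicillin", 1, "T-9", "Done"),
        ],
    )
    visits = stage_visits(conn)
    conn.close()
    return visits
```

Every unused column in the first row is a NULL that the object model simply does not have, and every change made to a `Visit` object has to be written back by hand; that translation layer is where the copies can drift apart.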
  On Left – OO Classes
    Inheritance
    Dependencies
    Programmatic View
    C++ or Java Usage
  Staging from DB to OO
    Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)
  Above – Relational Tables
    Stage Data from Tables into OO (e.g. Java) format
    Utilize JDBC
  What are the Implications/Impacts?

Implications and Impact
  Three Copies of “Same” Information in Different
    Database Table (Item)
    OO Representation – Server Side (Classes)
    GUI Display – Client Side (html/xml)
  What can this Lead to?
  Example record (from figure): Dr. D, Jan 01, 08; Fever, Flu, Bed Rest; No Scripts; No Tests
    Item(Phy_Name*, Date*, Visit_Flag, Symptom, Diagnosis, Treatment, Presc_Flag, Pre_No, Pharm_Name, Medication, Test_Flag, Test_Code, Spec_No, Status, Tech)

What is one Possible Solution?
  Standards and Usage of XML
  Consider CDA – Clinical Document Architecture
    Standard for Clinical (Provider) Medical Record
  Clinical Record Organized as:
    <patient_encounter> - location
    <legal_authenticator> - MD
    <originating_organization> and <provider>
    <patient> - name, birthdate, gender
    <body_confidentiality-”CONF1”> - note
      History
      Past Medical History
      Medications
      Allergies
      Social History
      Physical Exam
      Vitals (BP, Resp, Temp, HR)
      Etc...

What is one Possible Solution?
  Let’s Explore this in Greater Detail Starting with the CDA Header

  <?xml version="1.0"?>
  <!DOCTYPE levelone PUBLIC "-//HL7//DTD CDA Level One 1.0//EN" "levelone_1.0.dtd">
  <levelone>
    <clinical_document_header>
      <id EX="a123" RT="2.16.840.1.113883.3.933"/>
      <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
      <version_nbr V="2"/>
      <document_type_cd V="11488-4" S="2.16.840.1.113883.6.1" DN="Consultation note"/>
      <origination_dttm V="2000-04-07"/>
      <confidentiality_cd ID="CONF1" V="N" S="2.16.840.1.113883.5.1xxx"/>
      <confidentiality_cd ID="CONF2" V="R" S="2.16.840.1.113883.5.1xxx"/>
      <document_relationship>
        <document_relationship.type_cd V="RPLC"/>
        <related_document>
          <id EX="a234" RT="2.16.840.1.113883.3.933"/>
          <set_id EX="B" RT="2.16.840.1.113883.3.933"/>
          <version_nbr V="1"/>
        </related_document>
      </document_relationship>
      <fulfills_order>
        <fulfills_order.type_cd V="FLFS"/>
        <order><id EX="x23ABC" RT="2.16.840.1.113883.3.933"/></order>
        <order><id EX="x42CDE" RT="2.16.840.1.113883.3.933"/></order>
      </fulfills_order>

CDA Example - Continued
  (eight slides; remaining CDA markup not reproduced)

Information Sharing/Access: Potential Pitfalls
  Another Critical Issue is Information Sharing
    Perception: How do I see/understand Data/Info?
    Differences: What is the Reality?
  Dealing with Information at Different Levels
    Syntax – Format of Information
    Semantics – Meaning of Information
    Pragmatics – Usage of Information
  When Unifying Databases/Information Repositories, Must Address all Three!
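
The difference between the syntax and semantics levels can be made concrete with a small sketch. The pipe-delimited `WEIGHT|<number>|<unit>` message below is an invented format (it is not HL7), and the names `Weight` and `parse_weight` are hypothetical. The parser enforces syntax; carrying the unit alongside the number is what preserves semantics, since "154" alone does not say whether it means pounds or kilograms.

```python
from dataclasses import dataclass

LB_PER_KG = 2.20462  # pounds in one kilogram


@dataclass(frozen=True)
class Weight:
    value: float
    unit: str  # "kg" or "lb"

    def to_kg(self) -> float:
        """Semantic step: interpret the number according to its unit."""
        if self.unit == "kg":
            return self.value
        return self.value / LB_PER_KG


def parse_weight(message: str) -> Weight:
    """Syntactic step: accept only 'WEIGHT|<number>|<unit>' messages."""
    parts = message.split("|")
    if len(parts) != 3 or parts[0] != "WEIGHT":
        raise ValueError(f"malformed message: {message!r}")
    value = float(parts[1])  # raises ValueError if not a number
    unit = parts[2].strip().lower()
    if unit not in ("kg", "lb"):
        raise ValueError(f"unknown unit: {unit!r}")
    if value <= 0:
        raise ValueError("weight must be positive")
    return Weight(value, unit)
```

A message such as `WEIGHT|154` fails the syntax check outright, whereas `WEIGHT|154|lb` and `WEIGHT|154|kg` are both syntactically valid yet describe very different patients. Pragmatics, meaning who uses the value and for what, is outside what code like this can check.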
  Data Integrity and Data Security
    Correct and Consistent Values
    Assurance in All Secure Accesses
  For BMI – All of the Above are Critical for Correct Usage and Interpretation in All Contexts (T1, T2, …)

Information Syntactic Considerations
  Syntax is Structure and Format of the Information That is Needed to Support a Coalition
  Incorrect Structure or Format Could Result in Anything from a Simple Error Message to a Catastrophic Event
  For Sharing, Strict Formats Need to be Maintained
  Health Care Data Suffers from Lack of Standards
    Standards for Diagnosis (Insurance Industry)
    Emerging Standards Include: Health Level 7 (HL7) Based on XML
    Formats Non-Standard for Different Health Organizations, Insurers, Pharmacy Networks, etc.
    N*N Translations Prone to Errors!

Information Semantics Concerns
  Semantics (Meaning and Interpretation)
    NATO and US - Different Message Formats
    Distances (Miles vs. Kilometers)
    Grid Coordinates (Mils, Degrees)
    Maps (Grid, True, and Magnetic North)
  What Can Happen in Health Care Data?
    Possible to Confuse Dosages of Medications?
    Weight of Patients (Pounds vs. Kilos)?
    Measurement of Vital Signs?
    Dana Farber Chemo Death – Checks/Balances
  What Others are Possible?

Syntactic & Semantic Considerations
  What’s Available to Support Information Sharing?
  How do we Ensure that Information can be Accurately and Precisely Exchanged?
  How do we Associate Semantics with the Information to be Exchanged?
  What Can we Do to Verify the Syntactic Exchange and that Semantics are Maintained?
  Can Information Exchange Facilitate Federation?
  Can this be Handled Dynamically?
  Or, Must we Statically Solve Information Sharing in Advance?

Information Pragmatics Considerations
  Pragmatics Require that we Totally Understand Information Usage and Information Meaning
    What are the Critical Information Sources?
    How will Information Flow Among Them?
    What Systems Need Access to these Sources?
    How will that Access be Delivered?
    Who (People/Roles) will Need to See What When?
    How will What a Person Sees Impact Other Sources?
  Focus on: Way that Information is Utilized and Understood in its Specific Context
  Can Medical Info be Misused even if Understood?

Information Pragmatics Considerations
  What are Pragmatics Issues re. Underinsured and Uninsured Populations in Event?
  How Can we Use Info Effectively if we Don’t Know if it is Complete?
    Has Info from All Sources Been Collected?
    What Happens if Same Patient in Different Repositories Can’t be Reconciled?
    What if Patient is Unresponsive and Can’t Supply any Info?
  Is Usage of Info Complicated due to Incompleteness? Multiple Locations?
  Or, if the Event is Major – will all Patient Populations Suffer Same Substandard Care?

Collaboration and Security
  Two Concepts go Hand in Hand
  Strong Parallels
    Collaboration
      Among Providers and Researchers
      Among Providers and Patients
      Among Patients (Support Groups)
    Security
      Control of Patient Information (De-identified)
      Secure Exchange/Patient Ownership
      Establish Custom Patient Controlled Groups
  Let’s Explore them Both via our Semester Project
  Also Consider Emergent and Policy Issues

Collaboration: Providers and Researchers
  Providers
    Seeking new Treatment Plans
    Looking for Clinical Research Studies for Patients
    Looking to Communicate with Clinical Researchers
  Researchers
    Publish Evidence-Based Guidelines
    New Treatments
    Collect Data on Provider Visits
    Provide Forum to Discuss with Provider
    Allow Provider to Upload Anonymous Outcomes
  Also – Need to Collaborate Among Researchers of All Types (Sharepoint, WIKIs, etc.)

Collaboration: Providers and Patients
  Patients
    Open Personal Health Record to Providers
    Patients have Data Entry Facility for Chronic Conditions
    Ability to Graph and Track their Disease
    Education Materials also Available
  Providers
    Securely Communicate (email) with Patients (see https://www.relayhealth.com/rh/specific/patients/default.aspx)
    Access to Authorized Patient Data
    Tracking of Patients (to Reduce Office Visits)
    Proactive Intervention to Head off Potential Hospitalizations/Problems via Treatment Algorithms to Auto-Notify Based on Data Values

Collaboration: Among Patients
  Patients
    Provide Each with a List of Support Groups
    Allow them to Join Groups or Form New Groups
  Secure Communication via:
    Email
    Chatting Environment
    Link to Actual (Physical) Meetings
  Repository of Available Support Groups
  Overall:
    Patients can Meet other Patients with Same Issues
    Vital for Patients with Rare Diseases
    Form On-Line Communities

Security: General Concepts
  Authentication
    Proving you are who you are
    Signing a Message
    Is the Client who S/he Says they are?
  Authorization
    Granting/Denying Access
    Revoking Access
    Does the Client have Permission to do what S/he Wants?
  Encryption
    Establishing Communications Such that No One but Receiver will Get the Content of the Message
    Symmetric Encryption
    Public Key Encryption

Key Security Issues
  Legal and Ethical Issues
    Information that Must be Protected
    Information that Must be Accessible
  Policy Issues
    Who Can See What Information When?
    Applications Limits w.r.t. Data vs. Users?
  System Level Enforcement
    What is Provided by the DBMS? Programming Language? OS? Application?
    How Do All of the Pieces Interact?
  Multiple Security Levels/Organizational Enforcement
    Mapping Security to Organizational Hierarchy
    Protecting Information in Organization

What are Key Access Control Concepts?
  Assurance
    Are the Security Privileges for Each User Adequate to Support their Activities?
    Do the Security Privileges for Each User Meet but Not Exceed their Capabilities?
  Consistency
    Are the Defined Security Privileges for Each User Internally Consistent?
      Least-Privilege Principle: Just Enough Access
    Are the Defined Security Privileges for Related Users Globally Consistent?
      Mutual-Exclusion: Read for Some-Write for Others

Available Security Approaches
  Mandatory Access Control (MAC)
    Bell-LaPadula Security Model
    Security Classification Levels for Data Items
    Access Based on Security Clearance of User
  Role Based Access Control (RBAC)
    Govern Access to Information based on Role
    Users can Play Different Roles at Different Times
    Responsibilities of Users Guiding Factor
    Facilitate User Interactions while Simultaneously Protecting Sensitive Data
  Discretionary Access Control (DAC)
    Richer Set of Access Modes - Govern Access to Information based on User Id
    Discretionary Rules on Access Privileges
    Focused on Application Needs/Requirements

Mandatory Security Mechanism
  Typical Security Classification Levels for Subjects/Programs and Objects/Resources
    Top Secret (TS) and Secret (S)
    Confidential (C) and Unclassified (U)
  Rules:
    TS is the Highest and U is the Lowest Level
    TS > S > C > U
  Security Levels:
    C1 is Security Clearance Given to User U1
    C2 is Security Classification Given to Object O1
    U1 can Access O1 iff C1 ≥ C2
    This is Referred to as the Domination of U1 Over O1
  Not Prevalent in BMI – But May have Relevance

Role Based Access Control (RBAC)
  Focuses on Defining Roles of Typical Behavior
    Nurse, Nurse-Manager, Education-RN
    Physician, Attending-MD, Specialist
    Student, Faculty-Advisor, Head
  Focus on Duties that are Shared
  During Authorization of Roles to Users
    Establish Boundaries of Access
    User Steve with Role Faculty-Advisor
      Limited to Faculty Capabilities on Peoplesoft
      Only Can Manipulate His Advisees
    User Steve with Role Associate Head
      Possible Overlap in Responsibilities w/ Faculty-Advisor
      Other Activities not given to Faculty-Advisor Role

Why is RBAC Needed?
  In Health Care, Different Professionals (e.g., Nurses vs. Physicians vs. Administrators, etc.) Require Select Access to Sensitive Patient Data
  Suppose we have a Patient Access Client
    Lois playing the Nurse Role would be Allowed to Enter Patient History, Record Vital Signs, etc.
    Steve playing M.D. Role would be Allowed to do all of a Nurse plus Write Orders, Enter Scripts, etc.
    Vicky playing Admin Role would be Allowed to Enter Demographic/Insurance Info.
  Role Dictates Client Behavior
    Physicians Write Scripts
    Nurses Enter Patient Data (Vitals + History)
    All Access Shared Medical Record
    Access is Limited Based on Role

Discretionary Access Control
  Discretionary Grant of Privileges to Users, Including Capabilities to Access Specific Data Items in a Specific Mode
    Available in Most Commercial DBMSs
  Aspects of DAC
    User’s Identity
    Predefined Discretionary “Rules” Defined by the Security Administrator
  Allows User to “Delegate” Capabilities to Another User
    Delegate Capabilities and Ability to Delegate
    Role Delegation and Delegation Authority
  DAC Available in SQL2

What is Role Delegation?
  Role Delegation, a User-to-User Relationship, Allows an Original User (OU) to Transfer Responsibility for a Particular Role to a Delegated User (DU)
  Two Major Types of Delegation
    Administratively-directed Delegation has an Administrative Infrastructure Outside the Direct Control of a User that Mediates Delegation
    User-directed Delegation has a User (Playing a Role) Determining If and When to Delegate a Role to Another User
  In Both, Security Administrators Still Oversee Who Can Do What When w.r.t. Delegation

Why is Role Delegation Important?
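
The role and delegation ideas above can be pictured with a minimal sketch of user-directed delegation layered on role-based access. The role names and users (Lois, Steve, Vicky) follow the deck's example; the permission strings, the `AccessPolicy` class, and its methods are hypothetical and do not correspond to any particular product's security API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Permissions per role, loosely following the Nurse / M.D. / Admin example.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "Nurse": {"enter_history", "record_vitals"},
    "Physician": {"enter_history", "record_vitals", "write_orders", "enter_scripts"},
    "Admin": {"enter_demographics", "enter_insurance"},
}


@dataclass
class AccessPolicy:
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    delegated: Dict[str, Set[str]] = field(default_factory=dict)

    def assign(self, user: str, role: str) -> None:
        """Security administrator authorizes a user to a role."""
        if role not in ROLE_PERMISSIONS:
            raise KeyError(f"unknown role: {role}")
        self.user_roles.setdefault(user, set()).add(role)

    def delegate(self, original_user: str, delegated_user: str, role: str) -> None:
        """Original user (OU) passes a role they hold to a delegated user (DU)."""
        if role not in self.user_roles.get(original_user, set()):
            raise PermissionError(f"{original_user} does not hold {role}")
        self.delegated.setdefault(delegated_user, set()).add(role)

    def revoke(self, delegated_user: str, role: str) -> None:
        """Withdraw a previously delegated role."""
        self.delegated.get(delegated_user, set()).discard(role)

    def can(self, user: str, action: str) -> bool:
        """Access is decided by roles held or delegated, never by user name."""
        roles = self.user_roles.get(user, set()) | self.delegated.get(user, set())
        return any(action in ROLE_PERMISSIONS[r] for r in roles)
```

In this sketch a delegated role cannot be passed on again, because `delegate` checks only directly assigned roles; whether the "ability to delegate" should itself be delegable is exactly the kind of policy question the slides raise.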
CSE 300 Many Different Scenarios Under Which Privileges May Want to be Passed to Other Individuals Large organizations often require delegation to meet demands on individuals in specific roles for certain periods of time True in Many Different Sectors Health Care and Financial Services Engineering and Academic Setting Example: Reda Delegates Head Role to Steve when Traveling Key Issues: Who Controls Delegation to Whom? How are Delegation Requirements Enforced? IIE-75 Coalitions for Clinical/Translational Science CSE 300 Pfizer Bayer UConn Storrs UConn Health Center Saint DCF, Francis, DSS, etc. CCMC, … Info. Sharing - Joint R&D Support T1, T2, and Clinical Research Company and University Partnerships Collaborative Funding Opportunities Cohesive and Trusted Environment Existing Systems/Databases and New Applications How do you Protect Commercial Interests? Promote Research Advancement? Free Read for Some Data/Limited for Other? Commercialization vs. Intellectual Property? NIH FDA NSF Balancing Cooperation with Propriety IIE-76 Emergent Public Policy Issues CSE 300 How do we Protect a Person’s DNA? Who Owns a Person’s DNA? Who Can Profit from Person’s DNA? Can Person’s DNA be Used to Deny Insurance? Employment? Etc. How do you Define Security Limitations/Access? What about i2b2 – Informatics for Integrating Biology and the Bedside (see https://www.i2b2.org/) Scalable Informatics Framework to Bridge Clinical Research Data Vast Data Banks for Basic Science Research Goal: Understand Genetic Bases of Diseases IIE-77 Emergent Public Policy Issues CSE 300 Can DNA Repositories be Anonymously Available for Medical Research? Do Societal Needs Trump Individual Rights? Can DNA be Made Available Anonymously for Medical Research? De-identified Data Repositories Privacy Protecting Data Mining International Repository Might Allow Medical Researchers Access to Large Enough Data Set for Rare Conditions (e.g., Orphan Drug Act) Individual Rights vs. 
Medical Advances IIE-78
Internet and the Web CSE 300 A Major Opportunity for Business: A Global Marketplace; Business Across State and Country Boundaries. A Way of Extending Services: Online Payment vs. VISA, Mastercard. A Medium for Creation of New Services: Publishers, Travel Agents, Teller, Virtual Yellow Pages, Online Auctions … A Boon for Academia: Research Interactions and Collaborations; Free Software for Classroom/Research Usage; Opportunities for Exploration of Technologies in Student Projects. What are Implications for BMI? Where is the Advantage? IIE-79
WWW: Three Market Segments CSE 300 Intranet (Corporate Network): Decision Support, Mfg. System Monitoring, Corporate Repositories, Workgroups. Business to Business: Information Sharing, Ordering Info./Status, Targeted Electronic Commerce. Internet: Sales, Marketing, Information Services. Provider Network Exposure to Outside IIE-80
Information Delivery Problems on the Net CSE 300 Everyone can Publish Information on the Web Independently at Any Time. Consequently, there is an Information Explosion: Identifying Information Content is More Difficult. There are too Many Search Engines but too Few Capable of Returning High Quality Data. Most Search Engines are Useful for Ad-hoc Searches but Awkward for Tracking Changes. What are Information Delivery Issues for BMI? Publishing of Patient Education Materials; Publishing of Provider Education Materials. How Can Patients/Providers Find what they Need? How do they Know if it’s Relevant? Reputable? IIE-81
Example Web Applications CSE 300 Scenario 1: World Wide Wait. A Major Event is Underway and the Latest, Up-to-the-Minute Results are Being Posted on the Web. You Want to Monitor the Results for this Important Event, so you Fire up your Trusty Web Browser, Pointing at the Result Posting Site, and Wait, and Wait, and Wait … What is the Problem?
The Scalability Problems are the Result of a Mismatch Between the Data Access Characteristics of the Application and the Technology Used to Implement the Application. May not be Relevant to BMI: Hard to Apply Scenario IIE-82
Example Web Applications CSE 300 Scenario 2: Many Applications Today Need to Track Changes in Local and Remote Data Sources and Send Notifications If Some Condition Over the Data Source(s) is Met. To Monitor Changes on the Web, You Need to Fire up Your Trusty Web Browser from Time to Time, Cache the Most Recent Result, and Difference Manually Each Time You Poll the Data Source(s). Issue: Pure Pull is Not the Answer to All Problems. BMI: If a Patient Enters Data that Sets off a Chain Reaction, how Can the Provider be Notified and in Turn the Provider Notify the Patient (Bad Health Event)? IIE-83
What is the Problem? CSE 300 Applications are Asymmetric but the Web is Not: Computation Centric vs. Information Flow Centric. Types of Asymmetry: Network Asymmetry (Satellite, CATV, Mobile Clients, Etc.); Client to Server Ratio (Too Many Clients can Swamp Servers); Data Volume (Mouse and Key Click vs. Content Delivery); Update and Information Creation (Clients Need to be Informed or Must Poll). Clearly, for BMI, Simple Web Environment/Browser is Not Sufficient – No Auto-Notification IIE-84
What are Information Delivery Styles?
CSE 300 Pull-Based System: Transfer of Data from Server to Client is Initiated by a Client Pull; Clients Determine when to Get Information; Potential for Information to be Old Unless Client Periodically Pulls. Push-Based System: Transfer of Data from Server to Client is Initiated by a Server Push; Clients may get Overloaded if Push is Too Frequent. Hybrid: Pull and Push Combined; Pull First and then Push Continually IIE-85
Publish/Subscribe CSE 300 Semantics: Servers Publish/Clients Subscribe. Servers Publish Information Online; Clients Subscribe to the Information of Interest (Subscription-based Information Delivery); Data Flow is Initiated by the Data Sources (Servers) and is Aperiodic. Danger: Subscriptions can Lead to Other Unwanted Subscriptions. Applications: Unicast: Database Triggers and Active Databases; 1-to-n: Online News Groups. May Work for Clinical Researcher to Provider Push IIE-86
Design Options for Nodes CSE 300 Three Types of Nodes: Data Sources Provide Base Data which is to be Disseminated; Clients are the Net Consumers of the Information; Information Brokers Acquire Information from Other Data Sources, Add Value to that Information and then Distribute this Information to Other Consumers. By Creating a Hierarchy of Brokers, Information Delivery can be Tailored to the Need of Many Users. Brokers may be Ideal Intermediaries for BMI! Act on Behalf of Patients, Providers; Incorporate Secure Access IIE-87
Research Challenges CSE 300 Ubiquitous/Pervasive: Many Computers and Information Appliances Everywhere, Networked Together. Inherent Complexity: Coping with Latency (Sometimes Unpredictable); Failure Detection and Recovery (Partial Failure); Concurrency, Load Balancing, Availability, Scale; Service Partitioning; Ordering of Distributed Events. “Accidental” Complexity: Heterogeneity: Beyond the Local Case: Platform, Protocol, Plus All Local Heterogeneity in Spades; Autonomy: Change and Evolve Autonomously; Tool Deficiencies: Language Support (Sockets, RPC), Debugging, Etc.
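As a rough illustration of the delivery styles and broker nodes described above, the following Python sketch shows a hybrid exchange (pull first, then push) through a single information broker. The `Broker` class, the topic string, and the blood pressure readings are hypothetical names and values invented for this sketch; it is an in-memory toy under those assumptions, not a networked or secured implementation.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

Callback = Callable[[str, str], None]


class Broker:
    """Hypothetical in-memory information broker (illustration only)."""

    def __init__(self) -> None:
        self._latest: Dict[str, str] = {}
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def publish(self, topic: str, value: str) -> None:
        """Push: the data source initiates delivery to every subscriber."""
        self._latest[topic] = value
        for callback in self._subscribers[topic]:
            callback(topic, value)

    def pull(self, topic: str) -> Optional[str]:
        """Pull: the client initiates the transfer and may receive old data."""
        return self._latest.get(topic)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register interest so that later publications are pushed to the client."""
        self._subscribers[topic].append(callback)


broker = Broker()
provider_inbox: List[str] = []

# A patient-side data source publishes before the provider has subscribed.
broker.publish("patient-1/bp", "150/95")

# Hybrid style: the provider pulls once to catch up, then subscribes for pushes.
first_reading = broker.pull("patient-1/bp")
broker.subscribe("patient-1/bp", lambda topic, value: provider_inbox.append(value))

# Later readings now reach the provider without any polling.
broker.publish("patient-1/bp", "165/100")
```

The broker keeps only the latest value per topic, which is why a pure-pull client can see old data; the subscription is what gives the provider the automatic notification that a plain browser lacks.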
IIE-88
Infosphere Problem: Too Many Sources, Too Much Information CSE 300 Internet: Information Jungle; Infopipes: Clean, Reliable, Timely Information, Anywhere; Digital Earth; Personalized Filtering & Info. Delivery; Sensors IIE-89
Current State-of-Art CSE 300 Web Server, Mainframe, Database Server, Thin Client IIE-90
Infosphere Scenario – for BMI CSE 300 Infotaps & Fat Clients; Sensors; Variety of Servers; Many Sources; Database Server IIE-91
Heterogeneity and Autonomy CSE 300 Heterogeneity: How Much can we Really Integrate? Syntactic Integration: Different Formats and Models; Web/SQL Query Languages. Semantic Interoperability: Basic Research on Ontology, Etc. Autonomy: No Central DBA on the Net; Independent Evolution of Schema and Content; Interoperation is Voluntary. Interface Technology (Support for ISVs): DCOM: Microsoft Standard; CORBA, Etc. IIE-92
Security and Data Quality CSE 300 Security: System Security in the Broad Sense; Attacks: Penetrations, Denial of Service; System (and Information) Survivability; Security Fault Tolerance; Replication for Performance, Availability, and Survivability. Data Quality: Web Data Quality Problems; Local Updates with Global Effects; Unchecked Redundancy (Mutual Copying); Registration of Unchecked Information; Spam on the Rise IIE-93
Legacy Data Challenge CSE 300 Legacy Applications and Data: Definition: Important and Difficult to Replace; Typically, Mainframe Mission Critical Code; Most are OLTP and Database Applications. Evolution of Legacy Databases: Client-server Architectures; Wrappers; Expensive and Gradual in Any Case IIE-94
Potential Value Added/Jumping on Bandwagon CSE 300 Sophisticated Query Capability: Combining SQL with Keyword Queries. Consistent Updates: Atomic Transactions and Beyond. But Everything has to be in a Database!
Only If we Stick with Classic DB Assumptions. Relaxing DB Assumptions: Interoperable Query Processing; Extended Transaction Updates. Commodity DB Software: A Little Help is Still Good If it is Cheap; Internet Facilitates Software Distribution; Databases as Middleware IIE-95
Data Warehousing and Data Mining CSE 300 Data Warehousing: Provide Access to Data for Complex Analysis, Knowledge Discovery, and Decision Making; Underlying Infrastructure in Support of Mining; Provides Means to Interact with Multiple DBs; OLAP (on-Line Analytical Processing) vs. OLTP. Data Mining: Discovery of Information in Vast Data Sets; Search for Patterns and Common Features; Discover Information not Previously Known; Medical Records Accessible Nationwide; Research/Discover Cures for Rare Diseases; Relies on Knowledge Discovery in DBs (KDD) IIE-96
Data Warehousing and OLAP CSE 300 A Data Warehouse Database is Maintained Separately from an Operational Database. “A Subject-Oriented, Integrated, Time-Variant, and Non-Volatile Collection of Data in Support of Management’s Decision Making Process” [W.H. Inmon]. OLAP (on-Line Analytical Processing): Analysis of Complex Data in the Warehouse; Attempt to Attain “Value” through Analysis; Relies on Trained and Skilled Knowledge Workers who Discover Information. Data Mart: Organized Data for a Subset of an Organization; Establish De-Identified Marts for BMI Research IIE-97
Building a Data Warehouse CSE 300 Option 1: Leverage Existing Repositories; Collate and Collect; May Not Capture All Relevant Data. Option 2: Start from Scratch; Utilize Underlying Corporate Data. Corporate data warehouse Option 1: Consolidate Data Marts Option 2: Build from scratch Data Mart ...
Data Mart Data Mart Data Mart Corporate data IIE-98
BMI – Partition/Excerpt Data Warehouse CSE 300 Clinical and Epidemiological Research (and for T2 and T1). Each Study Submitted to Institutional Review Board (IRB) for Human Subjects (Assess Risks, Protect Privacy). See: http://resadm.uchc.edu/hspo/irb/ To Satisfy IRB (and Privacy, Security, etc.), Reverse Process to Create a Data Mart for each Approved Study: Export/Excerpt Study Data from Warehouse; May be Single or Multiple Sources. (Figure: BMI Data Warehouse Excerpted into Multiple Data Marts) IIE-99
Data Warehouse Characteristics CSE 300 Utilizes a “Multi-Dimensional” Data Model. Warehouse Comprised of Store of Integrated Data from Multiple Sources, Processed into Multi-Dimensional Model. Warehouse Supports Time Series and Trend Analysis. “Super-Excel” Integrated with DB Technologies. Data is Less Volatile than Regular DB: Doesn’t Dramatically Change Over Time; Updates at Regular Intervals; Specific Refresh Policy Regarding Some Data IIE-100
Three Tier Architecture CSE 300 (Figure: External Data Sources and Operational Databases Feed Extract/Transform/Load/Refresh, with Monitor, Integrator, and Metadata, into the Data Warehouse and Data Marts; these Serve an OLAP Server that Supports Summarization Reports, Query Reports, and Data Mining) IIE-101
Data Warehouse Design CSE 300 Most Data Warehouses use a Star Schema to Represent the Multi-Dimensional Data Model. Each Dimension is Represented by a Dimension Table whose Tuples Describe the Members of that Dimension. A Fact Table Connects All Dimension Tables with a Multiple Join. Each Tuple in the Fact Table Consists of a Pointer to Each of the Dimension Tables, which Provide its Multidimensional Coordinates, and Stores the Measures for those Coordinates. Links Between the Fact Table and the Dimension Tables Form a Shape Like a Star IIE-102
What is a Multi-Dimensional Data Cube?
CSE 300 Representation of Information in Two or More Dimensions. Typical Two-Dimensional - Spreadsheet. In Practice, to Track Trends or Conduct Analysis, Three or More Dimensions are Useful. For BMI – Axes for Diagnosis, Drug, Subject Age IIE-103
Multi-Dimensional Schemas CSE 300 Supporting Multi-Dimensional Schemas Requires Two Types of Tables: Dimension Table: Tuples of Attributes for Each Dimension; Fact Table: Measured/Observed Variables with Pointers into Dimension Tables. Star Schema: Characterizes Data Cubes by having a Single Fact Table with a Single Table for Each Dimension. Snowflake Schema: Dimension Tables from Star Schema are Organized into Hierarchy via Normalization. Both Represent Storage Structures for Cubes IIE-104
Example of Star Schema CSE 300
Sale Fact Table: Date, Product, Store, Customer, Unit_Sales, Dollar_Sales
Date: Date, Month, Year
Product: ProductNo, ProdName, ProdDesc, Category
Store: StoreID, City, State, Country, Region
Customer: CustID, CustName, CustCity, CustCountry
IIE-105
Example of Star Schema for BMI CSE 300
Patient Fact Table: Visit Date, Vitals, Symptoms, Patient, Medications
Date: Date, Month, Year
Vitals: BP, Temp, Resp, HR (Pulse)
Symptoms: Pulmonary, Heart, Mus-Skel, Skin, Digestive, Etc.
Patient: PatientID, PatientName, PatientCity, PatientCountry
Medications: Reference another Star Schema for all Meds
IIE-106
A Second Example of Star Schema … CSE 300 IIE-107
and Corresponding Snowflake Schema CSE 300 IIE-108
Data Warehouse Issues CSE 300 Data Acquisition: Extraction from Heterogeneous Sources; Reformatted into Warehouse Context - Names, Meanings, Data Domains Must be Consistent; Data Cleaning for Validity and Quality: Is the Data as Expected w.r.t. Content? Value?; Transition of Data into Data Model of Warehouse; Loading of Data into the Warehouse. Other Issues Include: How Current is the Data? Frequency of Update? Availability of Warehouse? Dependencies of Data? Distribution, Replication, and Partitioning Needs? Loading Time (Clean, Format, Copy, Transmit, Index Creation, etc.)?
For CTSA – Data Ownership (Competing Hospitals). IIE-109
Knowledge Discovery CSE 300 Data Warehousing Requires Knowledge Discovery to Organize/Extract Information Meaningfully. Knowledge Discovery: Technology to Extract Interesting Knowledge (Rules, Patterns, Regularities, Constraints) from a Vast Data Set; Process of Non-trivial Extraction of Implicit, Previously Unknown, and Potentially Useful Information from a Large Collection of Data. Data Mining: A Critical Step in the Knowledge Discovery Process; Extracts Implicit Information from a Large Data Set IIE-110
Steps in a KDD Process CSE 300 Learning the Application Domain (Goals). Gathering and Integrating Data: Data Cleaning; Data Integration; Data Transformation/Consolidation. Data Mining: Choosing the Mining Method(s) and Algorithm(s); Mining: Search for Patterns or Rules of Interest. Analysis and Evaluation of the Mining Results. Use of Discovered Knowledge in Decision Making. Important Caveats: This is Not an Automated Process! Requires Significant Human Interaction!
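The KDD steps above can be sketched as a small chain of Python functions. This is only a minimal illustration: the two record lists, the field names, and the `min_count` threshold are hypothetical and invented for the sketch, and the "mining" step is a plain frequency count standing in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical raw records from two sources (all values are illustrative only).
CLINIC_A = [
    {"patient": "p1", "birth_year": 1945, "med": "Lipitor"},
    {"patient": "p2", "birth_year": 1955, "med": "zocor "},
    {"patient": "p3", "birth_year": None, "med": "Crestor"},
]
CLINIC_B = [
    {"patient": "p4", "birth_year": 1966, "med": "LIPITOR"},
    {"patient": "p5", "birth_year": 1959, "med": "Crestor"},
    {"patient": "p6", "birth_year": 1948, "med": "lipitor"},
]


def clean(records):
    """Data cleaning: drop incomplete records and normalize medication names."""
    return [
        {**r, "med": r["med"].strip().title()}
        for r in records
        if r["birth_year"] is not None and r["med"]
    ]


def integrate(*sources):
    """Data integration: merge the cleaned sources into one collection."""
    return [r for source in sources for r in source]


def transform(records):
    """Transformation: drop the patient identifier, bucket birth year by decade."""
    return [(f"{r['birth_year'] // 10 * 10}s", r["med"]) for r in records]


def mine(pairs, min_count=2):
    """Mining: keep (decade, medication) patterns seen at least min_count times."""
    counts = Counter(pairs)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


patterns = mine(transform(integrate(clean(CLINIC_A), clean(CLINIC_B))))
```

The choices an analyst makes here (what counts as a valid record, how to bucket ages, what `min_count` is interesting) are the human interaction the slide warns about; evaluating `patterns` and acting on them remain manual steps.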
IIE-111
OLAP Strategies CSE 300 OLAP Strategies: Roll-Up: Summarization of Data; Drill-Down: from the General to Specific (Details); Pivot: Cross Tabulate the Data Cubes; Slice and Dice: Projection Operations Across Dimensions; Sorting: Ordering Result Sets; Selection: Access by Value or Value Range. Implementation Issues: Persistent with Infrequent Updates (Loading); Optimization for Performance on Queries is More Complex - Across Multi-Dimensional Cubes; Recovery Less Critical - Mostly Read Only; Temporal Aspects of Data (Versions) Important IIE-112
On-Line Analytical Processing CSE 300 Data Cube: A Multidimensional Array; Each Attribute is a Dimension. In Example Below, the Data Must be Interpreted so that it Can be Aggregated by Region/Product/Date
Product | Store | Date | Sale
acron | Rolla,MO | 7/3/99 | 325.24
budwiser | LA,CA | 5/22/99 | 833.92
large pants | NY,NY | 2/12/99 | 771.24
3’ diaper | Cuba,MO | 7/30/99 | 81.99
Cube Axes: Product (Pants, Diapers, Beer, Nuts); Region (West, East, Central, Mountain, South); Date (Jan, Feb, March, April)
IIE-113
On-Line Analytical Processing CSE 300 For BMI – Imagine a Data Table with Patient Data: Define Axis; Summarize Data; Create Perspective to Match Research Goal; Essentially De-identified Data Mart
Patient | Med | BirthDat | Dosage
Steve | Lipitor | 1/1/45 | 10mg
John | Zocor | 2/2/55 | 80mg
Harry | Crestor | 3/3/65 | 5mg
Lois | Lipitor | 4/4/66 | 20mg
Charles | Crestor | 7/1/59 | 10mg
Cube Axes: Medication (Lescol, Crestor, Zocor, Lipitor); Dosage (5, 10, 20, 40, 80); Decade (1940s, 1950s, 1960s, 1970s)
IIE-114
Examples of Data Mining CSE 300 The Slicing Action: A Vertical or Horizontal Slice Across Entire Cube. (Figure: Multi-Dimensional Data Cube with Products, Sales, and Months; Slice on City Atlanta) IIE-115
Examples of Data Mining CSE 300 The Dicing Action: A Slice First Identifies One Dimension; A Selection of Any Cube within the Slice then Essentially Constrains All Three Dimensions. (Figure: Cube with Products, Sales, and Months; Dice on Electronics and Atlanta, March 2000) IIE-116
Examples of Data Mining Drill Down - Takes a Facet
(e.g., Q1) and Decomposes into Finer Detail (Jan, Feb, March). Roll Up: Combines Multiple Dimensions, From Individual Cities to State. (Figure: Cubes with Products, Sales, and Q1 Q2 Q3 Q4; Drill Down on Q1; Roll Up on Location (State, USA)) IIE-117
Mining Other Types of Data CSE 300 Analysis and Access Dramatically More Complicated! Time Series Data for Glucose, BP, Peak Flow, etc.; Spatial Databases; Multimedia Databases; World Wide Web; Time Series Data; Geographical and Satellite Data IIE-118
Advantages/Objectives of Data Mining CSE 300 Descriptive Mining: Discover and Describe General Properties; 60% of People who Buy Beer on Friday also have Bought Nuts or Chips in the Past Three Months. Predictive Mining: Infer Interesting Properties based on Available Data; People who Buy Beer on Friday usually also Buy Nuts or Chips. Result of Mining: Order from Chaos; Mining Large Data Sets in Multiple Dimensions Allows Businesses, Individuals, etc. to Learn about Trends, Behavior, etc.; Impact on Marketing Strategy IIE-119
Data Mining Methods (1) CSE 300 Association: Discover the Frequency of Items Occurring Together in a Transaction or an Event. Example: 80% of Customers who Buy Milk also Buy Bread, Hence - Bread and Milk Adjacent in Supermarket; 50% of Customers Forget to Buy Milk/Soda/Drinks, Hence - Available at Register. Prediction: Predicts Some Unknown or Missing Information based on Available Data. Example: Forecast Sale Value of Electronic Products for Next Quarter via Available Data from Past Three Quarters IIE-120
Association Rules CSE 300 Motivated by Market Analysis. Rules of the Form Item_1 ^ Item_2 ^ … ^ Item_k → Item_{k+1} ^ … ^ Item_n. Example: “Beer ^ Soft Drink → Pop Corn”. Problem: Discovering All Interesting Association Rules in a Large Database is Difficult!
Issues: Interestingness; Completeness; Efficiency. Basic Measurement for Association Rules: Support of the Rule; Confidence of the Rule IIE-121
Data Mining Methods (2) CSE 300 Classification: Determine the Class or Category of an Object based on its Properties. Example: Classify Companies based on the Final Sale Results in the Past Quarter. Clustering: Organize a Set of Multi-dimensional Data Objects in Groups to Minimize Inter-group Similarity and Maximize Intra-group Similarity. Example: Group Crime Locations to Find Distribution Patterns IIE-122
Classification CSE 300 Two Stages: Learning Stage: Construction of a Classification Function or Model; Classification Stage: Prediction of Classes of Objects Using the Function or Model. Tools for Classification: Decision Tree; Bayesian Network; Neural Network; Regression. Problem: Given a Set of Objects whose Classes are Known (Training Set), Derive a Classification Model which can Correctly Classify Future Objects IIE-123
An Example CSE 300 Attributes:
Attribute | Possible Values
outlook | sunny, overcast, rain
temperature | continuous
humidity | continuous
windy | true, false
Class Attribute - Play/Don’t Play the Game. Training Set: Values that Set the Condition for the Classification. What are the Patterns Below?
Outlook | Temperature | Humidity | Windy | Play
sunny | 85 | 85 | false | No
overcast | 83 | 78 | false | Yes
sunny | 80 | 90 | true | No
sunny | 72 | 95 | false | No
sunny | 72 | 70 | false | Yes
… | … | … | … | ...
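To make the two stages concrete, here is a minimal Python sketch that learns a one-level decision tree (a single humidity threshold) from only the five training rows shown above, then uses it to classify. Restricting the model to humidity is an assumption made for brevity, the elided rows are not filled in, and the function names are hypothetical; a full decision tree learner would compare all attributes and recurse.

```python
from collections import Counter
from math import log2

# The five visible training rows: (outlook, temperature, humidity, windy, play).
TRAINING_SET = [
    ("sunny", 85, 85, False, "No"),
    ("overcast", 83, 78, False, "Yes"),
    ("sunny", 80, 90, True, "No"),
    ("sunny", 72, 95, False, "No"),
    ("sunny", 72, 70, False, "Yes"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def learn_humidity_stump(rows):
    """Learning stage: choose the humidity cut with the lowest weighted entropy."""
    values = sorted({row[2] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        cut = (low + high) / 2
        left = [row[4] for row in rows if row[2] <= cut]
        right = [row[4] for row in rows if row[2] > cut]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        if best is None or score < best[0]:
            majority_left = Counter(left).most_common(1)[0][0]
            majority_right = Counter(right).most_common(1)[0][0]
            best = (score, cut, majority_left, majority_right)
    return best[1:]  # (cut, label at or below the cut, label above the cut)


def classify(model, humidity):
    """Classification stage: apply the learned threshold to a new object."""
    cut, low_label, high_label = model
    return low_label if humidity <= cut else high_label


model = learn_humidity_stump(TRAINING_SET)
predictions = [classify(model, row[2]) for row in TRAINING_SET]
```

On these five rows the learned cut separates the Play and Don't Play examples cleanly, but with so little data that says nothing about how the model would behave on the rows the slide elides.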
IIE-124
Data Mining Methods (3) CSE 300 Summarization: Characterization (Summarization) of General Features of Objects in the Target Class. Example: Characterize People’s Buying Patterns on the Weekend; Potential Impact on “Sale Items” & “When Sales Start”; Department Stores with Bonus Coupons. Discrimination: Comparison of General Features of Objects Between a Target Class and a Contrasting Class. Example: Comparing Students in Engineering and in Art; Attempt to Arrive at Commonalities/Differences IIE-125
Summarization Technique CSE 300 Attribute-Oriented Induction: Generalization using Concept Hierarchy (Taxonomy)
barcode | category | brand | content | size
14998 | milk | dairyland | Skim | 2L
12998 | mechanical | MotorCraft | valve 23a | 12in
… | … | … | … | ...
Generalized Result:
Category | Content | Count
milk | skim | 280
milk | 2% | 98
… | … | ...
Concept Hierarchy: food → Milk → (Skim milk, …, 2% milk) → (Lucern, …, Dairyland); food → bread → (White bread, …, whole wheat) → (Wonder, …, Safeway)
IIE-126
Why is Data Mining Popular? CSE 300 Technology Push: Technology for Collecting Large Quantity of Data (Bar Code, Scanners, Satellites, Cameras); Technology for Storing Large Collection of Data (Databases, Data Warehouses); Variety of Data Repositories, such as Virtual Worlds, Digital Media, World Wide Web. Corporations want to Improve Direct Marketing and Promotions - Driving Technology Advances: Targeted Marketing by Age, Region, Income, etc.; Exploiting User Preferences/Customized Shopping. What is Potential for BMI? How do you see Data Mining Utilized? What are Key Issues to Worry About? IIE-127
Requirements & Challenges in Data Mining CSE 300 Security and Social: What Information is Available to Mine? Preferences via Store Cards/Web Purchases; What is Your Comfort Level with Trends? User Interfaces and Visualization: What Tools Must be Provided for End Users of Data Mining Systems? How are Results for Multi-Dimensional Data Displayed?
Performance Guarantees: Range from Real-Time for Some Queries to Long-Term for Other Queries. Data Sources of Complex Data Types or Unstructured Data - Ability to Format, Clean, and Load Data Sets IIE-128
Concluding Remarks CSE 300 We’ve Looked at: Informatics; Information Engineering; Information Usage and Repositories. Focused on Their Applicability and Relevance for BMI. Likely Generated More Questions than Answers IIE-129