caTIES 2.0
APIII 2006
Rebecca Crowley, Kevin Mitchell

Presentation Overview
- caTIES – Goals
- Tissue Banking Collaboration
- Grid Trust Fabric
- Concept coding and recoding
- Data stewardship, data sharing and honest brokering
- Interoperability within a grid community

University of Pittsburgh

caTIES – Goals
The Cancer Text Information Extraction System (caTIES) pilot project will focus on two important challenges of bioinformatics: (1) information extraction from free text and (2) access to tissue. Specifically, caTIES has four primary goals:
1. Extract coded information from free-text Surgical Pathology Reports (SPRs) using controlled terminologies to populate caBIG-compliant data structures.
2. Provide researchers with the ability to query, browse, and acquire annotated tissue data and physical material across a network of federated sources.
3. Provide a collaboration space in which researchers may construct and manage retrospective tissue distribution protocols.
4. Pioneer research for distributed text information extraction within the context of caBIG.
caTIES modules will be developed as generalized components available in caBIG, to encourage reuse by other caBIG projects that require tissue information extraction.

Tissue Banking Collaboration
- Administrator initiation of a Research Protocol – The IT system administrator is responsible for providing support for the electronic capture of research information. The Administrator works with Researchers, Health Care Professionals, IRBs and others to establish repositories of electronic data, often categorized by study.
- Researcher case discovery and order generation – In conducting tissue-sample-based retrospective research studies, Researchers examine free-text descriptions of those tissues or delegate the responsibility of gathering a tissue collection to Honest Brokers.
- Honest Broker order facilitation – Work with Tissue Bank personnel to acquire tissue and tissue-related materials.
Work with courier system to deliver orders to researchers. These orders often need to maintain a degree of atomicity.

Administrator – Create New Study

Administrator – Assign Organization Role

Administrator – Add User to Study as Role

Researcher Perspective

Researcher – Graphical Search Specification

Honest Broker – Verifies Physical Material

Honest Broker – Relays Order Status back to Researcher

Grid Trust Fabric
Electronic Components (4 Pillars of security)
- Identity (DN or public key)
  - Isolation
  - Traceability
- Authentication (TLS handshake)
  - Prevent Identity Theft
- Authorization (gridmapfile or Globus+OGSA-AuthZ+Services)
  - Access Control
  - Resource Control
- Audit (logfiles)
  - Troubleshooting
  - Forensics
  - Accounting

Grid Trust Fabric (cont)
Social Fabric
- Narrative DeIdentification defined by levels or kind of DeIdentification
- Narrative redactors
- Concept Coders
- Information Extraction to Synoptic Structures
- IRB must endorse federated environment
- Individuals must maintain a level of integrity

Current caTIES Security
Summary of caTIES’ current security solution
1. User Registration with IMS – GUMS
2. User Registration with caTIES System – CTRM
3. Authentication and Authorization – GUMS + CTRM
4. User Access to caTIES Resources – caTIES Client

User Authentication – GUMS
User Authentication Scenario:
- Users log into the caTIES client with their GUMS username and password.
- The caTIES client securely connects to GUMS with the user’s GUMS X.509 certificate and retrieves the GUMS user proxy.
- The caTIES client uses the user proxy to securely connect to the EVS service exposed by caTIES. This is essentially a connectivity check, and any caTIES secured service could be used.
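The authentication scenario above can be sketched in Java as a two-step flow: obtain a proxy from the credential service, then use it for a connectivity check against a secured service. This is only an illustration under stated assumptions: `UserProxy`, `CredentialService`, `SecuredService`, and `LoginFlow` are hypothetical names invented here, the certificate and TLS handling is not modeled, and nothing below is the actual GUMS or caTIES client API.

```java
import java.util.Optional;

/** Hypothetical stand-in for a GUMS user proxy; only the distinguished name is modeled. */
final class UserProxy {
    final String distinguishedName;

    UserProxy(String distinguishedName) {
        this.distinguishedName = distinguishedName;
    }
}

/** Hypothetical interface playing the role of GUMS: credentials in, proxy out. */
interface CredentialService {
    Optional<UserProxy> retrieveProxy(String username, String password);
}

/** Hypothetical secured service; the slides use the EVS service for this check. */
interface SecuredService {
    boolean ping(UserProxy proxy);
}

final class LoginFlow {
    /**
     * Returns the proxy only if the credential service issued one AND the
     * connectivity check against the secured service succeeded.
     */
    static Optional<UserProxy> login(CredentialService gums, SecuredService evs,
                                     String username, String password) {
        return gums.retrieveProxy(username, password).filter(evs::ping);
    }

    public static void main(String[] args) {
        // In-memory fakes so the sketch runs without any grid infrastructure.
        CredentialService gums = (user, pass) -> {
            if ("alice".equals(user) && "s3cret".equals(pass)) {
                return Optional.of(new UserProxy("CN=alice,O=Example"));
            }
            return Optional.empty();
        };
        SecuredService evs = proxy -> proxy.distinguishedName.startsWith("CN=");

        System.out.println(login(gums, evs, "alice", "s3cret").isPresent());
        System.out.println(login(gums, evs, "alice", "wrong").isPresent());
    }
}
```

Treating the connectivity check as a filter on the proxy mirrors the slide's point that any secured service could stand in for EVS: only the `SecuredService` argument changes.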
User Authentication

User Authorization – CTRM
- CTRM contains user authorization information. It contains information about how users are related to organizations.
- It classifies these user-organization relationships by the following roles: Researcher, Institution Honest Broker, or Local Administrator.
- The CTRM service is responsible for issuing queries to the CTRM.
- When a user is authenticated, the caTIES Client sends the user proxy’s distinguished name as a query parameter to the CTRM service. The CTRM service in turn fetches the user’s role from CTRM and sends the role information back to the client.

De-Identification
- caTIES De-Identification service scrubs the pathology report, creates de-identified identifiers, and loads the ‘De-Identified’ caTIES datastore
- caTIES de-identification service wraps the de-ID™ software; easy to switch
- Safe-Harbor method removes HIPAA mandated identifiers
- Creates tokens for names and preserves temporal relationships
- De-ID will work with adopters as each site comes on-line
- Currently evaluating Harvard Scrubber open-source option

Concept Coding and Recoding
- Changing dimensions necessitate recoding
  - Vocabulary revisions
  - Algorithmic enhancements and bug fixes
  - De-Identification redactor errors
- What is the necessary level of auditing for recoding?

[Diagram: Tokenization, Sectioning, Concept Mapping with MMTx, Negation and Semantic Type Categorization, RegEx Finding Attribute Value, Concept Coded Structured Data]

Data stewardship, data sharing and honest brokering
CaTIES maintains data in three databases that are schematically equivalent but differ in their deployment location, security configuration, and the data being held.
Each Role has limited access to the set of data sources:
- public datastore – (Researcher)
- private datastore – (Honest Broker)
- central tissue resource manager datastore – (Administrator, Researcher, Honest Broker)

caTIES Model
Three points for Data Access:

Interoperability within a grid community
- MDA – caBIG uses Model Driven Architecture to automatically generate Object Relational Mapping (ORM) middleware.
- Following caBIG’s semi-automated guidelines for application development guarantees grid compliant data services.
- caBIG annotates data and service interfaces with a conceptual ontology. This provides an environment for intelligent discovery and automatic data transformation.

caTIES Development Process
1. Design UML Model in Enterprise Architect
2. Metadata annotation using NCIT (public model only)
3. CDEs are registered in the caDSR in the ‘caBIG’ context
4. Run Model through caCORE SDK to generate API and caTIES Silver Application
5. Implement API generated by the SDK for caTIES’ Client’s functions
6. Utilize caGrid SDK to generate Gold front-end to the caTIES Silver Application

cd CaTIES Reference Model

domain::Patient
  # id: java.lang.Long
  # uuid: java.lang.String
  # race: java.lang.String
  # ethnicity: java.lang.String
  # birthDate: java.util.Date
  # gender: java.lang.String
  # conceptCodeSet: java.lang.String

domain::Application
  # id: java.lang.Long
  # uuid: java.lang.String
  # version: java.lang.String
  # name: java.lang.String

domain::PathologyReport
  # id: java.lang.Long
  # uuid: java.lang.String
  # originalId: java.lang.String
  # collectionDateTime: java.util.Date
  # patientAgeAtCollection: java.lang.Integer
  # patientAgeAtCollectionMvr: java.lang.String
  # documentText: java.lang.String
  # documentXml: java.lang.String
  # documentBinary: java.lang.String
  # conceptCodeSet: java.lang.String
  # classifiedConceptCodeSet: java.lang.String
  # isFlaggedForReview: java.lang.Boolean
  # isTissueAvailable: java.lang.Boolean
  # isQuarantined: java.lang.Boolean
  # reviewComment: java.lang.String
  # honestBrokerComment: java.lang.String

domain::ConceptClassification
  # id: java.lang.Long
  # uuid: java.lang.String
  # name: java.lang.String

domain::Concept
  # id: java.lang.Long
  # uuid: java.lang.String
  # cui: java.lang.String
  # tui: java.lang.String
  # name: java.lang.String
  # semanticType: java.lang.String

domain::Execution
  # id: java.lang.Long
  # uuid: java.lang.String
  # startTime: java.util.Date
  # endTime: java.util.Date

domain::ConceptReferent
  # id: java.lang.Long
  # uuid: java.lang.String
  # documentFragment: java.lang.String
  # startOffset: java.lang.Long
  # endOffset: java.lang.Long
  # isModifier: java.lang.Boolean
  # isNegated: java.lang.Boolean

Association ends: #patient, #pathologyReportCollection, #application, #pathologyReport, #conceptClassification, #concept, #conceptReferentCollection, #executionCollection

caTIES Phase 2 Grid-Enabled [Public] Model

Development Process Summary

Access to caTIES Public Resources
Dual Access to caTIES
1. Via caTIES Client
2. Via caGrid Gold API
The caTIES Gold Service provides programmatic access to caTIES’ resources. The caGrid Browser implements this API to query resources.
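Whichever access path is used, results are expressed in terms of the reference model classes listed earlier. As a rough illustration of how one of them might be used, the following Java sketch models `domain::ConceptReferent` and filters out negated mentions. It is a hand-written simplification, not the SDK-generated class: `id` and `uuid` are omitted, the offsets are assumed to be start-inclusive and end-exclusive character positions in `documentText`, and the helper names `spanOf` and `affirmed` are invented here.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hand-written illustration of domain::ConceptReferent. */
final class ConceptReferent {
    final String documentFragment;
    final long startOffset; // assumption: inclusive character offset
    final long endOffset;   // assumption: exclusive character offset
    final boolean isModifier;
    final boolean isNegated;

    ConceptReferent(String documentText, int startOffset, int endOffset,
                    boolean isModifier, boolean isNegated) {
        // The fragment is derived from the report text and the two offsets.
        this.documentFragment = documentText.substring(startOffset, endOffset);
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.isModifier = isModifier;
        this.isNegated = isNegated;
    }
}

final class ConceptReferentExample {
    /** Hypothetical helper: builds a referent for the first occurrence of a phrase. */
    static ConceptReferent spanOf(String text, String phrase, boolean negated) {
        int start = text.indexOf(phrase);
        return new ConceptReferent(text, start, start + phrase.length(), false, negated);
    }

    /** Keeps only the referents that are not negated in the report text. */
    static List<ConceptReferent> affirmed(List<ConceptReferent> referents) {
        List<ConceptReferent> kept = new ArrayList<>();
        for (ConceptReferent referent : referents) {
            if (!referent.isNegated) {
                kept.add(referent);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "No evidence of carcinoma. Chronic gastritis.";
        List<ConceptReferent> referents = new ArrayList<>();
        referents.add(spanOf(text, "carcinoma", true));  // negated by "No evidence of"
        referents.add(spanOf(text, "gastritis", false));

        for (ConceptReferent referent : affirmed(referents)) {
            System.out.println(referent.documentFragment
                    + " [" + referent.startOffset + ", " + referent.endOffset + ")");
        }
    }
}
```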
Sample Query

Silver Format

    import java.util.List;
    import org.hibernate.criterion.DetachedCriteria;
    import org.hibernate.criterion.Restrictions;

    // appService is assumed to be the SDK-generated application service, obtained elsewhere.
    DetachedCriteria p = DetachedCriteria.forClass(PathologyReport.class);
    p.add(Restrictions.like("uuid", "e44ddc0f-c589-11da-bbee-5103a71c2a47"));
    List resultList = appService.query(p, PathologyReport.class.getName());
    for (int i = 0; i < resultList.size(); i++) {
        // Each result is a PathologyReport; read its text.
        PathologyReport pr = (PathologyReport) resultList.get(i);
        pr.getDocumentText();
    }

Gold Format

    <caBIGXMLQuery name="MyQueryTest3">
      <Target name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
        <Objects name="edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport">
          <Property name="uuid" predicate="equal" value="e44ddc0f-c589-11da-bbee-5103a71c2a47"/>
        </Objects>
      </Target>
    </caBIGXMLQuery>

Query run by caTIES Client

Query run through caGrid Browser

Equivalent Results
Both methods return the same Pathology Report.
[Screenshots: caGRID Browser, caTIES Client]

CaDSR CDEs CAP Protocols

Shallow Structure
Derivation based on conceptual matching.

Deep Structure
Inference Based on Discourse Reasoning
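Returning to the Gold-format query in the Sample Query slide: a client that issues such queries programmatically would typically build the XML rather than concatenate strings, so that attribute values are escaped correctly. The Java sketch below assembles the same `caBIGXMLQuery` structure using only the standard JDK DOM and transformer APIs. The class name `GoldQueryBuilder` and the method `equalityQuery` are hypothetical. The element and attribute names are copied from the slide, and no claim is made about the full query schema or about how the resulting XML is submitted to the Gold service.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class GoldQueryBuilder {
    /** Builds a single-property equality query in the XML shape shown on the slide. */
    static String equalityQuery(String queryName, String className,
                                String property, String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        Element query = doc.createElement("caBIGXMLQuery");
        query.setAttribute("name", queryName);
        doc.appendChild(query);

        Element target = doc.createElement("Target");
        target.setAttribute("name", className);
        query.appendChild(target);

        Element objects = doc.createElement("Objects");
        objects.setAttribute("name", className);
        target.appendChild(objects);

        Element prop = doc.createElement("Property");
        prop.setAttribute("name", property);
        prop.setAttribute("predicate", "equal");
        prop.setAttribute("value", value);
        objects.appendChild(prop);

        // Serialize the DOM tree; the serializer escapes attribute values.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(equalityQuery(
                "MyQueryTest3",
                "edu.upmc.opi.caBIG.caTIES.database.domain.PathologyReport",
                "uuid",
                "e44ddc0f-c589-11da-bbee-5103a71c2a47"));
    }
}
```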