Scalable Architecture for Federated Therapeutic Inquiries Network (SAFTINet)
ETL Specifications Document
Version 4.0
March 3rd, 2013

LICENSE

© 2011 Foundation for the National Institutes of Health (FNIH).

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this document except in compliance with the License. You may obtain a copy of the License at http://omop.fnih.org/publiclicense.

Unless required by applicable law or agreed to in writing, documentation and software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Any redistributions of this work or any derivative work or modification based on this work should be accompanied by the following source attribution: "This work is based on work by the Observational Medical Outcomes Partnership (OMOP) and used under license from the FNIH at http://omop.fnih.org/publiclicense."

Any scientific publication that is based on this work should include a reference to http://omop.fnih.org.

This document was created specifically for the Scalable Architecture for Federated Translational Inquiries Network (SAFTINet) project, in collaboration with OMOP. It reflects changes to the OMOP CDMv2 to create OMOP CDMv3, which were done in collaboration with FNIH OMOP and the SCANNER (Scalable National Network for Effectiveness Research) project (http://scanner.ucsd.edu/).

SAFTINet is supported by grant number R01HS019908 from the Agency for Healthcare Research and Quality.
TABLE OF CONTENTS

1.0 Introduction
2.0 Definition of Terms
3.0 Assumptions
4.0 Source Data Mapping Approach
    4.1 Changes to Existing Tables
    4.2 Table Name: ORGANIZATION
    4.3 Table Name: CARE_SITE
    4.4 Table Name: PROVIDER
    4.5 Table Name: X_DEMOGRAPHIC
    4.6 Table Name: VISIT_OCCURRENCE
    4.7 Table Name: DRUG_OCCURRENCE
    4.8 Table Name: CONDITION_OCCURRENCE
    4.9 Table Name: PROCEDURE_OCCURRENCE
    4.10 Table Name: OBSERVATION
5.0 Appendix A: Table Specific Rules
6.0 Appendix B: Row Filters
7.0 Appendix C: Sending data using flatfiles

Document Control

Authors and Contributors
Name: Patrick Hosokawa, Michael Kahn, Elias Brandt, Lisa Schilling
Organization: COHO
Title: Statistician/Analyst

Reviewers (Name, Role / Title, Date Reviewed)
- Christian Reich, OMOP Project Manager
- Patrick Ryan, OMOP Co-investigator

Document References
Document Title | Type of Reference | Document Location
OMOP CDM V3 Specification | Business Rules | OMOP Download Center
OMOP CDM Core and Dictionary Tables Release Notes | Detailed Technical Information | OMOP Download Center
OMOP OSIM Specification | Detailed Technical Information | OMOP Download Center

Change Record
Date | Author | Version | Change Reference
02-Nov-2009 | Vicki Fan, Mark Khayter | 1.0 | Original OMOP ETL Template Document
04-Oct-2011 | Patrick Hosokawa | 2.0 | Document adapted to SAFTINet ETL data model, flowcharts added to detail data flow from ETL model to grid model
20-Dec-2011 | Patrick Hosokawa | 2.1 | Document updated to 12/20/11 ETL data model
17-Mar-2012 | Patrick Hosokawa | 2.2 | Document updated to 3/17/12 ETL data model
06-Aug-2012 | Patrick Hosokawa | 4.0 | Change section removed, Appendix B updated, final move to CDMv4. Added data on labs provided to Appendix B.
03-Mar-2013 | Patrick Hosokawa | 4.1 | Additions to Appendix B, added Appendix C for flatfile instructions

1.0 Introduction

This document reflects the requirements, assumptions, business rules and transformations for the implementation of OMOP CDM V3, as recommended for SAFTINet. The purpose of this document is two-fold:

1. Describe ETL mapping of data from SAFTINet partners into the Common Data Model.
2. Serve as a blueprint for equivalent ETL mapping processes for other data sources into the CDM.

In each section, the tables and their mapping are individually reviewed along with any source specific rules and exceptions.

The intended audiences for this document are the SAFTINet team and partner ETL technical personnel. Sections of the document are targeted specifically towards each audience with appropriate focus and level of detail.

2.0 Definition of Terms

Activity
A query or query response performed across the grid network as described in the following use cases.

Care Site (entity)
The Care Site table refers to the lower level of the provider care hierarchy. Individual provider care locations will be stored in this table.

Cohort
A collection of subjects who meet specific demographic or clinical characteristics.

Common Data Model
The CDM intends to facilitate observational analyses of disparate healthcare databases. The CDM defines table structures for each of the data entities (e.g., Persons, Visit Occurrence, Drug Exposure, Condition Occurrence, Observation, Procedure Occurrence, etc.). It includes all observational data elements that are relevant to identifying exposure to various treatments and defining condition occurrence. The CDM includes both the vocabulary of terms and the entity domain tables.

Concept
A concept is the basic unit of information. Concepts may be grouped into a given domain. A concept is a unique term that has a unique and static identifier/name, belongs to a Namespace, and may exist in relation to other concepts. The vertical relationships consist of "is a" statements that form a logical hierarchy. In general, concepts above a given concept are referred to as ancestors and those below as descendants.

Condition
A condition is a disease, such as a heart condition, as in medical condition.

Condition Occurrence (entity)
Condition Occurrences record individual instances of a Person's Conditions (i.e., diagnoses) extracted from source data. Conditions are recorded in various data sources in different forms with varying levels of standardization, and are stored in the CONDITION_OCCURRENCE table.

Current Procedural Terminology (CPT), 4th edition
A terminology that is maintained by the American Medical Association (AMA). It is used by hospitals for Medicare hospital outpatient services and by physicians for outpatient services.

Data Mapping
The data element mappings between two distinct data models, terminologies, or concepts. Data mapping is the process of creating data element mappings between two distinct data models. Data mapping is used as a first step for a wide variety of data integration tasks.

Demographics
Demographics refer to selected population characteristics. Demographics may include data such as race, age, sex, date of birth, location, etc.

Domain
A data domain refers to all the unique values which a data element may contain. For example, a database table that has information about people, with one record per person, might have a "gender" column. This gender column might be declared as a string data type, and allowed to have one of two known code values: "M" for male and "F" for female, with NULL for records where gender is unknown or not applicable (or arguably "U" for unknown as a sentinel value). The data domain for the gender column is: "M", "F". In database technology, domain refers to the description of an attribute's allowed values. The physical description is a set of values the attribute can have, and the semantic, or logical, description is the meaning of the attribute.

Drug
In pharmacology, a drug is "a chemical substance used in the treatment, cure, prevention, or diagnosis of disease or used to otherwise enhance physical or mental well-being." Drugs may be prescribed for a limited duration, or on a regular basis for chronic disorders.

Drug Exposure (entity)
The Drug Exposure entity contains individual records that suggest drug utilization by the person. Drug Exposure indicators store key information about each person's medication and the timing thereof, including the drug (captured as standard Concept code in the CDM), quantity, beginning date of medication, number of days supply, period of exposure, and prescription refill data. Drug Exposures are stored in the DRUG_EXPOSURE table.

Encrypted Unique Identifiers
Output of a de-identification process used to hash the identity of subjects, providing them with a unique but de-identified identifier.

Electronic Health Record (EHR)
Electronic health record refers to an individual person's medical record in digital format. It may be made up of electronic medical records from many locations and/or sources. The EHR is a longitudinal electronic record of person health information generated by one or more encounters in any care delivery setting. Included in this information are person demographics, progress notes, problems, medications, vital signs, past medical history, immunizations, laboratory data and radiology reports. The EHR has the ability to generate a complete record of a clinical person encounter, as well as supporting other care-related activities directly or indirectly via interface, including evidence-based decision support, quality management, and outcomes reporting.

Electronic Medical Record (EMR)
An electronic medical record is a computerized legal medical record created in an organization that delivers care, such as a hospital or outpatient setting. Electronic medical records tend to be a part of a local stand-alone health information system that allows storage, retrieval and manipulation of records. This document will reference EHR moving forward even if certain data sources internally use the EMR definition.

Extract, Transform, Load (ETL)
Process of getting data out of one data store (Extract), modifying it (Transform), and inserting it into a different data store (Load).

Generic Product Information (GPI)
A proprietary unique identifier for a drug used by the commercial Medi-Span® formulary database.

Grid-enabled network
A collection of grid nodes (virtual organizations) capable of responding to/with grid query/response services.

Grid Node
A grid-enabled database containing data "owned" by a specific health care entity or virtual organization.

Grid Portal
Contains the set of services that allow queries to be sent, give access to authorized users, and administer query and response activities.

Healthcare Common Procedure Coding System (HCPCS)
HCPCS Level I codes are managed by the AMA (licensing fees apply). The HCPCS Level II codes are managed by CMS (Centers for Medicare & Medicaid Services). The Level II codes include: alphanumeric HCPCS procedure and modifier codes, their long and short descriptions, and applicable Medicare administrative, coverage, and pricing data. These codes are used for Medicare outpatient services.

International Classification of Disease, 9th Revision, Clinical Modifications (ICD9-CM)
The official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States.

Investigator
Any authorized clinician or researcher, or person designated to act on their behalf (e.g., research assistant, statistician) who has been authenticated for access to query and response functionality on the grid-enabled network.

Logical Observation Identifiers Names and Codes (LOINC)
Universal code names and identifiers for medical terminology related to the Electronic Health Record. LOINC assists in the electronic exchange and gathering of clinical results (such as laboratory tests, clinical observations, outcomes management and research).

Limited Data Set
As defined by HIPAA, limited data sets are data sets stripped of certain direct identifiers that are specified in the Privacy Rule. They are not de-identified information under the Privacy Rule. A limited data set is PHI that excludes the following direct identifiers of the individual or of relatives, employers, or household members of the individual: (1) names; (2) postal address information, other than town or city, state, and ZIP code; (3) telephone numbers; (4) fax numbers; (5) e-mail addresses; (6) social security numbers; (7) medical record numbers; (8) health plan beneficiary numbers; (9) account numbers; (10) certificate/license numbers; (11) vehicle identifiers and serial numbers, including license plate numbers; (12) device identifiers and serial numbers; (13) web URLs; (14) Internet Protocol (IP) address numbers; (15) biometric identifiers, including fingerprints and voiceprints; and (16) full-face photographic images and any comparable images. Importantly, unlike de-identified data, PHI in limited data sets may include the following: city, state and ZIP codes; all elements of dates (such as admission and discharge dates); and unique codes or identifiers not listed as direct identifiers. Recognizing that institutions, IRBs and investigators are frequently faced with applying both the Common Rule and the HIPAA Privacy Rule, OHRP does not consider a Limited Data Set (as defined under the HIPAA Privacy Rule) to constitute individually identifiable information under 45 CFR 46.102(f)(2).

Local Reference Value
The specific value used in the partner's data to refer to any given concept. This value will be mapped to a standardized concept value for translation by ROSITA.

National Drug Codes (NDC)
Unique identifiers assigned to individual drugs. NDCs are used primarily as an inventory code and for prescriptions.

Observation (entity)
The Observation table contains all general observations that are tracked as attributes, including source Observation code, matching standard Concept Code, date of the Observation, type of Observation, type of result, number/text/Concept code, and reference range for numeric results. Observation entities are recorded in the Observation table.

Observational Medical Outcomes Partnership (OMOP)
A public-private partnership designed to protect human health by improving the monitoring of drugs for safety and effectiveness.

Organization (entity)
The Organization table is the highest level of the partner care infrastructure hierarchy. Each organization may have multiple care sites. Providers will work at one or more care sites.

Person (entity)
A Person entity is one of the basic dimensions of analysis. It presents the framework for active drug surveillance. The Person entity is Concept driven: its attribute values are stored as standard Concept codes rather than original (i.e., "raw") source values. It is stored in the logical X_Demographic table.

Primary Care Physician
A physician designated as responsible to provide specific care to a patient, including evaluation and treatment as well as referral to specialists.

Procedure Occurrence (entity)
A Procedure Occurrence records individual instances of medical procedures extracted from source data. Procedures are recorded in various data sources in different forms with varying levels of standardization such as CPT-4, ICD-9-CM, and HCPCS procedure codes. These are stored in the PROCEDURE_OCCURRENCE table.

Protected Health Information (PHI)
Protected health information (PHI) under HIPAA includes any individually identifiable health information. Identifiable refers not only to data that is explicitly linked to a particular individual (that's identified information). It also includes health information with data items which reasonably could be expected to allow individual identification. De-identified information is that from which all potentially identifying information has been removed.

Provider (entity)
The Provider table contains information on local care providers including type and specialty. Providers are assigned to an individual care site.

Query
A request for data based on the query specifications "sent" via a grid services portal to a specified grid network.

ROSITA
A software package designed to transition SAFTINet data from the partner XML download to a grid database compatible form. This package will translate local source codes into OMOP concepts and will remove PHI other than dates of birth, dates of service, and zip codes.

RxNorm
A standardized nomenclature for clinical drugs and drug delivery devices, produced by the National Library of Medicine. In RxNorm, the name of a clinical drug combines its ingredients, strengths, and/or form. RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First DataBank, Micromedex, MediSpan, Gold Standard Alchemy, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary.
Subject
A patient, client or person of interest in the use cases described whose clinical and demographic data are contained within the virtual organization(s).

Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT)
SNOMED-CT is one of a suite of designated standards for use in U.S. Federal Government systems for the electronic exchange of clinical health information, and is also a required standard in interoperability specifications of the U.S. Healthcare Information Technology Standards Panel. SNOMED-CT is also being implemented internationally as a standard within other IHTSDO Member countries.

Terminology
Technical or special terms used in a business or special subject area.

Virtual organization (aka Partner)
Any entity or group of entities (e.g., clinic, network of clinics, agency or agencies) whose data is represented by a single grid node and available through grid services for query/response activities.

Visit Occurrence (entity)
The Visit Occurrence entity contains the information available in the source data about person visits to healthcare providers, including inpatient, outpatient, and ER visits. Visits are recorded in various data sources in different forms with varying levels of standardization. The detail level of the classification and description of the visit differs by data source. Visit Occurrence entities are recorded in the VISIT_OCCURRENCE table.

Vocabulary
A computerized list (as of items of data or words) used for reference (as for information retrieval or word processing).

3.0 Assumptions

The design follows the agreed upon general project assumptions:

- Electronic Medical Data: EMR is a subset of EHR. This document will reference EHR moving forward even if a specific data source internally uses the Electronic Medical Record (EMR) definition.
- Financial Information: The CDM model makes use of financial information such as Fees, Payments, Deductibles, Copayment, etc.
  from payer source data, such as Medicaid.
- Plan Detail Information: The model potentially makes use of fields related to Plan or Coverage details such as Benefit Plan, Plan Indicator, etc. of the administrative information in the claims data. The model makes use of medical coverage period and eligibility for prescription drugs.
- Cleansing and Validation: The selected data fields will be handled (whether loaded directly or as part of a transformation) with a validation plan which is to be determined later.
- Data Privacy: ETL from EHR/CDW will contain clear text direct patient identifiers and dates. ROSITA will encrypt all clear text direct patient identifiers. A random identifier (called a GUID) that is unrelated to any patient identifier will be associated with each patient record. Birth dates and dates of service will remain unchanged. Zip codes will also be forwarded to the grid node unchanged, together with a second variable that contains only the 3-digit zip (the leftmost 3 digits). The resulting data exported to the grid node will therefore be a limited data set containing encrypted direct identifiers with unchanged dates and both 5-digit and 3-digit zip codes. The grid node will have no access to any clear text direct patient identifiers from the EHR/CDW.
  Under the assumption that payer data will be provided with clear text direct identifiers, ROSITA will perform record linkage to link the clinical record with the financial record using clear text identifiers. If a match is made, the same GUID assigned to the clinical data will be assigned to the financial data. Otherwise, a new GUID will be generated that is unrelated to any patient identifier. Dates will remain unchanged. The resulting data exported to the grid node will be consistent with a Limited Data Set containing encrypted direct identifiers, unchanged dates, a 5- and 3-digit zip code, and a GUID random identifier. The grid node will have no access to any clear text direct patient identifiers from payer (e.g. Medicaid) data.
- Concept Identifiers: Data are represented through standard concept identifiers using a standardized terminology. During ETL, source data representations (raw data codes) will be translated to standard concept identifiers through a mapping process. If no standard concept identifier is available, the concept identifier field will contain '0' as a value.

4.0 Source Data Mapping Approach

This section covers the high-level assumptions and approach to extraction, transformation and loading (ETL) of raw source data into the Common Data Model (CDM). The assumptions and approach are defined with a special focus on claims and EHR data. The section covers each of the major tables in the CDM separately, elaborating the distinct handling required for each.

Unless otherwise specified with 'Required' in the field listing, missing attributes will not disqualify data from being loaded into the Common Data Model. Missing attributes for Concept Identifiers will be populated with the value zero (0) in the CDM, while the rest of the missing attributes will be populated with NULL.

The Source Field and Applied Rule fields are left blank for the partners to fill in. The Source Field should be filled in with the equivalent field in the partner's source data. The Applied Rule field should contain any specialized rules (e.g. filtering, translation, combination of categories) that the partner implements when filling in the field.

In the flowcharts, the colors red, yellow, and green are used in the following manner.

Left Side (ETL View): represents desired source data
  Red – Field is not brought forward into the grid
  Green – Field is brought forward into the grid

Right Side (Grid View): represents desired grid-facing data
  Red – Field is generated by the Rosita application. It is not derived from any ETL data field.
  Yellow – Field is generated from ETL data, but does not exist as a field in the ETL data.
  Green – Field is brought forward from ETL data unchanged.

The arrows indicate that the field on the right (in yellow) is generated from the field on the left (green for those fields brought forward, red otherwise).

The grid facing data model (right side of the flowcharts) closely matches the OMOP v3 data model. However, the SAFTINet grid model has a few extra fields needed specifically for SAFTINet. All fields present in SAFTINet but not in the OMOP model use the prefix X_ (e.g. X_Organization_Source).

4.1 Changes to existing tables

Table | Changed Field | Change
Visit_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_identifier; the new x_ prefix is so the field can pass through to the grid
Drug_Exposure | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_identifier; the new x_ prefix is so the field can pass through to the grid
Condition_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value; the new x_ prefix is so the field can pass through to the grid
Procedure_Occurrence | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value; the new x_ prefix is so the field can pass through to the grid
Observation | x_visit_occurrence_source_identifier | Changed from visit_occurrence_source_value; the new x_ prefix is so the field can pass through to the grid

4.2 Table Name: ORGANIZATION

The Organization table is the highest level of the partner care infrastructure hierarchy. Each organization may have multiple care sites. Providers can work at one or more care sites. Address information submitted with the organization will be used to create a new location record which will be linked to the organization record via the Location_ID field.
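The creation of a location record from an organization's address can be sketched as follows. This is a minimal illustration under stated assumptions, not ROSITA's implementation: the LocationTable class, its in-memory storage, the sequential ID generation, and the function name location_for_organization are all hypothetical. The sketch reuses an existing location_id when the source value has been seen before, and derives x_zip_deidentified as the leftmost 3 digits of the zip code.

```python
from typing import Dict, Optional


class LocationTable:
    """Hypothetical in-memory stand-in for the grid-facing Location table."""

    def __init__(self) -> None:
        self.rows: Dict[int, dict] = {}
        self._id_by_source_value: Dict[str, int] = {}
        self._next_id = 1  # assumption: IDs are simply sequential in this sketch

    def get_or_create(self, source_value: str, address: dict, location_type: str) -> int:
        """Return the location_id for source_value, adding a row only if it is new."""
        existing = self._id_by_source_value.get(source_value)
        if existing is not None:
            return existing
        location_id = self._next_id
        self._next_id += 1
        zip_code: Optional[str] = address.get("zip")
        self.rows[location_id] = {
            "location_id": location_id,
            "location_source_value": source_value,
            **address,
            # 3-digit zip: the leftmost 3 digits of the full zip code
            "x_zip_deidentified": zip_code[:3] if zip_code else None,
            "x_location_type": location_type,
        }
        self._id_by_source_value[source_value] = location_id
        return location_id


def location_for_organization(org: dict, locations: LocationTable) -> int:
    """Build the address from organization_* fields and link it to a location."""
    address = {
        key: org.get(f"organization_{key}")
        for key in ("address_1", "address_2", "city", "state", "zip", "county")
    }
    return locations.get_or_create(org["organization_source_value"], address, "Organization")
```

Calling location_for_organization twice with the same organization record returns the same location_id and leaves a single row in the table, which is the "new or pre-existing" behavior the location lookup requires.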
The field mapping is performed as follows:

Destination Field | Data Type / Required | Source Field | Applied Rule | Comment
organization_source_value | String (50) / Required | | | Local reference value for organization, used to create the organization_id field on the grid facing record. This value will also be used in other records to refer to the organization.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
place_of_service_source_value | String (50) / Required | | | The type of organization. If the organization type is not defined in the source data refer to the place_of_service_type section of the Concept ID Table. Used to create place_of_service_concept_id.
organization_address_1 | String (50) | | | First line of the address
organization_address_2 | String (50) | | | Second line of the address
organization_city | String (50) | | | City portion of the address
organization_state | String (2) | | | State portion of the address
organization_zip | String (9) | | | Zip code of the address
organization_county | String (20) | | | County portion of the address

4.2.1 Example of ORGANIZATION source / destination data

ETL View: Organization Table - XML
(Green – Brought forward into grid model / Red – Removed in processing)

organization_source_value (1) | UC Internal Medicine
x_data_source_type | EHR
place_of_service_source_value | Academic Practice
organization_address_1 | 13199 E Montview Blvd
organization_address_2 | Suite 300, Mail Stop F443
organization_city | Aurora
organization_state | CO
organization_zip | 80045
organization_county | Arapahoe

Grid View: Organization Table - Grid
(Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields)

organization_id | 22770494
organization_source_value | UC Internal Medicine
x_data_source_type | EHR
place_of_service_concept_id | 3389
place_of_service_source_value | Academic Practice
location_id | 39458
x_grid_node_id | 1

Grid View: Location Table - Grid

location_id | 39458
location_source_value | UC Internal Medicine
x_data_source_type | EHR
address_1 | 13199 E Montview Blvd
address_2 | Suite 300, Mail Stop F443
city | Aurora
state | CO
zip | 80045
x_zip_deidentified (2) | 800
county | Arapahoe
x_location_type (3) | Organization
x_grid_node_id | 1

Notes:
1. The organization_source_value field will be compared to the current set of locations. If the value does not already occur in the table (new location) a row will be added to the table and a new ID (location_id) will be generated. Either a newly generated value or a pre-existing value (if the record is found) of the location table Primary Key will be placed into location_id.
2. x_zip_deidentified will be generated from organization_zip. This field was created specifically for person locations to support the creation of 'Safe Harbor' Limited Data Sets.
3. x_location_type will be derived from the XML record type (Organization in this case).

4.3 Table Name: CARE_SITE

The Care Site table refers to the lower level of the provider care hierarchy. Individual provider care locations will be stored in this table.
The field mapping is performed as follows:

Destination Field | Data Type / Required | Source Field | Applied Rule | Comment
care_site_source_value | String (50) / Required | | | Local reference value for care site, used to create the care_site_id field on the grid facing record. This value will also be used in other records to refer to the care site.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
organization_source_value | String (50) / Required | | | Local reference value for organization. This value will be matched against the organization table to obtain the corresponding organization_id.
place_of_service_source_value | String (50) | | | The type of care site. If the care site type is not defined in the source data refer to the place_of_service_type section of the Concept ID Table. Used to create place_of_service_concept_id.
x_care_site_name | String (50) | | | Name of the clinic (care site)
care_site_address_1 | String (50) | | | First line of the address
care_site_address_2 | String (50) | | | Second line of the address
care_site_city | String (50) | | | City portion of the address
care_site_state | String (2) | | | State portion of the address
care_site_zip | String (9) | | | Zip code of the address
care_site_county | String (20) | | | County portion of the address

4.3.1 Example of CARE SITE source / destination data

ETL View: Care Site Table - XML
(Green – Brought forward into grid model / Red – Removed in processing)

care_site_source_value (1) | UC Internal Medicine
x_data_source_type | EHR
organization_source_value | University of Colorado
place_of_service_source_value | (blank)
x_care_site_name | Internal Medicine Eastside Clinic
care_site_address_1 | 13199 E Montview Blvd
care_site_address_2 | Suite 300, Mail Stop F443
care_site_city | Aurora
care_site_state | CO
care_site_zip | 80045
care_site_county | Arapahoe

Grid View: Care Site Table - Grid
(Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields)

care_site_id | 22770494
care_site_source_value | UC Internal Medicine
x_data_source_type | EHR
location_id | 49382
organization_id | 382392
place_of_service_concept_id | 39458
place_of_service_source_value | (blank)
x_care_site_name | Internal Medicine Eastside Clinic
x_grid_node_id | 1

Grid View: Location Table - Grid

location_id | 49382
location_source_value | UPI Building
x_data_source_type | EHR
address_1 | 13199 E Montview Blvd
address_2 | Suite 300, Mail Stop F443
city | Aurora
state | CO
zip | 80045
x_zip_deidentified (2) | 800
county | Arapahoe
x_location_type (3) | Care Site
x_grid_node_id | 1

Notes:
1. The care_site_source_value field will be compared to the current set of locations. If the value does not already occur in the table (new location) a row will be added to the table and a new ID (location_id) will be generated. Either a newly generated value or a pre-existing value (if the record is found) of the location table Primary Key will be placed into location_id.
2. x_zip_deidentified will be generated from care_site_zip. This field was created specifically for person locations to support the creation of 'Safe Harbor' Limited Data Sets.
3. x_location_type will be derived from the XML record type (Care Site in this case).

4.4 Table Name: PROVIDER

The Provider table contains information on local care providers including type and specialty. Providers are assigned to an individual care site.
The field mapping is performed as follows:

Destination Field | Data Type / Required | Source Field | Applied Rule | Comment
provider_source_value | String (50) / Required | | | Local reference value for provider, used to create the provider_id field on the grid facing record. This value will also be used in other records to refer to the provider.
x_data_source_type | String (20) / Required | | | Data Source Identifier (EHR / CDW / Medicaid)
npi | String (50) | | | Provider NPI
dea | String (50) | | | Provider DEA Number
specialty_source_value | String (50) | | | Provider type as recorded at the source (e.g. Physician, NP, MA, etc). If the provider type is not defined in the source data refer to the Health Care Provider Specialty section of the Concept ID Table. Used to create specialty_concept_id.
x_provider_first | String (75) | | | Provider First Name
x_provider_middle | String (75) | | | Provider Middle Name (or initial)
x_provider_last | String (75) | | | Provider Last Name
care_site_source_value | String (50) | | | Local reference value for Care Site. This value will be matched against the Care Site table to obtain the corresponding care_site_id.
x_organization_source_value | String (50) / Required | | | Local reference value for Organization. This value will be matched against the Organization table to obtain the corresponding organization_id.
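The resolution of local reference values into grid-facing identifiers for a provider record can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary-based lookup tables, and the way provider_id and grid_node_id are supplied are assumptions, not part of this specification. It applies two rules stated in this document: a concept identifier with no available mapping is set to 0, and other missing non-required attributes are left as NULL (None here).

```python
from typing import Dict, Optional

UNMAPPED_CONCEPT_ID = 0  # value used when no standard concept identifier is available


def to_concept_id(source_value: Optional[str], concept_map: Dict[str, int]) -> int:
    """Translate a local source value to a concept ID, defaulting to 0."""
    if source_value is None:
        return UNMAPPED_CONCEPT_ID
    return concept_map.get(source_value, UNMAPPED_CONCEPT_ID)


def lookup_id(source_value: Optional[str], ids: Dict[str, int]) -> Optional[int]:
    """Match a local reference value against a table; None (NULL) if not found."""
    if source_value is None:
        return None
    return ids.get(source_value)


def to_grid_provider(
    etl: dict,
    provider_id: int,                 # assumption: generated elsewhere
    specialty_map: Dict[str, int],    # hypothetical source value -> concept ID map
    care_site_ids: Dict[str, int],    # hypothetical care_site_source_value -> care_site_id
    organization_ids: Dict[str, int], # hypothetical organization source value -> organization_id
    grid_node_id: int,
) -> dict:
    """Build a grid-facing provider row from an ETL provider record."""
    for name in ("provider_source_value", "x_data_source_type", "x_organization_source_value"):
        if not etl.get(name):
            raise ValueError(f"missing required field: {name}")
    return {
        "provider_id": provider_id,
        "provider_source_value": etl["provider_source_value"],
        "x_data_source_type": etl["x_data_source_type"],
        "npi": etl.get("npi"),
        "dea": etl.get("dea"),
        "specialty_source_value": etl.get("specialty_source_value"),
        "specialty_concept_id": to_concept_id(etl.get("specialty_source_value"), specialty_map),
        "x_provider_first": etl.get("x_provider_first"),
        "x_provider_middle": etl.get("x_provider_middle"),
        "x_provider_last": etl.get("x_provider_last"),
        "care_site_id": lookup_id(etl.get("care_site_source_value"), care_site_ids),
        "x_organization_id": lookup_id(etl["x_organization_source_value"], organization_ids),
        "x_grid_node_id": grid_node_id,
    }
```

Keeping the concept default (0) separate from the NULL default for other lookups mirrors the distinction the mapping approach draws between concept identifiers and all other attributes.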
4.4.1 Example of PROVIDER source / destination data

ETL View

Provider Table - XML
  provider_source_value: 349302
  x_data_source_type: EHR
  npi: 34930302
  dea: 49492
  specialty_source_value: General Practitioner
  x_provider_first: Marcus
  x_provider_middle: W
  x_provider_last: Welby
  care_site_source_value: UC Internal Medicine
  x_organization_source_value: University of Colorado

Green – Brought forward into grid model / Red – Removed in processing / Blue – Item under discussion

Grid View

Provider Table - Grid
  provider_id: 2399450
  provider_source_value: 349302
  x_data_source_type: EHR
  npi: 34930302
  dea: 49492
  specialty_source_value: General Practitioner
  specialty_concept_id: 20302
  x_provider_first: Marcus
  x_provider_middle: W
  x_provider_last: Welby
  care_site_id: 22770494
  x_organization_id: 3939
  x_grid_node_id: 1

Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields / Blue – Item under discussion

4.5 Table Name: X_Demographic

The X_Demographic table stores information about individual patients. The PHI elements of this record will be stripped out in the transformation to the grid model. Address information will be limited and used to create a new location record.

The field mapping is performed as follows. Each entry gives the Destination Field, its Data Type, and the Comment; the Source Field and Applied Rule columns are empty.

  person_source_value, String (50) / Required: Person unique identifier at the source (MRN). Used to create the person_id field on the grid facing record. This value will also be used in other records to refer to the person.
  x_data_source_type, String (20) / Required: Data Source Identifier (EHR / CDW / Medicaid).
  medicaid_id_number, String (50): Medicaid ID Number.
  ssn, String (50): Social Security Number.
  last, String (75): Last Name.
  middle, String (75): Middle Name or Initial.
  first, String (75): First Name.
  address_1, String (50): The first line of the person's actual address.
  address_2, String (50): The second line of the person's actual address.
  city, String (50): The city portion of the person's actual address.
  state, String (2): The state portion of the person's actual address.
  zip, String (9): Zip code of the person's actual address.
  county, String (20): The county portion of the person's address as recorded at source.
  year_of_birth, Number (4) / Required: Year of birth.
  month_of_birth, Number (2): Month of birth.
  day_of_birth, Number (2): Day of birth.
  gender_source_value, String (50): Local reference value for gender of the person. Used to create gender_concept_id.
  race_source_value, String (50): Local reference value for race of the person. Used to create race_concept_id.
  ethnicity_source_value, String (50): Local reference value for ethnicity of the person. Used to create ethnicity_concept_id.
  provider_source_value, String (50): Local reference value for the patient's primary provider (if any). This value will be matched against the Provider table to obtain the corresponding provider_id.
  care_site_source_value, String (50): Local reference value for the patient's primary Care Site (if any). This value will be matched against the Care Site table to obtain the corresponding care_site_id.
  x_organization_source_value, String (50) / Required: Local reference value for the patient's organization. This value will be matched against the Organization table to obtain the corresponding organization_id.
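The notes to the Location examples in this document state that x_zip_deidentified is generated from the zip field to support 'Safe Harbor' data sets, and the examples turn 80045 into 800. The Python sketch below shows one way that derivation might look. It assumes the rule is "keep the first three digits"; the function name deidentify_zip and the restricted_prefixes parameter are hypothetical. The HIPAA Safe Harbor method also requires three-digit prefixes covering 20,000 or fewer people to be reported as 000, so the caller would need to supply that list from Census data; it is not given in this document.

```python
def deidentify_zip(zip_code, restricted_prefixes=frozenset()):
    """Return a 3-digit zip prefix, or None if the zip has fewer than 3 digits.

    restricted_prefixes is an assumed, caller-supplied set of low-population
    3-digit prefixes that must be reported as "000".
    """
    # Keep digits only, so "80045-1234" and "80045" behave the same way.
    digits = "".join(ch for ch in str(zip_code) if ch.isdigit())
    if len(digits) < 3:
        return None
    prefix = digits[:3]
    return "000" if prefix in restricted_prefixes else prefix


# Value from the example records in this document.
example = deidentify_zip("80045")  # "800"
```

Returning None for short or missing zip codes is a design choice made for this sketch; the document does not say how incomplete zip values should be handled.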
Page 22 4.5.1 Example of X_Demographic source / destination data ETL View X_Demographic Table - XML person_source_value x_data_source_type medicaid_id_number ssn last middle first address_1 address_2 city state zip county year_of_birth month_of_birth day_of_birth gender_source_value race_source_value ethnicity_source_value provider_source_value care_site_source_value x_organization_source_value 29201082 EHR 3903432 999-99-9999 Doe D John 123 Fake St Apt 566 Aurora CO 80045 Arapahoe 1965 2 9 Male White Non-Hispanic 35346346 UC Internal Medicine University of Colorado Green – Brought forward into grid model / Red – Removed in processing SAFTINet ETL Specifications Document GRID View Person Table - Grid person_id person_source_value2 location_id1 year_of_birth month_of_birth day_of_birth gender_concept_id gender_source_value race_concept_id race_source_value ethnicity_concept_id ethnicity_source_value provider_id3 care_site_id x_organization_id x_grid_node_id 22770494 49382 1965 2 9 675 Male 344 White 202 Non-Hispanic 34235556 22770494 382392 1 Location Table - Grid location_id1 39458 location_source_value x_data_source_type EHR 4 address_1 address_24 city Aurora state CO zip 80045 x_zip_deidentified5 800 county Arapahoe 6 x_location_type 34344 x_grid_node_id 1 Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields Page 23 1. 2. 3. 4. 5. 6. The location ID value is not linked to a location_source_ value in this case. When the address information is transferred to the location table, the resulting ID value will be placed in the person record for reference The grid version of the person table contains a blank field for person_source_value to comply with the OMOP standard. The value for person_source_value on the ETL side will not be carried forward due to privacy concerns. The grid facing provider_id will be derived from the ETL field provider_source_value. 
When creating the location table the local values for person address will not be passed through to the grid, although they are labeled green because in other instances, such as Organization and Care Site, they do move forward to the grid facing database x _zip_deidentified will be generated from zip. This field was created specifically for person locations to support the creation of ‘Safe Harbor’ Limited Data Sets x_location_type will be derived from the XML record type (Person in this case) SAFTINet ETL Specifications Document Page 24 4.6 Table Name: VISIT_OCCURRENCE The Visit Occurrence table contains a record for each patient-provider encounter. The provider, patient and location are all stored as well as the type of visit. The field mapping is performed as follows: Destination Field x_visit_occurrence _source_identifier Data Type String (50) / Required x_data_source_type String (20) / Required String (50) / Required person_source_value visit_start_date visit_end_date place_of_service _source_value Source Field DATE/ Required DATE / Required String (50) x_provider_source_value String (50) care_site_source_value String (50) SAFTINet ETL Specifications Document Applied Rule Comment Local reference value for visit, used to create the visit_occurrence_id field on the grid facing record. Data Source Identifier (EHR / CDW / Medicaid) Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id. The date on which the Visit started The date on which the Visit ended Visit type (office visit, med refill, face-to-face, telephone, med refill … etc). If the visit site type is not defined in the source data refer to the Visit_Type section of the Concept ID Table. Used to create place_of_service_concept_id Local reference value for the provider conducting the visit. This value will be matched against the Provider table to obtain the corresponding provider_id. 
Local reference value for the Care Site of the visit. This value will be matched against the Care Site table to obtain the corresponding care_site_id. Page 25 4.6.1 Example of VISIT OCCURRENCE source / destination data ETL View Visit Occurrence Table - XML x_visit_occurrence_source _identifier x_data_source_type person_source_value visit_start_date visit_end_date place_of_service_source_value x_provider_source_value care_site_source_value 349302 EHR 2302202 5/23/2011 5/25/2011 Physical 20302340 UC Internal Medicine Green – Brought forward into grid model / Red – Removed in processing SAFTINet ETL Specifications Document Grid View Visit Occurrence Table - Grid visit_occurrence_id x_data_source_type person_id visit_start_date visit_end_date place_of_service_concept_id place_of_service_source_value x_provider_id care_site_id x_grid_node_id 3203402 EHR 30205202 5/23/2011 5/25/2011 302023003 Physical 04594020 202033 1 Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields Page 26 4.7 Table Name: DRUG_EXPOSURE The Drug Occurrence table contains a record for each prescribed medication. The prescriber, patient, and prescription information are all stored as well as the associated visit and condition. The field mapping is performed as follows: Destination Field drug_exposure_source _identifier Data Type String (50) / Required x_data_source_type String (20) / Required String (50) / Required person_source_value drug_source_value String (50) drug_source_value _vocabulary drug_exposure_start _date String (50) / Required Date / Required drug_exposure_end_date Date drug_type_source_value String (50) / Required stop_reason String (20) refills Number(4) Source Field SAFTINet ETL Specifications Document Applied Rule Comment Unique Transaction Identifier - Could be an Rx Order ID, used to create the drug_exposure_id field on the grid facing record. 
Data Source Identifier (EHR / CDW / Medicaid) Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id. Local reference value for drug identifier. The types of identifiers allowed include National Drug Codes (NDCs), Generic Product Identifier (GPI) codes. Used to create the drug_concept_id field on the grid facing record. Vocabulary from which the source values are derived (used for 2-field match to concept ID) This is the Start Date for the current instance of drug utilization. Valid indicators include a start date of a prescription, the date a prescription was filled, or the date on which a drug administration procedure was recorded. This is the End Date for the current instance of drug utilization. It is not available from all sources Type of drug exposure (prescription, med history, fulfillment) as recorded in source data. If the drug type is not defined in the source data refer to the Drug Exposure Type section of the Concept ID Table. Used to create drug_type_concept_id The reason the medication was stopped, where available. Reasons include Regimen completed, Changed, Removed, etc. The number of refills for the prescription Page 27 quantity Number (8,2) days_supply Number (4) x_drug_name String (255) / Required String (50) String (500) String (50) / Required x_drug_strength sig provider_source_value x_visit_occurrence_ source_identifier String (50) relevant_condition _source_value String (50) SAFTINet ETL Specifications Document The quantity of drug recorded in the corresponding Drug Exposure Instance The number of days' supply of the medication recorded in the corresponding Drug Exposure Instance. Drug name taken verbatim from source field Strength (taken verbatim) (e.g. 20, 1000, 2-4, 1) Sig (if available) Local reference value for prescribing/administering provider (if any). This value will be matched against the Provider table to obtain the corresponding provider_id. 
Local reference value for the visit where the drug was prescribed/administered. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id. Associated Diagnosis Source Code. This is the code for the condition for which the drug was given. This value is independent and will not be matched against the Condition Occurrence table. Page 28 4.7.1 Example of DRUG EXPOSURE source / destination data ETL View Drug Exposure Table - ETL drug_exposure_source_identifier x_data_source_type person_source_value drug_source_value drug_source_value_vocabulary drug_exposure_start_date drug_exposure_end_date drug_type_source_value stop_reason Refills quantity days_Supply x_drug_name x_drug_strength sig provider_source_value x_visit_occurrence_source _identifier relevant_condition_source_value 30003400 EHR 2302202 4594930302 NDC 4/19/2011 5/19/2011 Prescription Regimen Completed 1 60 30 Amoxicillin 500 239292 3499202 393821 Green – Brought forward into grid model / Red – Removed in processing SAFTINet ETL Specifications Document Grid View Drug Exposure Table - Grid drug_exposure_id x_data_source_type person_id drug_concept_id drug_source_value drug_exposure_start_date drug_exposure_end_date drug_type_concept_id stop_reason refills quantity days_Supply x_drug_name x_drug_strength sig prescribing_provider_id visit_occurrence_id relevant_condition_concept_id x_grid_node_id 9947839 EHR 30205202 499506 4594930302 4/19/2011 5/19/2011 983921 Regimen Completed 1 60 30 Amoxicillin 500 3935050 040200 059439333 1 Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields Page 29 4.8 Table Name: CONDITION_OCCURRENCE The Condition Occurrence table contains a record for each patient condition. The codes associated with the conditions as well as the associated person, provider, and visits/encounters are also recorded. 
The field mapping is performed as follows: Destination Field condition_occurrence _source_identifier Data Type String (50) / Required x_data_source_type String (20) / Required String (50) / Required person_source_value condition_source_value condition_source_value _vocabulary x_condition_source_desc condition_start_date x_condition_update_date condition_end_date Source Field String (50) / Required String(50) / Applied Rule Comment Source Condition Primary Key; could be a unique record identifier. Used to create the condition_occurrence_id field on the grid facing record. Data Source Identifier (EHR / CDW / Medicaid) Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id. Local diagnosis code (e.g. ICD-9, SNOMED etc…). Used to create condition_concept_id Type of code (e.g. ICD-9) used for condition. Required String (50) Date / Required Date Date condition_type_source _value String (50) / Required stop_reason String (20) associated_provider _source_value String (50) SAFTINet ETL Specifications Document Source Diagnosis Text Description Onset Date Date condition was updated/reviewed Resolved Date – Leave blank for unresolved conditions. Type of condition as recorded in source data (e.g. chief complaint, problem list, etc). If the condition type is not defined in the source data refer to the Condition_Occurrence section of the Concept ID Table. Used to create condition_type_concept_id The reason, if available, that the condition was no longer recorded, as indicated in the source data. Valid values include discharged, resolved etc… Provider ID from the source - Provider of record. This value will be matched against the Provider table to obtain the corresponding provider_id. Page 30 x_visit_occurrence _source_identifier String (50) SAFTINet ETL Specifications Document Local reference value for visit. 
This value will be matched against the Visit Occurrence table to obtain the corresponding Visit Occurrence ID. Page 31 4.8.1 Example of CONDITION OCCURRENCE source / destination data ETL View Condition Occurrence Table - ETL condition_occurrence_source_identifier x_data_source_type person_source_value condition_source_value condition_source_value_vocabulary x_condition_source_desc condition_start_date x_condition_update_date condition_end_date condition_type_source_value stop_reason associated_provider_source_value x_visit_occurrence_source_identifier Grid View 30003400 EHR 393030 162.9 ICD9 Malignant Neop 4/19/2011 10/19/2011 Chief Complaint 392904 403030 Green – Brought forward into grid model / Red – Removed in processing SAFTINet ETL Specifications Document Condition Occurrence Table - Grid condition_occurrence_id x_data_source_type person_id condition_concept_id condition_source_value x_condition_source_desc condition_start_date x_condition_update_date condition_end_date condition_type_concept_id stop_reason associated_provider_id visit_occurrence_id x_grid_node_id 8349393 EHR 94849303 884934 162.9 Malignant Neop 4/19/2011 10/19/2011 499404 39304 90493023 1 Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields Page 32 4.9 Table Name: PROCEDURE_OCCURRENCE The Procedure Occurrence table contains a record for each procedure. The type of procedure as well as the associated person and visit are recorded. 
The field mapping is performed as follows. Each entry gives the Destination Field, its Data Type, and the Comment; the Source Field and Applied Rule columns are empty.

  procedure_occurrence_source_identifier, String (50) / Required: Source Procedure Primary Key. Used to create the procedure_occurrence_id field on the grid facing record.
  x_data_source_type, String (20) / Required: Data Source Identifier (EHR / CDW / Medicaid).
  person_source_value, String (50) / Required: Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id.
  procedure_source_value, String (50) / Required: The Procedure Code as captured from the source data. Values include CPT-4, ICD-9-CM (Procedure), HCPCS, and other procedure codes. Used to create procedure_concept_id.
  procedure_source_value_vocabulary, String (50) / Required: Type of code (e.g. CPT) used for the procedure.
  procedure_date, Date / Required: The date on which the procedure began (or was performed).
  procedure_type_source_value, String (50): The procedure type as stored in source. If the procedure type is not defined in the source data, refer to the Procedure Occurrence section of the Concept ID Table. Used to create procedure_type_concept_id.
  provider_record_source_value, String (50): Local reference value for Provider. This value will be matched against the Provider table to obtain the corresponding provider_id.
  x_visit_occurrence_source_identifier, String (50): Local reference value for visit. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id.
  relevant_condition_source_value, String (50): First Associated Diagnosis Code. Used to create relevant_condition_concept_id.
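The source value and its vocabulary are used together to find the concept ID; Section 4.7 describes the vocabulary field as being "used for 2-field match to concept ID". A minimal Python sketch of such a two-field lookup follows. The CONCEPT_LOOKUP dictionary, the function name to_concept_id, the normalization of case and whitespace, and the use of None for unmapped codes are all assumptions made for illustration. The single entry reuses the illustrative values from the example in Section 4.9.1 and is not a real vocabulary mapping.

```python
# Hypothetical lookup table keyed by (vocabulary, source code).
# The one entry reuses the illustrative values from Section 4.9.1.
CONCEPT_LOOKUP = {
    ("CPT", "49750"): 39949023,
}


def to_concept_id(vocabulary, source_value, lookup=CONCEPT_LOOKUP):
    """Match on both fields; return None when no concept is found.

    Trimming whitespace and upper-casing the vocabulary name are
    assumptions of this sketch, not rules stated in the specification.
    """
    key = (vocabulary.strip().upper(), source_value.strip())
    return lookup.get(key)


procedure_concept_id = to_concept_id("CPT", "49750")
```

Keying on the pair rather than on the code alone matters because the same code string can appear in more than one vocabulary with a different meaning.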
Page 33 4.9.1 Example of PROCEDURE OCCURRENCE source / destination data ETL View Procedure Occurrence Table - ETL procedure_occurrence_source_identifier x_data_source_type person_source_value procedure_source_value procedure_source_value_vocabulary procedure_date procedure_type_source_value provider_record_source_value x_visit_occurrence_source_identifier relevant_condition_source_value Grid View 9848493 EHR 594928 49750 CPT 4/19/2011 Inpatient header 23902023 2302320 20230 Green – Brought forward into grid model / Red – Removed in processing SAFTINet ETL Specifications Document Procedure Occurrence Table - Grid procedure_occurrence_id x_data_source_type person_id procedure_concept_id procedure_source_value procedure_date procedure_type_concept_id associated_provider_id visit_occurrence_id relevant_condition_concept_id x_grid_node_id 393948230 EHR 3493030 39949023 49750 4/19/2011 884934 34040222 20923042 23032009 1 Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields Page 34 4.10 Table Name: OBSERVATION The Observation table contains records for labs, measurements such as height and weight, etc… It is also where information from Past Medical History, Past Surgical History, Allergy, and Social/Personal History are stored. The field mapping is performed as follows: Destination Field observation_source _identifier Data Type String (50) / Required x_data_source_type String (20) / Required String (50) / Required person_source_value observation_source _value observation_source _value_vocabulary observation_date observation_time value_as_number String (50) / Required String(50) / Required Date / Required Time NUMBER(14,3) value_as_string String (60) unit_source_value String (50) range_low NUMBER(14,3) Source Field SAFTINet ETL Specifications Document Applied Rule Comment Source Primary Key for Observation Record. Used to create the obs_occurrence_id field on the grid facing record. 
Data Source Identifier (EHR / CDW / Medicaid) Person unique identifier at the source (MRN). This value will be matched against the Person table to obtain the corresponding person_id. The Observation Code as it appears in the source data. Used to create obs_concept_id Vocabulary used for the observation The date of the Observation The time of the observation The observation result stored as a numeric value. This is applicable to observations where the result is expressed as a numeric value. The observation result stored as character string. It is applicable to the observations where the result is expressed as a character string. Used to create obs_value_as_concept_id. Unit of measure for Observation result when measured as a numeric value. Used to create unit_concept_id The lower limit of the numeric range of the Observation value. It is not applicable if the observation results are non-numeric or categorical, and must be in the same units of measure as the observation value Page 35 range_high NUMBER(14,3) observation_type_source _value String (50) / Required associated_provider _source_value String (50) x_visit_occurrence_ source_identifier String (50) relevant_condition _source_value x_obs_comment String (50) String (500) SAFTINet ETL Specifications Document The upper limit of the numeric range of the Observation value. It is not applicable if the observation results are non-numeric or categorical, and must be in the same units of measure as the observation value Type of observation (e.g. PRO, Lab, History of, Social History, Allergies). If the visit site type is not defined in the source data refer to the Observation section of the Concept ID Table. Used to create observation_type_concept_id Provider ID from the source. This value will be matched against the Provider table to obtain the corresponding provider_id. Local reference value for visit. This value will be matched against the Visit Occurrence table to obtain the corresponding visit_occurrence_id. 
First Associated Diagnosis Code. Used to create relevant_condition_concept_id. Contains Result Comments – do not use this field for now Page 36 4.10.1 Example of OBSERVATION source / destination data ETL View Observation Table - ETL observation_source_identifier x_data_source_type person_source_value observation_source_value observation_source_value_vocabulary observation_date observation_time value_as_number value_as_string unit_source_value range_low range_high observation_type_source_value asociated_provider_source_value x_visit_occurrence_source_identifier relevant_condition_source_value x_obs_comment 40230320 EHR 20202302 BP_Systolic University Lab 7/12/2011 4:53:00 PM 148 mmHg 50 200 Lab Value 930392 2020200 401.2 Green – Brought forward into grid model / Red – Removed in processing SAFTINet ETL Specifications Document Grid View Observation Table - Grid observation_id x_data_source_type person_id observation_concept_id observation_source_value observation_date observation_time value_as_number value_as_string value_as_concept_id unit_concept_id unit_source_value range_low range_high observation_type_concept_id associated_provider_id visit_occurrence_id relevant_condition_concept_id x_obs_comment x_grid_node_id 23902323 EHR 3903030 102190 8393929 7/12/2011 4:53:00 PM 148 020333 mmHg 50 200 2032002 939393 2002303 302023 1 Green – Brought forward from ETL / Yellow – Generated from ETL field / Red – Generated locally or from multiple ETL fields Page 37 Appendix A: Table Specific Rules Person Table o Recordset should consist of all information (including inpatient and outpatient visits) about any patients with activity (outpatient visits) at a participating primary care site within the past 5 years (back to 1/1/2007 for initial SAFTINet load) o For any patient seen within the past 5 years we request data retrospectively as described below. 
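The Person Table rule above can be read as a two-step filter: first find the patients with a visit at a participating site on or after 1/1/2007, then pull all records for those patients. The Python sketch below illustrates that reading. The function names, the dictionary-based record layout, and the use of visit_start_date as the activity date are assumptions; field names are taken from the Visit Occurrence mapping in Section 4.6.

```python
from datetime import date

CUTOFF = date(2007, 1, 1)  # initial SAFTINet load date from the rule above


def active_person_ids(visits, participating_sites, cutoff=CUTOFF):
    """Persons with at least one visit at a participating site since cutoff."""
    return {
        v["person_source_value"]
        for v in visits
        if v["care_site_source_value"] in participating_sites
        and v["visit_start_date"] >= cutoff
    }


def records_for_cohort(records, person_ids):
    """Keep every record (of any date) that belongs to a cohort member."""
    return [r for r in records if r["person_source_value"] in person_ids]


# Made-up sample data, for illustration only.
visits = [
    {"person_source_value": "A", "care_site_source_value": "UC Internal Medicine",
     "visit_start_date": date(2011, 5, 23)},
    {"person_source_value": "B", "care_site_source_value": "UC Internal Medicine",
     "visit_start_date": date(2005, 3, 1)},
    {"person_source_value": "C", "care_site_source_value": "Other Clinic",
     "visit_start_date": date(2012, 1, 9)},
]
cohort = active_person_ids(visits, {"UC Internal Medicine"})
cohort_visits = records_for_cohort(visits, cohort)
```

Appendix B sets per-concept minimum dates on top of this, so a real extract would apply a second, date-based filter within each table.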
SAFTINet ETL Specifications Document Page 38 Appendix B: Row filters This section details the types of data that will go into each table. For each table, the rightmost columns lists the general data domains (e.g. Lab values) along with the specific concepts (e.g. Blood Pressure) within each domain that should be gathered for the table. When a date is listed with a concept, please gather all records after that date. For most concepts, this will mean gathering the last 5 years of data (2007-2012), though some concepts go back further such as colonoscopy and pneumovax. Organization Care Site Provider Person One record per grouping of care sites operating under a single health care hierarchy Include a record for any location where care is provided (examples include clinics, mobile units and "home-health care"). Multiple separate care-sites in a single building could be grouped together, or not depending on partner's preference Include a record for every provider who appears in the "provider" table OR the subset of the table that can be linked to a claim, a visit, or a prescription, whatever is easiest. If filtering, include all providers who have been active since 1/1/2007 even if not currently active. Include a record for each person who has had some sort of contact with the participating clinics since 1/1/2007 (regardless of current activity status). This set of persons can be used to filter the rest of the clinical data - only pull data related to this set of patients. SAFTINet ETL Specifications Document Page 39 For the following four tables, we wish to collect the specified record types. Please check the ‘Collected?’ column for any record types that will be included in the source data file. Also, please list the local source value for that type. Example: If the local tag for Systolic BP that will go into the observation_source_value field is ‘SBP’, put that in the local name column where systolic BP is listed. 
Record Type Drug Exposure Condition Occurrence Observation Minimum Date Result Type Collected? Local Name Include a record for each prescription / fill / drug administration. Prescription Medication List Administered Drugs Fulfillment Include a record for each entry on the problem list as well as a record for each encounter level diagnosis code. Generally, these will be ICD-9 codes. Problem list Visit-level diagnosis codes ICD-9 codes from claims record Data that do not fit in another table belong here. Observation table contains data from the following categories: lab observations (i.e. test results), general clinical findings, signs, and symptoms, along with other domains listed below. Vital Signs Height 1/1/2007 Height Percentile (for children) 1/1/2007 Weight 1/1/2007 Weight Percentile (for children) 1/1/2007 Pulse oximetry 1/1/2007 Pulse 1/1/2007 Blood Pressure - Systolic Blood Pressure - Diastolic Social History Smoking Status (Current/Past/Former/Second Hand Exposure) Drinking Status 1/1/2007 1/1/2007 All Records / No Date Limit All Records / No Date Limit Past Medical History (To be defined) SAFTINet ETL Specifications Document Page 40 Past Surgical History (To be defined) Lab Results Cholesterol 1/1/2007 LDL 1/1/2007 Alanine transaminase 1/1/2007 Albumin 1/1/2007 Alkaline Phosphatase Aspartate aminotransferase Bilrubin (Total, Indirect and Direct) Blood Urea Nitrogen_Serum Calcium-Serum CBC_% lymphocytes CBC_% Neutrophils CBC_White Blood Cell Count Chlamydia trachomatis DNA assay (procedure) Chol HDL Chol_LDL, calculated Chol_LDL, measured directly Chol_Total Creatinine_Serum Free T4 Glucose, Fasting_Serum Glucose, Random_Serum Glucose_Serum Hemoglobin A1c Hemoglobin_Serum Hepatitis B core antibody Hepatitis B e antibody Hepatitis B e antigen Hepatitis B surface antibody Hepatitis B surface antigen Hepatitis C antibody Hepatitis C antigen INR Platelet Count Potassium SAFTINet ETL Specifications Document 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 
1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 Page 41 Prostate specific antigen measurement (procedure) Pulmonary Function Test Sodium Triglycerides TSH Urinary Protein Urine microalbumin/creatinine ratio measurement (procedure) Urine protein/creatinine ratio measurement (procedure) Urine_Microalbuminuria measurement (procedure) Urine_Protein measurement (procedure) Creatinine_phosphokinase GFR, estimated influenza assay influenza rapid assay (poct) pertussis test respiratory syncytial test FEV1, pre, number FEV1, pre, percent FEV1, post, number FEV1, post, percent FVC, pre, number FVC, pre, percent FVC, post, number FVC, post, percent PFT: Peak expiratory flow Allergies Family History Family History of CVD Patient Reported Outcomes Medication Adherence Survey MAS 1a MAS 1b MAS 1c MAS 1d SAFTINet ETL Specifications Document 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 Yes/No Yes/No Yes/No Yes/No Page 42 MAS 1e MAS 1f MAS 1f Text MAS 1g MAS 1g Text MAS 2 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 MAS Q1 MAS Q2 MAS Q2a 1/1/2007 1/1/2007 1/1/2007 Yes/No Yes/No Text Yes/No Text Categorical or Numeric Yes/No Categorical Yes/No MAS Q2b Asthma Control Test ACT Total Score ACT Category1 ACT-1 ACT-2 ACT-3 ACT-4 ACT-5 Childhood Asthma Control Test C-ACT Total Score C-ACT Category1 C-ACT-1 C-ACT-2 C-ACT-3 C-ACT-4 C-ACT-5 C-ACT-6 C-ACT-7 PHQ-2 Q1 score PHQ-2 Q2 score PHQ-2 total score PHQ-9 FuncQ score PHQ-9 Q1 score PHQ-9 Q2 score PHQ-9 Q3 score PHQ-9 Q4 score PHQ-9 Q5 score PHQ-9 Q6 score 1/1/2007 Yes/No SAFTINet ETL Specifications Document 1/1/2007 1/1/2007 1/1/2007 1/1/2007 
1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 Page 43 PHQ-9 Q7 score PHQ-9 Q8 score PHQ-9 Q9 score PHQ-9 Total score Demographic Information Highest Education Level Achieved Language Preference Procedure Occurrence 1/1/2007 1/1/2007 1/1/2007 1/1/2007 All Records / No Date Limit All Records / No Date Limit Imputed Race / Ethnicity All Records / No Date Limit Person % Fed Poverty level 1/1/2007 Person family size 1/1/2007 Family income 1/1/2007 Person relationship status 1/1/2007 Person Practice Status (active or Most Recent moved or gone elsewhere) / No Date Limit Include a record for each procedure performed on a patient (CPT-4, ICD-9-CM (Procedures), and HCPCS codes). If you want to filter the procedure table, at least include the following procedures Procedures Bone mineral density (DEXA scan) Colonoscopy Diabetic Eye Exam Diabetic Foot Exam Double contrast barium enema Mammogram Pap Smear Pulmonary Function Test Spirometry Mechanical Ventilation Continuous nebulized therapy Endotracheal intubation Critical Care Fecal occult blood test Immunizations SAFTINet ETL Specifications Document 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 1/1/2007 Page 44 Pneumovax Other Immunizations Education Education Nutrition Education Weight loss management 1/1/2007 1/1/2007 1/1/2007 1. ACT and C-ACT categories should be one of the following: 1 = ACT in control (Total score > 19) 2 = ACT poorly controlled (Total score 16-19) 3 = ACT very poorly controlled (Total score < 15) SAFTINet ETL Specifications Document Page 45 Appendix C: Sending data using flatfiles Some users may wish to send their data in a standard flatfile as opposed to the current XML. ROSITA is being modified to handle such files. 
The basic file should be a .txt style text file with columns arranged in the order listed in this document. Individual column values should be separated by a pipe '|' character. A total of 9 files should be loaded in the initial round, one for each table in Sections 4.2-4.10. The files will be processed in the same fashion as the current XML files (see the ROSITA Admin Guide for further details).

Example: 1 row from a sample Organization file

This record (from Section 4.2):

Organization Table - XML
  organization_source_value: UC Internal Medicine
  x_data_source_type: EHR
  place_of_service_source_value: Academic Practice
  organization_address_1: 13199 E Montview Blvd
  organization_address_2: Suite 300, Mail Stop F443
  organization_city: Aurora
  organization_state: CO
  organization_zip: 80045
  organization_county: Arapahoe

should be represented as follows in the file (the actual text should be all on one line):

UC Internal Medicine|EHR|Academic Practice|13199 E Montview Blvd|Suite 300, Mail Stop F443|Aurora|CO|80045|Arapahoe

Users should apply the following rules when generating flatfiles:
- Send a separate file for each data table.
- Files should be named using the convention [table name].txt.
- Column values should be separated with the | character as a delimiter.
- Files should contain one record per row. No header row is needed; the first row should be actual data.
- Quotation marks occurring within column values should be 'escaped' with the \ character so the processor can locate them; the end result should look like \"
- Backslash marks occurring within column values should also be 'escaped' with a second backslash; the end result should look like \\
- Datetime values should use the format YYYY-MM-DDThh:mm:ssZ (for example, 2012-01-09T12:00:00Z); a source value such as 2012-01-09 4:15:00 PM should be converted to this format. Dates should use the format YYYY-MM-DD.
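The flatfile rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ROSITA code: the function names escape_value, to_flatfile_row and write_flatfile are hypothetical, UTF-8 encoding and "\n" line endings are assumed, and missing values are written as empty strings. The document does not say how a pipe character inside a value should be handled, so the sketch leaves such values unchanged.

```python
import os


def escape_value(value):
    """Escape backslashes and quotation marks as the rules above describe."""
    text = "" if value is None else str(value)
    # Backslashes first, so the backslash added for quotes is not doubled.
    text = text.replace("\\", "\\\\")
    return text.replace('"', '\\"')


def to_flatfile_row(values):
    """Join one record's column values with the pipe delimiter."""
    return "|".join(escape_value(v) for v in values)


def write_flatfile(table_name, rows, directory="."):
    """Write one record per line to [table name].txt, with no header row."""
    path = os.path.join(directory, f"{table_name}.txt")
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for row in rows:
            handle.write(to_flatfile_row(row) + "\n")
    return path


# The Organization record from the example above.
organization = [
    "UC Internal Medicine", "EHR", "Academic Practice",
    "13199 E Montview Blvd", "Suite 300, Mail Stop F443",
    "Aurora", "CO", "80045", "Arapahoe",
]
line = to_flatfile_row(organization)
```

Calling write_flatfile("organization", [organization]) would produce organization.txt containing that single line. The file name used for each table is an assumption here; the rules only give the pattern [table name].txt.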