Disease Monitoring with SQL Server BI
SQL Server Business Intelligence for disease surveillance and monitoring
Matt F. Smith
SQL Server Data Engineering
Presentation for SQL Saturday #596, Denver, CO, USA
2017-02-25
> whoami
• Enjoy working with the Microsoft SQL stack: T-SQL code and infrastructure
• My recent learning has been around Cloudera, HDFS, Sqoop, Flume and Spark. I am very excited about SQL Server on Linux
• I write code every day
Today’s Agenda
A BI Solution Overview
• Big Words & Useful Terminology.
• A SQL Server BI Solution for Disease Surveillance and Monitoring.
• Other interesting things…
Relevant terminology
• Epidemiology: the study and analysis of the patterns, causes, and effects of health and disease conditions in defined populations
• MMWR: the CDC's Morbidity and Mortality Weekly Report
  • Example report: "Diagnosis of Tuberculosis in Three Zoo Elephants and a Human Contact – Oregon, 2013"
  • You can subscribe on the CDC website!
• Incidence proportion (an important metric!): [number of new cases during a specified time period] / [population] (the population denominator varies by neighborhood)
• 5-week rolling window: the current week ±2 weeks, taken from each of the last 5 years. Used to trend and compare the current rate (now) with the historical rate.
• Standard deviation: the square root of the variance (T-SQL STDEV, which returns the sample standard deviation)
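• Alert: current rate at least 1 SD but less than 2 SD above the historical mean
• Outbreak: current rate at least 2 SD above the historical mean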
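The metrics and thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not the author's T-SQL implementation. It assumes weekly rates are stored in a dict keyed by (year, week number), that every year has 52 weeks, and that "alert" and "outbreak" mean the current rate is at least 1 or 2 sample standard deviations above the mean of the 5-week-window baseline. The function names `incidence_proportion`, `baseline_rates` and `classify` are hypothetical.

```python
import statistics


def incidence_proportion(new_cases: int, population: int) -> float:
    """New cases in a period divided by the population (e.g. one neighborhood)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return new_cases / population


def baseline_rates(rates, year, week, years_back=5, half_window=2, weeks_per_year=52):
    """Historical rates for weeks (week - 2 .. week + 2) in each of the previous
    `years_back` years. `rates` maps (year, week) -> rate.
    Assumption: every year has `weeks_per_year` weeks; missing weeks are skipped."""
    values = []
    for y in range(year - years_back, year):
        for offset in range(-half_window, half_window + 1):
            yy, w = y, week + offset
            # Wrap windows that cross a year boundary into the adjacent year.
            if w < 1:
                yy, w = yy - 1, w + weeks_per_year
            elif w > weeks_per_year:
                yy, w = yy + 1, w - weeks_per_year
            if (yy, w) in rates:
                values.append(rates[(yy, w)])
    return values


def classify(current_rate, historical):
    """Return 'outbreak', 'alert' or 'normal' by comparing the current rate
    with the historical mean plus 1 or 2 sample standard deviations."""
    if len(historical) < 2:
        raise ValueError("need at least two historical values")
    mean = statistics.mean(historical)
    sd = statistics.stdev(historical)  # sample SD, like T-SQL STDEV
    if current_rate <= mean:
        return "normal"
    # With a flat baseline (sd == 0) any increase is flagged as an outbreak.
    if current_rate >= mean + 2 * sd:
        return "outbreak"
    if current_rate >= mean + sd:
        return "alert"
    return "normal"


if __name__ == "__main__":
    print(incidence_proportion(5, 1000))  # 0.005
    print(classify(3.5, [1.0, 2.0, 3.0]))  # alert
    print(classify(5.0, [1.0, 2.0, 3.0]))  # outbreak
```

With the toy baseline [1.0, 2.0, 3.0] (mean 2, SD 1), the alert band is [3, 4), so a current rate of 3.5 raises an alert and 5.0 is flagged as an outbreak. With five full years of data, `baseline_rates` returns 25 historical weeks.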
ETL Architecture
Software: Microsoft SQL Server stack (SQL Server 2016, SSIS, SSRS, Power BI Desktop)
Infrastructure: SQL Server running on VMware, on Dell servers
(Please see Dell/SQL Server Reference Architectures)
Systems Overview
• Partners deliver flat files and MS Access databases via SFTP to a network share
• SQL Agent jobs execute SSIS packages that pick up the files and load them into a raw-data staging area (minimal transformations at this stage)
• Composite primary keys are built from the incoming data, and hashes are computed for both the keys and the row data; a custom CDC (change data capture) process uses them to merge raw data into a secondary staging area
• Data is loaded into the data warehouse: a normalized EDW containing 30+ source systems, with a schema based on the HL7 data model
• Data is provisioned into a data mart for reporting and analysis twice daily
• SSRS points to the data mart; the mart also contains SQL Server views for named power users, who query from SSMS, Microsoft Excel and Microsoft Power BI
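The key-hash and row-hash comparison in the CDC step above can be illustrated with a short sketch. This is a hypothetical Python version of the general technique, not the project's custom CDC process. It assumes rows arrive as dicts, uses SHA-256 from `hashlib` (in SQL Server, `HASHBYTES('SHA2_256', ...)` plays a similar role), and represents the secondary staging area as a dict mapping key hash to row hash. The helper names and any column names are made up for illustration.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, object]


def _digest(values: Iterable[object]) -> str:
    """SHA-256 over the values joined by a unit-separator character,
    with None normalized to an empty string."""
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def key_hash(row: Row, key_cols: List[str]) -> str:
    """Hash of the composite primary key columns."""
    return _digest(row[c] for c in key_cols)


def row_hash(row: Row, data_cols: List[str]) -> str:
    """Hash of the non-key data columns, used to detect changed rows."""
    return _digest(row[c] for c in data_cols)


def plan_merge(
    incoming: List[Row],
    staged: Dict[str, str],
    key_cols: List[str],
    data_cols: List[str],
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split incoming raw rows into inserts (new key), updates (known key,
    different row hash) and unchanged rows. `staged` maps key hash -> row hash."""
    inserts, updates, unchanged = [], [], []
    for row in incoming:
        k = key_hash(row, key_cols)
        h = row_hash(row, data_cols)
        if k not in staged:
            inserts.append(row)
        elif staged[k] != h:
            updates.append(row)
        else:
            unchanged.append(row)
    return inserts, updates, unchanged
```

Comparing one fixed-length hash per row avoids column-by-column comparisons during the merge. Deletes (keys present in staging but absent from a full extract) would need a separate pass; the slides do not say how the original process handles them.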
Reporting & BI Architecture
SQL Server Reporting Services, Microsoft Excel, Power BI
SQL Server Reporting Services
• SSRS reports meet the needs of ~80% of the customer base
• Dashboards: anything that has triggered an alert or outbreak over the past 5 weeks (let’s keep an eye on it)
• Relevant information from incoming phone calls: disease incidents
• Team metrics: incoming disease count, count for investigation, count closed, etc.
• SSRS mapping components used extensively: data points are geocoded for neighborhood mapping, using web services for accurate geocoding
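The dashboard rule (anything that has alerted or reached outbreak status in the past 5 weeks) amounts to a simple filter. Here is a hypothetical Python sketch. It assumes weekly status rows of the form (disease, neighborhood, week_start, status), where status is one of "normal", "alert" or "outbreak". The slides do not show how the SSRS dashboard actually implements this rule.

```python
from datetime import date, timedelta
from typing import Iterable, List, Tuple

StatusRow = Tuple[str, str, date, str]  # (disease, neighborhood, week_start, status)


def dashboard_items(rows: Iterable[StatusRow], as_of: date, weeks: int = 5) -> List[Tuple[str, str]]:
    """Return the sorted (disease, neighborhood) pairs flagged 'alert' or
    'outbreak' in any week starting within the last `weeks` weeks up to `as_of`."""
    cutoff = as_of - timedelta(weeks=weeks)
    flagged = {
        (disease, hood)
        for disease, hood, week_start, status in rows
        if status in ("alert", "outbreak") and cutoff < week_start <= as_of
    }
    return sorted(flagged)
```

Using a set means a disease that alerted in several of the last 5 weeks appears only once on the dashboard.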
SQL Server views: reference data for Power BI users
• Views present data from the data mart
• Customers simply pull data into Excel and pivot
Project Team
Lean team: 1 PM/business analyst, 2 developers/QA/analysts
An experienced team has the breadth and depth to deliver the project (it’s about quality, not quantity!)
• Methodology: "Scrum-fall". Some requirements were pre-defined; Agile was used to evolve and work through these and other, undefined requirements
• PM does BA work in addition to project management
• Developers do BA work, write code and test cases, refine user stories, and meet and work with end users
• Product owners, developers, the PM and major stakeholders attend the daily scrum
• Stakeholders accept work and perform their own QA using SSMS and Excel; they are the most familiar with the data and the reporting requirements (internal as well as external reporting to the State of CO)
Challenges
• Not being an epidemiologist: a knowledge gap
  • Solution: document processes; learn, engage with and understand stakeholder operations; partner with the customer and share information. Grow the relationship.
• Schedule and availability of resources to assist with QA
  • Solution: test-driven development ensures tests are written before development starts; T-SQL tests help QA meetings move more quickly
Stakeholders
Tracking and investigating alerts and outbreaks of infectious disease
• Epidemiology team: 12 people, including epidemiologists, nurses, managers, directors and other team members
• Incoming cases are tracked, filtered by disease type, and assigned to individuals for follow-up (interviews)
• Outbreaks or major issues (measles, mumps) may require participation from the entire team until the investigation is complete
• State of Colorado: reporting
…and yes, zombies are real.
Zombie Ants: adaptive parasite manipulation
https://www.scientificamerican.com/article/fungus-makes-zombie-ants/
Fungus Makes Zombie Ants Do All the Work
A tropical fungus has adapted to infect ants and force them to chomp, with surprising specificity, into
perfectly located leaves before killing them and taking over their bodies
By Katherine Harmon on July 31, 2009