* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download SQL_Saturday_#600_
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Relational algebra wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Database model wikipedia , lookup
The Lost Science of Set Theory March 11, 2017 Aaron N. Cutshall, MSCIS, MSHI Just who is this guy? Sr. Data Architect B.S. Computer Science 2 | 03/11/2017 M.S. Computer Information Systems Local Group Leader Speaker – various events 30+ M.S. Health Informatics SQL Saturday #600 – Chicago 2017 Years Agenda What is Set Theory? Set Operations Set Operations in Relational Databases Card Tricks Database Join Operation What does this mean to me? Procedural vs Set: What’s the big deal? In closing… 3 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Something to consider… “A set is a Many that allows itself to be thought of as a One.” -- Georg Cantor, 1845-1918 German mathematician and inventor of set theory 4 | 03/11/2017 SQL Saturday #600 – Chicago 2017 What is Set Theory? Is a collection of objects Group objects together in a set Treat the set as a single entity The objects are known as members of the set The members are related to one another in some fashion An operation is performed against all members of the set simultaneously Sets can be divided into sub-sets Multiple sets can be merged into a single set 5 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Some examples in the real world Examples of sets: Real Numbers (3.14159, 23, -14.76, 256, -12, etc.) Countries of the world Fruits (apples, bananas, oranges, peaches, etc.) Database professionals (SQL Server, Oracle, etc.) Deck of cards 6 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Some examples in the real world Examples of sub-sets: Integers (-12, 256, 65536, -351, etc.) Countries of Europe Citrus fruits (oranges, lemons, pineapples, etc.) SQL Server professionals Card Suits (spades, clubs, hearts, diamonds) 7 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Why is this important? Set theory typically taught only in Computer Science classes (if at all) Many DB Developers started out as procedural programmers in Object-Oriented Programming Systems (OOPS) Java Ruby-on-Rails C# Therefore, most DB Developers have had very little if any exposure to Relational DB theory 8 | 03/11/2017 SQL Saturday #600 – Chicago 2017 What is the result? Set theory is avoided like the plague Many DB solutions end up developed in a procedural fashion Many are simply just scared of the prospect of handling set-based operations! 9 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set Operations Union – Combine members of all sets 10 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set Operations Intersection – Common members of all sets 11 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set Operations Difference – Members in one set not in another OR 12 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set Operations Symmetric Difference – Members in one set or the other but not in both 13 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set Operations in Relational Databases 14 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set Operations in Relational Databases Relational Algebra Mathematical relationship between the MEMBERS of each set Results not in just the members of the sets, but the RELATIONSHIPS between the sets Establishing relationship criteria between sets Explicit – Specifically stated relationships Implicit – Implied relationships Resultant sets identify all possible relationships between members of the sets based on criteria 15 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Now for some card tricks! 16 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Characteristics of cards Suits & Color 17 | 03/11/2017 Face Cards SQL Saturday #600 – Chicago 2017 Numeric Cards How set theory is used Identify sub-sets based upon member characteristics: All cards that are Diamonds. All face cards that are Clubs. All Aces that are black. All numeric cards. 18 | 03/11/2017 SQL Saturday #600 – Chicago 2017 How sets are used in a Relational DB Query data based upon criteria: SELECT Card FROM Room WHERE Suit = ‘Diamond’; SELECT Card FROM Room WHERE Suit = ‘Club’ AND ISNUMERIC(CardNbr) = 0; -- Face Card SELECT Card FROM Room WHERE Color = ‘Black’ AND CardNbr = ‘A’; SELECT Card FROM Room WHERE ISNUMERIC(CardNbr) = 1; -- Numeric Card 19 | 03/11/2017 SQL Saturday #600 – Chicago 2017 INNER JOIN (Intersect) Most frequently used join Produces a set of records which match both tables Pure Set: All cards from both Left Side and Right Side that have only the Hearts suit. Relational Set: SELECT L.Card, R.Card FROM LeftSide L INNER JOIN RightSide R ON L.CardNbr = R.CardNbr WHERE L.Suit = ‘Hearts’; 20 | 03/11/2017 SQL Saturday #600 – Chicago 2017 LEFT JOIN (Difference & Intersect) Second most frequently used join All records from left table and matching ones from right table Pure Set: All cards from both Left Side and Right Side that have red suits. Relational Set: SELECT L.Card, R.Card FROM LeftSide L LEFT JOIN RightSide R ON L.CardNbr = R.CardNbr WHERE L.Color = ‘Red’; 21 | 03/11/2017 SQL Saturday #600 – Chicago 2017 RIGHT JOIN (Intersect & Difference) Rarely Used – Similar to LEFT JOIN in reverse All records from right table and matching ones from left table Pure Set: All cards from both Right Side and Left Side that are face cards. Relational Set: SELECT L.Card, R.Card FROM LeftSide L RIGHT JOIN RightSide R ON R.CardNbr = L.CardNbr WHERE ISNUMERIC(L.CardNbr) = 0; 22 | 03/11/2017 SQL Saturday #600 – Chicago 2017 FULL OUTER JOIN (Union) Lesser value – Union of LEFT JOIN and RIGHT JOIN All records from left and right tables regardless of matches Pure Set: All cards from both Left Side and Right Side. Relational Set: SELECT L.Card, R.Card FROM LeftSide L FULL OUTER JOIN RightSide R ON L.CardNbr = R.CardNbr; 23 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Database Join Operation Join Operation Selection Criteria System constraints (memory, disk space, etc.) Table structure Data volume Indexing structure The set of values in the participating columns Actual join operation performed in memory Increased performance More memory, better performance 24 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Database Join Operation Adaptable to changing conditions Changes in query or join criteria Changes in data and/or data relationships (via updated statistics) Changes in hardware status (memory usage, disk usage, etc.) Previous executions (saved execution plans) Join Operation Join Operation selected is dependent upon type of query, tables sizes, indexes available, etc. Engine attempts to select proper join operation for given circumstances 25 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Database Join Operation Merge Both tables are joined on sorted indexed columns SQL Server scans once through the sorted inputs and merges the data together Nested Loop When one of the tables in the join clause contains a small number of records (inner table) compared to the number of records in the other participating table (outer table) Nested loops scan once through each inner table row and searches for the corresponding row in the outer table 26 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Database Join Operation Hash When large, unsorted, non-indexed columns are used in the join condition Engine prepares a hash table for the smaller relation by applying a hash function to the join attribute of each row (sort of like creating a temporary index on the fly) Scans the larger relation and finds the relevant rows by looking at the hash table Note One join operation no more costly than another except when used improperly 27 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Procedural vs Set: What’s the big deal? Procedural (or Object-Oriented) processing Data is very abstract Structure or organization of the data is unimportant Separated by different layers of abstraction Concerned with what to do and how to do it Step-by-step Statements, Loops, Object manipulation, etc. Different logic paths depending upon data attributes Record by record processing (RBAR) Provides total control You know exactly what it will do Execution method will be consistent each and every time 28 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Procedural vs Set: What’s the big deal? Set-based processing Describes what should be done Indicates the operation Identifies the conditions Allows the engine to determine an optimal method to obtain results based upon: Data structure, organization and volume Statistics from previous runs Frees developer from mechanical/specific details Allows for changing conditions without altering query RDBMS is optimized to select or manipulate records sets as a whole (often at the expense of procedural operations) 29 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Example Scenario Process Healthcare Measures for multiple hospitals Measures are typically ratios (numerator over denominator) Ex. Measure: identify diabetic patients (denominator) who have an LDL-C < 100 mg/dL (numerator) Calculation session may be for as many as 70+ hospitals (120k+ patients each) Each hospital may require 40 or more measures Restrictions Each measure must be calculated independently Each measure may be processed for one or more hospitals and/or time-periods concurrently Same data may be used for multiple measures 30 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Procedural Example Start Job (Call SP to create job record) Start Session (Call SP to create session record) Load temp table for hospitals from a very complex view construct (multiple nested views referencing same tables) For each row in the temp table (via cursor) Calculate Reporting Period Begin/End Date for hospital Start Program Session (Call SP to create record) Load temporary tables For each measure a hospital requires (via cursor) Clear working tables EXEC @MeasureSP (@HospitalId, @MeasureId) Copy working tables to permanent tables End Program Session (Call SP to update record) End Session (Call SP to update session record) End Job (Call SP to update session job) 31 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Procedural Example Example of a xxx_Start: Determine @JobId Call SP Job_Insert INSERT INTO tbl_Job VALUES (@JobId, @Parameter1, @Parameter2, …); Example of a xxx_End: Call SP Job_Update UPDATE tbl_Job SET StopTime = GETDATE() WHERE JobId = @JobId; 32 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Set-Based Example SELECT Facilities & Measures INTO tbl_FacilityMeasure FROM SimplifiedQuery with Reporting Period Begin/End Date calculated SET @CurrSeqNbr = 1; WHILE @ CurrSeqNbr <= @MaxSeqNbr Load temporary tables EXEC @MeasureSP (@CurrSeqNbr) Copy working tables to permanent tables SET @ CurrSeqNbr = @ CurrSeqNbr + 1; 33 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Why is this better? Eliminated nested SP calls which resulted in a single record insert/update Loaded all data into temp table in one query Grouped records allow for all hospitals per measure instead of just one at a time Eliminated performance killing cursors A loop was still needed because SPs cannot be executed in a set Reduced loops from 2800 (70*40) to just 40! 34 | 03/11/2017 SQL Saturday #600 – Chicago 2017 In closing… Procedural programming is not set-based Intended for greater control over processing More efficient with loops, recursion, and string-based manipulation Better with small amounts of data at a time Relational database engines are set-based More efficient when processing large data sets Can generate dynamic and efficient execution plans Outperforms procedural processing with large data sets Unfortunately, not everything can be done as a set Use the right tool for the right job! 35 | 03/11/2017 SQL Saturday #600 – Chicago 2017 In closing… Change your thinking to set-based operations Consider set-based operations in your logic Review cursors and loops and look for set-based methods Give up mechanical control for greater overall control Work WITH the engine – not against it Yoda: “Unlearn what you have learned you must!" 36 | 03/11/2017 SQL Saturday #600 – Chicago 2017 In closing… Barely scratched the surface Simple examples shown here (only two simple sets) Much more complex operations are typical Multiple tables in joins of various types Complex relationship criteria A whole course could be given on this subject alone An additional set of courses could be given on transforming procedural processes into set-based processes 37 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Useful resources Understanding set-based logic http://www.codinghorror.com/blog/2007/10/a-visualexplanation-of-sql-joins.html http://sbuweb.tcu.edu/bjones/20263/Access/AC101_F undamentalsDB_Design.pdf http://sqlmag.com/t-sql/t-sql-foundations-thinking-sets http://weblogs.sqlteam.com/jeffs/archive/2007/04/30/th inking-set-based-or-not.aspx 38 | 03/11/2017 SQL Saturday #600 – Chicago 2017 More useful resources Converting procedural code to set-based http://sqlmag.com/t-sql/programming-sql-set-basedway http://www.exacthelp.com/2013/02/write-sql-queriesin-set-based-approach.html https://www.simple-talk.com/sql/performance/writingefficient-sql-set-based-speed-phreakery/ 39 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Visit the Sponsors! 40 | 03/11/2017 SQL Saturday #600 – Chicago 2017 Questions & Comments BONUS: A TON of free eBooks can be found from Microsoft and RedGate! PRESENTATION FEEDBACK: • Your thoughts needed • Improve presentations • Make this event even more valuable!!! Aaron N. Cutshall @ancutshall aaron.n.cutshall http://www.linkedin.com/in/aaroncutshall [email protected] 41 | 03/11/2017 SQL Saturday #600 – Chicago 2017