How In-Memory Affects Database Design
Louis Davidson, Certified Nerd
drsql.org

Who am I?
• Been in IT for over 19 years
• Microsoft MVP for 10 years
• Corporate data architect
• Written five books on database design
  – OK, so they were all versions of the same book. They at least had slightly different titles each time
• Basically: I love database design, and In-Memory technologies are changing the game

Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql (@drsql)
• SQL Blog – http://sqlblog.com/blogs/louis_davidson
• Simple Talk Blog – What Counts for a DBA: http://www.simple-talk.com/community/blogs/drsql/default.aspx

Questions are Welcome
• Please limit questions to ones I know the answer to.

A tasty allegory…
• Bacon is awesome
• Bacon is an extremely powerful tool for rapid fat and calorie intake
• Even bacon isn't good for everything
http://www.lazygamer.net/general-news/diablo-iii-players-burned-off-820-968-kgs-of-bacon/
https://www.flickr.com/photos/runnerone/6232183896/in/photostream/

The process I went through
• Start with basic requirements
  – Sales system
  – Stream of customer and order data
  – Apply In-Memory OLTP to see how it changed things
  – Keep it very simple
• Learn a lot
  – This presentation was borne out of what I learned from that process (and Kalen Delaney's precon, whitepaper, and other reading that is linked throughout the slides)
• Build a test, apply what I have learned, and morph until I get to what works
• Build something real in my day job, if applicable

Introduction: What exactly is In-Memory OLTP in SQL Server 2014?
• A totally new, revamped engine for data storage, co-located in the same database with the existing engine
  – Obviously Enterprise only…
• Purpose-built for certain scenarios
• Terminology can be confusing
  – Existing tables: home is on-disk, but ideally cached in-memory
  – In-Memory tables: home is in-memory, but backed up by on-disk structures
• If you have enough RAM, on-disk tables are also in memory
  – But the implementation is very, very different
• In-Memory is both very easy and very difficult to use

Design Basics (And no, I am not stalling for time due to lack of material)
• Designing and coding is like the chicken and the egg
  – Design is what you do before coding
  – Coding patterns can greatly affect design
  – Engine implementation can greatly affect design and coding patterns
  – Developing software follows a natural process
• We will discuss how In-Memory technologies affect the entire design/development lifecycle

Design Basics - Separate your design mind into three phases
1. Logical (overall data requirements in a data model format)
2. Physical implementation choice
   A. Type of database system: paper, Excel, Access, SQL Server, NoSQL, etc.
   B. Engine choices: In-Memory, On-Disk, compression, partitioning, etc.
   Note: Bad choices usually involve pointy hair and a magazine article, with very little thinking and testing
3.
Physical (relational code)
• Before the engine choice: I always suggested doing 3 before 2
• We will look at each of these phases and how In-Memory may affect your design of each output

Logical Design (Though Not Everyone's Is)
• This is the easiest part of the presentation
• You still need to model
  – Entities and attributes
  – Uniqueness conditions
  – General predicates
• As I see it, nothing changes…

Logical Data Model
[Data model diagram]

Physical Implementation Overview
[Architecture diagram: the client app talks to SQL Server.exe through the TDS handler and session management; there are no improvements in the communication stack, parameter passing, or result-set generation. Natively compiled stored procedures and schema (the Hekaton compiler sits beside the parser, catalog, algebrizer, and optimizer) are 10-30x more efficient, with real apps seeing 2-30x. The engine for memory-optimized tables and indexes lives alongside the existing buffer pool, access methods, and proc/plan cache for on-disk tables; query interop lets interpreted T-SQL reach both. Log bandwidth and contention are reduced, though log latency remains; checkpoints to the memory-optimized table filegroup are background sequential IO.]
http://download.microsoft.com/documents/hk/technet/techdays2014/Day2/Session2/DBI394-SQL%20Server%202014%20In-Memory%20OLTP%20-%20Depp%20Dive.pdf

Key Physical Implementation (Technically it's all software!)
• Everything is different, and I am going to give just an overview of the physical details…
• In-Memory data structures coexist in the database alongside on-disk ones
• Data is housed in RAM, and backed up in delta files and the transaction log
  – Delta files are stored as filestream storage
  – The transaction log is the same one you are used to (with lighter utilization)
• Tables and indexes are extremely coupled
• MVCC (Multi-Version Concurrency Control) is used for all isolation

Physical Design (No, let's not get physical)
• Your physical design will almost certainly be affected
• So much changes, even just the internal table structure
• In this section, we will discuss:
  – Creating storage objects
    • Table creation
    • Index creation (which is technically part of the table creation)
    • Altering a table's structure
  – Accessing (modifying/creating) data
    • Using normal T-SQL (interop)
    • Using compiled code (native)
    • Using a hybrid approach
    • No locks, no latches, no waiting

Creating Storage Objects - Tables
• The syntax is the same as on-disk, with a few additional settings
• You have durability choices
  – Individual in-memory table: SCHEMA_ONLY or SCHEMA_AND_DATA
  – Database level for transactions: DELAYED (also for on-disk tables)
    • Basically asynchronous log writes
    • Aaron Bertrand has a great article on this here: http://sqlperformance.com/2014/04/io-subsystem/delayed-durability-in-sql-server-2014
• You also have less to work with...
– Row size limited to 8060 bytes (enforced at create time)
  • Not all datatypes are allowed (no LOB types, CLR, sql_variant, datetimeoffset, or rowversion)
– No check constraints
– No foreign keys
– Limited unique constraints (just one unique index per table)
• Every durable (SCHEMA_AND_DATA) table must have a primary key
• Note: There are memory-optimized temporary tables too. See Kendra Little's article here: http://www.brentozar.com/archive/2014/04/table-variables-good-temp-tables-sql-2014/

Data quality…What if?
"Hello, my name is Fred Smith" … "Hello, my name is Fred Smith" … "Hey, we are the same person, why do I have two customer numbers?"

Data quality…What if?
If Train A is given access to Location L on Track 1 at 11:30 AM, and Train B is given access to the same location at the same time going in a different direction: how many years after you get out of prison will you feel bad about killing multiple people because you didn't think about the effect of concurrency on data quality?

Dealing with Un-Supported Datatypes…
• Say you have a table with 10 columns, but 1 is not allowed in an in-memory table
• First: ask yourself if the table really fits the criteria we aren't done covering
• Second: if so, consider vertically partitioning:
  CREATE TABLE In_Mem (KeyValue, Column1, Column2, Column3)
  CREATE TABLE On_Disk (KeyValue, Column4)
• It is likely that uses of the disallowed types wouldn't be good for the OLTP aspects of the table in any case.
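The vertical-partitioning idea above can be fleshed out a little in T-SQL. This is a minimal sketch, not code from the deck: the table and column names are hypothetical, and a view reunites the halves for interop readers.

```sql
-- In-memory side: the "hot" OLTP columns (hypothetical names)
CREATE TABLE dbo.Customer_InMem
(
    CustomerId   integer NOT NULL PRIMARY KEY NONCLUSTERED
                     HASH WITH (BUCKET_COUNT = 50000),
    CustomerName varchar(100) COLLATE Latin1_General_100_BIN2 NOT NULL,
    CreditLimit  decimal(12,2) NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- On-disk side: the disallowed column, keyed by the same value
CREATE TABLE dbo.Customer_OnDisk
(
    CustomerId integer NOT NULL PRIMARY KEY,
    Notes      varchar(max) NULL   -- LOB types are not allowed in-memory in 2014
);
GO

-- Interop view to reunite the halves for readers that need everything
CREATE VIEW dbo.Customer
AS
SELECT im.CustomerId, im.CustomerName, im.CreditLimit, od.Notes
FROM   dbo.Customer_InMem AS im
       LEFT JOIN dbo.Customer_OnDisk AS od
           ON od.CustomerId = im.CustomerId;
GO
```

Remember the trade-off: the view (and any modification touching both tables) can only be used from interop code, since natively compiled procedures cannot reference on-disk objects.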
Creating Storage Objects - Index creation
• Syntax is inline with CREATE TABLE
• Indexes are linked directly to the table
  – 8 indexes max per table due to internals
  – Only one unique index allowed (the primary key)
  – Indexes are never persisted, but are rebuilt on restart
• String index columns must use a binary collation (case AND accent sensitive)
• Cannot index a nullable column
• Two types
  – Hash
    • Ideal for single-row lookups
    • Fixed size; you choose the number of hash buckets (approximately 1-2x the number of unique values: http://msdn.microsoft.com/en-us/library/dn494956.aspx)
  – Bw-tree
    • Best for range searches
    • Very similar to a B-tree index as you (hopefully) know it, but optimized for MVCC and pointer connection to the table

A Taste of the Physical Structures
• Basic data record for a row: Record Header, then Data For Columns (Payload)
• Record header contains: Begin Timestamp, End Timestamp, StatementId, IndexCount, IndexPointers, ...

Hash Index - Simplified
[Diagram: a three-row table (TableNameId, Country, OtherColumns) with values (1, USA), (2, USA), (3, Canada), and hash indexes on the identity column and on Country. Each hash bucket points to the first row version whose key hashes to it; rows sharing a bucket are chained together through the per-index pointers in each row's header.]

Hash Index - Simplified
[Diagram: the same table after row 2's Country is updated from USA to Canada. A new row version (2, Canada) is written, the old version's end timestamp is stamped (100), and the index pointer chains reach both versions, so readers see whichever version was valid for their transaction.]

Bw Tree Index – Even More Simplified
[Diagram: a page mapping table maps logical page IDs to pages; a non-leaf page routes key ranges (C, R, Z) to leaf pages holding keys (A, B / C, D / S, T, Z), whose entries point into the data rows.]

Do you want to know more?
• For more in-depth coverage:
  – Check Kalen Delaney's white paper: http://t.co/T6zToWc6y6
  – Or, for an even deeper (nerdier?)
version: "Hekaton: SQL Server's Memory-Optimized OLTP Engine" (http://research.microsoft.com/apps/pubs/default.aspx?id=193594) or "The Bw-Tree: A B-tree for New Hardware Platforms" (http://research.microsoft.com/pubs/178758/bw-tree-icde2013-final.pdf)
  – Books Online: http://technet.microsoft.com/en-us/library/dn133186.aspx
  – TechDays presentation: http://download.microsoft.com/documents/hk/technet/techdays2014/Day2/Session2/DBI394-SQL%20Server%202014%20In-Memory%20OLTP%20-%20Depp%20Dive.pdf
  – Buy Kalen Delaney's ebook: http://www.amazon.com/gp/product/B00QMWX8PO/ref=docs-os-doi_0

Creating Storage Objects - Altering a Table
• This is the second easiest slide in the deck (to write!)
• No alterations allowed: strictly drop and recreate
  – Add column: drop and recreate table
  – Add index: drop and recreate table
  – Rebuild index: drop and recreate table

DEMO IN SLIDES – PREPARING TO (AND ACTUALLY) CREATING TABLES

Setting the Database To Allow In-Mem
CREATE DATABASE HowInMemObjectsAffectDesign
ON PRIMARY
    ( NAME = N'HowInMemObjectsAffectDesign',
      FILENAME = N'Drive:\HowInMemObjectsAffectDesign.mdf',
      SIZE = 2GB, MAXSIZE = UNLIMITED, FILEGROWTH = 10% ),
-- Add a filegroup to hold the delta files
FILEGROUP [MemoryOptimizedFG] CONTAINS MEMORY_OPTIMIZED_DATA
    ( NAME = N'HowInMemObjectsAffectDesign_inmemFiles',
      FILENAME = N'Drive:\InMemfiles',
      MAXSIZE = UNLIMITED )
LOG ON
    ( NAME = N'HowInMemObjectsAffectDesign_log',
      FILENAME = N'Drive:\HowInMemObjectsAffectDesign_log.ldf',
      SIZE = 1GB, MAXSIZE = 2GB, FILEGROWTH = 10% );
GO

Creating a Memory Optimized Permanent Table
CREATE TABLE [Customers].[Customer]
(
    [CustomerId] integer NOT NULL IDENTITY(1,1),
    [CustomerNumber] char(10) COLLATE Latin1_General_100_BIN2 NOT NULL,
    CONSTRAINT [XPKCustomer] PRIMARY KEY NONCLUSTERED HASH ([CustomerId])
        WITH (BUCKET_COUNT = 50000),
    INDEX [CustomerNumber] NONCLUSTERED ([CustomerNumber])
) WITH ( MEMORY_OPTIMIZED = ON, DURABILITY =
SCHEMA_AND_DATA )
GO

Creating a Memory Optimized Permanent Table (annotated)
CREATE TABLE [Customers].[Customer]
(
    [CustomerId] integer NOT NULL IDENTITY(1,1),
    -- Character column must be binary to index/compare in native code
    [CustomerNumber] char(10) COLLATE Latin1_General_100_BIN2 NOT NULL,
    -- Hash index used for the primary key; estimated rows in table: 25000-50000
    CONSTRAINT [XPKCustomer] PRIMARY KEY NONCLUSTERED HASH ([CustomerId])
        WITH (BUCKET_COUNT = 50000),
    -- Bw-tree index on CustomerNumber
    INDEX [CustomerNumber] NONCLUSTERED ([CustomerNumber])
)
-- This table is memory optimized (ok, that was kind of obvious),
-- and is as durable as the database settings allow
WITH ( MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA )
GO

Accessing the Data - Using Normal T-SQL (Interop)
• Uses typical interpreted T-SQL
• Most T-SQL will work with no change (you may need to add isolation level hints, particularly in explicit transactions)
• A few exceptions that will not work:
  – TRUNCATE TABLE - this one is really annoying :)
  – MERGE (an in-memory table cannot be the target)
  – Cross-database transactions (other than tempdb)
  – Locking hints

Accessing the Data using Compiled Code (Native)
• Instead of being interpreted, the stored procedure is compiled to machine code
• Limited syntax (like programming with both hands tied behind your back)
• Allowed syntax is listed as what is available, not what isn't
  – http://msdn.microsoft.com/en-us/library/dn452279.aspx
• Some really extremely annoying ones:
  – SUBSTRING supported; LEFT, RIGHT, not so much
  – No subqueries
  – OR, NOT, IN not supported in the WHERE clause
  – Can't use on-disk objects (tables, sequences, views, etc.)
  – String comparisons must be with columns of a binary collation
• So you may have to write some "interesting" code

DEMO IN SLIDES – NATIVE STORED PROCEDURE

Creating a Natively Compiled Procedure (I write my C# the new fashioned way, with T-SQL)
CREATE PROCEDURE
Customers.Customer$CreateAndReturn
    @Parameter1 Parameter1Type = 'defaultValue1',
    @Parameter2 Parameter2Type = 'defaultValue2',
    …
    @ParameterN ParameterNType = 'defaultValueN'
WITH NATIVE_COMPILATION,  -- alerts the parser that this will be a natively compiled object
     SCHEMABINDING,       -- works just like for views and functions: can't change the underlying object while this object references it
     EXECUTE AS OWNER     -- there is no ownership chaining; all code executes as the procedure owner
AS
BEGIN ATOMIC WITH  -- procedures are atomic transactions
    ( TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english' )
    <code>
END

Accessing Data Using a Hybrid Approach
• Native code is very fast but very limited
• Use native code where it makes sense, and not where it doesn't
• Example: creating a sequential value
  – In the demo code I started out by using RAND() to create CustomerNumbers and SalesOrderNumbers
  – Using a SEQUENCE is far more straightforward
  – So I made one interpreted procedure that uses the SEQUENCE outside of native code, then calls the native procedure

Accessing the Data - No Locks, No Latches, No Waiting
• On-disk structures use latches and locks to implement isolation
• In-memory structures use optimistic MVCC
• You have 3 isolation levels:
  – SNAPSHOT, REPEATABLE READ, SERIALIZABLE
  – Evaluated before, or when, the transaction is committed
  – This makes data integrity checking "interesting"
• Essential difference: your code now must handle errors

Concurrency is the #1 difference you will deal with
• Scenario 1: 2 connections, each updating every row of 1 million rows
• Any isolation level
• On-disk:
  – Either: 1 connection blocks the other
  – Or: deadlock
• In-memory:
  – One connection will fail, saying "the row you are trying to update has been updated since this transaction started", EVEN if the other transaction never commits
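Because write-write conflicts surface as errors rather than blocking, every caller needs retry logic. Here is a minimal interop-T-SQL sketch; the table name is hypothetical, and the error numbers are the documented SQL Server 2014 retryable validation errors:

```sql
-- Hypothetical retry wrapper for an update against a memory-optimized table.
-- Retryable In-Memory OLTP errors in SQL Server 2014:
--   41301 commit dependency failure, 41302 update conflict,
--   41305 repeatable read validation, 41325 serializable validation
DECLARE @retries int = 5;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        UPDATE dbo.Customer_InMem WITH (SNAPSHOT)   -- isolation hint for interop
        SET    CreditLimit = CreditLimit * 1.10
        WHERE  CustomerId = 42;

        SET @retries = 0;          -- success: stop looping
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() IN (41301, 41302, 41305, 41325)
            SET @retries -= 1;     -- conflict: try again
        ELSE
            THROW;                 -- anything else is a real error
    END CATCH;
END;
-- A production version would re-raise (or report) after exhausting retries,
-- and usually add a short delay between attempts.
```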
Another slide on Concurrency (Because if I had presented it concurrently with the other one, you wouldn't have liked that)
• Scenario 2: 1 connection updates all rows, another reads all rows (in an explicit transaction)
• On-disk:
  – Either: 1 connection blocks the other
  – Or: deadlock
• In-memory:
  – Both queries execute immediately
  – In SNAPSHOT isolation, the reader will always succeed
  – In REPEATABLE READ or SERIALIZABLE, the reader:
    • Commits its transaction BEFORE the updater commits: success
    • Commits its transaction AFTER the updater commits: fails

The Difficulty of Data Integrity
• With on-disk structures, we used constraints for most issues (uniqueness, foreign keys, simple predicates)
• With in-memory code, we have to implement these in stored procedures
  – Uniqueness on > 1 column set suffers from timing (if N connections are inserting the same data... MVCC will let them)
  – Foreign keys can't reliably be done because:
    • In SNAPSHOT isolation, the referenced row may have been deleted while you check
    • In the higher levels, the transaction will fail if the row has been updated
  – Check-constraint-style work can be done in stored procedures for the most part

Problem: How to Implement Uniqueness on > 1 Column Set: INDEXED VIEW?
CREATE VIEW Customers.Customers$UniquenessEnforcement
WITH SCHEMABINDING
AS
SELECT customerId, emailAddress, customerNumber
FROM   customers.Customer;
GO
CREATE UNIQUE CLUSTERED INDEX emailAddress
ON Customers.Customers$UniquenessEnforcement (emailAddress);
GO
• Msg 10794, Level 16, State 12, Line 8
  The operation 'CREATE INDEX' is not supported with memory optimized tables.

Problem: How to Implement Uniqueness on > 1 Column Set: Multiple Tables?
[Diagram: the customer table split into subordinate tables, each carrying one of the candidate keys]
• Wow, that seems messy… And what about duplicate customerId values in the two subordinate tables?

Problem: How to Implement Uniqueness on > 1 Column Set: Simple code
• You can't…exactly.
But what if EVERY caller has to go through the following block:

DECLARE @CustomerId int;
SELECT @CustomerId = CustomerId
FROM   Customers.Customer
WHERE  EmailAddress = @EmailAddress;

IF @CustomerId IS NULL
    … do your insert

• This will stop MOST duplication, but not all. Two inserters can check at the same time, and with no blocks, app locks, or constraints even available, you may get duplicates.
• Remember the term: Optimistic Concurrency Control

When Should You Make Tables In-Memory: Microsoft's Advice
• From http://msdn.microsoft.com/en-us/library/dn133186.aspx

  Implementation scenario: High data insertion rate from multiple concurrent connections; primarily an append-only store; unable to keep up with the insert workload.
  Benefits of In-Memory OLTP: Eliminate contention. Reduce logging.

  Implementation scenario: Read performance and scale with periodic batch inserts and updates; high-performance read operations, especially when each server request has multiple read operations to perform; unable to meet scale-up requirements.
  Benefits of In-Memory OLTP: Eliminate contention when new data arrives. Lower latency data retrieval. Minimize code execution time.

  Implementation scenario: Intensive business logic processing in the database server; insert, update, and delete workload; intensive computation inside stored procedures; read and write contention.
  Benefits of In-Memory OLTP: Eliminate contention. Minimize code execution time for reduced latency and improved throughput.

  Implementation scenario: Low latency; require low-latency business transactions which typical database solutions cannot achieve.
  Benefits of In-Memory OLTP: Eliminate contention. Minimize code execution time. Low-latency code execution. Efficient data retrieval.

  Implementation scenario: Session state management; frequent inserts, updates, and point lookups; high-scale load from numerous stateless web servers.
  Benefits of In-Memory OLTP: Eliminate contention. Efficient data retrieval. Optional IO reduction or removal, when using non-durable tables.

When Should You Make Tables In-Memory: Louis's Advice
• More or less the same as Microsoft's really (duh!)
• Things to factor in:
  – High concurrency needs / low chance of collisions
  – Minimal uniqueness protection requirements
  – Minimal data integrity concerns (minimal key updates/deletes)
  – Limited searching of data (binary comparisons only)
  – Limited need for transaction isolation / short transactions
  – You are able to answer all "What if?" scenarios successfully
• Basically, the "hot" tables in a strict OLTP workload...
• NOT a way to "FIX" bad code… Not at all…

The Choices I made
Louis has improved his methods for estimating performance, but your mileage will still vary.
• Louis' tests are designed to reflect only certain usage conditions and user behavior, and several factors may affect your mileage significantly:
  – How and where you put your logs
  – Computer condition and maintenance
  – CPU variations
  – Programmer coding variations
  – Hard disk break-in
• Therefore, Louis' performance ratings are a minimally useful tool for comparing the performance of different strategies, but may not accurately predict the average performance you will get.
• I seriously suggest you test the heck out of the technologies yourself using my code, your code, and anyone else's code you can, to make sure you are getting the best performance possible.

Model Choices – Logical Model
[Data model diagram]

Model Choices – Physical Model
[Data model diagram]

Model Choices – Tables to Make In-Mem (First Try)
[Data model diagram]

Model Choices – Tables to Make In-Mem (Final)
[Data model diagram]

The Grand Illusion (So you think your life is complete confusion)
• Performance gains are not exactly what you may expect, even when they are massive
• In my examples (which you have seen), I discovered when loading 20000 rows (10 connections of 2000 each)
  – (Captured using Adam Machanic's http://www.datamanipulation.net/SQLQueryStress/ tool)
  A. On-disk tables with FK and INSTEAD OF trigger - 0.0472 seconds per row - total time 1:12
  B.
On-disk tables withOUT FK or INSTEAD OF trigger - 0.0271 seconds per row - total time 0:51
  C. In-memory tables using interop code - 0.0202 seconds per row - total time 0:44
  D. In-memory tables with native code - 0.0050 seconds per row - total time 0:31
  E. In-memory tables, native code, SCHEMA_ONLY - 0.0003 seconds per row - total time 0:30
  F. In-memory tables (except CustomerAddress), hybrid code - 0.0163 seconds per row - total time 0:42
• But should it be a lot better? Don't forget the overhead... (And SQLQueryStress has extra overhead for gathering stats)

Contact info
• Louis Davidson - [email protected]
• Website – http://drsql.org <-- Get slides here
• Twitter – http://twitter.com/drsql (@drsql)
• SQL Blog – http://sqlblog.com/blogs/louis_davidson
• Simple Talk Blog – What Counts for a DBA: http://www.simple-talk.com/community/blogs/drsql/default.aspx

Demo
As Much Code Review As We Have Time For!
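As a companion to the code review, the hybrid approach described earlier (fetch a SEQUENCE value in interpreted code, then call the native procedure) can be sketched as follows. This is an illustrative reconstruction, not the demo code: the sequence, wrapper, and native procedure names are all hypothetical.

```sql
-- Hypothetical hybrid pattern: a native procedure cannot use a SEQUENCE,
-- so an interpreted (interop) wrapper fetches the value and passes it in.
CREATE SEQUENCE Customers.CustomerNumber_Seq AS int START WITH 1;
GO

CREATE PROCEDURE Customers.Customer$Create
    @EmailAddress nvarchar(200)
AS
BEGIN
    -- Build a 10-character CustomerNumber, e.g. 'C000000042'
    DECLARE @CustomerNumber char(10) =
        'C' + RIGHT('000000000'
              + CAST(NEXT VALUE FOR Customers.CustomerNumber_Seq AS varchar(9)), 9);

    -- Hand off to the (hypothetical) natively compiled procedure
    -- for the hot-path insert
    EXEC Customers.Customer$Create_Native @CustomerNumber, @EmailAddress;
END;
GO
```

The design point is simply to keep the unsupported construct (the SEQUENCE) outside the native boundary while the performance-critical insert stays compiled.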