Instructor: Craig Duckett
Lecture 17: Thursday, May 25th, 2017
SQL Server Keys, Files, Functional Concepts and Architecture, Database Files

Today will be the last lecture in BIT275. Starting Tuesday, May 30th, class time will be dedicated to Assignment 3 Workshops. I will be supplying templates for building tables, inserting data, creating views, and creating stored procedures that you can use to construct your individual database projects. I will also be offering one-on-one help for those who feel they need it in order to complete their project. If you are comfortable with your work on Assignment 3, then the next two BIT275 class days (Tuesday, May 30th and Thursday, June 1st) are optional.

Both ASSIGNMENT 3 and the DATABASE PRESENTATION are due on Tuesday, June 6th
• Assignment 3 is due LECTURE 20, Tuesday, June 6th
• Database Presentation is due LECTURE 20, Tuesday, June 6th
• Final Exam is LECTURE 21, Thursday, June 8th

3 x 150 Points (450 Points Total)
• Assignment 1: GRADED!
• Assignment 2: GRADED!
• Assignment 3 (Stage 3): DUE LECTURE 20, Tuesday, June 6th
• Database Presentation: DUE LECTURE 20, Tuesday, June 6th

Tuesday/Thursday (LECTURE 16)
• Database Design for Mere Mortals: Chapters 12, 13

Today:
• SQL Server Keys
• SQL Server Database Files
• Transactions and Locks (separate PowerPoint)
• Triggers (separate PowerPoint)

FOR REFERENCE ONLY
• Functional Concepts and Architecture
• Database Files

SQL Server Primary and Foreign Keys: An Overview

Setting Up Your Primary Key (PK)
1. From the main window, select your database
2. Double-click on Tables to expand it
3. Select the table of your choice, right-click on it, and select Design
4. Find the Column Name that you want to make the Primary Key, right-click on it, and select Set Primary Key
5.
This creates the Primary Key and places a yellow key icon beside it in the Design table.

ERROR: Changing a Column from Allowed NULL to NOT Allowed NULL
If SQL Server refuses to save the change, go to Tools > Options > Designers > Table and Database Designers
• Uncheck the option Prevent saving changes that require table re-creation
• OK

Add AUTO INCREMENT to the Primary Key
1. In the Designer section, highlight the Primary Key row (by clicking on the yellow Primary Key icon)
2. In the Column Properties tab, scroll down to Identity Specification, and in the dropdown menu beside (Is Identity) change No to Yes
3. You can go with the default values of '1' for Increment and Seed, or you can change these to your specifications

Inserting Data into a Table (See the AI at Work)
1. Right-click on the table of your choice, and select Edit Top 200 Rows

Adding Foreign Keys (FK)
NOTE: Before adding FOREIGN KEYS, you must already have all your PRIMARY KEYS set and configured in the pertinent tables.
1. Open your database, and expand your Tables directory tree
2. Right-click on the table that is going to contain the Foreign Key(s), and select Design
3. Right-click on the blank white area and select Relationships
4. In the Foreign Key Relationships window, click Add
5. Click Tables And Columns Specification, then the ellipsis button […]
6. Select the Primary Key Table and the field that has been designated the Primary Key, then select the Foreign Key Table and the field that will be the associated Foreign Key, then OK
7. Save, and click Yes by the warning about saving
8. Repeat the process for the second Foreign Key field, this time pointing to the pertinent Primary Key table and field, and Foreign Key table and field, then save
9. Back in the main window, select your database, then right-click on Database Diagrams > New Database Diagram
10.
I highlight the tables I want to add, then click Add.
11. The Database Diagram comes up, showing the selected tables with their associated relationships using the Primary and Foreign Keys
12. At this point you have the option of saving the diagram (I usually save mine, because I can always delete them later as I build complexity into the database)
13. Back in the main window, you can expand all your tables and Keys folders to see the Primary Keys. To see the Foreign Keys on the linking table, you will have to expand the Columns folder.

SQL Server Database Files

SQL Server System Databases
• master
• model
• msdb
• tempdb

At a very high level, the SQL Server System Databases are simple and fairly small databases (msdb can become huge in some cases if we are not properly maintaining it), but they are also the most important databases within SQL Server. Why am I stressing 'the most important'? Because SQL Server stores the majority of its configuration information in these databases (besides storing some in the Registry and a few in configuration ini/xml files), and SQL Server needs them to be in a healthy and happy shape in order to remain in a running state. So what are the System Databases in SQL Server?
http://www.youtube.com/watch?v=Wjv1c1lgitY

MASTER
MSDN Information About Master
The master database stores all the system-level information for SQL Server. This includes: configuration settings, logon accounts, linked servers and endpoints, user database file locations, and properties. The master database is the most important database on SQL Server because, due to the nature of the data stored, SQL Server cannot operate without it. So it is a very good idea to back up this database after changing the SQL Server configuration, or after modifying, adding, or removing any databases.
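Since the paragraph above stresses backing up master after configuration changes, here is a minimal sketch of doing exactly that; the backup path is a made-up example, so point it at your own backup location:

```sql
-- Back up the master database after configuration changes.
-- 'C:\Backups\master.bak' is a hypothetical path.
BACKUP DATABASE master
TO DISK = 'C:\Backups\master.bak'
WITH INIT;

-- master also knows about every database on the instance:
SELECT name, database_id, create_date
FROM sys.databases;
```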
Examples:
Tables > System Tables
Views > SELECT * FROM sys.databases
Programmability > sp_attach_db

MSDB
MSDN Information About MSDB
The msdb database is used by SQL Server to store information on operations performed by SQL Server. This includes information for: the SQL Server Agent, Database Mail, the Service Broker, SSIS packages, log shipping, backup, and maintenance plan job parameters.
Examples:
Tables > System Tables > dbo.sysjobs
- SQL Server Agent
- SSIS packages (SQL Server Integration Services, for crunching/manipulating data)
Views > dbo.sysjobs_view
Programmability > Stored Procedures

TEMPDB
MSDN Information About TEMPDB
As the name implies, SQL Server uses the tempdb database for storing temporary data and data objects. The tempdb database is used when an operation requires a temporary table, stored procedure, or other database object. Intermediary data for large sort operations is also stored in tempdb, as well as temporary data for internal SQL Server operations. It is where instances of database queries are cached. Every time SQL Server is restarted, the tempdb system database is recreated, clearing any temporary data stored during the last SQL Server session. In cases where a high volume of users and operations hit SQL Server, the tempdb database can grow to use a significantly large amount of disk space. It is important to plan accordingly in these scenarios, since running out of disk space where tempdb is stored will have catastrophic effects on the operation of SQL Server. DBAs will often store tempdb on a completely different drive.

MODEL
MSDN Information About MODEL
SQL Server uses the model database for creating new databases. Simply put, it is a template. When the CREATE DATABASE statement is used, SQL Server copies the contents of the model database to the newly created database.
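This template behavior can be demonstrated with a short sketch (the table and database names here are invented for the demo):

```sql
-- Any object added to model is copied into databases created afterwards.
USE model;
CREATE TABLE dbo.Awesome (ID int);

-- A new database starts life as a copy of model...
CREATE DATABASE DemoDB;

-- ...so DemoDB now contains dbo.Awesome without us creating it there.
USE DemoDB;
SELECT * FROM dbo.Awesome;
```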
If there are any common database objects that would prove useful in all databases created by SQL Server, it is possible to add those objects to the model database. Then, when a database is created by the SQL Server instance, the user-defined objects will be copied to it along with the default objects. Since SQL Server recreates the tempdb database every time it is started, the model database is required in order for SQL Server to start.

Demo: Create an 'Awesome' table in model, then create an instance of a new database.

RESOURCE
MSDN Information About RESOURCE
The resource database is a read-only database used for storing all the system views and stored procedures. Logically, each SQL Server database appears to contain all these system objects; however, they are physically stored within the resource database. The resource database is read-only and does not include any user data. In previous versions of SQL Server, the system objects were stored in the master database. The motivation behind moving them to a separate database is to make updating SQL Server more efficient: improvements and fixes to the SQL Server system mostly affect the system objects, so keeping them in a separate database reduces the number of files that need to be replaced with an update.
LOCATION: C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\Binn
FYI: It is recommended that you don't move the resource database to a separate directory or drive, but leave it 'as is'.

SQL Server Books Online
MSDN Information About SQL SERVER BOOKS ONLINE
Accessing SQL Server Books Online
You can access SQL Server Books Online in the following ways:
From the Start menu: Click Start > All Programs > Microsoft SQL Server 2008 R2 > Documentation and Tutorials > SQL Server Books Online
From SQL Server Management Studio: On the Help menu, click How Do I, Search, Contents, Index, or Help Favorites.
From F1 or Help buttons in the user interface: For context-sensitive information, press F1 on your keyboard, or click Help in the user interface dialog box.
From the Dynamic Help window: The Dynamic Help window automatically displays links to Books Online topics related to the task you are doing. To launch Dynamic Help, click Dynamic Help on the Help menu in SQL Server Management Studio or Business Intelligence Development Studio.

SQL Server Tutorials
MSDN Information About SQL Server Tutorials
Accessing SQL Server Tutorials
From the Start menu: Click Start > All Programs > Microsoft SQL Server 2008 R2 > Documentation and Tutorials > SQL Server Tutorials
Tutorial: SQL Server Management Studio
Tutorial: Writing Transact-SQL Statements

Computed (Calculated) Columns
A computed column is a column whose values are calculated rather than inserted. When the column is defined, a computed column specification is provided. This definition includes the expression used to determine the value of the column in each row of the table. It is not possible to specify values for these columns in INSERT or UPDATE statements.

Adding a Computed Column
1. Select a database table that might benefit from having a calculated column
http://www.blackwasp.co.uk/SQLComputedColumns.aspx
2. Right-click on the table, select Design, and create a column and data type that will be used for the computed (calculated) column
3. While the column has focus, go down into Column Properties, select the Computed Column Specification, click in the (Formula) section, and type in your calculation
4.
Now when I examine the table after right-clicking and invoking Select Top 1000 Rows, I see the calculated column Total.
MSDN Information: Specify Computed Columns in a Table

Functional Concepts and Architecture
The Following Slides Have Been Included For Reference Only

SQL – Structured Query Language
Back in the day, most vendors began to adopt proprietary versions of SQL, although these had a lot of overlap for certain types of language elements, specifically the elements intended for DML.

DML – Data Manipulation Language
DML tended to be pretty much the same across different vendors, with just really minor variations. To make things cleaner across the industry, ANSI SQL was developed, and there have been two or three revisions of it over the years. Now, for the majority of its Data Manipulation Language, SQL Server 2008 fully complies with the ANSI SQL standard.

Data Manipulation basically means:
• Selecting data from a database (SELECT)
• Adding data to a database (INSERT)
• Removing data from a database (DELETE)
• Changing data in a database (UPDATE)

This doesn't give you any management capability, though. It doesn't let you create new databases, change or check performance, optimize data or databases, etc.

ANSI = American National Standards Institute
ISO = International Organization for Standardization

DDL – Data Definition Language
Because every DBMS works a little bit differently (MySQL, SQL Server, Oracle, DB2, etc.), the DDL really can't be standardized, so each vendor comes up with its own variant (or "dialect") of the SQL language.
CREATE | DROP | ALTER

Microsoft's superset of the SQL language, or variant, is called Transact-SQL (or T-SQL, for short), and part of it complies with the ANSI SQL standard for data manipulation language. Example:

SELECT ID, Name, Address
FROM Customers
WHERE state = 'Nevada';

DELETE FROM Customers
WHERE ID = 5;

It's when we move over to the DDL side of things that the variants or "dialects" between the different DBMSs really come into play. The way you use CREATE TABLE in MS SQL Server 2008 is a bit different than the way you'd CREATE TABLE in MySQL or PostgreSQL or DB2, etc.
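To illustrate the dialect differences just mentioned, here is the Customers table from the example above sketched in T-SQL, with the equivalent auto-numbering syntax in two other dialects shown in comments (the column list is invented for illustration):

```sql
-- T-SQL (SQL Server): IDENTITY provides the auto-numbered key
CREATE TABLE Customers (
    ID      int IDENTITY(1,1) PRIMARY KEY,
    Name    varchar(50)  NOT NULL,
    Address varchar(100) NULL,
    State   varchar(20)  NULL
);

-- In MySQL the same key would be written as:
--   ID INT AUTO_INCREMENT PRIMARY KEY
-- and in PostgreSQL as:
--   ID SERIAL PRIMARY KEY
```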
The recommended and the most common file name extension for primary data files is .mdf You can only have one primary data file for a database, as well as secondary data files. Some databases may not have any secondary data file, but its also possible to have multiple secondary data files for a single database. Secondary data files are usually denoted with the .ndf extension. Log files in SQL Server databases hold all the log information. This information can be later used to recover the database. The size of the log file is determined by the logging level you have set up on the database. There must be at least one log file for each database. But it is also possible to have more than one log file for a single database. The recommended file name extension for log files is .ldf. Locations of all the files in a database (primary, secondary and log files) are stored in the primary file of the database and in the master database. When the SQL Server Database Engine want to use the file location information of those files, it retrieves the data from the master database. Functional Concepts and Architecture Each table in a database can be broken into multiple components. Let’s take a brief look at each of them so we better understand what makes up a physical table within SQL Server. Functional Concepts and Architecture The first component is a partition. A partition is simply a way DBAs can sub-divide rows in a table (and the table’s indexes) into separate sections, generally for the purpose of horizontally spreading rows across more than one filegroup. This often makes it easier for DBAs to manage and scale very large tables. The topic of partitioning is outside the scope of this class, so we'll assume there is a single partition (which is the default option when creating a table). Only the Enterprise and Developer editions of SQL Server support partitioning. 
Partitions contain data rows stored in either the physical form of a heap (a table without a clustered index) or a B-Tree structure (a table with a clustered index). We'll talk about the heap, the B-Tree structure, and clustered and non-clustered indexes in a moment, but first we need to discuss how data is sub-divided into three different categories before being stored in allocation units, which are comprised of pages.

• In_Row_Data: This is the most common type of allocation unit, and is used to store data and index pages, except for Large Object (LOB) data. In other words, most of your row data (and related indexes) is stored in this type of allocation unit.
• Lob_Data: Used to store Large Object (LOB) data, such as text, ntext, xml, image, varchar(max), nvarchar(max), varbinary(max), and Common Language Runtime (CLR) user-defined data types.
• Row_Overflow_Data: Used to store data from variable-length columns (varchar, nvarchar, varbinary, and sql_variant) when a row exceeds the 8,060 bytes that can fit onto a single data page.

Allocation units themselves are made up of extents. In our example, each allocation unit has one extent; in the real world, each allocation unit would include many, many extents. An extent is a collection of eight contiguous 8K pages (for a total of 64K). A page is the most elemental unit of data storage used by SQL Server, and will be the focus of our next discussion.
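Before digging into pages themselves, note that a table's page and space usage can be checked with a built-in stored procedure (the table name is hypothetical):

```sql
-- Reports rows, reserved space, data size, index size, and unused space
-- for one table:
EXEC sp_spaceused 'dbo.Customers';

-- Or, with no argument, for the whole current database:
EXEC sp_spaceused;
```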
These 8K pages aren't dedicated to data only: the first 96 bytes are used to store information about the page type, free space, object ID, and the table that this page belongs to. This is the header. After the header information comes the data itself, in the form of data rows inserted one after another in a serial manner. This is the body. Following the page header, starting at byte 96 on the page, are the actual data rows. Each data row has a unique row number written within the page. Data rows in SQL Server cannot cross page boundaries. The maximum available space in a SQL Server page is 8,096 bytes (8,192 bytes minus the 96-byte header), but reserving 36 bytes for holding log information lowers the amount of data that can be held on the page to 8,060 bytes.
http://sqlblog.com/blogs/linchi_shea/archive/2008/03/03/is-the-8kb-page-obsolete-or-aging.aspx
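On a test system, the 96-byte header and row data just described can actually be inspected with the undocumented DBCC PAGE command (the database name and page numbers below are examples only; don't run this against production):

```sql
-- Send DBCC output to the client instead of the error log.
DBCC TRACEON (3604);

-- DBCC PAGE (database, file_id, page_id, print_option)
-- Print option 3 dumps the page header plus per-row detail,
-- including the row offset table at the end of the page.
DBCC PAGE ('SalesDB', 1, 1, 3);
```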
The number of rows that can be stored on a page depends on the size of the rows. For example, the smaller the rows, the more rows that can fit on a single data page. When a page becomes full, any new data will be inserted in another data page. In the real world, each row may be a different length, which means that the beginning and ending of each row can be anywhere within the page. If rows are packed in, one after another, and if each row can vary in size, how does SQL Server know when a row starts and when it ends? That is what row offsets are used for. Every row in a page has a row offset, which indicates how far the first byte of the row is from the start of the page. Row offset data is stored at the bottom of a page in an area reserved for this information. For example, when the first row is added to a page, the location of the first byte of that row is stored in offset row one. When row two is added, the location of the first byte of the second row is stored in offset row two, and so on. This way, SQL Server knows where each row begins and ends. Functional Concepts and Architecture Large Row Support Rows cannot span pages, however portions of the row may be moved off the row's page so that the row can actually be very large. The maximum amount of data and overhead that is contained in a single row on a page is 8,060 bytes (8 KB). However, this does not include the data stored in the Text/Image page type. This restriction is relaxed for tables that contain varchar, nvarchar, varbinary, or sql_variant columns. When the total row size of all fixed and variable columns in a table exceeds the 8,060 byte limitation, SQL Server dynamically moves one or more variable length columns to pages in the ROW_OVERFLOW_DATA allocation unit, starting with the column with the largest width. This is done whenever an insert or update operation increases the total size of the row beyond the 8060 byte limit. 
When a column is moved to a page in the ROW_OVERFLOW_DATA allocation unit, a 24-byte pointer to it is maintained on the original page in the IN_ROW_DATA allocation unit. If a subsequent operation reduces the row size, SQL Server dynamically moves the columns back to the original data page. We'll look at how this relates to BLOBs and FILESTREAM in a minute.

Extents
Extents are the basic unit in which space is managed. An extent is eight physically contiguous pages, or 64 KB. This means SQL Server databases have 16 extents per megabyte. To make its space allocation efficient, SQL Server does not allocate whole extents to tables with small amounts of data. SQL Server has two types of extents:
• Uniform extents are owned by a single object; all eight pages in the extent can only be used by the owning object.
• Mixed extents are shared by up to eight objects. Each of the eight pages in the extent can be owned by a different object.

A new table or index is generally allocated pages from mixed extents. When the table or index grows to the point that it has eight pages, it switches to uniform extents for subsequent allocations. If you create an index on an existing table that has enough rows to generate eight pages in the index, all allocations to the index are in uniform extents.

Databases are optimized for structured data that is relatively small – about 8KB per row. At that rate, you can put 130,000 packed-out rows into a database and still have a database under 1 gigabyte. Generally speaking, smaller databases are easier to back up, restore, and maintain, so keeping them small is good. Documents, however, tend to be orders of magnitude larger than the average database row size and can quickly bloat the database; a content database upwards of 100 gigabytes is by no means uncommon, and that size is comprised mostly of the physical files that reside inside the database.
When your content database size starts to expand to the point of unmanageability, one of the first questions that may come to mind is: why the heck are all those files in there in the first place? It makes perfect sense to store structured document metadata like the author, location, keywords, document type, and file size directly in the database, for querying purposes, but the physical file itself is just taking up room until someone goes to download it. So the option to move files outside of the database and back into the file system, where they have traditionally resided, seems very sensible.

What is a BLOB?
As much as you want your documents outside of the database, the folks who make databases really don't like them being in there either. Documents tend to represent unstructured data, which makes it difficult to use them in querying, defining relationships, indexing, or any other database operation that normally makes putting data into a database useful. As far as the database is concerned, a document is just a bunch of bits that it has to hold onto until someone asks for them back. Logically, storing a document in a database row looks something like this. Database performance is optimized for rows that are under 8KB in size. If the actual implementation looked like this, that size would easily be breached and performance would lag.

Fortunately, database developers came up with the concept of the Binary Large Object (BLOB). In essence, a BLOB is a pointer that stores the location of binary data. The pointer, which is relatively small, resides with the normal database record. The binary data, which is comparatively huge, resides in some dark, magical location in the database called BLOB storage. With the binary data offloaded to another location in the database, the database record becomes much smaller and more optimal for index and query performance.
If we actually need the binary data associated with the row, the database uses the pointer in the row to acquire the binary data from BLOB storage. The process of going to BLOB storage is completely transparent, so it appears as though the data is coming directly from the row itself. There is a slight performance cost for going to BLOB storage for raw data retrieval, but the query performance benefits tend to outweigh the retrieval cost.

What is Remote BLOB Storage?
BLOB storage is effectively abstracted from database users: you ask for the data, and the database goes out and gets it. This brings up a new question: why does the BLOB storage have to be in the database? In SQL Server 2005 it had to be in the database simply because that's how it was built. But in SQL Server 2008 and newer versions, you've got a few more options, because Microsoft implemented a FILESTREAM provider model for the BLOB storage mechanism that opens up the possibility of storing it elsewhere. Remote BLOB Storage is the term used to denote that a BLOB provider stores BLOB data outside of the database (i.e., remotely).

Where Can I Store Remote BLOB Data?
A marketing guy would eagerly tell you that you can store BLOBs anywhere you want, but the reality is that you can store them anywhere you want as long as there is a provider available capable of storing them there. If you're really ambitious you can build your own (not recommended), but the majority of us are just going to use what comes out of the box, or whatever is available from third-party providers. Out of the box you get two providers:
• Native SQL BLOB Provider (default): Stores BLOB data directly in the database, just like SQL Server 2005. This is the default BLOB provider whenever you set up a new instance of SQL Server.
• FILESTREAM BLOB Provider: This is the only Remote BLOB provider that ships with SQL Server, and it allows you to store BLOB data on any local hard drive of the SQL Server machine.
http://www.ceservices.com/adding-filestream-existing-table-database-sql-2008-r2
Free PDF Book: http://download.red-gate.com/ebooks/SQL/Art_of_SS_Filestream_Sebastian_and_Aelterman.pdf

Now that you have a basic understanding of how SQL Server stores data, it's time to talk about the heap and B-Tree structures that we talked about briefly a little earlier.

What is a Heap?
A heap is simply a table without a clustered index. (We will talk about clustered indexes a bit later, too.) When rows are added to a heap, they are not stored in any particular order on a data page, and data pages are not stored in any particular sequence within a database. In other words, rows are stored wherever there is room available. This means that the data pages that contain the rows of the heap may be scattered throughout a database, in no particular order. Since a table can't exist as a bunch of scattered pages, SQL Server provides a way to link them all together so that they act as a single table. This is done using what are called Index Allocation Map (IAM) pages. IAM pages manage the space allocated to heaps (among other tasks), and are what is used to connect the scattered pages (and their rows) into a table.

Let's say that a query is executed against a heap, and a table scan has to be performed to find the data to be returned. In order to find the rows in the table that need to be scanned, the SQL Server Database Engine first goes to an IAM page, which includes pointers to the various extents and pages that contain the rows of the table that belong to the heap. If a heap is large, numerous IAM pages and data pages have to be examined before all the rows are located and scanned.
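Whether a table is a heap can be confirmed in the catalog views: a heap shows up in sys.indexes with index_id 0 and type_desc 'HEAP'. A small sketch (the table name is hypothetical):

```sql
-- A table created without a primary key or clustered index is a heap.
CREATE TABLE dbo.HeapDemo (ID int, Name varchar(50));

-- index_id 0 / type_desc 'HEAP' marks the heap;
-- a clustered index would appear as index_id 1 / 'CLUSTERED'.
SELECT i.index_id, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('dbo.HeapDemo');
```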
If the data pages happen (by coincidence) to be physically contiguous, then such scans can be I/O efficient, because sequential reads can be used to read the data. But in many cases the various heap pages are scattered about, which requires less efficient random reads to scan through all the rows of the table, and that can hurt performance.

What is an Index?
In the previous section we talked about heaps, which are tables without a clustered index. In this section, we will talk about what an index is. Later, we will talk about the specific kinds of indexes that can be added to SQL Server tables. As I have already mentioned, an index is simply a way to help queries return data faster from tables. In SQL Server, all indexes physically take the form of what is called a B-Tree. The "B" in B-Tree refers to "balanced," so B-Tree is short for Balanced Tree. This is because a B-Tree index self-balances, which means that every time a tree branches (splits into two pages because the original page is full), about half of the data is left on the old page, and the other half is put on the new page. One of the key benefits of a Balanced Tree is that finding any particular row in a table requires about the same amount of SQL Server resources, because the index has the same depth (number of levels) throughout its structure.

Notice that B-Trees have several levels. They include:
Root Level: A B-Tree starts with a single index page called the root level. This is where a query begins to look for specific data within the B-Tree. In our example, the root level page contains only two values. Most root level pages have many values, each referring to a page in the intermediate level.
Intermediate Level: Most B-Trees have one or more intermediate levels. The number of intermediate levels depends on the amount of data stored in the table being indexed.
Our example has only one intermediate level, which includes two index pages.

Leaf Level: A B-Tree has a single leaf level, which may include many, many data pages, depending on the amount of data stored at the leaf level. The leaf level is where the data you are looking for is stored. For example, in the leaf level page that starts with "1", all the rows with a customer number from "1" through "2499" are located. On the leaf level page that begins with "2500", all the rows that range from "2500" to "4999" are located, and so on.

Functional Concepts and Architecture

Here's how a B-Tree index works. In a B-Tree index search, we always begin at the root level. For example, to find the rows that have a customer number of "4", the first step is to scan the rows at the root level. In our example, there are two rows that can be scanned: a "1" and a "5000". Each of these rows points to a different page in the intermediate level. The rows we are seeking have a customer number of "4", but as you can see, there is no "4" at the root level. But since this index is in numerical order, we know that "4" follows "1" but is less than "5000". So, to find the rows that have a value of "4", we must hop down (traverse) to the intermediate level to further narrow our search. As you can see in the example, the number "1" in the root level points to an index page in the intermediate level that also starts with the number "1".

Functional Concepts and Architecture

When we go to the intermediate level and look at the index page that begins with the number "1", notice that it only has two records: a "1" and a "2500". The number "4" is still nowhere to be found. Again, we know that the number "4" is greater than "1" but less than "2500", so we need to traverse to the leaf level and see if we can find it there.
Functional Concepts and Architecture

So the next step is to traverse the index to the leaf level page that begins with "1" and ends with "2499", as we know that the number "4" falls between these numbers. Once we are at the correct leaf level page, the rows in this page are examined until we find one or more records that have a customer number of "4".

While this seems like a long way around to find the rows that have a customer number of "4", it is actually very short. Instead of potentially having to scan through thousands and thousands of data pages looking for the rows to be returned, the query only had to scan through several different index pages until the rows that matched the WHERE clause of the query were found and returned. This is usually a much faster way to find the rows you want to return. Of course, this example was kept simple. In the real world, there may be more intermediate levels that need to be traversed, but using a B-Tree index to retrieve data is usually much faster than a scan of every row in a table.

Now that you have a basic understanding of how B-Tree indexes work, we are ready to talk about the specific kinds of indexes available to SQL Server. Keep in mind that although each of the following indexes works slightly differently, they are all still B-Tree indexes, and they all work on the principles described above.

Functional Concepts and Architecture

Types of Indexes

SQL Server includes many different types of B-Tree indexes, including:

• Clustered
• Non-clustered
• Indexed view
• XML
• Filtered (New with SQL Server 2008)
• Spatial (New with SQL Server 2008)
• Compressed Indexes (New with SQL Server 2008)

Today, I'll only be looking at Clustered and Non-Clustered B-Tree Indexes.

Functional Concepts and Architecture

Clustered

Technically speaking, a clustered index is a B-Tree data structure where all the rows in a table are stored at the leaf level of the index.
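To make the clustered index idea concrete, here is a sketch of how the Customer Number example used above might be set up in T-SQL (the table, column, and constraint names are my own invention, not from the slides):

```sql
-- Declaring a clustered primary key builds the table as a clustered
-- index on CustomerNumber, so the rows themselves live at the B-Tree
-- leaf level, logically ordered by that key.
CREATE TABLE dbo.Customer
(
    CustomerNumber INT         NOT NULL,
    FirstName      VARCHAR(50) NULL,
    LastName       VARCHAR(50) NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerNumber)
);

-- Alternatively, an existing heap can be converted with:
-- CREATE CLUSTERED INDEX IX_Customer_CustomerNumber
--     ON dbo.Customer (CustomerNumber);
```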
In other words, a clustered index not only includes the root level and intermediate level index pages found in B-Tree indexes, it also includes all the data pages that contain every row in the entire table. In addition, the rows in a clustered table are logically ordered by the key selected for the clustered index (unlike a heap, whose rows and pages are unordered). Because of this, a table can only have one clustered index.

Functional Concepts and Architecture

A clustered index orders the data stored in it logically. Often, DBAs confuse logical ordering with physical ordering. For example, what if a new row is added between two other rows, and there is no room on the existing page for the new row? When SQL Server faces this problem, here's what it does: once SQL Server has determined that more space is needed on a page, the page will be split. This means that about half of the rows stay on the existing page and about half of the rows are added to a new page. The physical order of the rows stays intact on the two pages, with the new row being placed in the proper order on either the original or the new page.

Think for a moment: if SQL Server required that all rows in a clustered index be physically contiguous, then as part of the page split the new page would have to be physically inserted directly after the original page, and all of the following data pages would have to be shifted down one page. If there were 100,000 pages below the page that was split, that would mean 100,000 pages would have to be physically moved, which would be hugely resource intensive and inefficient.

Instead, what SQL Server does when a page split occurs is place the new page where there is room for it, which in many cases won't be very close to the original page. In other words, the physical order of the pages isn't maintained. But what is maintained is the logical ordering of the pages.
For example, when a page is split between the original and the new page, the two pages are logically connected using the page chain, which is used to enforce the logical ordering of the data.

Functional Concepts and Architecture

Non-Clustered Index

Like a clustered index, a non-clustered index is a B-Tree data structure (although it is a separate data structure from the clustered index). The main difference between the two is that while a clustered index includes all of the data of a table at the leaf level, a non-clustered index does not. Instead, at the leaf level, a non-clustered index includes the key value and a bookmark (pointer) that tells SQL Server where to find the actual row in the clustered index. Because a non-clustered index doesn't normally store the actual data at its leaf level, it is possible to have many non-clustered indexes on a table. For example, a table can have one clustered index and up to 249 non-clustered indexes.

Let's take a brief look at how a non-clustered index works. In this example, we are going to expand upon our previous example of a Customer table that has three columns: Customer Number, First Name, and Last Name. As in our previous example, we are going to have a clustered index that uses the Customer Number as its key. In addition, we are going to add a non-clustered index on the Last Name column. Given this, let's say we want to return all the rows (and all of the rows' columns) in the Customer table that have a last name of "Victoria", and that we want to use the non-clustered index on the Last Name column to quickly find and return the data.

Functional Concepts and Architecture

The first step in retrieving all of the rows with a Last Name of "Victoria" is to use the non-clustered index on the Last Name column to identify any rows that match. To find the rows, SQL Server begins at the root level of the non-clustered index, looking for any matching rows.
The root page is scanned and two rows are found: an "A" and an "M". Each of these rows points to a different page in the intermediate level. The first letter of "Victoria" is "V", but there is no row in the root level page that begins with the letter "V". But since this index is in alphabetical order, we know that "V" follows "M". So, to find the letter "V", we must traverse to the intermediate level to further narrow our search. As you can see, the letter "M" points to an index page in the intermediate level that starts with the letter "M".

When we go to the intermediate level and look at the index page that begins with the letter "M", notice that it only has two records: an "M" and an "S". The letter "V" is still nowhere to be found. Since we can't find the letter "V" on this index page, we need to traverse to the leaf level and see if we can find it there.

Once we get to the leaf level page that begins with "S", we identify all the rows on that page that begin with "V". Each row that begins with "V" is then compared to the last name "Victoria". In our case, there is a single match: the last name "Victoria" is found in the Last Name non-clustered index key. Now we know that a row exists that meets our selection criteria; the problem is that the data for that row (Customer Number, First Name, and Last Name) isn't stored at the leaf level. Instead, the only data stored at the leaf level is the key "Victoria" and a bookmark that points to the corresponding record in the table (which, as we know, is a clustered index). In this example, the bookmark corresponds to the Customer Number "4". Now that SQL Server knows that a row does exist, and that it exists in the clustered index as Customer Number 4, the actual contents of the row (the three columns) must be retrieved.
So next, SQL Server must take the bookmark value, the number "4", start at the root level of the clustered index, and then traverse the intermediate and leaf levels until it locates the matching row. At this point, the data is returned back to the original query.

SQL Server Database Files

.MDF File
• "Main Data File"
• Database Tables
• Indexes
• Triggers
• Stored Procedures
• Etc.

.LDF File
• "Log Data File"
• Transaction Log file

.NDF File(s)
• "aNother Data File"
• Secondary Database file
• Can make Primary
• Can change path, size

By default, a database will only have an .MDF and an .LDF file. It's up to the database designer / developer / administrator whether or not it also has an .NDF file (or multiple .NDF files).

File placement strategies are all about performance and reliability. Typically, you are going to want to place your .LDF files on a separate disk/drive from everything else. Ideally you don't want them on the same disk as the operating system or the SQL Server program files, and you definitely don't want them on the same disks as your databases.

SQL Server Database Files

[Diagrams: two databases (DATABASE 1, DATABASE 2) and how their filegroups relate to FILESTREAM storage]

SQL Server Database Files

Where are the SQL Server database files actually located?

C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA

SQL Server Database Files

How do you create or alter the .MDF, .NDF, or .LDF files?

1. Open SQL Server Management Studio (SSMS), and log in
2. In the Object Explorer column, expand the Databases directory
3. Right-click on your database (e.g., Books) and select Properties from the popup
4.
In the Database Properties window, select Files (from the Select a page column).

SQL Server Database Files

5. In the Database Files window, under the Logical Name column, you can find the file names for your .mdf and .ldf files
6. Here you can change the Initial Size (MB) of the files, as well as set up the Autogrowth properties of both files
   - Click in the Initial Size (MB) window to change the size
   - Click on the ellipsis ("…") button under Autogrowth to change properties
7. Here too, you can scroll to the right to see the Path and File Names of your database files

SQL Server Database Files

8. To create an .NDF file for your database, click the Add button in the Database Properties window; this will create a new row in the Database files window

SQL Server Database Files

9. Give the new Database file a name (since I'm going to use it as a "secondary" database file, I appended '_secondary' to my name), then select <new filegroup> from the dropdown menu. Give your new Filegroup a name (e.g., zTest_Secondary).

As a worked example, I created a new Database file called zTest_secondary and a new Filegroup called SECONDARY. If I go look in the MSSQL\DATA directory on the server, I can see my files:

C:\Program Files\Microsoft SQL Server\MSSQL11_50.MSSQLSERVER\MSSQL\DATA

SQL Server Database Files

How to Move Database Files to a Different Physical Location

1. Open SQL Server Management Studio (SSMS), and log in
2. In the Object Explorer column, expand the Databases directory
3. Right-click on your database (e.g., zTest2013) and select Properties from the popup
4. From the left-side Select a page column menu, click on Files to see the Path of your current Data files.
Write it down or use Control + C to copy the path and save it for later use (in a text or Word file, whichever you prefer).

5. Back in the main window, detach the database by right-clicking on the database, then selecting Tasks, then Detach…

7. A new window will appear. Click on Drop Connections, then the OK button. You'll notice back in the main window that your database is no longer listed.

8. Navigate to the location of your MSSQL\DATA files that you wrote down or copied earlier.

C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA

9. Copy/Move the data files to your new physical location (I created a directory on my E: drive called zDatafiles). Note: I always recommend copying instead of moving; in case the file(s) get corrupted during the move, you will still have the original file(s) to go back to. Once the copy is successful, you can safely delete the original files.

10. Once the files are copied, return to the main window, right-click on the Databases folder, and choose Attach.

11. From the new window, click Add.

12. I kept my original files in the MSSQL\DATA directory on the C: drive AND copied them to a new zDatafiles directory on the E: drive (just to show how this is done). Now, I could point to ALL the files on the E: drive and attach them there, OR I could reattach my original MDF file on my C: drive and then attach my LDF file from my E: drive, thus separating my MDF/NDF files from my LDF transaction log file on two physically different drives. Navigate to the location of your original MDF file and select it.
13. Return to the main window, and your database should now be listed (if not, right-click on the Databases folder and select Refresh).

Adding a Table to a New Database

1. Right-click on your database, and select New Table…
2. In the Properties table, give your table a pertinent name, then start adding your column names, data types, etc.

SQL Server FILESTREAM

How do you create or alter the FILESTREAM files?

To enable and change FILESTREAM settings:

1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2008 R2, point to Configuration Tools, and then click SQL Server Configuration Manager.
2. In the list of services, right-click SQL Server Services, and then click Open.
3. In the SQL Server Configuration Manager snap-in, locate the instance of SQL Server on which you want to enable FILESTREAM.
4. Right-click the instance, and then click Properties.
5. In the SQL Server Properties dialog box, click the FILESTREAM tab.
6. Select the Enable FILESTREAM for Transact-SQL access check box.
7. If you want to read and write FILESTREAM data from Windows, click Enable FILESTREAM for file I/O streaming access. Enter the name of the Windows share in the Windows Share Name box.
8. If remote clients must access the FILESTREAM data that is stored on this share, select Allow remote clients to have streaming access to FILESTREAM data.
9. Click Apply.
10. In SQL Server Management Studio, click New Query to display the Query Editor.
11. In Query Editor, enter the following Transact-SQL code:

EXEC sp_configure filestream_access_level, 2
RECONFIGURE

12. Click Execute.
13. Restart the SQL Server service.

An Introduction to SQL Server FILESTREAM (Simple Talk)

SQL Server FILESTREAM

1.
At the top of the Object Explorer, right-click on your SQL Server and select Properties.

SQL BOOKS ONLINE: http://technet.microsoft.com/en-us/library/bb933993%28SQL.100%29.aspx

SQL Server FILESTREAM

2. On the left panel, click on the Advanced tab, then click on the drop-down list next to Filestream Access Level, select the Full access enabled option, then OK.

SQL Server FILESTREAM

1. Create a new database, type a database name
2. In the left-hand column, click Filegroups
3. Under the Filestream grid (the lower window), click the lowermost Add button
4. Add a name, check the Default checkbox
5. Select General from the left-hand column, add a new database file, and select Filestream Data from File Type; the Filegroup name will auto-fill using the filegroup you just created. DO NOT PRESS OK YET, but see the instructions on the next slide.

SQL Server FILESTREAM

6. Scroll over and set the Path to the drive/directory where the filestream files will be saved, then OK.

http://sqlbeyond.blogspot.com/2011/08/how-to-use-filestream.html

SQL Server FILESTREAM

7. In SQL Server 2008 and 2008 R2, SSMS does not support the FILESTREAM attribute in the CREATE TABLE designer. That means you cannot create a table that has a FILESTREAM column by using SSMS, so T-SQL is used to create a FILESTREAM-enabled table:

CREATE TABLE FSTable
(
    -- ROWGUID column is required
    [RowId] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
    [Id] int NOT NULL,
    -- This is the FILESTREAM column
    [Data] VARBINARY(MAX) FILESTREAM NULL
)

SQL Server FILESTREAM

8. A FILESTREAM table can be used and handled just like any other table. This means we can treat the FILESTREAM column as a normal varbinary column. One can select, insert, update, and delete the same way as with any other table.
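To illustrate that last point, here is a sketch of ordinary DML against the FSTable defined above (the sample values are made up for illustration):

```sql
-- The FILESTREAM column accepts a normal varbinary value; SQL Server
-- transparently writes the bytes out to the FILESTREAM container on disk.
INSERT INTO FSTable ([RowId], [Id], [Data])
VALUES (NEWID(), 1, CAST('Hello, FILESTREAM' AS VARBINARY(MAX)));

-- Reading it back works like any other column.
SELECT [Id],
       CAST([Data] AS VARCHAR(MAX)) AS DataText
FROM   FSTable
WHERE  [Id] = 1;
```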
For information on how to save and retrieve files to and from a filestream directory, see:

http://weblogs.asp.net/aghausman/archive/2009/03/16/saving-and-retrieving-file-using-filestream-sql-server-2008.aspx

http://www.mssqltips.com/sqlservertip/1489/using-filestream-to-store-blobs-in-the-ntfs-file-system-in-sql-server-2008/
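One detail worth knowing before following those links: each row's FILESTREAM data is backed by a file on the NTFS volume, and the PathName() method of a FILESTREAM column returns the logical UNC path that client code (together with a transaction context) hands to the Win32 streaming APIs:

```sql
-- PathName() is only valid on a FILESTREAM varbinary(max) column;
-- it returns the path used for Win32 streaming access to each row's file.
SELECT [Id],
       [Data].PathName() AS FileStreamPath
FROM   FSTable;
```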