Slide 1: In this demonstration, we are going to discuss Data Definition Language (DDL) and indexing. Before doing this, I would like to clear up an issue from the previous DML demo. Consider the following statement:

Alter table OrderDetailsCopy -- this yields an error
Drop column TotalSale

When I tried to drop the column, I got an error because the column has a default constraint on it, so the column CANNOT be dropped. Dropping the whole table, on the other hand, caused NO PROBLEM:

drop table OrderDetailsCopy

If you really want to drop the column, you need to drop the constraint first. In that demo I used DEFAULT 0 when adding the TotalSale column, which added a default constraint to it:

ADD TotalSale Money DEFAULT 0

We can find the system-defined name of that constraint, DROP the constraint first, and then drop the column.

Now I am going to discuss DDL and indexing in this video, so you will learn what you need to know for this class. I am going to discuss indexing in more depth in a separate, optional video.

Slide 2: Now I am reviewing some of the DDL statements used in the textbook. CREATE and ALTER are commands; DATABASE, TABLE, INDEX, SEQUENCE, FUNCTION, PROCEDURE, TRIGGER, and VIEW are objects.

Slide 3: More DDL statements. DROP statements are the most dangerous ones, assuming you have the security permissions to run them. Once you DROP an object, it is GONE unless you have some sort of backup to bring it back. You drop not only the data but also the entire structure. DDL statements need to be used conservatively, and you should always make sure that you have backups.

Slide 4: The next two slides just review what I just said. Sometimes for CREATE DATABASE we just use the simplest form, like this:

CREATE DATABASE New_AP;

This creates a new database called New_AP, which is based on the defaults of the model database. We will see that later. I also want to point out, in “Attach an existing database file”, that it is important to know the location of your files.
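For comparison with the one-line CREATE DATABASE above, here is a sketch that names the file locations explicitly and an alternative that attaches existing files. The drive letters, folders, logical file names, and sizes are all hypothetical. The folders must already exist, and in practice the locations are chosen by your DBA.

```sql
-- Sketch only: paths, logical names, and sizes are hypothetical.
-- Data file and log file are placed on different drives.
CREATE DATABASE New_AP
ON PRIMARY
(
    NAME = New_AP_Data,                        -- logical name of the data file
    FILENAME = 'D:\SQLData\New_AP_Data.mdf',   -- physical data file
    SIZE = 100MB,
    FILEGROWTH = 10MB
)
LOG ON
(
    NAME = New_AP_Log,                         -- logical name of the log file
    FILENAME = 'E:\SQLLogs\New_AP_Log.ldf',    -- log file on a separate drive
    SIZE = 25MB,
    FILEGROWTH = 10MB
);

-- Alternative (not in addition): attach files that already exist on disk,
-- for example after they were detached elsewhere.
-- CREATE DATABASE New_AP
-- ON (FILENAME = 'D:\SQLData\New_AP_Data.mdf'),
--    (FILENAME = 'E:\SQLLogs\New_AP_Log.ldf')
-- FOR ATTACH;
```

The attach form only needs the file paths, which is why knowing where your files live matters.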
You can create a database in a specific file location, and there are some performance benefits to doing this as well. Every database always has these two kinds of files: the “Master Data File”, more formally the primary data file (.mdf), and the “Log Data File” (.ldf). Sometimes people put them on different hard drives (boxes) for performance reasons. This reduces contention between reading the .mdf and writing the .ldf on a single “box”. (Note: contention means conflicts over access to a shared resource.) Once again, this is the job of the administrator.

Slide 5: This is the basic syntax of the CREATE TABLE statement, followed by the common column attributes you can use. We see NULL and NOT NULL. NOT NULL means that the column is required and you are not allowed to put a NULL value in it. We have seen PRIMARY KEY, and we can also put UNIQUE constraints on columns. Like a PRIMARY KEY, a UNIQUE constraint is enforced with an index. It suits any column whose values should be unique, such as a social security number, a login, or a user name. Such a column is not the primary key of the table, but the UNIQUE constraint keeps its values unique so that no duplicate value can get into the column. An IDENTITY column generates a value for you. We have also seen DEFAULT. SPARSE will be covered later.

Slide 6: Here is a CREATE TABLE statement without column attributes. It contains only the basic information you need to create a table: the name of the table, the names of the columns, and their data types. You can also CREATE TABLE with all the column attributes. However, I prefer to use the barebones CREATE TABLE statement and add the column attributes later. I will show you in the demo. Here you can define the PRIMARY KEY as an IDENTITY column, specify NULL or NOT NULL, and create a DEFAULT constraint.
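As a sketch of the “all attributes inline” form from Slides 5 and 6, the statement below declares each attribute in the column definitions. The AppUsers table and its columns are hypothetical and are not taken from the slides.

```sql
-- Hypothetical table; names and sizes are for illustration only.
CREATE TABLE AppUsers
(
    UserID      INT          PRIMARY KEY IDENTITY,  -- clustered PK by default; values generated for you
    LoginName   VARCHAR(50)  NOT NULL UNIQUE,       -- required, and no duplicate logins allowed
    Email       VARCHAR(100) NULL,                  -- optional column
    IsActive    BIT          NOT NULL DEFAULT 1,    -- DEFAULT supplies 1 when no value is given
    MiddleName  VARCHAR(50)  SPARSE NULL            -- mostly NULL, so store NULLs at no cost
);
```

A SPARSE column must allow NULLs, which is why MiddleName is declared `SPARSE NULL`.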
The SPARSE example here means the following. If we know a column will store a lot of NULLs, we use SPARSE in the column definition. The system then stores only the values that are actually present and does not spend space on the NULLs: it knows they are NULLs and doesn't have to store them all. So SPARSE can save a lot of space when a column is rarely used. The trade-off is that non-NULL values in a sparse column take slightly more space, so SPARSE pays off only when most values are NULL.

Slide 7: Here is the basic syntax of the CREATE INDEX statement. By default, CREATE INDEX creates a nonclustered index. Recall that the clustered index determines how the data is actually stored in the table, like a main index, so a table can have only one. But you can have multiple nonclustered indexes. I like to think of a nonclustered index as the index at the back of a book, and you can have more than one of them. A nonclustered index is a specialized index that speeds up searches on a specific column that is often used in queries (here, VendorID). For example:

CREATE INDEX IX_VendorID ON Invoices (VendorID);

If VendorID is not already a primary key, it makes sense to create a nonclustered index on it. To create a clustered index instead:

CREATE CLUSTERED INDEX IX_VendorID ON Invoices (VendorID); -- Be aware: this fails if the table already has a clustered index (for example, the one on its primary key)

You can create a nonclustered primary key in the CREATE TABLE statement (the default for a primary key is CLUSTERED):

CREATE TABLE Invoices (InvoiceID INT PRIMARY KEY NONCLUSTERED, …

The primary key still enforces uniqueness, and foreign keys can still reference it to maintain referential integrity between tables. However, its index is not CLUSTERED; in other words, the rows are not stored in primary key order.

Here is an example of creating a nonclustered index on two columns:

CREATE INDEX IX_Invoices ON Invoices (InvoiceDate DESC, InvoiceTotal);
-- Create an index on a two-column composite key, or on any two columns that are queried together extensively
-- The order of these two columns is very important, so choose it carefully.
The index will be organized by the first column FIRST, just like a sort, and then by the second.

NOTE: This is very important to know!! SQL Server automatically creates a clustered index for a table’s primary key, unless you specify NONCLUSTERED or the table already has a clustered index.

Interview questions: How many clustered indexes can there be on a table? One, and ONLY ONE. How many nonclustered indexes can there be on a table? It depends on the version of SQL Server: up to 999 in SQL Server 2008 and later (249 in SQL Server 2005). But you should not build hundreds of indexes on a table, because that hurts performance. SQL Server has to consider all of them to determine whether they are useful for a given query. Every index also has to be updated whenever rows are inserted, updated, or deleted. The more indexes you have, the more time and resources are spent on them. You need to create indexes intelligently and manage them properly. I will talk more about that too.

Slide 8: Column-level constraints restrict the information in the domain of the single column itself. (See the text in the video.)

Slide 9: Table-level constraints (as well as column-level constraints): PRIMARY KEY, UNIQUE, CHECK, and (FOREIGN KEY) REFERENCES. (See the text in the video.) NULL is not here because NULL/NOT NULL can be specified only at the column level, not as a table-level constraint.

Slide 10: In the optional video, I use a lot of graphics to show how indexing works. I really recommend that you watch that video.

DEMO:
/* Prog 140 Module 3: More on DDL */
We are going to use a small database, TestDB. In the right-click pop-up menu, there is a Delete option but no Drop. In SQL, DELETE means that you remove the data but keep the structure. This menu option actually removes the whole object, so in my opinion it should say DROP.

-- ----------------------------------------------------------------------------
-- More DDL (Data Definition Language): Create, Alter, Drop; and INDEXES/Indexing
-- We will cover the basics here and leave all the options and sophistication to an advanced
-- administration course.
-- However, to be effective SQL programmers, as we have already seen, we need
-- a solid grounding in the DDL statements that we will need to use and understand.
-- We will also talk about indexing and its effects on performance because, as discussed, your job is not just
-- to know how to write a query that returns correct results; you also need to be able to write
-- effective and efficient queries.
-- ----------------------------------------------------------------------------
-- In Chapter 11 we will SKIP snippets (user interface functionality) and sequences (a really cool feature: instead of IDENTITY, a sequence can produce special number sequences for your company database)

Use Master

-- ----------------------------------------------------------------------------
-- Creating SQL objects
-- ----------------------------------------------------------------------------
All new databases are based on the model database. Your DBA can put the tables, views, stored procedures, and database size settings specific to your company in the model database, and every new database starts as a copy of it. You can decide where to store a database, but you need to check with your DBA; usually, creating and dropping databases is the DBA’s domain. You can create your own personal databases with certain algorithms or stored procedures, which can later be used in the company databases.

Create database TestDB
Drop database TestDB

-- The .mdf (master data file/primary data file/DBName_Data) and .ldf (log data file/transaction log file/DBName_log) files are stored HERE by default:
-- C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA
-- In SQL Server, there must be a log file.
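You do not have to assume the default path above. SQL Server can report where a database’s files actually are. This is a small sketch that assumes TestDB exists at that point (that is, run it after the CREATE and before the DROP). It uses the sys.master_files catalog view.

```sql
-- Sketch: list the physical files that belong to TestDB.
-- Returns no rows if TestDB does not exist (DB_ID returns NULL).
SELECT name          AS logical_name,
       type_desc,                    -- ROWS = data file (.mdf), LOG = log file (.ldf)
       physical_name                 -- full path of the file on disk
FROM sys.master_files
WHERE database_id = DB_ID(N'TestDB');
```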
-- If any problem occurs in the system, SQL Server can recover from it robustly because of the log file. By default, the .mdf and .ldf files are stored together. But as I mentioned, your DBA may very likely place the log file on a different box for performance and safety reasons. If we lose the data file, we may be able to recover the database using the log file together with our backups.

-- Remember our first assignment and the creation of the Testing database?
Use Testing
select * from Employee;

-- drop table JobsEmps; drop table Jobs;
Create Table Jobs (JobID int IDENTITY, JobDescription varchar(100) not null);
-- Putting a semicolon at the end of a statement is the ANSI standard! You just want to get into the habit.
Create Table JobsEmps (JobEmpID int IDENTITY, JobID int, EmpID int DEFAULT 0, TotalHours float null);

-- Adding constraints
Alter table Jobs ADD CONSTRAINT pk_JobID Primary Key Clustered (JobID);
Alter table JobsEmps ADD CONSTRAINT pk_JobEmpID Primary Key NonClustered (JobEmpID);
Alter table JobsEmps ADD CHECK (TotalHours >= 0 and TotalHours <= 1000);
Alter table JobsEmps ADD CONSTRAINT DF_stuff DEFAULT 0 for TotalHours;
Alter table JobsEmps ADD CONSTRAINT FK_EmpID Foreign Key (EmpID) REFERENCES Employee(EmpID);
Alter table JobsEmps ADD CONSTRAINT FK_JobID Foreign Key (JobID) REFERENCES Jobs(JobID);

-- Jobs to JobsEmps: one-to-many relationship; Employee to JobsEmps: one-to-many relationship.
-- Both Jobs and Employee are parent tables; JobsEmps is a child table.
-- JobsEmps has two foreign keys: JobID and EmpID. JobEmpID is a surrogate key to facilitate easy searching.

-- Add indexes
-- The first index has two columns as a composite key (JobID and EmpID), because people are more likely to search on these than on the surrogate key JobEmpID.
Create CLUSTERED Index IX_JobIDEmpID on JobsEmps (JobID, EmpID);
-- The Employee table does have a primary key, but I am going to create an index on last name because I assume last name is going to be searched a lot.
Create NonClustered Index IX_Lastname on Employee(Lastname);

-- Why can't I do this?
select count(*) from Jobs
drop table Jobs;
drop table JobsEmps;
-- Because of the foreign key constraint. At the beginning of the video, we talked about dropping the blocking constraint first; here that would mean dropping the FK constraint first, then the table. But there is an easier way here: drop the child table first, then the parent table.

-- But I can do this:
drop table JobsEmps;
drop table Jobs;
-- JobsEmps is a child table; Jobs is a parent table.
-- Tables must be dropped "in order", with all child tables dropped first.

-- We do not index low-cardinality fields.
-- Cardinality is the number of unique values for a field within the table.
-- High cardinality = many unique values (like SSN, login ID)
-- Low cardinality = few unique values (like gender, marital status)
-- If you reference a field frequently in the WHERE clause, then that's a good field to index on.

If a field is queried frequently, it is a good candidate for indexing. But how do you know? Use two tools from the Tools pull-down menu: Database Engine Tuning Advisor and SQL Server Profiler. I encourage you to take a look at the Database Engine Tuning Advisor when you have time. It lets you submit a script (a workload) of the queries that you or your users run against the database on a regular basis. The Database Engine Tuning Advisor then recommends the indexes you should have, and points out indexes that are not necessary and should be removed to improve performance. The tool gives you advice; it does not change anything unless you explicitly tell it to apply its recommendations. Normally you review the advice and then create or remove indexes yourself. You should do this regularly. If you are not allowed to do this, ask your DBA. SQL Server Profiler runs in the background.
It captures the queries that are running and saves them as a trace (a workload) that you can provide to the Database Engine Tuning Advisor. Usually a company will let SQL Server Profiler run for 2-3 days during a busy or typical period. It then submits the workload to the Database Engine Tuning Advisor on a monthly or quarterly basis.

-- More on indexes!
/******************************************************************/
/* Creating indexes */
/******************************************************************/
Use AdventureWorks2012

-- Finding out about indexes in this database using system stored procedures
-- HumanResources is the schema name. A schema is an organizing principle that logically groups related tables and views, which I have found very useful on the job for large databases.
exec sp_help [HumanResources.Employee]; -- the output has a huge amount of info about the table
exec sp_helpindex [HumanResources.Employee]; -- info ONLY about indexes. Preferred way in SQL Server.
exec sp_helpindex 'HumanResources.Employee'; -- here we need delimiters. Used in some systems.
select * from HumanResources.Employee;
-- Looking at the data, would MaritalStatus or Gender be good columns to index? NO.
-- How about HireDate? ModifiedDate? YES! Dates are usually of high interest to companies, alone or in combination with other fields, and they usually have high cardinality!

Create nonclustered index AK_Employee_JobTitle on HumanResources.Employee (JobTitle ASC);
-- I don't know what this AK_ prefix means; I usually start an index name with IX_. These are just naming conventions. You don't have to follow them, but you do need to keep your naming consistent.
drop index AK_Employee_JobTitle on HumanResources.Employee; -- here we don't need delimiters (and can't have them!) - go figure.

-- Back to Northwind!
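You can also check the cardinality judgments above, rather than eyeballing the data. Before switching databases, count the distinct values in each candidate column and compare them with the row count. This is a minimal sketch; the column list is simply the candidates discussed above, and no particular result is assumed.

```sql
USE AdventureWorks2012;

-- Distinct values per candidate column versus total rows:
-- close to total_rows = high cardinality (selective, good to index);
-- a handful of values = low cardinality (poor index candidate).
SELECT COUNT(*)                      AS total_rows,
       COUNT(DISTINCT MaritalStatus) AS distinct_marital_status,
       COUNT(DISTINCT Gender)        AS distinct_gender,
       COUNT(DISTINCT JobTitle)      AS distinct_job_title,
       COUNT(DISTINCT HireDate)      AS distinct_hire_date,
       COUNT(DISTINCT ModifiedDate)  AS distinct_modified_date
FROM HumanResources.Employee;
```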
Use Northwind
exec sp_help [Orders];
exec sp_helpindex [Orders];
exec sp_helpindex 'Orders'; -- here we need delimiters

-- Composite key indexes:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_EmpID_OrderDate
ON Orders (CustomerID ASC, EmployeeID ASC, OrderDate DESC)
-- Too many columns in an index is also a performance issue, because it's like storing a whole new table. So keep the index short and sweet whenever possible.

-- Covering index: an index that includes ALL the columns a specific report needs, which really speeds up that particular report. Use it ONLY for really critical reports; your users will tell you which ones.
-- An alternative is to include columns in the index with INCLUDE. This can be very useful for reporting!
-- (Included columns aren't counted by the database engine when it calculates
-- the number of index key columns or the index key size):
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_EmpID
ON Orders (CustomerID ASC, EmployeeID ASC)
INCLUDE (OrderDate)
-- No more than 7-10 columns.
-- Using this, you can create a "covering" index for the columns in a key report.
-- CAREFUL not to create too many indexes!! That can start to hurt performance as well.
DROP INDEX Orders.IX_Orders_CustomerID_EmpID_OrderDate
DROP INDEX Orders.IX_Orders_CustomerID_EmpID

/* Free space in indexes: affected by the FILLFACTOR and PAD_INDEX options */
-- Important for my optional video on indexing
-- Your indexes are stored as balanced trees (B-trees) of pages. When you create an index, you can specify that its pages are not completely filled, to allow room for growth.
-- Note: DROP_EXISTING = ON requires that the index already exist. Because it was dropped just above, re-create it first (or leave out DROP_EXISTING) before running this statement.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_EmpID_OrderDate
ON Orders (CustomerID ASC, EmployeeID ASC, OrderDate DESC)
WITH (FILLFACTOR = 65, PAD_INDEX = ON, DROP_EXISTING = ON)
-- Here I say that I want a fill factor of 65%. The leaf pages of the index are filled to 65% when it is built, leaving room for the inserts that happen in this Orders table.
-- If the fill factor is high (95%, for example), pages have to be split to accommodate growth, which reorganizes the index and is very resource-intensive. So dynamic tables need a relatively low fill factor, while static tables should have a high fill factor for efficiency.
-- PAD_INDEX is for VERY dynamic tables. With PAD_INDEX = ON, the intermediate-level (non-leaf) pages are filled according to the fill factor as well as the leaf pages. With OFF, only the leaf pages use the fill factor.
-- DROP_EXISTING drops the existing index and re-creates it in a single operation. This is like reorganizing your closet to make more room, so that the index becomes more efficient. (For more details, see the optional video.)

/* Getting info about indexes: */
exec sp_help [tablename]
exec sp_helpindex [tablename]
exec sp_help Orders;
exec sp_helpindex Orders;

-- Using catalog views:
select * from sys.indexes;
-- Catalog views return metadata (data about the database itself). Views such as sys.indexes describe the current database; server-wide metadata is kept in the master database.
-- Other system databases: model (the template for new databases), tempdb (stores temporary tables), and msdb (stores SQL Server jobs and alerts). When you query an object in the sys schema, you are querying this system metadata rather than your own tables.
select * from sys.index_columns;
select * from sys.stats_columns;
select * from sys.indexes where name = 'IX_Orders_CustomerID_EmpID_OrderDate';

-- The third video is optional but highly recommended. You will not be held responsible for its contents.
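The same catalog views can also solve the problem from the start of this demonstration. The following is a minimal sketch that assumes OrderDetailsCopy from the earlier DML demo still exists in the dbo schema, with its TotalSale column and system-named default. It looks up the constraint name in sys.default_constraints, drops the constraint with dynamic SQL (the name is only known at run time), and then drops the column. If other objects, such as an index or a CHECK constraint, also reference TotalSale, they would have to be dropped as well.

```sql
-- Sketch: drop the TotalSale column after removing its default constraint.
DECLARE @constraint sysname, @sql nvarchar(max);

-- Each default constraint records its table (parent_object_id)
-- and column (parent_column_id).
SELECT @constraint = dc.name
FROM sys.default_constraints AS dc
JOIN sys.columns AS c
  ON c.object_id = dc.parent_object_id
 AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID(N'dbo.OrderDetailsCopy')
  AND c.name = N'TotalSale';

IF @constraint IS NOT NULL
BEGIN
    -- QUOTENAME guards against unusual characters in the generated name.
    SET @sql = N'ALTER TABLE dbo.OrderDetailsCopy DROP CONSTRAINT '
             + QUOTENAME(@constraint) + N';';
    EXEC sys.sp_executesql @sql;
END;

-- With the default constraint gone, the column can be dropped.
ALTER TABLE dbo.OrderDetailsCopy DROP COLUMN TotalSale;
```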
