Transcript
Slide 1:
In this demonstration, we are going to discuss Data Definition Language (DDL) and Indexing.
Before doing this, I would like to clear up an issue from the previous DML demo. The following statement
yields an error:
Alter table OrderDetailsCopy -- this yields an error
Drop column TotalSale
When I tried to drop the column, I got an error because the column has a default constraint on it. The
column CANNOT be dropped while that constraint exists.
I had NO PROBLEM dropping the whole table, though:
drop table OrderDetailsCopy
If you really want to drop only the column, you need to drop the constraint first. In that demo, the
DEFAULT 0 in the column definition is what added the constraint to the TotalSale column:
ADD TotalSale Money DEFAULT 0
We can find the system-defined name of the constraint, DROP the constraint first, and then drop the
column.
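A minimal sketch of that sequence, assuming the table lives in the dbo schema and that the catalog view sys.default_constraints is available (SQL Server 2005 and later). The variable names are illustrative only and do not come from the demo.

```sql
-- Look up the system-generated name of the default constraint on TotalSale.
DECLARE @ConstraintName sysname, @Sql nvarchar(400);

SELECT @ConstraintName = dc.name
FROM sys.default_constraints AS dc
JOIN sys.columns AS c
    ON c.object_id = dc.parent_object_id
   AND c.column_id = dc.parent_column_id
WHERE dc.parent_object_id = OBJECT_ID(N'dbo.OrderDetailsCopy')
  AND c.name = N'TotalSale';

-- Drop the constraint first. Dynamic SQL is used because the name is not known in advance.
IF @ConstraintName IS NOT NULL
BEGIN
    SET @Sql = N'ALTER TABLE dbo.OrderDetailsCopy DROP CONSTRAINT '
             + QUOTENAME(@ConstraintName) + N';';
    EXEC sp_executesql @Sql;
END;

-- With the constraint gone, the column can be dropped.
ALTER TABLE dbo.OrderDetailsCopy DROP COLUMN TotalSale;
```

Naming the constraint yourself when you add it (ADD CONSTRAINT ... DEFAULT ... FOR ...) avoids the lookup entirely.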
Now I am going to discuss DDL and indexing, covering what you need to know for this class. I will
discuss indexing in more depth in a separate, optional video.
Slide 2:
Now I am reviewing some of the DDL statements used in the textbook. CREATE and ALTER are
commands. DATABASE, TABLE, INDEX, SEQUENCE, FUNCTION, PROCEDURE, TRIGGER, and VIEW are the
objects they act on.
Slide 3:
More DDL statements. DROP statements are the most dangerous ones, assuming you have the
security permissions to run them. Once you DROP the objects, they are GONE unless you have some
sort of backup to bring them back. You drop not only the data, but also the entire structure.
DDL statements need to be used conservatively. Always make sure that you have backups.
Slide 4:
The next two slides just review what I said. Sometimes for CREATE DATABASE we
use the simplest form, like this:
CREATE DATABASE New_AP;
This creates a new database called New_AP, which takes its defaults from the model
database. We will see that later.
I also want to show you that, in "Attach an existing database file", it is important to know the location
of your files. You can create a database in a specific file location, and there are some performance
benefits to doing so. Two files are always associated with a database. One of them is the primary data
file (.mdf), the other is the transaction log file (.ldf). Sometimes people put them on different hard
drives (boxes) for performance reasons, so that there is less contention between reading the mdf and
writing the ldf on a single "box" (Note: contention means conflict over access to a
shared resource). Once again, this is the job of the administrator.
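As a sketch of what creating a database in a specific file location can look like: the drive letters, folders, and logical file names below are hypothetical, and the folders would have to exist before the statement is run.

```sql
-- Data file and log file placed on different drives (paths are assumptions).
CREATE DATABASE New_AP
ON PRIMARY
    (NAME = New_AP_Data,
     FILENAME = 'D:\SQLData\New_AP.mdf')
LOG ON
    (NAME = New_AP_Log,
     FILENAME = 'E:\SQLLogs\New_AP_log.ldf');
```

Any option not listed, such as the initial size or growth increment, is still taken from the defaults.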
Slide 5:
This is the basic syntax of the CREATE TABLE statement, followed by the common column attributes
you can use. We see NULL and NOT NULL. NOT NULL means that the column is required and you
are not allowed to put a NULL value in it. We have seen PRIMARY KEY, and we can also put UNIQUE
constraints on columns. As with the PRIMARY KEY, a UNIQUE constraint is backed by an index. It suits a
column such as a social security number, a log-in, or a user name. Such a column is not the PRIMARY KEY
of the table, but the UNIQUE constraint keeps its values unique, so no duplicate value can get into that
column. An IDENTITY column generates a value for you. We have also seen DEFAULT. SPARSE will be
covered later.
Slide 6:
Here is a CREATE TABLE statement without column attributes. It contains only the basic
information you need to create a table: the name of the table, the names of the columns, and
their datatypes. You can also CREATE TABLE with all the column attributes, but I prefer to use the
bare-bones CREATE TABLE statement and add the column attributes later. I will show you in the demo.
Here you can define the PRIMARY KEY IDENTITY column, NULL or NOT NULL, and a DEFAULT constraint.
The SPARSE example here means that if we know a lot of NULLs will be stored in
the column, we use SPARSE in the column definition so that the system stores only what is
actually in the column and does not spend space on the NULL values (it knows they are NULLs, so it does
not have to store them all). SPARSE can save a lot of space when the column is rarely filled in.
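The attributes discussed on the last two slides can be combined in one statement. This is only a sketch: the table and column names are made up for illustration and are not from the textbook.

```sql
CREATE TABLE VendorContacts
(ContactID   INT         NOT NULL IDENTITY PRIMARY KEY, -- generated key, clustered by default
 LoginName   VARCHAR(30) NOT NULL UNIQUE,               -- unique, but not the primary key
 ContactName VARCHAR(50) NOT NULL,                      -- required column
 CreditLimit MONEY       NOT NULL DEFAULT 0,            -- value used when none is supplied
 FaxNumber   VARCHAR(20) SPARSE NULL);                  -- mostly NULL, so stored sparsely
```

A SPARSE column must allow NULLs, which is why FaxNumber is declared NULL.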
Slide 7:
Here is the basic syntax of the CREATE INDEX statement. If you use the CREATE INDEX
statement, by default it creates a nonclustered index. Recall that the clustered index determines
how the data itself is actually stored in the database, like the main index, so a table can have only
one. But you can have multiple nonclustered indexes. I like to think of these as the index at the back of
a book, and you can have more than one. A nonclustered index is a specialized index that speeds up
searches on a specific column (such as VendorID) that is often used in queries. For example:
CREATE INDEX IX_VendorID
ON Invoices (VendorID);
If VendorID is not already a primary key, it makes sense to create a nonclustered index for it.
To create a clustered index:
CREATE CLUSTERED INDEX IX_VendorID
ON Invoices (VendorID); -- This fails if the table already has a clustered index
You can create a nonclustered primary key in the CREATE TABLE statement (default for primary key:
CLUSTERED):
CREATE TABLE Invoices
(InvoiceID INT PRIMARY KEY NONCLUSTERED, ….
The primary key is still enforced, and foreign keys in other tables can still reference it for
referential integrity, but the table is not CLUSTERED on that key. In other words, the rows are not
stored in primary key order.
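One reason to declare the primary key NONCLUSTERED is to leave the single clustered index free for another column. A sketch with a hypothetical table (the name InvoicesByDate and its columns are assumptions, not the textbook's table):

```sql
-- The primary key is enforced through a nonclustered index ...
CREATE TABLE InvoicesByDate
(InvoiceID   INT      NOT NULL PRIMARY KEY NONCLUSTERED,
 VendorID    INT      NOT NULL,
 InvoiceDate DATETIME NOT NULL);

-- ... so the one clustered index can be placed on the date column instead.
CREATE CLUSTERED INDEX IX_InvoicesByDate_InvoiceDate
    ON InvoicesByDate (InvoiceDate);
```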
Here is one example of creating a nonclustered index on two columns.
CREATE INDEX IX_Invoices
ON Invoices (InvoiceDate DESC, InvoiceTotal);
-- Create an index on a two-column composite key, or on any two columns that are frequently
-- queried together.
-- The order of the two columns is very important, so choose it carefully. The index is
-- organized by the first column FIRST, like a sort, and then by the second.
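To see why the column order matters, compare queries against the IX_Invoices index defined above. This is a sketch of typical behavior; the date and amount values are arbitrary, and the optimizer makes the final choice based on statistics.

```sql
-- Filters on the leading column (InvoiceDate): the index can be searched directly.
SELECT InvoiceDate, InvoiceTotal
FROM Invoices
WHERE InvoiceDate >= '20120101';

-- Filters on both columns, leading column first: also a good fit for the index.
SELECT InvoiceDate, InvoiceTotal
FROM Invoices
WHERE InvoiceDate = '20120115' AND InvoiceTotal > 1000;

-- Filters only on the second column: the index is not ordered by InvoiceTotal,
-- so at best it can be scanned from start to finish.
SELECT InvoiceDate, InvoiceTotal
FROM Invoices
WHERE InvoiceTotal > 1000;
```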
NOTE: This is very important to know!!
SQL Server automatically creates a clustered index for a table's primary key, unless you specify
NONCLUSTERED.
Interview questions: How many clustered indexes can there be on a table? One, and ONLY ONE.
How many nonclustered indexes can there be on a table? It depends on the version of
SQL Server: 249 up to SQL Server 2005, and 999 from SQL Server 2008 on. You should double
check that in the text. But you should not build hundreds of indexes on a table, because that
will hurt performance: SQL Server has to look at all of them to determine whether
they are useful for a given query. The more you have, the more time and
resources are spent checking them. You need to create indexes intelligently
and manage them properly. I will talk more about that too.
Slide 8:
Column-level constraints restrict the information in the domain of the single column itself. (see
text in the video)
Slide 9:
Table-level constraints (which can also be written as column-level constraints): PRIMARY KEY, UNIQUE,
CHECK, and (FOREIGN KEY) REFERENCES. (see text in the video). NOT NULL is not listed here because it has
no table-level form; it applies only to a single column.
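The difference between the two levels is easiest to see side by side. A sketch with a hypothetical table (the name InvoicesChecked and its columns are assumptions for illustration):

```sql
CREATE TABLE InvoicesChecked
(InvoiceID    INT   NOT NULL IDENTITY PRIMARY KEY,        -- column-level PRIMARY KEY
 InvoiceTotal MONEY NOT NULL CHECK (InvoiceTotal >= 0),   -- column-level CHECK: one column only
 PaymentTotal MONEY NOT NULL DEFAULT 0,
 -- Table-level CHECK: written after the columns, so it can compare two of them.
 CHECK (PaymentTotal <= InvoiceTotal));
```

A rule that involves more than one column, like the last CHECK, has to be written at the table level.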
Slide 10:
I use a lot of graphics to show you how indexing works in the optional video. I strongly recommend
that you watch that video.
DEMO:
/* Prog 140 Module 3: More on DDL */
We are going to use a small database, TestDB. In the pop-up menu there is a Delete command but no Drop.
In SQL, DELETE means that you remove the data but keep the structure, whereas this menu command removes
the whole object. In my opinion it should say DROP.
------------------------------------------------------------------------------------------------
-- More DDL (Data Definition Language): Create, Alter, Drop; and INDEXES/Indexing
-- We will cover the basics here and leave all the options and sophistication to an advanced
-- administration course. However to be effective SQL programmers, as we have already seen, we need
-- to have a solid grounding in the DDL statements that we will need to use and understand.
-- We will also talk about indexing and its effects on performance because, as discussed, it is not just
-- your job to know how to write a query to return correct results but you also need to be able to write
-- effective and efficient queries.
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-- In Chapter 11 we will SKIP snippets (user interface functionality) and sequences (a really cool
-- feature that, instead of IDENTITY, produces special number sequences for your company database)
Use Master
------------------------------------------------------------------------------------------------
-- Creating SQL objects
------------------------------------------------------------------------------------------------
-- All new databases are based on the Model database
-- Your DBA will set up the tables, views, stored procedures, and database size that are specific
-- to your company in the Model database. You can decide where to store a database, but you
-- need to check with your DBA. Usually, CREATE and DROP DATABASE are in the domain of the DBAs. You
-- can create your own personal databases to develop algorithms or stored procedures, which can later
-- be used in the company databases.
Create database TestDB
Drop database TestDB
-- the .mdf (master data file/primary data file/DBName_Data) and .ldf (log data file/transaction log
-- file/DBName_log) are stored HERE:
-- C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\DATA
-- In SQL Server, there must be a log file. If a problem occurs in the system, SQL Server is
-- very good at recovering from it because of the log file. By default, mdf and ldf files are stored
-- together. But as I mentioned, your DBA may very likely place the log file on a different box for
-- performance and safety reasons. If we lose the data file, we may be able to recover the database
-- with the help of the log file (together with a backup).
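To check where the files of a database actually live, the catalog view sys.master_files can be queried. A sketch, assuming the Testing database from the first assignment exists on the server:

```sql
-- One row per file: the data file has type_desc ROWS, the log file has type_desc LOG.
SELECT DB_NAME(database_id) AS DatabaseName,
       name                 AS LogicalName,
       type_desc,
       physical_name
FROM sys.master_files
WHERE DB_NAME(database_id) = N'Testing';
```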
-- remember our first assignment and the creation of the Testing database?
Use Testing
select * from Employee;
-- drop table JobsEmps; drop table Jobs;
Create Table Jobs (JobID int IDENTITY
, JobDescription varchar(100) not null);
-- Ending each statement with a semicolon is the ANSI standard! It is worth making it a habit.
Create Table JobsEmps(JobEmpID int IDENTITY
, JobID int, EmpID int DEFAULT 0
, TotalHours float null);
-- Adding Constraints
Alter table Jobs ADD CONSTRAINT pk_JobID Primary Key Clustered (JobID);
Alter table JobsEmps ADD CONSTRAINT pk_JobEmpID Primary Key NonClustered (JobEmpID);
Alter table JobsEmps ADD CHECK (TotalHours >= 0 and TotalHours <= 1000);
Alter table JobsEmps ADD CONSTRAINT DF_stuff DEFAULT 0 for TotalHours;
Alter table JobsEmps ADD CONSTRAINT FK_EmpID Foreign Key (EmpID) REFERENCES
Employee(EmpID);
Alter table JobsEmps ADD CONSTRAINT FK_JobID Foreign Key (JobID) REFERENCES Jobs(JobID);
-- Jobs to JobsEmps: one-to-many relationship; Employee to JobsEmps: one-to-many relationship.
-- Both Jobs and Employee are parent tables. JobsEmps is a child table.
-- JobsEmps has two foreign keys: JobID and EmpID. JobEmpID is a surrogate key to make lookups easy.
-- Add Indexes
-- The first index has two columns as a composite key (JobID and EmpID) because people are more likely
-- to search by these than by the surrogate key JobEmpID.
Create CLUSTERED Index IX_JobIDEmpID on JobsEmps (JobID, EmpID)
-- The Employee table does have a primary key, but I am going to create an index on last name because I
-- assume people will search by last name a lot.
Create NonClustered Index IX_Lastname on Employee(Lastname)
-- why can't I do this?
-- select count(*) from Jobs
drop table Jobs; drop table JobsEmps;
-- Because of the foreign key constraint: JobsEmps still references Jobs. At the beginning of the video,
-- we talked about dropping a constraint first and then the object it protects. But there is an easier
-- way here: drop the child table first, then the parent table.
-- but I can do this:
drop table JobsEmps; drop table Jobs;
--JobsEmps is a child table. Jobs is a parent table.
-- tables must be dropped "in order" with all child tables dropped first
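The other route, dropping the constraint first, might look like the sketch below. It assumes both tables still exist and uses the constraint name FK_JobID defined earlier in this demo.

```sql
-- Remove the foreign key that makes JobsEmps depend on Jobs ...
ALTER TABLE JobsEmps DROP CONSTRAINT FK_JobID;

-- ... after which the parent table can be dropped on its own.
DROP TABLE Jobs;
```

This keeps JobsEmps and its data, but its JobID values are no longer checked against anything.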
-- We do not index on Low Cardinality fields
-- Cardinality is the number of unique values for that field within the table.
-- High cardinality = more unique values (like SSN, log-in ID)
-- Low cardinality = few unique values (like gender, marital status)
-- if in the WHERE clause you reference a field frequently - then that's a good field to index upon
-- If the field is queried frequently, it is a good candidate for indexing. But how do you know?
-- Use two tools from the Tools pull-down menu: Database Engine Tuning Advisor and SQL Server Profiler.
-- I encourage you to take a look at the Database Engine Tuning Advisor when you have time. It lets
-- you submit a script that you or your users run against the database on a regular basis. The
-- Database Engine Tuning Advisor then recommends the index(es) you should have, and points out
-- index(es) that are not necessary and should be removed to improve performance. This tool DOES NOT
-- create indexes for you; it just gives you advice. You then create or remove the indexes manually.
-- You should do this regularly. If you are not allowed to do this, ask your DBA. SQL Server Profiler
-- runs in the background. It captures the queries that are actually running and saves them as a
-- workload that you can give to the Database Engine Tuning Advisor. Usually a company lets SQL Server
-- Profiler run for 2-3 days during a busy or a normal period, then submits the captured workload to
-- the Database Engine Tuning Advisor on a monthly or quarterly basis.
-- More on indexes!
/******************************************************************/
/* Creating indexes */
/******************************************************************/
Use AdventureWorks2012
-- finding out about indexes in this database using system stored procedures
-- HumanResources is the schema name. A schema is an organizing principle that logically groups related
-- tables/views, which I found very useful on the job for large databases.
exec sp_help [HumanResources.Employee]; -- The output has a huge amount of info for the table
exec sp_helpindex [HumanResources.Employee]; -- info ONLY for indexes. Preferred way in SQL Server.
exec sp_helpindex 'HumanResources.Employee'; -- here we need delimiters. Used in some systems.
select * from HumanResources.Employee;
-- looking at the data, would MaritalStatus or Gender be good columns to index? NO
-- How about HireDate? ModifiedDate? YES! Dates are usually of high interest to companies, alone or
-- in combination with other field(s), and they usually have high cardinality!
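Rather than judging cardinality by eye, you can count the distinct values directly. A sketch against the same table; the column aliases are illustrative:

```sql
-- Compare the number of distinct values per column with the total row count.
SELECT COUNT(*)                      AS TotalRows,
       COUNT(DISTINCT Gender)        AS DistinctGender,
       COUNT(DISTINCT MaritalStatus) AS DistinctMaritalStatus,
       COUNT(DISTINCT HireDate)      AS DistinctHireDate,
       COUNT(DISTINCT JobTitle)      AS DistinctJobTitle
FROM HumanResources.Employee;
```

The closer a column's distinct count is to TotalRows, the higher its cardinality and the more selective an index on it is likely to be.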
Create nonclustered index AK_Employee_JobTitle
on HumanResources.Employee (JobTitle ASC);
-- I don't know what this AK_ prefix means. I usually start index names with IX_. These are just
-- naming conventions. You don't have to follow them, but you do need to keep your naming consistent.
drop index AK_Employee_JobTitle on HumanResources.Employee;
-- here we don't need delimiters (and can't have them!)- go figure.
-- Back to Northwind!
Use Northwind
exec sp_help [Orders];
exec sp_helpindex [Orders];
exec sp_helpindex 'Orders'; -- here we need delimiters
-- Composite key indexes:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_EmpID_OrderDate
ON Orders ( CustomerID ASC, EmployeeID ASC, OrderDate Desc)
-- Too many columns in an index is also a performance issue, because it is like storing a whole new
-- table. So keep the index short and sweet whenever possible.
-- Cover index: an index that includes ALL the columns needed by a specific report, and so really
-- speeds up that particular report (ONLY for really critical reports. Your users will tell you.)
-- an alternative is to include columns in the index. Can be very useful for reporting!
-- (these aren't considered by the DB engine when it calculates
-- the # of index key columns or index key size):
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_EmpID
ON Orders ( CustomerID ASC, EmployeeID ASC)
INCLUDE (OrderDate) -- No more than 7-10 columns.
-- using this you can create a "Cover" index that can cover columns in a key report;
-- CAREFUL not to create too many indexes!! This can start impacting performance as well
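As a sketch of what "covering" means in practice: every column in the query below is either a key column or an included column of IX_Orders_CustomerID_EmpID, so the query can in principle be answered from the index alone. The customer ID 'ALFKI' is just a sample value, and the optimizer decides whether the index is actually used.

```sql
-- All referenced columns (CustomerID, EmployeeID, OrderDate) are stored in the index.
SELECT CustomerID, EmployeeID, OrderDate
FROM Orders
WHERE CustomerID = 'ALFKI'
ORDER BY EmployeeID;
```

Adding a column that is not in the index, such as Freight, to the SELECT list would mean the index no longer covers the query.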
DROP INDEX Orders.IX_Orders_CustomerID_EmpID_OrderDate
DROP INDEX Orders.IX_Orders_CustomerID_EmpID
/* Free space in indexes: affected by the fillfactor and padindex options */
--Important for my optional video for indexing
-- Your indexes are stored as balanced trees made up of pages. When you create an index, you can specify
-- that its pages are not completely filled, to leave room for growth.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_EmpID_OrderDate
ON Orders ( CustomerID ASC, EmployeeID ASC, OrderDate Desc)
WITH (FILLFACTOR = 65, PAD_INDEX = ON, DROP_EXISTING = ON)
-- Here I ask for a fillfactor of 65%, which means that the leaf pages of the index are filled to 65%
-- when it is created, leaving room for the inserts that happen in this Orders table. If the fillfactor
-- is high (95% for example), pages have to be split to accommodate growth, which reorganizes the index
-- and is very resource-intensive. So dynamic tables need a relatively low fillfactor, while static
-- tables need a high fillfactor for efficiency. PAD_INDEX is for VERY dynamic tables. With
-- PAD_INDEX = ON, not only the leaf pages but also the intermediate-level pages are filled to
-- the same fillfactor (OFF: only the leaf pages use the fillfactor). DROP_EXISTING lets us drop
-- the existing index and re-create it in one statement, like organizing your closet to make more room,
-- so that the index is more efficient. Note that DROP_EXISTING = ON requires the index to exist
-- already; otherwise the statement fails. (For more details, see the optional video)
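The fill settings of existing indexes can be read back from sys.indexes, and an index can later be rebuilt with a different fillfactor. A sketch, assuming the index created above still exists; the value 90 is only an example.

```sql
-- fill_factor = 0 means the server default (pages filled completely) is in effect.
SELECT name, type_desc, fill_factor, is_padded
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'Orders');

-- Rebuild one index with a higher fillfactor, for example once the table has become mostly static.
ALTER INDEX IX_Orders_CustomerID_EmpID_OrderDate ON Orders
REBUILD WITH (FILLFACTOR = 90);
```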
/* Getting info about indexes: */
-- general form: exec sp_help [tablename]
-- general form: exec sp_helpindex [tablename]
exec sp_help Orders;
exec sp_helpindex Orders;
-- using catalog views:
select * from sys.indexes; -- metadata (data about the objects in the current database), exposed
-- through catalog views in the sys schema
-- System DBs: master; model; tempdb, which stores temp tables; and msdb, which stores SQL Server Agent
-- jobs and alerts. A sys. prefix means that you are querying the system catalog views, which describe
-- the database you are currently using.
select * from sys.index_columns;
select * from sys.stats_columns;
select * from sys.indexes where name = 'IX_Orders_CustomerID_EmpID_OrderDate';
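The catalog views become more useful when joined. The sketch below lists, for every index on Orders, which columns are key columns and which are only included; the column aliases are illustrative.

```sql
SELECT i.name      AS IndexName,
       i.type_desc AS IndexType,
       ic.key_ordinal,            -- position in the index key (0 for included columns)
       c.name      AS ColumnName,
       ic.is_descending_key,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'Orders')
ORDER BY i.name, ic.is_included_column, ic.key_ordinal;
```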
-- The third video is optional, but highly recommended. You will not be held responsible for
-- its contents.