Download Lecture 1 - California State University, Los Angeles

Document related concepts

Microsoft Access wikipedia , lookup

IMDb wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Oracle Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Ingres (database) wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Concurrency control wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

SQL wikipedia , lookup

Functional Database Model wikipedia , lookup

Open Database Connectivity wikipedia , lookup

PL/SQL wikipedia , lookup

Database wikipedia , lookup

ContactPoint wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Using Relational Databases and SQL
Lecture 1
Introduction
John Hurley
Department of Computer Science
California State University, Los Angeles
Introduction
John Hurley
Call me John, especially outside class.
If that’s too informal for you, you can call me “Instructor”
[email protected]
xxx6aTWOb VI Xxx4cATEd
7 8eONEf5gFORE!hij
d xxx
(text preferred)
Office hours listed on course page. I will often be in A310A (inside A-310) at other times, too.
Attendance
Administrative
Course page:
http://www.calstatela.edu/faculty/jhurley2/classes/cs122
Syllabus
Software download links
Assignment dates
Grading
Grading: A, B, C, (with + and -), NC.
If you are an undergraduate and don’t get a C or
better, you get an NC
If you are a graduate student and don’t get a B or
better, you get an NC
See the grading scale on the syllabus; no curve
In past terms, I have assigned all grades from A to C
as well as NC in this class. Median grade is usually
B or B+, which is lower than the median grade in
my CS120 sections.
Deadbeats Will Fail!
About 10% of the all the course grades I have ever given in
CS122 were NCs. Everyone to whom I have ever given an
NC missed significant portions of the coursework.
If you decide not to take the class, drop it yourself. Don’t
expect me to drop you!
I can’t drop anyone after the no-record drop deadline
You will have your midterm grades before the drop-withW deadline
Labs
Assignments are in text files linked from the course web page
Turn in on CSNS
Posted before the weekly lecture
Part A usually due at the end of the week’s lab class
Part B usually due before the following week’s lecture
Let me know *in advance* if you won’t be able to attend a lab for
some good reason
I may give quizzes towards the end of lab periods if attendance is
poor.
Last labs before the midterm and final will be ungraded, very
realistic practice exams.
Assignments
Labs and exams will contain the following types of questions:
short answers and multiple choice
1-paragraph answers
SQL to English
English to SQL
Quizzes
Quizzes will be administered either during lectures or labs
Quizzes usually unannounced but open-notes
I may give one pre-announced closed-book quiz which requires
you to memorize a few very important definitions
No makeups unless you provide a satisfactory explanation *in
advance* for why you won’t be in class.
Cheating
You may discuss general material about databases and the
techniques taught in this class with other students
You may give or receive help understanding assignments and
debugging work
Cheating
You may copy examples from the lecture notes and then change
them to meet assignment requirements.
Working programmers often solve problems in similar ways.
Other instructors may not allow this. I am only saying that it is OK
and expected in my sections of CS122.
You may not directly use language from the lecture notes to
answer short-answer questions; restate the answers in your own
words. This is difficult to do; that’s the point.
Cheating; Copying
There are grey areas in cheating in CS, but presenting an answer
that is copied directly from any source other than your brain is
always over the line.
You may not copy code from other students or allow anyone to
copy your code.
Few to none of my assignment questions are taken from the
textbook or other sources, so don’t bother copying published
solutions to the textbook exercises.
If you copy code posted by past students, you will likely do
poorly anyway, because I change many questions slightly each
term.
Cheating on LAB WORK
OK on lab work:
Copying examples from the lecture notes and modifying them to fulfill the
assignment
Examples of legitimate help for other students:
“The problem with your query is that you forgot to write the join
condition”
“That isn’t working because on a Mac the table name is case sensitive”
“You accidentally copied a character from Powerpoint that is invalid in
mySQL”
“You need to use a float instead of an int for that field because the values
might not be an integer”
Cheating on lab work:
Copying code from other students or internet sources
Copying text from other students for short-answer or essay questions
Copying text from the internet or a book for short-answer and essay questions
Cheating on Exams and Quizzes
OK on exams and open-book quizzes
Consulting lecture notes, textbooks, your own notes
Checking Wikipedia or other internet sources that do not
involve real-time communication with human beings
Copying examples from the lecture notes and modifying
them to answer the questions
Cheating on exams and quizzes:
Copying code or text from other students or internet sources
Answering short-answer questions with direct quotes form
the notes (restate them in your own words!)
Communicating with any human being other than me via
email, chat, phone, or any other means
Cheating Detection
It is obvious to me when students answer short-answer questions with text
copied from professional-level sources like Wikipedia and textbooks.
Even for SQL code, there are only a few correct answers to each question
using the material we cover. However, if you copy answers from other
students, you will sooner or later copy an identifiable incorrect answer or
trip up in some other way.
I will be comparing all students’ lab and exam papers using an automatic
tool designed to detect copying. I developed this application specifically
to detect cheating in CS122!
If I do detect copying, I will penalize all students involved equally. If you
understand the material, it is foolish to take this risk by letting other
students copy your work.
People who do well on labs but poorly on exams and quizzes receive
careful scrutiny!
Using Relational Databases and SQL
Part I
Databases
Database Definition
Data (information) + base (foundation)
A database is a structured collection of persistent
data.
Structured: organized according to a set of rules. In
this case, organized according to a database model.
Persistent: stored in permanent storage, not just RAM.
If you shut down the application or the power goes
off, the data is not lost.
Database Definition
Many definitions are like this one:
A collection of data, typically modelling the activities of
one or more related organizations (Ramkrishnan and
Gherke, Database Management Systems.)
 I don’t like this definition, because databases don’t
always model anything in particular. Database
designers don’t always know what the data will be
used for.
What is a Database?
Structured using a database model
No database model, no database!
Often, not always, used to model organizational
activities
Examples:
Companies
Stores
Universities
Database Skills
Database skills are foundational in CS
The great majority of modern applications use
databases to store information
You will put these skills together with your OOP
programming skills a little later if you are an
undergraduate, very soon if you are a grad student
As a working software engineer, you will
probably use the skills you learn in this class
every day
Database Skills
Some applications you are familiar with that rely
heavily on large databases:
Wikipedia
GET
Amazon.com
ITunes
Tables
User can add and remove tables, get information
from them, update or delete information in them,
change them
These are the skills we will study in this class
Database Background
Storage was bulky, expensive, and slow in the old
days!
Physical/Logical Separation
Previous to the inventions of DBMS, one had to
write a program that traversed pointers at the
physical level to extract data from a database
By abstracting the physical level and writing a
program at the logical level instead, extracting
data from a database became much easier
Database Models
The Two Levels of a Database Model
Physical Level (how data is stored)
The things we don’t have to worry about
Logical Level (how data is organized)
The things we do care about
The Basic Models
Hierarchical Model (IBM’s IMS) represented data as
a tree
Network Model (CODASYL)
Relational Model (ALPHA, SEQUEL)
Hierarchical Database Model
Hierarchical Database Model
Example of a query to retrieve info:
for book in (get_children("Programming/J.Smith”))
print book.field("Title"), book.field("Publisher")
Mostly superseded by relational model
Has an afterlife with XML
XML Data
<CATALOG>
<CD>
<TITLE>When a Man Loves A Woman</TITLE>
<ARTIST>Percy Sledge</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Atlantic</COMPANY>
<PRICE>8.70</PRICE>
<YEAR>1987</YEAR>
</CD>
<CD>
<TITLE>Black Angel</TITLE>
<ARTIST>Savage Rose</ARTIST>
<COUNTRY>EU</COUNTRY>
<COMPANY>Mega</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1995</YEAR>
</CD>
</CATALOG>
Network Model
Built on hierarchical model but allows multiple parents
and multiple children
Relational Model
Proposed by Edgar F. Codd (circa 1969)
Database is a collection of tables (relations)
Relational comes from ‘Relational
Algebra/Calculus’ and not from ‘Relationships’
Relational model is based on extensive
mathematical theory, which we will not cover in
this class
Dominant database model
Oracle was the first to aggressively market a
commercial relational database product
Dr. Edgar F(rank) Codd
MA Mathematics, MA Chemistry
MS and PhD in Communication Sciences
ACM Turing Award (1981)
Tables
Artists = Table (Relation)
ArtistID, City, Region, ... = Columns (Attributes)
Each row is called a Record (Tuple)
Using Relational Databases and SQL
Part II
Database Management Systems
Database Management Systems
(DBMS)
A DBMS handles these functions:
Data definition: Defining new data structures for a database,
removing data structures from the database, modifying the
structure of existing data.
Update: Inserting, modifying, and deleting data.
Retrieval: Obtaining information either for end-user queries
and reports or for processing by applications.
Administration: Registering and monitoring users,
enforcing data security, monitoring performance,
maintaining data integrity, dealing with concurrency control,
and recovering information if the system fails.
Source: Wikipedia
Database Management Systems
(DBMS)
Some common relational DBMSs:
MySQL, PostgreSQL (free, open source)
Oracle, MS SQL Server (commercial)
Database Schemas
The definition of the database, where you define
Tables
Relationships
Constraints
Stored Functions and Procedures
Views
Indexes
Schemas are typically represented by a schema
diagram; see the Lyric diagram linked from the
course page
Database Management Systems
(DBMS)
You can have multiple databases, each with a
single schema
A separate database for each application
Toystore (First database)
Bookstore (Second database)
Furniture Store (Third database)
Etc.
You can also have a single database, with
multiple schemas
Database Management Systems
(DBMS)
Using Relational Databases and SQL
Part III
Query Languages
Query Languages
Query: question
Query Language = A computer language used to
extract data from a database
Data Sublanguage = A computer language used to
extract and manipulate database data
SEQUEL/SQL (1974) developed at IBM
Query Languages
Data Sublanguage Alpha (Codd’s original query
language)
Data Sublanguage SEQUEL (SQL)
SQL
Stands for Structured Query Language
A non-procedural, domain-specific language (not like
Java, C or C++)
An open ANSI and ISO standard
Supported by most major DBMS
Some variations in implementations
Used by programmers, managers, and database
administrators
SQL
SQL is “nonprocedural” or “declarative”
Procedural languages, like Java or C, require
programmers to implement an algorithm (“a series of
instructions that will solve a problem in a finite
amount of time”) to accomplish each task
Nonprocedural / declarative languages, like SQL,
require the programmer to describe *what data* s/he
wants. The platform (in this case, DBMS) determines
how to produce the data
This is an important distinction, but as we will see, it
is not as clear-cut for SQL as it is for, say, HTML.
SQL Functions
View information from relational databases
Single and multiple table selections
Calculation and analysis
Manipulate information in relational databases
Insert and delete records
Update records
Create relational databases
Create databases, tables, constraints, ...
Nonstandard Features
• SQL is an open standard, but developers of DBMSs often
add additional features that are not part of the standard
• Differentiate their products from competitors
• Vendor lock-in
• What happens when you want to switch to a different DBMS?
• Is it a good idea to use features like this?
Using Relational Databases and SQL
Part IV
Lyric Database Discussion
Primary Keys
Primary key is used to uniquely identify every
record in a table
Must be a field or combination of fields with
unique values
What would happen if we needed to identify
individuals in the university DB and tried to do this
using first name? Last name? Both? Height? DOB?
If more than one field is required, we have a
composite primary key
The Lyric Database
Database for a web-based company that provides
services to artists and the studios that they work
for
Before we start extracting data from a database,
we must understand the database completely first
Let’s go over all the tables and attributes
Primary Key Example
What is the primary key of the Studios table?
What is the primary key of the
XRefArtistsMembers table? (hint: it may require
more than one field to make up a primary key!)
Using Relational Databases and SQL
Part V
MySQL
MySQL
For coursework, we will use MySQL, which you
must install on a USB drive.
– Bring a USB drive to the next class meeting!
You may also install it on your own laptop, but
note that you will have to use the lab computers
for the midterm and final exam, so be sure you
can run it from a USB drive before the midterm.
Downloading MySQL, Part I
Go to CS122 web page and follow the links to MySQL site
Get MySQL Community Server
mysql-5.5.x has the MySQL database client and server
programs
Get the .zip files (not the MSIs) for your OS (Windows vs.
OSX) and processor (32 vs 64 bit).
The files are labelled in a way that may confuse you into
downloading the source code, which you don’t need. Be careful
to get the binaries instead. MySQL 5.5.8-win32, for example, is
132 MB. The 27 MB file is the source code.
Downloading MySQL, Part II
Extract the zip files; you will have two directories
You may also want to use the MySQL Workbench, which is a
GUI tool for working with MySQL. However, Workbench only
works with the 32 bit version and is buggy in any case. Please
don’t ask me to help you with it until at least week 3, after
everyone is working smoothly with the main MySQL software.
mysql-workbench-gpl-5.2.x.... is the MySQL GUI Tools
Using MySQL In Windows
This process should only be slightly different in OSX
Open up a Windows command line console
Use the cd command to navigate to the
mysql-5.x.xx-xx/bin directory
• If you add this directory to your PATH, you won’t have to
navigate there every time. However, you *won’t* be able to add
anything to the PATH on the lab computers.
Type in the following to start the database server:
start mysqld
Then type in the following to start the database client:
mysql –u root
Some MySQL Commands
Once MySQL has started and you see the mysql
prompt:
At mysql> prompt type in: show databases;
At mysql> prompt type in: create database lyric;
At mysql> prompt type in: use lyric;
At mysql> prompt type in: show tables;
You shouldn’t see any yet
Adding Data to a Database
Now that the database is selected, let's load a
database script
Download lyric.sql from the course webpage
At mysql> prompt type in: source [path] lyric.sql;
Where [path] stands for the path to the location where you
saved lyric.sql.
If you put lyric.sql in mysql’s bin directory, all you will
have to type is source lyric.sql
You should see a bunch of messages like this:
Query OK, 1 rows affected (0.01 sec).
Verify that the database is set up
To check whether everything has worked correctly, type
SELECT * FROM Salespeople;
The output should look like this:
+---------+-----------+----------+----------+--------+------------+
| SalesID | FirstName | LastName | Initials | Base | Supervisor |
+---------+-----------+----------+----------+--------+------------+
|
1 | Bob
| Bentley | bbb
| 100.00 |
4|
|
2 | Lisa
| Williams | lmw
| 300.00 |
4|
|
3 | Clint
| Sanchez | cls
| 100.00 |
1|
|
4 | Scott
| Bull
| NULL | NULL |
| sjb
+---------+-----------+----------+----------+--------+------------+
4 rows in set (0.39 sec)
Using MySQL in the lab
If you will be using your own laptop
in the lab, bring it to the next class
meeting
If you will be using MySQL on a lab
computer, *bring a USB drive to the
lab* on Wednesday