Introduction to Databases and MySQL
Audience and Pre-Requisites
This tutorial covers the basics of databases and MySQL. The only
prerequisite is access to a MySQL database.
Introduction
One of my summer jobs during high school was working for a company called
Plastronics in St. Joseph, MI. They made packaging for all kinds of
different products. The founder of the company, George, was the first
manufacturer of the styrofoam peanut.
He was also my father's golf buddy (which is how I acquired this
stimulating summer position).
Usually, I worked in the factory cutting up styrofoam with big frames of
hot wires, but one day, I was asked to come and work in the office. I was
given the task of sorting about 200 order forms, and typing the various
pieces of information on those forms into a paper grid on a typewriter.
Now, computers did exist at that time, but Plastronics did not
have one. So, all order processing was done manually.
A computer would have saved me much time. But then, I didn't really
mind typing when the alternative was burning my fingers on hot wires.
Nonetheless, what I needed was a database system. A database is
a collection of structured information, like the 200 order forms,
all with the same kinds of information. Contrast this to unstructured
information, like the text of a book.
A database system would have allowed me to enter my information
into a grid, but it would have been stored in a computer, rather than
on paper. Once stored, I could sort the orders, organize and manipulate
the data any way I like, and print various reports based on conditions
such as: "print the name, phone number and order size for all orders where
total_amount > $1000000". This would represent our biggest customers,
and potential new golf buddies for George and my father.
In database terminology, the grid is called a table.
The information corresponding to one order (or row in the grid)
would be stored in a record. The individual data elements,
for example, name, phone number, order size (stored in the columns)
are called fields.
Most databases consist of more than one table. In fact, it is the norm
to have multiple tables in order to, among other things, avoid storing
redundant data. For example, maybe several of the 200 Plastronics orders
are all from the same customer. It's inefficient to store the name,
address, and other contact information multiple times, as shown in the
following with Fred Yo:
One of the things we most want to avoid in designing our databases is
redundant data. Here is a more efficient structure:
Notice each table has a field that contains a distinct identifier for
each record. These are called primary keys. A primary key is
a field or combination of fields that uniquely identifies a record in a
table, so that an individual record can be located without confusion.
Such keys make it possible to relate the tables to one another, so
we can, for example, find the customer information given a key value
in the order table. A foreign key is a key used to link two
tables together. Typically you take the primary key field from one table
(customer) and insert it into the other table (order) where it becomes a
foreign key (it remains a primary key in the original (customer) table).
This type of structure is called a relational database: we have
multiple tables that connect to one another via keys.
How Not to Design a Database
When we talk about database design, we mean the organization of the
fields in each table, and how keys are used to relate the tables.
It's surprisingly easy to create a bad database design. Watch me do it
right now.
Let's go back to that summer at Plastronics, only this time George
gives me a computer and a database system to organize his orders. This
was long before I knew anything about computers, databases or programming.
So, my first pass would probably mimic the paper grid. We saw earlier
that this was a bad idea because of the redundant customer information.
We created a customer table and related it via a customer number.
The following diagram shows the actual contents of the order grid,
once we have broken out the customer information.
This second design has some serious problems too.
Practice: Take a moment and see if you can fix the problems in
this table by normalizing, that is, coming up with a design that
is free of redundant data and that can be consistently and correctly
modified.
Here are some of the problems we need to fix:
- Wasted space: Some customers did not order a third item, so we're
wasting space on those records. This is not a big deal with four records,
but with 200 or more, it can build up.
- Data insertion problems: What if a customer comes along and she wants
to order a fourth item? We don't want our order processing system to
limit our ability to make money! Maybe we can add another record for
that customer and just fill in one of the item fields. But that would
violate the primary key rule which says each record must have a distinct
identifier.
- Redundant data entry: What if the price of item #34 goes up
to $1200? We'd have to find each record with item 34 in one of its
three item columns, and change the price.
- Querying problems: What if I want to find all customers that
ordered item #34? I'd have to query all three item columns.
- Redundant data storage: If 6 customers order item #34, we have to
enter both the item and the price each time.
- Inconsistent data: Notice that there are conflicting prices for item
#21? Should it be $740 or $750? This happens when one record is updated
and another isn't.
We can fix some of the problems by removing the recurring items from
the table, that is, item, count and price.
This is better, but we still have the problem of inconsistent prices
($740 vs. $750 for item #21), and having to change a price increase
or decrease in multiple places. We can try breaking out the pricing
information into its own table:
The design process we just illustrated is called data
normalization. As mentioned earlier, this is the process we go
through to develop a design that is free of redundant data and that can
be consistently and correctly modified. It's important to go through the
iterative steps involved in data normalization, or at some point later,
when it is much more difficult to fix, you might find you have problems
keeping tables in sync.
Other Important Concepts
As one goes through the data normalization process, there are some rules
to keep in mind regarding primary and foreign keys. Recall that primary
keys uniquely identify a record in a table, and foreign keys are used
to relate tables. Here are the rules regarding primary keys:
- Every record must have a value for a primary key. It cannot be empty or null.
- Primary keys must be unique.
- Primary keys should not be reused when the record is deleted. A
primary key should not be reassigned to another record because it will
cause errors with foreign keys.
Here are the rules regarding foreign keys:
- Since the foreign key in a linked table is a primary key in the
original table, the primary key rules apply in the original table.
- The data in a foreign key needs to match the primary key in
some record of the original table. This is called referential
integrity and assures there are no broken links between tables.
Note that if you delete a record in the original table, for example,
you delete one of the customers from the customer table, referential
integrity requires that you delete all references to that customer in
the order table.
Most database systems allow you to establish constraints which
make it impossible to break the rules defined above. For example, in
MySQL when you create a new table, you can define right in the CREATE
TABLE statement that a particular field is a primary key. Then in the
course of using that table, an error will occur when you try to INSERT or
UPDATE a record in a way that causes a primary key violation, say if you
tried to INSERT a record with a duplicate value in the primary key field.
What is MySQL?
MySQL is an open-source database system with which we can do the following things:
- design the structure of the tables (called schema) and how they relate to one another
- add, edit and delete data
- sort and manipulate data
- query the database (that is, ask questions about the data)
- produce listings based on queries
To interact with MySQL, we enter commands on a command line.
These commands, such as CREATE, INSERT, UPDATE, etc. are based on a
more general language called SQL (Structured Query Language).
SQL has a long and colorful history, starting at IBM in the 1970s (based
on the research of E.F. Codd who developed the relational data model in
1970), and later through the work of a small company called Relational
Software, Inc. In 1979, this group produced the first commercially
available relational database system and implemented SQL as its query
language. They called the product Oracle.
Getting Started with MySQL
The first thing you need to do to start learning MySQL is get access.
We'll assume that you have access and can start up the command line
interface. It will look something like:
mysql -uroot -ppassword
You should see the following:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2129621 to server version: 3.23.58
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql>
In the exercises that follow, we will create a database and a set of
tables for Plastronics order-processing. Then, we'll learn the basic
commands for entering, editing and manipulating orders, customers, etc.
We will spend most of our time learning how to query these tables,
since that's the most interesting part.
Setting Up the Tables
Step one is to create a database, which is a container for our tables.
Enter the following command:
create database plastronics;
Note that SQL keywords are not case-sensitive; database and table names
can be, depending on the operating system the server runs on. Also, note
that the ending semicolon is required or you'll see something like this:
mysql> create database plastronics
->
If this happens to you, just type the semi-colon on the line with the
"->" and press Enter.
mysql> create database plastronics
-> ;
Query OK, 1 row affected (0.00 sec)
mysql>
The "Query OK" is your signal that the command was accepted and the
task performed. Creating a database does not select it for use; you
must do that explicitly. Enter the following command:
use plastronics;
The system will respond "Database changed". Now, we can work with
the database. Let's take a minute to review the final structure we
designed for the order processing database.
We need to set up four tables that relate to one another as defined
in the structure above. We use the CREATE TABLE command which allows
us to specify the fields and their contents, as well as primary keys
and constraints. Here is the command to create the customer table:
create table customer (cust_no int not null auto_increment, name varchar(40),
address varchar(40), phone varchar(12), primary key(cust_no));
After the "create table" part of the command, we name our table
"customer". Then, in parentheses, we define the fields. The minimum
information required is a fieldname, and a type, indicating what
kind of data we want to store in the field.
For cust_no, we want an integer (whole number), which is abbreviated
"int" in MySQL. Then, we indicate that cust_no cannot be null (which
means empty) and we would like the system to fill the value in for us
by auto-incrementing from the previous value, every time we insert a
new record.
The other fields are of type "varchar" which means variable-length
strings, or sequences of characters. The numbers following "varchar"
in the command above indicate maximum length for the data stored in
the field. So, the name, address and phone fields are all sequences of
characters with maximum lengths 40, 40 and 12.
Finally, we set cust_no to be the primary key.
You can find more information on types and how to use them in the
MySQL documentation.
For more details on the CREATE TABLE command, check the MySQL
reference.
Here is the command to create the orders table:
create table orders (order_no int not null auto_increment, FK_cust_no int not null,
foreign key(FK_cust_no) references customer(cust_no), primary key(order_no));
There are only two fields in this table. The order_no field is a primary
key, and is an integer, not null and it will auto increment. The cust_no
field is a foreign key. We have named it FK_cust_no in the orders table
to distinguish it from the cust_no field in the customer table.
Recall that a foreign key is a field that references a primary key in
another table. In the command, we indicate that the FK_cust_no field
is a foreign key referencing the cust_no field in customer, indicated
by the "foreign key(FK_cust_no) references customer(cust_no)" part
of the command. By setting the table up this way, MySQL will enforce
constraints, that is, any order that we enter into the orders table must
reference a valid customer in the customer table. If we enter an
FK_cust_no in orders that does not exist as a cust_no in customer, an
error will result.
Note: If you want MySQL to enforce foreign key constraints, you need to
add "type=InnoDB" at the end of the CREATE TABLE statement as in:
create table orders (order_no int not null auto_increment, FK_cust_no int not null,
foreign key(FK_cust_no) references customer(cust_no), primary key(order_no)) type=InnoDB;
"type=InnoDB" may or may not be available in your MySQL installation.
Support for InnoDB tables requires a specific compilation parameter when
compiling MySQL from source. In some distributions, to avoid "ERROR 1005",
you have to add "type=InnoDB" to both table creation statements, customer
and orders. Newer MySQL versions spell this option "engine=InnoDB"; the
"type=" form was removed in MySQL 5.5.
Let's see what we have so far. Enter the following command:
show tables;
Then enter the following command to look at the structure of the orders table:
describe orders;
Here is the command to create the items table:
create table items (item_no int not null auto_increment, price int, primary key(item_no));
Practice: Take a couple minutes to make sure you understand all the parts of the command for creating the items table. Using the CREATE TABLE command,
create the fourth table, which has two foreign key fields and the count. Call this table "item_details".
Here is the command for creating item_details:
create table item_details (FK_order_no int not null, foreign key(FK_order_no)
references orders(order_no), FK_item_no int not null, foreign key(FK_item_no)
references items(item_no), count int);
If your table does not match the structure defined in this command,
just delete it by entering:
drop table item_details;
Then, enter the command given above for creating item_details.
Entering and Updating Data
Now that we have our tables created, we need to populate them with data.
Enter the following commands. Just copy the whole block and paste at
your MySQL command prompt.
insert into customer set name="Joe Boo", address="123 West", phone="412-773-5322";
insert into customer set name="Rich Wrack", address="332 East", phone="412-773-8374";
insert into customer set name="Ken Bend", address="225 Main", phone="412-773-9822";
insert into customer set name="Kim Slim", address="415 Bent", phone="412-773-6721";
insert into customer set name="Tom Plom", address="633 North", phone="412-773-4156";
select * from customer;
The INSERT command sets the fields to the corresponding values.
The SELECT command with a "*" outputs all the data. Notice how the
cust_no field auto incremented. Next, we populate the items table:
insert into items set price=666;
insert into items set price=700;
insert into items set price=450;
insert into items set price=1200;
select * from items;
Now, the orders table:
insert into orders set FK_cust_no=4;
insert into orders set FK_cust_no=3;
insert into orders set FK_cust_no=4;
insert into orders set FK_cust_no=1;
insert into orders set FK_cust_no=2;
insert into orders set FK_cust_no=1;
insert into orders set FK_cust_no=2;
insert into orders set FK_cust_no=3;
insert into orders set FK_cust_no=4;
insert into orders set FK_cust_no=5;
select * from orders;
Finally, the item_details table:
insert into item_details set FK_order_no=1, FK_item_no=4, count=12;
insert into item_details set FK_order_no=1, FK_item_no=3, count=56;
insert into item_details set FK_order_no=2, FK_item_no=1, count=10;
insert into item_details set FK_order_no=3, FK_item_no=2, count=43;
insert into item_details set FK_order_no=3, FK_item_no=4, count=16;
insert into item_details set FK_order_no=4, FK_item_no=2, count=87;
insert into item_details set FK_order_no=5, FK_item_no=1, count=62;
insert into item_details set FK_order_no=5, FK_item_no=2, count=48;
insert into item_details set FK_order_no=5, FK_item_no=3, count=5;
insert into item_details set FK_order_no=6, FK_item_no=3, count=87;
insert into item_details set FK_order_no=7, FK_item_no=2, count=32;
insert into item_details set FK_order_no=7, FK_item_no=1, count=27;
insert into item_details set FK_order_no=8, FK_item_no=4, count=91;
insert into item_details set FK_order_no=9, FK_item_no=2, count=34;
insert into item_details set FK_order_no=9, FK_item_no=3, count=72;
insert into item_details set FK_order_no=10, FK_item_no=4, count=2;
select * from item_details;
If you need to edit a record, MySQL provides an UPDATE command:
update item_details set count=12 where FK_order_no=7 and FK_item_no=2;
select * from item_details where FK_order_no=7 and FK_item_no=2;
Notice how we can define the exact record for both UPDATE and SELECT using
the WHERE clause. We have also used the AND connector. UPDATE can also
be used to edit a group of records. For example, we could set count=12
for both of the items in order #7:
update item_details set count=12 where FK_order_no=7;
select * from item_details where FK_order_no=7;
We can also DELETE in a similar manner:
delete from item_details where FK_order_no=7;
select * from item_details where FK_order_no=7;
Let's put those records back in now:
insert into item_details set FK_order_no=7, FK_item_no=2, count=32;
insert into item_details set FK_order_no=7, FK_item_no=1, count=27;
For more information on these MySQL commands, check the
MySQL Documentation.
Practice: See if MySQL checks for primary key constraints by
trying to insert a new record in a table with a primary key that has
the same primary key value as a record already in the table. Note that
foreign key constraints are only checked if you created the tables using
"type=InnoDB".
Doing Queries
We have already seen the use of the SELECT command. This is the command
we use to query tables. We can see all the fields in a table using "*"
or we can specify certain fields. We can also limit the output using
the WHERE clause. Here are some examples:
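As a minimal sketch against the tables created above (these particular queries are illustrative choices, not the only ones that fit), the three forms look like this:

```sql
-- every field of every record
select * from items;

-- only the listed fields
select name, phone from customer;

-- every field, but only for records that match a condition
select * from items where price > 500;

-- listed fields combined with a condition
select name, phone from customer where cust_no = 3;
```

The WHERE condition is checked against each record in turn, and only the records for which it is true are output.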
Practice: See if you can create SELECT statements to find the following data.
- All orders that include item #4 (only list the order numbers once). Answer: orders #1, 3, 8, 10.
- All orders for customer #2. Answer: orders #5 and 7
- Try entering a query that returns no results. How does MySQL communicate this?
We can also search within varchar fields using LIKE and the "%" wildcard:
select * from customer where address like "%West";
select * from customer where name like "%o%";
This last command outputs all records where there is an "o" anywhere
in the name field. Note that LIKE with a leading "%" wildcard cannot
use an index, so it is very slow on large tables - this is not something
you would do on the Ads database.
The real power of the SELECT command becomes evident when we
join
tables using foreign key relationships. In order to illustrate how this
works, we need to add some records to the orders table:
insert into orders set FK_cust_no=10;
insert into orders set FK_cust_no=11;
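Notice that we have entered orders for customers that do not exist in the customer table.
This only works if foreign key constraints are not being enforced; if you created the
tables with "type=InnoDB", MySQL will reject these two inserts.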
Take a look at the following command:
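A join of this kind might be written as follows. This is a sketch using the table and field names defined earlier; the choice of output fields and the ORDER BY clause are assumptions made for illustration:

```sql
-- pair each order with its customer by matching the foreign key
-- in orders to the primary key in customer
select order_no, cust_no, name, phone
from orders inner join customer on FK_cust_no = cust_no
order by cust_no;
```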
In this SELECT, we have combined the orders and customer tables using the
customer number. This is called an inner join. We are listing
fields from both tables, and the ON clause makes the connection between
the primary and foreign key. Notice that the records are listed in
customer number order. An inner join returns only the records from
the orders table that have a matching record in the customer table, so
the orders for customers #10 and 11 do not appear in the output.
Another type of join is the left join. It returns all the
records from the left table (orders) and the matched values from the right
table (customer). In this join, if there are records in the left table that
do not match records in the right table (orders with customers #10 and
11), then MySQL will output records for these orders with NULL values
for the customer fields.
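As a sketch, a left join of orders and customer could be written like this (the choice of output fields is an illustrative assumption):

```sql
-- keep every order, even when no customer record matches;
-- name and phone come back as NULL for unmatched orders
select order_no, FK_cust_no, name, phone
from orders left join customer on FK_cust_no = cust_no;
```

Swapping the order of the two tables changes the result: with customer on the left, every customer would be listed, including any who have placed no orders.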
Practice: See if you can create joins for the following:
- List the price field with the item_details fields.
- List the price field with the item_details fields, but ordered by
order number, not item number.
Another thing we can do with SELECT commands is GROUP BY. We can also
use mathematical functions on numeric fields. Try entering the following
commands and see if you can figure out what GROUP BY does:
select * from item_details;
select FK_order_no, sum(count) from item_details group by FK_order_no;
We are summing the count fields by order number. So, we can use the
GROUP BY clause to group together the values in a given column. We can
then use the HAVING clause to exclude groups.
select FK_order_no, sum(count) from item_details group by FK_order_no having sum(count) > 100;
This outputs only the rows where the sum of the counts is greater than 100.
For more on the use of mathematical functions in MySQL, and on
the use of GROUP BY and HAVING, see the MySQL documentation.
Practice: Can you list the total amount for each order as well
as the sum of the item counts? This requires a join along with GROUP BY.
Here is the solution to the practice exercise. It's important to
understand how this works - it illustrates some of MySQL's most powerful
features.
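One way to write it, offered as a sketch rather than the only possible answer, is to join item_details to items so that each line carries its price, and then group by order number. The column aliases total_count and total_amount are names chosen here for illustration:

```sql
-- one output row per order: the order number, the number of
-- items ordered, and the total value of the order
select FK_order_no,
       sum(count) as total_count,
       sum(count * price) as total_amount
from item_details inner join items on FK_item_no = item_no
group by FK_order_no;
```

The join runs first, producing one row per order line with its price attached. GROUP BY then collapses those rows by order number, and each sum() is computed over the rows in one group.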
Practice: See if you can design a SELECT statement that outputs
the three columns above, along with the customer name for each order.
Hint: It is possible to have successive joins in a single SELECT statement
as in:
select ... from item_details left join items on FK_item_no=item_no left join orders
on FK_order_no=order_no ...
A common need in working with the Ads database system is doing queries
on dates. We can add a date field to our orders table to indicate the
date the order was placed:
alter table orders add column order_date date;
Then we will populate it like this. MySQL's standard date format is
YYYY-MM-DD; it also accepts other punctuation, such as the slashes used
here, as the delimiter:
update orders set order_date="2006/11/01";
update orders set order_date="2005/11/01" where order_no < 6;
To do queries, we use comparison operators as in:
select * from orders where order_date < "2006/06/01";
For more information on the SELECT command, check the
MySQL Documentation.
Before we leave this section, we should mention that there are other
ways of setting up tables. Instead of marking foreign keys with an FK
prefix as we did above, it is also common to name the keys the same.
For example,
create table customer2 (cust_no int not null auto_increment, name varchar(40),
address varchar(40), phone varchar(12), primary key(cust_no)) type=InnoDB;
create table orders2 (order_no int not null auto_increment, cust_no int not null,
foreign key(cust_no) references customer2(cust_no), primary key(order_no)) type=InnoDB;
Then, when you do selects, you need to identify the table name with
cust_no, so MySQL knows to which one you are referring.
select * from orders2 inner join customer2 on customer2.cust_no = orders2.cust_no;
Transactions in MySQL
A transaction is a sequence of individual database operations
that are grouped together. A transaction will never be complete unless
each individual operation within the group is successful. If any
operation within the transaction fails, the entire transaction will fail.
Transactions are a relatively new addition to MySQL but not to relational
database systems in general.
A good example where transactions are useful is in banking. Say you
want to transfer $100 between two accounts. In order to deposit money
into one account, you must first take money from another account. Without
using transactions, you would have to do the following MySQL operations:
- Check that the balance of the first account is at least $100.
- Deduct $100 from the first account.
- Add $100 to the second account.
If we think of this sequence as a transaction, then if any one operation
fails, the whole transaction fails and we roll back, that is, the
tables and the data inside them revert to their previous state. If the
transaction is successful, we commit the changes to the database.
This is much easier than dealing with possible errors between each step.
For example, without transactions we need to skip steps #2 and #3
if the balance is less than $100; we need to skip step #3 if for some
reason, we were unable to deduct $100 from the first account; and so on.
Transactions can simplify the processing.
Transactions have the following four properties, usually referred to by the acronym ACID:
- Atomicity: An atom is meant to be something that cannot be divided. The operations that make up a transaction must either all be carried out, or none at all
(as with our banking example).
- Consistency: A successfully committed transaction takes the database from one valid state to another, with all defined rules and constraints still satisfied.
- Isolation: Data being used for one transaction cannot be used by another transaction until the first transaction is complete. This enables transactions to
operate independently of and transparent to each other.
- Durability: This ensures that the result or effect of a committed transaction persists in case of a system failure.
To use transactions in MySQL, you must use a transaction-safe table type. MyISAM, the default table type in MySQL versions before 5.5, does not support
transactions. BerkeleyDB and InnoDB are the transaction-safe table types available in open source MySQL, version 3.23.34 and greater.
Support for InnoDB tables requires a specific compilation parameter when compiling MySQL from source. If your MySQL installation supports InnoDB tables, simply add
a "TYPE=InnoDB" definition to the table creation statement. You can check if you have InnoDB support by entering the following command:
show variables like 'have_innodb';
In MySQL, transactions begin with the statement BEGIN WORK or START TRANSACTION and end with either a COMMIT or a ROLLBACK statement. The SQL commands between the
beginning and ending statements form the operations of the transaction. Going back to the banking example, the following statements:
update account set balance = balance - 100 where acctnumber = 1;
update account set balance = balance + 100 where acctnumber = 2;
are written as transactions like this:
start transaction;
update account set balance = balance - 100 where acctnumber = 1;
update account set balance = balance + 100 where acctnumber = 2;
commit;
The updates are done in one transaction so that both must complete successfully before the changes are committed to the database. If either update fails, you can
issue a rollback statement to undo the changes. For more information on MySQL transactions, refer to the
MySQL Documentation.
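To see a rollback in action, here is a minimal sketch. The account table definition and the starting balances are assumptions made for this example; only the field names come from the statements above. It requires a transaction-safe table type:

```sql
-- Hypothetical account table for experimenting with transactions.
-- On old servers, write type=InnoDB instead of engine=InnoDB.
create table account (acctnumber int not null, balance int,
    primary key(acctnumber)) engine=InnoDB;
insert into account set acctnumber=1, balance=500;
insert into account set acctnumber=2, balance=200;

start transaction;
update account set balance = balance - 100 where acctnumber = 1;
-- inside the transaction, this session sees the reduced balance
select * from account;
-- abandon the transfer; the deduction is undone
rollback;
-- both accounts are back to their starting balances
select * from account;
```

If the table were not transaction-safe, the ROLLBACK would have no effect and the deduction would remain, which is a quick way to check which table type you are actually using.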
Finishing Up
When you are ready to leave MySQL, just type "quit". When you come back, just remember that you have to select your database before you can access the tables as
in "use plastronics;". You won't need to create the database or the tables again - everything will be there when you return.
One other quick tip: In many MySQL interfaces, you can use the up-arrow key to access a list of commands that you have just entered. This can save a lot of time
if you are experimenting with queries.
Additional Examples and Exercises
The following websites have MySQL tutorials, if you need more practice or want to get into more of the details.
MySQL Tutorial
Database Journal MySQL Series
References
MySQL Documentation