SQL Notes

What is a Relational Database?

It is a organized, self-describing collection of data, based upon the theory of relational algebra. A DBMS (database management system) is software that provides a general solution to the constantly recurring problem of organizing storage and retrieval of data. Before there were DBMS's, every inventory, payroll, liabrary catalogue or collection of expermimental data  would require numerous special purpose programs to create and maintain files of particular data.

Any particular (relational) database consists of a set of tables (rows and columns, with each entry holding a single datum. Each column holds entries of a specific data type and logical meaning. For example, the table for a personal phone book may have columns for name and phone number, with each row giving this information for one person. The database will also have a schema, a description of the table, with the name and type of each column, as well as constraints on the data.

In retrieving data,  information from several tables may be combined (joined)  based upon correspondence of data items. Particular results may be selected, and ordered as desired. The result of any query is again a table of rows and columns.
Any DBMS provides the following services:

  1. Storage of data, and protection against loss.
  2. Transaction management. Many users may access a database concurrently, without fear of interference. Each transaction is considered atomic, that is it must totally succeed and all changes safely stored, or else roll back, leaving the database unchanged.
  3. A standard, high-level query language, SQL, for defining tables, modifying data therein, and querying (retrieving) data.
  4. Flexibility: the structure of a table can be modified, (by adding a column, for instance) even after it contains a lot of data.
  5. An interface to programming languages, such as PHP.

What is SQL?

SQL is the Standard Query Language for relational databases. It is understood by any relational DBMS, be it PostgreSQL, MySql, Oracle, or any other. Thus, any programming involving a database can be easily switched to use a different DBMS, if the need arises. SQL is described in any Database book, of which there are many in the labrary, and several also in room J-9. Documentation and examples also abound on the web, in particular on the websites of software providers, such as mysql.com and postgresql.org.
SQL has statements for data definition(CREATE TABLE, ALTER TABLE), data modification (INSERT, UPDATE, DELETE) and data retrieval, or query (SELECT, COPY).
The language is case insensitive, both in regard to keywords and names of objects (tables, columns, etc.). That is, SELECT can also be written select or even SeLeCT. There are a number of keywords, in what follows I wll attempt to emphasize them by using UPPERCASE, and leave example names in lowercase. However, this is not necessary in typing queries. (It is possible to force names to be case sensitive, and include spaces, by enclosing in "double Quotes", but I recommend against doing so. Beware, various "admin" type programs will do this "service" silently for you.)
Data values are either numbers, boolean (true/false), or else character strings, which must be enclosed in 'single quotes'. (Date and time types are input as strings, then stored specially.)

The value NULL

SQL has a special value, NULL, which stands for unknown, or "Does not apply." This would be the appropriate value of "dogs_name" for a person who does not have a dog, or whose dog's name is unknown to us. Comparisons with NULL always give the boolean value unknown - neither true nor false. Such a person would not appear in a list of people with dogs named 'Fido' nor a list of people with dogs not named 'Fido' (ie. known to have a dog with a different name.)
null=null is itself unknown. You must test for a null with is null instead.

Note hovever that PHP does not normally make a distinction between an SQL null, and a string with no characters ( a dog with the unspeakable name of  "" perhaps.) If NULL information needs to be maintained, special care must be taken, by calling pg_field_is_null.

Data Definition

CREATE TABLE

The create table statement defines a new table, by giving a list of the columns. Give each column a name, a type, and possibly other information, such as constraints or a default value. These can be added or changed later, if needed.

  1. name: How you will refer to the column. Must start with a letter, may include underscore (_). Make it meaningful.
  2. Type: Common types are: INTEGER, CHARACTER(), CHARACTER VARYING(), TEXT, BOOLEAN, REAL, NUMERIC, DATE, TIME, TIMESTAMP (complete date+time). Put a maximum length inside the (). The first 3 may be abbreviated to INT, CHAR, VARCHAR.
  3. DEFAULT is optional, and is stored if no value is given. Otherwise, if no value is given, NULL will be stored.
  4. Constraints. There are a number different rules. PRIMARY KEY means that this column must be unique in the table, so it can be used to identify a single row. CHECK gives a condition on acceptable values. NOT NULL forbids null values. REFERENCES means that the value must exist in another table.

For example, Bishop's students have attributes of name, birthdate, major program, email address, and box number. Since the University assigns a unique email address to each student, this is a candidate for primary key. Students may participate in a number of sports. Since they are not limited to one sport, this will require an additional table, with the email address refrencing the student.

CREATE TABLE students (
name VARCHAR(50) NOT NULL,
birthdate DATE,
major CHAR(6),
email VARCHAR(20) PRIMARY KEY,
box INT );

CREATE TABLE sports (
student VARCHAR(20) NOT NULL
REFERENCES students(email) ON DELETE CASCADE,
sport VARCHAR(30) NOT NULL );

So, any value for sports.student must alroady exist in students.email. CASCADE says that should a student be deleted from the students table, then the rows in sports for that student should also be deleted. Should an email need to be changed, and the student is in the sports table, the change would not be allowed, by default. However, we do not forsee this happening.

Data Modification

INSERT

Add a new row to a table. If the list of columns is omitted, the values must be arranged in the order given in the create table statement. If a value is not given, NULL is stored. We forgot to ask John for his birthday. He wants to play rugby.

INSERT INTO students (name, email, box, major) 
VALUES ('John Doe', 'jdoe09', 1234, 'MAJCSC');
INSERT INTO sports VALUES ('jdoe09', 'rugby');

UPDATE

Change the values in an existing row. Caution: Unless you use a WHERE clause, with the primary key, you may be updating more rows than you think!
Example: We now know John's birthday, and need to correct his box number.
UPDATE students SET birthdate = '1984-12-25', box=1336 
WHERE email = 'jdoe09' ;

DELETE

Remove rows from a table. Caution! This is not limited to one row, and can be used to emty an entire table. John has left the rugby team (but not any other team he may be on):

DELETE FROM sports WHERE student = 'jdoe09' AND sport = 'rugby' ;

SELECT

SELECT performs a query and returns a result, which is always a table (with rows and columns). A program will have to loop through the rows, and deal with each one as an array. The statement has 3 main clauses, and a number of optional ones. They are:

SELECT list of columns, or * for all
FROM table(s)
WHERE condition(s)
GROUP BY -- creates aggrates: count, sum, avg can appear in SELECT
HAVING -- condition on the grouping
ORDER BY list of cloumns
LIMIT count
OFFSET start

Examples: We want to get the names of all the computer science students, both majors and honours.
We want to get contact information for members of the rugby team, from youngest to oldest (earliest birthdate):

SELECT name FROM students 
WHERE major = 'MAJCSC' OR major = 'HONCSC' ;

SELECT name, birthdate, email, box
FROM students, sports
WHERE email = student -- important! join condition
AND sport = 'rugby'
ORDER BY birthdate ASC;


Now, see how to incorporate SQL into a PHP program