What is a Database?

A relational database is the organization of data into logical, related groups of information. This data is separated into pieces called tables. A table keeps related information together. For example, a business database may have a product table, salesman table, and customer table. Each of these tables is comprised of a set of records. A record is all the necessary information about one item in the table, grouped together. For example, a record of a customer may include their first name, last name, address, phone number and a customer number. These pieces of information within a record are called fields. In a database, each field has a name and a type. So, their first name, last name, street address, city and town may be of the type string, while the zip, phone and customer number may be of the type int. Typically, one of the fields in each record is a unique value that no other record in the table will have. This field is known as the primary key and both expedites searching and reduces doubling of information.

Normalizing a Database

When planning a database there are several items that need to be considered: how many tables will be needed, how will they be related, what type of information needs to be stored, etc. Let’s start by planning a database for a business accounting department. Our accounting database will need the following information: invoice number, invoice date, invoice amount, invoice description, date invoiced, client information, expense amount, expense category & description, and expense date. Knowing what information is needed is the first, and most important step in creating a database. Next, we proceed to the process of normalization. Normalization is the process of organizing data by a series of rules. One of the first steps in normalization is to determine the primary key for the database. In other words, what is the one field that we know will be unique in this database? Primary keys must be unique to an entry. For this reason, when storing information about users, the primary key is often an email address. Other times, a unique, auto-incrementing field is created to serve as the primary id (e.g. a student id number).

We now proceed to putting our database into first normal form. The first normal form of a database ensures that each column, or field, will contain only one value. The fields that we listed earlier fail this test because client information will need names, address, city, state, zip and phone. And, the expense category and description will need to be separated. The first normal form, or 1NF process tells us to identify any fields that contain more than one piece of information, break those fields up into separate fields, and then check that our new fields pass the 1NF test.

Second normal form(2NF) deals with how the field relate to each other. The requirements for second normal form are that the data is in 1NF and that any fields in the table relate only to the primary key. In our example, since the primary key is the invoice number, the only things that relate directly to the primary key are the fields that deal with the invoice. So, we must now make new table accordingly so that each “group” of information stays together. Thus, we will create tables for invoice, customer and expense. This is a recursive process because as we generate more tables, we must make sure that each of them, in turn, are in 1NF and 2NF.

There does exist a third normal form, but it is out of the scope of this class.

Field Planning

Now that our database is normalized, there are a few more things that we need to do to prepare for implementation. We need to determine each table’s name, each fields name and each fields data type. I will start by naming all of my fields and tables. Typically, just like programming variables we must use letters, numbers and underscores, must start with a letter, and must be less than 64 characters. Typically, I start tables with capitals and keep them plural to show multiple records, while fields start with lowercase and each subsequent word is then capitalized. Also, any foreign or primary keys get marked with id at the end. You can use whatever naming conventions you want, as long as you stay consistent. Consistency will be key and help keep you sane when you begin tying your SQL into your PHP code. Once this has been accomplished, we need to assign field types for each of our fields. Here are the different values that we can use:
CHAR[length] a fixed-length field from 0 to 255 characters
VARCHAR(length) a fixed-length field from 0 to 255 characters
TINYTEXT a string with a max length of 255 characters
TEXT a string with a maximum length of 65,535 characters
MEDIUMTEXT a string with a max length of 16, 777, 215 characters
LONGTEXT a string with a max length of 4,294,967,295 characters
TINYINT[length] -128 -> 127
SMALLINT[length] -32,768 -> 32,767
MEDIUMINT[length] -8,388,608 à 8,388,607
INT[length] -2,147,483,648 à 2,147,483,647
BIGINT[length] -9,223,372,036,854,775,808 à 9,223,372,036,854,775,807
FLOAT a small number with a floating decimal point
DOUBLE a large number with a floating decimal point
DATE in the format of YYYY-MM-DD
DATETIME in the format of YYYY-MM-DD HH:MM:SS
TIMESTAMP in the format of YYYYMMDDHHMMSS; up to year 2037
TIME in the format of HH:MM:SS
Other values that can be used when defining field types are as follows: