In an effort to provide information for those of you who are
newcomers to databases, especially with the wonderul new Sql Server
Express, I have committed myself to about half a dozen posts or so
targeted directly at you to help you understand and build better
databases. Stay tuned to this blog for weekly updates to this
series, or subscribe to the rss feed.
Normalization: organizing data in your database is
an efficient way for both you and the database to use and
understand. What this really means is that normalization helps us
to elimate redundant data and to store the data we have in a manner
that makes sense. Last week we talked about ACID,
and you will recall the “atomicity” of a database. Atomicity also
must be maintained in tables via normalization. Normalization
helps us to create atomic tables by removing redundant data within rows
of a table, thus creating atomic tables.
Guidelines exist for normalizing data (referred to as normal forms)
and exists as 5 sets of rules, labled first normal form through fifth
normal form. In most practical applications, you rarely see 4th
or 5th normal forms (herein referred to as 1NF-5NF), so we’re going to
look at 1NF, 2NF and 3NF, along with denormalized data.
Denormalized Data
Let’s take a look at some denormalized data. I’m going to use
the simplest example that comes to mind, and that is a auto repair shop
that specializes in domestic automobiles (domestic to the USA, that
is). Our table lists automobile manufactuers and the makes of
automobiles they manufactuer that the shop services.
Fig. 1 shows us denormalized data. For GM, we didn’t have
enough columns to store all the data we wanted, so we put Buick in with
Saturn.
First Normal Form (1NF)
We want to take fig. 1 and normalize it a bit. Before we can
do that, we need to know what the rules of 1NF are. There
are 2 rules to 1NF.
- Remove duplicate columns.
- Each column by row position must have a unique value.
As you can see, 1NF is already enforcing the atomic nature of a
table. So first we need to remove duplicate columns from the
table, at the same time creating column by row positions that have
unique values, which means we also have to seperate Saturn, Buick into
individual values.
Fig. 2 meets the 2 rules of 1NF. We have normalized our Fig. 1 data into First Normal Form.
Second Normal Form (2NF)
For each level of normal form, the next level must adhere to the
rules of the previous level, in addition to adding its own rules.
This means in order to create a data structure that meets the
requirements of 2NF, we must first meet the requirements of 1NF.
We have already discussed the rules for 1NF, so let’s look at the rules
required to meet 2NF.
- Remove duplicate data in a single column and place the data in seperate tables.
- Create relationships between the sets of data.
In Fig. 2, if GM gets bought by another manufactuer, say Honda, then
we’d have to replace each row in the table with Honda everywhere we
find GM. That is a bit inefficient. By meeting the 1st rule
of 2NF, we can replace GM with Honda in our manufactuer table in Fig.
3, and this is must more efficient and a logical storage of data.
In order to meet the 2nd rule of 2NF, we must have keys and
relationships. The ManufactuerId in the manufactuer table is a
Primary Key. It is a unique identifier used to reference the
description for the manufactuer. We create a relationship beween
it and the Make table using Manufactuer column in the Make
table as the foreign key. Fig. 4 shows the database diagram for
this. Now we are starting to adhere to consistency of our ACID model. Fig. 3 satisfies all the rules of 1NF and 2NF. We have normalized our Fig.2 to Second Normal Form.
Third Normal Form (3NF)
Again, in order to meet the rules of 3NF, we must first meet the
rules of 1NF and 2NF. In addition to meeting these requirements,
3NF introduces 1 other rule:
- All columns that are not dependent on the primary key must be removed.
Sounds simple enough. First, lets take our Make table from
Fig. 3 and add a column to it that shows how much the auto repair shop
charges per hour for each different make, and also a column that shows
if the shop works on cars or trucks for each particular make.
Not let’s examine the data and see if they meet the requirements of
3NF. First, we must consider that Make is the primary key in this
table. It is a unique value that can be used as an
identifier. Ideally, that would be an integer column called
MakeId and Make would be a description column. We’ll look at that
in a moment. Knowing that Make is the primary key, are all
columns in the Make table dependent on the primary key?
Manufactuer: yes. The manufactuer depends on the make of the
automobile. Style? Yes, the style the shop works on is
dependent on the make of automobile. Charge? Hrm. If
we look closely, it appears as though the charge is dependent on the
style, not the make. This table does not meet 3NF. We must
remove the charge column and relate it to the Style, not to the
make. We do this by creating another table called charge.
Now we can remove charge from the Make table in Fig. 5, because they
style that is dependent on the make, gives us the charge. The
complete 3NF structure when all is said and done looks like Fig. 7
below.
Next topic in this series: Isolation