Notes taken from Brian Green’s excellent video on normalization on youtube : https://www.youtube.com/watch?v=QqlPXKxN6LQ
Normalisation is a process of removing anomalies from a database.
These anomalies include: insertion, update and deletion anomalies.
Are when you can’t enter a record because something is missing.
In the above, imagine an employee joining a company, there would likely be no project for that person, therefore, they can’t be put into the database. This can happen in one-to-one relationships.
This occurs when you want to change something in the database and you have to update several fields in several tables.
In the above table, if we wanted to change the project name DB245 to a different name, we have to update several rows = not good, as this can cause the introduction of errors into the database.
Occurs when deleting a record automatically loses other data.
Once again, using the above table, if we delete the row with Lisa Sanderson in it, we will lose the ProjectName DB124. This isn’t good because another employee might be doing this project next.
This table is not normalised
1st normal form
Look at tracks column. 1st NF involves getting rid of all repeating groups and arrays. Basically any time that we’re listing anything horizontally, this should not happen and is not acceptable in database design.
- lists of things separated by a comma
- or enumarated fields like name1, name2, name3
1st NF says each column should only contain one data type.
So we can create a temporary table as follows.
Tracks are no longer listed as before and each is separated into individual rows. Now we can see a lot of redundacy in the data.
2nd Normal Form removes functional dependencies
‘Functional dependencies’ are groups of columns that depend on each other rather than the key of the table.
3rd normal form is about removing transient dependencies
‘Transient dependencies’ are when a field depends more on another column for its meaning than the table key.
e.g. an ArtistCountry does not depend on the track, it depends on the artists itself.
An ERD to 3NF would look like this.
When to have a one to many relationship, you will take the primary key from the ‘one side’ and make a foreign key (on the many side).
Example 1NF to 3NF
1NF problem: phone numbers going horizontally. The same data type is repeating (the phone number).
This could be consolidating into 1 column, but now there’s redundancy.
In the above though, we do now have 1NF.
Every phone number in the table below is dependent on it’s primary key, therefore there are no function dependencies.
We’ve eliminated repeating sets on the horizontal direction, and the repeating sets that occurred in the vertical direction.
This complies to 2NF
Now take a look at the table, and ask, “is there anything in that table that is not dependent on the person itself”.
Yes. The address, city and state are a hidden entity. That is an entity itself. It’s not wholely dependent on the ID.
You could have multiple people living at the same address. There is a transient dependency that we have to fix. To fix this, we take the address and move it to a new table.
The above now complies to 3NF.
However, we can keep doing this. If we look at the address table, we can now see the ‘State’ has repeating values. We could keep going…. i.e.
If you keep doing this, you could end up with hundreds of tables. The more you normalise a database, the long it takes to put information in and get information out.
Denormalization is stepping back a little, and thinking about whether something really needs to be normalized or not (like the example above), knowing that another JOIN will have to take place when getting data out. Denormalization is usually done for performance reasons/ and or simplicity of working with the tables. It’s the process of recombining tables that were split in the normalization process.
Final design mostly in 3NF