Database Design: Normalization

Database Design: Normalization

Producing a database can result in tables that do not behave very well. For example, you might end up storing the same information in two or more different places. Or you might end up deleting useful information when other information is deleted. There is a process of removing these problems known as ‘normalize This sounds very technical, and the definitions are cumbersome. However, the basic idea when you understand it is very straightforward

The ideas of normalization were developed by Codd in the 1970s. The principles have been around for much longer, but he formalized them and gave them a definition. There are lots of ‘normal forms’ We shall only consider the first three, which are the most common and the most useful.

FIRST NORMAL FORM

A table is in first normal form if it has a fixed number of columns and if it has fields that are simple types such as integer, date, or string. Though that sounds straightforward enough, entities in the real world are much more complex than that. As we have seen, an order might have lots of repeating information. Thus, to represent it, we might have to split the repeating information into separate tables and link the tables with keys. Further, if an attribute is a complicated structure, such as another object, then we will have to split that out into another object too.

SECOND NORMAL FORM

The second normal form aims at simplifying tables in such a way that data is not stored more than is necessary, and that potentially useful data is not lost if records are deleted Consider the table

registration(student Number, student Name, course number, course name) The fields student Number and course lumber are together the keys on this table, so they are underlined. This is a perfectly reasonable table, as you can see who is registered for what course. However, it does present potential problems. Firstly, student names will be repeated for every course that she is registered on

This wastes space, and if the student’s name has to be changed, then all of the relevant registration records need to be changed. The same applies to courseName too Secondly, if the only place that student Name and courseName are recorded is in the registration record, then if a particular student is not registered for any courses, there is no record anywhere of her name, and if no student is registered on the course there is no record anywhere of the course.

We can remove these problems by ‘normalizing the table The definition for the second normal form says that ‘every nonkey attribute must be fully dependent on the key. That means that every bit of the key is needed to work out each attribute. If only part of the key is needed to determine the value of the attribute then it is not in second normal form and we need to split the table up. Now if we look at the student Name, we see that it depends only on the student Number, and not on both the student Number and course number Therefore student Name is not fully dependent on the key – it depends on part of it. So we take sstudent’sName out of the record and put it in a separate table with the part of the key that determines it. registration (student Number, course number, course name) student (student Number, student Name)

This is a better structure for three reasons. Firstly you can now keep information about a student when the student has not been registered for a course. If a student is no longer registered you do not lose the student’s name. And finally, the student name is stored in only one place, so that it is easier to change the student name (you only have to change the one record) and there is less to store. If we continue, we see that the course name depends on the course number, and so again is not fully dependent on the key. We do the same thing and split out course name, and get some similar benefits for course information registration (student Number, course number) student (student Number, student Name)course namee)

The full definition is a bit more complicated than this, as there may be several candidate keys for a database table. However, this is an introduction, and the full intricacies are left for further study. Normalization thus increases the number of tables. However, in practice, it normally reduces the amount of information stored in the tables themselves, and therefore the amount of space taken up on the disk 1253 THIRD NORMAL FORM sounds even more complicated when you define it, but it is very similar. As before, it is trying to remove the need to store information unnec

A field is said to be transitively dependent on the key if it is dependent on a non-key attribute that is dependent on the key. To see what this means, we need to consider an example Consider the following table

order (order number, order date, customer number, customer name) Here customer name depends on the key order Number, but that is because the customer name depends on the customer Number. This is called a ‘transitive depend on ency’ So what we do is split out the customer name into another table, giving us no tables as follows. order (order Number. order date, customer number) customer (customer Number, customer name)

Now we have the advantage that we can store information about customers with no current orders, and we only store the customer name in one place. 1154 CONCLUSION ON NORMALIZATION Normalization is done to improve the structure of the database. It is appropriate

to do this at the design stage, as it is sometimes useful to feed this decomposition back up to the class level Primarily it achieves the following

1. It prevents storing the same information twice, by repeating the same information over and over again in a table or multiple tables 2. It makes updating easier because the information is usually stored in only one place.

3. It prevents accidental deletion of data, such as removing vital customer info motion when removing order information

If there is a need to see the information in the unnormalized form, query languages will permit this. They will combine the normalized tables and present them back as unnormalized views. So SQL can reconstruct an unnormalized view of the data without changing the way it is stored.

The definitions of second and third normal forms of Metalwho look more complicated than they are. Once you have an intuitive grasp of the reasons for normalization and the steps to normalize, then it becomes automatic and straightforward. The Process of normalization is to split tables down until the repeated storage of information is at a minimum.

Share