






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Normalization, Goals of normalization, Result of normalization, Levels of normalization, Purist approach of normalization, Student database system, Update Anomalies are discusses points of document. This handout is for data warehousing course from Virtual University of Pakistan.
Typology: Lecture notes
1 / 12
This page cannot be seen from the preview
Don't miss anything!







Virtual University of Pakistan
Before we begin our discussion about the normal forms, it's important to understand that what we will discuss are guidelines and guidelines only i.e. not a dejure standard. Sometimes (especially in a DSS environment), it becomes necessary to drift from this purist approach to meet practical business requirements of performance. However, when deviation take place, it's extremely important to evaluate any possible consequences these deviations would cause and the corresponding inconsistencies and anomalies this is what de-normalization is about.
What is normalization?
What are the goals of normalization?
Eliminate redundant data. Ensure data dependencies make sense.
What is the result of normalization?
What are the levels of normalization?
Always follow purist’s approach of normalization? NO
Virtual University of Pakistan
Normalization is the process of efficiently organizing data in a database by decomposing (splitting) a relational table into smaller tables by projection. There are basically two goals of normalization as follows:
Both of these are worthy goals, as they reduce the amount of space a database consumes, and ensure that data is logically stored and is in third normal form (3NF). The database community has developed a series of guidelines or benchmarks for ensuring that databases are indeed normalized. These goals are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). The first two normal forms are intermediate steps to achieve the goal of having all tables in 3NF. Note that in order to be correct, decomposition must be lossless i.e. the new tables can be recombined by a natural join to recreate the original table without creating any false or redundant data and without any loss of any data/information.
Consider a student database system to be developed for a multi-campus university, such that it specializes in one degree program at a campus i.e. BS, MS or PhD.
Virtual University of Pakistan
Table FIRST Figure-6.2: Part of example student database system in 1NF
Although the table is shown in 1NF i.e. it contains atomic values, there are no repeating values, there is no aggregation, yet it contains redundant data. For example, information about the student’s degree, and campus location is repeated for every course taken. Redundancy causes what are called update anomalies. Update anomalies are problems that arise when records are inserted, deleted, or updated in the database. For example, the following anomalies can occur:
Normalization: 1NF Update Anomalies
INSERT. Certain student with SID 5 got admission in a different campus (say) Karachi cannot be added until the student registers for a course.
DELETE. If student graduates and his/her corresponding record is deleted, then all information about that student is lost.
UPDATE. If student migrates from Islamabad campus to Lahore campus (say) SID = 1, then six rows would have to be updated with this new information.
Virtual University of Pakistan
INSERT. The fact that a certain student with SID 5 got admission in a different campus (say) Karachi cannot be added until the student registers for a course.
DELETE. If student graduates and his/her corresponding record is deleted, then all information about that student is lost.
UPDATE. If student migrates from Islamabad campus to Lahore campus (say) SID = 1, then six rows would have to be updated with this new information.
Normalization: 2NF
Every non-key column is fully dependent on the PK.
FIRST is in 1NF but not in 2NF because degree and campus are functionally dependent upon only on the column SID of the composite key (SID, course). This can be illustrated by listing the functional dependencies in the table:
SID —> campus, degree
campus —> degree
(SID, Course) —> Marks
To transform the table FIRST into 2NF we move the columns SID, Degree and Campus to a new table called REGISTRATION. The column SID becomes the primary key of this new table.
The definition of second normal form states that only tables with composite primary keys can be in 1NF but not in 2NF.
SID and Campus are NOT unique
Virtual University of Pakistan
To transform the table FIRST into 2NF we move the columns SID, Degree and city to a new table called REGISTRATION. The column SID becomes the primary key of this new table.
Normalization: 2NF
Presence of modification anomalies for tables in 2NF. For the table REGISTRATION, they are:
INSERT: Until a student gets registered in a degree program, that program cannot be offered!
DELETE: Deleting any row from REGISTRATION destroys all other facts in the table.
Why there are anomalies?
The table is in 2NF but NOT in 3NF
Tables in 2NF but not in 3NF still contain modification anomalies. In the example of REGISTRATION, they are:
Virtual University of Pakistan
Normalization: 3NF
All columns must be dependent only on the primary key.
Table PERFORMANCE is already in 3NF. The non-key column, marks, is fully dependent upon the primary key (SID, degree).
REGISTRATION is in 2NF but not in 3NF because it contains a transitive dependency.
A transitive dependency occurs when a non-key column that is a determinant of the primary key is the determinate of other columns.
The concept of a transitive dependency can be illustrated by showing the functional dependencies:
REGISTRATION.SID —> REGISTRATION.Degree REGISTRATION.SID —> REGISTRATION.Campus REGISTRATION.Campus —> REGISTRATION.Degree
Note that REGISTRATION.Degree is determined both by the primary key SID and the non-key column campus.
For a relational table to be in third normal form (3NF) all columns must be dependent only upon the primary key. More formally, a relational table is in 3NF if it is already in 2NF and every non-key column is non transitively dependent upon its primary key. In other words, all non-key attributes are functionally dependent only upon the primary key. Or put another way, all attributes must be directly dependent on the primary key without implied dependencies through other attributes of the relation.
Virtual University of Pakistan
Figure-6.4: Part of example student database system in 3NF
Figure-6.4 shows that the table REGISTRATION is transformed into two tables i.e. Student_Campus and Campus_Degree. If we look at it in the context of memory, we observe that the storage space requirement has increased, for this particular example by about 7%.
Normalization: 3NF Removal of anomalies and improvement in queries as follows:
INSERT: Able to first offer a degree program, and then students registering in it.
UPDATE: Migrating students between campuses by changing a single row.
DELETE: Deleting information about a course, without deleting facts about all columns in the record.
Virtual University of Pakistan
Observe that by virtue of bringing a relational table into 3NF, storage of redundant data is further eliminated, that not only results in saving of space, but also reduces manipulation anomalies. For example, as a consequence following improvements have taken place:
INSERT: If Peshawar campus decides to offer a PhD program, this can be reflected even though there is no student currently registered in that campus. Similarly, facts about new students can be added even though they may not have registered for a course.
UPDATE: Changing the campus of a student (by migration) or a degree program of a campus requires modification of only a single row.
DELETE: Information about courses taken can be deleted without destroying information about a student or a campus.
Conclusions
Normalization guidelines are cumulative.
Generally a good idea to only ensure 2NF.
3NF is at the cost of simplicity and performance.
There is a 4NF with no multi-valued dependencies.
There is also a 5NF.
Normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.