Database Normalization: A Comprehensive Guide to Minimizing Data Redundancy | Study notes Database Management Systems (DBMS)

NORMALIZATION

Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes to tables based on the concept of determination. Normalization works through a series of stages called normal forms. The first three stages are described as first normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of view, 2NF is better than 1NF, and 3NF is better than 2NF. For most purposes in business database design, 3NF is as high as you need to go in the normalization process.
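The concept of determination can be illustrated with a small check: attribute set A determines attribute set B if rows that agree on A always agree on B. The following is a minimal sketch only. The function name `determines`, the attribute names, and the sample rows are assumptions made up for illustration; they are not taken from any figure in these notes.

```python
def determines(rows, lhs, rhs):
    """Return True if the attributes in lhs determine the attributes in rhs,
    i.e. no two rows share the same lhs values but differ in rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this key; any later
        # mismatch means lhs does not determine rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample rows (illustrative values only).
rows = [
    {"EMP_NUM": 101, "JOB_CLASS": "Engineer", "CHG_HOUR": 80.0},
    {"EMP_NUM": 102, "JOB_CLASS": "Technician", "CHG_HOUR": 35.0},
    {"EMP_NUM": 103, "JOB_CLASS": "Engineer", "CHG_HOUR": 80.0},
]

emp_determines_job = determines(rows, ["EMP_NUM"], ["JOB_CLASS"])
job_determines_emp = determines(rows, ["JOB_CLASS"], ["EMP_NUM"])
```

In this sample, each employee number maps to exactly one job class, so the first check succeeds. Two employees share the job class "Engineer", so the job class does not determine the employee number. Note that a check over sample data can only disprove a dependency; whether one truly holds is a matter of business rules.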

THE NEED FOR NORMALIZATION

To get a better idea of the normalization process, consider the simplified database activities of a construction company that manages several building projects. Each project has its own project number, name, employees assigned to it, and so on. Each employee has an employee number, name, and job classification, such as engineer or computer technician.

The company charges its clients by billing the hours spent on each contract. The hourly billing rate is dependent on the employee's position. For example, one hour of computer technician time is billed at a different rate than one hour of engineer time.
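The billing rule above can be sketched as a lookup from job classification to hourly rate, so that each rate is recorded in exactly one place. This is an illustrative sketch only: the names `HOURLY_RATE` and `charge`, and the rates themselves, are invented for the example and do not come from the company's data.

```python
# Hypothetical hourly rates, one entry per job classification.
HOURLY_RATE = {
    "Engineer": 80.00,
    "Computer Technician": 35.00,
}


def charge(job_class, hours):
    """Amount billed for `hours` of work by an employee in `job_class`."""
    return round(HOURLY_RATE[job_class] * hours, 2)


engineer_bill = charge("Engineer", 10)
technician_bill = charge("Computer Technician", 10)
```

Because the rate depends on the position rather than on the individual employee or project, changing a rate here means changing one entry instead of every row in which that job classification appears.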

FIGURE 5.1 Tabular representation of the report format

Note that the data in Figure 5.1 reflects the assignment of employees to projects. Apparently, an employee can be assigned to more than one project. For example, Darlene Smithson (EMP_NUM = 112) has been assigned to two projects: Amber Wave and Starflight. Given the structure of the data set, each project includes only a single occurrence of any one employee.

Therefore, knowing the PROJ_NUM and EMP_NUM value will let you find the job classification and its hourly charge. In addition, you will know the total number of hours each employee worked on each project.
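The idea that PROJ_NUM and EMP_NUM together identify a row can be sketched with a dictionary keyed on that pair. This is a rough illustration, not the contents of Figure 5.1: apart from Darlene Smithson's employee number and her two project names, every value below (project numbers, the second employee, job classes, rates, hours) is a placeholder assumed for the example, as are the column names other than PROJ_NUM and EMP_NUM.

```python
# Key: (PROJ_NUM, EMP_NUM). A dict cannot hold the same key twice, which
# mirrors "each project includes only a single occurrence of any one employee".
assignments = {
    (18, 112): {"PROJ_NAME": "Amber Wave", "EMP_NAME": "Darlene Smithson",
                "JOB_CLASS": "Analyst", "CHG_HOUR": 45.0, "HOURS": 20.0},
    (25, 112): {"PROJ_NAME": "Starflight", "EMP_NAME": "Darlene Smithson",
                "JOB_CLASS": "Analyst", "CHG_HOUR": 45.0, "HOURS": 12.5},
    (18, 104): {"PROJ_NAME": "Amber Wave", "EMP_NAME": "Example Employee",
                "JOB_CLASS": "Engineer", "CHG_HOUR": 80.0, "HOURS": 8.0},
}


def lookup(proj_num, emp_num):
    """Return (job class, hourly charge, hours) for one project assignment."""
    row = assignments[(proj_num, emp_num)]
    return row["JOB_CLASS"], row["CHG_HOUR"], row["HOURS"]


job_class, chg_hour, hours = lookup(18, 112)
```

Neither value alone is enough: EMP_NUM 112 appears under two projects, and project 18 has two employees, so only the combination picks out a single row.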

Unfortunately, the structure of the data set in Figure 5.1 does not conform to the requirements discussed in Chapter 3, nor does it handle data very well. Consider the following deficiencies:

1. The project number (PROJ_NUM) is apparently intended to be a primary key or at least a part of a PK, but it contains nulls. (Given the preceding discussion, you know that PROJ_NUM + EMP_NUM will define each row.)
