



Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Normalization Process in 1NF, 2NF and 3NF
Typology: Study notes
1 / 6
This page cannot be seen from the preview
Don't miss anything!




Normalization is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies. The normalization process involves assigning attributes to tables based on the concept of determination. Normalization works through a series of stages called normal forms. The first three stages are described as first normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of view, 2NF is better than 1NF, and 3NF is better than 2NF. For most purposes in business database design, 3NF is as high as you need to go in the normalization process. THE NEED FOR NORMALIZATION To get a better idea of the normalization process, consider the simplified database activities of a construction company that manages several building projects. Each project has its own project number, name, employees assigned to it, and so on. Each employee has an employee number, name, and job classification, such as engineer or computer technician. The company charges its clients by billing the hours spent on each contract. The hourly billing rate is dependent on the employee’s position. For example, one hour of computer technician time is billed at a different rate than one hour of engineer time.
FIGURE 5.1 Tabular representation of the report format Note that the data in Figure 5.1 reflects the assignment of employees to projects. Apparently, an employee can be assigned to more than one project. For example, Darlene Smithson (EMP_NUM = 112) has been assigned to two projects: Amber Wave and Star flight. Given the structure of the data set, each project includes only a single occurrence of any one employee. Therefore, knowing the PROJ_NUM and EMP_NUM value will let you find the job classification and its hourly charge. In addition, you will know the total number of hours each employee worked on each project.
Unfortunately, the structure of the data set in Figure 5.1 does not conform to the requirements discussed in Chapter 3, nor does it handle data very well. Consider the following deficiencies:
In spite of those structural deficiencies, the table structure appears to work; the report is generated with ease. Unfortunately, the report might yield varying results depending on what data anomaly has occurred. For example, if you want to print a report to show the total “hours worked” value by the job classification “Database Designer,” that report will not include data for “DB Design” and “Database Design” data entries. Such reporting anomalies cause a multitude of problems for managers—and cannot be fixed through applications programming. Even if very careful data entry auditing can eliminate most of the reporting problems (at a high cost), it is easy to demonstrate that even a simple data entry becomes inefficient. Given the existence of update anomalies, suppose Darlene M. Smithson is assigned to work on the Evergreen project. The data entry clerk must update the PROJECT file with the entry:
15 Evergreen 112 Darlene M. Smithson DSS Analyst $45.95 0.
to match the attributes PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS. (When Ms. Smithson has just been assigned to the project, she has not yet worked, so the total number of hours worked is 0.0.)
Each time another employee is assigned to a project, some data entries (such as PROJ_NAME, EMP_NAME, and CHG_HOUR) are unnecessarily repeated. Imagine the data entry chore when 200 or 300
Before outlining the normalization process, it’s a good idea to review the concepts of determination and functional dependency. Functional Dependency Concepts
CONCEPT DEFINITION
Functional dependency The attribute B is fully functionally dependent on the attribute A if each value of A determines one and only one value of B. Example: PROJ_NUM → PROJ_NAME (read as “ PROJ_NUM functionally determines PROJ_NAME) In this case, the attribute PROJ_NUM is known as the “ determinant ” attribute and the attribute PROJ_NAME is known as the “ dependent ” attribute. Functional dependency (generalized definition)
Attribute A determines attribute B (that is, B is functionally dependent on A ) if all of the rows in the table that agree in value for attribute A also agree in value for attribute B. Fully functional dependency (composite key)
If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A.
It is crucial to understand these concepts because they are used to derive the set of functional dependencies for a given relation. The normalization process works one relation at a time, identifying the dependencies on that relation and normalizing the relation. As you will see in the following sections, normalization starts by identifying the dependencies of a given relation and progressively breaking up the relation (table) into a set of new relations (tables) based on the identified dependencies.
Conversion to First Normal Form Because the relational model views data as part of a table or a collection of tables in which all key values must be identified, the data depicted in Figure 5.1 might not be stored as shown. Note that Figure 5.1 contains what is known as repeating groups. A repeating group derives its name from the fact that a group of multiple entries of the same type can exist for any single key attribute occurrence. In Figure 5.1, note that each single project number (PROJ_NUM) occurrence can reference a group of related data entries. For example, the Evergreen
project (PROJ_NUM = 15) shows five entries at this point—and those entries are related because they each share the PROJ_NUM = 15 characteristic. Each time a new record is entered for the Evergreen project, the number of entries in the group grows by one. A relational table must not contain repeating groups. The existence of repeating groups provides evidence that the RPT_FORMAT table in Figure 5.1 fails to meet even the lowest normal form requirements, thus reflecting data redundancies.
Normalizing the table structure will reduce the data redundancies. If repeating groups do exist, they must be eliminated by making sure that each row defines a single entity. In addition, the dependencies must be identified to diagnose the normal form. Identification of the normal form will let you know where you are in the normalization process. The normalization process starts with a simple three-step procedure. Step 1: Eliminate the Repeating Groups Start by presenting the data in a tabular format, where each cell has a single value and there are no repeating groups. To eliminate the repeating groups, eliminate the nulls by making sure that each repeating group attribute contains an appropriate data value. That change converts the table in Figure 5.1 to 1NF in Figure 5.2.
Figure 5.2 A table in first normal form
Step 2: Identify the Primary Key The layout in Figure 5.2 represents more than a mere cosmetic change. Even a casual observer will note that PROJ_NUM is not an adequate primary key because the project number does not uniquely identify all of the remaining entity (row) attributes. For example, the PROJ_NUM value 15 can identify any one of five employees. To maintain a proper primary key that will uniquely identify any attribute value, the new key must be composed of a combination of PROJ_NUM and EMP_NUM. For example, using the data shown in Figure 5.2, if you know that PROJ_NUM = 15 and EMP_NUM = 103, the entries for the attributes PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, and HOURS must be Evergreen, June E. Arbough, Elect. Engineer, $84.50, and 23.8, respectively. Step 3: Identify All Dependencies The identification of the PK in Step 2 means that you have already identified the following dependency:
PROJ_NUM, EMP_NUM PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS