Download Design Guidelines for Relational Schema and Normalization in Database Systems and more Study notes Mobile Computing in PDF only on Docsity!
DBDI/ Lecture 5
Functional Dependencies &
Normalization
Dr. Ala Al-Zobaidie The slides are based on the textbook: Fundamentals of Database Systems by Elmasri & Navathe
30/05/2007 DBDI /FD & Norm 2
Lecture’s Objectives
- Design guidelines for relational schema
- Definition of partial and full dependencies
- Transitive dependency
- Dependency Diagrams
- Rules about functional dependencies
- Logical vs. Storage design
- The three normal forms
- Boyce-Codd Normal Form
30/05/2007 DBDI /FD & Norm 3
Design guidelines for relational
schema
- Guideline 1:
- Semantic of the relation attributes
- Do not mix attributes from distinct real world
- Guideline 2:
- Reducing the redundant values in tuples
- Reducing storage area
- Avoiding update anomalies
30/05/2007 DBDI /FD & Norm 4
Guideline 2: Reducing storage area
- Reducing storage area
- Compare space for 2 schemas below:
Update Anomalies
- Create a training dept#
- Deleting Employee James E. Bong
- Change name of dept#5 from Research to Research & Development
Guideline 2: Avoiding update anomalies
- Insertion:
- We cannot insert partial new data, as we must have a value for the PK.
- PK id not necessarily related to the new data!
- Means: cannot create a dept without EMPs, or vice versa
- e.g. Create a training dept#
- Deletion:
- Deleting the last record may cause permanent loss of information - e.g. dept#
- Modification:
- Update one instance entails correctly updating all copies
- e.g. change the name of dept#5 to Research & Development
30/05/2007 DBDI /FD & Norm 7
Emp-Dept & Emp_Prj: Suffers from Update Anomalies due to Redundancy
30/05/2007 DBDI /FD & Norm 8
Guideline 3: Reducing the Null values in
tuples
- Avoid Null values
- Waste space at storage level
- No semantic:
- Does it apply to the instance?
- Applicable but absent?
- Zeros or Blanks?
- Problems with specifying Join operations at logical level (natural vs. outer joins)
- May pause problems with aggregate functions:
- Nulls are eliminated from all aggregate functions apart from count(*)
30/05/2007 DBDI /FD & Norm 9
Guideline 4: Disallowing the generation of
spurious tuples
- The“Lossless Join” property:
- Design relations so that they can be JOINed with equality condition “Equijoin” on attributes that are either PKs or FKs
- This will guarantee no generation of spurious tuples.
30/05/2007 DBDI /FD & Norm 10
Guideline 4: Ggeneration of spurious
tuples/1 → Lossless Join
Query: List the name & hours of all projects that John Smith worked on.
3
Guideline 4: Ggeneration of spurious
tuples/2 → Lossless Join
Query: List the name & hours of all projects that John Smith worked on.
3 ² ² ²
3
Guideline 4: Ggeneration of spurious
tuples/
² ² ²
3 3
30/05/2007 DBDI /FD & Norm 19
Dependency Inference Rules/
- A1. (Reflexive): If Y ⊆ X, then X → Y
- A2. (Augmentation): If X → Y, then XZ → YZ (Notation: XZ stands for X ∪ Z)
- A3. (Transitive): If X → Y and Y → Z, then X → Z A1, A2, A3 form a sound and complete set of inference rules 30/05/2007 DBDI /FD & Norm 20
Dependency Inference Rules/
Some additional inference rules that are
useful:
If X →^ YZ, then X^ →^ Y and X^ →^ Z
If X → Y and X → Z, then X → YZ
- A6. (Psuedo-transitivity):
If X → Y and WY → Z, then WX → Z
Also expressed as
- If X → Y and W → Z, then XW → YZ (Composition)
30/05/2007 DBDI /FD & Norm 21
Logical vs. Storage design
- Our aim in relational DB design is:
- The grouping of attributes to form “good” relation schema
- Two level of relation schema:
- The logical “user view” level
- The storage “base relation” level
- Design is concerned mainly with base
relations
30/05/2007 DBDI /FD & Norm 22
The Process of Normalization
- Formal technique for analyzing a relation based on its PK & FDs between its attributes.
- A series of steps. Each corresponds to a specific normal form, which has known properties.
- As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.
Unnormalised Form (UNF)
- A table that contains one or more repeating groups.
- To create an unnormalized table:
- transform data from information source (e.g. form) into table format with columns and rows. A relation schema which is not in Normalised form
Example of a relation instance
Multi Value
First Normal Form (1NF)
- A relation in which intersection of each row and column contains one and only one value. - Disallows composite attributes, multivalued attributes & nested relations
- UNF to 1NF
- Nominate an attribute or group of attributes to act as the key for the unnormalized table.
- Identify repeating group(s) in unnormalized table which repeats for the key attribute(s).
30/05/2007 DBDI /FD & Norm 25
UNF to 1NF/
- Remove repeating group by: entering appropriate data into the empty columns of rows containing repeating data (‘flattening’ the table).
1 st^ NF relation with redundancy
30/05/2007 DBDI /FD & Norm 26
UNF to 1NF/ Or by
- placing repeating data along with copy of the original key attribute(s) into a separate relation.
30/05/2007 DBDI /FD & Norm 27
Second Normal Form (2NF)
- Based on concept of full functional dependency:
- Prime attribute: attribute that is member of the PK
- Full FD: a FD Y → Z where removal of any attribute from Y means FD does not hold any more
- Examples:
- {SSN, Pnumber} → Hours is a Full FD since neither SSN → Hours nor Pnumber → Hours hold.
- {SSN , Pnumber} → Ename is a Partial dependency as only SSN → Ename
- 2NF - A relation that is in 1NF and every non- primary-key attribute is fully functionally dependent on the primary key.
30/05/2007 DBDI /FD & Norm 28
1NF to 2NF
- Identify primary key for the 1NF relation.
- Identify functional dependencies in the relation.
- If partial dependencies exist on the primary key remove them by placing them in a new relation along with copy of their determinant.
SSN PNUMBER HOURS ENAME PNAME PLOCATION
Examples of transformation into 2NF
Into 2NF
EMP_PROJ
FD FD FD
Third Normal Form (3NF)
- Based on concept of transitive
dependency:
- 3NF - A relation that is in 1NF and 2NF
and in which no non-primary-key
attribute is transitively dependent on
the primary key.
- 3NF - All non-prime attributes are fully &
directly dependent on the PK.
A B C
30/05/2007 DBDI /FD & Norm 37
BCNF / Example /
- Consider this scenario:
- The DSD company provides end user software training in Database, Network & Spreadsheets
- DSD employs several trainers in each of the three subject.
- Each trainer teaches only one subject, that is a Database trainer teaches Database only.
- Corporate customers may elect to purchase training contracts for one or more subjects.
1004 Database Alun
1003 Spreadsheet Phil
1002 Database Ala
1001 Network Sati
1001 Database Ala
Client Subject Staff
30/05/2007 DBDI /FD & Norm 38
- 2 composite candidate keys:
- FD1:{Client, Subject} → Staff
- FD2:{Client & Staff} → Subject
- These candidate keys are overlapping on Client.
Staff is a determinate but not a candidate key
BCNF / Example /
Client Subject Staff
Client Staff Subject
30/05/2007 DBDI /FD & Norm 39
BCNF / Example /
- What is the Problem? - Anomalies
- Delete client 1004 will also delete Tony teaches Database. So is for client 1001 on Network.
- Hence, we need to decompose table into two to get rid of redundancies.
1004 Database Alun
1003 Spreadsheet Phil
1002 Database Ala
1001 Network Sati
1001 Database Ala
Client Subject Staff
30/05/2007 DBDI /FD & Norm 40
BCNF / Example /
- Avoid mixing attributes belongs to different entities
Alun Database
Phil Spreadsheet
Sati Network
Ala Database
Staff Subject
Client Staff (^) Staff Subject
1002 Ala
1004 Alun
1003 Phil
1001 Sati
1001 Ala
Client Staff
30/05/2007 DBDI /FD & Norm 41
Summary
• We looked at:
–Informal Design guidelines
–Partial, Full & Transitive functional
dependencies
–Dependency Diagrams
–The three normal forms
–Boyce-Codd Normal Form