Normal Forms in Database Systems: Concepts and Rules, Study notes of Design

An introduction to normal forms in database systems, explaining functional dependencies, keys, superkeys, and the decomposition rule. It covers topics such as the third normal form, lossless decomposition, and dependency preservation. The document also discusses the importance of normalization in ensuring data integrity and improving query performance.

Typology: Study notes

2021/2022

Uploaded on 09/12/2022

aristocrat
aristocrat 🇬🇧

5

(5)

240 documents

1 / 37

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Normal Forms — 1CSC343 – Introduction to Databases
Week 12: Normal Forms
Database Design
Database Redundancies and Anomalies
Functional Dependencies
Entailment, Closure and Equivalence
Lossless Decompositions
The Third Normal Form (3NF)
The Boyce-Codd Normal Form (BCNF)
Normal Forms and Database Design
Normal Forms — 2CSC343 – Introduction to Databases
Logical Database Design
We have seen how to design a relational
schema by first designing an ER schema
and then transforming it into a relational
one.
Now we focus on how to transform the
generated relational schema into a "better"
one.
Goodness of relational schemas is defined
in terms of the notion of normal form.
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25

Partial preview of the text

Download Normal Forms in Database Systems: Concepts and Rules and more Study notes Design in PDF only on Docsity!

CSC343 – Introduction to Databases Normal Forms — 1

Week 12: Normal Forms

Database Design Database Redundancies and Anomalies Functional Dependencies Entailment, Closure and Equivalence Lossless Decompositions The Third Normal Form (3NF) The Boyce-Codd Normal Form (BCNF) Normal Forms and Database Design

CSC343 – Introduction to Databases Normal Forms — 2

Logical Database Design

„ We have seen how to design a relational schema by first designing an ER schema and then transforming it into a relational one. „ Now we focus on how to transform the generated relational schema into a "better" one. „ Goodness of relational schemas is defined in terms of the notion of normal form.

CSC343 – Introduction to Databases Normal Forms — 3

Normal Forms and Normalization

„ A normal form is a property of a database schema. „ When a database schema is un-normalized (that is, does not satisfy the normal form), it allows redundancies of various types which can lead to anomalies and inconsistencies. „ Normal forms can serve as basis for evaluating the quality of a database schema and constitutes a useful tool for database design. „ Normalization is a procedure that transforms an un-normalized schema into a normalized one.

CSC343 – Introduction to Databases Normal Forms — 4

Examples of Redundancy

Employee Salary Project Budget Function Brown 20 Mars 2 technician Green 35 Jupiter 15 designer Green 35 Venus 15 designer Hoskins 55 Venus 15 manager Hoskins 55 Jupiter 15 consultant Hoskins 55 Mars 2 consultant Moore 48 Mars 2 manager Moore 48 Venus 15 designer Kemp 48 Venus 15 designer Kemp 48 Jupiter 15 manager

CSC343 – Introduction to Databases Normal Forms — 7

Functional Dependencies (FDs)

in the Example

„ Each employee has a unique salary. We represent this dependency as

Employee → Salary

and say " Salary functionally depends on Employee ". „ This means that everywhere we have the same Employee attribute value, we also get the same Salary attribute value. „ Likewise,

Project → Budget

i.e., each project has a unique budget.

CSC343 – Introduction to Databases Normal Forms — 8

Functional Dependencies

„ Given schema R(X) and non-empty subsets Y and Z of the attributes X , we say that there is a

functional dependency between Y and Z ( Y → Z ),

iff for every relation instance r of R(X) and every pair of tuples t 1 , t 2 of r , if t 1 .Y = t 2 .Y , then t 1 .Z = t 2 .Z.

„ A functional dependency is a statement about all allowable relations for a given schema. „ Functional dependencies have to be identified by understanding the semantics of the application. „ Given a particular relation r 0 of R(X), we can tell if a dependency holds or not; but just because it holds for r 0 , doesn’t mean that it also holds for R(X)!

CSC343 – Introduction to Databases Normal Forms — 9

Looking for FDs

Employee Salary Project Budget Function Brown 20 Mars 2 technician Green 35 Jupiter 15 designer Green 35 Venus 15 designer Hoskins 55 Venus 15 manager Hoskins 55 Jupiter 15 consultant Hoskins 55 Mars 2 consultant Moore 48 Mars 2 manager Moore 48 Venus 15 designer Kemp 48 Venus 15 designer Kemp 48 Jupiter 15 manager

CSC343 – Introduction to Databases Normal Forms — 10

Non-Trivial Dependencies

„ A functional dependency Y → Z is non-trivial if

no attribute in Z appears among attributes of Y , e.g.,

9 Employee → Salary is non-trivial;

9 Employee,Project → Project is trivial.

„ Anomalies arise precisely for the attributes which are involved in (non-trivial) functional dependencies:

9 Employee → Salary ;

9 Project → Budget.

„ Moreover, note that our example includes another functional dependency:

9 Employee,Project → Function.

CSC343 – Introduction to Databases Normal Forms — 13

How Do We Eliminate Redundancy?

„ Decomposition: Use two relations to store Person information:Person 99 Person1Person1 ( SI#, Name, Address ) 99 HobbiesHobbies ( SI#, Hobby ) „ The decomposition is more general: people with hobbies can now be described independently of their name and address. „ No update anomalies: 9 Name and address stored once; 9 A hobby can be separately supplied or deleted; 9 We can represent persons with no hobbies.

CSC343 – Introduction to Databases Normal Forms — 14

Superkey Constraints

„ A superkey constraint is a special functional dependency: Let K be a set of attributes of R, and U the set of all attributes of R. Then K is a superkey iff the functional dependency KU is satisfied in R. 9 E.g., SI#SI#,Name,Address (for a Person relation) „ A key is a minimal superkey, I.e., for each X ⊂ K, X is not a superkey 9 SI#, HobbySI#, Name, Address, Hobby but 9 SI#SI#, Name, Address, Hobby 9 HobbySI#, Name, Address, Hobby „ A key attribute is an attribute that is part of a key.

CSC343 – Introduction to Databases Normal Forms — 15

More Examples

„ AddressPostalCode

9 DCS’s postal code is M5S 3H

„ Author, Title, EditionPublicationDate

9 Ramakrishnan, et al., Database Management Systems, 3rd^ publication date is 2003

„ CourseIDExamDate, ExamTime

9 CSC343’s exam date is December 18, starting at 7pm

CSC343 – Introduction to Databases Normal Forms — 16

When are FDs "Equivalent"?

„ Sometimes functional dependencies (FDs) seem to be saying the same thing, e.g., AddrPostalCode,Str# vs AddrPostalCode, AddrStr#

„ Another example

AddrPostalCode, PostalCodeProvince vs AddrPostalCode, PostalCodeProvince AddrProvince „ When are two sets of FDs equivalent? How do we "infer" new FDs from given FDs?

CSC343 – Introduction to Databases Normal Forms — 19

Armstrong’s Axioms for FDs

„ This is the syntactic way of computing/testing semantic properties of FDs 9 Reflexivity : YX |- XY (trivial FD) e.g., | - Name, AddressName 9 Augmentation : XY | - XZYZ e.g., AddressZipCode | - Address,NameZipCode, Name 9 Transitivity : XY, YZ | - XZ e.g., Phone#Address, AddressZipCode | - Phone#ZipCode

CSC343 – Introduction to Databases Normal Forms — 20

Soundness

„ Theorem: F |- f implies F |= f „ In words: If FD f: XY can be derived from a set of FDs F using the axioms, then f holds in every relation that satisfies every FD in F. „ Example: Given XY and XZ then

„ Thus, XYZ is satisfied in every relation where both XY and XZ are satisfied. We have derived the union ruleunion rule for FDs.

XXY Augmentation by X YXYZ Augmentation by Y XYZ Transitivity

CSC343 – Introduction to Databases Normal Forms — 21

Completeness

„ Theorem: F |= f implies F |- f „ In words: If F entails f , then f can be derived from F using Armstrong's axioms. „ A consequence of completeness is the following (naïve) algorithm to determining if F entails f : AlgorithmAlgorithm : Use the axioms in all possible ways to generate F +^ (the set of possible FD’s is finite so this can be done) and see if f is in F +

CSC343 – Introduction to Databases Normal Forms — 22

Correctness

„ The notions of soundness and completeness link the syntax (Armstrong’s axioms) with semantics, I.e., entailment defined in terms of relational instances. „ This is a precise way of saying that the algorithm for entailment based on the axioms is ``correct’’ with respect to the definitions.

CSC343 – Introduction to Databases Normal Forms — 25

Attribute Closure

„ Calculating attribute closure leads to a more efficient way of checking entailment. „ The (^) attribute closureattribute closure of a set of attributes X with respect to a set of FDs F , denoted X +F , is the set of all attributes A such that XA 9 X +F is not necessarily same as X +G if FG „ Attribute closure and entailment: AlgorithmAlgorithm: Given a set of FDs,^ F , then^ X^ → Y if and only if YX +F

CSC343 – Introduction to Databases Normal Forms — 26

Computing Attribute Closure:

An Example

F : ABC AD DE ACB

X X F +

A {A, D, E} AB {A, B, C, D, E} AC {A, C, B, D, E} B {B} D {D, E} Is ABE entailed by F? Yes Is DC entailed by F? No Result : X F +^ allows us to determine all FDs of the form XY entailed by F

CSC343 – Introduction to Databases Normal Forms — 27

Computing the

Attribute Closure X+ F

closure := X; // since XX+ F repeat old := closure; if there is an FD ZV in F such that Zclosure and Vclosure then closure := closureV until old = closure

  • If Tclosure then XT is entailed by F

CSC343 – Introduction to Databases Normal Forms — 28

Normal Forms

„ Each normal form is a set of conditions on a schema that together guarantee certain properties (relating to redundancy and update anomalies).

„ First normal form (1NF) is the same as the definition of relational model (relations = sets of tuples; each tuple = sequence of atomic values).

„ Second normal form (2NF) – has no practical or theoretical value – won’t discuss.

„ The two most used are (^) third normal formthird normal form (3NF) and Boyce-Boyce-CoddCodd normal formnormal form (BCNF).

„ We will discuss in detail the 3NF.

CSC343 – Introduction to Databases Normal Forms — 31

Basic Idea

„ R(ABCD), A --> D

„ Projection: 9 R1(AD), A --> D 9 R2(ABC)

CSC343 – Introduction to Databases Normal Forms — 32

„ A relation that is not in 3NF, can be replaced with one or more normalized relations using normalization.

„ We can eliminate redundancies and anomalies for the example relation Emp(Employee,Salary,Project,Budget,Function)

if we replace it with the three relations obtained by projections on the sets of attributes corresponding to the three functional dependencies:

9 Employee → Salary;

9 Project → Budget.

9 Employee,Project → Function.

Normalization Through Decomposition

CSC343 – Introduction to Databases Normal Forms — 33

…Start with…

Employee Salary Project Budget Function Brown 20 Mars 2 technician Green 35 Jupiter 15 designer Green 35 Venus 15 designer Hoskins 55 Venus 15 manager Hoskins 55 Jupiter 15 consultant Hoskins 55 Mars 2 consultant Moore 48 Mars 2 manager Moore 48 Venus 15 designer Kemp 48 Venus 15 designer Kemp 48 Jupiter 15 manager

CSC343 – Introduction to Databases Normal Forms — 34

Result of Normalization

Employee Salary Brown 20 Green 35 Hoskins 55 Moore 48 Kemp 48 Employee Project Function Brown Mars technician Green Jupiter designer Green Venus designer Hoskins Venus manager Hoskins Jupiter consultant Hoskins Mars consultant Moore Mars manager Moore Venus designer Kemp Venus designer Kemp Jupiter manager

Project Budget Mars 2 Jupiter 15 Venus 15

The keys of new relations are lefthand sides of functional dependencies; satisfaction of 3NF is therefore guaranteed for the new relations.

CSC343 – Introduction to Databases Normal Forms — 37

The Join of the Projections

Employee Project Branch Brown Mars Chicago Green Jupiter Birmingham Green Venus Birmingham Hoskins Saturn Birmingham Hoskins Venus Birmingham Green Saturn Birmingham Hoskins Jupiter Birmingham

The result of the join is different from the original relation. We lost some information during the decomposition!

CSC343 – Introduction to Databases Normal Forms — 38

Lossless Decomposition

„ The decomposition of a relation R(X) on X (^1) and X 2 is lossless if for every instance r of R(X) the join of the projections of R on X (^1) and X 2 is equal to r itself (that is, does not contain spurious tuples). „ Of course, it is clearly desirable to allow only lossless decompositions during normalization.

CSC343 – Introduction to Databases Normal Forms — 39

A Condition for Lossless

Decomposition

„ Let R(X) be a relation schema and let X 1 and X 2 be two subsets of X such that X 1 ∪ X 2 = X. Also, let X 0 = X 1 ∩ X 2. „ If R(X) satisfies the functional dependency

X 0 → X 1 or X 0 → X 2 , then the decomposition

of R(X) on X 1 and X 2 is lossless. „ In other words, R(X) has a lossless decomposition on two relations if the set of attributes common to the relations is a superkey for at least one of the decomposed relations.

CSC343 – Introduction to Databases Normal Forms — 40

Intuition Behind the Test for

Losslessness

„ Suppose R 1R 2R 2. Then a row of r 1 can combine with exactly one row of r 2 in the natural join (since in r 2 a particular set of values for the attributes in R 1R (^2) defines a unique row) R 1R 2 R 1R 2 …………. a a ………... ………… a b …………. ………… b c …………. ………… c r 1 r 2