














Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
differtent types of nomalization 1st NF,2nd NF,3rd NF, BCNF
Typology: Study notes
1 / 22
This page cannot be seen from the preview
Don't miss anything!















The four informal measures of quality for relation schema
^ ^ Semantics of the attributes ^ ^ Reducing the redundant values in tuples Reducing the null values in tuples Disallowing the possibility of generating spurious tuples
Specifies how to interpret the attributes values stored in a tuple of the relation. In other words, how the attribute value in a tuple relate to one another.
Guideline 1 : Design a relation schema so that it is easy to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation. Save storage space and avoid update anomalies. Insertion anomalies. Deletion anomalies. Modification anomalies
Guideline 3 : Avoid placing attributes in a base relation whose values are mostly null. Disallowing spurious tuples.
Spurious tuples - tuples that are not in the original relation but generated by natural join of decomposed subrelations.
Example: decompose EMP_PROJ into EMP_LOCS and EMP_PROJ1.
Fig. 14.5a
Guideline 4 : Design relation schemas so that they can be naturally JOINed on primary keys or foreign keys in a way that guarantees no spurious tuples are generated.
6.2 A functional dependency (FD ) is a constraint between two sets of attributes from the database. It is denoted by
X Y
We say that " Y is functionally dependent on X ". Also, X is called the left-hand side of the FD. Y is called the right-hand side of the FD.
A functional dependency is a property of the semantics or meaning of the attributes, i.e., a property of the relation schema. They must hold on all relation states (extensions) of R. Relation extensions r(R). A FD X Y is a full functional dependency if removal of any attribute from X means that the dependency does not hold any more; otherwise, it is a partial functional dependency.
Examples:
FD is property of the relation schema R, not of a particular relation state/instance
Let R be a relation schema, where X R and Y R
t 1 , t 2 r, t 1 [X] = t 2 [X] t 1 [Y] = t 2 [Y]
The FD X Y holds on R if and only if for all possible relations r(R), whenever two tuples of r agree on the attributes of X, they also agree on the attributes of Y.
the single arrow denotes "functional dependency" X Y can also be read as " X determines Y " the double arrow denotes "logical implication"
IR1. Reflexivity e.g. X X
a formal statement of trivial dependencies ; useful for derivations
if a dependency holds, then we can freely expand its left hand side
^ ^ the "most powerful" inference rule; useful in multi-step derivations Armstrong inference rules are sound
meaning that given a set of functional dependencies F specified on a relation schema R, any dependency that we can infer from F by using IR1 through IR3 holds every relation state r of R that specifies the dependencies in F. In other words, rules can be used to derive precisely the closure or no additional FD can be derived. complete
Dept of CSE,GCEM Page 86
Step3 :On the third iteration, for B E
B {A, B,C}+
{A, B}+ := {A, B, C, E}.
Step4: On the fourth iteration, for CD EF remains unchanged.
Go round the inner loop 4 times again. On the first iteration result does not change; on the second it expands to {A,B,C,E,F}; On the third and forth it does not change.
Now go round the inner loop 4 times. Closure does not change and so the whole process terminates, with {A,B}+ = {A,B,C,E,F}
Example.
F = { SSN ENAME, PNUMBER {PNAME, PLOCATION}, {SSN,PNUMBER} HOURS }
{SSN}+^ = {SSN, ENAME}
{PNUMBER}+^ =?
{SSN,PNUMBER}+^ =?
The purpose of normalization.
^ The problems associated with redundant data. The identification of various types of update anomalies such as insertion, deletion, and modification anomalies. ^ How to recognize the appropriateness or quality of the design of relations. The concept of functional dependency, the main tool for measuring the appropriateness of
attribute groupings in relations. How functional dependencies can be used to group attributes into relations that are in a known normal form. How to define normal forms for relations.
^ How to undertake the process of normalization. How to identify the most commonly used normal forms, namely first (1NF), second (2NF), and third (3NF) normal forms, and Boyce-Codd normal form (BCNF). How to identify fourth (4NF), and fifth (5NF) normal forms
Main objective in developing a logical data model for relational database systems is to create an accurate representation of the data, its relationships, and constraints. To achieve this objective, we must identify a suitable set of relations. A technique for producing a set of relations with desirable properties, given the data requirements of an enterprise
A relation is defined as a set of tuples. By definition, all elements of a set are distinct; hence, all tuples in a relation must also be distinct. This means that no two tuples can have the same combination of values for all their attributes.
The most commonly used normal forms
First Normal Form (1NF) Second Normal Form (2NF)
^ ^ Third Normal Form (3NF) Boyce-Codd Normal Form Other Normal Forms Fourth Normal Form Fifth Normal Form Domain Key Normal Form
First normal form is now considered to be part of the formal definition of a relation; historically, it was defined to disallow multivalued attributes, composite attributes, and their combinations. It states that the domains of attributes must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Practical Rule: "Eliminate Repeating Groups," i.e., make a separate table for each set of related attributes, and give each table a primary key.
Formal Definition: A relation is in first normal form (1NF) if and only if all underlying simple domains contain atomic values only.
Second normal form is based on the concept of fully functional dependency. A functional X Y is a fully functional dependency is removal of any attribute A from X means that the dependency does not hold any more. A relation schema is in 2NF if every nonprime attribute in relation is fully functionally dependent on the primary key of the relation. It also can be restated as: a relation schema is in 2NF if every nonprime attribute in relation is not partially dependent on any key of the relation.
Practical Rule: "Eliminate Redundant Data," i.e., if an attribute depends on only part of a multivalued key, remove it to a separate table.
Formal Definition: A relation is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is fully dependent on the primary key.
Third normal form is based on the concept of transitive dependency. A functional dependency X Y in a relation is a transitive dependency if there is a set of attributes Z that is not a subset of any key of the relation, and both X Z and Z Y hold. In other words, a relation is in 3NF if, whenever a functional dependency
X A holds in the relation, either (a) X is a superkey of the relation, or (b) A is a prime attribute of the relation.
Practical Rule: "Eliminate Columns not Dependent on Key," i.e., if attributes do not contribute to a description of a key, remove them to a separate table
If X Y is non-trivial then X is a super key STREET CITY ZIP
{CITY,STREET } ZIP
ZIP CITY
Insertion anomaly: the city of a zip code can‘t be stored, if the street is not given
STREET ZIP ZIP CITY
Relationship Between Normal Forms
Normalization Algorithms based on FDs to synthesize 3NF and BCNF describe two desirable properties (known as properties of decomposition). Dependency Preservation Property Lossless join property Dependency Preservation Property enables us to enforce a constraint on the original relation from corresponding instances in the smaller relations.
Lossless join property enables us to find any instance of the original relation from corresponding instances in the smaller relations (Both used by the design algorithms to achieve desirable decompositions). A property of decomposition, which ensures that no spurious rows are generated when relations are reunited through a natural join operation.
Individual relations being in higher normal do not guarantee a good deign Database schema must posses additional properties to guarantee a good design.
Relation Decomposition and Insufficiency of Normal Forms
Suppose R = { A 1 , A 2 , …, An} that includes all the attributes of the database. R is a universal relation schema, which states that every attribute name is unique. Using FDs, the algorithms decomposes the universal relation schema R into a set of relation schemas
D = {R 1 , R 2 , …, Rn} that will become the relational database schema; D is called a decomposition of R. Each attribute in R will appear in at least one relation schema Ri in the decomposition so that no attributes are lost; we have
This is called attribute preservation condition of a decomposition.
We want to preserve dependencies because each dependencies in F represents a constraint on the
Database.
We would like to check easily that updates to the database do not result in illegal relations being created.
It would be nice if our design allowed us to check updates without having to compute natural joins. To know whether joins must be computed, we need to determine what functional dependencies may be tested by checking each relation individually.
Let F be a set of functional dependencies on schema R. Let D = {R1, R2, …, Rn} be a decomposition of R. Given a set of dependencies F on R, the projection of F on Ri, Ri(F), where Ri is a subset of R, is the set of all
functional dependencies XY such that attributes in XY are all contained in Ri. Hence the projection of F on each relation schema Ri in the decomposition D is the set of FDs in F+, such that all their LHS and RHS attributes are in Ri. Hence, the projection of F on each relation schema Ri in the decomposition D is the set of functional dependencies in F+.
((R1(F))(R2(F))… (Rm(F)))+^ = F+ i.e., the union of the dependencies that hold on each Ri belongs to D be equivalent to closure of F (all possible FDs)
/*Decompose relation, R, with functional dependencies, into relations, R 1 ,..., Rn, with associated functional dependencies,
F 1 ,..., Fk.
The decomposition is dependency preserving iff :
F+=(F 1 ... Fk)+^ */
If each functional dependency specified in F either appeared directly in one of the relation schema R in the decomposition D or could be inferred from the dependencies that appear in some R.
7.2.2 Lossless-join Dependency
A property of decomposition, which ensures that no spurious rows are generated when relations are reunited through a natural join operation.
Lossless-join property refers to when we decompose a relation into two relations - we can rejoin the resulting relations to produce the original relation.
Decompose relation, R, with functional dependencies, F, into relations, R1 and R2, with attributes, A1 and A2, and associated functional dependencies, F1 and F
Decompositions are projections of relational schemas
R A B C (^) A,B A B (^) B,C B C
a1 b1 c1 a1 b1 b1 c
However, sometimes there is the requirement to decompose a relation into more than two relations. Although rare, these cases are managed by join dependency and 5NF.
4NF associated with a dependency called multi-valued dependency (MVD). MVDs in a relation are due to first normal form (1NF), which disallows an attribute in a row from having a set of values.
MVD represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B, and a set of values for C. However, the set of values for B and C are independent of each other. MVD between attributes A, B, and C in a relation using the following notation
A B (A multidetermines B)
A C
Formal Definition of Multivalued Dependency
A multivalued dependency (MVD) X Y specified on R, where X, and Y are both subsets of R and Z = (R – (X Y)) specifies the following restrictions on r(R)
t 3 [X]=t 4 [X]=t 1 [X]=t 2 [X]
t 3 [Y] = t 1 [Y] and t 4 [Y] = t 2 [Y]
t 3 [Z] = t 2 [Z] and t 4 [Z] = t 1 [Z]
4.3.1 Fourth Normal Form (4NF)
A relation that is in Boyce-Codd Normal Form and contains no MVDs. BCNF to 4NF involves the removal of the MVD from the relation by placing the attribute(s) in a new relation along with a copy of the determinant(s).
A Relation is in 4NF if it is in 3NF and there is no multivalued dependencies.
A join dependency ( JD ), denoted by JD{R 1 , R 2 , …, Rn}, specified on relation schema R, specifies a constraint on the states r of R. The constraint states that every legal state r of R should have a lossless join decomposition into R 1 , R 2 , …, Rn; that is, for every such r we have
Lossless-join property refers to when we decompose a relation into two relations - we can rejoin the resulting relations to produce the original relation. However, sometimes there is the requirement to decompose a relation into more than two relations. Although rare, these cases are managed by join dependency and 5NF.
A relation that has no join dependency.