Schema Refinement-Introduction to Database Systems-Lecture 15 Slides-Computer Science, Slides of Introduction to Database Management Systems

Schema Refinement and Normalization, Normal Forms, Boyce-codd Normal Form, Decomposition, Relation Schema, Lossless Decomposition, Lossy Decomposition, Lossless Join Decompositions, Dependency Preserving Decomposition, Third Normal Form, Candidate Key, Partial Dependency

Typology: Slides

2011/2012

Uploaded on 02/12/2012

dylanx
dylanx šŸ‡ŗšŸ‡ø

4.7

(21)

286 documents

1 / 4

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
1
Schema Refinement
and Normalization
Nobody realizes that some people
expend tremendous energy
merely to be normal.
Albert Camus
Normal Forms
•Review FDs, Armstrong’s Axioms and Attr. Closures!
•Q1: is any refinement needed??!
•If a relation is in a
normal form
(BCNF, 3NF etc.):
–we know that certain problems are avoided/minimized.
–helps decide whether decomposing a relation is useful.
•Role of FDs in detecting redundancy:
–Consider a relation R with 3 attributes, ABC.
•No (non-trivial) FDs hold: There is no redundancy here.
•Given A B: If A is not a key, then several tuples could have the
same A value, and if so, they’ll all have the same B value!
•1st Normal Form –all attributes are atomic
–i.e. the relational model
•1st 2nd (of historical interest) 3rd Boyce-Codd …
Boyce-Codd Normal Form (BCNF)
•Reln R with FDs
F
is in BCNF if, for all X A in F+
–A X (called a
trivial
FD), or
–X is a superkey for R.
•In other words: ā€œR is in BCNF if the only non-trivial FDs
over R are
key constraints
.ā€
•If R in BCNF, then every field of every tuple records
information that cannot be inferred using FDs alone.
–Say we know FD X A holds this example relation:
•Can you guess the value of the
missing attribute?
•Yes, so relation is not in BCNF
X Y A
5y1 7
5y2 ?
Decomposition of a Relation Schema
•If a relation is not in a desired normal form, it can be
decomposed
into multiple relations that each are in that
normal form.
•Suppose that relation R contains attributes
A1 ... An.
A
decomposition
of R consists of replacing R by two or more
relations such that:
–Each new relation scheme contains a subset of the
attributes of R, and
–Every attribute of R appears as an attribute of at least
one of the new relations.
Example (same as before)
•SNLRWH has FDs S SNLRWH and R W
•Q: Is this relation in BCNF?
Hourly_Emps
No, The second FD causes a violation;
W values repeatedly associated with R values.
Decomposing a Relation
•Easiest fix is to create a relation RW to store these
associations, and to remove W from the main
schema:
•Decompositions should be used only when needed.
–Q: potential problems of decomposition?
•Q: Are both of these relations are now in BCNF?
pf3
pf4

Partial preview of the text

Download Schema Refinement-Introduction to Database Systems-Lecture 15 Slides-Computer Science and more Slides Introduction to Database Management Systems in PDF only on Docsity!

Schema Refinement

and Normalization

Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus

Normal Forms

  • Review FDs, Armstrong’s Axioms and Attr. Closures!
  • Q1: is any refinement needed??!
  • If a relation is in a normal form (BCNF, 3NF etc.):
    • we know that certain problems are avoided/minimized.
    • helps decide whether decomposing a relation is useful.
  • Role of FDs in detecting redundancy:
    • Consider a relation R with 3 attributes, ABC.
      • No (non-trivial) FDs hold: There is no redundancy here.
      • Given A B: If A is not a key, then several tuples could have the same A value, and if so, they’ll all have the same B value!
  • 1 st^ Normal Form – all attributes are atomic
    • i.e. the relational model
  • 1 st^2 nd^ (of historical interest) 3 rd^ Boyce-Codd …

Boyce-Codd Normal Form (BCNF)

  • Reln R with FDs F is in BCNF if, for all X A in F+
    • A X (called a trivial FD), or
    • X is a superkey for R.
  • In other words: ā€œR is in BCNF if the only non-trivial FDs

over R are key constraints.ā€

  • If R in BCNF, then every field of every tuple records information that cannot be inferred using FDs alone. - Say we know FD X A holds this example relation: - Can you guess the value of the missing attribute? - Yes, so relation is not in BCNF X Y A 5 y1 7 5 y2?

Decomposition of a Relation Schema

  • If a relation is not in a desired normal form, it can be

decomposed into multiple relations that each are in that

normal form.

  • Suppose that relation R contains attributes A1 ... An. A

decomposition of R consists of replacing R by two or more

relations such that:

  • Each new relation scheme contains a subset of the attributes of R, and
  • Every attribute of R appears as an attribute of at least one of the new relations.

Example (same as before)

  • SNLRWH has FDs S SNLRWH and R W
  • Q: Is this relation in BCNF? Hourly_Emps No, The second FD causes a violation; W values repeatedly associated with R values.

Decomposing a Relation

  • Easiest fix is to create a relation RW to store these associations, and to remove W from the main schema:
  • Decompositions should be used only when needed.
  • Q: potential problems of decomposition?
  • Q: Are both of these relations are now in BCNF?

Problems with Decompositions

  • There are three potential problems to consider:
    1. May be impossible to reconstruct the original relation! (Lossy Decomposition)
      • Fortunately, not in the SNLRWH example.
    2. Dependency checking may require joins (not Dependency Preserving)
      • Fortunately, not in the SNLRWH example.
    3. Some queries become more expensive.
      • e.g., How much does Guldu earn? Tradeoff: Must consider these issues vs. redundancy. (Well, not usually #1)

Lossless Decomposition (example)

Lossy Decomposition (example) Lossless Join Decompositions

  • Decomposition of R into X and Y is lossless w.r.t. a set

of FDs F if, for every instance r that satisfies F:

( r) ( r) = r

  • It is always true that r ( r) ( r)
    • In general, the other direction does not hold! If it does, the decomposition is lossless-join.
  • Definition extended to decomposition into 3 or more relations in a straightforward way.
  • It is essential that all decompositions used to deal with

redundancy be lossless! (Avoids Problem #1)

X Y X Y   More on Lossless Decomposition

  • The decomposition of R into X and Y is lossless with respect to F if and only if the closure of F contains: X Y X, or X Y Y in example: decomposing ABC into AB and BC is lossy, because intersection (i.e., ā€œBā€) is not a key of either resulting relation.
  • Useful result: If W Z holds over R and W Z is empty, then decomposition of R into R-Z and WZ is loss-less. i.e. the common attributes form a superkey for one side or the other

Lossless Decomposition (example)

But, now we can’t check A B without doing a join!

An Aside: Second Normal Form

  • Like 3NF, but allows transitive dependencies:
    • Reln R with FDs F is in 2NF if, for all X A in F+ A X (called a trivial FD), or X is a superkey of R, or X is not part of any candidate key for R. (i.e. ā€œX is not primeā€)
  • There’s no reason to use this in practice
    • And we won’t expect you to remember it

Decomposition into 3NF

  • Obviously, the algorithm for lossless join decomp into BCNF can be used to obtain a lossless join decomp into 3NF (typically, can stop earlier) but does not ensure dependency preservation.
  • To ensure dependency preservation, one idea:
    • If X Y is not preserved, add relation XY. Problem is that XY may violate 3NF! e.g., consider the addition of CJP to `preserve’ JP C. What if we also have J C?
  • Refinement: Instead of the given set of FDs F, use a

minimal cover for F.

Minimal Cover for a Set of FDs

  • Minimal cover G for a set of FDs F:
    • Closure of F = closure of G.
    • Right hand side of each FD in G is a single attribute.
    • If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.
  • Intuitively, every FD in G is needed, and `` as small as

possible’’ in order to get the same closure as F.

  • e.g., A B, ABCD E, EF GH, ACDF EG has the following minimal cover: - A B, ACD E, EF G and EF H
  • M.C. implies Lossless-Join, Dep. Pres. Decomp!!!
    • (in book, p. 627)

Summary of Schema Refinement

  • BCNF: each field contains information that cannot be inferred using only FDs. - ensuring BCNF is a good heuristic.
  • Not in BCNF? Try decomposing into BCNF relations.
    • Must consider whether all FDs are preserved!
  • Lossless-join, dependency preserving decomposition into BCNF impossible? Consider 3NF. - Same if BCNF decomp is unsuitable for typical queries - Decompositions should be carried out and/or re-examined

while keeping performance requirements in mind.

  • Note: even more restrictive Normal Forms exist (we don’t cover them in this course, but some are in the book.)