Denormalization - Data Warehousing - Lecture Handouts, Lecture notes of Data Warehousing

Denormalization, Striking a balance between normalization and denormalization, Denormalization in DSS, Guidelines for Denormalization, Areas for Applying Denormalization Techniques are discusses points of document. This handout is for data warehousing course from Virtual University of Pakistan.

Typology: Lecture notes

2011/2012

Uploaded on 11/06/2012

ahsen
ahsen 🇵🇰

4.6

(88)

84 documents

1 / 9

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Data Warehousing Course Code: CS614
Virtual University of Pakistan
62
Lecture Handout
Data Warehousing
Lecture No. 07
De-Normalization
Striking a balance between “good” & “evil”
Figure-7.1: Striking a balance between normalization and de-normalization
There should be a balance between normalized and de-normalized forms. In a fully
normalized form, too many joins are required and in a totally de-normalized form, we
have a big, wide single table. Database should be aligned someplace in between so as to
strike a balance, especially in the context of the queries and the application domain.
A “pure” normalized design is a good starting point for a data model and is a great thing
to do and achieve in academia. However, as briefly mentioned in the previous lecture, in
pf3
pf4
pf5
pf8
pf9

Partial preview of the text

Download Denormalization - Data Warehousing - Lecture Handouts and more Lecture notes Data Warehousing in PDF only on Docsity!

[email protected]

Virtual University of Pakistan

Lecture Handout

Data Warehousing

Lecture No. 07

De-Normalization

Striking a balance between “good” & “evil”

Figure-7.1: Striking a balance between normalization and de-normalization

There should be a balance between normalized and de-normalized forms. In a fully normalized form, too many joins are required and in a totally de-normalized form, we have a big, wide single table. Database should be aligned someplace in between so as to strike a balance, especially in the context of the queries and the application domain.

A “pure” normalized design is a good starting point for a data model and is a great thing to do and achieve in academia. However, as briefly mentioned in the previous lecture, in

[email protected]

Virtual University of Pakistan

the reality of the “real world”, the enhancement in performance delivered by some selective de-normalization technique can be a very valuable tool. The key to success is to undertake de-normalization as a design technique very cautiously and consciously. Do not let proliferation of the technique take over your data warehouse or you will end up with a single big flat file!

What is De-Normalization?

ƒ It is not chaos, more like a “controlled crash” with the aim of performance enhancement without loss of information.

ƒ Normalization is a rule of thumb in DBMS, but in DSS ease of use is achieved by way of denormalization.

ƒ De-normalization comes in many flavors, such as combining tables, splitting tables, adding data etc., but all done very carefully.

‘Denormalization’ does not mean that anything and everything goes. Denormalization does not mean chaos or disorder or indiscipline. The development of properly denormalized data structures follows software engineering principles, which insure that information will not be lost. De-normalization is the process of selectively transforming normalized relations into un-normalized physical record specifications, with the aim of reducing query processing time. Another fundamental purpose of denormalization is to reduce the number of physical tables, that must be accessed to retrieve the desired data by reducing the number of joins required to answer a query. Some people tend to confuse dimensional modeling with de-normalization. This will become clear when we will cover dimensional modeling, where indeed tables are collapsed together.

Why De-Normalization in DSS?

[email protected]

Virtual University of Pakistan

consider whether the physical records should exactly match the normalized relations for a DSS or not?

How De-Normalization improves performance?

De-normalization specifically improves performance by either:

ƒ Reducing the number of tables and hence the reliance on joins, which consequently speeds up performance.

ƒ Reducing the number of joins required during query execution, or

ƒ Reducing the number of rows to be retrieved from the Primary Data Table.

The higher the level of normalization, the greater will be the number of tables in the DSS as the depth of snowflake schema would increase. The greater the number of tables in the DSS, the more joins are necessary for data manipulation. Joins slow performance, especially for very large tables for large data extractions, which is a norm in DSS not an exception. De-normalization reduces the number of tables and hence the reliance on joins, which consequently speeds up performance.

De-normalization can help minimize joins and foreign keys and help resolve aggregates. By storing values that would otherwise need to be retrieved (repeatedly), one may be able to reduce the number of indexes and even tables required to process queries.

4 Guidelines for De-normalization

  1. Carefully do a cost-benefit analysis (frequency of use, additional storage, join time).
    1. Do a data requirement and storage analysis.

[email protected]

Virtual University of Pakistan

  1. Weigh against the maintenance issue of the redundant data (triggers used).
  2. When in doubt, don’t denormalize. Guidelines for Denormalization:- Following are some of the basic guidelines to help determine whether it's time to denormalize the DSS design or not:

1. Balance the frequency of use of the data items in question, the cost of additional storage to duplicate the data, and the acquisition time of the join.

2. Understand how much data is involved in the typical query; the amount of data affects the amount of redundancy and additional storage requirements.

3. Remember that redundant data is a performance benefit at query time, but is a performance liability at update time because each copy of the data needs to be kept up to date. Typically triggers are written to maintain the integrity of the duplicated data.

4. De-normalization usually speeds up data retrieval, but it can slow the data modification processes. It may be noted that both on-line and batch system performance is adversely affected by a high degree of de-normalization. Hence the golden rule is: When in doubt, don’t denormalize.

Areas for Applying De-Normalization Techniques

ƒ Dealing with the abundance of star schemas.

ƒ Fast access of time series data for analysis.

ƒ Fast aggregate (sum, average etc.) results and complicated calculations.

[email protected]

Virtual University of Pakistan

  1. Splitting Tables (Horizontal/Vertical Splitting).
  2. Adding Redundant Columns (Reference Data).
  3. Derived Attributes (Summary, Total, Balance etc).

Now we will discuss de-normalization techniques that have been commonly adopted by experienced database designers. These techniques can be classified into four prevalent strategies for denormalization which are:

  1. Collapsing Tables.
  • Two entities with a One-to-One relationship.
  • Two entities with a Many-to-Many relationship.
  1. Pre-joining.
  2. Splitting Tables (Horizontal/Vertical Splitting).
  3. Adding Redundant Columns (Reference Data).
  4. Derived Attributes (Summary, Total, Balance etc).

Collapsing Tables

[email protected]

Virtual University of Pakistan

Figure-7.1: Collapsing Tables

1. Collapsing Tables One of the most common and safe denormalization techniques is combining of One-to- One relationships. This situation occurs when for each row of entity A, there is only one related row in entity B. While the key attributes for the entities may or may not be the same, their equal participation in a relationship indicates that they can be treated as a single unit. For example, if users frequently need to see COLA, COLB, and COLC together and the data from the two tables are in a One-to-One relationship, the solution is to collapse the two tables into one. For example, SID and gender in one table, and SID and degree in the other table.

In general, collapsing tables in One-to-One relationship has fewer drawbacks than others. There are several advantages of this technique, some of the obvious ones being reduced storage space, reduced amount of time for data update, some of the other not so apparent advantages are reduced number of foreign keys on tables, reduced number of indexes (since most indexes are created based on primary/foreign keys). Furthermore, combining