





Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Denormalization, Striking a balance between normalization and denormalization, Denormalization in DSS, Guidelines for Denormalization, Areas for Applying Denormalization Techniques are discusses points of document. This handout is for data warehousing course from Virtual University of Pakistan.
Typology: Lecture notes
1 / 9
This page cannot be seen from the preview
Don't miss anything!






Virtual University of Pakistan
Striking a balance between “good” & “evil”
Figure-7.1: Striking a balance between normalization and de-normalization
There should be a balance between normalized and de-normalized forms. In a fully normalized form, too many joins are required and in a totally de-normalized form, we have a big, wide single table. Database should be aligned someplace in between so as to strike a balance, especially in the context of the queries and the application domain.
A “pure” normalized design is a good starting point for a data model and is a great thing to do and achieve in academia. However, as briefly mentioned in the previous lecture, in
Virtual University of Pakistan
the reality of the “real world”, the enhancement in performance delivered by some selective de-normalization technique can be a very valuable tool. The key to success is to undertake de-normalization as a design technique very cautiously and consciously. Do not let proliferation of the technique take over your data warehouse or you will end up with a single big flat file!
What is De-Normalization?
It is not chaos, more like a “controlled crash” with the aim of performance enhancement without loss of information.
Normalization is a rule of thumb in DBMS, but in DSS ease of use is achieved by way of denormalization.
De-normalization comes in many flavors, such as combining tables, splitting tables, adding data etc., but all done very carefully.
‘Denormalization’ does not mean that anything and everything goes. Denormalization does not mean chaos or disorder or indiscipline. The development of properly denormalized data structures follows software engineering principles, which insure that information will not be lost. De-normalization is the process of selectively transforming normalized relations into un-normalized physical record specifications, with the aim of reducing query processing time. Another fundamental purpose of denormalization is to reduce the number of physical tables, that must be accessed to retrieve the desired data by reducing the number of joins required to answer a query. Some people tend to confuse dimensional modeling with de-normalization. This will become clear when we will cover dimensional modeling, where indeed tables are collapsed together.
Why De-Normalization in DSS?
Virtual University of Pakistan
consider whether the physical records should exactly match the normalized relations for a DSS or not?
How De-Normalization improves performance?
De-normalization specifically improves performance by either:
Reducing the number of tables and hence the reliance on joins, which consequently speeds up performance.
Reducing the number of joins required during query execution, or
Reducing the number of rows to be retrieved from the Primary Data Table.
The higher the level of normalization, the greater will be the number of tables in the DSS as the depth of snowflake schema would increase. The greater the number of tables in the DSS, the more joins are necessary for data manipulation. Joins slow performance, especially for very large tables for large data extractions, which is a norm in DSS not an exception. De-normalization reduces the number of tables and hence the reliance on joins, which consequently speeds up performance.
De-normalization can help minimize joins and foreign keys and help resolve aggregates. By storing values that would otherwise need to be retrieved (repeatedly), one may be able to reduce the number of indexes and even tables required to process queries.
4 Guidelines for De-normalization
Virtual University of Pakistan
1. Balance the frequency of use of the data items in question, the cost of additional storage to duplicate the data, and the acquisition time of the join.
2. Understand how much data is involved in the typical query; the amount of data affects the amount of redundancy and additional storage requirements.
3. Remember that redundant data is a performance benefit at query time, but is a performance liability at update time because each copy of the data needs to be kept up to date. Typically triggers are written to maintain the integrity of the duplicated data.
4. De-normalization usually speeds up data retrieval, but it can slow the data modification processes. It may be noted that both on-line and batch system performance is adversely affected by a high degree of de-normalization. Hence the golden rule is: When in doubt, don’t denormalize.
Areas for Applying De-Normalization Techniques
Dealing with the abundance of star schemas.
Fast access of time series data for analysis.
Fast aggregate (sum, average etc.) results and complicated calculations.
Virtual University of Pakistan
Now we will discuss de-normalization techniques that have been commonly adopted by experienced database designers. These techniques can be classified into four prevalent strategies for denormalization which are:
Collapsing Tables
Virtual University of Pakistan
Figure-7.1: Collapsing Tables
1. Collapsing Tables One of the most common and safe denormalization techniques is combining of One-to- One relationships. This situation occurs when for each row of entity A, there is only one related row in entity B. While the key attributes for the entities may or may not be the same, their equal participation in a relationship indicates that they can be treated as a single unit. For example, if users frequently need to see COLA, COLB, and COLC together and the data from the two tables are in a One-to-One relationship, the solution is to collapse the two tables into one. For example, SID and gender in one table, and SID and degree in the other table.
In general, collapsing tables in One-to-One relationship has fewer drawbacks than others. There are several advantages of this technique, some of the obvious ones being reduced storage space, reduced amount of time for data update, some of the other not so apparent advantages are reduced number of foreign keys on tables, reduced number of indexes (since most indexes are created based on primary/foreign keys). Furthermore, combining