Data Warehouse Development: Inmon and Kimball Models, Lecture notes of Computer science

An in-depth analysis of data warehouse development, focusing on the approaches of bill inmon and ralph kimball. It discusses the benefits of data warehouses, the concept of dimensional modeling, and the implementation of star and snowflake schemas. The document also compares the top-down edw approach of inmon with the bottom-up data mart strategy of kimball.

Typology: Lecture notes

2021/2022

Uploaded on 03/30/2024

jamila-babaeva
jamila-babaeva 🇦🇿

1 document

1 / 12

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Ministry of Education of Azerbaijan Republic
Azerbaijan University of Architecture and Construction
Faculty: Mechanics and Information Technology
Department: Information Technologies and Systems
Discipline: Decision support systems
Topic 6: Data warehouse development
Lecturer: Ph.D. in mathematics, assistant-prof. Asadova Jamila Abulfaz
PLAN OF THE LECTURE
1. Approaches to data warehouse development: Inmon and Kimball Models
2. Data representation in a data warehouse
3. Data analysis in a data warehouse
3.1.OLAP operations
3.2.OLAP variants: ROLAP, MOLAP and HOLAP
REFERENCES
1. 1.Decision Support Systems and Intelligent Systems, 8th Edition, Efraim Turban, Jay E.
Aronson, Ting-Peng Liang, & Ramesh Sharda. Prentice Hall, 2008;
ISBN 0-13-198660-0
Textbook WebSite:
http://wps.prenhall.com/bp_turban_dsbis_8/59/15113/3869054.cw/index.html
2. Andriole, S., Handbook of Decision Support Systems, TAB Books, Inc., 1989.
3. Buede,D. The Engineering Design of Systems: Models and Methods. New York: Wiley,
2000.
4. Marakas, G. Decision Support Systems, Prentice-Hall, 2003.
5. Sprague, R.H. and Carlson, E.D. Building Effective Decision Support Systems.
Englewood Cliffs, NJ: Prentice Hall, 1982.
Baku 2022
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Data Warehouse Development: Inmon and Kimball Models and more Lecture notes Computer science in PDF only on Docsity!

Ministry of Education of Azerbaijan Republic

Azerbaijan University of Architecture and Construction

Faculty: Mechanics and Information Technology

Department: Information Technologies and Systems

Discipline: Decision support systems

Topic 6: Data warehouse development

Lecturer: Ph.D. in mathematics, assistant-prof. Asadova Jamila Abulfaz

PLAN OF THE LECTURE

**1. Approaches to data warehouse development: Inmon and Kimball Models

  1. Data representation in a data warehouse
  2. Data analysis in a data warehouse**

3.1.OLAP operations 3.2.OLAP variants: ROLAP, MOLAP and HOLAP

REFERENCES

  1. 1.Decision Support Systems and Intelligent Systems, 8th Edition, Efraim Turban, Jay E. Aronson, Ting-Peng Liang, & Ramesh Sharda. Prentice Hall, 2008; ISBN 0-13-198660- Textbook WebSite: http://wps.prenhall.com/bp_turban_dsbis_8/59/15113/3869054.cw/index.html
  2. Andriole, S., Handbook of Decision Support Systems, TAB Books, Inc., 1989.
  3. Buede,D. The Engineering Design of Systems: Models and Methods. New York: Wiley,
  4. Marakas, G. Decision Support Systems, Prentice-Hall, 2003.
  5. Sprague, R.H. and Carlson, E.D. Building Effective Decision Support Systems. Englewood Cliffs, NJ: Prentice Hall, 1982.

Baku – 2022

TOPIC 6. DATA WAREHOUSE DEVELOPMENT

A data warehousing project is a major undertaking for any organization and is more complicated than a simple, mainframe selection and implementation project because it comprises and influences many departments and many input and output interfaces and it can be part of a CRM business strategy. A data warehouse provides several benefits that can be classified as direct and indirect. Direct benefits include the following:

  • End users can perform extensive analysis in numerous ways.
  • A consolidated view of corporate data (i.e., a single version of the truth) is possible.
  • Better and more timely information is possible. A data warehouse permits information processing to be relieved from costly operational systems onto low-cost servers; therefore, many more end-user information requests can be processed more quickly.
  • Enhanced system performance can result. A data warehouse frees production processing because some operational system reporting requirements are moved to DSS.
  • Data access is simplified. Indirect benefits result from end users using these direct benefits. On the whole, these benefits enhance business knowledge, present competitive advantage, improve customer service and satisfaction, facilitate decision making, and help in reforming business processes ; therefore, they are the strongest contributions to competitive advantage. 1. Approaches to Data Warehouse Development: Inmon and Kimball Models Many organizations need to create the data warehouses used for decision support. Two competing approaches are employed. The first approach is that of Bill Inmon, who is often called "the father of data warehousing. " Inmon supports a top-down development approach that adapts traditional relational database tools to the development needs of an enterprise-wide data warehouse, also known as the EDW approach. The second approach is that of Ralph Kimball, who proposes a bottom-up approach that employs dimensional modeling, also known as the data mart approach. Knowing how these two models are alike and how they differ helps us understand the basic data warehouse concepts. Table 3.3 compares the two approaches. We describe these approaches in detail next.

TABLE 3.3 Contrasts Between the Data Mart and EDW Development Approaches

WHICH MODEL IS BEST? There is no one-size-fits-all strategy to data warehousing. An enterprise's data warehousing strategy can evolve from a simple data mart to a complex data warehouse in response to user demands, the enterprise's business requirements, and the enterprise's maturity in managing its data resources. For many enterprises, a data mart is frequently a convenient first step to acquiring experience in constructing and managing a data warehouse while presenting business users with the benefits of better access to their data; in addition, a data mart commonly indicates the business value of data warehousing. Ultimately, engineering an EDW that consolidates old data marts and data warehouses is the ideal solution. However, the development of individual data marts can often provide many benefits along the way toward developing an EDW, especially if the organization is unable or unwilling to invest in a large-scale project. Table 3.4 summarizes the most essential characteristic differences between the two models.

2. REPRESENTATION OF DATA IN DATA WAREHOUSE

No matter what the architecture was, the design of data representation in the data warehouse has always been based on the concept of dimensional modeling. Dimensional modeling is a retrieval-based system that supports high-volume query access. Representation and storage of data in a data warehouse should be designed in such a way that not only accommodates but also boosts the processing of complex multidimensional queries. Often, the star schema and the snowflakes schema are the means by which dimensional modeling is implemented in data warehouses. The star schema (sometimes referenced as star join schema) is the most commonly used and the simplest style of dimensional modeling. A star schema contains a central fact table surrounded by and connected to several dimension tables.

dimension s. In the snow flake schema, however, dimen sions are normalized into multiple related tables whereas the star schema's dimensions are denormalized with each dimension being represented by a single table.

FIGURE 3(b) the Snowflake Schema.

3. ANALYSIS OF DATA IN THE DATA WAREHOUSE Once the data is properly stored in a data warehouse, it can be used in various ways to support organizational decision making. OLAP (online analytical processing) is arguably the most commonly used data analysis technique in data warehouses, and it has been growing in popularity due to the exponential increase in data volumes and the recognition of the business value of data-driven analytics. Simply, OLAP is an approach to quickly answer ad hoc questions by executing multidimensional analytical queries against organizational data repositories (i.e., data warehouses, data marts).

OLAP Versus OLTP OLTP (online transaction processing system) is a term used for a transaction system , which is primarily responsible for capturing and storing data related to day- to-day business functions such as ERP, CRM, SCM, point of sale, and so forth. The OLTP system addresses a critical business need , automating daily business transactions and running real-time reports and routine analyses. But these

systems are not designed for ad hoc analysis and complex queries that deal with a number of data items. OLAP, on the other hand, is designed to address this need by providing ad hoc analysis of organizational data much more effectively and efficiently. OLAP and OLTP rely heavily on each other : OLAP uses the data captures by OLTP, and OLTP automates the business processes that are managed by decisions supported by OLAP. Table 3.5 provides a multi-criteria comparison between OLTP and OLAP.

3.1. OLAP Operations The main operational structure in OLAP is based on a concept called cube. A cube in OLAP is a multidimensional data structure (actual or virtual) that allows fast analysis of data.

It can also be defined as the capability of efficiently manipulating and analyzing data from multiple perspectives. The arrangement of data into cubes aims to overcome a limitation of relational databases :

FIGURE 3. Slicing Operations on a Simple Three-Dimensional Data Cube.

3.2. VARIATIONS OF OLAP: ROLAP, MOLAP, and HOLAP OLAP has a few variations; among them ROLAP, MOLAP, and HOLAP are the most common ones. ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP ( Multidimensional OLAP ) technology. Although both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the precomputation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) that summarize the data at any desired combination of dimensions. While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database that was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. MOLAP is an alternative to the ROLAP technology. MOLAP differs from ROLAP significantly in that it requires the precomputation and storage of information in the cube -the operation known as preprocessing. MOLAP stores this data in an optimized multidimensional array storage , rather than in a relational database (which is often the case for ROLAP).

The undesirable trade-off between ROLAP and MOLAP with regards to the additional ETL (extract, transform, and load) cost and slow query performance has led to inquiries for better approaches where the pros and cons of these two approaches are optimized. These inquiries resulted in HOLAP (Hybrid Online Analytical Processing) , which is a combination of ROLAP and MOLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store. The degree of control that the cube designer has over this partitioning varies from product to product.

FIGURE 3.12 Enterprise Decision Evolution. Source: Courtesy of Teradata Corporation.

REVIEW QUESTIONS

  1. List the benefits of data warehouses.
  2. Describe and compare the main approaches to developing data warehouses: Inmon and Kimball models
  3. Describe the means by which dimensional modeling is implemented in data warehouses ("star" and "snowflake").
  4. What is OLAP and how does it differ from OLTP?
  5. What is a cube? What do drill down, roll up, pivoting, and slice and dice mean?