






Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An in-depth analysis of data warehouse development, focusing on the approaches of bill inmon and ralph kimball. It discusses the benefits of data warehouses, the concept of dimensional modeling, and the implementation of star and snowflake schemas. The document also compares the top-down edw approach of inmon with the bottom-up data mart strategy of kimball.
Typology: Lecture notes
1 / 12
This page cannot be seen from the preview
Don't miss anything!







Ministry of Education of Azerbaijan Republic
Azerbaijan University of Architecture and Construction
Faculty: Mechanics and Information Technology
Department: Information Technologies and Systems
Discipline: Decision support systems
Topic 6: Data warehouse development
Lecturer: Ph.D. in mathematics, assistant-prof. Asadova Jamila Abulfaz
PLAN OF THE LECTURE
**1. Approaches to data warehouse development: Inmon and Kimball Models
3.1.OLAP operations 3.2.OLAP variants: ROLAP, MOLAP and HOLAP
REFERENCES
Baku – 2022
A data warehousing project is a major undertaking for any organization and is more complicated than a simple, mainframe selection and implementation project because it comprises and influences many departments and many input and output interfaces and it can be part of a CRM business strategy. A data warehouse provides several benefits that can be classified as direct and indirect. Direct benefits include the following:
TABLE 3.3 Contrasts Between the Data Mart and EDW Development Approaches
WHICH MODEL IS BEST? There is no one-size-fits-all strategy to data warehousing. An enterprise's data warehousing strategy can evolve from a simple data mart to a complex data warehouse in response to user demands, the enterprise's business requirements, and the enterprise's maturity in managing its data resources. For many enterprises, a data mart is frequently a convenient first step to acquiring experience in constructing and managing a data warehouse while presenting business users with the benefits of better access to their data; in addition, a data mart commonly indicates the business value of data warehousing. Ultimately, engineering an EDW that consolidates old data marts and data warehouses is the ideal solution. However, the development of individual data marts can often provide many benefits along the way toward developing an EDW, especially if the organization is unable or unwilling to invest in a large-scale project. Table 3.4 summarizes the most essential characteristic differences between the two models.
No matter what the architecture was, the design of data representation in the data warehouse has always been based on the concept of dimensional modeling. Dimensional modeling is a retrieval-based system that supports high-volume query access. Representation and storage of data in a data warehouse should be designed in such a way that not only accommodates but also boosts the processing of complex multidimensional queries. Often, the star schema and the snowflakes schema are the means by which dimensional modeling is implemented in data warehouses. The star schema (sometimes referenced as star join schema) is the most commonly used and the simplest style of dimensional modeling. A star schema contains a central fact table surrounded by and connected to several dimension tables.
dimension s. In the snow flake schema, however, dimen sions are normalized into multiple related tables whereas the star schema's dimensions are denormalized with each dimension being represented by a single table.
FIGURE 3(b) the Snowflake Schema.
3. ANALYSIS OF DATA IN THE DATA WAREHOUSE Once the data is properly stored in a data warehouse, it can be used in various ways to support organizational decision making. OLAP (online analytical processing) is arguably the most commonly used data analysis technique in data warehouses, and it has been growing in popularity due to the exponential increase in data volumes and the recognition of the business value of data-driven analytics. Simply, OLAP is an approach to quickly answer ad hoc questions by executing multidimensional analytical queries against organizational data repositories (i.e., data warehouses, data marts).
OLAP Versus OLTP OLTP (online transaction processing system) is a term used for a transaction system , which is primarily responsible for capturing and storing data related to day- to-day business functions such as ERP, CRM, SCM, point of sale, and so forth. The OLTP system addresses a critical business need , automating daily business transactions and running real-time reports and routine analyses. But these
systems are not designed for ad hoc analysis and complex queries that deal with a number of data items. OLAP, on the other hand, is designed to address this need by providing ad hoc analysis of organizational data much more effectively and efficiently. OLAP and OLTP rely heavily on each other : OLAP uses the data captures by OLTP, and OLTP automates the business processes that are managed by decisions supported by OLAP. Table 3.5 provides a multi-criteria comparison between OLTP and OLAP.
3.1. OLAP Operations The main operational structure in OLAP is based on a concept called cube. A cube in OLAP is a multidimensional data structure (actual or virtual) that allows fast analysis of data.
It can also be defined as the capability of efficiently manipulating and analyzing data from multiple perspectives. The arrangement of data into cubes aims to overcome a limitation of relational databases :
FIGURE 3. Slicing Operations on a Simple Three-Dimensional Data Cube.
3.2. VARIATIONS OF OLAP: ROLAP, MOLAP, and HOLAP OLAP has a few variations; among them ROLAP, MOLAP, and HOLAP are the most common ones. ROLAP stands for Relational Online Analytical Processing. ROLAP is an alternative to the MOLAP ( Multidimensional OLAP ) technology. Although both ROLAP and MOLAP analytic tools are designed to allow analysis of data through the use of a multidimensional data model, ROLAP differs significantly in that it does not require the precomputation and storage of information. Instead, ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) that summarize the data at any desired combination of dimensions. While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database that was designed for OLTP will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. MOLAP is an alternative to the ROLAP technology. MOLAP differs from ROLAP significantly in that it requires the precomputation and storage of information in the cube -the operation known as preprocessing. MOLAP stores this data in an optimized multidimensional array storage , rather than in a relational database (which is often the case for ROLAP).
The undesirable trade-off between ROLAP and MOLAP with regards to the additional ETL (extract, transform, and load) cost and slow query performance has led to inquiries for better approaches where the pros and cons of these two approaches are optimized. These inquiries resulted in HOLAP (Hybrid Online Analytical Processing) , which is a combination of ROLAP and MOLAP. HOLAP allows storing part of the data in a MOLAP store and another part of the data in a ROLAP store. The degree of control that the cube designer has over this partitioning varies from product to product.
FIGURE 3.12 Enterprise Decision Evolution. Source: Courtesy of Teradata Corporation.