Data Warehousing – Complete Chapter Notes | Architecture, Dimensional Modelling, Schemas, Study notes of Basics of Data Warehousing

These are detailed 4th-year Honours level chapter notes on Data Warehousing, forming part of a complete Business Intelligence Systems course, authored by Vhuhwavho Phungo. This chapter covers both the theoretical foundations and technical architecture of data warehousing, making it an essential resource for understanding how organisations store, manage, and analyse large volumes of business data. Topics covered include: Data Warehouse Definition & Characteristics Data Warehouse Architecture ETL Processes Data Marts These notes are well-structured and ideal for Honours students preparing for exams, completing assignments, or building a solid understanding of data warehousing within a Business Intelligence context.

Typology: Study notes

2023/2024

Available from 05/28/2026

vhuhwavho-phungo
vhuhwavho-phungo 🇿🇦

8 documents

1 / 38

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Data Warehousing
Business Intelligence
Systems: A Managerial
Approach
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26

Partial preview of the text

Download Data Warehousing – Complete Chapter Notes | Architecture, Dimensional Modelling, Schemas and more Study notes Basics of Data Warehousing in PDF only on Docsity!

Data Warehousing

Business Intelligence

Systems: A Managerial

Approach

Learning Objectives

Understand the basic definitions and concepts of data warehouses

Learn different types of data warehousing architectures; their comparative advantages and disadvantages

Describe the processes used in developing and managing data warehouses

Explain data warehousing operations

Explain the role of data warehouses in

What is a Data Warehouse?

 A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format  “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile (slowly changing dimensions) and relevant to some moment in time”

Characteristics of DW

Subject oriented

Integrated/non-segregated/dependent

Time-variant (time series)

Nonvolatile

Summarized

Not normalized

Metadata

Web based, relational/multi- dimensional

Client/server

Real-time and/or right-time (active)

Data Warehousing Definitions

Operational data stores (ODS) A type of database often used as an interim area for a data warehouse

Open marts An operational data mart

Enterprise data warehouse (EDW) A data warehouse for the enterprise

Metadata Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use

DW Framework

Data Sources ERP Legacy POS Other OLTP/wEB External data Select Transform Extract Integrate Load

ETL

Process Enterprise Data warehouse Metadata Replication A P I /^ Middleware Data/text mining Custom built applications

OLAP,

Dashboard, Web Routine Business Reporting Applications (Visualization) Data mart (Engineering) Data mart (Marketing) Data mart (Finance) Data mart (...) Access No data marts option

DW Architectures

Tier 2 : Application server Tier 1 : Client workstation Tier 3 : Database server Tier 1 : Client workstation Tier 2 : Application & database server 3-tier architectur e 2-tier architectur e 1-tier Architectur e

A Web-based DW Architecture

Web

Server

Client

(Web browser)

Application

Server

Data

warehouse

Web pages

Internet/
Intranet/
Extranet

Alternative DW Architectures

Source Systems Staging Area Independent data marts (atomic/summarized data) End user access and applications

ETL

(a) Independent Data Marts Architecture Source Systems Staging Area End user access and applications

ETL

Dimensionalized data marts linked by conformed dimentions (atomic/summarized data) (b) Data Mart Bus Architecture with Linked Dimensional Datamarts Source Systems Staging Area End user access and applications

ETL

Normalized relational warehouse (atomic data) Dependent data marts (summarized/some atomic data) (c) Hub and Spoke Architecture (Corporate Information Factory)

Alternative DW Architectures

Source Systems Staging Area Normalized relational warehouse (atomic/some summarized data) End user access and applications

ETL
(d) Centralized Data Warehouse Architecture

End user access and applications Logical/physical integration of common data elements Existing data warehouses Data marts and legacy systmes

Data mapping / metadata
(e) Federated Architecture

Data Integration and the Extraction, Transformation, and Load (ETL) Process

Data integration Integration that comprises three major processes: data access , data federation (data virtualization - ability to aggregate data from disparate sources in a virtual database), and change capture

Enterprise application integration (EAI) A technology that provides a vehicle for pushing data from source systems into a data warehouse

Enterprise information integration (EII) An evolving tool space that provides real-time data integration from a variety of sources, such as relational databases, Web services, and

Extraction, transformation, and load (ETL) Data Integration and the Extraction, Transformation, and Load (ETL) Process Packaged application Legacy system Other internal applications Transient data source Extract Transform Cleanse Load Data warehouse Data mart

Data Warehouse Development

Data warehouse development approaches  Inmon Model : EDW approach (top-down) - normalized data model designed 1

st

> DW > then extract dimensional data marts for specific departments.  Kimball Model : Data mart approach (bottom- up) -data marts 1

st

, combine them to DW  Which model is best?  (^) There is no one-size-fits-all strategy to DW

One alternative is the hosted warehouse

Data warehouse structure:  The Star Schema vs. Relational

Real-time data warehousing?

Hosted Data Warehouses

 Benefits:  Requires minimal investment in infrastructure  Frees up capacity on in-house systems  Frees up cash flow  Makes powerful solutions affordable  Enables powerful solutions that provide for growth  Offers better quality equipment and software  Provides faster connections  Enables users to access data remotely  Allows a company to focus on core business