Introduction to Parallel and Distributed Databases, Summaries of Computer Science

Introduction to Parallel and Distributed Databases

Typology: Summaries

2025/2026

Uploaded on 01/25/2026

sean-mahwire-1
CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.1

Outline

n Introduction & architectural issues

l What is a distributed DBMS l Problems l Current state-of-affairs

q Data distribution

q Distributed query processing

q Distributed query optimization

q Distributed transactions & concurrency control

q Distributed reliability

q Database replication

q Parallel database systems

q Database integration & querying

q Advanced topics

File Systems

[Figure: program 1, program 2 and program 3, each with its own data description (1 to 3), accessing File 1, File 2 and File 3 respectively.]

Database Management

[Figure: application programs 1 to 3 (with data semantics) access a shared database through the DBMS, which handles description, manipulation and control.]

Motivation

[Figure: Database Technology (integration) and Computer Networks (distribution) are combined, through integration, into Distributed Database Systems.]

integration ≠ centralization

What is not a DDBS?

n A timesharing computer system

n A loosely or tightly coupled multiprocessor

system

n A database system which resides at one of the

nodes of a network of computers - this is a

centralized database on a network node

Centralized DBMS on a Network

[Figure: Site 1 to Site 5 connected by a communication network.]

Distributed DBMS Environment

[Figure: Site 1 to Site 5 connected by a communication network.]

Implicit Assumptions

n Data stored at a number of sites à each site

logically consists of a single processor.

n Processors at different sites are interconnected

by a computer network à not a multiprocessor

system

l Parallel database systems

n Distributed database is a database, not a

collection of files à data logically related as

exhibited in the users’ access patterns

l Relational data model

n D-DBMS is a full-fledged DBMS

l Not remote file system, not a TP system

Transparency

n Transparency is the separation of the higher

level semantics of a system from the lower

level implementation issues.

n Fundamental issue is to provide

data independence

in the distributed environment

l Network (distribution) transparency l Replication transparency l Fragmentation transparency u horizontal fragmentation: selection u vertical fragmentation: projection u hybrid Ch.x/ 13 CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.

Example

Transparent Access

SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
AND EMP.ENO = ASG.ENO
AND PAY.TITLE = EMP.TITLE

[Figure: the sites Tokyo, Boston, Montreal, Paris and New York connected by a communication network, each storing fragments of the employee, project and assignment data (for example Paris employees, Paris projects, Paris assignments; New York projects with budget > 200000).]
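To see what the query computes, it can be run against a small in-memory SQLite database. This is a centralized stand-in, not a distributed system: the table layouts and every row below are assumptions made up for the illustration. The point of transparent access is that the user would issue exactly the same query text no matter where the fragments of EMP, ASG and PAY are stored.

```python
import sqlite3

# The query from the slide, unchanged apart from whitespace.
QUERY = """
SELECT ENAME, SAL
FROM EMP, ASG, PAY
WHERE DUR > 12
  AND EMP.ENO = ASG.ENO
  AND PAY.TITLE = EMP.TITLE
"""


def build_toy_database():
    """Create hypothetical EMP, ASG and PAY tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE EMP (ENO TEXT, ENAME TEXT, TITLE TEXT);
        CREATE TABLE ASG (ENO TEXT, PNO TEXT, DUR INTEGER);
        CREATE TABLE PAY (TITLE TEXT, SAL INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO EMP VALUES (?, ?, ?)",
        [("E1", "A. Lee", "Engineer"),
         ("E2", "B. Roy", "Analyst"),
         ("E3", "C. Kim", "Engineer")],
    )
    conn.executemany(
        "INSERT INTO ASG VALUES (?, ?, ?)",
        [("E1", "P1", 24), ("E2", "P1", 6), ("E3", "P2", 18)],
    )
    conn.executemany(
        "INSERT INTO PAY VALUES (?, ?)",
        [("Engineer", 40000), ("Analyst", 34000)],
    )
    return conn


conn = build_toy_database()
rows = sorted(conn.execute(QUERY).fetchall())
print(rows)  # employees assigned for more than 12 months, with their salary
```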

Distributed Database - User View

[Figure: Distributed Database]

Reliability Through Transactions

• Replicated components and data should make distributed DBMS more reliable.
• Distributed transactions provide
  - Concurrency transparency
  - Failure atomicity
• Distributed transaction support requires implementation of
  - Distributed concurrency control protocols
  - Commit protocols
• Data replication
  - Great for read-intensive workloads, problematic for updates
  - Replication protocols

Potentially Improved Performance

• Proximity of data to its points of use
  - Requires some support for fragmentation and replication
• Parallelism in execution
  - Inter-query parallelism
  - Intra-query parallelism

Parallelism Requirements

n Have as much of the data required by each

application at the site where the application

executes

l Full replication

n How about updates?

l Mutual consistency l Freshness of copies CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.

System Expansion

n Issue is database scaling

n Emergence of microprocessor and workstation

technologies

l Demise of Grosh's law l Client-server model of computing

n Data communication cost vs

telecommunication cost

Relationship Between Issues

[Figure: relationships among Directory Management, Query Processing, Distribution Design, Concurrency Control, Deadlock Management and Reliability.]

Related Issues

n Operating System Support

l Operating system with proper support for database operations l Dichotomy between general purpose processing requirements and database processing requirements

n Open Systems and Interoperability

l Distributed Multidatabase Systems l More probable scenario l Parallel issues

Architecture

n Defines the structure of the system

l components identified l functions of each component defined l interrelationships and interactions between components defined CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.

ANSI/SPARC Architecture

[Figure: users work with external views (external schema), which map to the conceptual view (conceptual schema), which in turn maps to the internal view (internal schema).]

Dimensions of the Problem

n Distribution

l Whether the components of the system are located on the same machine or not

n Heterogeneity

l Various levels (hardware, communications, operating system) l DBMS important one u data model, query language,transaction management algorithms

n Autonomy

l Not well understood and most troublesome l Various versions u Design autonomy: Ability of a component DBMS to decide on issues related to its own design. u Communication autonomy: Ability of a component DBMS to decide whether and how to communicate with other DBMSs. u Execution autonomy: Ability of a component DBMS to execute local operations in any manner it wants to. CS742 – Distributed & Parallel DBMS M. Tamer Özsu Page 1.

Client/Server Architecture

Advantages of Client-Server Architectures

• More efficient division of labor
• Horizontal and vertical scaling of resources
• Better price/performance on client machines
• Ability to use familiar tools on client machines
• Client access to remote data (via standards)
• Full DBMS functionality provided to client workstations
• Overall better system price/performance

Database Server

Peer-to-Peer Component Architecture

[Figure: the USER PROCESSOR (User Interface Handler, Semantic Data Controller, Global Query Optimizer, Global Execution Monitor) accepts user requests and returns system responses, using the External Schema, the Global Conceptual Schema and the GD/D. The DATA PROCESSOR (Local Query Processor, Local Recovery Manager, Runtime Support Processor) uses the Local Conceptual Schema, the Local Internal Schema and the System Log to access the database.]

Datalogical Multi-DBMS Architecture

[Figure: global external schemas GES 1 ... GES n over the global conceptual schema (GCS); local external schemas LES 11 ... LES 1n through LES n1 ... LES nm; local conceptual schemas LCS 1 ... LCS n; local internal schemas LIS 1 ... LIS n.]

MDBS Components & Execution

[Figure: a global user request enters the Multi-DBMS layer, which sends global subrequests to DBMS 1, DBMS 2 and DBMS 3; local user requests go directly to the component DBMSs.]

Mediator/Wrapper Architecture