Parallel and Distributed Databases-Introduction to Database Systems-Lecture 24 Slides-Computer Science, Slides of Introduction to Database Management Systems

Parallel and Distributed Databases, Mapreduce, Dynamo, Peer-to-peer, Grep, Sort, Inverted Indexes, Clustering, Consistency, Anti-entropy, Epidemic Replication, Napster, Gnutella, Data-centric, Scalability, Replicate Information, Breadth-first Search, Hash Tables, Chord, Joining, Inserting

Typology: Slides

2011/2012

Uploaded on 02/12/2012

dylanx



Parallel and distributed databases

R & G Chapter 22

What is a distributed database?

Why distribute a database

 Scalability and performance

 Resilience to failures

[Figure: labels "Throughput" and "Data size"]

Why distribute a database

 Data is already distributed

 Or needs to be distributed

 Data is in multiple systems

Why not distribute a database

You must earn your complexity!

 Communication needed

 Must build a complex infrastructure

 Unpredictable latencies must be masked

 More types of failures

 More components to fail

 Network failures

 Congestion, timeouts

 More complex planning

 Communication cost plus I/O cost

 May have to deal with heterogeneity

 Different types of systems

 Different schemas, possibly incompatible

 Different administrative domains

Types of distributed databases

The old days: mainframes

Definitely not distributed!

Client-server

[Figure: user interaction, data processing, network]

Parallel database

Primary/secondary

Multidatabase

How do they work?

 What is shared?

 How to distribute the data?

 How to process the data?

 How to update the data?

Query processing

 Intra-operator parallelism

 Inter-operator parallelism
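As an illustration of intra-operator parallelism, the sketch below runs one logical operator (a filter scan) over several data partitions at the same time and merges the partial results. The names `scan_partition` and `parallel_scan` and the sample rows are assumptions made for this example. Threads stand in for separate database nodes, so the sketch shows the structure only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, predicate):
    """Scan one partition and keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def parallel_scan(partitions, predicate, workers=4):
    """Apply the same filter to every partition concurrently, then merge."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in partition order, so the merge is stable.
        results = pool.map(scan_partition, partitions,
                           [predicate] * len(partitions))
        return [row for part in results for row in part]


# Hypothetical table, already split across three "nodes".
partitions = [
    [("alice", 30), ("bob", 17)],
    [("carol", 45), ("dave", 12)],
    [("erin", 22)],
]
adults = parallel_scan(partitions, lambda row: row[1] >= 18)
```

Inter-operator parallelism, by contrast, would run different operators of the same plan concurrently, for example a scan feeding a join while the join is already producing output.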

Parallel scanning

[Figure: several filter operators feeding one Result]

Sorting

Parallel hash join

[Figure: Hash() and Join]
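One way to read the parallel hash join slide: both inputs are partitioned with the same hash function on the join key, so matching tuples land on the same node and each node can join its own partitions independently. The sketch below is a minimal single-process version of that idea. The function names, the node count, and the sample tables are assumptions for illustration, and the per-node joins run in a loop here rather than on separate machines.

```python
from collections import defaultdict


def partition_by_hash(rows, key_index, num_nodes):
    """Route each row to a node chosen by hashing its join key."""
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    return parts


def local_hash_join(left, right, left_key, right_key):
    """Classic build/probe hash join on one node's partitions."""
    table = defaultdict(list)
    for l_row in left:                      # build phase
        table[l_row[left_key]].append(l_row)
    return [l_row + r_row                   # probe phase
            for r_row in right
            for l_row in table.get(r_row[right_key], [])]


def parallel_hash_join(left, right, left_key, right_key, num_nodes=3):
    """Partition both inputs the same way, then join partition by partition."""
    left_parts = partition_by_hash(left, left_key, num_nodes)
    right_parts = partition_by_hash(right, right_key, num_nodes)
    joined = []
    for l_part, r_part in zip(left_parts, right_parts):
        # Each iteration is independent and could run on its own node.
        joined.extend(local_hash_join(l_part, r_part, left_key, right_key))
    return joined


# Hypothetical tables: (emp_id, name, dept_id) and (dept_id, dept_name).
employees = [(1, "alice", 10), (2, "bob", 20), (3, "carol", 10)]
departments = [(10, "eng"), (20, "ops"), (30, "hr")]
joined = parallel_hash_join(employees, departments, left_key=2, right_key=0)
```

The output order depends on how keys hash to nodes, so callers that need a fixed order should sort the result.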

Semi-join

Inter-operator parallelism

Updating distributed data

 Synchronous: read-any-write-all

Reads are fast
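A minimal sketch of read-any-write-all, assuming in-memory replicas with an `up` flag in place of real network nodes. The `Replica` and `ReadAnyWriteAll` classes are hypothetical names for this example. A read is served by the first reachable copy, which is why reads are fast, while a write must reach every copy and is refused as soon as one replica is unreachable.

```python
class Replica:
    """One copy of the data; `up` simulates whether the node is reachable."""

    def __init__(self):
        self.data = {}
        self.up = True


class ReadAnyWriteAll:
    def __init__(self, replicas):
        self.replicas = replicas

    def read(self, key):
        """Read from any single reachable replica."""
        for replica in self.replicas:
            if replica.up:
                return replica.data[key]
        raise ConnectionError("no replica reachable")

    def write(self, key, value):
        """Write to all replicas, or to none if any replica is unreachable."""
        # Checking first is a stand-in for a real atomic commit protocol.
        if not all(replica.up for replica in self.replicas):
            raise ConnectionError("write blocked: a replica is down")
        for replica in self.replicas:
            replica.data[key] = value


replicas = [Replica(), Replica(), Replica()]
store = ReadAnyWriteAll(replicas)
store.write("x", 1)
replicas[0].up = False        # one node disconnects
value = store.read("x")       # still served by another copy
```

With one replica down, `store.write("x", 2)` raises `ConnectionError`, which is the availability cost of this scheme.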

Updating distributed data

 Synchronous: voting

Writes tolerant to disconnection
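A minimal sketch of voting, assuming the usual quorum formulation: with N replicas, a write must reach W of them and a read must consult R of them, where R + W > N so every read set overlaps the latest write set, and 2W > N so two writes cannot both succeed on disjoint sets. Version numbers identify the newest copy. The class names and the choice N = 3, R = 2, W = 2 are assumptions for illustration.

```python
class VersionedReplica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.store = {}
        self.up = True


class QuorumStore:
    def __init__(self, replicas, read_quorum, write_quorum):
        n = len(replicas)
        if read_quorum + write_quorum <= n or 2 * write_quorum <= n:
            raise ValueError("need R + W > N and 2W > N")
        self.replicas = replicas
        self.r = read_quorum
        self.w = write_quorum

    def _voters(self, needed):
        """Pick `needed` reachable replicas or fail if there are too few."""
        live = [rep for rep in self.replicas if rep.up]
        if len(live) < needed:
            raise ConnectionError("quorum not reachable")
        return live[:needed]

    def read(self, key):
        """Ask R replicas and return the value with the highest version."""
        copies = [rep.store.get(key, (0, None)) for rep in self._voters(self.r)]
        return max(copies, key=lambda copy: copy[0])[1]

    def write(self, key, value):
        """Install a new version on W replicas; the rest may be disconnected."""
        voters = self._voters(self.w)
        version = max(rep.store.get(key, (0, None))[0] for rep in voters) + 1
        for rep in voters:
            rep.store[key] = (version, value)


replicas = [VersionedReplica() for _ in range(3)]
store = QuorumStore(replicas, read_quorum=2, write_quorum=2)
store.write("x", "a")
replicas[0].up = False        # a node disconnects
store.write("x", "b")         # still succeeds: two replicas form a quorum
replicas[0].up = True         # the stale node returns
latest = store.read("x")      # the overlap guarantees a fresh copy is seen
```

Unlike read-any-write-all, reads now contact several replicas, which is the price paid for writes that survive a disconnected node.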

Consistency of distributed data

 Should provide ACID