Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Distributed Databases-Database Management Systems-Lecture 25 Slides-Computer Science, Slides of Database Management Systems (DBMS)

University of Wisconsin (UW) - Madison Database Management Systems (DBMS)

Distributed Databases, Distributed DBMS Architectures, Storing Data, Distributed Catalog Management, Distributed Queries, Distributed Joins, Semijoin, Bloomjoin, Distributed Query Optimization, Updating Distributed Data, Synchronous Replication, Asynchronous Replication, Peer-to-peer Replication, Data Warehousing, Distributed Locking, Distributed Recovery, Two-phase Commit, Blocking, Database Management Systems, Raghu Ramakrishnan, Lecture Slides, Computer Science, University of Wisconsin, Unite

Typology: Slides

2011/2012

Uploaded on 02/15/2012

arien 🇺🇸

4.8

(24)

309 documents

1 / 11

This page cannot be seen from the preview

Don't miss anything!

Database Management Systems, 2nd Edition. R. Ramakrishnanand Johannes Gehrke 1

Distributed Databases

Chapter 22, Part B

Database Management Systems, 2nd Edition. R. Ramakrishnanand Johannes Gehrke 2

Introduction

YData is stored at several sites, each managed

by a DBMS that can run independently.

YDistributed Data Independence: Users

should not have to know where data is

located (extends Physical and Logical Data

Independence principles).

YDistributed Transaction Atomicity: Users

should be able to write Xacts accessing

multiple sites just like local Xacts.

Database Management Systems, 2nd Edition. R. Ramakrishnanand Johannes Gehrke 3

Recent Trends

YUsers have to be aware of where data is

located, i.e., Distributed Data Independence

and Distributed Transaction Atomicity are

not supported.

YThese properties are hard to support

efficiently.

YFor globally distributed sites, these properties

may not even be desirable due to

administrative overheads of making location

of data transparent.

Discover Slides of Database Management Systems (DBMS) University of Wisconsin (UW) - Madison

Partial preview of the text

Download Distributed Databases-Database Management Systems-Lecture 25 Slides-Computer Science and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 1

Distributed Databases

Chapter 22, Part B

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 2

Introduction

Y Data is stored at several sites, each managed by a DBMS that can run independently. Y Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence principles). Y Distributed Transaction Atomicity: Users should be able to write Xacts accessing multiple sites just like local Xacts.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 3

Recent Trends

Y Users have to be aware of where data is located, i.e., Distributed Data Independence and Distributed Transaction Atomicity are not supported. Y These properties are hard to support efficiently. Y For globally distributed sites, these properties may not even be desirable due to administrative overheads of making location of data transparent.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 4

Types of Distributed Databases

Y Homogeneous: Every site runs same type of DBMS. Y Heterogeneous: Different sites run different DBMSs (different RDBMSs or even non- relational DBMSs).

DBMS1 DBMS2 DBMS

Gateway

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 5

Distributed DBMS Architectures

Y Client-Server

Y Collaborating-Server

CLIENT CLIENT

SERVER SERVER SERVER

QUERY

SERVER

SERVER QUERY

Client ships query to single site. All query processing at server.

Thin vs. fat clients.
Set-oriented communication, client side caching.

Query can span multiple sites.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 6

Storing Data

Y Fragmentation

Horizontal: Usually disjoint.
Vertical: Lossless-join; tids. Y Replication
Gives increased availability.
Faster query evaluation.
Synchronous vs. Asynchronous. X Vary in how current copies are.

TID t t t t

R1 R

SITE A SITE B

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 10

Semijoin

Y At London, project Sailors onto join columns and ship this to Paris. Y At Paris, join Sailors projection with Reserves.

Result is called reduction of Reserves wrt Sailors. Y Ship reduction of Reserves to London. Y At London, join Sailors with reduction of Reserves. Y Idea: Tradeoff the cost of computing and shipping projection and computing and shipping projection for cost of shipping full Reserves relation. Y Especially useful if there is a selection on Sailors, and answer desired at London.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 11

Bloomjoin

Y At London, compute a bit-vector of some size k:

Hash join column values into range 0 to k-1.
If some tuple hashes to I, set bit I to 1 (I from 0 to k-1).
Ship bit-vector to Paris. Y At Paris, hash each tuple of Reserves similarly, and discard tuples that hash to 0 in Sailors bit-vector.
Result is called reduction of Reserves wrt Sailors. Y Ship bit-vector reduced Reserves to London. Y At London, join Sailors with reduced Reserves. Y Bit-vector cheaper to ship, almost as effective.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 12

Distributed Query Optimization

Y Cost-based approach; consider all plans, pick cheapest; similar to centralized optimization.

Difference 1: Communication costs must be considered.
Difference 2: Local site autonomy must be respected.
Difference 3: New distributed join methods. Y Query site constructs global plan, with suggested local plans describing processing at each site.
If a site can improve suggested local plan, free to do so.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 13

Updating Distributed Data

Y Synchronous Replication: All copies of a modified relation (fragment) must be updated before the modifying Xact commits.

Data distribution is made transparent to users. Y Asynchronous Replication: Copies of a modified relation are only periodically updated; different copies may get out of synch in the meantime.
Users must be aware of data distribution.
Current products follow this approach.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 14

Synchronous Replication

Y Voting: Xact must write a majority of copies to modify an object; must read enough copies to be sure of seeing at least one most recent copy.

E.g., 10 copies; 7 written for update; 4 copies read.
Each copy has version number.
Not attractive usually because reads are common. Y Read-any Write-all: Writes are slower and reads are faster, relative to Voting.
Most common approach to synchronous replication. Y Choice of technique determines which locks to set.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 15

Cost of Synchronous Replication

Y Before an update Xact can commit, it must obtain locks on all modified copies.

Sends lock requests to remote sites, and while waiting for the response, holds on to other locks!
If sites or links fail, Xact cannot commit until they are back up.
Even if there is no failure, committing must follow an expensive commit protocol with many msgs. Y So the alternative of asynchronous replication is becoming widely used.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 19

Implementing the Capture Step

Y Log-Based Capture: The log (kept for recovery) is used to generate a Change Data Table (CDT).

If this is done when the log tail is written to disk, must somehow remove changes due to subsequently aborted Xacts. Y Procedural Capture: A procedure that is automatically invoked (trigger; more later!) does the capture; typically, just takes a snapshot. Y Log-Based Capture is better (cheaper, faster) but relies on proprietary log details.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 20

Implementing the Apply Step

Y The Apply process at the secondary site periodically obtains (a snapshot or) changes to the CDT table from the primary site, and updates the copy.

Period can be timer-based or user/application defined. Y Replica can be a view over the modified relation!
If so, the replication consists of incrementally updating the materialized view as the relation changes. Y Log-Based Capture plus continuous Apply minimizes delay in propagating changes. Y Procedural Capture plus application-driven Apply is the most flexible way to process changes.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 21

Data Warehousing and Replication

Y A hot trend: Building giant “warehouses” of data from many sites.

Enables complex decision support queries over data from across an organization. Y Warehouses can be seen as an instance of asynchronous replication.
Source data typically controlled by different DBMSs; emphasis on “cleaning” data and removing mismatches ($ vs. rupees) while creating replicas. Y Procedural capture and application Apply best for this environment.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 22

Distributed Locking

Y How do we manage locks for objects across many sites?

Centralized: One site does all locking. X Vulnerable to single site failure.
Primary Copy: All locking for an object done at the primary copy site for this object. X Reading requires access to locking site as well as site where the object is stored.
Fully Distributed: Locking for a copy done at site where the copy is stored. X Locks at all sites while writing an object.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 23

Distributed Deadlock Detection

Y Each site maintains a local waits-for graph. Y A global deadlock might exist even if the local graphs contain no cycles:

T1 T2 T1 T2 T1 T SITE A SITE B GLOBAL Y Three solutions: Centralized (send all local graphs to one site); Hierarchical (organize sites into a hierarchy and send local graphs to parent in the hierarchy); Timeout (abort Xact if it waits too long).

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 24

Distributed Recovery

Y Two new issues:

New kinds of failure, e.g., links and remote sites.
If “sub-transactions” of an Xact execute at different sites, all or none must commit. Need a commit protocol to achieve this. Y A log is maintained at each site, as in a centralized DBMS, and commit protocol actions are additionally logged.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 28

Blocking

Y If coordinator for Xact T fails, subordinates who have voted yes cannot decide whether to commit or abort T until coordinator recovers.

T is blocked.
Even if all subordinates know each other (extra overhead in prepare msg) they are blocked unless one of them voted no.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 29

Link and Remote Site Failures

Y If a remote site does not respond during the commit protocol for Xact T, either because the site failed or the link failed:

If the current site is the coordinator for T, should abort T.
If the current site is a subordinate, and has not yet voted yes, it should abort T.
If the current site is a subordinate and has voted yes, it is blocked until the coordinator responds.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 30

Observations on 2PC

Y Ack msgs used to let coordinator know when it can “forget” an Xact; until it receives all acks, it must keep T in the Xact Table. Y If coordinator fails after sending prepare msgs but before writing commit/abort log recs, when it comes back up it aborts the Xact. Y If a subtransaction does no updates, its commit or abort status is irrelevant.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 31

2PC with Presumed Abort

Y When coordinator aborts T, it undoes T and removes it from the Xact Table immediately.

Doesn’t wait for acks ; “presumes abort” if Xact not in Xact Table. Names of subs not recorded in abort log rec. Y Subordinates do not send acks on abort. Y If subxact does not do updates, it responds to prepare msg with reader instead of yes/no. Y Coordinator subsequently ignores readers. Y If all subxacts are readers, 2nd phase not needed.

Database Management Systems, 2 nd^ Edition. R. Ramakrishnan and Johannes Gehrke 32

Summary

Y Parallel DBMSs designed for scalable performance. Relational operators very well- suited for parallel execution.

Pipeline and partitioned parallelism. Y Distributed DBMSs offer site autonomy and distributed administration. Must revisit storage and catalog techniques, concurrency control, and recovery issues.

Distributed Databases-Database Management Systems-Lecture 25 Slides-Computer Science, Slides of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Distributed Databases-Database Management Systems-Lecture 25 Slides-Computer Science and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Distributed Databases

Introduction

Recent Trends

Types of Distributed Databases

Distributed DBMS Architectures

Storing Data

Semijoin

Bloomjoin

Distributed Query Optimization

Updating Distributed Data

Synchronous Replication

Cost of Synchronous Replication

Implementing the Capture Step

Implementing the Apply Step

Data Warehousing and Replication

Distributed Locking

Distributed Deadlock Detection

Distributed Recovery

Blocking

Link and Remote Site Failures

Observations on 2PC

2PC with Presumed Abort

Summary