Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Distributed Databases: Understanding Client-Server Communications and Advantages, Study notes of Computer Science

Georgia Institute of Technology - Main Campus Computer Science

An overview of distributed databases, their architecture, and the advantages of using a client-server communication model. Topics include task distribution, communication methods, caching, and the characteristics, advantages, and limitations of client-server systems. Additionally, it discusses the importance of transparency, reliability, and performance in distributed database management systems.

Typology: Study notes

Pre 2010

Uploaded on 08/05/2009

koofers-user-0uh 🇺🇸

10 documents

1 / 29

This page cannot be seen from the preview

Don't miss anything!

Distributed Database Systems

Motivation

Growth of data communication

networks

Faster, cheaper & reliable

Needs of organizations

Multiple sites

Global presence

Interacting organizations

Need for transaction capabilities

Availability and reliability are crucial

Discover Study notes of Computer Science Georgia Institute of Technology - Main Campus

Partial preview of the text

Download Distributed Databases: Understanding Client-Server Communications and Advantages and more Study notes Computer Science in PDF only on Docsity!

Distributed Database Systems

Motivation

Growth of data communication

networks

Faster, cheaper & reliable

Needs of organizations

Multiple sites

Global presence

Interacting organizations

Need for transaction capabilities

Availability and reliability are crucial

What is Distributed Database

System?

A Distributed Database (DDB) is a

collection of multiple,logically

interrelated databases distributed over

a computer network

A Distributed Database System is

software that manages a distributed

database while making the distribution

transparent to the user

Distributed DB Environment

Site 1 Site 2

Site 3 Site 4

Communication

Network

Client-Server: Characteristics

Client

presents the user interface

forms queries or commands in a predefined language

communicates these to the server according to an

accepted interprocess communication method

performs data analysis on the results returned from the

server

presents the results to the user

Server

provides a service to client(s)

only responds to queries or commands from clients;

does not initiate conversation with clients

hides the existence of the client-server system from the

client (ideally)

Client-Server: Advantages &

Limitations

Advantages

Efficient division of labor

Client access to remote data (via

standards)

Full DBMS functionality provided to client

workstations

Limitations

Server forms bottleneck

Server forms single point of failure

Database scaling difficult

Multiple Clients/Multiple

Servers

Communications

Client

Services

Applications

LAN

z directory

z caching

z query decomposition

z commit protocols

Communications

DBMS Services

Database

Communications

DBMS Services

Database

Server-to-Server

Communications

DBMS Services

LAN

Communications

DBMS Services

z SQL interface

z programmatic

interface

z other application

support environments

Communications

Client

Services

Applications

Database Database

Transparency

High-level users are shielded from the

implementational details such as the

location, replication of data

Users still see a monolithic view of the

database

Different types of transparencies

Distribution transparency

Replication transparency

Fragmentation transparency

Example

TITLE SAL

SKILL

Elect. Eng. 40000 Syst. Anal. 34000 Mech. Eng. 27000 Programmer 24000

PROJ

JNO JNAME BUDGET

ENO ENAME TITLE

E1 J. Doe Elect. Eng. E2 M. Smith Syst. Anal. E3 A. Lee Mech. Eng. E4 J. Miller Programmer E5 B. Casey Syst. Anal. E6 L. Chu Elect. Eng. E7 R. Davis Mech. Eng. E8 J. Jones Syst. Anal.

EMP

ENO JNO RESP

E1 J1 Manager 12

DUR

E2 J1 Analyst 24 E2 J2 Analyst 6 E3 J3 Consultant 10 E3 J4 Engineer 48 E4 J2 Programmer 18 E5 J2 Manager 24 E6 J4 Manager 48 E7 J3 Engineer 36

E8 J3 Manager 40

ASG

LOC

J1 Instrumentation 150000 Montreal

J3 CAD/CAM 250000 New York

J2 Database Develop. 135000 New York

J4 Maintenance 310000 Paris

E7 J5 Engineer 23

J5 CAD/CAM 500000 Bangalore

Transparency - Illustration

Paris

New York Bangalore

Berlin

Paris Employees Paris Projects Paris Assignments

Bangalore Employees Bangalore Projects Bangalore Assignments New York Projects

New York Employees New York Projects New York Assignments Paris Employees

Communication

Network

SELECT ENAME,SAL

FROM EMP,ASG,SKILL

WHERE DUR > 12

AND EMP.ENO = ASS.ENO

AND EMP.TITLE = SKILL.TITLE

Reliability & Performance

Distributed DBMS avoidssingle point of

failure

Only data and software at the failed site

are inaccessible

Further improvement by replication

Performance improvements

Proximity of data to points of its use

Parallelism in execution

Inter-query parallelism

Intra-query parallelism

Distributed Query Processing

Methodology

Query Decomposition

Data Localization

Global Optimization

Local Optimization

Calculus query on Distributed

relations

Algebraic query on

Distributed relations

Fragment Query

Optimized Fragment Query with

Communication Operations

Optimized local queries

Global Schema

Fragment Schema

Stats on fragments

Local Schema

Control

Site

Local

Sites

Steps

Query Decomposition

Input: Calculus query on global relations

Output: Algebraic query on global relations

Transformation rules used

Same as Centralized

Data Localization

Input: Algebraic query on global relations

Output: Algebraic query on fragment relations

Global Optimization

Input: Algebraic query on fragment relations

The best global schedule

Local optimization

Input: Best global execution schedule

Output: The optimal access path for each local sub-query

Decomposition &

Restructuring

Convert relational calculus

to relational algebra

Make use of query trees

Example

Find the names of all employees other than J. Doe who worked on CAD/CAM project for either 1 or 2 years SELECT ENAME FROM E, W, P WHERE E.ENO = W.ENO AND W.JNO = P.JNO AND E.ENAME ≠ “J. Doe” AND P.JNAME = “CAD/CAM” AND (W.DUR = 12 or W.DUR = 24)

ΛENAME

σ(DUR = 12 OR DUR = 24) AND JNAME = “CAD/CAM” AND ENAME = “JDOE”

|X| (^) JNO

|X|ENO

W E

Project

Select

Join

Step 2 – Data Localization

Input: Algebraic query on distributed

relations

Determine which fragments are

involved

Localization program

Substitute for each global query its

materialization program

Optimize

Step 3 – Global Query

Optimization

Input: Fragment Query

Find the best global schedule

Minimizes a cost function

Distributed join processing

Which relations to ship where?

Decide on the use of semi-joins

Semi joins saves communication at the expense

of more local processing

Join methods

Nested loop Vs. Ordered joins

Cost-Based Optimization

Solution Space

The set of equivalent algebra expressions (query

trees)

Cost function (in terms of time)

I/O cost + CPU cost + Communication cost

Might have different weights in different

distributed environments

Search algorithms

How do we find the best plan?

Exhaustive search, heuristic algorithms (iterative

deepening, simulated annealing, etc.)

Cost Functions

Total Time

Reduce each cost component

Do as little of each cost component as

possible

Optimizes the utilization of resources

Response Time

Do as many things as possible in parallel

May increase total time

Total Cost

Total cost = CPU cost + I/O cost +

communication cost

CPU cost = unit instruction cost * no. of instructions

I/O cost = unit disk I/O cost * no. of disk I/O

Communication cost = message initiation +

data transmission

Example

Assume that only the communication cost is considered

Total time = 2 * message initialization time + unit

transmission time * (x+y)

Response time = max(time to send x from 1 to 3 and y

from 2 to 3)

Time to send x from 1 to 3 = message initialization time

+ unit transmission time * x

Time to send y from 2 to 3 = message initialization time

+ unit transmission time * y

Site 1

Site 2

Site 3

X Units

Y Units

Distributed Transaction

Execution

User application

Results &

User notifications

Read, Write,

EOT, Abort

Local

Recovery

Protocol

Distributed

Concurrency Control

Protocol

Replica Control

Protocol

Distributed

Transaction Execution

Model

Begin_transaction,

Read, Write, EOT,

Abort

Serializability in Distributed

DBMS

Somewhat more involved: Two histories have

to be considered

Local histories

Global history

For global transactions (i.e., global history) to

be serializable two conditions are necessary

Each local history should be serializable

Two conflicting operations should in the same

relative order in all of the local histories where they

appear together

Global Non-serializability

T1: Read (X)

X <- X+ Write(X) Commit

T2: Read (X)

X <- X* Write(X) Commit

The following two local histories are individually serializable

(in fact serial). But the two transactions are not globally

serializable

LH 1 = {R 1 (x), W 1 (X), C 1 , R 2 (X), W 2 (X), C 2 }

LH 2 = {R 2 (x), W 2 (X), C 2 , R 1 (X), W 1 (X), C 1 }

Deadlock

A transaction is deadlocked if it is blocked and

will remain blocked until external intervention

Locking based CC algorithms may result in

deadlocks

Wait-for graph

If transaction T i waits for another transaction T j to

release a lock, then Ti Æ T j in WFG

T

Local Vs. Global WFG

Assume T1 and T2 run at site 1, T3 and T4 at

site 2. Also assume T3 waits for a lock held

by T4 which waits for a lock held by T1 which

waits for a lock held by T2 which, in turn,

waits for a lock held by T

T (^1)

T (^2)

T (^4)

T (^3)

Local WFG

Site 1 Site 2

T (^1)

T (^2)

T (^4)

T (^3)

Distributed Databases: Understanding Client-Server Communications and Advantages, Study notes of Computer Science

Related documents

Partial preview of the text

Download Distributed Databases: Understanding Client-Server Communications and Advantages and more Study notes Computer Science in PDF only on Docsity!

Distributed Database Systems

Motivation

Growth of data communication

networks

 Faster, cheaper & reliable

Needs of organizations

 Multiple sites

 Global presence

 Interacting organizations

 Need for transaction capabilities

 Availability and reliability are crucial

What is Distributed Database

System?

A Distributed Database (DDB) is a

collection of multiple,logically

interrelated databases distributed over

a computer network

A Distributed Database System is

software that manages a distributed

database while making the distribution

transparent to the user

Distributed DB Environment

Site 1 Site 2

Site 3 Site 4

Communication

Network

Client-Server: Characteristics

Client

 presents the user interface

 forms queries or commands in a predefined language

 communicates these to the server according to an

accepted interprocess communication method

 performs data analysis on the results returned from the

server

 presents the results to the user

Server

 provides a service to client(s)

 only responds to queries or commands from clients;

does not initiate conversation with clients

 hides the existence of the client-server system from the

client (ideally)

Client-Server: Advantages &

Limitations

Advantages

 Efficient division of labor

 Client access to remote data (via

standards)

 Full DBMS functionality provided to client

workstations

Limitations

 Server forms bottleneck

 Server forms single point of failure

 Database scaling difficult

Multiple Clients/Multiple

Servers

Communications

Client

Services

Applications

Communications

DBMS Services

Communications

DBMS Services

Server-to-Server

Communications

DBMS Services

Communications

DBMS Services

Communications

Client

Services

Applications

Transparency

High-level users are shielded from the

implementational details such as the

location, replication of data

Faster, cheaper & reliable

Multiple sites

Global presence

Interacting organizations

Need for transaction capabilities

Availability and reliability are crucial

presents the user interface

forms queries or commands in a predefined language

communicates these to the server according to an

performs data analysis on the results returned from the

presents the results to the user

provides a service to client(s)

only responds to queries or commands from clients;

hides the existence of the client-server system from the

Efficient division of labor

Client access to remote data (via

Full DBMS functionality provided to client

Server forms bottleneck

Server forms single point of failure

Database scaling difficult

Users still see a monolithic view of the

Distribution transparency

Replication transparency

Fragmentation transparency

Only data and software at the failed site

Further improvement by replication

Proximity of data to points of its use

Parallelism in execution

Inter-query parallelism

Intra-query parallelism

Input: Calculus query on global relations

Output: Algebraic query on global relations

Transformation rules used

Same as Centralized

Input: Algebraic query on global relations

Output: Algebraic query on fragment relations

Input: Algebraic query on fragment relations

The best global schedule

Input: Best global execution schedule

Output: The optimal access path for each local sub-query