Distributed Query Processing: Client-Server Architectures and Query Optimization, Slides of Database Management Systems (DBMS)

These slides survey the state of the art in distributed query processing. They focus on client-server database systems and their architectures (peer-to-peer, strict client-server, and middleware/multitier), and cover the query processing techniques of query shipping, data shipping, and hybrid shipping, along with query optimization strategies and their trade-offs.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

dhanapati

The State of the Art in Distributed Query Processing

Introduction

  • Distributed database technology is becoming an increasingly attractive enhancement to many database systems

  • Cost and scalability
  • Software integration
    • Legacy systems
  • New applications
  • Market forces

Client-Server Database Systems

  • Relationships between distributed nodes take a client-server form
  • Client: makes requests of the servers, usually the source of queries
  • Server: responds to client requests, usually the source of data
  • System architectures: peer-to-peer, strict client-server, middleware/multitier

Architectures: Peer-to-Peer

  • All nodes are equivalent
  • Each can be either a client or server on demand (can store data and/or make requests)
  • Ex: SHORE system

[Figure: three peer nodes, each acting as a server or a client]

Architectures: Middleware/Multitier

  • Multiple levels of client-server interaction
  • Nodes act as clients to those below them and servers to those above
  • Ex: SAP R/3, web servers with DB backends

[Figure: chain of three nodes. Node 1 is a client to Node 2; Node 2 is a server to Node 1 and a client to Node 3; Node 3 is a server to Node 2]

Architectures: Evaluation

  • Peer-to-Peer
    • Simplest setup
    • Equal load sharing
  • Strict Client-Server
    • Specialization
    • Administration for servers only
  • Middleware/Multitier
    • Functionality integration
    • Scalability

Query Shipping

  • SQL query code is sent down to the server
  • Server parses and evaluates query, returns result
  • Used in DB2, Oracle, MS SQL Server

Data Shipping

  • Client parses query and requests data from server
  • Server provides data, then client executes query
  • Data can be cached at client (main memory or disk)
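
The contrast between query shipping and data shipping can be sketched in a few lines of Python. This is a toy, in-memory illustration, not how DB2, Oracle, or any real system is implemented. The `Server` class, its `execute_query` and `fetch_table` methods, the `DataShippingClient` class, and the sample `orders` table are all hypothetical names chosen for this example. The sketch counts how many rows cross the "network" under each strategy.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


class Server:
    """Hypothetical server that holds tables in memory."""

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables
        self.rows_sent = 0  # rows shipped to clients so far

    def execute_query(self, table: str, predicate: Callable[[Row], bool]) -> List[Row]:
        # Query shipping: the server evaluates the query and ships only the result.
        result = [row for row in self.tables[table] if predicate(row)]
        self.rows_sent += len(result)
        return result

    def fetch_table(self, table: str) -> List[Row]:
        # Data shipping: the server ships raw data and the client does the work.
        data = list(self.tables[table])
        self.rows_sent += len(data)
        return data


class DataShippingClient:
    """Hypothetical client that caches shipped data and evaluates queries locally."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[str, List[Row]] = {}

    def query(self, table: str, predicate: Callable[[Row], bool]) -> List[Row]:
        if table not in self.cache:  # fetch once, then reuse the cached copy
            self.cache[table] = self.server.fetch_table(table)
        return [row for row in self.cache[table] if predicate(row)]


def is_large(row: Row) -> bool:
    return row["amount"] > 900


orders = [{"id": i, "amount": i * 10} for i in range(1, 101)]

# Query shipping: only the 10 matching rows travel to the client.
qs_server = Server({"orders": orders})
qs_result = qs_server.execute_query("orders", is_large)

# Data shipping: all 100 rows travel once; the repeated query hits the cache.
ds_server = Server({"orders": orders})
client = DataShippingClient(ds_server)
ds_result = client.query("orders", is_large)
client.query("orders", is_large)

print(qs_server.rows_sent, ds_server.rows_sent)
```

In this toy setup, data shipping moves more rows up front but places no further load on the server for repeated queries, which is the trade-off the shipping evaluation points to.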

Query Processing Techniques: Evaluation

  • Query Shipping
    • Reliant on server performance
    • Scales poorly with increasing client load
  • Data Shipping
    • Good scalability
    • High communication costs
  • Hybrid
    • Potential to outperform other options
    • More complex optimizations

Hybrid Shipping Observations

  • Some observations on achieving optimal performance with hybrid shipping
  • Prefer not to use a client cache
    • If network transfer cost < client access cost
  • Ship cached data down to the server
    • If the data is in client main memory and execution is at the server
  • Multiple small updates
    • Maintain at client and post to server only when necessary
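
Two of these observations lend themselves to a small sketch. The first is a plain cost comparison; the third is a write buffer that batches small updates at the client. The function `choose_data_source`, the class `UpdateBuffer`, and the cost values are hypothetical and only illustrate the rules stated above, under the assumption that both costs are estimated in the same unit.

```python
from typing import Callable, List, Tuple

Update = Tuple[str, int]


def choose_data_source(network_transfer_cost: float, client_access_cost: float) -> str:
    """Decide where to read data from, given two cost estimates in the same unit."""
    # Skip the client cache when fetching over the network is cheaper than
    # reading the cached copy (for example, a cache on a slow client disk).
    if network_transfer_cost < client_access_cost:
        return "server"
    return "client_cache"


class UpdateBuffer:
    """Keeps small updates at the client and posts them to the server in one batch."""

    def __init__(self, post: Callable[[List[Update]], None]) -> None:
        self._post = post  # hypothetical callable that sends a batch to the server
        self._pending: List[Update] = []

    def update(self, key: str, value: int) -> None:
        self._pending.append((key, value))  # no network traffic yet

    def flush(self) -> None:
        # Post only when necessary, e.g. at commit time.
        if self._pending:
            self._post(list(self._pending))
            self._pending.clear()


sent_batches: List[List[Update]] = []
buffer = UpdateBuffer(sent_batches.append)
buffer.update("a", 1)
buffer.update("b", 2)
buffer.flush()  # one message carrying both updates
```

When "necessary" occurs (commit, cache eviction, a conflicting read at the server) is a policy decision that the slides do not specify, so the sketch leaves it to the caller of `flush`.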

Distributed Query Plans

  • Each operator is annotated with a logical site of execution; plans are shareable
  • client means the operator is executed at the client where the query is issued
  • server means:
    • for scan operators, execute at a location that has the necessary data
    • for updates, execute at all locations with the relevant data
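
A minimal sketch of such an annotated plan follows. The `Operator` dataclass, the `resolve_sites` function, the `catalog` mapping from relations to the sites that hold them, and the site names are assumptions made for illustration. Because the annotations are logical, the same plan can be resolved for different issuing clients.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Operator:
    name: str           # e.g. "scan", "join", "update"
    site: str           # logical annotation: "client" or "server"
    relation: str = ""  # relation touched by scans and updates
    children: List["Operator"] = field(default_factory=list)


def resolve_sites(op: Operator, issuing_client: str, catalog: Dict[str, Set[str]]) -> List[str]:
    """Map an operator's logical annotation to the physical site(s) that run it."""
    if op.site == "client":
        return [issuing_client]  # the client where the query was issued
    holders = sorted(catalog[op.relation]) if op.relation else []
    if op.name == "update":
        return holders           # every location with the relevant data
    if op.name == "scan":
        return holders[:1]       # any one location with the data; first for determinism
    # Other server-side operators are not defined in the rules above.
    raise ValueError(f"no placement rule for server-side operator {op.name!r}")


catalog = {"orders": {"s2", "s1"}, "customers": {"s3"}}
plan = Operator(
    "join",
    "client",
    children=[
        Operator("scan", "server", "orders"),
        Operator("scan", "server", "customers"),
    ],
)

join_site_for_c1 = resolve_sites(plan, "c1", catalog)
join_site_for_c2 = resolve_sites(plan, "c2", catalog)  # same plan, different client
scan_sites = resolve_sites(plan.children[0], "c1", catalog)
update_sites = resolve_sites(Operator("update", "server", "orders"), "c1", catalog)
```

Picking the first holder for a scan is an arbitrary choice made to keep the sketch deterministic; a real system would presumably weigh load and network distance.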

Query Optimization: Where?

  • Should optimization occur at the client or the server?
  • At client: less load on servers, better scalability
  • At server: more information about system statistics, especially server loads
  • Potential solution: primary parsing and query rewriting at client, further optimization at server

Query Optimization: When?

  • Tradeoff of accuracy vs. cost
  • Traditional-style: optimize once, store plan
    • No support for changing DB conditions
    • No optimization cost incurred at query execution time
  • Plan sets: optimize for possible scenarios
    • Generate a few query plans for different conditions
    • Choose a plan based on runtime statistics
  • On-the-fly: observe intermediate results
    • Re-optimize query if they differ from expectations
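
The plan-set approach can be sketched as follows: a few candidate plans are prepared ahead of time, each with a cost estimate that depends on runtime statistics, and the cheapest one is picked when the query runs. The `CandidatePlan` class, the `choose_plan` function, the statistic names `server_load` and `network_cost`, and the cost formulas are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Stats = Dict[str, float]


@dataclass
class CandidatePlan:
    name: str
    cost: Callable[[Stats], float]  # estimated cost under the given runtime statistics


def choose_plan(plan_set: List[CandidatePlan], stats: Stats) -> CandidatePlan:
    """Pick the plan with the lowest estimated cost for the current conditions."""
    return min(plan_set, key=lambda plan: plan.cost(stats))


# Generated once at compile time: one plan per anticipated scenario.
plan_set = [
    CandidatePlan("join_at_server", lambda s: 10 * s["server_load"]),
    CandidatePlan("join_at_client", lambda s: 5 + s["network_cost"]),
]

# Consulted at run time: the choice follows the observed conditions.
idle = choose_plan(plan_set, {"server_load": 0.2, "network_cost": 4.0})
busy = choose_plan(plan_set, {"server_load": 2.0, "network_cost": 4.0})
print(idle.name, busy.name)
```

The run-time work here is limited to evaluating a handful of cost functions, which is why plan sets sit between optimize-once and on-the-fly re-optimization in the accuracy vs. cost trade-off.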

Query Optimization: Two-Step

  • Compile-time: generate join order, etc.
  • Runtime: perform site selection
  • Reasonable cost at each end
  • Responds well to changing server loads
  • Fully utilizes client data caching
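
The two steps can be sketched as two small functions. The join-order step below uses a stand-in heuristic (smallest relation first) rather than a real cost-based optimizer, and the function names, replica map, load figures, and cache contents are all hypothetical.

```python
from typing import Dict, List, Set


def compile_time_join_order(cardinalities: Dict[str, int]) -> List[str]:
    """Step 1 (compile time): fix the join order once, here smallest relation first."""
    return sorted(cardinalities, key=lambda rel: (cardinalities[rel], rel))


def runtime_site_selection(
    join_order: List[str],
    replicas: Dict[str, List[str]],
    load: Dict[str, float],
    client_cache: Set[str],
    client: str = "client",
) -> Dict[str, str]:
    """Step 2 (run time): choose where each relation is read, using current conditions."""
    placement: Dict[str, str] = {}
    for rel in join_order:
        if rel in client_cache:
            placement[rel] = client  # make use of data already cached at the client
        else:
            # Otherwise pick the least-loaded server holding a copy.
            placement[rel] = min(replicas[rel], key=lambda site: (load[site], site))
    return placement


cardinalities = {"orders": 1000, "customers": 100, "regions": 5}
replicas = {"orders": ["s1", "s2"], "customers": ["s1"], "regions": ["s2"]}

order = compile_time_join_order(cardinalities)  # computed once and stored

# The same stored join order is placed differently as server loads change.
placement_a = runtime_site_selection(order, replicas, {"s1": 0.9, "s2": 0.3}, {"regions"})
placement_b = runtime_site_selection(order, replicas, {"s1": 0.1, "s2": 0.3}, {"regions"})
```

Only the cheap second step is repeated per execution, which matches the slide's points about reasonable cost at each end and responsiveness to changing server loads.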