CS 347: Query Optimization Notes - Parallel and Distributed Systems, Slides of Distributed Database Management Systems

These notes from cs 347 cover query optimization techniques in parallel and distributed systems. Topics include exhaustive search with pruning, hill climbing, and query separation. Examples and cost comparisons are provided.

Typology: Slides

2011/2012

Uploaded on 07/16/2012

sambandam
sambandam 🇮🇳

4.3

(37)

154 documents

1 / 5

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
2
CS 347 Notes 04 7
Take into account:
(in parallel/distributed system)
Start up costs (for parallel operation)
Data distribution costs/time
•Contention
memory, disk, network,…
Assembling result
CS 347 Notes 04 8
Example: Response time
Site 1
Site 2
Site 3
Site 4
Startup Distri- Searching Final
bution +send results proc.
CS 347 Notes 04 9
Searching strategies
(1) Exhaustive (with pruning)
(2) Hill climbing (greedy)
(3) Query separation
CS 347 Notes 04 10
(1) Exhaustive
- consider “all” query plans
with a set of techniques
- prune some plans
-heuristics
CS 347 Notes 04 11
RST
R S RT S R S T T S TR
(S R) T (T S) R
ship S semi ship T semi
to R join to S join
1 Prune because cross-product not necessary
2 Prune because larger relation first
Example: join |R|>|S|>|T|R S T
AB
2 1 2 1
CS 347 Notes 04 12
In generating plans, keep goal in mind:
e.g.: Goal is parallelism in system with fast
net, consider partitioning relation(s)
first
e.g.: Goal is reduction of net traffic,
consider semi-joins
docsity.com
pf3
pf4
pf5

Partial preview of the text

Download CS 347: Query Optimization Notes - Parallel and Distributed Systems and more Slides Distributed Database Management Systems in PDF only on Docsity!

CS 347 Notes 04 7

 Take into account:

(in parallel/distributed system)

  • Start up costs (for parallel operation)
  • Data distribution costs/time
  • Contention
    • memory, disk, network,…
  • Assembling result

CS 347 Notes 04 8

Example: Response time

Site 1 Site 2 Site 3 Site 4 Startup Distri- Searching Final bution +send results proc.

CS 347 Notes 04 9

Searching strategies

(1) Exhaustive (with pruning) (2) Hill climbing (greedy) (3) Query separation

CS 347 Notes 04 10

(1) Exhaustive

  • consider “all” query plans with a set of techniques
  • prune some plans
    • heuristics

CS 347 Notes 04 11

R S T R S RT S R S T T S TR (S R) T (T S) R ship S semi ship T semi to R join to S join 1 Prune because cross-product not necessary 2 Prune because larger relation first

Example: join R S T |R|>|S|>|T|

A B

2 1 2 1

CS 347 Notes 04 12

 In generating plans, keep goal in mind:

e.g.: Goal is parallelism in system with fast net, consider partitioning relation(s) first e.g.: Goal is reduction of net traffic, consider semi-joins

CS 347 Notes 04 13

(2) Hill climbing Better plans

Worse plans

x Initial plan

1

CS 347 Notes 04 14

(2) Hill climbing Better plans

Worse plans

x Initial plan

1

2

CS 347 Notes 04 15

Example R S T V

Rel Site Size tuple size = 1 R 1 10 S 2 20 T 3 30 V 4 40

R S T V

A B C

Goal: minimize data transmission

CS 347 Notes 04 16

Initial plan: send relations to one site

What site do we send all relations to? To site 1: cost=20+30+40= To site 2: cost=10+30+40= To site 3: cost=10+20+40= To site 4: cost=10+20+30=60 

CS 347 Notes 04 17

P 0 : R (1  4)

S (2  4)

T (3  4)

Compute R S T V at site 4

CS 347 Notes 04 18

Local search

  • Consider sending each relation to neighbor: e.g.: 4

1 2

R S

R

R

S

CS 347 Notes 04 25

Hill climbing may miss best plan!

Example: best plan could be: PB : T (3  4) =T V  (4  2) ’=  S ’ (2  1) ”= ’ R ” (1  4) Compute answer

’’

’

V T

R (^) S T

[optional]

CS 347 Notes 04 26

Hill climbing may miss best plan!

Example: best plan could be: PB : T (3  4) =T V  (4  2) ’=  S ’ (2  1) ”= ’ R ” (1  4) Compute answer

’’

’

V T

R (^) S T

33 = total

Costs could be low because β is [optional] very selective

CS 347 Notes 04 27

(3) Query separation

  • separate query into 2 or more steps -optimize each step independently

CS 347 Notes 04 28

Example: simple queries

e.g.: c 1

c 2 c 3

R S

1. Compute R’ = A [c 2 R]

S’ = A [c 3 S]

  1. Compute J = R’ S’

A

CS 347 Notes 04 29

c 1

c 2 c 3

R S

A

1. Compute R’ = A [c 2 R]

S’ = A [c 3 S]

  1. Compute J = R’ S’
  2. Compute

Ans = c 1 { [J c 2 R] [J c 3 S] }

CS 347 Notes 04 30

In other words:

(a) Compute A values in answer (steps 1,2) (b) Get tuples from sites with matching A values and compute answer (step 3)

CS 347 Notes 04 31

Simple query

  • Relations have a single attribute
  • Output has a single attribute e.g., J  R’ S’

CS 347 Notes 04 32

Idea

  • Decompose query into
    • Local processing
    • Simple query (or queries)
    • Final processing
  • Optimize simple query

CS 347 Notes 04 33

Philosophy

  • Hard part is distributed join
  • Do this part with only keys; get rest of data later
  • Simpler to optimize simple queries

CS 347 Notes 04 34

Summary: Query Optimization

  • Cost estimation
  • Strategies
    • Exhaustive
    • Hill climbing
    • Separation

CS 347 Notes 04 35

Words of wisdom

“Optimization is like chess playing”

i.e., May have to make sacrifices (move data, partition relations, build indexes) for later gains!