Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Parallel Database Management Systems: Leveraging Parallelism for Efficient Data Processing, Slides of Database Management Systems (DBMS)

Damodaram Sanjivayya National Law University Database Management Systems (DBMS)

An introduction to parallel database management systems (dbms), explaining why parallel access to data is essential, the concepts of pipeline and partition parallelism, and the success stories of parallel dbmss such as teradata and tandem. It also covers different types of dbms parallelism, automatic data partitioning, and parallel query processing techniques like parallel scans, sorting, and joins.

Typology: Slides

2012/2013

Uploaded on 04/27/2013

dhanapati 🇮🇳

4.1

(24)

123 documents

1 / 21

This page cannot be seen from the preview

Don't miss anything!

Parallel DBMS

Docsity.com

Discover Slides of Database Management Systems (DBMS) Damodaram Sanjivayya National Law University

Partial preview of the text

Download Parallel Database Management Systems: Leveraging Parallelism for Efficient Data Processing and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Parallel DBMS

Why Parallel Access To Data?

1 Terabyte

10 MB/s

At 10 MB/s

1.2 days to scan

1 Terabyte

1,000 x parallel

1.5 minute to scan.

Parallelism: divide a big problem into many smaller ones to be solved in parallel. Docsity.com

DBMS: The || Success Story

DBMSs are among most (only?) successful application

of parallelism.

Teradata, Tandem vs. Thinking Machines,
Every major DBMS vendor has some || server
Workstation manufacturers depend on || DB server sales.
Reasons for success:
Bulk-processing (= partitioned ||-ism).
Natural pipelining.
Inexpensive hardware can do the trick!
Users/app-prog. don’t need to think in ||

Some || Terminology

Speed-Up
- More resources means

proportionally less time

for given amount of data.

Scale-Up
- If resources increased in

proportion to increase in

data size, time is constant.

degree of ||-ism

Xact/sec. (throughput)

(^) Ideal

degree of ||-ism

sec./Xact (response time)

Ideal

Different Types of DBMS ||-ism

Intra-operator parallelism
- get all machines working to compute a given operation (scan, sort, join)
Inter-operator parallelism
- each operator may run concurrently on a different site (exploits pipelining)
Inter-query parallelism
- different queries run on different site

We’ll focus on intra-operator ||-ism

Automatic Data Partitioning

Partitioning a table:

Range Hash Round Robin

Shared-disk and -memory less sensitive to partitioning, Shared nothing benefits from "good" partitioning!

A...E F...J^ K...N^ O...S^ T...Z^ A...E F...J^ K...N^ O...S^ T...Z^ A...E^ F...J^ K...N^ O...S T...Z

Good for group-by, range queries, and also equip-join

Good for equijoins (^) Good to spread load; Most flexible - but

Parallel Sorting

New records again and again:
- 8.5 Gb/minute, shared-nothing; Datamation benchmark

in 2.41 secs (UCB students)

Idea:
- Scan in parallel, and range-partition as you go.
- As tuples come in, begin “local” sorting on each
- Resulting data is sorted, and range-partitioned.
- Problem: skew!
- Solution: “sample” data at start to determine partition

points.

Parallel Joins

Nested loop:
- Each outer tuple must be compared with each

inner tuple that might join.

Easy for range partitioning on join cols, hard

otherwise!

Sort-Merge (or plain Merge-Join):
- Sorting gives range-partitioning.
- Merging partitioned tables is local.

Dataflow Network for || Join

Good use of split/merge makes it easier to build

parallel versions of sequential join code.

Complex Parallel Query Plans

Complex Queries: Inter-Operator parallelism
- Pipelining between operators:
  - sort and phase 1 of hash-join block the pipeline!!
- Bushy Trees

A B R S

Sites 1-4 Sites 5-

Sites 1-

Observations

It is relatively easy to build a fast parallel

query executor

It is hard to write a robust high-quality parallel

query optimizer:

There are many tricks.
One quickly hits the complexity barrier.

Parallel Query Optimization

Common approach: 2 phases
- Pick best sequential plan (System R algo)
- Pick degree of parallelism based on current system

parameters.

“Bind” operators to processors
- Take query tree, “decorate” with assigned

processor

Parallel DBMS Summary

||-ism natural to query processing:
- Both pipeline and partition ||-ism!
Shared-Nothing vs. Shared-Mem
- Shared-disk too, but less standard
- Shared-memory easy, costly.
  - But doesn’t scaleup.
- Shared-nothing cheap, scales well.
  - But harder to implement.
Intra-op, Inter-op & Inter-query ||-ism possible.

|| DBMS Summary, cont.

Data layout choices important!
Most DB operations can be done partition-||
- Sort-merge join, hash-join.
Complex plans.
- Allow for pipeline-||ism, but sorts, hashes block the

pipeline.

Partition ||-ism achieved via bushy trees.

Parallel Database Management Systems: Leveraging Parallelism for Efficient Data Processing, Slides of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Parallel Database Management Systems: Leveraging Parallelism for Efficient Data Processing and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Parallel DBMS

Why Parallel Access To Data?

1 Terabyte

10 MB/s

At 10 MB/s

1.2 days to scan

1 Terabyte

1,000 x parallel

1.5 minute to scan.

proportionally less time

for given amount of data.

proportion to increase in

data size, time is constant.

Automatic Data Partitioning

Partitioning a table:

Range Hash Round Robin

in 2.41 secs (UCB students)

points.

inner tuple that might join.

otherwise!

A B R S

parameters.

processor

pipeline.