Bitonic Sort Algorithm: A Parallel Approach to Data Sorting, Lecture notes of Advanced Computer Architecture

An in-depth look at the bitonic sort algorithm, a parallel sorting technique for efficiently sorting large data sets. The notes cover bitonic sequences, compare-exchange operations, and merging techniques, and include figures that illustrate the process.

Lecture notes, 2013/2014. Uploaded on 12/27/2014 by wsabren1..


[Figure: Steps 1 to 3 of a compare-exchange between processes P_i and P_j.]

Figure 9.1 A parallel compare-exchange operation. Processes P_i and P_j send their elements to each other. Process P_i keeps min{a_i, a_j}, and P_j keeps max{a_i, a_j}.
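As a minimal sequential sketch of Figure 9.1, the two processes can be simulated with local variables: after the exchange each side holds both elements, and each keeps the one its role dictates. The function name compare_exchange is hypothetical; a message-passing version would replace the variable swap with a send and a receive on each process.

```python
def compare_exchange(a_i, a_j):
    """Simulate the compare-exchange of Figure 9.1 for one element per process.

    Returns (value kept by P_i, value kept by P_j).
    """
    # Step 2: each process receives the other's element.
    received_by_i, received_by_j = a_j, a_i
    # Step 3: P_i keeps the minimum, P_j keeps the maximum.
    keep_i = min(a_i, received_by_i)
    keep_j = max(a_j, received_by_j)
    return keep_i, keep_j


print(compare_exchange(17, 5))  # (5, 17)
```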

[Figure: Steps 1 to 4 of a compare-split between processes P_i and P_j.]

Figure 9.2 A compare-split operation. Each process sends its block of size n/p to the other process. Each process merges the received block with its own block and retains only the appropriate half of the merged block. In this example, process P_i retains the smaller elements and process P_j retains the larger elements.
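Compare-split extends compare-exchange from one element to a block per process. The sketch below simulates both processes sequentially, assuming each block is already sorted and both hold n/p elements; compare_split is a hypothetical name, and the linear-time merge uses heapq.merge from the Python standard library. The input values are illustrative.

```python
from heapq import merge


def compare_split(block_i, block_j):
    """Sequential sketch of a compare-split between P_i and P_j.

    Both blocks are assumed sorted and of equal size n/p.
    Returns (block kept by P_i, block kept by P_j).
    """
    k = len(block_i)
    merged = list(merge(block_i, block_j))  # merge the two sorted blocks
    return merged[:k], merged[k:]           # P_i: smaller half, P_j: larger half


low, high = compare_split([2, 5, 9, 14], [1, 3, 10, 12])
print(low, high)  # [1, 2, 3, 5] [9, 10, 12, 14]
```

Each process actually needs only its own half, so P_i could stop after the n/p smallest elements and P_j could merge from the top; the sketch merges everything for clarity.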

[Figure: input wires enter an interconnection network built from columns of comparators and leave as output wires.]

Figure 9.4 A typical sorting network. Every sorting network is made up of a series of columns, and each column contains a number of comparators connected in parallel.
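To make the column structure concrete, the sketch represents a network as a list of columns, each a list of comparator wire pairs (i, j). Comparators in one column touch disjoint wires, which is what allows them to run in parallel. NETWORK_4 is a well-known five-comparator network for four inputs chosen for illustration; it is not the network drawn in Figure 9.4, and apply_network is a hypothetical name.

```python
from itertools import permutations

# Each comparator (i, j) with i < j leaves the smaller value on wire i
# and the larger on wire j. Comparators in a column use disjoint wires.
NETWORK_4 = [
    [(0, 1), (2, 3)],
    [(0, 2), (1, 3)],
    [(1, 2)],
]


def apply_network(network, values):
    wires = list(values)
    for column in network:
        for i, j in column:  # independent comparators of one column
            if wires[i] > wires[j]:
                wires[i], wires[j] = wires[j], wires[i]
    return wires


print(apply_network(NETWORK_4, [4, 1, 3, 2]))  # [1, 2, 3, 4]

# Exhaustively confirm the network sorts every ordering of four distinct keys.
assert all(apply_network(NETWORK_4, p) == [0, 1, 2, 3] for p in permutations(range(4)))
```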

Original sequence  3  5  8  9 10 12 14 20 95 90 60 40 35 23 18  0
1st split          3  5  8  9 10 12 14  0 95 90 60 40 35 23 18 20
2nd split          3  5  8  0 10 12 14  9 35 23 18 20 95 90 60 40
3rd split          3  0  8  5 10  9 14 12 18 20 35 23 60 40 95 90
4th split          0  3  5  8  9 10 12 14 18 20 23 35 40 60 90 95

Figure 9.5 Merging a 16-element bitonic sequence through a series of log 16 bitonic splits.
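The rows of Figure 9.5 can be regenerated by repeatedly applying a bitonic split (compare element i with element i + n/2 within every block) and then halving the block size. This is a sequential sketch assuming the input length is a power of two and the input is bitonic; bitonic_split and bitonic_merge_trace are hypothetical names.

```python
def bitonic_split(seq):
    """Compare element i with element i + n/2; smaller values go to the first half.

    If seq is bitonic, both halves are bitonic and no element of the
    first half exceeds any element of the second half.
    """
    half = len(seq) // 2
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    return lo + hi


def bitonic_merge_trace(seq):
    """Return the sequence after each of the log2(n) rounds of splits."""
    rows = []
    current = list(seq)
    block = len(current)
    while block > 1:
        # Split every block of the current size independently (parallel in a network).
        current = [x for start in range(0, len(current), block)
                   for x in bitonic_split(current[start:start + block])]
        rows.append(current)
        block //= 2
    return rows


original = [3, 5, 8, 9, 10, 12, 14, 20, 95, 90, 60, 40, 35, 23, 18, 0]
for k, row in enumerate(bitonic_merge_trace(original), start=1):
    print(f"split {k}:", row)
```

Running it on the original sequence prints the four rows of the table, the last one fully sorted.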

[Figure: 16 wires, labeled 0000 to 1111, pass through eight BM[2] networks, then four BM[4], then two BM[8], and finally ⊕BM[16].]

Figure 9.7 A schematic representation of a network that converts an input sequence into a bitonic sequence. In this example, ⊕BM[k] and ⊖BM[k] denote bitonic merging networks of input size k that use ⊕ and ⊖ comparators, respectively. The last merging network (⊕BM[16]) sorts the input. In this example, n = 16.
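The construction in Figure 9.7 can be written recursively: sort the two halves in opposite directions so that together they form a bitonic sequence, then apply a bitonic merging network. In this sketch ascending=True plays the role of ⊕ comparators and ascending=False the role of ⊖ comparators; bitonic_merge and bitonic_sort are hypothetical names, and the input length is assumed to be a power of two.

```python
def bitonic_merge(seq, ascending=True):
    """BM[k]: sort a bitonic sequence whose length k is a power of two.

    ascending=True corresponds to ⊕ comparators, False to ⊖ comparators.
    """
    seq = list(seq)
    n = len(seq)
    if n <= 1:
        return seq
    half = n // 2
    for i in range(half):  # one column of n/2 comparators
        if (seq[i] > seq[i + half]) == ascending:
            seq[i], seq[i + half] = seq[i + half], seq[i]
    return bitonic_merge(seq[:half], ascending) + bitonic_merge(seq[half:], ascending)


def bitonic_sort(seq, ascending=True):
    n = len(seq)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    if n <= 1:
        return list(seq)
    half = n // 2
    # An ascending first half followed by a descending second half is bitonic.
    first = bitonic_sort(seq[:half], True)
    second = bitonic_sort(seq[half:], False)
    return bitonic_merge(first + second, ascending)


data = [(7 * i + 3) % 16 for i in range(16)]  # a permutation of 0..15
print(bitonic_sort(data) == sorted(data))      # True
```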

[Figure: comparator network on 16 wires.]

Figure 9.8 The comparator network that transforms an input sequence of 16 unordered numbers into a bitonic sequence. In contrast to Figure 9.6, the columns of comparators in each bitonic merging network are drawn in a single box, separated by a dashed line.
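An iterative form lists the comparator columns explicitly, which separates the part drawn in Figure 9.8 (the columns of BM[2], BM[4], and BM[8]) from the final merge. The indexing used here (partner = i XOR j, with the block direction taken from bit k of i) is a common way to write the bitonic network and is an assumption of this sketch, as are the names bitonic_columns, run, and is_bitonic.

```python
def bitonic_columns(n):
    """Columns of the bitonic sorting network for n = 2**d wires.

    Each entry is (k, column); k is the block size of the merging stage and
    each comparator is (i, partner, ascending).
    """
    columns = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            column = [(i, i ^ j, (i & k) == 0) for i in range(n) if (i ^ j) > i]
            columns.append((k, column))
            j //= 2
        k *= 2
    return columns


def run(columns, values):
    wires = list(values)
    for _, column in columns:
        for i, partner, ascending in column:
            if (wires[i] > wires[partner]) == ascending:
                wires[i], wires[partner] = wires[partner], wires[i]
    return wires


def is_bitonic(seq):
    """True if some cyclic rotation of seq weakly rises and then weakly falls."""
    def rises_then_falls(s):
        i = 0
        while i + 1 < len(s) and s[i] <= s[i + 1]:
            i += 1
        while i + 1 < len(s) and s[i] >= s[i + 1]:
            i += 1
        return i == len(s) - 1
    return any(rises_then_falls(seq[r:] + seq[:r]) for r in range(len(seq)))


net = bitonic_columns(16)
print(len(net))                                  # 10
to_bitonic = [c for c in net if c[0] < 16]       # columns of BM[2], BM[4], BM[8]
data = [(7 * i + 3) % 16 for i in range(16)]
print(is_bitonic(run(to_bitonic, data)))         # True
print(run(net, data) == sorted(data))            # True
```

For n = 16 the network has (log 16)(log 16 + 1)/2 = 10 columns; the first six turn any input into a bitonic sequence (ascending first half, descending second half), and the last four sort it.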

[Figure: processors versus Stages 1 to 4, showing the hypercube dimensions used in each stage.]

Figure 9.10 Communication characteristics of bitonic sort on a hypercube. During each stage of the algorithm, processes communicate along the dimensions shown.
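With one element per process on a hypercube, every comparator becomes a compare-exchange between neighbours whose labels differ in a single bit. The simulation below records which dimension each step uses; dimensions are numbered from 0 here, which may not match the numbering in the figure. The rule deciding which process keeps the larger element (compare bit i + 1 of the label with bit j) is an assumption of this sketch, and hypercube_bitonic_sort is a hypothetical name.

```python
def hypercube_bitonic_sort(values):
    """values[label] is the element held by process `label` (labels 0 .. 2**d - 1).

    Returns (final elements by label, dimensions used in each stage).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("the number of processes must be a power of two")
    d = p.bit_length() - 1
    elem = list(values)
    dims_per_stage = []
    for i in range(d):                      # stage i + 1
        dims = []
        for j in range(i, -1, -1):          # dimensions i, i-1, ..., 0
            dims.append(j)
            new = elem[:]
            for label in range(p):
                partner = label ^ (1 << j)  # neighbour across dimension j
                # Keep the max when bit i+1 of the label differs from bit j.
                keep_max = ((label >> (i + 1)) & 1) != ((label >> j) & 1)
                pair = (elem[label], elem[partner])
                new[label] = max(pair) if keep_max else min(pair)
            elem = new
        dims_per_stage.append(dims)
    return elem, dims_per_stage


data = [(7 * i + 3) % 16 for i in range(16)]
result, dims = hypercube_bitonic_sort(data)
print(result == sorted(data))  # True
print(dims)                    # [[0], [1, 0], [2, 1, 0], [3, 2, 1, 0]]
```

For 16 processes (d = 4), stage s uses s dimensions, for a total of 1 + 2 + 3 + 4 = 10 compare-exchange steps.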

[Figure: the 16 wires, labeled 0000 to 1111, placed on a 4 × 4 mesh of processes.]

(a) row-major          (b) row-major snakelike    (c) row-major shuffled
0000 0001 0010 0011    0000 0001 0010 0011        0000 0001 0100 0101
0100 0101 0110 0111    0111 0110 0101 0100        0010 0011 0110 0111
1000 1001 1010 1011    1000 1001 1010 1011        1000 1001 1100 1101
1100 1101 1110 1111    1111 1110 1101 1100        1010 1011 1110 1111

Figure 9.11 Different ways of mapping the input wires of the bitonic sorting network to a mesh of processes: (a) row-major mapping, (b) row-major snakelike mapping, and (c) row-major shuffled mapping.
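The three mappings can be written as functions from a wire label to a (row, column) position on a side × side mesh. This sketch assumes side is a power of two; the shuffled mapping takes the row from the odd-position bits of the label and the column from the even-position bits, which reproduces the 4 × 4 layout of panel (c). All function names are hypothetical.

```python
def row_major(label, side):
    return divmod(label, side)  # (row, col)


def snakelike(label, side):
    row, col = divmod(label, side)
    # Odd rows are traversed right to left.
    return (row, side - 1 - col) if row % 2 else (row, col)


def shuffled(label, side):
    """Row bits are the odd-position bits of the label, column bits the even ones."""
    bits = (side * side).bit_length() - 1  # total label bits (side is a power of two)
    row = col = 0
    for b in range(bits // 2):
        col |= ((label >> (2 * b)) & 1) << b
        row |= ((label >> (2 * b + 1)) & 1) << b
    return row, col


def grid(mapping, side=4):
    width = 2 * (side.bit_length() - 1)
    cells = [[None] * side for _ in range(side)]
    for label in range(side * side):
        r, c = mapping(label, side)
        cells[r][c] = format(label, "0{}b".format(width))
    return cells


for mapping in (row_major, snakelike, shuffled):
    print(mapping.__name__)
    for row in grid(mapping):
        print(" ".join(row))
```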

[Figure: eight elements going from unsorted to sorted through Phases 1 (odd), 2 (even), 3 (odd), 4 (even), 5 (odd), 6 (even), 7 (odd), and 8 (even).]

Figure 9.13 Sorting n = 8 elements, using the odd-even transposition sort algorithm. During each phase, n = 8 elements are compared.
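A sequential simulation of the phases in Figure 9.13, one element per process. The pairing convention (odd phases compare-exchange (a1, a2), (a3, a4), ...; even phases (a2, a3), (a4, a5), ...) is an assumption of this sketch, the code uses 0-based indices, and odd_even_transposition_sort is a hypothetical name. Running n phases is enough to sort n elements.

```python
def odd_even_transposition_sort(values):
    """Simulate n phases of odd-even transposition sort."""
    a = list(values)
    n = len(a)
    for phase in range(1, n + 1):
        # Odd phase: pairs (a1, a2), (a3, a4), ...; even phase: (a2, a3), (a4, a5), ...
        # With 0-based indices, odd phases start at 0 and even phases at 1.
        start = 0 if phase % 2 == 1 else 1
        for i in range(start, n - 1, 2):  # independent compare-exchanges
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


print(odd_even_transposition_sort([6, 2, 7, 1, 8, 4, 5, 3]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```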

Figure 9.14 An example of the first phase of parallel shellsort on an eight-process array.

Figure 9.16 A binary tree generated by the execution of the quicksort algorithm. Each level of the tree represents a different array-partitioning iteration. If pivot selection is optimal, then the height of the tree is Θ(log n), which is also the number of iterations.
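The claim that optimal pivots give a tree of height Θ(log n) can be checked with a sketch that always pivots on the median (statistics.median_low from the Python standard library) and reports the height of the recursion tree. quicksort_height is a hypothetical name, and counting a single-element subarray as one level is a convention chosen here.

```python
from statistics import median_low


def quicksort_height(values):
    """Return (sorted values, height of the quicksort recursion tree)."""
    if not values:
        return [], 0
    pivot = median_low(values)  # optimal pivot: the median
    smaller = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    larger = [x for x in values if x > pivot]
    left, h_left = quicksort_height(smaller)
    right, h_right = quicksort_height(larger)
    return left + equal + right, 1 + max(h_left, h_right)


print(quicksort_height(list(range(15)))[1])  # 4
```

With median pivots on distinct keys the larger side has floor(n/2) elements, so the height is floor(log2 n) + 1.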

[Figure: (a) the input array 33 21 13 54 82 33 40 72, held by processes 1 to 8; root = 4; (c), (d), (e) successive contents of the leftchild and rightchild arrays; (f) the resulting binary tree, with nodes [4] {54}, [1] {33}, [6] {33}, [5] {82}, [2] {21}, [3] {13}, [7] {40}, and [8] {72}.]

Figure 9.17 The execution of the PRAM algorithm on the array shown in (a). The arrays leftchild and rightchild are shown in (c), (d), and (e) as the algorithm progresses. Figure (f) shows the binary tree constructed by the algorithm. Each node is labeled by the process (in square brackets) and by the element stored at that process (in curly brackets); this element is the node's pivot. In each node, processes with elements smaller than the pivot are grouped on the left side of the node, and those with larger elements are grouped on the right side. These two groups form the two partitions of the original array. For each partition, a pivot element is selected at random from its group, and the two selected pivots form the children of the node.
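The tree construction can be simulated sequentially by modelling each concurrent write as a random choice among the processes that target the same child slot. This is a sketch of the idea in the caption, not the exact PRAM code: build_pivot_tree and inorder are hypothetical names, processes are indexed from 0 rather than 1, and ties between equal elements are broken by process index (an assumption).

```python
import random


def build_pivot_tree(a, rng):
    """Process i holds a[i]; returns (root, leftchild, rightchild)."""
    n = len(a)
    leftchild, rightchild = [None] * n, [None] * n
    root = rng.randrange(n)  # the first pivot is chosen at random
    parent = {i: root for i in range(n) if i != root}
    while parent:  # one iteration = one parallel round
        attempts = {}
        for i, p in parent.items():
            # Smaller keys go left; ties broken by process index (assumption).
            side = "left" if (a[i], i) < (a[p], p) else "right"
            attempts.setdefault((p, side), []).append(i)
        for (p, side), writers in attempts.items():
            winner = rng.choice(writers)  # one concurrent write succeeds
            (leftchild if side == "left" else rightchild)[p] = winner
            del parent[winner]
            for i in writers:
                if i != winner:
                    parent[i] = winner  # losers continue below the new pivot
    return root, leftchild, rightchild


def inorder(node, leftchild, rightchild, a):
    if node is None:
        return []
    return (inorder(leftchild[node], leftchild, rightchild, a) + [a[node]]
            + inorder(rightchild[node], leftchild, rightchild, a))


a = [33, 21, 13, 54, 82, 33, 40, 72]  # the array of Figure 9.17(a)
root, leftchild, rightchild = build_pivot_tree(a, random.Random(0))
print(inorder(root, leftchild, rightchild, a))  # [13, 21, 33, 33, 40, 54, 72, 82]
```

Whatever the random choices, an in-order traversal lists the elements in sorted order, because every node's left subtree holds only smaller keys and its right subtree only larger ones.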

[Figure: an array of 20 elements (indices 0 to 19) distributed over processes P0 to P4, shown at pivot selection, after local rearrangement, and after global rearrangement; the global step uses prefix sums of |S_i| and |L_i|.]

Figure 9.19 Efficient global rearrangement of the array.
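The prefix-sum step can be sketched as follows, assuming S_i holds the elements of P_i that are less than or equal to the pivot and L_i those greater than it. Exclusive prefix sums of |S_i| and |L_i| give each process a private write offset, so all processes can copy their pieces into place at once. global_rearrange is a hypothetical name and the data are illustrative (20 elements on five processes, as in the figure's setting).

```python
from itertools import accumulate


def global_rearrange(blocks, pivot):
    """blocks[i] holds the elements of process P_i; returns (new array, |S|)."""
    # Local rearrangement: each process splits its own block.
    S = [[x for x in b if x <= pivot] for b in blocks]
    L = [[x for x in b if x > pivot] for b in blocks]
    s_sizes = [len(part) for part in S]
    l_sizes = [len(part) for part in L]
    # Exclusive prefix sums give each process its write offset.
    s_offsets = list(accumulate(s_sizes, initial=0))  # last entry = total |S|
    l_offsets = list(accumulate(l_sizes, initial=s_offsets[-1]))
    out = [None] * (s_offsets[-1] + sum(l_sizes))
    # Global rearrangement: every process copies its pieces independently.
    for i in range(len(blocks)):
        out[s_offsets[i]:s_offsets[i + 1]] = S[i]
        out[l_offsets[i]:l_offsets[i + 1]] = L[i]
    return out, s_offsets[-1]


blocks = [[12, 3, 19, 7], [5, 15, 1, 8], [20, 4, 11, 2], [9, 16, 6, 14], [10, 18, 13, 17]]
array, boundary = global_rearrange(blocks, pivot=9)
print(array[:boundary])  # [3, 7, 5, 1, 8, 4, 2, 9, 6]
print(array[boundary:])  # [12, 19, 15, 20, 11, 16, 14, 10, 18, 13, 17]
```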

[Figure: processes P0, P1, and P2 through the stages initial element distribution, local sort & sample selection, sample combining, global splitter selection, and final element assignment.]

Figure 9.20 An example of the execution of sample sort on an array with 24 elements on three processes.
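A sequential sketch of the stages in Figure 9.20. The sampling details are assumptions: each process picks p - 1 evenly spaced elements of its sorted block, and the global splitters are every (p - 1)-th element of the sorted p(p - 1) samples. sample_sort is a hypothetical name, every block is assumed non-empty, and the 24 input values are illustrative.

```python
from bisect import bisect_right


def sample_sort(blocks):
    """Sequential simulation of sample sort; blocks[i] is the data of process P_i."""
    p = len(blocks)
    # Local sort & sample selection: p - 1 evenly spaced samples per process.
    local = [sorted(b) for b in blocks]
    samples = [b[(k + 1) * len(b) // p] for b in local for k in range(p - 1)]
    # Sample combining: gather all p(p - 1) samples and sort them.
    samples.sort()
    # Global splitter selection: every (p - 1)-th sample.
    splitters = [samples[k * (p - 1)] for k in range(1, p)]
    # Final element assignment: route each element to the process owning its range.
    buckets = [[] for _ in range(p)]
    for b in local:
        for x in b:
            buckets[bisect_right(splitters, x)].append(x)
    return [sorted(bucket) for bucket in buckets], splitters


blocks = [
    [22, 7, 13, 18, 2, 17, 1, 14],  # P0
    [20, 6, 10, 24, 15, 9, 21, 3],  # P1
    [16, 19, 23, 4, 11, 12, 5, 8],  # P2
]
buckets, splitters = sample_sort(blocks)
print(splitters)  # [9, 17]
print(buckets)
```

With these values each process ends up with eight elements, and concatenating the buckets in process order gives the sorted array.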