Shared-Memory Sample Sort Algorithm: Implementing Efficient Parallel Sorting on XMT | Assignments Algorithms and Programming

HW3: Shared-Memory Sample Sort

Course: CMSC751/ENEE759, Spring 2009

Title: Shared-Memory Sample Sort

Date Assigned: March 10th, 2009

Date Due: March 24th, 2009

Contact: Alex Tzannes - [email protected]

1 Assignment Goal

The goal of this assignment is to provide a randomized sorting algorithm that runs efficiently on XMT.

While you have some flexibility in choosing which serial sorting algorithms to use for the different steps of the parallel algorithm, you should try to find and select the most efficient one for each case. The Sample Sort algorithm follows a "decomposition first" pattern and is widely used on multiprocessor architectures. Because Sample Sort is a randomized algorithm, its running time depends on the output of a random number generator; with high probability, it performs well on very large arrays.

In this assignment, we propose implementing a variation of the Sample Sort algorithm that performs well on shared-memory parallel architectures such as XMT.

2 Problem Statement

The Shared Memory Sample Sort algorithm is an implementation of Sample Sort for shared-memory machines. The idea behind Sample Sort is to find a set of p−1 elements from the array, called splitters, which partition the N input elements into p groups $\mathit{set}_0, \ldots, \mathit{set}_{p-1}$. In particular, every element in $\mathit{set}_i$ is smaller than every element in $\mathit{set}_{i+1}$. The partitioned sets are then sorted independently.

The input is an unsorted array A of N elements. The output is returned in array Result. Let p be the number of processors (TCUs). We will assume, without loss of generality, that N is divisible by p. An overview of the Shared Memory Sample Sort algorithm is as follows:

Step 1. In parallel, collect a set S of s×p random elements from the original array A, where p is the number of TCUs available and s is called the oversampling ratio. Sort the array S using an algorithm that performs well for the size of S. Then select p−1 evenly spaced elements of S into the array of splitters $S_0$:

$$S_0 = \{S[s], S[2s], \ldots, S[(p-1)s]\}$$

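These elements are the splitters that are used below to partition the elements of A into p sets (or partitions) $\mathit{set}_i$, $0 \le i < p$. Because $S_0$ holds p−1 splitters, indexed $S_0[0], \ldots, S_0[p-2]$, the sets are:

$$\mathit{set}_0 = \{A[j] \mid A[j] < S_0[0]\},$$
$$\mathit{set}_i = \{A[j] \mid S_0[i-1] \le A[j] < S_0[i]\}, \quad 0 < i < p-1,$$
$$\mathit{set}_{p-1} = \{A[j] \mid S_0[p-2] \le A[j]\}.$$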

Step 2. Consider the input array A divided into p subarrays, $B_0 = A[0, \ldots, N/p - 1]$, $B_1 = A[N/p, \ldots, 2N/p - 1]$, and so on, so that $B_i = A[iN/p, \ldots, (i+1)N/p - 1]$. The i-th TCU iterates through subarray $B_i$ and, for each element, executes a binary search on the array of splitters $S_0$, for a total of N/p binary searches per TCU. The following quantities are computed:
