Low-Cost Task Scheduling for Distributed-Memory Machines

Outline
– Introduction
– List Scheduling
– Preliminaries
– General Framework for LSSP
– Complexity Analysis
– Case Study
– Extensions for LSDP

Introduction

• Task Scheduling
– Scheduling heuristics
– Shared-memory vs. distributed-memory
– Bounded vs. unbounded number of processors
– Multistep vs. single-step methods
– Duplicating vs. nonduplicating methods

List Scheduling
• LSDP and LSSP algorithms
• LSSP (List Scheduling with Static Priorities);
– Tasks are scheduled in the order of their previously computed priorities on the task's "best" processor.
– Best processor is ...
• The processor enabling the earliest start time, if performance is the main concern
• The processor becoming idle the earliest, if speed is the main concern.
• LSDP (List Scheduling with Dynamic Priorities);
– Priorities for task-processor pairs

List Scheduling
• Reducing LSSP time complexity
– O(V log(V) + (E+V)P) => O(V log (P) + E)
V = expected number of tasks
E = expected number of dependencies
P = number of processors
1. Considering only two processors
2. Maintaining a partially sorted task priority queue with a limited number of tasks

Preliminaries

• Parallel programs
– (DAG) G = (V,E)
– Computation cost Tw(t)
– Communication cost Tc(t, t')
– Communication to computation ratio (CCR)
– The task graph width (W)

[Figure: example task graph with nodes labeled V and edges labeled E]

Preliminaries
• Entry and exit tasks
• The bottom level (Tb) of the task
• Ready = parents scheduled
• Start time Ts(t)
• Finish time Tf(t)
• Partial schedule
→ Processor ready time: Tr(p) = max Tf(t), t ∈ V, Pr(t) = p.
• Processor becoming idle the earliest (pr)
→ Tr(pr) = min Tr(p), p ∈ P
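
To make these definitions concrete, here is a minimal Python sketch. It assumes that Tb(t) is the longest path from t to an exit task, counting both computation and communication costs (the slides do not spell this out), and that an empty processor has ready time 0. The four-task graph and the names `bottom_levels` and `processor_ready_times` are made up for illustration.

```python
from functools import lru_cache

# Hypothetical task graph (not the one from the slides).
tw = {"a": 2, "b": 3, "c": 1, "d": 2}                              # Tw(t)
tc = {("a", "b"): 1, ("a", "c"): 4, ("b", "d"): 2, ("c", "d"): 1}  # Tc(t, t')

def bottom_levels(tw, tc):
    """Assumed definition: Tb(t) = Tw(t) + max over children t' of (Tc(t, t') + Tb(t'))."""
    succ = {t: [] for t in tw}
    for (u, v) in tc:
        succ[u].append(v)

    @lru_cache(maxsize=None)
    def tb(t):
        # For an exit task the max is empty, so Tb(t) = Tw(t).
        return tw[t] + max((tc[(t, s)] + tb(s) for s in succ[t]), default=0)

    return {t: tb(t) for t in tw}

def processor_ready_times(tf, proc_of, processors):
    """Tr(p) = max Tf(t) over tasks placed on p; 0 for an empty processor (assumption)."""
    tr = {p: 0 for p in processors}
    for t, p in proc_of.items():
        tr[p] = max(tr[p], tf[t])
    return tr

tb = bottom_levels(tw, tc)
# Partial schedule: tasks a and b already placed on processor 0.
tr = processor_ready_times({"a": 2, "b": 5}, {"a": 0, "b": 0}, [0, 1])
pr = min(tr, key=tr.get)  # processor becoming idle the earliest
print(tb)  # {'a': 10, 'b': 7, 'c': 4, 'd': 2}
print(tr)  # {0: 5, 1: 0}
print(pr)  # 1
```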

Preliminaries
• The last message arrival time
→ Tm(t) = max { Tf(t') + Tc(t', t) }, (t', t) ∈ E
• The enabling processor pe(t): the processor from which the last message arrives
• Effective message arrival time
→ Te(t, p) = max { Tf(t') + Tc(t', t) }, (t', t) ∈ E, pt(t') <> p
• The start time of a ready task, once scheduled
→ Ts(t, p) = max { Te(t, p), Tr(p) }
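
The quantities on this slide can be sketched in a few lines of Python. The dictionaries `preds`, `tf`, `tc`, `proc_of` and `tr`, the function names, and the example values are all hypothetical. Entry tasks are assumed to have arrival time 0.

```python
def last_message_arrival(t, preds, tf, tc):
    """Tm(t): latest arrival over all incoming edges (0 for entry tasks, by assumption)."""
    return max((tf[u] + tc[(u, t)] for u in preds[t]), default=0)

def enabling_processor(t, preds, tf, tc, proc_of):
    """pe(t): processor of the parent whose message arrives last (None for entry tasks)."""
    if not preds[t]:
        return None
    last = max(preds[t], key=lambda u: tf[u] + tc[(u, t)])
    return proc_of[last]

def effective_message_arrival(t, p, preds, tf, tc, proc_of):
    """Te(t, p): like Tm(t), but parents already placed on p send no message."""
    return max((tf[u] + tc[(u, t)] for u in preds[t] if proc_of[u] != p), default=0)

def start_time(t, p, preds, tf, tc, proc_of, tr):
    """Ts(t, p) = max{Te(t, p), Tr(p)}; local parents are covered by Tr(p) >= Tf(parent)."""
    return max(effective_message_arrival(t, p, preds, tf, tc, proc_of), tr[p])

# Hypothetical partial schedule: b on processor 0, c on processor 1, d is ready.
preds = {"d": ["b", "c"]}
tf = {"b": 5, "c": 4}
tc = {("b", "d"): 2, ("c", "d"): 1}
proc_of = {"b": 0, "c": 1}
tr = {0: 5, 1: 4}

print(last_message_arrival("d", preds, tf, tc))         # 7
print(enabling_processor("d", preds, tf, tc, proc_of))  # 0
print(start_time("d", 0, preds, tf, tc, proc_of, tr))   # 5
print(start_time("d", 1, preds, tf, tc, proc_of, tr))   # 7
```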

General Framework for LSSP

• General LSSP algorithm
– Task priority computation: O(E + V)
– Task selection: O(V log W)
– Processor selection: O((E + V) P)

General Framework for LSSP

• Processor Selection
– Selecting a processor
1. The enabling processor
2. Processor becoming idle first
→ Ts(t) = max { Te (t, p), Tr ( p ) }
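
A minimal sketch of the two-candidate rule, checked against a scan over every processor on one hypothetical example. The names `est` and `select_processor` and all data are illustrative. The linear `min` over `tr` is used only for brevity; keeping the processors in a heap or other sorted structure is what would give the logarithmic cost per task.

```python
def est(t, p, preds, tf, tc, proc_of, tr):
    """Ts(t, p) = max{Te(t, p), Tr(p)}."""
    te = max((tf[u] + tc[(u, t)] for u in preds[t] if proc_of[u] != p), default=0)
    return max(te, tr[p])

def select_processor(t, preds, tf, tc, proc_of, tr):
    """Pick the better of the enabling processor pe(t) and the first-idle processor pr."""
    pr = min(tr, key=tr.get)          # candidate 2: becomes idle first
    candidates = [pr]
    if preds[t]:
        last = max(preds[t], key=lambda u: tf[u] + tc[(u, t)])
        candidates.insert(0, proc_of[last])   # candidate 1: enabling processor
    return min(candidates, key=lambda p: est(t, p, preds, tf, tc, proc_of, tr))

# Hypothetical state: four processors, ready task d with parents b and c.
preds = {"d": ["b", "c"]}
tf = {"b": 5, "c": 4}
tc = {("b", "d"): 2, ("c", "d"): 1}
proc_of = {"b": 0, "c": 1}
tr = {0: 6, 1: 4, 2: 3, 3: 9}

chosen = select_processor("d", preds, tf, tc, proc_of, tr)
best_all = min(est("d", p, preds, tf, tc, proc_of, tr) for p in tr)
print(chosen, est("d", chosen, preds, tf, tc, proc_of, tr), best_all)  # 0 6 6
```

Here the enabling processor wins (start time 6) even though processor 2 becomes idle earlier, because only on processor 0 does the last message not have to be sent.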

General Framework for LSSP
• Lemma 1.
→ p <> pe(t) : Te (t, p) = Tm(t)
• Theorem 1. If t is a ready task, one of the processors p ∈ {pe(t), pr} satisfies
→ Ts (t, p) = min Ts(t, px), px ∈ P
• O( (E + V) P ) → O (V log (P) + E )
– O (E + V) to traverse the task graph
– O (V log P) to maintain the processors sorted

General Framework for LSSP

• Task Selection
– O (V log W) can be reduced by sorting only some of the tasks.
– Task priority queue
1. A sorted list of size H
2. A FIFO list ( O ( 1 ) )
– Cost decreases to O(V log H)
– H needs to be adjusted
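
One possible shape for such a queue, sketched in Python. The sorted part is assumed to be a binary heap of at most H entries and the overflow a plain FIFO; when a task leaves the sorted part, the oldest FIFO entry is promoted. The class name `PartiallySortedQueue` and this promotion policy are assumptions made for illustration.

```python
import heapq
from collections import deque

class PartiallySortedQueue:
    """Sorted part of at most `size` tasks (a heap) plus an unsorted FIFO overflow."""

    def __init__(self, size):
        self.size = size
        self._heap = []       # entries: (-priority, seq, task); highest priority first
        self._fifo = deque()  # entries: (task, priority), O(1) append and pop
        self._seq = 0         # tie-breaker: earlier arrival in the heap wins

    def push(self, task, priority):
        if len(self._heap) < self.size:
            heapq.heappush(self._heap, (-priority, self._seq, task))
            self._seq += 1
        else:
            self._fifo.append((task, priority))

    def pop(self):
        _, _, task = heapq.heappop(self._heap)
        if self._fifo:  # refill the sorted part from the FIFO head
            t, prio = self._fifo.popleft()
            heapq.heappush(self._heap, (-prio, self._seq, t))
            self._seq += 1
        return task

    def __len__(self):
        return len(self._heap) + len(self._fifo)

# Replay of the ready-task arrivals and priorities from the case study (H = 2).
q = PartiallySortedQueue(2)
order = []
q.push("t0", 15); order.append(q.pop())
for t, prio in [("t1", 11), ("t2", 9), ("t3", 12)]:
    q.push(t, prio)
order.append(q.pop())                        # t1, although t3 has higher priority
q.push("t4", 6); q.push("t5", 8)
order.append(q.pop()); order.append(q.pop())
q.push("t6", 6)
order += [q.pop(), q.pop(), q.pop()]
q.push("t7", 2); order.append(q.pop())
print(order)  # ['t0', 't1', 't3', 't2', 't5', 't4', 't6', 't7']
```

Each heap operation touches at most H entries, so the per-task cost is O(log H) rather than O(log W). The price is that a high-priority task waiting in the FIFO (t3 above) can be overtaken.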

Complexity Analysis

• Computing task priorities O ( E + V )
• Task selection O ( V log W )
→ O ( V log H ) for partially sorted priority queue
→ O ( V log (P) ) for queue of size P
• Processor selection O ( E + V ) to traverse the task graph
→ plus O ( V log P ) to maintain the processors sorted
• Total complexity
→ O ( V ( log (W) + log (P) ) + E ) fully sorted
→ O ( V log (P) + E ) partially sorted, queue of size P

Case Study
• MCP (Modified Critical Path)
– The task having the highest bottom level has the highest priority
• FCP (Fast Critical Path)
• 3 processors
• Partially sorted priority queue of size 2
• 8 tasks (t0 to t7)

[Figure: example task graph. Nodes (task / computation cost), by level: t0 / 2; t1 / 2, t2 / 2, t3 / 2; t6 / 2, t5 / 3, t4 / 3; t7 / 2. Edges carry communication costs.]

Case Study


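Values in brackets are task priorities (bottom levels).

| Ready tasks (sorted) | Ready tasks (FIFO) | t  | Scheduling: t -> p [ Ts - Tf ] |
|----------------------|--------------------|----|--------------------------------|
| t0 [15]              | -                  | t0 | t0 -> p0 [0 - 2]               |
| t1 [11], t2 [9]      | t3 [12]            | t1 | t1 -> p0 [2 - 4]               |
| t3 [12], t2 [9]      | t4 [6], t5 [8]     | t3 | t3 -> p1 [3 - 6]               |
| t2 [9], t4 [6]       | t5 [8]             | t2 | t2 -> p0 [4 - 6]               |
| t5 [8], t4 [6]       | t6 [6]             | t5 | t5 -> p2 [6 - 9]               |
| t4 [6], t6 [6]       | -                  | t4 | t4 -> p0 [6 - 9]               |
| t6 [6]               | -                  | t6 | t6 -> p1 [7 - 9]               |
| t7 [2]               | -                  | t7 | t7 -> p2 [11 - 13]             |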

Extensions for LSDP

• Extend the approach to dynamic priorities
→ ETF : ready task that starts the earliest
→ ERT : ready task that finishes the earliest
→ DLS : task-processor pair having the highest dynamic level
– General formula
→ ρ(t, p) = α(t) + max { Te (t, p), Tr (p) }
• α_ETF ( t ) = 0
• α_ERT ( t ) = Tw( t )
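
The general formula can be illustrated with a brute-force scan over all ready task-processor pairs. This only shows how α changes the choice; it deliberately ignores the reduced-complexity queue structure. The names `rho` and `select_pair` and the example data are hypothetical, and the pair with the smallest ρ is assumed to be the one selected.

```python
def rho(t, p, alpha, te, tr):
    """rho(t, p) = alpha(t) + max{Te(t, p), Tr(p)}."""
    return alpha(t) + max(te[(t, p)], tr[p])

def select_pair(ready, processors, alpha, te, tr):
    """Return the (task, processor) pair minimizing rho (assumed selection rule)."""
    pairs = [(t, p) for t in ready for p in processors]
    return min(pairs, key=lambda tp: rho(tp[0], tp[1], alpha, te, tr))

# Hypothetical state: two ready tasks, two processors.
tw = {"x": 5, "y": 1}                       # computation costs Tw(t)
tr = {0: 2, 1: 4}                           # processor ready times Tr(p)
te = {("x", 0): 0, ("x", 1): 0,             # effective message arrival times Te(t, p)
      ("y", 0): 3, ("y", 1): 1}

alpha_etf = lambda t: 0        # ETF: earliest start
alpha_ert = lambda t: tw[t]    # ERT: earliest finish

etf_choice = select_pair(["x", "y"], [0, 1], alpha_etf, te, tr)
ert_choice = select_pair(["x", "y"], [0, 1], alpha_ert, te, tr)
print(etf_choice)  # ('x', 0): x can start at time 2
print(ert_choice)  # ('y', 0): y finishes at time 4
```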

Extensions for LSDP

• EP (enabling processor) case
– on each processor, the tasks are sorted
– the processors are sorted
• non-EP case
– the processor becoming idle first

Extensions for LSDP

• 3 tries;
– 1 for EP case, 1 for non-EP case
• Task priority queues maintained;
– P for EP case, 2 for non-EP case
• Each task is added to 3 queues;
– 1 for EP case, 2 for non-EP case
• Processor queues;
Complexity
• Originally O ( W ( E + V ) P ), now O ( V (log (W) + log (P) ) + E )
• Can be further reduced using a partially sorted priority queue. A size of P is required to maintain comparable performance
→ O ( V log (P) + E )

Conclusion

• LSSP can be performed at a significantly lower cost...
– Processor selection between only two processors: the enabling processor or the processor becoming idle first
– Task selection: only a limited number of tasks are sorted
• Using the extension of this method, LSDP complexity can also be reduced
• For large program and processor dimensions, superior cost-performance trade-off.

