Low-Cost Task Scheduling for Distributed-Memory Machines


Outline
• Introduction
• List Scheduling
• Preliminaries
• General Framework for LSSP
• Complexity Analysis
• Case Study
• Extensions for LSDP
• Conclusion


Introduction

Task scheduling: scheduling heuristics
• Shared-memory vs. distributed-memory
• Bounded vs. unbounded number of processors
• Multistep vs. single-step methods
• Duplicating vs. non-duplicating methods
• Static vs. dynamic priorities


List Scheduling
• LSSP and LSDP algorithms

LSSP (List Scheduling with Static Priorities):
• Tasks are scheduled in the order of their previously computed priorities, each on the task's “best” processor.
• The best processor is
  – the processor enabling the earliest start time, if performance is the main concern
  – the processor becoming idle the earliest, if speed is the main concern

LSDP (List Scheduling with Dynamic Priorities):
• Priorities are defined for task–processor pairs; more complex to compute.


List Scheduling
• Reducing the LSSP time complexity

O(V log(V) + (E + V) P)  =>  O(V log(P) + E)
  V = number of tasks
  E = number of dependencies
  P = number of processors

Achieved by:
1. Considering only two candidate processors per task
2. Maintaining a partially-sorted task priority queue with a limited number of tasks


Preliminaries

A parallel program is modeled as a DAG G = (V, E)
• Computation cost Tw(t)
• Communication cost Tc(t, t')
• Communication-to-computation ratio (CCR)
• Task graph width (W)

[Figure: example task graph, a DAG with tasks (V) as nodes and dependencies (E) as edges]


Preliminaries

• Entry and exit tasks
• The bottom level Tb(t) of a task
• A task is ready when all of its parents are scheduled
• Start time Ts(t), finish time Tf(t)
• Partial schedule
• Processor ready time: Tr(p) = max { Tf(t) : t ∈ V, Pr(t) = p }
• Processor becoming idle the earliest (pr): Tr(pr) = min { Tr(p) : p ∈ P }


Preliminaries

• Last message arrival time: Tm(t) = max { Tf(t') + Tc(t', t) : (t', t) ∈ E }
• Enabling processor pe(t): the processor from which the last message arrives
• Effective message arrival time: Te(t, p) = max { Tf(t') + Tc(t', t) : (t', t) ∈ E, Pr(t') ≠ p }
• Start time of a ready task once scheduled: Ts(t, p) = max { Te(t, p), Tr(p) }
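A minimal sketch of these quantities in code (Tw, Tc, parents, Tf, proc, Tr are assumed names for a dictionary-based representation chosen for illustration, not notation from the slides):

```python
# Assumed representation:
#   Tc[(t1, t2)]    communication cost of edge (t1, t2)
#   parents[t]      predecessors of task t
#   Tf[t], proc[t]  finish time / processor of already-scheduled tasks
#   Tr[p]           ready time of processor p

def last_message_arrival(t, Tf, Tc, parents):
    # Tm(t) = max over edges (t', t) of Tf(t') + Tc(t', t); assumes t has parents
    return max(Tf[u] + Tc[(u, t)] for u in parents[t])

def enabling_processor(t, Tf, Tc, parents, proc):
    # pe(t): processor of the parent whose message arrives last
    last = max(parents[t], key=lambda u: Tf[u] + Tc[(u, t)])
    return proc[last]

def effective_message_arrival(t, p, Tf, Tc, parents, proc):
    # Te(t, p): like Tm(t), but ignoring parents already placed on p
    return max((Tf[u] + Tc[(u, t)] for u in parents[t] if proc[u] != p),
               default=0.0)

def start_time(t, p, Tf, Tc, parents, proc, Tr):
    # Ts(t, p) = max { Te(t, p), Tr(p) }
    return max(effective_message_arrival(t, p, Tf, Tc, parents, proc), Tr[p])
```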


General Framework for LSSP

General LSSP algorithm:
1. Task priority computation: O(E + V)
2. Task selection: O(V log W)
3. Processor selection: O((E + V) P)
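A minimal sketch of this generic loop, assuming the task graph is given as plain dictionaries (Tw, Tc, parents, children, bottom_level and the function name lssp_schedule are illustrative, not the paper's code). Processor selection here still scans all P processors, i.e. the O((E + V) P) variant that the following slides reduce:

```python
import heapq

def lssp_schedule(Tw, Tc, parents, children, bottom_level, num_procs):
    """Generic LSSP: compute static priorities, then repeatedly pick the
    highest-priority ready task and place it on its 'best' processor
    (here: the processor giving the earliest start time)."""
    Tf, proc = {}, {}                      # finish time / processor of scheduled tasks
    Tr = [0.0] * num_procs                 # processor ready times
    # entry tasks (no parents) are ready; priority = bottom level Tb(t)
    ready = [(-bottom_level[t], t) for t in Tw if not parents[t]]
    heapq.heapify(ready)
    while ready:
        _, t = heapq.heappop(ready)        # task selection (fully sorted queue)
        # processor selection: earliest start time over all processors, O(P)
        best_p, best_ts = None, None
        for p in range(num_procs):
            te = max((Tf[u] + Tc[(u, t)] for u in parents[t] if proc[u] != p),
                     default=0.0)          # Te(t, p)
            ts = max(te, Tr[p])            # Ts(t, p)
            if best_ts is None or ts < best_ts:
                best_p, best_ts = p, ts
        proc[t], Tf[t] = best_p, best_ts + Tw[t]
        Tr[best_p] = Tf[t]
        # a child becomes ready once all of its parents are scheduled
        for c in children[t]:
            if all(u in Tf for u in parents[c]):
                heapq.heappush(ready, (-bottom_level[c], c))
    return proc, Tf
```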


General Framework for LSSP

Processor selection: choosing between two candidates
1. The enabling processor pe(t)
2. The processor becoming idle first (pr)

Ts(t, p) = max { Te(t, p), Tr(p) }


General Framework for LSSP

• Lemma 1. For p ≠ pe(t): Te(t, p) = Tm(t)

• Theorem 1. If t is a ready task, then one of the processors p ∈ {pe(t), pr} satisfies
  Ts(t, p) = min { Ts(t, px) : px ∈ P }

Processor selection thus drops from O((E + V) P) to O(V log(P) + E):
• O(E + V) to traverse the task graph
• O(V log P) to maintain the processors sorted
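A sketch of the reduced processor-selection step that Theorem 1 justifies (illustrative code reusing the dictionary representation assumed above; the linear scan for pr stands in for the heap of processors kept sorted by ready time):

```python
def select_processor(t, Tf, Tc, parents, proc, Tr):
    """Restrict processor selection to the two candidates of Theorem 1:
    pe(t), the enabling processor, and pr, the processor idle first.
    Here pr is found by a linear scan for clarity; the algorithm keeps
    the processors sorted by Tr(p), giving O(log P) per task."""
    pr = min(range(len(Tr)), key=lambda p: Tr[p])      # processor becoming idle first
    if parents[t]:
        last = max(parents[t], key=lambda u: Tf[u] + Tc[(u, t)])
        candidates = {proc[last], pr}                  # {pe(t), pr}
    else:
        candidates = {pr}                              # entry task: no incoming messages

    def start_time(p):
        te = max((Tf[u] + Tc[(u, t)] for u in parents[t] if proc[u] != p),
                 default=0.0)                          # Te(t, p)
        return max(te, Tr[p])                          # Ts(t, p)

    best = min(candidates, key=start_time)
    return best, start_time(best)
```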

General Framework for LSSP

Task selection: O(V log W) can be reduced by sorting only some of the tasks.

The task priority queue is split into:
1. A sorted list of size H
2. A FIFO list (O(1) insertion)

Task selection then decreases to O(V log H). H needs to be adjusted; H = P is optimal, giving O(V log P).
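A sketch of such a two-part ready queue (the PartialPriorityQueue class and its methods are illustrative names, not the paper's code): new ready tasks fill the sorted part until it holds H of them, overflow into the FIFO, and the FIFO head refills the sorted part when a task is removed. With H = P the overall task-selection cost becomes O(V log P).

```python
import bisect
from collections import deque

class PartialPriorityQueue:
    """Ready-task queue split into a sorted list of at most H tasks
    plus a FIFO overflow list."""
    def __init__(self, H):
        self.H = H
        self.sorted = []            # (priority, task), descending priority
        self.fifo = deque()         # overflow tasks, in arrival order

    def push(self, task, priority):
        if len(self.sorted) < self.H:
            # keep the small list in descending priority order
            # (a real implementation would do this in O(log H))
            keys = [-p for p, _ in self.sorted]
            i = bisect.bisect_right(keys, -priority)
            self.sorted.insert(i, (priority, task))
        else:
            self.fifo.append((priority, task))     # O(1)

    def pop(self):
        priority, task = self.sorted.pop(0)        # highest-priority sorted task
        if self.fifo:
            # move the FIFO head into the sorted part
            p, tsk = self.fifo.popleft()
            self.push(tsk, p)
        return task, priority
```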


Complexity Analysis

Computing task priorities: O(E + V)

Task selection: O(V log W)
• O(V log H) for a partially sorted priority queue
• O(V log P) for a queue of size P

Processor selection: O(E + V) + O(V log P)

Total complexity:
• O(V (log(W) + log(P)) + E) fully sorted
• O(V log(P) + E) partially sorted


Case Study

• MCP (Modified Critical Path): the task having the highest bottom level has the highest priority

• FCP (Fast Critical Path):
  – 3 processors
  – partially sorted priority queue of size 2
  – tasks t0 … t7

[Figure: example task graph with tasks t0 … t7; each node is labeled with its computation cost (e.g. t0 / 2, t5 / 3) and each edge with its communication cost]


Case Study

[Task graph repeated from the previous slide]

Scheduling trace (ready-task queue contents and the decision taken at each step; bracketed numbers are the tasks' priorities):

  sorted           | FIFO           | t  | t -> p [Ts - Tf]
  -----------------|----------------|----|------------------
  t0 [15]          | -              | t0 | t0 -> p0 [0 - 2]
  t1 [11], t2 [9]  | t3 [12]        | t1 | t1 -> p0 [2 - 4]
  t3 [12], t2 [9]  | t4 [6], t5 [8] | t3 | t3 -> p1 [3 - 6]
  t2 [9], t4 [6]   | t5 [8]         | t2 | t2 -> p0 [4 - 6]
  t5 [8], t4 [6]   | t6 [6]         | t5 | t5 -> p2 [6 - 9]
  t4 [6], t6 [6]   | -              | t4 | t4 -> p0 [6 - 9]
  t6 [6]           | -              | t6 | t6 -> p1 [7 - 9]
  t7 [2]           | -              | t7 | t7 -> p2 [11 - 13]


Extensions for LSDP

Extend the approach to dynamic priorities:
• ETF: the ready task that can start the earliest
• ERT: the ready task that finishes the earliest
• DLS: the task–processor pair having the highest dynamic level

General formula: ρ(t, p) = α(t) + max { Te(t, p), Tr(p) }
• αETF(t) = 0
• αERT(t) = Tw(t)
• αDLS(t) = −Tb(t)
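A tiny sketch of this priority computation (illustrative only; Te is assumed to be a callable and Tr, Tw, Tb dictionaries or lists, as in the earlier sketches):

```python
def rho(t, p, Te, Tr, Tw, Tb, variant="ETF"):
    """Dynamic priority of a (task, processor) pair:
       rho(t, p) = alpha(t) + max { Te(t, p), Tr(p) }.
    The pair minimizing rho (equivalently, maximizing the dynamic level
    for DLS) is selected next."""
    alpha = {
        "ETF": 0.0,       # earliest start time
        "ERT": Tw[t],     # start time + computation cost = finish time
        "DLS": -Tb[t],    # subtract the bottom level (dynamic level)
    }[variant]
    return alpha + max(Te(t, p), Tr[p])
```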


Extensions for LSDP

EP (enabling processor) case:
• on each processor, the tasks are sorted
• the processors are sorted

Non-EP case:
• consider the processor becoming idle first
• if this is the enabling processor, it falls back to the EP case


Extensions for LSDP

• 3 task–processor pairs tried per selection step: 1 for the EP case, 2 for the non-EP case

• Task priority queues maintained: P for the EP case, 2 for the non-EP case

• Each task is added to 3 queues: 1 for the EP case, 2 for the non-EP case

• Processor queues: 1 for the EP case, 1 for the non-EP case


Complexity

• Originally O(W (E + V) P), now O(V (log(W) + log(P)) + E)
• Can be further reduced using a partially sorted priority queue; a queue size of P is required to maintain comparable performance:
  O(V log(P) + E)


Conclusion

LSSP can be performed at a significantly lower cost:
• processor selection chooses between only two processors, the enabling processor or the processor becoming idle first
• task selection sorts only a limited number of tasks

Using the extension of this method, LSDP complexity can also be reduced.

For large program and processor dimensions, this yields a superior cost-performance trade-off.

