Debugging Parallel Programs: Common Bugs and Solutions, Slides of Assembly Language Programming

This document, from the 6.189 IAP 2007 MIT course taught by Dr. Rodric Rabbah of IBM, discusses the challenges of debugging parallel programs and presents approaches to them: visual debugging tools, commercial and research debuggers, and common defect patterns. Topics include erroneous use of language features, space decomposition, synchronization, and performance scalability.

Typology: Slides

2010/2011

Uploaded on 10/11/2011

Dr. Rodric Rabbah, IBM. 1 6.189 IAP 2007 MIT
6.189 IAP 2007
Lecture 9
Debugging Parallel Programs


Debugging Parallel Programs is Hard-er

● Parallel programs are subject to the usual bugs
● Plus: new timing and synchronization errors
● And: parallel bugs often disappear when you add code to try to identify the bug

TotalView

Debugging Parallel Programs

● Commercial debuggers
    ■ TotalView, …
● The printf approach
● gdb, MPI gdb, ppu/spu gdb, …
● Research debuggers
    ■ StreamIt Debugger, …
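As a small illustration of the printf approach, the sketch below tags every trace line with the rank and step that produced it and flushes immediately, so the output is not lost in a buffer if the program crashes or hangs. The names `format_trace` and `trace` are hypothetical helpers invented for this example; they do not come from the slides or from any Cell or MPI library. Keep in mind the earlier warning that adding code can make a parallel bug disappear: tracing changes timing.

```c
#include <stdio.h>

/* Hypothetical helper: writes "[rank R] step S: msg" into out.
 * Returns the value of snprintf (the length of the full line). */
int format_trace(char *out, size_t n, int rank, int step, const char *msg)
{
    return snprintf(out, n, "[rank %d] step %d: %s", rank, step, msg);
}

/* Hypothetical helper: prints one trace line to stderr and flushes it
 * right away so it survives a later crash or deadlock. */
void trace(int rank, int step, const char *msg)
{
    char line[256];
    format_trace(line, sizeof line, rank, step, msg);
    fprintf(stderr, "%s\n", line);
    fflush(stderr);
}
```

A call such as `trace(rank, n, "before boundary exchange")` placed on both sides of a communication step shows which ranks reached it and which did not.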

Cell Debugger in Eclipse IDE

Pattern-based Approach to Debugging

● “Defect Patterns”: common kinds of bugs in parallel programs
    ■ Useful tips to prevent them
    ■ Recipes for effective resolution
● Inspired by empirical studies at University of Maryland
    ■ http://fc-md.umd.edu/softwareday//presentations/Session0/Keynote.pdf
● At the end of this course, will try to identify some common Cell defect patterns based on your feedback and projects


Does Cell have too many functions?

spe_create_thread, spe_wait, spe_write_in_mbox, spe_stat_in_mbox, spe_read_out_mbox, spe_stat_out_mbox, spe_write_signal, spe_get_ls, spe_get_ps_area, spe_mfc_get, spe_mfc_put, spe_mfc_read_tag_status, spe_create_group, spe_get_event

mfc_get, mfc_put, mfc_stat_cmd_queue, mfc_write_tag_mask, mfc_read_tag_status_all/any/immediate

spu_read_in_mbox, spu_stat_in_mbox, spu_write_out_mbox, spu_write_out_intr_mbox, spu_stat_out_mbox, spu_stat_out_intr_mbox, spu_read_signal1/2, spu_stat_signal1/2, spu_write_event_mask, spu_read_event_status, spu_stat_event_status, spu_write_event_ack, spu_read_decrementer, spu_write_decrementer

● Yes! But you may not need all of them
● Understand a few basic features


Defect Pattern: Space Decomposition

● Incorrect mapping between the problem space and the program memory space
● Symptoms
    ■ Segmentation fault (if array index is out of range)
    ■ Incorrect or slightly incorrect output
● Cause
    ■ Mapping in parallel version can be different from that in serial version
        - Array origin is different in every processor
        - Additional memory space for communication can complicate the mapping logic
● Prevention
    ■ Validate memory allocation carefully when parallelizing code
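One way to keep the mapping between problem space and memory space explicit is to compute it in a single place. The sketch below is an illustration under assumptions, not code from the lecture: `Block`, `block_of`, and `global_to_local` are hypothetical names, and the layout assumes one extra cell at local index 0 reserved for communication, so owned cells start at local index 1.

```c
#include <assert.h>

/* Hypothetical helper type: the part of a global array of N cells
 * that is owned by one of `size` ranks. */
typedef struct {
    int start;   /* global index of the first owned cell */
    int count;   /* number of owned cells */
} Block;

/* Split N cells over `size` ranks. The first N % size ranks get one
 * extra cell, so the split stays correct when N is not divisible
 * by size. */
Block block_of(int N, int size, int rank)
{
    int base = N / size;
    int extra = N % size;
    Block b;
    b.count = base + (rank < extra ? 1 : 0);
    b.start = rank * base + (rank < extra ? rank : extra);
    return b;
}

/* Map a global index owned by block b to its local index, assuming
 * local index 0 is reserved for a cell received from a neighbor. */
int global_to_local(Block b, int g)
{
    assert(g >= b.start && g < b.start + b.count);
    return g - b.start + 1;
}
```

With N = 10 and size = 4, the ranks own 3, 3, 2, and 2 cells, so every global cell has exactly one owner and the array origin of each rank is given by `start` rather than assumed to be 0.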

Sequential Implementation

● Approach to implementation
    ■ Use an integer array buffer[] for current cell values
    ■ Use a second array nextbuffer[] to store the values for next step
    ■ Swap the buffers

Sequential C Code

    /* Initialize cells */
    int x, n, *tmp;
    int *buffer = (int*)malloc(N * sizeof(int));
    int *nextbuffer = (int*)malloc(N * sizeof(int));
    FILE *fp = fopen("input.dat", "r");
    if (fp == NULL) { exit(-1); }
    for (x = 0; x < N; x++) { fscanf(fp, "%d", &buffer[x]); }
    fclose(fp);

    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 0; x < N; x++) {
            nextbuffer[x] = (buffer[(x-1+N)%N]+buffer[(x+1)%N]) % 10;
        }
        tmp = buffer; buffer = nextbuffer; nextbuffer = tmp;
    }

    /* Final output */
    ...
    free(nextbuffer); free(buffer);

Example adapted from Taiga Nakamura

Decomposition

    nlocal = N / size;
    buffer = (int*)malloc((nlocal+2) * sizeof(int));
    nextbuffer = (int*)malloc((nlocal+2) * sizeof(int));

    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 0; x < nlocal; x++) {
            nextbuffer[x] = (buffer[(x-1+N)%N]+buffer[(x+1)%N]) % 10;
        }
        /* Exchange boundary cells with neighbors */
        ...
        tmp = buffer; buffer = nextbuffer; nextbuffer = tmp;
    }

buffer[]: indices 0 … (nlocal+1)

Where are the bugs?

Example adapted from Taiga Nakamura

Decomposition: Where are the bugs?

The code is the same as on the previous slide, with two bugs marked:

● nlocal = N / size: N may not be divisible by size
● The inner loop bounds should be (x = 1; x < nlocal+1; x++)

buffer[]: indices 0 … (nlocal+1)

Example adapted from Taiga Nakamura

Communication

    /* Main loop */
    for (n = 0; n < steps; n++) {
        for (x = 1; x < nlocal+1; x++) {
            nextbuffer[x] = (buffer[(x-1+N)%N]+buffer[(x+1)%N]) % 10;
        }
        /* Exchange boundary cells with neighbors */
        receive(&nextbuffer[0], (rank+size-1)%size);
        send(&nextbuffer[nlocal], (rank+1)%size);
        receive(&nextbuffer[nlocal+1], (rank+1)%size);
        send(&nextbuffer[1], (rank+size-1)%size);
        tmp = buffer; buffer = nextbuffer; nextbuffer = tmp;
    }

buffer[]: indices 0 … (nlocal+1)

Where are the bugs?

● Deadlock

Example adapted from Taiga Nakamura


Modes of Communication

● Recall there are different types of sends and receives
    ■ Synchronous
    ■ Asynchronous
    ■ Blocking
    ■ Non-blocking
● Tips for orchestrating communication
    ■ Alternate the order of sends and receives
    ■ Use asynchronous and non-blocking messages where possible
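The tip about alternating the order of sends and receives can be illustrated with a toy model rather than a real message-passing library. The sketch below is an assumption-laden simulation: it models blocking, synchronous (rendezvous) messages on a ring, where a send completes only when the destination rank is waiting in a matching receive. All names (`Op`, `build_program`, `ring_exchange_completes`, `RECV_FIRST`, `ALTERNATE`) are hypothetical, and the model assumes at least two ranks.

```c
#include <assert.h>
#include <stdlib.h>

enum { RECV_FIRST = 0, ALTERNATE = 1 };   /* ordering strategies */
typedef enum { OP_SEND, OP_RECV } Kind;
typedef struct { Kind kind; int peer; } Op;
#define NOPS 4                            /* operations per rank */

/* Build the four-step boundary exchange for one rank. RECV_FIRST uses
 * the order from the slide on every rank; ALTERNATE makes even ranks
 * send first while odd ranks still receive first. */
static void build_program(Op *ops, int rank, int size, int order)
{
    int left = (rank + size - 1) % size;
    int right = (rank + 1) % size;
    Op recv_first[NOPS] = { {OP_RECV, left}, {OP_SEND, right},
                            {OP_RECV, right}, {OP_SEND, left} };
    Op send_first[NOPS] = { {OP_SEND, right}, {OP_RECV, left},
                            {OP_SEND, left}, {OP_RECV, right} };
    int use_send_first = (order == ALTERNATE && rank % 2 == 0);
    for (int i = 0; i < NOPS; i++)
        ops[i] = use_send_first ? send_first[i] : recv_first[i];
}

/* Returns 1 if every rank finishes its exchange, 0 if the ring gets
 * stuck (deadlock) or memory cannot be allocated. */
int ring_exchange_completes(int size, int order)
{
    assert(size >= 2);
    Op *ops = (Op *)malloc((size_t)size * NOPS * sizeof(Op));
    int *pc = (int *)calloc((size_t)size, sizeof(int)); /* next op per rank */
    int finished = 0, progress = 1;
    if (ops == NULL || pc == NULL) { free(ops); free(pc); return 0; }

    for (int r = 0; r < size; r++)
        build_program(&ops[r * NOPS], r, size, order);

    while (progress) {
        progress = 0;
        for (int r = 0; r < size; r++) {
            if (pc[r] >= NOPS) continue;
            Op s = ops[r * NOPS + pc[r]];
            if (s.kind != OP_SEND) continue;      /* receivers just wait */
            int p = s.peer;
            if (pc[p] >= NOPS) continue;
            Op t = ops[p * NOPS + pc[p]];
            if (t.kind == OP_RECV && t.peer == r) {
                pc[r]++;                          /* rendezvous: both sides */
                pc[p]++;                          /* advance together */
                progress = 1;
            }
        }
    }
    for (int r = 0; r < size; r++)
        if (pc[r] == NOPS) finished++;
    free(ops);
    free(pc);
    return finished == size;
}
```

In this model `ring_exchange_completes(4, RECV_FIRST)` reports a deadlock because every rank waits in a receive and nobody ever sends, while the `ALTERNATE` ordering completes. Real systems differ in buffering and matching rules, so the model only shows why ordering matters, not how a particular library behaves.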