Presenter: Kishen Maloor Dept of Computer & Information Sciences University of Delaware
Revisiting the Sequential Programming Model for Multi-Core
Matthew J. Bridges Neil Vachharajani Yun Zhang Thomas Jablin David I. August Department of Computer Science Princeton University
Motivation
- Moving to multi-threaded programming is costly.
- Parallel programming models: costly to adopt.
- Need for automatic parallelization.
- Large number of existing single-threaded applications.
- Past attempts have been insufficient to keep many cores busy.
Parallelization Framework
- Compiler and hardware support.
- Thread level speculation.
- Execute loop iterations in parallel.
- Needs to buffer results.
- Decoupled software pipelining.
- Partition loop into stages; execute in parallel.
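The decoupled software pipelining bullet can be sketched by hand as follows. This is an illustrative assumption, not the compiler's generated code: a list-walking loop is split into a traversal stage (which keeps the pointer-chasing dependence to itself) and a work stage, connected by a small bounded queue. All names (`struct queue`, `traverse_stage`, `work_stage`, `sum_squares_pipelined`) are hypothetical, and the sketch assumes POSIX threads (compile with `-pthread`).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node { int value; struct node *next; };

#define QUEUE_CAP 16
struct queue {                      /* bounded queue between the two stages */
    struct node *slots[QUEUE_CAP];
    int head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

static void queue_init(struct queue *q) {
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_push(struct queue *q, struct node *n) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)               /* wait for a free slot */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail] = n;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static struct node *queue_pop(struct queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)                       /* wait for an item */
        pthread_cond_wait(&q->not_empty, &q->lock);
    struct node *n = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return n;
}

struct pipeline { struct queue q; struct node *list; long total; };

/* Stage 1: owns the loop-carried dependence n = n->next. */
static void *traverse_stage(void *arg) {
    struct pipeline *p = arg;
    for (struct node *n = p->list; n != NULL; n = n->next)
        queue_push(&p->q, n);
    queue_push(&p->q, NULL);                    /* NULL marks end of stream */
    return NULL;
}

/* Stage 2: per-node work, running one or more nodes behind stage 1. */
static void *work_stage(void *arg) {
    struct pipeline *p = arg;
    struct node *n;
    while ((n = queue_pop(&p->q)) != NULL)
        p->total += (long)n->value * n->value;
    return NULL;
}

long sum_squares_pipelined(struct node *list) {
    struct pipeline p;
    pthread_t stage1, stage2;
    p.list = list;
    p.total = 0;
    queue_init(&p.q);
    int rc1 = pthread_create(&stage1, NULL, traverse_stage, &p);
    int rc2 = pthread_create(&stage2, NULL, work_stage, &p);
    assert(rc1 == 0 && rc2 == 0);               /* sketch: no error recovery */
    (void)rc1; (void)rc2;
    pthread_join(stage1, NULL);
    pthread_join(stage2, NULL);
    pthread_mutex_destroy(&p.q.lock);
    pthread_cond_destroy(&p.q.not_empty);
    pthread_cond_destroy(&p.q.not_full);
    return p.total;
}
```

Because values only flow forward through the queue, the traversal stage never waits on the work stage except when the queue is full.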
Parallelization Framework
- Attempt to extract DOALL parallelism.
- Use of alias and value speculation.
- Avoid misspeculation.
- Synchronizing some dependences.
- Forwarding stored values to later threads.
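The interplay of value speculation, buffered results, and misspeculation can be illustrated with a small emulation. It is a sketch under assumptions, not the framework's mechanism: the loop, the `scale` variable, the window size, and `run_speculative` are all hypothetical, and the speculative tasks run one after another here so that only the predict / validate / re-execute logic is shown. Each task in a window predicts that the loop-carried `scale` is unchanged; the in-order commit step checks that prediction and re-executes the task when it was wrong.

```c
#include <assert.h>
#include <stddef.h>

#define WINDOW 4   /* hypothetical number of iterations speculated at once */

struct task {
    int predicted_scale;   /* value the task assumed for the carried variable */
    int buffered_out;      /* result held back until commit */
};

/*
 * Emulates speculative execution of:
 *     for (i = 0; i < n; i++) {
 *         out[i] = in[i] * scale;
 *         if (in[i] < 0) scale++;     // rare update of the carried value
 *     }
 * Returns the final scale and counts re-executed iterations.
 */
int run_speculative(const int *in, int *out, size_t n, int scale,
                    int *misspeculations)
{
    assert(misspeculations != NULL);
    *misspeculations = 0;

    for (size_t base = 0; base < n; base += WINDOW) {
        size_t end = (base + WINDOW < n) ? base + WINDOW : n;
        struct task tasks[WINDOW];

        /* Speculative phase: tasks are independent of each other and
         * could run on separate threads; results go to private buffers. */
        for (size_t i = base; i < end; i++) {
            tasks[i - base].predicted_scale = scale;
            tasks[i - base].buffered_out = in[i] * scale;
        }

        /* Commit phase: strictly in original iteration order. */
        for (size_t i = base; i < end; i++) {
            struct task *t = &tasks[i - base];
            if (t->predicted_scale != scale) {   /* prediction was wrong */
                t->buffered_out = in[i] * scale; /* re-execute with real value */
                (*misspeculations)++;
            }
            out[i] = t->buffered_out;
            if (in[i] < 0)
                scale++;
        }
    }
    return scale;
}
```

With the input `{1, 2, 3, 4, 5, -1, 2, 3, 1, 1}` and an initial `scale` of 2, only the two iterations that follow the `-1` inside its window are re-executed; every other buffered result commits unchanged.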
Y-branch
    dict = start_dictionary();
    while ((ch = read(1)) != EOF) {
        profitable = compress(ch, dict);
        @YBRANCH(probability=.00001)
        if (!profitable)
            dict = restart_dictionary(dict);
    }
    finish_dictionary(dict);
Use of a y-branch
    #define CUTOFF 100000
    dict = start_dictionary();
    int count = 0;
    while ((ch = read(1)) != EOF) {
        profitable = compress(ch, dict);
        if (!profitable) {
            dict = restart_dictionary(dict);
        } else if (count == CUTOFF) {
            dict = restart_dictionary(dict);
            count = 0;
        }
        count++;
    }
    finish_dictionary(dict);
Commutative
    static int seed;

    @Commutative
    int Yacm_random() {
        int temp = seed / 127773L;
        seed = 16807L * (seed - temp * 127773L) - (temp * 2836L);
        if (seed < 0)
            seed += 2147483647L;
        return seed;
    }
Use of commutative
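One way to picture what the Commutative annotation permits is sketched below, assuming POSIX threads (compile with `-pthread`). The lock, `set_seed`, `random_atomic`, and the two `sum_draws_*` helpers are hypothetical names introduced for illustration; the slides do not say how the compiler enforces atomicity. Each call runs atomically, calls from different threads may interleave in any order, and the set of values handed out is the same as in a sequential run, so an order-independent summary such as the sum is unchanged.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 8

static int seed = 1;   /* this generator needs a nonzero seed */
static pthread_mutex_t seed_lock = PTHREAD_MUTEX_INITIALIZER;

void set_seed(int s) {
    pthread_mutex_lock(&seed_lock);
    seed = s;
    pthread_mutex_unlock(&seed_lock);
}

/* Same recurrence as on the slide; the lock makes each call atomic. */
int random_atomic(void) {
    pthread_mutex_lock(&seed_lock);
    int temp = seed / 127773;
    seed = 16807 * (seed - temp * 127773) - temp * 2836;
    if (seed < 0)
        seed += 2147483647;
    int result = seed;
    pthread_mutex_unlock(&seed_lock);
    return result;
}

struct draw_job { int count; unsigned long long sum; };

static void *draw_worker(void *arg) {
    struct draw_job *job = arg;
    for (int i = 0; i < job->count; i++)
        job->sum += (unsigned long long)random_atomic();
    return NULL;
}

/* Draw nthreads * per_thread values concurrently and return their sum. */
unsigned long long sum_draws_parallel(int nthreads, int per_thread) {
    assert(nthreads >= 1 && nthreads <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    struct draw_job jobs[MAX_THREADS];
    for (int t = 0; t < nthreads; t++) {
        jobs[t].count = per_thread;
        jobs[t].sum = 0;
        int rc = pthread_create(&tids[t], NULL, draw_worker, &jobs[t]);
        assert(rc == 0);           /* sketch: no error recovery */
        (void)rc;
    }
    unsigned long long total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += jobs[t].sum;
    }
    return total;
}

/* Reference: the same number of draws made by a single thread. */
unsigned long long sum_draws_sequential(int n) {
    unsigned long long total = 0;
    for (int i = 0; i < n; i++)
        total += (unsigned long long)random_atomic();
    return total;
}
```

Which thread receives which value differs from run to run; the annotation is a statement by the programmer that this difference is acceptable.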
Experimentation Approach
- Extension of DSWP.
- Phases are statically selected “regions” in code.
- “Tasks” are dynamic instances of these regions.
Case Studies
- Manual parallelization of the SPEC CINT2000 benchmarks.
- Uses the described experimentation approach.
- Demonstrate use of known compiler technologies.
- Experiments performed with 1- cores.
256.bzip2
- Compresses and decompresses a file.
- Input file is divided into independent blocks of same size.
- Use DSWP parallelization.
- Phase A thread reads in blocks.
- Phase B threads compress blocks, buffer results.
- Phase C threads write result to output stream.
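The three phases can be sketched as below, assuming POSIX threads (compile with `-pthread`). Several things are stand-ins chosen for brevity and are not taken from the benchmark: a toy run-length encoder replaces the real compressor, the block size is tiny, the phases run back to back instead of overlapping, and all names (`struct block`, `rle_block`, `compress_blocks`) are hypothetical. What the sketch does show is that blocks are compressed independently into private buffers and written out in their original order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 4     /* tiny on purpose; real blocks are far larger */
#define MAX_BLOCKS 64
#define NUM_WORKERS 3

struct block {
    const unsigned char *in;
    size_t in_len;
    unsigned char out[2 * BLOCK_SIZE];   /* private result buffer */
    size_t out_len;
};

struct job { struct block *blocks; size_t num_blocks, first, stride; };

/* Toy stand-in compressor: emits (run length, byte) pairs for one block. */
static void rle_block(struct block *b) {
    b->out_len = 0;
    for (size_t i = 0; i < b->in_len; ) {
        size_t run = 1;
        while (i + run < b->in_len && b->in[i + run] == b->in[i])
            run++;
        b->out[b->out_len++] = (unsigned char)run;
        b->out[b->out_len++] = b->in[i];
        i += run;
    }
}

/* Phase B worker: handles blocks first, first + stride, ... */
static void *compress_worker(void *arg) {
    struct job *j = arg;
    for (size_t i = j->first; i < j->num_blocks; i += j->stride)
        rle_block(&j->blocks[i]);
    return NULL;
}

/* Returns the number of bytes written; out must hold 2 * len bytes. */
size_t compress_blocks(const unsigned char *in, size_t len, unsigned char *out) {
    struct block blocks[MAX_BLOCKS];
    size_t num_blocks = (len + BLOCK_SIZE - 1) / BLOCK_SIZE;
    assert(num_blocks <= MAX_BLOCKS);

    /* Phase A: split the input into fixed-size blocks. */
    for (size_t i = 0; i < num_blocks; i++) {
        size_t remaining = len - i * BLOCK_SIZE;
        blocks[i].in = in + i * BLOCK_SIZE;
        blocks[i].in_len = remaining < BLOCK_SIZE ? remaining : BLOCK_SIZE;
        blocks[i].out_len = 0;
    }

    /* Phase B: compress blocks in parallel, each into its own buffer. */
    pthread_t tids[NUM_WORKERS];
    struct job jobs[NUM_WORKERS];
    for (size_t w = 0; w < NUM_WORKERS; w++) {
        jobs[w] = (struct job){ blocks, num_blocks, w, NUM_WORKERS };
        int rc = pthread_create(&tids[w], NULL, compress_worker, &jobs[w]);
        assert(rc == 0);                 /* sketch: no error recovery */
        (void)rc;
    }
    for (size_t w = 0; w < NUM_WORKERS; w++)
        pthread_join(tids[w], NULL);

    /* Phase C: write the buffered results in original block order. */
    size_t written = 0;
    for (size_t i = 0; i < num_blocks; i++) {
        memcpy(out + written, blocks[i].out, blocks[i].out_len);
        written += blocks[i].out_len;
    }
    return written;
}
```

Note that a run of identical bytes spanning a block boundary is encoded as two runs; that small loss in compression is the price of block independence.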
253.perlbmk
- Interpreter for the Perl language.
- Source statements are translated into sets of operations demarcated by NEXTSTATE operations.
- Executed using a virtual stack machine.
- Compiler can precompute next NEXTSTATE operation.
- Execute sets of operations representing Perl statements in parallel.
181.mcf
- Solves a combinatorial optimization problem using a network simplex algorithm.
- Main loop of this algorithm parallelized using value speculation.
254.gap
- Interpreter for a computational discrete algebra programming language.
- Speculate that statements are data independent.
- Memory allocation routines marked commutative.
- Misspeculation results:
- Due to true data dependences.
- Due to the garbage collection performed.
186.crafty
- Application that plays chess.
- Uses a recursive search function.
- Can search each of the moves in the root list of moves independently.
- Uses caches to prune the search space and improve performance.
- Cache lookup function marked as commutative.
- Unroll recursion and parallelize.
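The idea of searching root moves independently while sharing a cache can be sketched on a toy game, assuming POSIX threads (compile with `-pthread`). None of this is crafty's code: the game (take 1 to 3 stones, taking the last stone wins), the lock-protected `cache`, and the names `search` and `best_root_move` are hypothetical stand-ins. The cache accessors are made atomic with a lock, which is one possible way to honor a Commutative marking; the order in which threads fill the cache does not affect the result.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_STONES 64
#define MAX_TAKE 3

/* Shared cache: 0 = unknown, +1 = side to move wins, -1 = side to move loses. */
static signed char cache[MAX_STONES + 1];
static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

static int cache_lookup(int stones) {
    pthread_mutex_lock(&cache_lock);
    int score = cache[stones];
    pthread_mutex_unlock(&cache_lock);
    return score;
}

static void cache_store(int stones, int score) {
    pthread_mutex_lock(&cache_lock);
    cache[stones] = (signed char)score;
    pthread_mutex_unlock(&cache_lock);
}

/* Recursive negamax search: score for the side to move. */
static int search(int stones) {
    if (stones == 0)
        return -1;                 /* opponent took the last stone */
    int cached = cache_lookup(stones);
    if (cached != 0)
        return cached;             /* cache hit prunes the whole subtree */
    int best = -1;
    for (int take = 1; take <= MAX_TAKE && take <= stones; take++) {
        int score = -search(stones - take);
        if (score > best)
            best = score;
    }
    cache_store(stones, best);
    return best;
}

struct root_task { int stones_left; int score; };

static void *search_root_move(void *arg) {
    struct root_task *task = arg;
    task->score = -search(task->stones_left);
    return NULL;
}

/* Top level of the recursion unrolled: one thread per root move.
 * Returns how many stones to take (the first best move found). */
int best_root_move(int stones) {
    assert(stones >= 1 && stones <= MAX_STONES);
    struct root_task tasks[MAX_TAKE];
    pthread_t tids[MAX_TAKE];
    int num_moves = stones < MAX_TAKE ? stones : MAX_TAKE;

    for (int m = 0; m < num_moves; m++) {
        tasks[m].stones_left = stones - (m + 1);
        tasks[m].score = 0;
        int rc = pthread_create(&tids[m], NULL, search_root_move, &tasks[m]);
        assert(rc == 0);           /* sketch: no error recovery */
        (void)rc;
    }

    int best = 0;
    for (int m = 0; m < num_moves; m++) {
        pthread_join(tids[m], NULL);
        if (tasks[m].score > tasks[best].score)
            best = m;
    }
    return best + 1;
}
```

Two threads may occasionally compute the same cache entry at the same time; both store the same value, so the duplicated work costs time but not correctness.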