




These notes explain why understanding parallel programs matters in the context of parallel computer architecture. They introduce motivating problems, such as simulating ocean currents and the evolution of galaxies, and explain the steps in creating a parallel program. They also cover work decomposition and task partitioning, and discuss the challenges of managing data access, communication, and synchronization in parallel programs.





[§2.1] Why should we care about the structure of programs in an architecture class?
° Caches.
° Instruction-set design.
This is even more important in multiprocessors. Why?
In our discussion of parallel programs, we will proceed as follows.
We will study these parallel applications.
° No discretization of domain.
° Rather, the domain is represented as a large number of bodies interacting with each other (an n-body problem).
° Irregular structure, unpredictable communication, scientific computing.

° Traverses a 3D scene with unpredictable access patterns and renders it into a 2-dimensional image for display.
° Irregular structure, computer graphics.

° Irregular structure, information processing.
° I/O intensive; parallelizing I/O important.
° Not discussed here (read in book).
Simulating ocean currents
Goal: Simulate the motion of water currents in the ocean. Important to climate modeling.
Motion depends on atmospheric forces, friction with ocean floor, & “friction” with ocean walls.
Predicting the state of the ocean at any instant requires solving complex systems of equations.
The problem is continuous in both space and time, but to solve it, we discretize it over both dimensions.
Every important variable has a value at each grid point.
This model uses a set of 2D horizontal cross-sections through the ocean basin.
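As a rough illustration of discretizing over space and time, the sketch below stores one variable on a small 2D grid (one cross-section) and advances it through a few time-steps. The update rule, in which each interior point becomes the average of its four nearest neighbors, is only a stand-in assumed for illustration; the notes do not specify the equations being solved. The names `sweep` and `simulate` are hypothetical.

```python
def sweep(grid):
    """Return a new grid in which every interior point is the average of its
    four nearest neighbors. This rule is an assumed stand-in for the real
    equations, which the notes do not give."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are carried over unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def simulate(grid, time_steps):
    """Advance the grid through a number of discrete time-steps."""
    for _ in range(time_steps):
        grid = sweep(grid)
    return grid


# One 2D cross-section: a 5x5 grid whose top edge is held at 1.0.
section = [[1.0] * 5] + [[0.0] * 5 for _ in range(4)]
result = simulate(section, time_steps=10)
```

Within one sweep, each new value depends only on the previous grid, so different grid points can be updated independently of one another.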
In each time-step—
Simulating the evolution of galaxies
A brute-force approach to calculating interactions between n stars would be $O(n^2)$.
However, smarter algorithms are able to reduce that to $O(n \log n)$, making it possible to simulate systems of millions of stars.
They take advantage of the fact that the strength of gravitational attraction falls off with distance.
So the influences of stars far away don’t need to be computed with such great accuracy.
[Figure: approximating the forces on one star. Labels: star on which forces are being computed; star too close to approximate; small group far enough away to approximate by its center of mass; large group far enough away to approximate.]
We can approximate a group of far-off stars by a single star at the center of the group.
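The center-of-mass approximation can be checked numerically. The sketch below compares the exact sum of forces from a distant group with the force from a single equivalent star at the group's center of mass. The 2D setting, the unit gravitational constant, and the names `pull`, `exact_force`, and `approx_force` are assumptions made for illustration, not part of the notes.

```python
import math


def pull(star, mass, pos, G=1.0):
    """Force vector exerted on `star` = (mass, (x, y)) by a point mass at `pos`."""
    m, (x, y) = star
    dx, dy = pos[0] - x, pos[1] - y
    r = math.hypot(dx, dy)
    f = G * m * mass / (r * r)          # inverse-square law
    return (f * dx / r, f * dy / r)     # scale the unit direction vector


def exact_force(star, group):
    """Sum the pull of every star in the group individually."""
    fx = fy = 0.0
    for mass, pos in group:
        px, py = pull(star, mass, pos)
        fx += px
        fy += py
    return (fx, fy)


def approx_force(star, group):
    """Replace the whole group by one star at its center of mass."""
    total = sum(mass for mass, _ in group)
    cx = sum(mass * pos[0] for mass, pos in group) / total
    cy = sum(mass * pos[1] for mass, pos in group) / total
    return pull(star, total, (cx, cy))


star = (1.0, (0.0, 0.0))
far_group = [(1.0, (100.0, 0.0)), (1.0, (101.0, 1.0)),
             (2.0, (100.0, -1.0)), (1.0, (99.0, 0.5))]

exact = exact_force(star, far_group)
approx = approx_force(star, far_group)
relative_error = math.dist(exact, approx) / math.hypot(*exact)
```

For this group, which is about 100 units away and only about 2 units across, the relative error is far below 1%, while the approximation costs one force evaluation instead of four.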
The strength of many physical forces falls off with distance, so hierarchical methods are becoming increasingly popular.
Galaxies are denser in some regions than in others. The denser regions are more expensive to compute.
Ray tracing
Ray tracing is a common technique for rendering complex scenes into images.
There is parallelism across rays.
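Parallelism across rays can be sketched as follows: each ray is traced independently, so the rays can be handed to a pool of workers. Everything specific here is an assumption for illustration: the one-sphere scene, the image size, and the hypothetical function `trace_ray`. In CPython, a thread pool shows the structure rather than delivering a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 16, 16


def trace_ray(pixel):
    """Hypothetical per-ray work: shoot a ray from the origin through `pixel`
    on the image plane z = 1 and return 1 if it hits a sphere of radius 1
    centered at (0, 0, 3), else 0."""
    px, py = pixel
    # Map the pixel center to a direction with x and y in [-1, 1].
    dx = 2.0 * (px + 0.5) / WIDTH - 1.0
    dy = 2.0 * (py + 0.5) / HEIGHT - 1.0
    dz = 1.0
    # Solve |t*D - C|^2 = 1 for t: a*t^2 + b*t + c = 0.
    a = dx * dx + dy * dy + dz * dz
    b = -2.0 * 3.0 * dz
    c = 3.0 * 3.0 - 1.0
    return 1 if b * b - 4.0 * a * c >= 0.0 else 0


pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]

# Serial version: one ray after another.
serial_image = [trace_ray(p) for p in pixels]

# Parallel version: rays are independent, so any worker can trace any ray.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_image = list(pool.map(trace_ray, pixels))
```

Because no ray depends on the result of another, the two versions produce the same image regardless of the order in which the workers finish.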
[§2.2] Sometimes, a serial algorithm can be easily translated to a parallel algorithm. Other times, to achieve efficiency, a completely different algorithm is required.
[§2.2.1] Pieces of the job:
Example: In the ocean simulation, an equal number of rows may be assigned to each processor.
Processes need not correspond 1-to-1 with processors!
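The row assignment in the ocean example might look like the sketch below, which splits the grid rows into contiguous blocks of nearly equal size, one block per process. The function name `assign_rows` and the block layout are assumptions for illustration; the notes only say that an equal number of rows may be assigned to each processor.

```python
def assign_rows(n_rows, n_processes):
    """Split row indices 0..n_rows-1 into contiguous blocks, one per process.
    Block sizes differ by at most one row."""
    base, extra = divmod(n_rows, n_processes)
    blocks = []
    start = 0
    for pid in range(n_processes):
        size = base + (1 if pid < extra else 0)  # first `extra` processes get one more row
        blocks.append(range(start, start + size))
        start += size
    return blocks


# Ten rows shared among four processes.
blocks = assign_rows(n_rows=10, n_processes=4)
sizes = [len(b) for b in blocks]
```

Here `n_processes` is a parameter of the program, not a property of the machine, which reflects the point that processes need not correspond 1-to-1 with processors.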
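Four steps in parallelizing a program:
° Decomposition of the computation into tasks.
° Assignment of tasks to processes.
° Orchestration of data access, communication, and synchronization among processes.
° Mapping of processes to processors.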
Together, decomposition and assignment are called partitioning.
They break up the computation into tasks to be divided among processes.
Goal: Enough tasks to keep processes busy, but not too many.
The number of tasks available at a time is an upper bound on the achievable speedup.
Amdahl’s law
If some portions of the problem don’t have much concurrency, the speedup on those portions will be low, lowering the average speedup of the whole program.
Suppose that a program is composed of a serial phase and a parallel phase.
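Let $s$ be the fraction of the single-processor execution time spent in the serial phase, with the total execution time normalized to 1.
Then regardless of how many processors $p$ are used, the execution time of the program will be at least $s$,
and the speedup will be no more than $1/s$. This is known as Amdahl's law.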
For example, if 25% of the program's execution time is serial, then regardless of how many processors are used, we can achieve a speedup of no more than $1/0.25 = 4$.
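The bound can be explored numerically. The sketch below assumes the usual form of Amdahl's law, in which the serial fraction of the work takes the same time on any number of processors and the remaining fraction is sped up perfectly by the processor count. The function name `amdahl_speedup` is hypothetical.

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup over one processor, assuming the parallel phase scales
    perfectly and the serial phase does not scale at all."""
    parallel_fraction = 1.0 - serial_fraction
    # Execution time relative to a single-processor time of 1.
    time = serial_fraction + parallel_fraction / processors
    return 1.0 / time


# With 25% serial time, the speedup creeps toward 4 but never reaches it.
for p in (1, 4, 16, 1024):
    print(p, round(amdahl_speedup(0.25, p), 3))
```

Even with a very large number of processors, the serial 25% alone takes a quarter of the original time, which is why the speedup stays below 4.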