Architecture of Parallel Computers: Parallel Programs and Applications - Prof. Gehringer, Study notes of Electrical and Electronics Engineering

North Carolina State University (NCSU)Electrical and Electronics Engineering

Prof. Gehringer

The importance of understanding parallel programs in the context of parallel computer architecture. It introduces motivating problems, such as simulating ocean currents and the evolution of galaxies, and explains the steps in creating a parallel program. The document also covers the concept of work decomposition and task partitioning, and discusses the challenges of managing data access, communication, and synchronization in parallel programs.

Typology: Study notes

Pre 2010

Uploaded on 03/10/2009

koofers-user-5mw-1 🇺🇸


Lecture 5: Architecture of Parallel Computers

Parallel Programs

[§2.1] Why should we care about the structure of programs in an architecture class?

• Knowing about program structure helps us make design decisions.

• This knowledge led to key advances in uniprocessor architecture:

° Caches.

° Instruction-set design.

This is even more important in multiprocessors. Why?

In our discussion of parallel programs, we will proceed as follows.

• Introduce “motivating problems” (application case studies).

• Describe the steps in creating a parallel program.

• Show what a simple parallel program looks like in the three programming models, and consider what primitives a system must support.

We will study these parallel applications.

• Simulating ocean currents.

° Discretize the problem on a set of regular grids, and solve an equation on those grids.

° Common technique, common communication patterns.

° Regular structure, scientific computing.

• Simulating the evolution of galaxies.

° No discretization of domain.

° Rather, the domain is represented as a large number of bodies interacting with each other (an n-body problem).


° Irregular structure, unpredictable communication, scientific computing.

• Rendering scenes by ray tracing.

° Traverses a 3D scene with unpredictable access patterns and renders it into a 2-dimensional image for display.

° Irregular structure, computer graphics.

• Data mining.

° Irregular structure, information processing.

° I/O intensive; parallelizing I/O important.

° Not discussed here (read in book).

Simulating ocean currents

Goal: Simulate the motion of water currents in the ocean. Important to climate modeling.

Motion depends on atmospheric forces, friction with ocean floor, & “friction” with ocean walls.

Predicting the state of the ocean at any instant requires solving complex systems of equations.

The problem is continuous in both space and time, but to solve it, we discretize it over both dimensions.

Every important variable, e.g.,

• pressure
• velocity
• currents

has a value at each grid point.

This model uses a set of 2D horizontal cross-sections through the ocean basin.
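To make "a value at each grid point" concrete, here is a minimal sketch of one sweep over a single 2D cross-section. The notes do not say which equation is solved, so the update rule (each interior point becomes the average of its four nearest neighbors) and the names `relax_step` and `grid` are assumptions chosen only for illustration.

```python
def relax_step(grid):
    """Return a new grid after one nearest-neighbor averaging sweep.

    Assumed update rule, standing in for the real ocean equations:
    each interior point becomes the mean of its four neighbors.
    Boundary points are copied unchanged.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy, so every update reads the old values
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


# A 4x4 cross-section: the top boundary is held at 1.0, everything else at 0.0.
grid = [[0.0] * 4 for _ in range(4)]
grid[0] = [1.0] * 4
after = relax_step(grid)
```

Each point reads only its immediate neighbors, which is why this kind of computation has the regular structure and predictable communication pattern noted earlier.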

Simulating the evolution of galaxies

In each time-step:

• Compute gravitational forces exerted on each star by all the others.
• Update the position, velocity, and other attributes of the star.

A brute-force approach to calculating interactions between stars would be $O(n^2)$.

However, smarter algorithms are able to reduce that to $O(n \log n)$, making it possible to simulate systems of millions of stars.

They take advantage of the fact that the strength of gravitational attraction falls off with distance.

So the influences of stars far away don’t need to be computed with such great accuracy.

[Figure: the star on which forces are being computed; a star too close to approximate; a small group far enough away to approximate by its center of mass; a large group far enough away to approximate.]

We can approximate a group of far-off stars by a single star at the center of the group.
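The sketch below compares the brute-force force on one star with the force obtained by replacing a far-off group with a single body at its center of mass. It is an illustration only: it works in 2D, uses arbitrary units with `G = 1.0`, and the names `force_on`, `center_of_mass`, and the sample coordinates are assumptions, not part of the notes. It does not build the hierarchy that a full hierarchical method would use.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption)


def force_on(i, bodies):
    """Net 2D gravitational force on bodies[i] from every other body.

    Each body is a tuple (x, y, mass). One call is O(n), so computing
    the force on every body this way is O(n^2) overall.
    """
    xi, yi, mi = bodies[i]
    fx = fy = 0.0
    for j, (xj, yj, mj) in enumerate(bodies):
        if j == i:
            continue
        dx, dy = xj - xi, yj - yi
        r = math.hypot(dx, dy)
        f = G * mi * mj / (r * r)  # magnitude of the pairwise force
        fx += f * dx / r           # project onto the x and y axes
        fy += f * dy / r
    return fx, fy


def center_of_mass(group):
    """Collapse a group of bodies into one body at its center of mass."""
    m = sum(b[2] for b in group)
    x = sum(b[0] * b[2] for b in group) / m
    y = sum(b[1] * b[2] for b in group) / m
    return (x, y, m)


star = (0.0, 0.0, 1.0)
far_group = [(100.0, 1.0, 2.0), (100.0, -1.0, 2.0), (101.0, 0.0, 1.0)]

exact = force_on(0, [star] + far_group)                   # three interactions
approx = force_on(0, [star, center_of_mass(far_group)])   # one interaction
rel_error = abs(exact[0] - approx[0]) / exact[0]
```

For this far-off group the relative error in the x component is well under 0.1%, while the number of interactions drops from three to one.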

The strength of many physical forces falls off with distance, so hierarchical methods are becoming increasingly popular.

Galaxies are denser in some regions than in others. These regions are more expensive to compute, because stars in denser regions interact with more other stars.

Ample concurrency exists across stars within a time-step, but it is irregular and constantly changing, which makes it hard to exploit.

Ray tracing

Ray tracing is a common technique for rendering complex scenes into images.

• Scene is represented as a set of objects in 3D space.
• Image is represented as a 2D array of pixels.
• Scene is rendered as seen from a specific viewpoint.
• Rays are shot from that viewpoint through every pixel into the scene.
• Follow their paths ... They bounce around as they strike objects. They generate new rays: ray tree per input ray.
• Result is color and opacity for that pixel.

There is parallelism across rays.
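A minimal sketch of that independence: each pixel's ray is traced by a separate call with no shared state, so the calls can be handed to a pool of workers. This is a simplification under stated assumptions: it traces primary rays only (no bounces, no ray tree), the scene is a single hypothetical sphere, and the names `trace`, `SPHERE_CENTER`, and the 8x8 image size are invented for illustration. Threads are used only to show the structure; in CPython they will not speed up this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 8              # image size in pixels (assumption)
SPHERE_CENTER = (0.0, 0.0, 5.0)   # one sphere in front of the viewpoint
SPHERE_RADIUS = 1.0


def trace(pixel):
    """Shoot one ray from the origin through a pixel.

    Returns 1 if the ray hits the sphere, else 0.
    """
    px, py = pixel
    # Map the pixel center onto the image plane z = 1, spanning [-1, 1].
    dx = (px + 0.5) / WIDTH * 2 - 1
    dy = (py + 0.5) / HEIGHT * 2 - 1
    dz = 1.0
    # Points on the ray are t * (dx, dy, dz). Substituting into the sphere
    # equation gives a quadratic a*t^2 + b*t + c0 = 0 in t.
    cx, cy, cz = SPHERE_CENTER
    a = dx * dx + dy * dy + dz * dz
    b = -2.0 * (dx * cx + dy * cy + dz * cz)
    c0 = cx * cx + cy * cy + cz * cz - SPHERE_RADIUS ** 2
    # A non-negative discriminant means the ray intersects the sphere.
    return 1 if b * b - 4.0 * a * c0 >= 0 else 0


pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]

# Every ray is independent, so the pool may trace them in any order.
with ThreadPoolExecutor() as pool:
    flat = list(pool.map(trace, pixels))

image = [flat[y * WIDTH:(y + 1) * WIDTH] for y in range(HEIGHT)]
```

In a real ray tracer the cost per ray varies with what the ray strikes, which is the source of the irregular structure noted earlier.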

The parallelization process

[§2.2] Sometimes, a serial algorithm can be easily translated to a parallel algorithm. Other times, to achieve efficiency, a completely different algorithm is required.

[§2.2.1] Pieces of the job:

• Identify work that can be done in parallel.

• Partition work and perhaps data among processes.

• Manage data access, communication and synchronization.

• Note: Work includes computation, data access and I/O.

Tasks are assigned to processes.

Example: In the ocean simulation, an equal number of rows may be assigned to each process.

Processes need not correspond 1-to-1 with processors!
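The row assignment in the ocean example can be sketched as a block partition. The function name `assign_rows` and the choice of contiguous blocks are assumptions for illustration; when the rows do not divide evenly, the first few processes receive one extra row.

```python
def assign_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into n_procs contiguous blocks.

    Block sizes differ by at most one row.
    """
    base, extra = divmod(n_rows, n_procs)
    blocks = []
    start = 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # first `extra` blocks get one more
        blocks.append(range(start, start + size))
        start += size
    return blocks


blocks = assign_rows(10, 4)
# Process 0 gets rows 0-2, process 1 rows 3-5, process 2 rows 6-7, process 3 rows 8-9.
```

Nothing here fixes the number of processors: `n_procs` is the number of processes, which may be more or fewer than the processors they eventually run on.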

Four steps in parallelizing a program:

1. Decomposition of the computation into tasks.
2. Assignment of tasks to processes.
3. Orchestration of the necessary data access, communication, and synchronization among processes.
4. Mapping of processes to processors.

Together, decomposition and assignment are called partitioning.

They break up the computation into tasks to be divided among processes.

Tasks may become available dynamically.
The number of available tasks may vary with time.

Goal: Enough tasks to keep processes busy, but not too many.

The number of tasks available at a time is an upper bound on the achievable speedup.

Amdahl’s law

If some portions of the problem don’t have much concurrency, the speedup on those portions will be low, lowering the average speedup of the whole program.

Suppose that a program is composed of a serial phase and a parallel phase.

The whole program runs for 1 time unit.
The serial phase runs for time s , and the parallel phase for time 1− s.

Then with $p$ processors, the execution time of the program will be at least

$$s + \frac{1-s}{p},$$

which is greater than $s$ no matter how large $p$ is, and the speedup will be no more than

$$\frac{1}{s + \frac{1-s}{p}} < \frac{1}{s}.$$

This is known as Amdahl’s law.

For example, if 25% of the program’s execution time is serial, then regardless of how many processors are used, we can achieve a speedup of no more than $1/0.25 = 4$.
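The bound is easy to check numerically. The sketch below assumes the parallel phase speeds up perfectly with the number of processors, which is the best case; the function name `amdahl_speedup` is chosen here for illustration.

```python
def amdahl_speedup(s, p):
    """Upper bound on speedup with serial fraction s on p processors.

    Assumes the parallel phase, of length 1 - s, divides perfectly
    across the p processors.
    """
    return 1.0 / (s + (1.0 - s) / p)


# With a 25% serial phase the bound approaches, but never reaches, 1/s = 4.
for p in (1, 4, 16, 1024):
    print(p, round(amdahl_speedup(0.25, p), 3))
```

Even at 1024 processors the bound stays just under 4, so adding processors beyond a few dozen buys very little for this program.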
