Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Fundamental Concepts - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Aliah University Parallel Computing and Programming

Some concept of Parallel Processing are Anatomy, Cache Access Time, Instruction Formats, Instruction Formats, Instruction Formats, Multidimensional Meshes, Network Processors, Snooping Protocol. Main points of this lecture are: Fundamental Concepts, Taxonomy, Tools For Evaluation, Comparison, Theory, Delineate, Hard Problems, Introduction to Parallelism, Taste of Parallel Algorithms, Parallel Algorithm Complexity

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank 🇮🇳

4.3

(12)

152 documents

1 / 97

This page cannot be seen from the preview

Don't miss anything!

Part I

Fundamental Concepts

Docsity.com

Discover Slides of Parallel Computing and Programming Aliah University

Partial preview of the text

Download Fundamental Concepts - Parallel Processing - Lecture Slides and more Slides Parallel Computing and Programming in PDF only on Docsity!

Part I

Fundamental Concepts

I Fundamental Concepts

Provide motivation, paint the big picture, introduce the 3 Ts:

Taxonomy (basic terminology and models)
Tools for evaluation or comparison
Theory to delineate easy and hard problems

Topics in This Part

Chapter 1 Introduction to Parallelism

Chapter 2 A Taste of Parallel Algorithms

Chapter 3 Parallel Algorithm Complexity

Chapter 4 Models of Parallel Processing

1.1 Why Parallel Processing?

Fig. 1.1 The exponential growth of microprocessor performance, known as Moore’s Law, shown over the past two decades (extrapolated).

1980 1990 2000 2010

KIPS

MIPS

GIPS

TIPS

Processor performance

Calendar year

80286

68000

80386

80486 68040

Pentium

Pentium II R

×1.6 / yr

2020

Projection, circa 1998 Projection, circa 2012

The number of cores has been increasing from a few in 2005 to the current 10s, and is projected to reach 100s by 2020

From: “Robots After All,” by H. Moravec, CACM , pp. 90-97, October 2003.

Mental power in four scales

Evolution of Computer Performance/Cost

NRC Report (2011): The Future of Computing Performance: Game Over or Next Level?

Trends in Processor Chip Density, Performance,

Clock Speed, Power, and Number of Cores

Density

Perf’ce

Power

Cores

Clock

Source: [DANO12] “CPU DB: Recording Microprocessor History,” CACM , April 2012.

Feature Size (μm)

Overall Performance Improvement (SPECINT, relative to 386)

Gate Speed Improvement (FO4, relative to 386)

~1985 (^) --------- 1995-2000 --------- ~ Much of arch. improvements already achieved

Shares of Technology and Architecture

in Processor Performance Improvement

The Speed-of-Light Argument

The speed of light is about 30 cm/ns.

Signals travel at 40-70% speed of light (say, 15 cm/ns).

If signals must travel 1.5 cm during the execution of an

instruction, that instruction will take at least 0.1 ns;

thus, performance will be limited to 10 GIPS.

This limitation is eased by continued miniaturization,

architectural methods such as cache memory, etc.;

however, a fundamental limit does exist.

How does parallel processing help? Wouldn’t multiple

processors need to communicate via signals as well?

Why Do We Need TIPS or TFLOPS Performance?

Reasonable running time = Fraction of hour to several hours (10 3 -10 4 s) In this time, a TIPS/TFLOPS machine can perform 10 15 -10 16 operations

Example 2: Fluid dynamics calculations (1000 × 1000 × 1000 lattice) 10 9 lattice points × 1000 FLOP/point × 10 000 time steps = 10 16 FLOP

Example 3: Monte Carlo simulation of nuclear reactor 10 11 particles to track (for 1000 escapes) × 10 4 FLOP/particle = 10 15 FLOP

Decentralized supercomputing ( from Mathworld News , 2006/4/7 ): Grid of tens of thousands networked computers discovers 2 30 402 457 – 1, the 43rd Mersenne prime, as the largest known prime (9 152 052 digits )

Example 1: Southern oceans heat Modeling (10-minute iterations) 300 GFLOP per iteration × 300 000 iterations per 6 yrs = 10 16 FLOP

The ASCI Program

1995 2000 2005 2010

Performance (TFLOPS)

Calendar year

ASCI Red

ASCI Blue

ASCI White

1+ TFLOPS, 0.5 TB

3+ TFLOPS, 1.5 TB

10+ TFLOPS, 5 TB

30+ TFLOPS, 10 TB

100+ TFLOPS, 20 TB

100

1000 Plan^ Develop^ Use

ASCI

ASCI Purple

ASCI Q

Fig. 24.1 Milestones in the Accelerated Strategic (Advanced Simulation &) Computing Initiative (ASCI) program, sponsored by the US Department of Energy, with extrapolation up to the PFLOPS level.

The Quest for Higher Performance

1. IBM Blue Gene/L 2. SGI Columbia 3. NEC Earth Sim

LLNL, California NASA Ames, California Earth Sim Ctr, Yokohama

Material science, nuclear stockpile sim

Aerospace/space sim, climate research

Atmospheric, oceanic, and earth sciences

32,768 proc’s, 8 TB, 28 TB disk storage

10,240 proc’s, 20 TB, 440 TB disk storage

5,120 proc’s, 10 TB, 700 TB disk storage

Linux + custom OS Linux Unix

71 TFLOPS , $100 M 52 TFLOPS , $50 M 36 TFLOPS* , $400 M?

Dual-proc Power-PC chips (10-15 W power)

20x Altix (512 Itanium2) linked by Infiniband

Built of custom vector microprocessors

Full system: 130k-proc, 360 TFLOPS (est)

Volume = 50x IBM, Power = 14x IBM

Led the top500 list for 2.5 yrs

Top Three Supercomputers in 2005 ( IEEE Spectrum , Feb. 2005, pp. 15-16)

The Quest for Higher Performance: 2012 Update

1. Cray Titan 2. IBM Sequoia 3. Fujitsu K Computer

ORNL, Tennessee LLNL, California RIKEN AICS, Japan

XK7 architecture Blue Gene/Q arch RIKEN architecture

560,640 cores, 710 TB, Cray Linux

1,572,864 cores, 1573 TB, Linux

705,024 cores, 1410 TB, Linux Cray Gemini interconn’t Custom interconnect Tofu interconnect

17.6/27.1 PFLOPS 16.3/20.1 PFLOPS 10.5/11.3 PFLOPS** *

AMD Opteron, 16-core, 2.2 GHz, NVIDIA K20x

Power BQC, 16-core, 1.6 GHz

SPARC64 VIIIfx, 2.0 GHz 8.2 MW power 7.9 MW power 12.7 MW power

Top Three Supercomputers in November 2012 (http://www.top500.org)

max/peak performance In the top 10, IBM also occupies ranks 4-7 and 9-10. Dell and NUDT (China) hold ranks 7-8.

Top 500 Supercomputers in the World

1.2 A Motivating

Example

Fig. 1.3 The sieve of Eratosthenes yielding a list of 10 primes for n = 30. Marked elements have been distinguished by erasure from the list.

Init. Pass 1 Pass 2 Pass 3 2 ← m 2 2 2 3 3 ← m 3 3 4 5 5 5 ← m 5 6 7 7 7 7 ← m 8 9 9 10 11 11 11 11 12 13 13 13 13 14 15 15 16 17 17 17 17 18 19 19 19 19 20 21 21 22 23 23 23 23 24 25 25 25 26 27 27 28 29 29 29 29 30

Any composite number has a prime factor that is no greater than its square root.

Single-Processor Implementation of the Sieve

Fig. 1.4 Schematic representation of single-processor solution for the sieve of Eratosthenes.

1 2 n

Current Prime (^) Index P

Bit-vector

Fundamental Concepts - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Related documents

Partial preview of the text

Download Fundamental Concepts - Parallel Processing - Lecture Slides and more Slides Parallel Computing and Programming in PDF only on Docsity!

Part I

Fundamental Concepts

I Fundamental Concepts

Provide motivation, paint the big picture, introduce the 3 Ts:

Topics in This Part

Chapter 1 Introduction to Parallelism

Chapter 2 A Taste of Parallel Algorithms

Chapter 3 Parallel Algorithm Complexity

Chapter 4 Models of Parallel Processing

Evolution of Computer Performance/Cost

Trends in Processor Chip Density, Performance,

Clock Speed, Power, and Number of Cores

Shares of Technology and Architecture

in Processor Performance Improvement

The Speed-of-Light Argument

The speed of light is about 30 cm/ns.

Signals travel at 40-70% speed of light (say, 15 cm/ns).

If signals must travel 1.5 cm during the execution of an

instruction, that instruction will take at least 0.1 ns;

thus, performance will be limited to 10 GIPS.

This limitation is eased by continued miniaturization,

architectural methods such as cache memory, etc.;

however, a fundamental limit does exist.

How does parallel processing help? Wouldn’t multiple

processors need to communicate via signals as well?

Why Do We Need TIPS or TFLOPS Performance?

The ASCI Program

The Quest for Higher Performance

1. IBM Blue Gene/L 2. SGI Columbia 3. NEC Earth Sim

The Quest for Higher Performance: 2012 Update

1. Cray Titan 2. IBM Sequoia 3. Fujitsu K Computer

Top 500 Supercomputers in the World

Single-Processor Implementation of the Sieve