Scope of Parallelism-Parallel Processing-Lecture Slides, Slides of Parallel Computing and Programming

Ankit Institute of Technology and Science Parallel Computing and Programming

Prof. Bhairav Gupta delivered this lecture at Ankit Institute of Technology and Science for the Parallel Processing course. It includes: Parallelism, Processor, Memory, System, Performance, Latency, Prefetch, Bandwidth, Datapath, Bottlenecks

Typology: Slides

2011/2012

Uploaded on 07/23/2012

paramita 🇮🇳


1.2

Scope of Parallelism

• Conventional architectures coarsely comprise a processor, a memory system, and the datapath.

• Each of these components presents significant performance bottlenecks.

• Parallelism addresses each of these components in significant ways.

• It is important to understand each of these performance bottlenecks.

docsity.com


Implicit Parallelism: Trends in Microprocessor Architectures

• Microprocessor clock speeds have posted impressive gains over the past two decades (two to three orders of magnitude).

• Higher levels of device integration have made available a large number of transistors.

• The question of how best to utilize these resources is an important one.

• Current processors use these resources in multiple functional units and execute multiple instructions in the same cycle.

• The precise manner in which these instructions are selected and executed provides impressive diversity in architectures.


SuperPipeline

• Pipelining, however, has several limitations.

• The speed of a pipeline is ultimately limited by the number of stages and the time of the slowest stage.

• For this reason, conventional processors rely on very deep pipelines, or super-pipelines (20-stage pipelines in state-of-the-art Pentium processors).

• However, a typical pipeline suffers from resource constraints, data dependencies, and branch prediction issues. Approximately every fifth or sixth instruction is a conditional jump, which requires very accurate branch prediction.

• The penalty of a prediction error grows with the depth of the pipeline, since a larger number of instructions will have to be flushed.
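
The last two points can be made concrete with a rough cost model. This is a minimal sketch, not a measurement of any real processor: it assumes a base CPI of 1, one conditional branch in every five instructions (taken from the 5-6 figure above), a flush penalty roughly equal to the pipeline depth, and an illustrative 95% prediction accuracy. The function name `effective_cpi` is hypothetical.

```python
def effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when each mispredicted branch costs a flush."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty_cycles


if __name__ == "__main__":
    branch_fraction = 1 / 5   # about every 5th instruction is a conditional jump
    mispredict_rate = 0.05    # assumed: predictor is right 95% of the time
    for depth in (5, 10, 20):
        # Assumption: a misprediction throws away about one pipeline's worth of work.
        cpi = effective_cpi(branch_fraction, mispredict_rate, flush_penalty_cycles=depth)
        print(f"pipeline depth {depth:2d}: effective CPI = {cpi:.2f}")
```

Under these assumptions the same predictor costs four times as many lost cycles per instruction at depth 20 as at depth 5, which is why deeper pipelines demand more accurate prediction.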


Superscalar

• One simple way of alleviating these bottlenecks is to use multiple pipelines and issue multiple independent instructions simultaneously.

• Examples: MIPS1000, PowerPC & Pentium

• The question then becomes one of selecting these instructions.


Superscalar Execution

Example (cont'd)

• In the above example, there is some wastage of resources due to data dependencies.

• The example also illustrates that different instruction mixes with identical semantics can take significantly different execution times.
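
The slide's own example figure is not part of this text, so the following is a stand-in rather than a reconstruction of it. It is a minimal sketch of in-order dual issue under simplifying assumptions: every instruction takes one cycle, only true data dependencies are modeled, and both programs (`INTERLEAVED` and `CHAINED`, hypothetical names) sum four values into a result.

```python
def issue_cycles(program, width=2):
    """Cycles needed to issue `program` in order, at most `width` instructions per cycle.

    Each instruction is (dest_register_or_None, source_registers).
    Assumptions: unit latency, true data dependencies only.
    """
    if not program:
        return 0
    ready = {}          # register -> first cycle in which its value can be read
    cycle, used = 1, 0  # current issue cycle and slots already used in it
    for dest, sources in program:
        earliest = max((ready.get(s, 1) for s in sources), default=1)
        if earliest > cycle:      # wait for an operand: stall to a later cycle
            cycle, used = earliest, 0
        elif used == width:       # all issue slots taken: move to the next cycle
            cycle, used = cycle + 1, 0
        used += 1
        if dest is not None:
            ready[dest] = cycle + 1
    return cycle


# Two partial sums built side by side, then combined.
INTERLEAVED = [
    ("R1", ()), ("R2", ()),              # load R1 ; load R2
    ("R1", ("R1",)), ("R2", ("R2",)),    # add R1, mem ; add R2, mem
    ("R1", ("R1", "R2")),                # add R1, R2
    (None, ("R1",)),                     # store R1
]

# The same sum accumulated through a single register.
CHAINED = [
    ("R1", ()),                          # load R1
    ("R1", ("R1",)), ("R1", ("R1",)), ("R1", ("R1",)),  # three dependent adds
    (None, ("R1",)),                     # store R1
]

if __name__ == "__main__":
    print("interleaved:", issue_cycles(INTERLEAVED), "cycles")
    print("chained:    ", issue_cycles(CHAINED), "cycles")
```

In this model the interleaved version issues six instructions in 4 cycles, while the chained version needs 5 cycles for five instructions because each add waits on the previous one, and a wider issue width does not help it.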


Superscalar Execution

• Scheduling of instructions is determined by a number of factors:

– True Data Dependency: The result of one operation is an input to the next.

– Resource Dependency: Two operations require the same resource.

– Branch Dependency: Scheduling instructions across conditional branch statements cannot be done deterministically a priori.

• The scheduler, a piece of hardware, looks at a number of instructions in an instruction queue and selects an appropriate number of instructions to execute concurrently based on these factors.

• The complexity of this hardware is an important constraint on superscalar processors.
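
The three factors above can be expressed as a check between two adjacent instructions. This is an illustrative sketch only: the `Instr` record, the `blocking_reasons` function, and the unit names are hypothetical, and a real scheduler inspects a whole queue in hardware rather than one pair at a time.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[str]        # register written, if any
    sources: Tuple[str, ...]   # registers read
    unit: str                  # functional unit required, e.g. "alu" or "mem"
    is_branch: bool = False


def blocking_reasons(earlier: Instr, later: Instr, unit_counts: Dict[str, int]) -> List[str]:
    """Reasons why `later` cannot issue in the same cycle as `earlier`."""
    reasons = []
    if earlier.dest is not None and earlier.dest in later.sources:
        reasons.append("true data dependency")
    if earlier.unit == later.unit and unit_counts.get(later.unit, 0) < 2:
        reasons.append("resource dependency")
    if earlier.is_branch:
        # Whether `later` runs at all is unknown until the branch resolves.
        reasons.append("branch dependency")
    return reasons


if __name__ == "__main__":
    units = {"alu": 1, "mem": 1}
    load = Instr("load R1", "R1", (), "mem")
    add = Instr("add R2, R1", "R2", ("R1",), "alu")
    load2 = Instr("load R3", "R3", (), "mem")
    print(blocking_reasons(load, add, units))    # add reads the loaded register
    print(blocking_reasons(load, load2, units))  # both need the single memory unit
```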


Superscalar Execution: Efficiency Considerations

• Not all functional units can be kept busy at all times.

• If, during a cycle, no functional units are utilized, this is referred to as vertical waste.

• If, during a cycle, only some of the functional units are utilized, this is referred to as horizontal waste.

• Due to limited parallelism in typical instruction traces, dependencies, or the inability of the scheduler to extract parallelism, the performance of superscalar processors is eventually limited.

• Conventional microprocessors typically support four-way superscalar execution.
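
The two kinds of waste defined above can be counted directly from a per-cycle usage trace. The sketch below assumes a four-way machine and a made-up trace; `issue_waste` is a hypothetical helper name, and waste is counted in unused issue slots.

```python
def issue_waste(units_used_per_cycle, width):
    """Return (vertical_waste, horizontal_waste, utilization), waste in issue slots.

    vertical waste:   slots lost in cycles where no functional unit is used
    horizontal waste: slots lost in cycles where only some units are used
    """
    vertical = sum(width for used in units_used_per_cycle if used == 0)
    horizontal = sum(width - used for used in units_used_per_cycle if 0 < used < width)
    total_slots = width * len(units_used_per_cycle)
    utilization = sum(units_used_per_cycle) / total_slots if total_slots else 0.0
    return vertical, horizontal, utilization


if __name__ == "__main__":
    trace = [2, 0, 4, 1, 0, 3]   # hypothetical: units busy in each of six cycles
    vertical, horizontal, utilization = issue_waste(trace, width=4)
    print(f"vertical waste:   {vertical} slots")
    print(f"horizontal waste: {horizontal} slots")
    print(f"utilization:      {utilization:.0%}")
```

For this trace, 8 slots are lost to vertical waste and 6 to horizontal waste, leaving 10 of 24 slots doing useful work.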


Very Long Instruction Word (VLIW) Processors

• The hardware cost and complexity of the superscalar scheduler is a major consideration in processor design.

• To address this issue, VLIW processors rely on compile-time analysis to identify and bundle together instructions that can be executed concurrently.

• These instructions are packed and dispatched together, hence the name very long instruction word.

• This concept was used with some commercial success in the Multiflow Trace machine (circa).

• Variants of this concept are employed in the Intel IA64 processors and TI TMS320 C6XXX DSPs.
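
The compile-time bundling described above can be sketched as a greedy packer. This is a simplified illustration, not how any particular VLIW compiler works: it assumes each register is written only once, tracks true data dependencies only, treats all slots as interchangeable, and pads unused slots with `nop`. The names `pack_bundles` and `OPS` are hypothetical.

```python
def pack_bundles(ops, width):
    """Greedily pack operations into fixed-width bundles (long instruction words).

    Each op is (name, dest_register_or_None, source_registers).
    An op goes into the first bundle that follows all of its producers
    and still has a free slot.
    """
    bundles = []       # list of bundles, each a list of op names
    produced_in = {}   # register -> index of the bundle that writes it
    for name, dest, sources in ops:
        index = max((produced_in[s] + 1 for s in sources if s in produced_in), default=0)
        while index < len(bundles) and len(bundles[index]) >= width:
            index += 1
        while len(bundles) <= index:
            bundles.append([])
        bundles[index].append(name)
        if dest is not None:
            produced_in[dest] = index
    # Pad every bundle to the full word width.
    return [bundle + ["nop"] * (width - len(bundle)) for bundle in bundles]


# Sum of four values, written with single-assignment registers.
OPS = [
    ("load a", "a", ()), ("load b", "b", ()),
    ("load c", "c", ()), ("load d", "d", ()),
    ("add t1=a+b", "t1", ("a", "b")),
    ("add t2=c+d", "t2", ("c", "d")),
    ("add s=t1+t2", "s", ("t1", "t2")),
    ("store s", None, ("s",)),
]

if __name__ == "__main__":
    for i, bundle in enumerate(pack_bundles(OPS, width=4)):
        print(i, bundle)
```

With a width of 4 the eight operations fit in four bundles, but nine of the sixteen slots hold `nop`, since the dependency chain rather than the word width sets the schedule length.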


Very Long Instruction Word (VLIW) Processors: Considerations

• Issue hardware is simpler.

• Compiler has a bigger context from which to select co-scheduled instructions.

• Compilers, however, do not have runtime information such as cache misses. Scheduling is, therefore, inherently conservative.

• Branch and memory prediction is more difficult.

• VLIW performance is highly dependent on the compiler. A number of techniques such as loop unrolling, speculative execution, and branch prediction are critical.

• Typical VLIW processors are limited to 4-way to 8-way parallelism.
