In-Order & Out-of-Order Execution: Pentium II, Pentium 4, UltraSPARC III, Slides of Information and Computer Technology

English and Foreign Languages University Information and Computer Technology

An overview of in-order and out-of-order execution in various microarchitectures, including Pentium II, Pentium 4, UltraSPARC III, and 8051 CPUs. It discusses the concepts of in-order execution, out-of-order execution, and speculative execution, as well as the problems and techniques related to each. The document also covers the architecture of each CPU, such as the fetch/decode unit, dispatch/execute unit, retire unit, and memory subsystem.

Typology: Slides

2012/2013

Uploaded on 04/29/2013

architay 🇮🇳



In-Order Execution

In-order execution does not always give the best performance on superscalar machines.

The following example uses in-order execution and in-order completion.
Multiplication takes one more cycle to complete than addition/subtraction.
A scoreboard keeps track of register usage.
- User-visible registers are R0 to R8.
- Multiple instructions can read a register, but only one can write a register.
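
The scoreboard rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Scoreboard` and its methods are hypothetical, registers are identified by their index (0 to 8), and timing is ignored entirely.

```python
class Scoreboard:
    """Per register: how many issued instructions read it, and whether one writes it."""

    def __init__(self, num_regs=9):            # R0..R8, as on the slide
        self.readers = [0] * num_regs           # many readers are allowed
        self.writing = [False] * num_regs       # at most one writer is allowed

    def can_issue(self, srcs, dst):
        if any(self.writing[r] for r in srcs):
            return False                        # an operand is still being written
        if self.readers[dst] > 0:
            return False                        # the destination is still being read
        if self.writing[dst]:
            return False                        # the destination already has a writer
        return True

    def issue(self, srcs, dst):
        for r in srcs:
            self.readers[r] += 1
        self.writing[dst] = True

    def retire(self, srcs, dst):
        for r in srcs:
            self.readers[r] -= 1
        self.writing[dst] = False
```

For instance, after issuing an instruction that reads R0 and R1 and writes R3, a second instruction that also reads R0 and R1 may issue, but one that reads R3 or writes R0 must wait until the first retires.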

In-Order Execution

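We can notice three kinds of dependencies which can cause problems (instruction stalls):
- RAW (Read After Write) dependence
- WAR (Write After Read) dependence
- WAW (Write After Write) dependence

In a RAW dependence, an instruction needs to read a register that a previous instruction has not yet finished writing. In a WAR dependence, one instruction is trying to overwrite a register that a previous instruction may not yet have finished reading. A WAW dependence is similar: an instruction is trying to overwrite a register that a previous instruction has not yet finished writing.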
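
To make the three cases concrete, here is a small sketch that classifies the dependencies between two instructions. The function name `hazards` and the `(dest, sources)` tuple encoding are assumptions chosen for illustration, not something taken from the slides.

```python
def hazards(earlier, later):
    """Return the dependencies of `later` on `earlier`.

    Each instruction is a (dest, sources) pair, e.g. ("R3", ("R0", "R1")).
    """
    e_dst, e_srcs = earlier
    l_dst, l_srcs = later
    found = set()
    if e_dst in l_srcs:
        found.add("RAW")   # later reads what earlier writes
    if l_dst in e_srcs:
        found.add("WAR")   # later overwrites what earlier reads
    if l_dst == e_dst:
        found.add("WAW")   # both write the same register
    return found
```

With `i1 = ("R3", ("R0", "R1"))`, the call `hazards(i1, ("R4", ("R0", "R3")))` reports a RAW dependence, while `hazards(i1, ("R0", ("R5", "R6")))` reports a WAR dependence.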

In-Order Execution

In-order completion is important as well, in order to have the property of precise interrupts.
Out-of-order completion leads to imprecise interrupts: at the time of an interrupt we do not know which instructions have completed, which makes it hard to resume the program correctly.
In order to avoid stalls, let us now permit out-of-order execution and out-of-order retirement.

Out-of-Order Execution

The previous example also introduces a new technique called register renaming.

The decode unit has changed the use of R1 in I6 and I7 to a secret register, S1, not visible to the programmer.
Now I6 can be issued concurrently with I5.
Modern CPUs often have dozens of secret registers for use with register renaming.
Register renaming can often eliminate WAR and WAW dependencies.
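
A minimal sketch of the renaming idea follows. It makes a simplifying assumption that differs from the slide's example: every destination register gets a fresh secret register, rather than only the conflicting ones. The function name `rename`, the tuple encoding, and the default pool size are hypothetical.

```python
def rename(instructions, num_secret=40):
    """Rename every destination to a fresh secret register S1, S2, ...

    instructions: list of (dest, src, src, ...) tuples of register names.
    """
    mapping = {}                                    # architectural name -> current secret name
    free = [f"S{i}" for i in range(1, num_secret + 1)]
    renamed = []
    for dst, *srcs in instructions:
        # Sources read whichever register currently holds the value.
        new_srcs = [mapping.get(s, s) for s in srcs]
        if not free:
            raise RuntimeError("out of secret registers")
        new_dst = free.pop(0)
        mapping[dst] = new_dst
        renamed.append((new_dst, *new_srcs))
    return renamed
```

Because each write goes to a register nobody else is using, a later write can no longer collide with an earlier read or write of the same architectural register; only true RAW dependencies remain.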

Speculative Execution

Computer programs can be broken up into basic blocks, with each basic block consisting of a linear sequence of code with one entry point and one exit.
A basic block does not contain any control structures.
- Therefore its machine language translation does not contain any branches.
Basic blocks are connected by control statements. Programs in this form can be represented by directed graphs.
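
One common way to find basic blocks is to mark "leaders": the first instruction, every branch target, and every instruction that follows a branch. The sketch below assumes a hypothetical instruction encoding `(label, opcode, target)` and a hypothetical set of branch opcodes. It also keeps the branch as the last instruction of its block, whereas the slides treat branches as the connections between blocks.

```python
BRANCH_OPS = {"BR", "BGE", "BNE"}   # hypothetical branch opcodes

def basic_blocks(program):
    """Split a list of (label, opcode, target) tuples into basic blocks."""
    if not program:
        return []
    label_index = {lab: i for i, (lab, _, _) in enumerate(program) if lab}
    leaders = {0}                                   # the first instruction starts a block
    for i, (_, op, target) in enumerate(program):
        if op in BRANCH_OPS:
            if target is not None:
                leaders.add(label_index[target])    # a branch target starts a block
            if i + 1 < len(program):
                leaders.add(i + 1)                  # so does the instruction after a branch
    starts = sorted(leaders)
    ends = starts[1:] + [len(program)]
    return [program[s:e] for s, e in zip(starts, ends)]
```

The resulting blocks are the nodes of the directed graph; the branch at the end of each block (or the fall-through) supplies its outgoing edges.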

Speculative Execution

Within each basic block, the reordering techniques seen so far work well.
Unfortunately, most basic blocks are short and there is insufficient parallelism to exploit.
The next step is to allow reordering to cross block boundaries.
The biggest gains come when a potentially slow operation can be moved upward in the graph to get it going earlier. Moving code upward over a branch is called hoisting.

Speculative Execution

Imagine that all of the variables of the previous example except evensum and oddsum are kept in registers.
It might make sense to move their LOAD instructions to the top of the loop, before computing k, to get them started early on, so the values will be available when they are needed.
Of course only one of them will be needed on each iteration, so the other LOAD will be wasted.
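
The example these slides refer to is not reproduced in this preview. As a stand-in, the sketch below assumes a loop that adds a value `k` to either `evensum` or `oddsum` depending on the parity of the loop counter; the function `run`, the `memory` dictionary, and the choice `k = i * i * i` are assumptions made for illustration. It counts how many LOADs are performed with and without hoisting.

```python
def run(limit, hoist_loads):
    """Return (evensum, oddsum, number_of_loads) for the assumed loop."""
    memory = {"evensum": 0, "oddsum": 0}    # the two variables kept in memory
    loads = 0
    for i in range(limit):
        if hoist_loads:
            # Hoisted: both LOADs are started before k is computed.
            even = memory["evensum"]
            odd = memory["oddsum"]
            loads += 2
        k = i * i * i
        if i % 2 == 0:
            if not hoist_loads:
                even = memory["evensum"]
                loads += 1
            memory["evensum"] = even + k
        else:
            if not hoist_loads:
                odd = memory["oddsum"]
                loads += 1
            memory["oddsum"] = odd + k
    return memory["evensum"], memory["oddsum"], loads
```

Both variants compute the same sums, but the hoisted one issues twice as many LOADs: one of the two is wasted on every iteration, which is the price paid for starting them early.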

Speculative Execution

Another problem arises if a speculatively executed instruction causes an exception.

A LOAD instruction may cause a cache miss. On a machine with a large cache line and a memory far slower than the CPU and cache, a LOAD that turns out not to be needed can stall the machine for many cycles while the line is fetched.

One solution is to have a special SPECULATIVE-LOAD instruction that tries to fetch the word from the cache, but if it is not there, just gives up.
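
The difference between the two kinds of load can be sketched as follows. The cache and memory are modeled as plain dictionaries keyed by address, and the function names `load` and `speculative_load` are hypothetical; real hardware works on whole cache lines, which this sketch ignores.

```python
def load(cache, memory, addr):
    """Ordinary LOAD: on a miss, go to (slow) memory and fill the cache."""
    if addr not in cache:
        cache[addr] = memory[addr]      # the expensive step on a real machine
    return cache[addr]

def speculative_load(cache, addr):
    """SPECULATIVE-LOAD: return (hit, value); on a miss, give up without touching memory."""
    if addr in cache:
        return True, cache[addr]
    return False, None
```

If the speculative load misses and the value later turns out to be needed, an ordinary `load` can still be issued at that point, so nothing is lost except the head start.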

Speculative Execution

A worse situation happens with the following statement: if (x > 0) z = y/x;
Suppose that the variables are all fetched into registers in advance and that the (slow) floating-point division is hoisted above the if test.
- If x is 0, the resulting divide-by-zero trap terminates the program even though the programmer has put in explicit code to prevent this situation.
- One solution is to have special versions of the instructions that might cause exceptions, so that a failing speculative instruction does not trap immediately.
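
The hazard, and one possible shape of the fix, can be sketched like this. The slides do not say how the special instruction versions behave; the sketch assumes a deferred-exception scheme in which a failing speculative division returns a marker (`POISON`, a hypothetical name) and the error is raised only if that result is actually used.

```python
POISON = object()   # assumed marker for "this speculative result is invalid"

def naive_hoisted(x, y, z):
    t = y / x                   # hoisted division: traps when x == 0
    if x > 0:
        z = t
    return z

def speculative_div(y, x):
    """Special version of the division: never traps, marks the result instead."""
    try:
        return y / x
    except ZeroDivisionError:
        return POISON

def use(value):
    """A regular instruction touching a poisoned value triggers the deferred trap."""
    if value is POISON:
        raise ZeroDivisionError("poisoned result used")
    return value

def safe_hoisted(x, y, z):
    t = speculative_div(y, x)   # still hoisted above the test
    if x > 0:
        z = use(t)              # the result is only touched on the guarded path
    return z
```

With `x = 0`, `naive_hoisted` raises even though the guarded statement would never have divided, while `safe_hoisted` simply returns `z` unchanged.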

Pentium II Microarchitecture

There are three primary components of the CPU:
- Fetch/Decode unit
- Dispatch/Execute unit
- Retire unit
Together they act as a high-level pipeline.
The units communicate through an instruction pool.
- The ROB (ReOrder Buffer) is a table which stores information about partially completed instructions.
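
How a reorder buffer lets the three units cooperate can be sketched as below. The class `ReorderBuffer` and its entry format are hypothetical, and the sketch assumes that the retire unit removes instructions strictly in program order, which this preview does not spell out.

```python
from collections import deque

class ReorderBuffer:
    """Entries are kept in program order; each records whether it has finished executing."""

    def __init__(self):
        self.entries = deque()

    def deposit(self, name):
        """Fetch/Decode side: append a new, not-yet-executed entry."""
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def retire(self):
        """Retire side: remove finished entries from the head only (assumed in-order)."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            retired.append(self.entries.popleft()["name"])
        return retired
```

The Dispatch/Execute side would set `done` on entries in whatever order they finish; an entry that completes early simply waits in the buffer until everything ahead of it has retired.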

Pentium II Microarchitecture

The Fetch/Decode Unit

The Fetch/Decode unit is highly pipelined, with seven stages.
- Instructions enter the pipeline in stage IFU0, where entire 32-byte lines are loaded from the I-cache.
- Since the IA-32 instruction set has variable-length instructions with many formats, IFU1 analyzes the byte stream to locate the start of each instruction.
- IFU2 aligns the instructions so the next stage can decode them easily.
- Decoding starts in ID0. Each IA-32 instruction is broken up into one or more micro-operations. Simple instructions may require just 1 micro-op.

The Fetch/Decode Unit

The micro-operations are queued in stage ID1. This stage also does branch prediction.
The static predictor predicts backward branches to be taken and forward ones not to be taken. After that, the dynamic branch predictor uses a 4-bit history-based algorithm. If the branch is not in the history table, the static prediction is used.
To avoid WAR and WAW dependencies, the Pentium II supports register renaming using one of 40 internal scratch registers. This is done in the RAT stage.
Finally, the micro-operations are deposited in the ROB, three per clock cycle. A micro-op is issued when all the resources it requires are ready.
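
The interplay between the static and dynamic predictors can be sketched as follows. The static rule (backward taken, forward not taken) and the fallback to it come from the slide. The dynamic rule is not specified there, so the sketch substitutes a simple stand-in: a majority vote over the last four recorded outcomes. The class and method names are hypothetical.

```python
class BranchPredictor:
    HISTORY_BITS = 4                        # the slide mentions a 4-bit history

    def __init__(self):
        self.history = {}                   # branch address -> recent outcomes (True = taken)

    @staticmethod
    def static_predict(branch_addr, target_addr):
        # Backward branches (typically loops) are predicted taken.
        return target_addr < branch_addr

    def predict(self, branch_addr, target_addr):
        past = self.history.get(branch_addr)
        if past is None:
            # Branch not in the history table: use the static prediction.
            return self.static_predict(branch_addr, target_addr)
        # Stand-in dynamic rule (assumption): majority of recorded outcomes.
        return 2 * sum(past) >= len(past)

    def update(self, branch_addr, taken):
        past = self.history.setdefault(branch_addr, [])
        past.append(taken)
        del past[:-self.HISTORY_BITS]       # keep only the most recent outcomes
```

A branch seen for the first time is therefore predicted purely from its direction, and only after it has executed at least once does its own history take over.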
