Module 3-6: Computer Architecture - Datapath, Pipelining, Parallelism, Memory Hierarchy, Slides of Advanced Computer Architecture

An in-depth exploration of various concepts in computer architecture, including datapath implementation, pipelining, instruction level parallelism, and memory hierarchy system. It covers topics such as single/multiple cycle approaches, pipeline hazards, dynamic scheduling, tomasulo's approach, and static scheduling. Students will gain a solid understanding of these fundamental concepts and their role in enhancing processor performance.

Typology: Slides

2011/2012

Uploaded on 08/06/2012

amrusha
amrusha 🇮🇳

4.5

(33)

147 documents

1 / 20

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Module 3: Datapath Implementation
It consists of registers, internal buses,
arithmetic units and shifters
Each register in the register file has:
- a load control line that enables data load to
register
- a set of tri-state buffers between its output
and the bus
- a read control line that enables its buffer
and place the register on the bus
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14

Partial preview of the text

Download Module 3-6: Computer Architecture - Datapath, Pipelining, Parallelism, Memory Hierarchy and more Slides Advanced Computer Architecture in PDF only on Docsity!

Module 3: Datapath Implementation

It consists of registers, internal buses, arithmetic units and shifters Each register in the register file has:

  • a load control line that enables data load to register
  • a set of tri-state buffers between its output and the bus
  • a read control line that enables its buffer and place the register on the bus

Module 3: Single/Multiple Cycle Approach

In the Single Cycle implementation, the cycle time is set to accommodate the longest instruction, the Load instruction. In the Multiple Cycles implementation, the cycle time is set to accomplish longest step, the memory read/write Consequently, the cycle time for the Single Cycle implementation can be five times longer than the multiple cycle implementation.

Module 3: Pipeline Hazards

Structural hazards occur when same

resource is accessed by more than one

instructions; e.g.,

One memory port or one register write port

It can be removed by using either multiple

resources or inserting stall

Stall degrades the pipeline performance

Module 3: Pipeline Hazards

Data Hazards occur when attempt is made to read invalid data Data hazard can be removed by using stall and forwarding techniques Control hazards occur when an attempt is made to branch prior to the evaluation of the condition Four ways to handle control hazards

Module 4: Instruction Level parallelism Simple pipeline facilitates in-order execution Whereas, in order to enhance the performance of the pipeline, we want to begin execution as soon as the data operands are available, i.e., out-of-order execution Out-of-order execution may introduce data hazards of type WAR and WAW Instruction Level Parallelism can be achieved by Hardware or Software

Module 4: Instruction Level Parallelism In SW parallelism, the dependencies are defined by program result in hazards if HW cannot resolve HW exploiting ILP works when dependence cannot be determined at run time These hardware techniques to exploit ILP are referred to as Dynamic Scheduling techniques

Recap: ILP- Dynamic Scheduling

We discussed the score-boarding and

Tomasulo’s algorithm as the basic

concepts for dynamic scheduling in integer

and floating-point datapath

The structures implementing these

concepts facilitate out-of-order execution

to minimize data dependencies thus avoid

data hazards without stalls

Module 4: Instruction Level Parallelism Tomasulo's Approach for IBM 360/91 to achieve high Performance without special compilers Here, the control and buffers are distributed with Function Units (FU) Registers in instructions are replaced by values or pointers to reservation stations(RS) ; i.e., the registers are renamed Unlike Scoreboard, Tomasulo can have multiple loads outstanding

Module 4: Instruction Level Parallelism

We studied extensions to the Tomasulo’s

structure by including hardware-based

speculation

It allows to speculate that branch is

correctly predicted, thus may execute out-

of-order

but commit in-order having confirmed that

the speculation is correct and no

exceptions exist

Module 4: Instruction Level Parallelism The major hardware-based techniques studied are summarized here: Technique Hazards type stalls Reduced

  • Forwarding and Potential Data Hazard Stalls bypass
  • Delayed Branching Control Hazard Stalls and Branch Scheduling
  • Basic Dynamic Data Hazard Stalls from Scheduling (score boarding) true dependences

Module 5: Static Approach for ILP

The multiple-instruction-issues per cycle

processors are rated as the high-

performance processors

These processors exist in a variety of

flavors, such as:

  • Superscalar Processors
  • VLIW processors
  • Vector Processors

Module 5: Static Approach for ILP

The superscalar processors exploit ILP

using static as well as dynamic scheduling

approaches

The VLIW processors, on the other hand,

exploits ILP using static scheduling only

The major software scheduling techniques,

under discussion, to reduce the data and

control stalls, are as follows:

Module 6: Memory Hierarchy System Here, we discussed how the gap between the speed of processor and the storage devices - DRAM, SRAM and Disk is increasing with time We studied that in order to obtain high speed storage at the cheapest cost per byte, different types of memory modules are organize in hierarchy, based on the:

Concept of Caching and

Principle of Locality

Module 6: Memory Hierarchy System The principle of locality states that to obtain data or instructions of a program, the processor access, at any instant of time, a relatively small portion of the address space of the fastest memory closet to the processor There are two different types of locality: Temporal locality is the locality in time Spatial locality is the locality in space