Overlapped - High Performance Computing - Lecture Slides, Slides of Computer Science

Biju Patnaik University of Technology Computer Science

Some concepts in High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, and Memory Hierarchy Progression. Main points of this lecture are: Overlapped, Activity, Increment, Instruction, Effective Address Calculation, Instruction Decode, Memory Are Loads, Dominated, Mechanism, Cache Memory

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan 🇮🇳

High Performance Computing

Lecture 10

We will assume that …

1. Activity is overlapped in time where possible
    - PC increment and instruction fetch from memory?
    - Instruction decode and effective address calculation

2. Load-store ISA: the only instructions that take operands from memory are loads & stores

3. Main memory delays are not typically seen by the processor
    - Otherwise the timeline is dominated by them
    - There is some hardware mechanism through which most memory access requests can be satisfied at processor speeds (cache memory)

Steps in Instruction Execution

• Fetch instruction from memory to processor
    - IR = Memory[PC]
    - Increment PC

• Decode instruction and get its operands
    - Decode
    - Get operands from registers

• Execute the operation
    - Trigger appropriate functional hardware
    - Load/store: get data from main memory

• Write back the result
    - Write result to destination register

[Figure: the steps annotated with the hardware they use: cache memory access, (simple) logic circuitry, register access, (simple) ALU operation, cache access]
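The four steps above can be sketched as a tiny interpreter. This is a minimal illustration, not the lecture's own code: the `Machine` class, the tuple instruction format `(op, dest, src1, src2)`, the two opcodes, and a PC that counts instructions rather than bytes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Hypothetical toy machine; all names here are illustrative assumptions."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 8)
    memory: dict = field(default_factory=dict)    # data memory: address -> value
    program: list = field(default_factory=list)   # instruction memory


def step(m):
    """Run one instruction through fetch, decode, execute and write back."""
    # 1. Fetch: IR = Memory[PC], then increment PC
    #    (PC counts instructions here, a simplification).
    ir = m.program[m.pc]
    m.pc += 1

    # 2. Decode the instruction and get its operands from registers.
    op, dest, src1, src2 = ir
    a = m.regs[src1]

    # 3. Execute: trigger the appropriate functional hardware.
    if op == "ADD":            # ADD dest, src1, src2
        result = a + m.regs[src2]
    elif op == "LOAD":         # LOAD dest, d(src1); the src2 field holds d
        result = m.memory[a + src2]
    else:
        raise ValueError(f"unknown opcode: {op}")

    # 4. Write back the result to the destination register.
    m.regs[dest] = result


m = Machine(memory={100: 21},
            program=[("LOAD", 1, 0, 100), ("ADD", 3, 1, 1)])
step(m)
step(m)
print(m.regs[3])  # 42
```

Only loads touch data memory in this sketch, which matches the load-store assumption stated earlier in the lecture.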

Term: Processor Cycle Time

• Unit of timescale of processor; time required to do a basic operation
    - Cache memory access
    - Register access + some logic (like decode)
    - ALU operation

Steps in Execution: JR R

• Fetch instruction from memory to processor
    - IR = Memory[PC]; Increment PC

• Decode instruction and get its operands
    - Decode; Get operands from registers

• Execute the operation
    - Trigger appropriate functional hardware
    - Load/store: get data from main memory

• Write back the result
    - Write result to destination register

[Figure: timeline of the steps, with five "1 cycle" labels]

Steps in Execution: ADD R1, R2, R

• Fetch instruction from memory to processor
    - IR = Memory[PC]; Increment PC

• Decode instruction and get its operands
    - Decode; Get operands from registers

• Execute the operation
    - Trigger appropriate functional hardware
    - Load/store: get data from main memory

• Write back the result
    - Write result to destination register

[Figure: timeline of the steps, with five "1 cycle" labels]

Term: Processor Cycle Time

• Unit of timescale of processor; time required to do a basic operation
    - Cache memory access
    - Register access + some logic (like decode)
    - ALU operation

• A MIPS 1 instruction can be processed in 3-5 cycles
    - Jump: IFetch, Decode/OpFetch, DoOp (3)
    - ALU: IFetch, Decode/OpFetch, DoOp, WriteReg (4)
    - Load: IFetch, Decode, EffAddr, Cache, WriteReg (5)

• Addressing modes: (R) vs d(R)
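The 3-5 cycle figures above follow directly from counting one cycle per step. The sketch below does that counting for a short instruction sequence. It assumes that successive instructions are not overlapped with each other, and the function names and the 2 ns cycle time are made up for illustration.

```python
# Steps per instruction class, as listed on the slide (one cycle each).
STEPS = {
    "jump": ["IFetch", "Decode/OpFetch", "DoOp"],
    "alu":  ["IFetch", "Decode/OpFetch", "DoOp", "WriteReg"],
    "load": ["IFetch", "Decode", "EffAddr", "Cache", "WriteReg"],
}


def cycles(kind):
    """Cycles needed by one instruction of the given class."""
    return len(STEPS[kind])


def total_cycles(instruction_kinds):
    """Total cycles when instructions run one after another (assumption)."""
    return sum(cycles(kind) for kind in instruction_kinds)


def execution_time_ns(instruction_kinds, cycle_time_ns):
    """Execution time = total cycles x processor cycle time."""
    return total_cycles(instruction_kinds) * cycle_time_ns


sequence = ["load", "alu", "jump"]
print(total_cycles(sequence))            # 12
print(execution_time_ns(sequence, 2.0))  # 24.0 (hypothetical 2 ns cycle)
```

The load needs the extra EffAddr step because of the d(R) addressing mode: the displacement d must be added to the register contents before the cache can be accessed.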

Instruction Execution

Instruction Fetch (IF)

• Fetch the instruction from program memory to the instruction register
    - IR ← Mem[PC]
    - Increment PC

[Figure: instruction fetch datapath with PC, NPC, Mem and IR]

Instruction Execution..

Execution (EX)

• Arithmetic Inst:
    - ALUOut ← A op B
    - ALUOut ← A op Imm

• Load/Store Inst:
    - ALUOut ← A + Imm

• Branch Inst:
    - ALUOut ← NPC + Imm

• Jump Inst:
    - PC ← NPC[31-28] || IR[25-0] || 00

[Figure: execution datapath with A, B, Imm, NPC, ALU, Zero?, Cond and ALUOut]
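The EX-stage cases above can be written out as one function. This is a sketch under stated assumptions: 32-bit registers, an `Imm` value that is already sign-extended, and the usual MIPS convention that the 26-bit jump field is a word address, so two zero bits are appended. The function name `ex_stage` and the `kind` labels are hypothetical.

```python
import operator

MASK32 = 0xFFFFFFFF  # assume 32-bit registers


def ex_stage(kind, a=0, b=0, imm=0, npc=0, ir=0, op=None):
    """Return (destination latch, value) produced in the EX stage."""
    if kind == "alu_rr":    # ALUOut <- A op B
        return ("ALUOut", op(a, b) & MASK32)
    if kind == "alu_ri":    # ALUOut <- A op Imm
        return ("ALUOut", op(a, imm) & MASK32)
    if kind == "mem":       # load/store effective address: ALUOut <- A + Imm
        return ("ALUOut", (a + imm) & MASK32)
    if kind == "branch":    # branch target: ALUOut <- NPC + Imm
        return ("ALUOut", (npc + imm) & MASK32)
    if kind == "jump":      # PC <- NPC[31-28] || IR[25-0] || 00
        target = (npc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return ("PC", target)
    raise ValueError(f"unknown instruction kind: {kind}")


print(ex_stage("alu_rr", a=2, b=3, op=operator.add))      # ('ALUOut', 5)
print(ex_stage("mem", a=0x1000, imm=8))                   # ('ALUOut', 4104)
print(hex(ex_stage("jump", npc=0x40000004, ir=0x08000010)[1]))  # 0x40000040
```

The `mem` case is the d(R) addressing mode: the register supplies A, the instruction supplies the displacement. With d = 0 it reduces to the plain (R) mode.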

Instruction Execution…

Memory (MEM)

• Load Instr:
    - LMD ← Mem[ALUOut]

• Store Instr:
    - Mem[ALUOut] ← B

[Figure: execution and memory datapath with Imm, NPC, A, B, ALU, Zero?, Cond, ALUOut, Mem and LMD]
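The MEM stage is the only place where data memory is touched: a load copies Mem[ALUOut] into LMD, and a store writes B to Mem[ALUOut]. A minimal sketch, with memory modelled as a Python dict and the function name `mem_stage` chosen only for illustration:

```python
def mem_stage(kind, memory, alu_out, b=None):
    """Perform the MEM-stage action; return the LMD value for a load."""
    if kind == "load":       # LMD <- Mem[ALUOut]
        return memory[alu_out]
    if kind == "store":      # Mem[ALUOut] <- B
        memory[alu_out] = b
        return None
    return None              # other instructions do nothing in this stage


memory = {0x1008: 7}
lmd = mem_stage("load", memory, 0x1008)
mem_stage("store", memory, 0x2000, b=9)
print(lmd, memory[0x2000])  # 7 9
```

In both cases the address comes from ALUOut, which is the effective address computed in the EX stage.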

Inside the Processor

[Figure: processor datapath across the stages Inst Fetch (IF), Inst Decode (ID), Execution (EX), Memory (MEM) and WB, showing PC, NPC, 4, Mem, IR, Reg File, sign extend, A, B, Imm, ALU, Zero?, Cond, ALUOut and LMD]

Reality Check

• Problem: There could be many programs running on a machine concurrently

• Sharing the resources of the computer
    - Processor time
    - Main memory

• They must be protected from each other

• One program should not be able to access the variables of another

• This is typically done through Address Translation

Idea of Address Translation

• Each program is compiled to use addresses in the range 0 .. MaxAddress (e.g., 0 .. 2

• These addresses are not real, but only Virtual Addresses

• They have to be translated into actual main memory addresses

• The translation can be done to ensure that one program cannot access variables of another program

• Many programs in execution can then safely share main memory

• Terminology: virtual address, physical address

• Memory Management Unit (MMU): The hardware that does the address translation
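One way to picture the translation is a per-program table that maps pieces of the virtual address range onto main memory. The sketch below uses fixed-size pages, which is an assumption: the slides do not say which mechanism the MMU uses. The `MMU` class, `ProtectionFault`, the program names and the 4096-byte page size are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size, not taken from the lecture


class ProtectionFault(Exception):
    """Raised when a program uses an address it has no mapping for."""


class MMU:
    """Toy model of address translation with one table per program."""

    def __init__(self):
        # program id -> {virtual page number -> physical frame number}
        self.page_tables = {}

    def map(self, program, vpage, frame):
        self.page_tables.setdefault(program, {})[vpage] = frame

    def translate(self, program, vaddr):
        """Translate a virtual address to a physical address."""
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        table = self.page_tables.get(program, {})
        if vpage not in table:
            raise ProtectionFault(f"{program}: no mapping for {vaddr:#x}")
        return table[vpage] * PAGE_SIZE + offset


mmu = MMU()
mmu.map("A", vpage=0, frame=5)
mmu.map("B", vpage=0, frame=9)
print(mmu.translate("A", 0x10))  # 20496
print(mmu.translate("B", 0x10))  # 36880
```

Both programs use the same virtual address 0x10, yet they reach different physical addresses, and neither table contains the other program's frames. That is how translation lets many programs share main memory safely.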

Recall: Basic Computer Organization

[Figure: basic computer organization, showing the CPU (Control, ALU, Registers), MMU, Cache, Memory, Bus and I/O devices]
