organization-review, Study notes of Advanced Computer Architecture

Osmania University Advanced Computer Architecture

Good material for Advanced Computer Architecture

Typology: Study notes

2013/2014

Uploaded on 06/03/2014

nagesh 🇮🇳


CS 211: Computer Architecture

Instructor: Prof. Bhagi Narahari

Dept. of Computer Science

Course URL: www.seas.gwu.edu/~narahari/cs211/

CS 211: Bhagi Narahari, CS, GWU

Summary: Architecture Trends?

• Moore’s law: density doubles every 18-24 months
  - smaller processors, faster clocks
  - price drops due to volume and dev. costs; what next?
• Interconnect delays could dominate over feature delay
  - need for simpler architectures
  - distributed logic and control
• More functionality
  - communicating processors
  - network of embedded processors
• To extract max performance
  - rules of thumb: Amdahl’s law, parallelism, locality
  - software and compiler support needed!

Next: Review

Computer Organization in an hour!

• Overview of Computer Organization
  - Components
  - Sample processor design process

Review: Computer Organization Basics

• What are the components of a CPU?
• What is the microarchitecture level?
• What is an ISA (instruction set architecture)?
• What does a sample processor design look like?
  - A simple processor architecture
• What is the basic concept of pipelining?


A Computer

The computer is composed of input devices, a central processing unit, a memory unit and output devices.

[Figure: block diagram connecting input devices, the central processing unit, memory, an auxiliary storage device, and an output device]

Memory Unit

An ordered sequence of storage cells, each capable of holding a piece of data.

Volatile memory
  - RAM: Random Access Memory

Non-volatile memory
  - ROM: Read Only Memory

Computer System

[Figure: a processor with its cache connects over the memory-I/O bus to main memory and to I/O controllers for two disks, a display, and a network; interrupts run from the I/O side back to the processor]

Memory Hierarchy: The Tradeoff

[Figure: CPU registers, L1 cache, L2 cache, memory, and disk in a chain; moving away from the CPU, each level is larger, slower, and cheaper. Other labels on the figure: "cache", "virtual memory", and the sizes 16 B, 8 B, and 4 KB.]

Level            Size             Speed      $/Mbyte     Block size
CPU registers    608 B            1.4 ns                 4 B
L1 cache         128 KB           4.2 ns                 4 B
L2 cache         512 KB - 4 MB    16.8 ns    $90/MB      16 B
Memory           128 MB           112 ns     $2-6/MB     4-8 KB
Disk             27 GB            9 ms       $0.01/MB

(Numbers are for a 21264 at 700MHz)
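To make the gap between levels concrete, the access times quoted above can be converted into clock cycles at the stated 700 MHz. This is a minimal sketch; the dictionary keys and the helper name `ns_to_cycles` are illustrative and not part of the slides.

```python
CLOCK_HZ = 700e6  # 700 MHz, the clock rate quoted on the slide

# Access times quoted on the slide, in nanoseconds.
ACCESS_NS = {
    "registers": 1.4,
    "L1 cache": 4.2,
    "L2 cache": 16.8,
    "memory": 112.0,
    "disk": 9e6,  # 9 ms expressed in ns
}


def ns_to_cycles(ns, clock_hz=CLOCK_HZ):
    """Convert a latency in nanoseconds into (fractional) clock cycles."""
    return ns * 1e-9 * clock_hz


cycles = {level: ns_to_cycles(ns) for level, ns in ACCESS_NS.items()}

for level, count in cycles.items():
    print(f"{level:10s} {count:14.1f} cycles")
```

Rounded, a register access costs about one cycle, an L1 hit about 3, an L2 hit about 12, a memory reference about 78, and a disk reference several million, which is why locality matters so much.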

Architecture Models: Von Neumann architecture

Memory holds data, instructions.
Central processing unit (CPU) fetches instructions from memory.
  - Separate CPU and memory distinguishes programmable computer.
CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.

CPU + memory

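[Figure: the CPU is connected to memory by an address bus and a data bus. Memory location 200 holds ADD r5,r1,r3; inside the CPU, the PC holds 200 and the IR holds the fetched instruction, ADD r5,r1,r3.]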
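The fetch step in the figure can be sketched in a few lines. This is only an illustration: the register contents, the 4-byte PC increment, and the tiny string-based decoder are assumptions, not something the slide specifies.

```python
# Memory holds instructions (and data); address 200 is taken from the figure.
memory = {200: "ADD r5,r1,r3"}

# Hypothetical register contents, chosen only for this example.
regs = {"r1": 4, "r3": 6, "r5": 0}

pc = 200             # program counter points at the next instruction
ir = memory[pc]      # fetch: IR <- MEM[PC]
pc += 4              # assumption: fixed 4-byte instructions

# Decode and execute the single instruction this sketch understands.
op, operands = ir.split()
dst, src1, src2 = operands.split(",")
if op == "ADD":
    regs[dst] = regs[src1] + regs[src2]
else:
    raise ValueError(f"unsupported instruction: {ir}")

print(ir, "->", dst, "=", regs[dst])
```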

Harvard architecture

[Figure: the CPU connects to a data memory and to a separate program memory, each through its own address and data lines]

von Neumann vs. Harvard

Harvard can’t use self-modifying code.
Harvard allows two simultaneous memory fetches.
Most DSPs use Harvard architecture for streaming data:
  - greater memory bandwidth;
  - more predictable bandwidth.

Instruction Set Architecture

The Instruction Set Architecture (ISA) describes a set of instructions whose syntactic and semantic characteristics are defined by the underlying computer architecture.

Programming model

Programming model: registers visible to the programmer.
Some registers are not visible (IR).

Multiple implementations

Successful architectures have several implementations:
  - varying clock speeds;
  - different bus widths;
  - different cache sizes;
  - etc.

Assembly language

One-to-one with instructions (more or less).
Basic features:
  - One instruction per line.
  - Labels provide names for addresses (usually in first column).
  - Instructions often start in later columns.
  - Comments run to end of line.

Evolution of Instruction Sets

Major advances in computer architecture are typically associated with landmark instruction set designs
  - Ex: Stack vs GPR (System 360)
Design decisions must take into account:
  - technology
  - machine organization
  - programming languages
  - compiler technology
  - operating systems
  - applications
And they in turn influence these.

CISC vs. RISC

Complex instruction set computer (CISC):
  - many addressing modes;
  - many operations.
Reduced instruction set computer (RISC):
  - load/store;
  - pipelined instructions.

CISC Processors

Instruction decoding is performed with large microcode ROMs
Some instructions require more than a single instruction cycle to execute
Many addressing modes supported
Register set was designed to support specific functions

RISC Processors

Instruction decoding is performed with static (hard-wired) logic for a much faster result
Instructions are designed to execute in a single instruction cycle
Data processing instructions operate only on registers. Load and store instructions are designated to access memory
(in many cases)

IA - 32

1978: The Intel 8086 is announced (16 bit architecture)
1980: The 8087 floating point coprocessor is added
1982: The 80286 increases address space to 24 bits, adds instructions
1985: The 80386 extends to 32 bits, new addressing modes
1989-1995: The 80486, Pentium, Pentium Pro add a few instructions (mostly designed for higher performance)
1997: 57 new “MMX” instructions are added, Pentium II
1999: The Pentium III added another 70 instructions (SSE)
2001: Another 144 instructions (SSE2)
2003: AMD extends the architecture to increase address space to 64 bits, widens all registers to 64 bits and makes other changes (AMD64)
2004: Intel capitulates and embraces AMD64 (calls it EM64T) and adds more media extensions

“This history illustrates the impact of the ‘golden handcuffs’ of compatibility”
  - “adding new features as someone might add clothing to a packed bag”
  - “an architecture that is difficult to explain and impossible to love”

IA-32 Overview

Complexity:
  - Instructions from 1 to 17 bytes long
  - one operand must act as both a source and destination
  - one operand can come from memory
  - complex addressing modes, e.g., “base or scaled index with 8 or 32 bit displacement”
Saving grace:
  - the most frequently used instructions are not too difficult to build
  - compilers avoid the portions of the architecture that are slow

“what the 80x86 lacks in style is made up in quantity, making it beautiful from the right perspective”

Quick look at ISA

Will use MIPS
  - Simple RISC ISA
  - Widely used

Instruction set characteristics

Fixed vs. variable length.
Addressing modes.
Number of operands.
Types of operands.


The Big Picture: The Performance Perspective

Performance of a machine is determined by:
  - Instruction count
  - Clock cycle time
  - Clock cycles per instruction (CPI)
Processor design (datapath and control) will determine:
  - Clock cycle time
  - Clock cycles per instruction

[Figure: the three performance factors: CPI, instruction count, and cycle time]
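The three factors combine multiplicatively: CPU time = instruction count × CPI × clock cycle time. The sketch below evaluates that product for two made-up designs; the instruction count, CPI values, and cycle times are hypothetical numbers chosen only to show the trade-off.

```python
def cpu_time_seconds(instruction_count, cpi, cycle_time_s):
    """CPU time = instruction count x cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload of one million instructions.
N = 1_000_000

# Design A: every instruction takes one long cycle (CPI = 1, 10 ns cycle).
time_a = cpu_time_seconds(N, cpi=1.0, cycle_time_s=10e-9)

# Design B: shorter cycle but more cycles per instruction (CPI = 4, 2 ns cycle).
time_b = cpu_time_seconds(N, cpi=4.0, cycle_time_s=2e-9)

print(f"A: {time_a * 1e3:.1f} ms, B: {time_b * 1e3:.1f} ms")
```

With these numbers design B finishes first even though its CPI is four times higher, because its cycle time is five times shorter.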

Microarchitecture Design: How?

Any design must attempt to meet the requirements
  - Where do the requirements come from?
  - Ex: need to represent numbers in binary; integers, text, floating point
How to proceed with design?

Some History…

The Indiana Legislature once introduced legislation declaring that the value of π was exactly 3.

How to Design a Processor: step-by-step

1. Analyze instruction set => datapath requirements
  - the meaning of each instruction is given by the register transfers
  - datapath must include storage element for ISA registers
    - possibly more
  - datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
5. Assemble the control logic

Let’s look at a single cycle ISA…

The MIPS Instruction Formats

All MIPS instructions are 32 bits long. The three instruction formats:

R-type:  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-type:  op (6 bits) | rs (5 bits) | rt (5 bits) | immediate (16 bits)
J-type:  op (6 bits) | target address (26 bits)

The different fields are:
  - op: operation of the instruction
  - rs, rt, rd: the source and destination register specifiers
  - shamt: shift amount
  - funct: selects the variant of the operation in the “op” field
  - address / immediate: address offset or immediate value
  - target address: target address of the jump instruction
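The field layouts above translate directly into shifts and masks. The following sketch extracts every field from a 32-bit word; the function name `decode` and the choice to return all fields at once (leaving the caller to pick the ones valid for the format) are illustrative choices, not part of the slides.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its named fields.

    All fields are returned; which ones are meaningful depends on the
    format (R, I, or J) selected by the op field.
    """
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (R-type)
        "shamt": (word >> 6) & 0x1F,    # bits 10..6  (R-type)
        "funct": word & 0x3F,           # bits 5..0   (R-type)
        "imm16": word & 0xFFFF,         # bits 15..0  (I-type)
        "target": word & 0x3FFFFFF,     # bits 25..0  (J-type)
    }


# addu $3, $1, $2 is an R-type instruction: op = 0, funct = 0x21.
r_type = decode(0x00221821)

# lw $8, 4($9) is an I-type instruction: op = 0x23.
i_type = decode(0x8D280004)

print(r_type)
print(i_type)
```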

Step 1a: The MIPS-Inst Set (eg.)

ADD and SUB (R-type: op | rs | rt | rd | shamt | funct)
  - addU rd, rs, rt
  - subU rd, rs, rt
OR Immediate (I-type: op | rs | rt | immediate)
  - ori rt, rs, imm
LOAD and STORE Word (I-type: op | rs | rt | immediate)
  - lw rt, rs, imm
  - sw rt, rs, imm
BRANCH (I-type: op | rs | rt | immediate)
  - beq rs, rt, imm

Registers rs and rt are the source registers.
If the instruction has three register operands, then rd is the destination register.
If the instruction has two register operands, then rt is the destination register.

Logical Register Transfers

RTL gives the meaning of the instructions
All start by fetching the instruction

op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]

inst     Register Transfers
ADDU     R[rd] <- R[rs] + R[rt]; PC <- PC + 4
SUBU     R[rd] <- R[rs] - R[rt]; PC <- PC + 4
ORi      R[rt] <- R[rs] | zero_ext(Imm16); PC <- PC + 4
LOAD     R[rt] <- MEM[ R[rs] + sign_ext(Imm16) ]; PC <- PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <- R[rt]; PC <- PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <- PC + 4 + ( sign_ext(Imm16) || 00 ) else PC <- PC + 4
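The register transfers above can be executed directly as a tiny interpreter. This is a sketch under stated assumptions: instructions are given as already-decoded dictionaries (the keys `name`, `rs`, `rt`, `rd`, `imm` are hypothetical), memory is a Python dict keyed by byte address, and register 0 is not forced to zero.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned value."""
    return imm16 & 0xFFFF


def step(inst, R, MEM, pc):
    """Apply the register transfer for one instruction; return the next PC."""
    name = inst["name"]
    rs, rt = inst.get("rs", 0), inst.get("rt", 0)
    rd, imm = inst.get("rd", 0), inst.get("imm", 0)
    next_pc = pc + 4                      # default: sequential code
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ORI":
        R[rt] = R[rs] | zero_ext(imm)
    elif name == "LOAD":
        R[rt] = MEM[(R[rs] + sign_ext(imm)) & MASK32]
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm)) & MASK32] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # "|| 00" appends two zero bits, i.e. multiplies by 4.
            next_pc = pc + 4 + (sign_ext(imm) << 2)
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return next_pc & MASK32


R, MEM, pc = [0] * 32, {}, 0
program = [
    {"name": "ORI", "rs": 0, "rt": 1, "imm": 5},
    {"name": "ORI", "rs": 0, "rt": 2, "imm": 7},
    {"name": "ADDU", "rs": 1, "rt": 2, "rd": 3},
    {"name": "STORE", "rs": 0, "rt": 3, "imm": 16},
    {"name": "LOAD", "rs": 0, "rt": 4, "imm": 16},
]
for inst in program:
    pc = step(inst, R, MEM, pc)

# A taken branch with offset -1 lands back on the branch itself.
branch_pc = step({"name": "BEQ", "rs": 3, "rt": 4, "imm": 0xFFFF}, R, MEM, pc)
```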

Step 2: Components of the Datapath

Combinational Elements
Storage Elements
  - Clocking methodology

Clocking Methodology

Clocks are needed in sequential logic to decide when an element that contains state should be updated.
A clock is a free-running circuit with a fixed cycle time or clock period. The clock frequency is the inverse of the cycle time.
The clock cycle time or clock period is divided into two portions: when the clock is high and when the clock is low.
Edge-triggered clocking: all state changes occur on a clock edge.

[Figure: clock waveform (Clk) marking the clock period, the rising and falling edges, and the setup and hold windows around each triggering edge; outside those windows the data value is “don’t care”]
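Since frequency is the inverse of the cycle time, converting between the two is a one-line calculation. The sketch below uses 700 MHz, the clock rate quoted on the memory hierarchy slide, as its example; the helper names are illustrative.

```python
def clock_period_ns(freq_hz):
    """Clock period in nanoseconds for a given frequency in hertz."""
    return 1e9 / freq_hz


def clock_freq_hz(period_ns):
    """Clock frequency in hertz for a given period in nanoseconds."""
    return 1e9 / period_ns


period = clock_period_ns(700e6)   # 700 MHz clock
print(f"700 MHz -> {period:.2f} ns per cycle")
```

The result, roughly 1.43 ns, is consistent with the 1.4 ns register access time quoted for the same processor.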

Step 3: Assemble DataPath meeting our requirements

Register Transfer Requirements ⇒ Datapath Assembly
  - Instruction Fetch
  - Read Operands and Execute Operation

The common RTL operations for all instructions are:
(a) Fetch the instruction using the Program Counter (PC) at the beginning of an instruction’s execution (PC -> Instruction Memory -> Instruction Word).
(b) Then at the end of the instruction’s execution, you need to update the Program Counter (PC -> Next Address Logic -> PC).

More specifically, you need to increment the PC by 4 if you are executing sequential code.
For Branch and Jump instructions, you need to update the program counter to “something else” other than plus 4.

The Next Address Logic block:
  - Add 4 (number of bytes in an instruction), or
  - Branch and Jump instructions

3a: Overview of the Instruction Fetch Unit

The common RTL operations
  - Fetch the Instruction: mem[PC]
  - Update the program counter:
    - Sequential Code: PC <- PC + 4
    - Branch and Jump: PC <- “something else”

[Figure: instruction fetch unit. The PC, clocked by Clk, supplies the Address to the Instruction Memory, which outputs the Instruction Word; the Next Address Logic computes the next PC.]

3b: Add & Subtract

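R[rd] <- R[rs] op R[rt]    Example: addU rd, rs, rt
  - Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
  - ALUctr and RegWr: control logic after decoding the instruction

[Figure: register file of 32 32-bit registers with read addresses Ra and Rb (from Rs and Rt) and write address Rw (from Rd). busA and busB feed the ALU, which is controlled by ALUctr; the ALU Result returns on busW and is written under RegWr and Clk. Instruction format: op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits).]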

Putting it All Together: A Single Cycle Datapath

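[Figure: single-cycle datapath.
  - Instruction fetch: PC (clocked by Clk), adder, PC extender (PC Ext), a mux selected by nPC_sel, and the instruction memory (Adr) producing Instruction<31:0> with fields Rs, Rt, Rd, and Imm.
  - Execute: register file of 32 32-bit registers (Rw, Ra, Rb, busA, busB, busW), a mux selected by RegDst, an extender controlled by ExtOp, a mux selected by ALUSrc, and the ALU controlled by ALUctr, with an Equal output.
  - Memory: data memory (Data In, WrEn, Adr) written under MemWr, followed by a mux selected by MemtoReg.
  - Control signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg.]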

An Abstract View of the Critical Path

  - The CLK input is a factor ONLY during write operation
  - During read operation, behave as combinational logic:
    - Address valid => Output valid after “access time.”

Critical Path (Load Operation) =
    PC’s Clk-to-Q
  + Instruction Memory’s Access Time
  + Register File’s Access Time
  + ALU to Perform a 32-bit Add
  + Data Memory Access Time
  + Setup Time for Register File Write
  + Clock Skew

[Figure: PC and next address logic, ideal instruction memory, register file (32 32-bit registers, addressed by Rs, Rt, and Rd, with a 16-bit Imm), ALU with inputs A and B, and ideal data memory]
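The critical-path sum sets the minimum clock period of the single-cycle design. The sketch below adds up the seven terms; every delay value is a hypothetical placeholder (the slides give no numbers), so only the structure of the calculation should be taken from it.

```python
# Hypothetical component delays in picoseconds (illustrative values only).
LOAD_PATH_PS = {
    "PC clk-to-Q": 50,
    "instruction memory access": 500,
    "register file access": 200,
    "ALU 32-bit add": 300,
    "data memory access": 500,
    "register file write setup": 50,
    "clock skew": 50,
}

# The clock period can be no shorter than the slowest (load) path.
min_cycle_ps = sum(LOAD_PATH_PS.values())

# 1 / period: a period in ps gives a frequency in MHz via 1e6 / ps.
max_freq_mhz = 1e6 / min_cycle_ps

print(f"minimum cycle time: {min_cycle_ps} ps")
print(f"maximum clock rate: {max_freq_mhz:.0f} MHz")
```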

An Abstract View of the Implementation

[Figure: the implementation split into Control and Datapath. The datapath contains the PC and next address logic, ideal instruction memory, register file (32 32-bit registers), ALU, and ideal data memory (Data In, Data Address, Data Out). Control receives the instruction and conditions from the datapath and sends control signals back to it.]

Step 4: Given Datapath: RTL -> Control

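[Figure: the instruction memory (Adr) delivers Instruction<31:0>. Fields Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm <0:15> go to the DATA PATH; Control, which also takes the function field (Fun), drives nPC_sel, RegWr, RegDst, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg into the datapath and receives Equal from it.]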

Summary

5 steps to design a processor
  1. Analyze instruction set => datapath requirements
  2. Select set of datapath components & establish clock methodology
  3. Assemble datapath meeting the requirements
  4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer.
  5. Assemble the control logic
MIPS makes it easier
  - Instructions same size
  - Source registers always in same place
  - Immediates same size, location
  - Operations always on registers/immediates
Single cycle datapath => CPI=1, CCT => long

Systematic Generation of Control

In a single-cycle processor, each instruction is realized by exactly one control command or “microinstruction”
  - in general, the controller is a finite state machine
  - microinstruction can also control sequencing (see later)

[Figure: the instruction’s opcode is decoded by the Control Logic / Store (PLA, ROM), which emits a microinstruction that sets the control points of the datapath; the datapath returns conditions to the control logic]
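For the six-instruction subset used in these slides, "one control command per instruction" amounts to a lookup table from instruction to control-point settings. The table below is a sketch derived from the register transfers; the 0/1 encodings of each signal are assumptions made for this example, not values given in the slides.

```python
X = None  # don't care

# Encoding assumptions (not from the slides):
#   nPC_sel  : 0 = PC + 4, 1 = use branch target when Equal is asserted
#   RegDst   : 0 = destination register named by rt, 1 = by rd
#   ExtOp    : 0 = zero-extend Imm16, 1 = sign-extend Imm16
#   ALUSrc   : 0 = second ALU operand is busB, 1 = extended immediate
#   MemtoReg : 0 = write back ALU result, 1 = write back data memory output
FIELDS = ("nPC_sel", "RegDst", "RegWr", "ExtOp",
          "ALUSrc", "ALUctr", "MemWr", "MemtoReg")

CONTROL = {
    "ADDU":  (0, 1, 1, X, 0, "add", 0, 0),
    "SUBU":  (0, 1, 1, X, 0, "sub", 0, 0),
    "ORI":   (0, 0, 1, 0, 1, "or",  0, 0),
    "LOAD":  (0, 0, 1, 1, 1, "add", 0, 1),
    "STORE": (0, X, 0, 1, 1, "add", 1, X),
    "BEQ":   (1, X, 0, X, 0, "sub", 0, X),
}


def control_word(inst):
    """Return the control-point settings for one instruction as a dict."""
    return dict(zip(FIELDS, CONTROL[inst]))


memory_writers = [i for i in CONTROL if control_word(i)["MemWr"] == 1]
register_writers = sorted(i for i in CONTROL if control_word(i)["RegWr"] == 1)

print(control_word("LOAD"))
```

In hardware the same table would be realized as a PLA or ROM indexed by the opcode (and function field), as the figure suggests.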

What’s wrong with our CPI=1 processor?

Long Cycle Time
All instructions take as much time as the slowest
Real memory is not as nice as our idealized memory
  - cannot always get the job done in one (short) cycle

Path through the datapath for each instruction class:
Arithmetic & Logical:  PC -> Inst Memory -> Reg File -> mux -> ALU -> mux -> setup
Load (Critical Path):  PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem -> mux -> setup
Store:                 PC -> Inst Memory -> Reg File -> mux -> ALU -> Data Mem
Branch:                PC -> Inst Memory -> Reg File -> cmp -> mux

Partitioning the CPI=1 Datapath

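Add registers between smallest steps

[Figure: the single-cycle datapath divided into steps: PC and Next PC, Instruction Fetch, Operand Fetch (Reg. File), Exec, Mem Access (Data Mem), and Result Store (Reg. File). Control signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr; Equal is returned to control.]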

Example Multicycle Datapath

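Critical Path?

[Figure: multicycle datapath. The steps (Next PC, Instruction Fetch, Operand Fetch from the Reg File, Ext and ALU, Mem Access to Data Mem, Result Store to the Reg. File) are separated by registers PC, IR, A, B, E, S, and M. Control signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg; Equal is returned to control.]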

Controller Design

The state diagrams that define the controller for an instruction set processor are highly structured
Use this structure to construct a simple “microsequencer”
Control reduces to programming this very simple device
  ⇒ microprogramming

[Figure: a sequencer and micro-PC select a microinstruction, which supplies sequencer control and datapath control]

Microprogramming

Microprogramming is a convenient method for implementing structured control state diagrams:
  - Random logic replaced by microPC sequencer and ROM
  - Each line of ROM called a μinstruction: contains sequencer control + values for control points
  - limited state transitions: branch to zero, next sequential, branch to μinstruction address from dispatch ROM
Horizontal μCode: one control bit in μInstruction for every control line in datapath
Vertical μCode: groups of control-lines coded together in μInstruction (e.g. possible ALU dest)
Control design reduces to Microprogramming
  - Part of the design process is to develop a “language” that describes control and is easy for humans to understand
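The three allowed state transitions (next sequential, dispatch, branch to zero) are enough to drive a whole controller. The sketch below models a microsequencer over a small microcode ROM; the ROM contents, the step labels, and the dispatch addresses are hypothetical and stand in for real control-point values.

```python
# Hypothetical microcode ROM: each line is (control label, sequencer action).
# The label stands in for the control-point values of a real microinstruction.
UCODE = [
    ("fetch",          "next"),      # 0: fetch the instruction
    ("decode",         "dispatch"),  # 1: jump via the dispatch ROM
    ("alu_exec",       "next"),      # 2: ALU instruction, execute
    ("alu_writeback",  "zero"),      # 3: write result, back to fetch
    ("mem_addr",       "next"),      # 4: load, compute address
    ("mem_read",       "next"),      # 5: load, read data memory
    ("load_writeback", "zero"),      # 6: write loaded value, back to fetch
]

# Hypothetical dispatch ROM: opcode -> microinstruction address.
DISPATCH = {"ADDU": 2, "LOAD": 4}


def run_instruction(opcode):
    """Return the sequence of microinstructions executed for one opcode."""
    upc = 0                       # micro-PC starts at the fetch microinstruction
    trace = []
    while True:
        label, action = UCODE[upc]
        trace.append(label)
        if action == "next":      # next sequential
            upc += 1
        elif action == "dispatch":
            upc = DISPATCH[opcode]
        elif action == "zero":    # branch to zero: instruction is complete
            return trace
        else:
            raise ValueError(f"unknown sequencer action: {action}")


addu_trace = run_instruction("ADDU")
load_trace = run_instruction("LOAD")
print(addu_trace)
print(load_trace)
```

Supporting a new instruction then means adding lines to the ROM and one dispatch entry, with no change to the sequencer itself.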

Microprogramming

Microprogramming is a fundamental concept
  - implement an instruction set by building a very simple processor and interpreting the instructions
  - essential for very complex instructions and when few register transfers are possible
  - overkill when ISA matches datapath 1-

[Figure: the Opcode indexes a Dispatch ROM; the μ-sequencer (fetch, dispatch, sequential) drives the micro-PC, which addresses the μ-Code ROM; each microinstruction (μ) supplies sequencer control and, through Decode blocks, datapath control (To DataPath)]

Decoders implement our μ-code language. For instance: rt-ALU, rd-ALU, mem-ALU

Sequential Laundry

Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?

[Figure: timeline from 6 PM to midnight, task order vs. time; four loads run back to back, each passing through three stages of 30, 40, and 20 minutes]

Pipelined Laundry

Start work ASAP
Pipelined laundry takes 3.5 hours for 4 loads

[Figure: timeline starting at 6 PM, task order vs. time; with the four loads overlapped, the schedule runs 30, 40, 40, 40, 40, and 20 minutes]
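Both laundry totals can be reproduced from the three stage times in the figures (30, 40, and 20 minutes). The sketch below assumes each stage handles one load at a time and that a load may wait between stages; the function names are illustrative.

```python
STAGE_MIN = [30, 40, 20]  # stage durations in minutes, from the figures


def sequential_minutes(stages, n_loads):
    """Each load runs through every stage before the next load starts."""
    return n_loads * sum(stages)


def pipelined_minutes(stages, n_loads):
    """Start each load as soon as the stage it needs is free."""
    stage_free = [0] * len(stages)   # time at which each stage becomes free
    finish = 0
    for _ in range(n_loads):
        t = 0                        # time this load is ready for its next stage
        for s, duration in enumerate(stages):
            start = max(t, stage_free[s])
            t = start + duration
            stage_free[s] = t
        finish = t
    return finish


seq = sequential_minutes(STAGE_MIN, 4)
pipe = pipelined_minutes(STAGE_MIN, 4)
print(f"sequential: {seq / 60} h, pipelined: {pipe / 60} h, "
      f"speedup: {seq / pipe:.2f}x")
```

The pipelined total is 30 + 4 × 40 + 20 minutes: the slowest stage sets the rate, and the first and last stages account for filling and draining the pipeline.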

Pipelining Lessons

Pipelining doesn’t help latency of single task, it helps throughput of entire workload
Pipeline rate limited by slowest pipeline stage
Multiple tasks operating simultaneously
Potential speedup = Number pipe stages
Unbalanced lengths of pipe stages reduces speedup
Time to “fill” pipeline and time to “drain” it reduces speedup

[Figure: the pipelined laundry timeline from 6 PM to 9, task order vs. time, with intervals of 30, 40, 40, 40, 40, and 20 minutes]

Instruction Pipeline

Instruction execution process lends itself naturally to pipelining
  - overlap the subtasks of instruction fetch, decode and execute

How to improve performance?

Recall performance is function of
  - CPI: cycles per instruction
  - Clock cycle
  - Instruction count
Reducing any of the 3 factors will lead to improved performance

How to improve performance?

First step is to apply concept of pipelining to the instruction execution process
  - Overlap computations
What does this do?
  - Decrease clock cycle
  - Decrease effective CPU time compared to original clock cycle

Pipeline Approach to Improve System Performance

Analogous to fluid flow in pipelines and assembly line in factories
Divide process into “stages” and send tasks into a pipeline
  - Overlap computations of different tasks by operating on them concurrently in different stages

Instruction Level Parallel Processors (ILP)

early ILP - one of two orthogonal concepts:
  - pipelining - vertical approach
  - multiple (non-pipelined) units - horizontal approach
progression to multiple pipelined units
instruction issue became bottleneck, led to
  - superscalar ILP processors
  - Very Long Instruction Word (VLIW)
Note: key performance metric in all ILP processor classes is IPC (instructions per cycle)
  - this is the degree of parallelism achieved
