CA226: Advanced Computer Architecture - Module Code: ca226, Study notes of Advanced Computer Architecture

Information about the Advanced Computer Architecture module (ca226) offered by the School of Computing at DCU. It includes details on lab schedules, exams, and course content. Topics covered include interrupts, processor speed, 64-bit systems, CISC versus RISC, multi-core systems, and computer performance.

Typology: Study notes

2021/2022

Uploaded on 09/07/2022

CA226 — Advanced Computer Architecture

Stephen Blott <[email protected]>

Preliminaries

Contacting me:

  1. before or after lectures, or during labs
  2. in my office: L1.11
  3. by email at [email protected]; please put the module code (ca226) in the subject line

More Preliminaries

Course web site:

  • http://ca226.computing.dcu.ie/ (use your School of Computing credentials)

There’s a link to this site on Moodle [http://moodle.dcu.ie/].

Still More Preliminaries

Labs:

  • begin week five

Lab exams:

  • weeks eight and twelve, in the regular lab slot (Fridays at 14:00)

Starters for 10.

  • List the powers of 2.
  • What is $2^{32}$?
  • What is $2^{64}$?
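As a quick check on the magnitudes asked about here, this small Python sketch lists the first few powers of 2 and evaluates the two larger values. The variable names and the cut-off at 2^16 are illustrative choices only.

```python
# Powers of two from 2^0 up to 2^16 (the cut-off at 16 is arbitrary).
powers = [2 ** n for n in range(17)]
print(powers)

# Python integers have arbitrary precision, so these do not overflow.
two_32 = 2 ** 32   # roughly 4.3 billion
two_64 = 2 ** 64   # roughly 1.8 * 10^19
print(two_32)
print(two_64)
```

These two values come up repeatedly: 2^32 is the number of distinct values a 32-bit word can hold, and 2^64 is the same for a 64-bit word.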

Starters for 10..

  • What is a register?
  • What is a bus?
  • What does USB stand for?
  • What is a frame buffer?
  • What is an interrupt?

Starters for 10…

  • What’s special about this IP address: 127.0.0.1?
  • What’s special about this IP address: 192.168.3.3?
  • What’s special about this IP address: 192.168.3.255?
  • Could every person on earth be allocated a unique IP address?
  • Old versions of the Linux ext2 file system had a 2GB limit on file sizes. Why?
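Several of these questions can be explored with Python's standard `ipaddress` module. The sketch below is only a starting point and rests on labelled assumptions: a /24 netmask for the 192.168.3.x network (the questions do not give one), a world population of roughly 8 billion, and the common explanation that the 2GB limit comes from a signed 32-bit file offset.

```python
import ipaddress

loopback = ipaddress.ip_address("127.0.0.1")
host = ipaddress.ip_address("192.168.3.3")

# Assumption: a /24 netmask, which the questions do not state.
network = ipaddress.ip_network("192.168.3.0/24")

print(loopback.is_loopback)        # is this the loopback (local machine) address?
print(host.is_private)             # is this in a private address range?
print(network.broadcast_address)   # broadcast address under the /24 assumption

# IPv4 addresses are 32 bits wide.
ipv4_addresses = 2 ** 32
# Assumption: a world population of roughly 8 billion.
world_population = 8_000_000_000
print(ipv4_addresses < world_population)

# A signed 32-bit offset has 31 bits for the magnitude, so it can address
# at most 2^31 bytes, which is 2 GiB.
max_signed_32bit_offset = 2 ** 31
print(max_signed_32bit_offset / 2 ** 30, "GiB")
```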

Observations on Processor Speed

CISC versus RISC

Memory constraints influenced early processor designs:

  • with small memories, high code density [http://en.wikipedia.org/wiki/Instruction_set#Code_density] was necessary
  • this led to the development of processors with complex instruction sets:
    • a single instruction might implement a high-level programming-language operation
    • complex addressing modes
    • e.g. b = a[i] + 1

CISC versus RISC

As memory costs reduced:

  • memory size constraints lessened
  • code did not need to be so dense
  • reduced instruction sets became viable
    • a single high-level programming-language operation might be implemented by several instructions

Almost all modern processors implement reduced instruction sets.

A simple computer…

Note

Source [http://www-cs-faculty.stanford.edu/~eroberts/courses/soco/projects/risc/risccisc/].

Example — The Problem

The problem:

  • a = a * b;
  • so: multiply memory locations 5:2 and 2:3 (say)

Example — CISC Approach

CISC approach:

MULT 5:2 2:3

  • a single, complex instruction
  • load both memory locations into registers
  • multiply
  • store the result back in the appropriate memory location (say, 5:2)

Just one instruction encodes a commonly-occurring programming operation which, at the hardware level, involves several steps.

Example — RISC Approach

RISC approach:

LOAD A, 2:3
LOAD B, 5:2
MULT A, B
STORE 2:3, A

Four steps are required:

  • so the program memory required is (well, may be) four times larger
  • so this approach was only possible when cheaper/larger memory systems became more widespread
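To make the load/store sequence concrete, here is a minimal Python sketch of a toy machine that executes the four RISC-style instructions. It is not a model of any real processor: the `run` function, the dictionary-based memory, and the cell values 6 and 7 are all hypothetical, and the operand addresses 2:3 and 5:2 are taken from the problem statement.

```python
def run(program, memory):
    """Execute a list of toy instructions against a memory dictionary."""
    registers = {}
    for line in program:
        op, args = line.split(maxsplit=1)
        first, second = [a.strip() for a in args.split(",")]
        if op == "LOAD":       # LOAD reg, addr: copy a memory cell into a register
            registers[first] = memory[second]
        elif op == "MULT":     # MULT reg, reg: product goes into the first register
            registers[first] = registers[first] * registers[second]
        elif op == "STORE":    # STORE addr, reg: copy a register into a memory cell
            memory[first] = registers[second]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory

# Hypothetical initial contents of the two memory cells.
memory = {"2:3": 6, "5:2": 7}
program = [
    "LOAD A, 2:3",
    "LOAD B, 5:2",
    "MULT A, B",
    "STORE 2:3, A",
]
result = run(program, memory)
print(result)
```

Note that only LOAD and STORE touch memory; the multiplication works purely on register contents.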

RISC

RISC:

  • reduced instruction set computing
  • computations are performed only on register contents
  • the only memory operations are LOAD and STORE
  • few, uniformly-sized instructions

RISC Advantages

Both approaches are likely to require roughly the same number of computational steps.

RISC advantages:

  • moves complexity from hardware to software (compilers)
  • smarter compilers make better use of registers
  • fewer transistors:
    • so smaller, can be clocked faster, reduced power consumption, less heat
  • pipelining (and super-scalar processing)

Answer?

It depends.

Answer?

Usually:

  • we’re interested in how long it takes to get some work done

So:

  • wall-clock time might be a good measure

However …

It depends how/why we’re measuring.

Wall-clock time includes:

  • user CPU time
  • system CPU time
  • interrupt handling time
  • I/O time (to/from terminal, disk, network)
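The gap between wall-clock time and CPU time is easy to observe. The following Python sketch times a hypothetical job that does a little computation and then sleeps, with the sleep standing in for I/O waiting. The helper names `measure` and `sleepy_job` are invented for illustration.

```python
import time

def measure(job):
    """Return (wall_seconds, cpu_seconds) for one call of job()."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # user + system CPU time of this process
    job()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def sleepy_job():
    # Hypothetical job: some computation, then a wait standing in for I/O.
    sum(i * i for i in range(100_000))
    time.sleep(0.2)

wall, cpu = measure(sleepy_job)
print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

The time spent sleeping counts towards wall-clock time but not towards CPU time, so the two figures can differ widely for I/O-bound jobs.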

CPU Architectures

If we’re interested in comparing processors:

  • we may be more interested in the number of clock cycles necessary to complete some task

Clock Rate

Clock rate:

  • the number of clock cycles per unit time (usually, per second)
  • say, 2GHz

CPU Clock Cycles

CPU clock cycles:

  • the number of clock cycles necessary to complete some job

Alternatively

But that approach:

  • is too dependent on a single job

Alternatively

Better:

  • derive a metric which is (somewhat) independent of any particular job
  • let IC be the instruction count: the number of instructions needed to complete some job

Say:

  • IC is $2 \times 10^8$

Then …

Then:

  • cycles per instruction (CPI): $\text{CPI} = \frac{\text{CPU clock cycles}}{\text{IC}}$

Example:

  • $\text{CPI} = \frac{4 \times 10^8}{2 \times 10^8} = 2$, so two cycles per instruction

Then again …

Then:

  • CPU time: $\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{clock rate}}$

Example:

  • $\text{CPU time} = \frac{2 \times 10^8 \times 2}{2 \times 10^9} = 0.2\,\text{s}$
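The CPI and CPU-time arithmetic can be checked with a few lines of Python. This is a sketch using the numbers from the examples (an IC of 2 × 10^8, 4 × 10^8 CPU clock cycles, and a 2GHz clock); the function names are illustrative.

```python
def cpi(cpu_clock_cycles, instruction_count):
    """Cycles per instruction: CPU clock cycles / IC."""
    return cpu_clock_cycles / instruction_count

def cpu_time(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time in seconds: IC * CPI / clock rate."""
    return instruction_count * cycles_per_instruction / clock_rate_hz

ic = 2 * 10**8            # instruction count
cycles = 4 * 10**8        # CPU clock cycles needed for the job
clock_rate = 2 * 10**9    # 2 GHz, in cycles per second

job_cpi = cpi(cycles, ic)
job_time = cpu_time(ic, job_cpi, clock_rate)
print(f"CPI = {job_cpi}, CPU time = {job_time} s")
```

Halving the CPI, halving the IC, or doubling the clock rate would each halve the computed CPU time.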

So …

  • $\text{CPU time} = \frac{\text{IC} \times \text{CPI}}{\text{clock rate}}$

So, to make things go faster (reduce CPU time):

  • reduce the instruction count (IC)
  • reduce the number of cycles per instruction (CPI), or
  • increase the clock rate

Improvements in CPI

The Intel 8086 instruction PUSH AX:

  • 8086 — 11 clock cycles
  • 80286 — 3 clock cycles
  • 80386 — 2 clock cycles
  • 80486 — 1 clock cycle

So:

  • it is not just clock speed that has improved over the years
  • in fact, it is now commonplace to see $\text{CPI} \le 1$

Example

Example:

  • two machines (A and B) implementing the same instruction set architecture
    • A has cycle time of 10ns and CPI of 2.0 (for some prog. P)
    • B has cycle time of 20ns and CPI of 1.2 (for same P)

Which is faster?
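One way to approach this: both machines implement the same instruction set and run the same program, so the instruction count is identical and CPU time is proportional to CPI × cycle time. The Python sketch below compares that product for each machine; the function name is illustrative.

```python
def time_per_instruction_ns(cycle_time_ns, cpi):
    """Average time per instruction in nanoseconds: CPI * cycle time."""
    return cpi * cycle_time_ns

a = time_per_instruction_ns(10, 2.0)   # machine A: 10ns cycle, CPI 2.0
b = time_per_instruction_ns(20, 1.2)   # machine B: 20ns cycle, CPI 1.2

faster = "A" if a < b else "B"
ratio = max(a, b) / min(a, b)
print(f"A: {a:.1f} ns, B: {b:.1f} ns per instruction")
print(f"machine {faster} is faster by a factor of {ratio:.2f}")
```

The lower CPI of machine B does not make up for its cycle time being twice as long.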

Aside

Note

The cycle time (in seconds) is just the reciprocal of the clock rate (in hertz), and vice versa.

More Common Metrics

MIPS:

  • $\text{MIPS} = \frac{\text{clock rate}}{\text{CPI} \times 10^6}$

MFLOPS:

  • $\text{MFLOPS} = \frac{\text{clock rate}}{\text{C-per-FPI} \times 10^6}$, where C-per-FPI is the number of cycles per floating-point instruction
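As a small worked instance of these two formulas, the Python sketch below uses a 2GHz clock with a CPI of 2. The figure of 4 cycles per floating-point instruction is a made-up value for illustration only.

```python
def mips(clock_rate_hz, cpi):
    """Millions of instructions per second: clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 10**6)

def mflops(clock_rate_hz, cycles_per_fp_instruction):
    """Millions of floating-point operations per second."""
    return clock_rate_hz / (cycles_per_fp_instruction * 10**6)

clock_rate = 2 * 10**9                 # 2 GHz
mips_rating = mips(clock_rate, 2)      # CPI of 2
# Hypothetical: 4 cycles per floating-point instruction.
mflops_rating = mflops(clock_rate, 4)
print(mips_rating, mflops_rating)
```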

MIPS and MFLOPS

These can be poor metrics for comparing different processors:

  • some implement FP division (e.g. Pentium)
  • some don’t (e.g. SPARC)

Instruction counts:

  • processors may have different instruction sets (so the ICs will be different)
  • the instruction counts for complex operations like sine and cosine may be quite large
  • so these differences can be significant

Improving Performance

Generally:

  • optimise for the common case

Improving Performance

However, (particularly) with computer hardware:

  • optimisation is expensive (it requires substantial investment)

So:

  • we need to decide where to invest in optimisation, and
  • we need to know that the payback is going to be worth it

Speedup

Consider some possible hardware or software enhancement.

Speedup:

  • $\text{speedup} = \frac{\text{performance without enhancement}}{\text{performance with enhancement}}$

Note

"Performance", here, is a time measure such as response time, so a smaller value is better. With speedup, larger values are better.

Speedup — Example

Example:

  • a baseline implementation might execute a job in 3 seconds
  • with some enhancement, that might be reduced to 2 seconds

Speedup:

  • 3/2 = 1.5

Important Gotcha!

Typically:

  • only a portion of an entire job will be sped up by any proposed enhancement

Example:

  • sort the contents of a disk file, storing the sorted results back in a new file on disk
    • so: read data in, sort it, write data out
  • an enhanced sorting algorithm can only improve the CPU costs, not the IO costs
  • an enhanced IO subsystem can only improve the IO costs, not the sorting costs

Example

Assume:

  • some job involving sub-jobs A and B
  • B accounts for 70% of the execution time, A the rest

Given a proposed enhancement:

  • running B 20 times faster

How much faster would our job run overall?

Amdahl’s Law — Example

Overall speedup:

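  • $\text{speedup} = \frac{1}{(1 - P) + P/S}$, where $P$ is the fraction of the execution time that is enhanced and $S$ is the speedup of that fraction
  • $= \frac{1}{(1 - 0.7) + 0.7/20}$
  • $= \frac{1}{0.3 + 0.035}$
  • $\approx 2.985$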
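The same calculation as a short Python sketch, where `p` is the fraction of execution time that is enhanced and `s` is the speedup of that fraction. The function name `amdahl_speedup` is an illustrative choice.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the time runs s times faster."""
    return 1 / ((1 - p) + p / s)

# Sub-job B is 70% of the execution time and runs 20 times faster.
overall = amdahl_speedup(0.7, 20)
print(f"overall speedup: {overall:.3f}")
```

Trying larger values of `s` shows the result creeping towards, but never reaching, 1 / 0.3: the untouched 30% of the job sets the ceiling.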

Example

Given a proposed enhancement:

  • running B 20 times faster

How much faster would our job run overall?

It will run about three times faster:

  • this may be less than you intuitively expected.

Another Example

Amdahl’s law also allows comparison between two or more design alternatives.

Another Example

Example:

  • a program spends:
    • half its time doing floating-point operations
    • this includes 20% of its total time spent calculating floating-point square roots

Alternative optimisations:

  1. Add floating-point square root hardware which speeds up such operations by a factor of 10.
  2. Make all floating-point operations run twice as fast.

Engineering

Assuming we can only choose one:

  • in which of these optimisations should we invest?

Engineering — First Case

Optimisation:

  • Add floating-point square root hardware which speeds up such operations by a factor of 10.

Amdahl’s law:

  • $\text{speedup} = \frac{1}{0.8 + 0.2/10} \approx 1.22$

Engineering — Second Case

Optimisation:

  • Make all floating-point operations run twice as fast.

Amdahl’s law:

  • $\text{speedup} = \frac{1}{0.5 + 0.5/2} \approx 1.33$

So, under these assumptions, the second approach looks like the better investment.
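The comparison of the two design alternatives can be reproduced in Python. This sketch uses the fractions given in the example (20% of the time in square roots, 50% in floating point overall); the function name and the dictionary labels are illustrative.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the time runs s times faster."""
    return 1 / ((1 - p) + p / s)

options = {
    "square-root hardware (10x on 20% of the time)": amdahl_speedup(0.2, 10),
    "all floating point 2x (on 50% of the time)": amdahl_speedup(0.5, 2),
}

for name, speedup in options.items():
    print(f"{name}: {speedup:.2f}")

# Pick the alternative with the larger overall speedup.
best = max(options, key=options.get)
print("better investment:", best)
```

A modest improvement to half of the job beats a tenfold improvement to a fifth of it.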

Corollary

Amdahl’s law tells us to:

  • make the common case fast!

Or:

  • we can never see a big speedup by optimising the uncommon case
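One way to see why, using the same $P$ (fraction of time enhanced) and $S$ (speedup of that fraction) as in Amdahl's law: even if the enhanced portion became infinitely fast, the overall speedup is bounded by the part that is left untouched.

```latex
\[
  \lim_{S \to \infty} \frac{1}{(1 - P) + P/S} \;=\; \frac{1}{1 - P}
\]
```

For example, with $P = 0.2$ the overall speedup can never exceed $1 / 0.8 = 1.25$, whereas with $P = 0.7$ the bound is $1 / 0.3 \approx 3.33$.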