Understanding Cache Memory and CPU Performance, Exercises of Advanced Computer Architecture

These exercises cover cache memory, its role in improving CPU performance, and the concept of clock cycles per instruction (CPI). They also discuss the stored program concept, different instruction sets, and the impact of instruction mix on performance. Additionally, they cover the CPU performance equation and the factors affecting CPU performance.

Typology: Exercises

2015/2016

Uploaded on 07/11/2016

Punitha.Suresh

UNIT I

FUNDAMENTALS OF COMPUTER DESIGN

Review of Fundamentals of CPU - Memory and IO - Trends in technology, Power, energy and cost - Dependability - Performance Evaluation.

PART A (2-MARKS)

  1. Define computer architecture. It is defined as the functional operation of the individual hardware units in a computer system, and the flow of information among and the control of those units.
  2. What is meant by cache memory? A memory that is smaller and faster than main memory and that is interposed between the CPU and main memory. The cache acts as a buffer for recently used memory locations.
  3. What is locality of reference? Many instructions in localized areas of the program are executed repeatedly during some time period, while the remainder of the program is accessed relatively infrequently. This is referred to as locality of reference.
  4. Specify the types of DMA transfer techniques.
    • Single transfer mode (cycle-stealing mode)
    • Block transfer mode (burst mode)
    • Demand transfer mode
    • Cascade mode
  5. What is an interrupt?

An interrupt is an event that causes the execution of one program to be suspended and another to be executed.

  6. What are embedded computers? List their characteristics.

Embedded computers are computers that are lodged in other devices where the presence of the computer is not immediately obvious. These devices range from everyday machines to handheld digital devices. They have a wide range of processing power and cost.

  7. Define Response Time and Throughput.

Response time is the time between the start and the completion of an event, also referred to as execution time or latency. Throughput is the total amount of work done in a given amount of time.

  8. Mention the use of transaction processing benchmarks.

They measure the ability of a system to handle transactions, which consist of database accesses and updates. An airline reservation system and a bank ATM are examples of transaction processing (TP) systems.

  9. State Amdahl's law.
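
Amdahl's law is commonly stated as: the overall speedup gained from an enhancement is limited by the fraction of execution time in which the enhancement can be used, that is, Speedup = 1 / ((1 - f) + f / s), where f is the fraction of time enhanced and s is the speedup of that fraction. A minimal Python sketch of this usual textbook form follows; the function name `amdahl_speedup` and the sample values are illustrative assumptions, not taken from these notes.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only part of the execution time is improved.

    fraction_enhanced: share of the original execution time that benefits (0..1).
    speedup_enhanced:  how many times faster that share becomes.
    """
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# Illustrative values only: 40% of the time is made 10 times faster.
overall = amdahl_speedup(0.4, 10.0)
print(f"Overall speedup: {overall:.4f}")
```

Even a 10x improvement yields a much smaller overall gain when it applies to only part of the execution time.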
  10. Define CPI.

Clock cycles per instruction (CPI) is the average number of clock cycles each instruction takes to execute. CPI = CPU clock cycles / Instruction count.
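
The definition above can be applied to an instruction mix: total cycles divided by total instructions. The sketch below is a minimal illustration; the function name `average_cpi` and the three-class mix are hypothetical and do not come from these notes.

```python
def average_cpi(mix):
    """Return CPI = CPU clock cycles / instruction count.

    mix: list of (instruction_count, cycles_per_instruction) pairs,
         one pair per instruction class.
    """
    total_instructions = sum(count for count, _ in mix)
    total_cycles = sum(count * cycles for count, cycles in mix)
    return total_cycles / total_instructions


# Hypothetical mix: 50 ALU ops at 1 cycle, 30 loads/stores at 2, 20 branches at 3.
example_mix = [(50, 1), (30, 2), (20, 3)]
print(f"Average CPI: {average_cpi(example_mix):.2f}")
```

Because CPI is an average, shifting the mix toward cheaper instruction classes lowers it even when no individual instruction gets faster.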

  11. Define stored program concept.
    • Programs and their data are stored in the same high-speed memory.
    • It enables a program to modify its own instructions.
  12. Distinguish between static RAM and dynamic RAM.

Static RAMs are fast, but they are costly because their cells require several transistors. Less expensive RAM can be implemented if simpler cells are used. However, such cells do not retain their state indefinitely and must be refreshed periodically; hence they are called dynamic RAM.

  13. Differentiate between RISC and CISC.

S.No | RISC                                               | CISC
-----|----------------------------------------------------|----------------------------------------------------
1    | Reduced instruction set computer                   | Complex instruction set computer
2    | Simple instructions take one cycle per instruction | Complex instructions take multiple cycles per operation
3    | Few instructions and address modes are used        | Many instructions and address modes are used
4    | Fixed format instructions are used                 | Variable format instructions are used
5    | Multiple register sets are used                    | A single register set is used

PART B (16-MARKS)

  1. Explain about the benchmarks to evaluate the performance equation.
    • Two primary metrics: wall clock time (response time for a program) and throughput (jobs performed in unit time)
    • To optimize throughput, we must ensure that there is minimal waste of resources
    • Clock cycles per instruction (CPI): average number of clock cycles per instruction for a program or program fragment
    • Performance is measured with benchmark suites: collections of programs that are likely to be relevant to the user
        - SPEC CPU 2006: CPU-oriented programs (for desktops)
        - SPECweb, TPC: throughput-oriented (for servers)
        - EEMBC: for embedded processors/workloads
    • Consider 25 programs from a benchmark set: how do we capture the behavior of all 25 programs with a single number? One candidate is the sum of execution times.

               P1    P2    P3
        Sys-A  10     8    25
        Sys-B  12     9    20
        Sys-C   8     8    30

    • CPI (cycles per instruction) or IPC (instructions per cycle) cannot be accurately estimated analytically
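
The question of summarizing several programs with one number can be explored with a short Python sketch using the table above. The sum of execution times is the summary named in the notes; the geometric mean is added here only as a commonly used alternative and is an assumption, as are the helper names and the reading of the third column as P3.

```python
import math

# Execution times from the table above (third column assumed to be P3).
times = {
    "Sys-A": [10, 8, 25],
    "Sys-B": [12, 9, 20],
    "Sys-C": [8, 8, 30],
}


def total_time(ts):
    """Sum of execution times across all programs."""
    return sum(ts)


def geometric_mean(ts):
    """Geometric mean of execution times (an alternative summary, not from the notes)."""
    return math.exp(sum(math.log(t) for t in ts) / len(ts))


for name, ts in times.items():
    print(f"{name}: sum = {total_time(ts)}, geometric mean = {geometric_mean(ts):.2f}")

best_by_sum = min(times, key=lambda s: total_time(times[s]))
best_by_gm = min(times, key=lambda s: geometric_mean(times[s]))
print(f"Lowest sum: {best_by_sum}; lowest geometric mean: {best_by_gm}")
```

With these numbers the two summaries rank the systems differently (Sys-B has the lowest sum, Sys-C the lowest geometric mean), which illustrates why the choice of summary statistic matters.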
  2. State the CPU performance equation and discuss the factors that affect performance.

Comparing the performance of different computers is complicated for both purchasers and designers. Therefore, understanding how to measure performance, and the limitations of performance measurements, is important in selecting a computer.

Performance Evaluation

  • Computer performance evaluation is primarily based on throughput and response time.
  • The computer user is interested in reducing response time—the time between the start and the completion of an event—also referred to as execution time.
  • The manager of a large data processing center may be interested in increasing throughput—the total amount of work done in a given time.
  • To maximize performance, we want to minimize response time or execution time for some task.
  • The relation between performance and execution time for a computer X:

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

This means that for two computers X and Y, if the performance of X is greater than the performance of Y, we have

$$\text{Performance}_X > \text{Performance}_Y$$

$$\frac{1}{\text{Execution time}_X} > \frac{1}{\text{Execution time}_Y}$$

$$\text{Execution time}_Y > \text{Execution time}_X$$

That is, the execution time on Y is longer than that on X if X is faster than Y.

The number of clock cycles required for a program can be written as

$$\text{CPU clock cycles} = \text{Instructions for a program} \times \text{Average clock cycles per instruction}$$

Table 1: The basic components of performance and their units of measure.

S.No | Components of performance          | Units of measure
-----|------------------------------------|----------------------------------------------
1    | CPU execution time for a program   | Seconds for the program
2    | Instruction count                  | Instructions executed for the program
3    | Clock cycles per instruction (CPI) | Average number of clock cycles per instruction
4    | Clock cycle time                   | Seconds per clock cycle

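The Classic CPU Performance Equation

$$\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}$$

Relative Performance Example

If computer A runs a program in 10 seconds and computer B runs the same program in 15 seconds, how much faster is A than B? We know that A is n times faster than B if

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = n$$

Here n = 15 s / 10 s = 1.5, so A is 1.5 times faster than B.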

  • We can improve performance by reducing either the length of the clock cycle or the number of clock cycles required for a program.
  • Hardware designers must often trade off clock rate against cycle count.

Using the Performance Equation

  • Computers A and B implement the same Instruction Set Architecture (ISA).
  • Computer A has a clock cycle time of 250 ps and an effective CPI of 2.0 for some program, and computer B has a clock cycle time of 500 ps and an effective CPI of 1.2 for the same program. Which computer is faster, and by how much?

Each computer executes the same number of instructions, I, so

CPU timeA = I x 2.0 x 250 ps = 500 x I ps

CPU timeB = I x 1.2 x 500 ps = 600 x I ps

Clearly, A is faster … by the ratio of execution times

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$
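
The comparison of computers A and B can be checked numerically. This is a minimal sketch: the function name `cpu_time_ps` and the concrete instruction count are assumptions chosen for illustration (the count cancels out of the ratio).

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = instruction count x CPI x clock cycle time (in picoseconds)."""
    return instruction_count * cpi * cycle_time_ps


# Hypothetical instruction count; any positive value gives the same ratio.
instruction_count = 1_000_000

time_a = cpu_time_ps(instruction_count, cpi=2.0, cycle_time_ps=250)
time_b = cpu_time_ps(instruction_count, cpi=1.2, cycle_time_ps=500)

# Performance_A / Performance_B = Execution time_B / Execution time_A
ratio = time_b / time_a
print(f"A is {ratio:.2f} times faster than B")
```

Computer A wins despite its higher CPI because its clock cycle is half as long.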

Improving Performance Example

A program runs on computer A with a 2 GHz clock in 10 seconds. What clock rate must computer B run at to run this program in 6 seconds? Unfortunately, to accomplish this, computer B will require 1.2 times as many clock cycles as computer A to run the program.

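$$\text{CPU time}_A = \frac{\text{CPU clock cycles}_A}{\text{Clock rate}_A}$$

$$\text{CPU clock cycles}_A = 10\ \text{s} \times 2 \times 10^{9}\ \text{cycles/s} = 20 \times 10^{9}\ \text{cycles}$$

$$\text{CPU time}_B = \frac{1.2 \times \text{CPU clock cycles}_A}{\text{Clock rate}_B} = \frac{1.2 \times 20 \times 10^{9}\ \text{cycles}}{\text{Clock rate}_B}$$

$$\text{Clock rate}_B = \frac{1.2 \times 20 \times 10^{9}\ \text{cycles}}{6\ \text{s}} = 4 \times 10^{9}\ \text{cycles/s} = 4\ \text{GHz}$$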
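
The same calculation can be sketched in Python. The function name `required_clock_rate_hz` and its parameter names are hypothetical; the numbers are those of the example above.

```python
def required_clock_rate_hz(cycles_on_a, cycle_factor, target_time_s):
    """Clock rate needed to finish cycle_factor * cycles_on_a cycles in target_time_s."""
    return cycle_factor * cycles_on_a / target_time_s


# Computer A: 10 seconds at 2 GHz gives the cycle count of the program on A.
cycles_a = 10 * 2e9

# Computer B needs 1.2 times as many cycles and must finish in 6 seconds.
clock_rate_b = required_clock_rate_hz(cycles_a, cycle_factor=1.2, target_time_s=6)
print(f"Required clock rate for B: {clock_rate_b / 1e9:.1f} GHz")
```

Computer B must therefore double the clock rate of A to cut the run time from 10 s to 6 s while executing 20% more cycles.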

Instruction Count and CPI

$$\text{Clock cycles} = \text{Instruction count} \times \text{Cycles per instruction}$$

or

$$\text{CPU time} = \frac{\text{Instruction count} \times \text{CPI}}{\text{Clock rate}}$$

These equations separate the three key factors that affect performance:

  • The CPU execution time can be measured by running the program.
  • The clock rate is usually given.
  • The overall instruction count can be measured by using profilers or simulators.
  • CPI varies by instruction type and ISA implementation.

Performance Summary

Performance depends on:

  • Algorithm: affects instruction count (IC), possibly CPI
  • Programming language: affects IC, CPI
  • Compiler: affects IC, CPI
  • Instruction set architecture: affects IC, CPI, and clock cycle time (Tc)

  3. Explain about the basics of a computer and CPU.
    • The same components are found in all kinds of computers: desktop, server, embedded
    • Input/output includes
        - User-interface devices: display, keyboard, mouse
        - Storage devices: hard disk, CD/DVD, flash
        - Network adapters: for communicating with other computers

  • A modern computer consists of hardware and software.
  • The hardware has five types of functional units (figure 1.2): memory unit, data path unit (ALU), control unit, input unit and output unit.

Input unit:

A mechanism through which the computer is fed information. E.g. keyboard, joystick, trackball and mouse.

Mouse:

  • The original mouse was electromechanical and used a large ball that, when rolled across a surface, would cause an x and a y counter to be incremented.
  • The amount of increase in each counter told how far the mouse had been moved.
  • Nowadays, the electromechanical mouse has been replaced by the optical mouse.
  • Replacing it with the optical mouse reduces cost and increases reliability.
  • The optical mouse includes a light-emitting diode (LED) to provide lighting and a tiny black-and-white camera.
  • The LED is underneath the mouse; the camera takes 1500 sample pictures per second.
  • The sample pictures are sent to an optical processor that compares the images and determines how far the mouse has moved.

Memory unit:

  • The memory inside the computer is volatile storage, such as DRAM, that retains data only while it is receiving power.
  • Secondary memory (nonvolatile storage) is a form of storage that retains data even in the absence of a power source and that is used to store programs between runs. E.g. magnetic disks.

Types

  • Volatile memory:
    • Data is lost when the power is turned off; it is used to hold data and programs while they are running. E.g. DRAM
  • Non-volatile memory:
    • It retains data even in the absence of a power source and is used to store programs between runs. E.g. magnetic disk, flash memory, optical disk
    • A magnetic disk consists of a collection of platters, which rotate on a spindle at 5400 revolutions/min.
    • The metal platters are covered with magnetic recording material on both sides.

Optical disk: includes both Compact Disk (CD) and Digital Video Disk (DVD)

  • Read-only CD/DVD:
    • Data is recorded in a spiral fashion, with individual bits being recorded by burning small pits.
    • The disk is read by shining a laser at the CD surface and determining, by examining the reflected light, whether there is a pit or a flat surface.
  • Rewritable CD/DVD: uses a different recording surface made of a crystalline, reflective material; pits are formed that are not reflective.
  • Erasing a CD/DVD: the surface is heated and cooled slowly, allowing an annealing process to restore the surface recording layer to its crystalline structure.

Output unit:

A mechanism that conveys the result of a computation to a user, such as a display, or to another computer. E.g. printer, floppy disk, DVD

Monitor:

  • Most laptop and desktop computers use a Liquid Crystal Display (LCD) to get a thin, low-power display.
  • A tiny transistor switch at each pixel controls the current and makes sharper images.
  • The image is composed of a matrix of picture elements, or pixels, which can be represented as a matrix of bits called a bitmap.
  • Depending on the size of the screen and the resolution, the display matrix ranges in size from 640 × 480 to 2560 × 1600.
  • A red, green, blue (RGB) value associated with each dot on the display determines the intensity of the three color components in the final image.
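
To get a feel for the display matrix sizes quoted above, the following Python sketch estimates how much memory one full-screen bitmap needs. The 24 bits per pixel (8 bits each for red, green and blue) is an assumption made for illustration, as is the function name `bitmap_bytes`; the notes do not state a color depth.

```python
def bitmap_bytes(width, height, bits_per_pixel=24):
    """Bytes needed to store one bitmap of the given size.

    bits_per_pixel=24 assumes 8 bits per RGB component (an assumption,
    not a figure from the notes).
    """
    return width * height * bits_per_pixel // 8


for width, height in [(640, 480), (2560, 1600)]:
    pixels = width * height
    size = bitmap_bytes(width, height)
    print(f"{width} x {height}: {pixels} pixels, {size} bytes")
```

Under this assumption the smallest matrix needs a little under 1 MB and the largest about 12 MB per frame.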

  • Defect rate is determined by the manufacturing process
  • Die area is determined by architecture and circuit design

Ultra Large-Scale Integrated

  • CPU execution time, and so on.
  • Throughput - Also called bandwidth. It is the number of tasks completed per unit time.
  • To maximize performance, we want to minimize response time or execution time for some task.