Reality Check - High Performance Computing - Lecture Slides, Slides of Computer Science

Some concepts of High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, Memory Hierarchy Progression. The main points of this lecture are: Reality Check, Real Caches, Physical Addresses, Virtual Addresses, Pipelining, Caches, Address Translation, Physical Addressed Cache, Virtual Addressed Cache.

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan
High Performance Computing
Lecture 32


Reality Check

• Question 1: Are real caches built to work on virtual addresses or physical addresses?

• Question 2: Do modern processors use pipelining of the kind that we studied?

Which is less preferable?

• Physical addressed cache
  • Hit time is higher (cache access starts only after address translation)
• Virtual addressed cache
  • Data/instructions of different processes with the same virtual address can be in the cache at the same time …
    • Flush the cache on a context switch, or
    • Include the process id as part of each cache directory entry
  • Synonyms
    • Virtual addresses that translate to the same physical address
    • More than one copy of a block in the cache …

Another possibility: Overlapped operation

[Figure: the virtual address goes to the MMU and the cache at the same time; the MMU produces the physical address]

• Indexing into cache directory using virtual address
• Tag comparison using physical address
• Result: Virtual indexed physical tagged cache

Physical Indexed Physical Tagged Cache

Example: 16KB page size; 64KB direct mapped cache with 32B block size

• Virtual Address: Virtual Page No (18 bits) | Page Offset (14 bits)
• The MMU translates it to the Physical Address: Physical Page No (18 bits) | Page Offset (14 bits)
• The Physical Cache splits the physical address into: Cache Tag (16 bits) | C-Index (11 bits) | Block offset (5 bits)

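Virtual Indexed Physical Tagged Cache

• Virtual Address: VPN (18 bits) | Page Offset (14 bits)
• The cache is indexed with bits of the virtual address: C-Index (11 bits) | Block offset (5 bits)
• At the same time, the MMU translates the VPN into the PPN (18 bits) of the Physical Address
• The PPN is compared with the cache tag to decide Hit/Miss
• C-Index + Block offset = 16 bits, but the Page Offset is only 14 bits, so 2 of the index bits come from the VPN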


Q2: High Performance Pipelined Processors

• Pipelining
  • Overlaps execution of consecutive instructions
  • Improves processor performance
• Current processors use more aggressive techniques for more performance
• Some exploit Instruction Level Parallelism: often, many consecutive instructions are independent of each other and can be executed in parallel (at the same time)

Instruction Level Parallelism Processors

• Challenge: identifying which instructions are independent
• Approach 1: build processor hardware to analyze and keep track of dependences
• Approach 2: compiler does analysis and packs suitable instructions together for parallel execution by processor
  • VLIW (very long instruction word) processors

Agenda

  1. Program execution: Compilation, Object files, Function call and return, Address space, Data & its representation (4)
  2. Computer organization: Memory, Registers, Instruction set architecture, Instruction processing (6)
  3. Virtual memory: Address translation, Paging (4)
  4. Operating system: Processes, System calls, Process management (6)
  5. Pipelined processors: Structural, data and control hazards, impact on programming (4)
  6. Cache memory: Organization, impact on programming (5)
  7. Program profiling (2)
  8. File systems: Disk management, Name management, Protection (4)
  9. Parallel programming: Inter-process communication, Synchronization, Mutual exclusion, Parallel architecture, Programming with message passing using MPI (5)

Timing

Timing: measuring the time spent in specific parts of your program
  • Examples of 'parts': functions, loops, …
  • Recall: different kinds of time can be measured (real/wallclock/elapsed vs virtual/CPU)

1. Decide
  • which time you are interested in measuring
  • at what granularity

2. Find out what mechanisms are available and their granularity of measurement

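time command

Usage: % time a.out

Example: % time ls
0.00user 0.002sys 0:0.003elapsed

Example: % time man csh
0.268user 0.032sys 0:15.486elapsed

Reports real/elapsed/wallclock time, CPU time in user mode, and CPU time in system mode. In the second example the elapsed time is far larger than user + system CPU time, because the process spends most of that time waiting rather than running on the CPU.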

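Using gettimeofday( )

gettimeofday( ) returns the current wallclock time (seconds and microseconds), so the difference between two calls is elapsed (real) time.

Your C program:

#include <stdio.h>
#include <sys/time.h>

struct timeval before, after;
gettimeofday(&before, NULL);
/* region of program you want to time */
gettimeofday(&after, NULL);
printf("%f\n", (after.tv_sec - before.tv_sec)
               + (after.tv_usec - before.tv_usec) / 1e6);   /* elapsed seconds */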

High resolution, real timers

• Most modern processors provide a hardware cycle counting mechanism:
  1. A special purpose register that is incremented every clock cycle
  2. An instruction to read the value in that register
• Example: Intel® time stamp counter and rdtsc instruction