Revolution in Computer Architecture: Old Conventional Wisdom vs. New Conventional Wisdom, Slides of Electronics engineering

These slides describe the shift in conventional wisdom in computer architecture: from the belief that chips are reliable internally and power is free, to acknowledging the high error rates and power costs of sub-65nm technologies. They also cover the difficulty of building believable prototypes, the old view that compilers and architecture innovations take over a decade to be adopted, and the sea change in chip design toward multiple cores or processors per chip.

Typology: Slides

2012/2013

Uploaded on 03/23/2013

dhrupad

High Level Message

  • Everything is changing; old conventional wisdom is out
  • We DESPERATELY need a new architectural solution for microprocessors based on parallelism
  • Need to create a “watering hole” to bring everyone together to quickly find that solution
    • architects, language designers, application experts, numerical analysts, algorithm designers, programmers, …
Docsity.com

Outline

• Part I: A New Agenda for Computer Architecture
  • Old Conventional Wisdom vs. New Conventional Wisdom
• Part II: A “Watering Hole” for Parallel Systems
  • Research Accelerator for Multiple Processors
• Conclusion

Conventional Wisdom (CW) in Computer Architecture

  • Old CW: Power is free, transistors expensive
  • New CW: “Power wall”: power expensive, transistors free (can put more on chip than can afford to turn on)
  • Old CW: Multiplies are slow, memory access is fast
  • New CW: “Memory wall”: memory slow, multiplies fast (200 clocks to DRAM memory, 4 clocks for FP multiply)
  • Old CW: Increasing Instruction Level Parallelism via compilers, innovation (out-of-order, speculation, VLIW, …)
  • New CW: “ILP wall”: diminishing returns on more ILP
  • New CW: Power Wall + Memory Wall + ILP Wall = Brick Wall
    • Old CW: Uniprocessor performance 2X / 1.5 yrs
    • New CW: Uniprocessor performance only 2X / 5 yrs?
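
The two doubling times on this slide can be restated as annual growth rates, which makes them easier to compare with per-year figures. The sketch below is only arithmetic on the slide's numbers; the function names are illustrative and not from the slides. Doubling every 1.5 years works out to roughly 59% per year, while doubling every 5 years is roughly 15% per year.

```python
import math


def annual_rate_from_doubling(years_to_double):
    """Annual growth rate (as a fraction) implied by doubling every `years_to_double` years."""
    return 2.0 ** (1.0 / years_to_double) - 1.0


def doubling_time_from_rate(annual_rate):
    """Years needed to double at a given annual growth rate (as a fraction)."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    for label, years in (("Old CW, 2X / 1.5 yrs", 1.5), ("New CW, 2X / 5 yrs", 5.0)):
        rate = annual_rate_from_doubling(years)
        print(f"{label}: about {rate:.0%} per year")
```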

Uniprocessor Performance (SPECint)

[Figure: log-scale plot of performance (vs. VAX-11/780, axis from 1 to 10000) by year, 1978 to 2006, with trend segments labeled 25%/year, 52%/year, and ??%/year, and a “3X” annotation]

  • VAX: 25%/year 1978 to 1986
  • RISC + x86: 52%/year 1986 to 2002
  • RISC + x86: ??%/year 2002 to present

From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006

Sea change in chip design: multiple “cores” or processors per chip
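
To get a feel for what the per-year rates in this figure add up to, the following sketch compounds each rate over its stated period. It uses only the rates and year ranges listed on the slide; the helper name and the `ERAS` list are illustrative assumptions. At 25%/year, 1978 to 1986 gives roughly a 6X gain, and at 52%/year, 1986 to 2002 gives roughly an 800X gain.

```python
def cumulative_speedup(annual_rate, start_year, end_year):
    """Total speedup from compounding `annual_rate` (a fraction) over the given years."""
    return (1.0 + annual_rate) ** (end_year - start_year)


# (label, annual rate, start year, end year), taken from the slide's bullets.
ERAS = [
    ("VAX", 0.25, 1978, 1986),
    ("RISC + x86", 0.52, 1986, 2002),
]

if __name__ == "__main__":
    total = 1.0
    for label, rate, start, end in ERAS:
        gain = cumulative_speedup(rate, start, end)
        total *= gain
        print(f"{label}: about {gain:.0f}X from {start} to {end}")
    print(f"Combined: about {total:.0f}X from 1978 to 2002")
```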

Déjà vu all over again?

“… today’s processors … are nearing an impasse as technologies approach the speed of light..” David Mitchell, The Transputer: The Time Is Now (1989)

  • Transputer had bad timing (Uniprocessor performance↑) ⇒ Procrastination rewarded: 2X seq. perf. / 1.5 years
  • “We are dedicating all of our future product development to multicore designs. … This is a sea change in computing” Paul Otellini, President, Intel (2005)
  • All microprocessor companies switch to MP (2X CPUs / 2 yrs) ⇒ Procrastination penalized: 2X sequential perf. / 5 yrs

Manufacturer/Year    AMD/’05    Intel/’06    IBM/’04    Sun/’
Processors/chip      2          2            2
Threads/Processor    1          2            2
Threads/chip         2          4            4          32
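
The table's last row is the product of the two rows above it, and the “2X CPUs / 2 yrs” trend can be extrapolated the same way as any doubling rule. The sketch below checks the three complete columns and projects a CPU count; the Sun column is left out because its per-chip values are missing from this extract. The dictionary and function names are illustrative, and the projection simply assumes the doubling trend continues unchanged.

```python
# (processors per chip, threads per processor) for the columns that are complete.
CHIPS = {
    "AMD/'05": (2, 1),
    "Intel/'06": (2, 2),
    "IBM/'04": (2, 2),
}


def threads_per_chip(processors, threads_per_processor):
    """Hardware threads on one chip."""
    return processors * threads_per_processor


def projected_cpus(start_cpus, years, doubling_years=2.0):
    """CPUs per chip after `years`, assuming the count doubles every `doubling_years`."""
    return start_cpus * 2.0 ** (years / doubling_years)


if __name__ == "__main__":
    for name, (procs, threads) in CHIPS.items():
        print(name, threads_per_chip(procs, threads))
    print("2 CPUs after 10 years of 2X / 2 yrs:", projected_cpus(2, 10))
```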

21st Century Computer Architecture

  • Old CW: Since cannot know future programs, find set of old programs to evaluate designs of computers for the future
    • E.g., SPEC
  • What about parallel codes?
    • Few available, tied to old models, languages, architectures, …
  • New approach: Design computers of future for numerical methods important in future
  • Claim: key methods for next decade are 7 dwarves (+ a few), so design for them!
    • Representative codes may vary over time, but these numerical methods will be important for > 10 years

6/11 Dwarves Cover 24/30 SPEC

  • SPECfp
    • 8 Structured grid
      • 3 using Adaptive Mesh Refinement
    • 2 Sparse linear algebra
    • 2 Particle methods
    • 5 TBD: Ray tracer, Speech Recognition, Quantum Chemistry, Lattice Quantum Chromodynamics (many kernels inside each benchmark?)
  • SPECint
    • 8 Finite State Machine
    • 2 Sorting/Searching
    • 2 Dense linear algebra (data type differs from dwarf)
    • 1 TBD: 1 C compiler (many kernels?)
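
The headline numbers on this slide follow from the per-category counts, which the sketch below tallies. The counts are copied from the bullets; the variable and function names are illustrative. The “3 using Adaptive Mesh Refinement” entry is treated as a subset of the 8 structured-grid codes, so it is not counted separately.

```python
# Benchmarks assigned to a dwarf, per suite (counts from the slide).
SPECFP = {"Structured grid": 8, "Sparse linear algebra": 2, "Particle methods": 2}
SPECINT = {"Finite State Machine": 8, "Sorting/Searching": 2, "Dense linear algebra": 2}
# Benchmarks the slide marks as TBD (not yet assigned to a dwarf).
SPECFP_TBD = 5
SPECINT_TBD = 1


def coverage():
    """Return (dwarves used, benchmarks covered, total benchmarks)."""
    dwarves_used = len(SPECFP) + len(SPECINT)
    covered = sum(SPECFP.values()) + sum(SPECINT.values())
    total = covered + SPECFP_TBD + SPECINT_TBD
    return dwarves_used, covered, total


if __name__ == "__main__":
    dwarves, covered, total = coverage()
    print(f"{dwarves} dwarves cover {covered}/{total} SPEC benchmarks")
```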

21st Century Code Generation

• Old CW: Takes a decade for compilers to introduce an architecture innovation
• New approach: “Auto-tuners” first run variations of program on computer to find best combinations of optimizations (blocking, padding, …) and algorithms, then produce C code to be compiled for that computer
  • E.g., PHiPAC (BLAS), ATLAS (BLAS), Sparsity (Sparse linear algebra), Spiral (DSP), FFTW
  • Can achieve 10X over conventional compiler
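
The auto-tuning idea described on this slide can be sketched in a few lines: run several variants of a kernel on the target machine, time each, and keep the fastest. The example below is a toy under stated assumptions, not how PHiPAC, ATLAS, or the other tools are implemented. It searches only one parameter (the tile size of a blocked matrix multiply), works in pure Python, and stops at picking a winner instead of generating C code. All names are hypothetical.

```python
import time


def blocked_matmul(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using block x block tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def autotune(n=32, candidates=(4, 8, 16, 32), repeats=3):
    """Time each candidate block size on this machine and return the fastest one."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, n, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    best_block, timings = autotune()
    print("fastest block size on this machine:", best_block)
```

The winning block size is not fixed in the code; it comes out of the measurements, so it can differ from one computer to the next.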

Best Sparse Blocking for 8 Computers

  • All possible column block sizes selected for 8 computers; How could compiler know?

[Figure: grid of row block size (r) versus column block size (c), each taking the values 1, 2, 4, 8, marking the best block size for each of 8 computers: Intel Pentium M; Sun Ultra 2; Sun Ultra 3; AMD Opteron; IBM Power 3; IBM Power 4; Intel/HP Itanium; Intel/HP Itanium 2]
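
One quantity a sparse-kernel tuner might weigh when choosing an r x c block size is the fill ratio: storing a sparse matrix in r x c blocks forces explicit zeros into every partly filled block, so larger blocks cost extra storage and arithmetic. The sketch below computes that ratio for a given nonzero pattern. It is an illustration under assumptions, not the selection procedure used by Sparsity, and the function names are hypothetical. The machine-dependent half of the trade-off (how fast each block shape runs) still has to be measured, which is the slide's point.

```python
def fill_ratio(nonzeros, r, c):
    """Stored entries (including padded zeros) divided by true nonzeros for r x c blocks.

    `nonzeros` is an iterable of (row, column) coordinates of the nonzero entries.
    """
    coords = list(nonzeros)
    blocks = {(i // r, j // c) for i, j in coords}
    return len(blocks) * r * c / len(coords)


def block_diagonal_pattern(n, b):
    """Nonzero coordinates of an n x n matrix made of dense b x b diagonal blocks."""
    return [(i, j) for i in range(n) for j in range(n) if i // b == j // b]


if __name__ == "__main__":
    pattern = block_diagonal_pattern(8, 2)
    for r in (1, 2, 4, 8):
        for c in (1, 2, 4, 8):
            print(f"r={r} c={c} fill ratio={fill_ratio(pattern, r, c):.2f}")
```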

21st Century Measures of Success

• Old CW: Don’t waste resources on accuracy, reliability
  • Speed kills competition
  • Blame Microsoft for crashes
• New CW: SPUR is critical for future of IT
  • Security
  • Privacy
  • Usability (cost of ownership)
  • Reliability
• Success not limited to performance/cost

“20th century vs. 21st century C&C: the SPUR manifesto,” Communications of the ACM, 48:3, 2005.

Parallel Framework – Apps (so far)

  • Original 7 dwarves: 6 data parallel, 1 no coupling TLP
  • Bonus 4 dwarves: 2 data parallel, 2 no coupling TLP
  • EEMBC (Embedded): Stream 10, DLP 19, Barrier TLP 2
  • SPEC (Desktop): 14 DLP, 2 no coupling TLP

[Figure: EEMBC, SPEC, and the dwarfs placed along an axis of parallelism styles (Streaming DLP, DLP, No coupling TLP, Barrier TLP, Tight TLP), with the labels “Most New Architectures” and “Most Important Apps?”]

Outline

• Part I: A New Agenda for Computer Architecture
  • Old Conventional Wisdom vs. New Conventional Wisdom
• Part II: A “Watering Hole” for Parallel Systems
  • Research Accelerator for Multiple Processors
• Conclusion

Build Academic MPP from FPGAs

• As ≈ 25 CPUs will fit in Field Programmable Gate Array (FPGA), 1000-CPU system from ≈ 40 FPGAs?
  • 16 32-bit simple “soft core” RISC at 150MHz in 2004 (Virtex-II)
  • FPGA generations every 1.5 yrs; ≈ 2X CPUs, ≈ 1.2X clock rate
• HW research community does logic design (“gate shareware”) to create out-of-the-box, MPP
  • E.g., 1000 processor, standard ISA binary-compatible, 64-bit, cache-coherent supercomputer @ ≈ 100 MHz/CPU in 2007
  • RAMPants: Arvind (MIT), Krste Asanović (MIT), Derek
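
The sizing on this slide is simple arithmetic, sketched below with illustrative function names. Dividing 1000 CPUs by about 25 CPUs per FPGA gives the 40-FPGA figure. Applying the stated per-generation scaling (2X CPUs and 1.2X clock every 1.5 years) to the 2004 data point of 16 simple 32-bit cores at 150 MHz gives about 64 such cores at about 216 MHz by 2007. The slide's 2007 target of about 25 CPUs per FPGA at about 100 MHz is lower than that projection; note that the target describes 64-bit, cache-coherent CPUs rather than simple 32-bit soft cores, though the slide does not spell out how the two figures relate.

```python
import math


def fpgas_needed(total_cpus, cpus_per_fpga):
    """Number of FPGAs required to host `total_cpus` soft cores."""
    return math.ceil(total_cpus / cpus_per_fpga)


def project(cpus, clock_mhz, years, gen_years=1.5, cpu_growth=2.0, clock_growth=1.2):
    """Scale a per-FPGA core count and clock rate forward by `years` of FPGA generations."""
    generations = years / gen_years
    return cpus * cpu_growth ** generations, clock_mhz * clock_growth ** generations


if __name__ == "__main__":
    print("FPGAs for 1000 CPUs at 25 per FPGA:", fpgas_needed(1000, 25))
    cpus, clock = project(16, 150.0, years=2007 - 2004)
    print(f"Simple 32-bit cores per FPGA in 2007: about {cpus:.0f} at about {clock:.0f} MHz")
```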

Why RAMP Good for Research MPP?

                                SMP                     Cluster                 Simulate                RAMP
Scalability (1k CPUs)           C                       A                       A                       A
Cost (1k CPUs)                  F ($40M)                C ($2-3M)               A+ ($0M)                A ($0.1-0.2M)
Cost of ownership               A                       D                       A                       A
Power/Space (kilowatts, racks)  D (120 kw, 12 racks)    D (120 kw, 12 racks)    A+ (.1 kw, 0.1 racks)   A (1.5 kw, 0.3 racks)
Community                       D                       A                       A                       A
Observability                   D                       C                       A+                      A+
Reproducibility                 B                       D                       A+                      A+
Reconfigurability               D                       C                       A+                      A+
Credibility                     A+                      A+                      F                       B+/A-
Perform. (clock)                A (2 GHz)               A (3 GHz)               F (0 GHz)               C (0.1-.2 GHz)
GPA                             C                       B-                      B                       A-
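
The GPA row can be roughly reproduced by averaging the ten letter grades in each column. The sketch below assumes a common 4.0 scale with A+ capped at 4.0 and a split grade such as B+/A- taken as the midpoint; the slides do not state which scale was used, so the numbers are only indicative. Under these assumptions the averages come out at 2.1 (SMP), 2.5 (Cluster), 3.2 (Simulate), and 3.75 (RAMP), which is the same ordering as the slide's C, B-, B, A-.

```python
# Assumed grade points (a common 4.0 scale; not specified by the slides).
POINTS = {"A+": 4.0, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0,
          "B-": 2.7, "C": 2.0, "D": 1.0, "F": 0.0}

# Grades per option, in the table's row order from Scalability to Performance.
GRADES = {
    "SMP": ["C", "F", "A", "D", "D", "D", "B", "D", "A+", "A"],
    "Cluster": ["A", "C", "D", "D", "A", "C", "D", "C", "A+", "A"],
    "Simulate": ["A", "A+", "A", "A+", "A", "A+", "A+", "A+", "F", "F"],
    "RAMP": ["A", "A", "A", "A", "A", "A+", "A+", "A+", "B+/A-", "C"],
}


def grade_points(grade):
    """Points for one grade; a split grade like 'B+/A-' is averaged."""
    parts = grade.split("/")
    return sum(POINTS[p] for p in parts) / len(parts)


def gpa(grades):
    """Unweighted mean of the grade points."""
    return sum(grade_points(g) for g in grades) / len(grades)


if __name__ == "__main__":
    for option, grades in GRADES.items():
        print(f"{option}: {gpa(grades):.2f}")
```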