

























































This lecture examines the changing landscape of computer architecture and the need for new approaches based on parallelism. It challenges old conventional wisdom and highlights the importance of bringing together architects, language designers, application experts, numerical analysts, algorithm designers, and programmers to find solutions. It also discusses the challenges of building large designs at ≤65 nm and the memory wall, and ends with the shift towards increasing parallelism and the industry's bet that breakthroughs will appear before it's too late.
Typology: Lecture notes


























































and a cast of thousands
U.C. Berkeley, January 2007

High Level Message
• Everything is changing
• Old conventional wisdom is out
• We desperately need a new approach to HW and SW based on parallelism, since industry has bet its future that parallelism works
• Need to create a “watering hole” to bring everyone together to quickly find that solution
  • architects, language designers, application experts, numerical analysts, algorithm designers, programmers, …

Conventional Wisdom (CW) in Computer Architecture
• Old CW: Power is free, but transistors expensive
• New CW is the “Power wall”: Power is expensive, but transistors are “free”
  • Can put more transistors on a chip than have the power to turn on
• Old CW: Only concern is dynamic power
• New CW: For desktops and servers, static power due to leakage is 40% of total power
• Old CW: Monolithic uniprocessors are reliable internally, with errors occurring only at pins
• New CW: As chips drop below 65 nm feature sizes, they will have high soft and hard error rates

Conventional Wisdom (CW) in Computer Architecture
• Old CW: By building upon prior successes, continue raising level of abstraction and size of HW designs
• New CW: Wire delay, noise, cross coupling, reliability, clock jitter, design validation, … stretch development time and cost of large designs at ≤65 nm
• Old CW: Researchers demonstrate new architectures by building chips
• New CW: Cost of 65 nm masks, cost of ECAD, and design time for GHz clocks ⇒ Researchers no longer build believable chips
• Old CW: Performance improves latency & bandwidth
• New CW: BW improvement > (latency improvement)²

[Figure: Uniprocessor Performance (SPECint). x-axis: year, 1978 to 2006; y-axis: Performance (vs. VAX-11/780), log scale from 1 to 10000. Trend segments labeled 25%/year, 52%/year, and ??%/year; plot annotation: 3X.]
• VAX: 25%/year 1978 to 1986
• RISC + x86: 52%/year 1986 to 2002
• RISC + x86: ??%/year 2002 to present
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Sept. 15, 2006
⇒ Sea change in chip design: multiple “cores” or processors per chip
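
To get a feel for what the two quoted growth rates mean in total, the sketch below compounds them over the year ranges given on the slide. It assumes each rate is a simple yearly factor applied every year of its era; the function name `cumulative_speedup` is made up for this illustration.

```python
# Compound the annual improvement rates quoted on the slide.
# Assumption: each rate is applied as a constant yearly factor.

def cumulative_speedup(rate_per_year: float, years: int) -> float:
    """Total speedup after compounding `rate_per_year` for `years` years."""
    return (1.0 + rate_per_year) ** years

# (label, annual improvement, start year, end year), all taken from the slide
eras = [
    ("VAX", 0.25, 1978, 1986),
    ("RISC + x86", 0.52, 1986, 2002),
]

total = 1.0
for label, rate, start, end in eras:
    factor = cumulative_speedup(rate, end - start)
    total *= factor
    print(f"{label}: {rate:.0%}/year for {end - start} years -> about {factor:.0f}x")

print(f"Combined 1978 to 2002: about {total:.0f}x relative to the VAX-11/780")
```

Under these assumptions the 52%/year era alone accounts for a factor in the hundreds, which is why a slowdown after 2002 is so visible on a log-scale plot.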

Sea Change in Chip Design
• Intel 4004 (1971): 4-bit processor, 2312 transistors, 0.4 MHz, 10 micron PMOS, 11 mm² chip
• RISC II (1983): 32-bit, 5 stage pipeline, 40,760 transistors, 3 MHz, 3 micron NMOS, 60 mm² chip
• 125 mm² chip, 0.065 micron CMOS = 2312 RISC II+FPU+Icache+Dcache
  • RISC II shrinks to ≈ 0.02 mm² at 65 nm
  • Caches via DRAM or 1 transistor SRAM or 3D chip stacking
  • Proximity Communication via capacitive coupling at > 1 TB/s? (Ivan Sutherland @ Sun / Berkeley)
• Processor is the new transistor!
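
The shrink figures above can be sanity-checked with a rough calculation. The sketch below assumes ideal scaling, where die area shrinks with the square of the feature-size ratio; real processes deviate from this, and the helper name `scaled_area_mm2` is hypothetical.

```python
# Rough check of the RISC II shrink, under an ideal-scaling assumption:
# area scales with (new feature size / old feature size) squared.

def scaled_area_mm2(area_mm2: float, old_feature_um: float, new_feature_um: float) -> float:
    """Area of a design after moving from old_feature_um to new_feature_um."""
    return area_mm2 * (new_feature_um / old_feature_um) ** 2

# Numbers from the slide: 60 mm^2 at 3 micron, shrunk to 0.065 micron (65 nm).
risc2_at_65nm = scaled_area_mm2(60.0, 3.0, 0.065)

# Area available per processor if 2312 of them share a 125 mm^2 chip.
per_core_budget = 125.0 / 2312

print(f"RISC II at 65 nm: about {risc2_at_65nm:.3f} mm^2")
print(f"Budget per RISC II+FPU+Icache+Dcache: about {per_core_budget:.3f} mm^2")
```

With these inputs the shrunken core comes out at a few hundredths of a square millimetre, the same order as the ≈ 0.02 mm² on the slide, and the per-processor budget leaves the rest for the FPU and caches.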

Parallelism again? What’s different this time?
“This shift toward increasing parallelism is not a triumphant stride forward based on breakthroughs in novel software and architectures for parallelism; instead, this plunge into parallelism is actually a retreat from even greater challenges that thwart efficient silicon implementation of traditional uniprocessor architectures.”
  Berkeley View, December 2006
• HW/SW Industry bet its future that breakthroughs will appear before it’s too late

From Multiprogramming to Multithreading
• Multiprogrammed workloads (mix of independent sequential tasks) might obviously benefit from first few generations of multicores
• But how will single tasks get faster on future manycores?
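
One way to see the contrast between the two bullets is to compare throughput for independent tasks with the speedup of a single task. The slides give no formula; the sketch below uses Amdahl's law as a standard illustration, and the 90% parallel fraction and the task and core counts are made-up numbers.

```python
# Illustration only: Amdahl's law is brought in here, it is not from the slides.

def multiprogrammed_throughput(cores: int, tasks: int) -> int:
    """Independent sequential tasks: one task per core, up to the task count."""
    return min(cores, tasks)

def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup of ONE task when only `parallel_fraction` of it can be spread over cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for cores in (2, 8, 64):
    running = multiprogrammed_throughput(cores, tasks=100)
    single = amdahl_speedup(0.9, cores)  # assumed: 90% of the task parallelizes
    print(f"{cores:3d} cores: {running} tasks run at once, single task speeds up {single:.1f}x")
```

In this toy model the multiprogrammed mix keeps scaling with core count, while the single task flattens out unless its sequential part is reduced.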

7 Questions for Parallelism
• Applications:
  1. What are the apps?
  2. What are kernels of apps?
• Hardware:
  3. What are the HW building blocks?
  4. How to connect them?
• Programming Model & Systems Software:
  5. How to describe apps and kernels?
  6. How to program the HW?
• Evaluation:
  7. How to measure success?
(Inspired by a view of the Golden Gate Bridge from Berkeley)

Apps and Kernels
• Old CW: Since cannot know future programs, use old programs to evaluate future computers
  • e.g., SPEC2006, EEMBC
• What about parallel codes?
  • Few, tied to old models, languages, architectures, …
• New approach: Design future computers for patterns of computation and communication important in the future
• Claim: 13 “dwarfs” are key for next decade, so design for them!
  • Representative codes may vary over time, but these dwarfs will be important for > 10 years

Do dwarfs work well outside HPC?
• Examine effectiveness of 7 dwarfs elsewhere
  1. Embedded Computing (EEMBC benchmark)
  2. Desktop/Server Computing (SPEC2006)
  3. Machine Learning
     • Advice from Mike Jordan and Dan Klein of UC Berkeley
  4. Games/Graphics/Vision
  5. Data Base Software
     • Advice from Jim Gray of Microsoft and Joe Hellerstein of UC
• Result: Added 7 more dwarfs, revised 2 original dwarfs, renumbered list

13 Dwarfs (so far)
1. Dense Linear Algebra
2. Sparse Linear Algebra
3. Spectral Methods
4. N-Body Methods
5. Structured Grids
6. Unstructured Grids
7. MapReduce
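
A dwarf names a pattern of computation and communication rather than a particular program. As a minimal sketch of one entry in the list, the MapReduce pattern, the code below splits the input into chunks, processes each chunk independently, and combines the partial results. The sum-of-squares workload and all function names are hypothetical, and the sketch runs sequentially; it only shows the shape of the pattern.

```python
from functools import reduce
from operator import add

def chunks(values, size):
    """Split `values` into consecutive pieces of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def map_phase(chunk):
    """Independent work on one chunk; no communication with other chunks."""
    return sum(x * x for x in chunk)

def map_reduce_sum_of_squares(values, chunk_size=4):
    partials = map(map_phase, chunks(values, chunk_size))  # map: one result per chunk
    return reduce(add, partials, 0)                        # reduce: combine partial results

data = list(range(10))
print(map_reduce_sum_of_squares(data))  # 285
```

Because each call to `map_phase` touches only its own chunk, the map step could be handed to separate cores, and the only communication is the final reduction.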

• Intel Tejas Pentium 4 cancelled due to power issues
• IBM quotes yields of 10 – 20% on 8-processor Cell