



This document gives an overview of parallel computer architecture, its importance, and how its performance is evaluated. It covers the role of computer architecture, the architect's job, the growth rate of the computer market, and the applications of parallel architecture. It also discusses the major sectors of the computer market (desktops, servers, and embedded systems) and their distinct requirements, and then turns to the concept of parallel architecture, the reasons for studying it, and the metrics used to evaluate it.




[From Chapter 1 of Culler, Singh, Gupta]
Computer architecture, as defined by Amdahl, Blaauw, and Brooks in 1964 (IBM 360 team): the structure of a computer that a machine language programmer must understand to write a correct (timing independent) program for that machine. Loosely speaking, it is the science of designing computers, “leading to glorious failures and some notable successes”.
The architect's job is to design and engineer the various parts of a computer system to maximize performance and programmability within the technology limits and cost budget. For microprocessor architecture, the technology limit could mean process or circuit technology. For bigger systems, it could mean interconnect technology (how one component talks to another at the macro level).
Two major architectural reasons lie behind the growth rate. First, the advent of RISC (Reduced Instruction Set Computer) made it easy to implement many aggressive architectural techniques for extracting parallelism. Second, caches were introduced. Both were made easy by Moore’s law. There are two major impacts. First, the highest-performance microprocessors today outperform supercomputers designed less than 10 years ago. Second, microprocessor-based products have come to dominate all sectors of computing: desktops, workstations; minicomputers are replaced by servers; mainframes are replaced by
The computer market has three major sectors. Desktop: ranges from low-end PCs to high-end workstations; the market trend is very sensitive to the price-performance ratio. Server: used in large-scale computing or service-oriented markets such as heavyweight scientific computing, databases, web services, etc.; reliability, availability, and scalability are very important; servers are normally designed for high throughput. Embedded: a fast-growing sector; very price-sensitive; present in most day-to-day appliances such as microwave ovens, washing machines, printers, network switches, palmtops, cell phones, smart cards, and game engines; software is usually specialized or tuned for one particular system.
Requirements are very different in the three sectors, and this difference is the main reason for the different design styles in these three areas. The desktop market demands leading-edge microprocessors and high-performance graphics engines; it must offer balanced performance for a wide range of applications; customers are happy to spend a reasonable amount of money for high performance, i.e., the metric is price-performance. The server market integrates high-end microprocessors into scalable multiprocessors; throughput is very important and could be floating-point, graphics, or transaction throughput. The embedded market adopts high-end microprocessor techniques while paying immense attention to low price and low power; processors are either general purpose (to some extent) or application-specific.
A parallel architecture is a collection of processing elements that co-operate to solve large problems fast. The design questions that need to be answered: How many processing elements (scalability)? How capable is each processor (computing power)? How to address memory (shared or distributed)? How much addressable memory (address bit allocation)? How do the processors communicate (through memory or by messages)? How do the processors avoid data races (synchronization)? How do you answer all of these to achieve the highest performance within your cost envelope?
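Two of the design questions above, how processors communicate and how they avoid data races, can be illustrated with a minimal Python sketch. It is only an analogy: Python threads stand in for processing elements, and the names `NUM_WORKERS`, `ITEMS_PER_WORKER`, `shared_memory_sum`, and `message_passing_sum` are hypothetical, chosen for this example. The first function communicates through a shared variable guarded by a lock; the second keeps all state private and exchanges one message per worker.

```python
import queue
import threading

NUM_WORKERS = 4            # hypothetical number of processing elements
ITEMS_PER_WORKER = 10_000  # hypothetical amount of work per element


def shared_memory_sum() -> int:
    """Workers communicate through one shared variable guarded by a lock."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(ITEMS_PER_WORKER):
            # Synchronization: only one worker at a time may perform
            # the read-modify-write on the shared variable.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum() -> int:
    """Workers keep private state and send one message with a partial result."""
    mailbox = queue.Queue()

    def worker() -> None:
        partial = 0  # private to this worker, so no lock is needed
        for _ in range(ITEMS_PER_WORKER):
            partial += 1
        mailbox.put(partial)  # explicit communication by message

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine exactly one message per worker.
    return sum(mailbox.get() for _ in range(NUM_WORKERS))


if __name__ == "__main__":
    print("shared memory  :", shared_memory_sum())
    print("message passing:", message_passing_sum())
```

Both functions compute the same total. The sketch shows the communication pattern only; it is not meant to demonstrate real speedup, since CPython threads do not run bytecode in parallel in the default build.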
Parallelism helps. There are applications that can be parallelized easily. There are important applications that require an enormous amount of computation (10 GFLOPS to 1 TFLOPS); one example is the headline "NASA taps SGI, Intel for Supercomputers", describing twenty 512-processor SGI Altix systems using Itanium 2. There are also important applications that need to deliver high throughput.
Parallelism is ubiquitous, so one needs to understand the design trade-offs. Microprocessors are now multiprocessors (more later). Today a computer architect’s primary job is to find out how to efficiently extract parallelism. Studying parallel architecture is also a way to get involved in interesting research projects, make an impact, shape future development, and have fun.
Benchmark applications are needed: SPLASH (Stanford ParalleL Applications for SHared memory); SPEC (Standard Performance Evaluation Corp.) OMP; ScaLAPACK (Scalable Linear Algebra PACKage) for message-passing machines; TPC (Transaction Processing Performance Council) for database/transaction processing performance; NAS (Numerical Aerodynamic Simulation) for aerophysics applications, with the NPB2 port to MPI for message-passing only; PARKBENCH (PARallel Kernels and BENCHmarks) for message-passing only. When comparing two different parallel computers, execution time is the most reliable metric. MFLOPS, GFLOPS, or TFLOPS are sometimes used, but they could be misleading. When evaluating a particular machine, use speedup to gauge the scalability of the machine (provided the application itself scales): Speedup(P) = Uniprocessor time / Time on P processors. Normally the input data set is kept constant when measuring speedup.
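The speedup definition above can be sketched in a few lines of Python. The execution times in `measured_times` are made-up numbers for illustration, not measurements from any real machine, and the names `speedup` and `speedup_table` are hypothetical helpers. The same input data set is assumed for every processor count, as the text recommends.

```python
def speedup(uniprocessor_time: float, parallel_time: float) -> float:
    """Speedup(P) = uniprocessor time / time on P processors."""
    if uniprocessor_time <= 0 or parallel_time <= 0:
        raise ValueError("execution times must be positive")
    return uniprocessor_time / parallel_time


def speedup_table(times: dict) -> dict:
    """Map each processor count P to Speedup(P), using times[1] as the baseline."""
    baseline = times[1]
    return {p: speedup(baseline, t) for p, t in sorted(times.items())}


# Hypothetical execution times in seconds for a fixed input data set,
# keyed by the number of processors P (illustrative values only).
measured_times = {1: 120.0, 2: 62.0, 4: 33.0, 8: 20.0}

if __name__ == "__main__":
    for p, s in speedup_table(measured_times).items():
        print(f"P = {p}: speedup = {s:.2f}")
```

In this made-up data the speedup on 8 processors is 6 rather than 8, which is the kind of gap between ideal and observed scaling that the metric is meant to expose.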