In Praise of
Computer Architecture: A Quantitative Approach
“The multiprocessor is here and it can no longer be avoided. As we bid farewell to single-core processors and move into the chip multiprocessing age, it is great timing for a new edition of Hennessy and Patterson’s classic. Few books have had as significant an impact on the way their discipline is taught, and the current edition will ensure its place at the top for some time to come.”
—Luiz André Barroso, Google Inc.
“What do the following have in common: Beatles’ tunes, HP calculators, chocolate chip cookies, and Computer Architecture? They are all classics that have stood the test of time.”
—Robert P. Colwell, Intel lead architect
“Not only does the book provide an authoritative reference on the concepts that all computer architects should be familiar with, but it is also a good starting point for investigations into emerging areas in the field.”
—Krisztián Flautner, ARM Ltd.
“The best keeps getting better! This new edition is updated and very relevant to the key issues in computer architecture today. Plus, its new exercise paradigm is much more useful for both students and instructors.”
—Norman P. Jouppi, HP Labs
“Computer Architecture builds on fundamentals that yielded the RISC revolution, including the enablers for CISC translation. Now, in this new edition, it clearly explains and gives insight into the latest microarchitecture techniques needed for the new generation of multithreaded multicore processors.”
—Marc Tremblay, Fellow & VP, Chief Architect, Sun Microsystems
“This is a great textbook on all key accounts: pedagogically superb in exposing the ideas and techniques that define the art of computer organization and design, stimulating to read, and comprehensive in its coverage of topics. The first edition set a standard of excellence and relevance; this latest edition does it again.”
—Milos̆ Ercegovac, UCLA
“They’ve done it again. Hennessy and Patterson emphatically demonstrate why they are the doyens of this deep and shifting field. Fallacy: Computer architecture isn’t an essential subject in the information age. Pitfall: You don’t need the 4th edition of Computer Architecture.”
—Michael D. Smith, Harvard University
“Hennessy and Patterson have done it again! The 4th edition is a classic encore that has been adapted beautifully to meet the rapidly changing constraints of ‘late-CMOS-era’ technology. The detailed case studies of real processor products are especially educational, and the text reads so smoothly that it is difficult to put down. This book is a must-read for students and professionals alike!”
—Pradip Bose, IBM
“This latest edition of Computer Architecture is sure to provide students with the architectural framework and foundation they need to become influential architects of the future.”
— Ravishankar Iyer, Intel Corp.
“As technology has advanced, and design opportunities and constraints have changed, so has this book. The 4th edition continues the tradition of presenting the latest in innovations with commercial impact, alongside the foundational concepts: advanced processor and memory system design techniques, multithreading and chip multiprocessors, storage systems, virtual machines, and other concepts. This book is an excellent resource for anybody interested in learning the architectural concepts underlying real commercial products.”
—Gurindar Sohi, University of Wisconsin–Madison
“I am very happy to have my students study computer architecture using this fantastic book and am a little jealous for not having written it myself.”
—Mateo Valero, UPC, Barcelona
“Hennessy and Patterson continue to evolve their teaching methods with the changing landscape of computer system design. Students gain unique insight into the factors influencing the shape of computer architecture design and the potential research directions in the computer systems field.”
—Dan Connors, University of Colorado at Boulder
“With this revision, Computer Architecture will remain a must-read for all computer architecture students in the coming decade.”
—Wen-mei Hwu, University of Illinois at Urbana–Champaign
“The 4th edition of Computer Architecture continues in the tradition of providing a relevant and cutting-edge approach that appeals to students, researchers, and designers of computer systems. The lessons that this new edition teaches will continue to be as relevant as ever for its readers.”
—David Brooks, Harvard University
“With the 4th edition, Hennessy and Patterson have shaped Computer Architecture back to the lean focus that made the 1st edition an instant classic.”
—Mark D. Hill, University of Wisconsin–Madison
A Quantitative Approach
John L. Hennessy is the president of Stanford University, where he has been a member of the faculty since 1977 in the departments of electrical engineering and computer science. Hennessy is a Fellow of the IEEE and ACM, a member of the National Academy of Engineering and the National Academy of Science, and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering Award, and the 2000 John von Neumann Award, which he shared with David Patterson. He has also received seven honorary doctorates.
In 1981, he started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a one-year leave from the university to cofound MIPS Computer Systems, which developed one of the first commercial RISC microprocessors. The company was acquired by Silicon Graphics in 1991, and MIPS Technologies became an independent company in 1998, focusing on microprocessors for the embedded marketplace. As of 2006, over 500 million MIPS microprocessors have been shipped in devices ranging from video games and palmtop computers to laser printers and network switches.
David A. Patterson has been teaching computer architecture at the University of California, Berkeley, since joining the faculty in 1977, where he holds the Pardee Chair of Computer Science. His teaching has been honored by the Abacus Award from Upsilon Pi Epsilon, the Distinguished Teaching Award from the University of California, the Karlstrom Award from ACM, and the Mulligan Education Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE Technical Achievement Award for contributions to RISC and shared the IEEE Johnson Information Storage Award for contributions to RAID. He then shared the IEEE John von Neumann Medal and the C & C Prize with John Hennessy. Like his co-author, Patterson is a Fellow of the American Academy of Arts and Sciences, ACM, and IEEE, and he was elected to the National Academy of Engineering, the National Academy of Sciences, and the Silicon Valley Engineering Hall of Fame. He served on the Information Technology Advisory Committee to the U.S. President, as chair of the CS division in the Berkeley EECS department, as chair of the Computing Research Association, and as President of ACM. This record led to a Distinguished Service Award from CRA.
At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer. This research became the foundation of the SPARC architecture, currently used by Sun Microsystems, Fujitsu, and others. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led to dependable storage systems from many companies. He was also involved in the Network of Workstations (NOW) project, which led to cluster technology used by Internet companies. These projects earned three dissertation awards from the ACM. His current research projects are the RAD Lab, which is inventing technology for reliable, adaptive, distributed Internet services, and the Research Accelerator for Multiple Processors (RAMP) project, which is developing and distributing low-cost, highly scalable, parallel computers based on FPGAs and open-source hardware and software.
A Quantitative Approach
John L. Hennessy
Stanford University
David A. Patterson
University of California at Berkeley
With Contributions by
Andrea C. Arpaci-Dusseau
University of Wisconsin–Madison
Remzi H. Arpaci-Dusseau
University of Wisconsin–Madison
Krste Asanovic
Massachusetts Institute of Technology
Robert P. Colwell
R&E Colwell & Associates, Inc.
Thomas M. Conte
North Carolina State University
José Duato
Universitat Politècnica de València
Diana Franklin
California Polytechnic State University, San Luis Obispo
David Goldberg
Xerox Palo Alto Research Center
Wen-mei W. Hwu
University of Illinois at Urbana–Champaign
Norman P. Jouppi
HP Labs
Timothy M. Pinkston
University of Southern California
John W. Sias
University of Illinois at Urbana–Champaign
David A. Wood
University of Wisconsin–Madison
Amsterdam • Boston • Heidelberg • London • New York • Oxford • Paris • San Diego
San Francisco • Singapore • Sydney • Tokyo
Denise E. M. Penrose
Dusty Friedman, The Book Company
In-house Senior Project Manager
Elisabeth Beller and Ross Carron Design
Richard I’Anson’s Collection: Lonely Planet Images
Rebecca Evans & Associates
David Ruppe, Impact Publications
Ken Della Penta
Maple-Vail Book Manufacturing Group
Morgan Kaufmann Publishers is an imprint of Elsevier
500 Sansome Street, Suite 400, San Francisco, CA 94111
This book is printed on acid-free paper.
© 1990, 1996, 2003, 2007 by Elsevier, Inc. All rights reserved.
Published 1990. Fourth edition 2007
Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: permissions@elsevier.com. You may also complete your request on-line via the Elsevier Science homepage, by selecting “Customer Support” and then “Obtaining Permissions.”
Library of Congress Cataloging-in-Publication Data
Hennessy, John L. Computer architecture : a quantitative approach / John L. Hennessy, David A. Patterson ; with contributions by Andrea C. Arpaci-Dusseau . . . [et al.]. —4th ed. p.cm. Includes bibliographical references and index. ISBN 13: 978-0-12-370490-0 (pbk. : alk. paper) ISBN 10: 0-12-370490-1 (pbk. : alk. paper) 1. Computer architecture. I. Patterson, David A. II. Arpaci-Dusseau, Andrea C. III. Title.
QA76.9.A73P377 2006 004.2'2—dc22
For all information on all Morgan Kaufmann publications, visit our website at
Printed in the United States of America 06 07 08 09 10 5 4 3 2 1
To Andrea, Linda, and our four sons
Foreword
by Fred Weber, President and CEO of MetaRAM, Inc.
I am honored and privileged to write the foreword for the fourth edition of this most important book in computer architecture. In the first edition, Gordon Bell, my first industry mentor, predicted the book’s central position as the definitive text for computer architecture and design. He was right. I clearly remember the excitement generated by the introduction of this work. Rereading it now, with significant extensions added in the three new editions, has been a pleasure all over again. No other work in computer architecture (frankly, no other work I have read in any field) so quickly and effortlessly takes the reader from ignorance to a breadth and depth of knowledge.
This book is dense in facts and figures, in rules of thumb and theories, in examples and descriptions. It is stuffed with acronyms, technologies, trends, formulas, illustrations, and tables. And this is thoroughly appropriate for a work on architecture. The architect’s role is not that of a scientist or inventor who will deeply study a particular phenomenon and create new basic materials or techniques. Nor is the architect the craftsman who masters the handling of tools to craft the finest details. The architect’s role is to combine a thorough understanding of the state of the art of what is possible, a thorough understanding of the historical and current styles of what is desirable, a sense of design to conceive a harmonious total system, and the confidence and energy to marshal this knowledge and available resources to go out and get something built. To accomplish this, the architect needs a tremendous density of information with an in-depth understanding of the fundamentals and a quantitative approach to ground his thinking. That is exactly what this book delivers.
As computer architecture has evolved (from a world of mainframes, minicomputers, and microprocessors, to a world dominated by microprocessors, and now into a world where microprocessors themselves are encompassing all the complexity of mainframe computers), Hennessy and Patterson have updated their book appropriately. The first edition showcased the IBM 360, DEC VAX, and Intel 80x86, each the pinnacle of its class of computer, and helped introduce the world to RISC architecture. The later editions focused on the details of the 80x86 and RISC processors, which had come to dominate the landscape. This latest edition expands the coverage of threading and multiprocessing, virtualization and memory hierarchy, and storage systems, giving the reader context appropriate to today’s most important directions and setting the stage for the next decade of design. It highlights the AMD Opteron and Sun Niagara as the best examples of the x86 and SPARC (RISC) architectures brought into the new world of multiprocessing and system-on-a-chip architecture, thus grounding the art and science in real-world commercial examples.
The first chapter, in less than 60 pages, introduces the reader to the taxonomies of computer design and the basic concerns of computer architecture, gives an overview of the technology trends that drive the industry, and lays out a quantitative approach to using all this information in the art of computer design. The next two chapters focus on traditional CPU design and give a strong grounding in the possibilities and limits in this core area. The final three chapters build out an understanding of system issues with multiprocessing, memory hierarchy, and storage. Knowledge of these areas has always been of critical importance to the computer architect. In this era of system-on-a-chip designs, it is essential for every CPU architect. Finally, the appendices provide a great depth of understanding by working through specific examples in great detail.
In design it is important to look at both the forest and the trees and to move easily between these views. As you work through this book you will find plenty of both. The result of great architecture, whether in computer design, building design or textbook design, is to take the customer’s requirements and desires and return a design that causes that customer to say, “Wow, I didn’t know that was possible.” This book succeeds on that measure and will, I hope, give you as much pleasure and value as it has me.
Fundamentals of Computer Design
Classes of Computers
Defining Computer Architecture
Trends in Technology
Trends in Power in Integrated Circuits
Trends in Cost
Measuring, Reporting, and Summarizing Performance
Quantitative Principles of Computer Design
Putting It All Together: Performance and Price-Performance
Fallacies and Pitfalls
Concluding Remarks
Historical Perspectives and References
Case Studies with Exercises by Diana Franklin
Instruction-Level Parallelism and Its Exploitation
Instruction-Level Parallelism: Concepts and Challenges
Basic Compiler Techniques for Exposing ILP
Reducing Branch Costs with Prediction
Overcoming Data Hazards with Dynamic Scheduling
Dynamic Scheduling: Examples and the Algorithm
Hardware-Based Speculation
Exploiting ILP Using Multiple Issue and Static Scheduling
Exploiting ILP Using Dynamic Scheduling, Multiple Issue, and Speculation
Advanced Techniques for Instruction Delivery and Speculation
Putting It All Together: The Intel Pentium 4
Fallacies and Pitfalls
Concluding Remarks
Historical Perspective and References
Case Studies with Exercises by Robert P. Colwell
Limits on Instruction-Level Parallelism
Studies of the Limitations of ILP
Limitations on ILP for Realizable Processors
Crosscutting Issues: Hardware versus Software Speculation
Multithreading: Using ILP Support to Exploit Thread-Level Parallelism
Putting It All Together: Performance and Efficiency in Advanced Multiple-Issue Processors
Fallacies and Pitfalls
Concluding Remarks
Historical Perspective and References
Case Study with Exercises by Wen-mei W. Hwu and John W. Sias
Multiprocessors and Thread-Level Parallelism
Symmetric Shared-Memory Architectures
Performance of Symmetric Shared-Memory Multiprocessors
Distributed Shared Memory and Directory-Based Coherence
Synchronization: The Basics
Models of Memory Consistency: An Introduction
Crosscutting Issues
Putting It All Together: The Sun T1 Multiprocessor
Fallacies and Pitfalls
Concluding Remarks
Historical Perspective and References
Case Studies with Exercises by David A. Wood
Memory Hierarchy Design
Eleven Advanced Optimizations of Cache Performance
Memory Technology and Optimizations
Protection: Virtual Memory and Virtual Machines
Crosscutting Issues: The Design of Memory Hierarchies
Putting It All Together: AMD Opteron Memory Hierarchy
Fallacies and Pitfalls
Concluding Remarks
Historical Perspective and References
Case Studies with Exercises by Norman P. Jouppi
Storage Systems
6.2 Advanced Topics in Disk Storage
6.3 Definition and Examples of Real Faults and Failures
6.4 I/O Performance, Reliability Measures, and Benchmarks
6.5 A Little Queuing Theory
6.6 Crosscutting Issues
6.7 Designing and Evaluating an I/O System: The Internet Archive Cluster
6.8 Putting It All Together: NetApp FAS6000 Filer
6.9 Fallacies and Pitfalls
6.10 Concluding Remarks
6.11 Historical Perspective and References
Case Studies with Exercises by Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau
Appendix A Pipelining: Basic and Intermediate Concepts
A.1 Introduction
A.2 The Major Hurdle of Pipelining: Pipeline Hazards
A.3 How Is Pipelining Implemented?
A.4 What Makes Pipelining Hard to Implement?
A.5 Extending the MIPS Pipeline to Handle Multicycle Operations
A.6 Putting It All Together: The MIPS R4000 Pipeline
A.7 Crosscutting Issues
A.8 Fallacies and Pitfalls
A.9 Concluding Remarks
A.10 Historical Perspective and References
Appendix B Instruction Set Principles and Examples
B.1 Introduction
B.2 Classifying Instruction Set Architectures
B.3 Memory Addressing
B.4 Type and Size of Operands
B.5 Operations in the Instruction Set
B.6 Instructions for Control Flow
B.7 Encoding an Instruction Set
B.8 Crosscutting Issues: The Role of Compilers
B.9 Putting It All Together: The MIPS Architecture
B.10 Fallacies and Pitfalls
B.11 Concluding Remarks
B.12 Historical Perspective and References
Appendix C Review of Memory Hierarchy
C.1 Introduction
C.2 Cache Performance
C.3 Six Basic Cache Optimizations
C.4 Virtual Memory
C.5 Protection and Examples of Virtual Memory
C.6 Fallacies and Pitfalls
C.7 Concluding Remarks
C.8 Historical Perspective and References
Companion CD Appendices
Appendix D Embedded Systems Updated by Thomas M. Conte
Appendix E Interconnection Networks Revised by Timothy M. Pinkston and José Duato
Appendix F Vector Processors Revised by Krste Asanovic
Appendix G Hardware and Software for VLIW and EPIC
Appendix H Large-Scale Multiprocessors and Scientific Applications
Appendix I Computer Arithmetic by David Goldberg
Appendix J Survey of Instruction Set Architectures
Appendix K Historical Perspectives and References
Online Appendix (textbooks.elsevier.com/0123704901)
Appendix L Solutions to Case Study Exercises
Why We Wrote This Book
Through four editions of this book, our goal has been to describe the basic principles underlying what will be tomorrow’s technological developments. Our excitement about the opportunities in computer architecture has not abated, and we echo what we said about the field in the first edition: “It is not a dreary science of paper machines that will never work. No! It’s a discipline of keen intellectual interest, requiring the balance of marketplace forces to cost-performance-power, leading to glorious failures and some notable successes.”
Our primary objective in writing our first book was to change the way people learn and think about computer architecture. We feel this goal is still valid and important. The field is changing daily and must be studied with real examples and measurements on real computers, rather than simply as a collection of definitions and designs that will never need to be realized. We offer an enthusiastic welcome to anyone who came along with us in the past, as well as to those who are joining us now. Either way, we can promise the same quantitative approach to, and analysis of, real systems.
As with earlier versions, we have strived to produce a new edition that will continue to be as relevant for professional engineers and architects as it is for those involved in advanced computer architecture and design courses. As much as its predecessors, this edition aims to demystify computer architecture through an emphasis on cost-performance-power trade-offs and good engineering design. We believe that the field has continued to mature and move toward the rigorous quantitative foundation of long-established scientific and engineering disciplines.
The fourth edition of Computer Architecture: A Quantitative Approach may be the most significant since the first edition. Shortly before we started this revision, Intel announced that it was joining IBM and Sun in relying on multiple processors or cores per chip for high-performance designs. As the first figure in the book documents, after 16 years of doubling performance every 18 months, single-processor performance improvement has dropped to modest annual improvements. This fork in the computer architecture road means that for the first time in history, no one is building a much faster sequential processor. If you want your program to run significantly faster, say, to justify the addition of new features, you’re going to have to parallelize your program.
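To put the "doubling every 18 months for 16 years" figure in perspective, the compounding can be worked out directly. The sketch below takes the sentence at face value as a clean exponential; the function names are illustrative and do not come from the book, and the measured curve in the book's first figure need not match this idealization.

```python
def cumulative_speedup(years: float, doubling_period_months: float) -> float:
    """Total speedup after `years` if performance doubles every
    `doubling_period_months` months (idealized exponential growth)."""
    doublings = years * 12 / doubling_period_months
    return 2 ** doublings


def annual_growth_rate(doubling_period_months: float) -> float:
    """Equivalent year-over-year improvement, as a fraction (0.5 == 50%)."""
    return 2 ** (12 / doubling_period_months) - 1


if __name__ == "__main__":
    # Assumption: the preface's round numbers (16 years, 18-month doubling).
    print(f"cumulative speedup: {cumulative_speedup(16, 18):.0f}x")
    print(f"annual growth:      {annual_growth_rate(18):.1%}")
```

Under these assumptions the cumulative gain is on the order of 1,600x, or close to 60% per year, which is the scale of improvement that sequential programs can no longer count on.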
Hence, after three editions focused primarily on higher performance by exploiting instruction-level parallelism (ILP), an equal focus of this edition is thread-level parallelism (TLP) and data-level parallelism (DLP). While earlier editions had material on TLP and DLP in big multiprocessor servers, now TLP and DLP are relevant for single-chip multicores. This historic shift led us to change the order of the chapters: the chapter on multiple processors was the sixth chapter in the last edition, but is now the fourth chapter of this edition.
The changing technology has also motivated us to move some of the content from later chapters into the first chapter. Because technologists predict much higher hard and soft error rates as the industry moves to semiconductor processes with feature sizes 65 nm or smaller, we decided to move the basics of dependability from Chapter 7 in the third edition into Chapter 1. As power has become the dominant factor in determining how much you can place on a chip, we also beefed up the coverage of power in Chapter 1. Of course, the content and examples in all chapters were updated, as we discuss below.
In addition to technological sea changes that have shifted the contents of this edition, we have taken a new approach to the exercises in this edition. It is surprisingly difficult and time-consuming to create interesting, accurate, and unambiguous exercises that evenly test the material throughout a chapter. Alas, the Web has reduced the half-life of exercises to a few months. Rather than working out an assignment, a student can search the Web to find answers not long after a book is published. Hence, a tremendous amount of hard work quickly becomes unusable, and instructors are denied the opportunity to test what students have learned.
To help mitigate this problem, in this edition we are trying two new ideas. First, we recruited experts from academia and industry on each topic to write the exercises. This means some of the best people in each field are helping us to create interesting ways to explore the key concepts in each chapter and test the reader’s understanding of that material. Second, each group of exercises is organized around a set of case studies. Our hope is that the quantitative example in each case study will remain interesting over the years, and will be robust and detailed enough to allow instructors the opportunity to easily create their own new exercises, should they choose to do so. The key, however, is that each year we will continue to release new exercise sets for each of the case studies. These new exercises will have critical changes in some parameters so that answers to old exercises will no longer apply.
Another significant change is that we followed the lead of the third edition of Computer Organization and Design (COD) by slimming the text to include the material that almost all readers will want to see and moving the appendices that some will see as optional or as reference material onto a companion CD. There were many reasons for this change:
1. Students complained about the size of the book, which had expanded from 594 pages in the chapters plus 160 pages of appendices in the first edition to 760 chapter pages plus 223 appendix pages in the second edition and then to 883 chapter pages plus 209 pages in the paper appendices and 245 pages in online appendices. At this rate, the fourth edition would have exceeded 1500 pages (both on paper and online)!
2. Similarly, instructors were concerned about having too much material to cover in a single course.
3. As was the case for COD, by including a CD with material moved out of the text, readers could have quick access to all the material, regardless of their ability to access Elsevier’s Web site. Hence, the current edition’s appendices will always be available to the reader even after future editions appear.
4. This flexibility allowed us to move review material on pipelining, instruction sets, and memory hierarchy from the chapters and into Appendices A, B, and C. The advantage to instructors and readers is that they can go over the review material much more quickly and then spend more time on the advanced topics in Chapters 2, 3, and 5. It also allowed us to move the discussion of some topics that are important but are not core course topics into appendices on the CD. Result: the material is available, but the printed book is shorter. In this edition we have 6 chapters, none of which is longer than 80 pages, while in the last edition we had 8 chapters, with the longest chapter weighing in at 127 pages.
5. This package of a slimmer core print text plus a CD is far less expensive to manufacture than the previous editions, allowing our publisher to significantly lower the list price of the book. With this pricing scheme, there is no need for a separate international student edition for European readers.
Yet another major change from the last edition is that we have moved the embedded material introduced in the third edition into its own appendix, Appendix D. We felt that the embedded material didn’t always fit with the quantitative evaluation of the rest of the material, plus it extended the length of many chapters that were already running long. We believe there are also pedagogic advantages in having all the embedded information in a single appendix.
This edition continues the tradition of using real-world examples to demonstrate the ideas, and the “Putting It All Together” sections are brand new; in fact, some were announced after our book was sent to the printer. The “Putting It All Together” sections of this edition include the pipeline organizations and memory hierarchies of the Intel Pentium 4 and AMD Opteron; the Sun T1 (“Niagara”) 8-processor, 32-thread microprocessor; the latest NetApp Filer; the Internet Archive cluster; and the IBM Blue Gene/L massively parallel processor.
Topic Selection and Organization
As before, we have taken a conservative approach to topic selection, for there are many more interesting ideas in the field than can reasonably be covered in a treatment of basic principles. We have steered away from a comprehensive survey of every architecture a reader might encounter. Instead, our presentation focuses on core concepts likely to be found in any new machine. The key criterion remains that of selecting ideas that have been examined and utilized successfully enough to permit their discussion in quantitative terms.
Our intent has always been to focus on material that is not available in equivalent form from other sources, so we continue to emphasize advanced content wherever possible. Indeed, there are several systems here whose descriptions cannot be found in the literature. (Readers interested strictly in a more basic introduction to computer architecture should read Computer Organization and Design: The Hardware/Software Interface, third edition.)
An Overview of the Content
Chapter 1 has been beefed up in this edition. It includes formulas for static power, dynamic power, integrated circuit costs, reliability, and availability. We go into more depth than prior editions on the use of the geometric mean and the geometric standard deviation to capture the variability of the mean. Our hope is that these topics can be used through the rest of the book. In addition to the classic quantitative principles of computer design and performance measurement, the benchmark section has been upgraded to use the new SPEC2006 suite.
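As a rough illustration of the two statistics named above, the sketch below computes a geometric mean and a geometric standard deviation over a set of benchmark ratios. It is a minimal sketch under stated assumptions: the ratios are made-up placeholder values rather than SPEC results, the function names are ours, and the population form of the standard deviation (dividing by n) is one reasonable convention among several.

```python
import math


def geometric_mean(ratios):
    """Geometric mean: exponentiate the arithmetic mean of the logs."""
    logs = [math.log(r) for r in ratios]
    return math.exp(sum(logs) / len(logs))


def geometric_stdev(ratios):
    """Geometric standard deviation: exponentiate the standard deviation
    of the logs (population form, dividing by n; an assumption here)."""
    logs = [math.log(r) for r in ratios]
    mean_log = sum(logs) / len(logs)
    variance = sum((x - mean_log) ** 2 for x in logs) / len(logs)
    return math.exp(math.sqrt(variance))


if __name__ == "__main__":
    # Hypothetical performance ratios relative to a reference machine;
    # illustrative values only, not measured data.
    ratios = [2.0, 4.0, 8.0]
    gm = geometric_mean(ratios)
    gsd = geometric_stdev(ratios)
    print(f"geometric mean:  {gm:.3f}")
    print(f"geometric stdev: {gsd:.3f}")
    # If the ratios are roughly lognormal, most fall in [gm / gsd, gm * gsd].
    print(f"one-sigma range: {gm / gsd:.3f} to {gm * gsd:.3f}")
```

Because the geometric standard deviation is a multiplicative factor rather than an additive offset, the spread around the mean is expressed by dividing and multiplying, not by subtracting and adding.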
Our view is that the instruction set architecture is playing less of a role today than in 1990, so we moved this material to Appendix B. It still uses the MIPS64 architecture. For fans of ISAs, Appendix J covers 10 RISC architectures, the 80x86, the DEC VAX, and the IBM 360/370.
Chapters 2 and 3 cover the exploitation of instruction-level parallelism in high-performance processors, including superscalar execution, branch prediction, speculation, dynamic scheduling, and the relevant compiler technology. As mentioned earlier, Appendix A is a review of pipelining in case you need it. Chapter 3 surveys the limits of ILP. New to this edition is a quantitative evaluation of multithreading. Chapter 3 also includes a head-to-head comparison of the AMD Athlon, Intel Pentium 4, Intel Itanium 2, and IBM Power5, each of which has made separate bets on exploiting ILP and TLP. While the last edition contained a great deal on Itanium, we moved much of this material to Appendix G, indicating our view that this architecture has not lived up to the early claims.
Given the switch in the field from exploiting only ILP to an equal focus on thread- and data-level parallelism, we moved multiprocessor systems up to Chapter 4, which focuses on shared-memory architectures. The chapter begins with the performance of such an architecture. It then explores symmetric and distributed memory architectures, examining both organizational principles and performance. Topics in synchronization and memory consistency models are next. The example is the Sun T1 (“Niagara”), a radical design for a commercial product. It reverted to a single-instruction issue, 6-stage pipeline microarchitecture. It put 8 of these cores on a single chip, and each supports 4 threads. Hence, software sees 32 threads on this single, low-power chip.
As mentioned earlier, Appendix C contains an introductory review of cache principles, which is available in case you need it. This shift allows Chapter 5 to start with 11 advanced optimizations of caches. The chapter includes a new section on virtual machines, which offers advantages in protection, software management, and hardware management. The example is the AMD Opteron, giving both its cache hierarchy and the virtual memory scheme for its recently expanded 64-bit addresses.
Chapter 6, “Storage Systems,” has an expanded discussion of reliability and availability, a tutorial on RAID with a description of RAID 6 schemes, and rarely found failure statistics of real systems. It continues to provide an introduction to queuing theory and I/O performance benchmarks. Rather than go through a series of steps to build a hypothetical cluster as in the last edition, we evaluate the cost, performance, and reliability of a real cluster: the Internet Archive. The “Putting It All Together” example is the NetApp FAS6000 filer, which is based on the AMD Opteron microprocessor.
This brings us to Appendices A through L. As mentioned earlier, Appendices A and C are tutorials on basic pipelining and caching concepts. Readers relatively new to pipelining should read Appendix A before Chapters 2 and 3, and those new to caching should read Appendix C before Chapter 5.
Appendix B covers principles of ISAs, including MIPS64, and Appendix J describes 64-bit versions of Alpha, MIPS, PowerPC, and SPARC and their multimedia extensions. It also includes some classic architectures (80x86, VAX, and IBM 360/370) and popular embedded instruction sets (ARM, Thumb, SuperH, MIPS16, and Mitsubishi M32R). Appendix G is related, in that it covers architectures and compilers for VLIW ISAs.
Appendix D, updated by Thomas M. Conte, consolidates the embedded material in one place.
Appendix E, on networks, has been extensively revised by Timothy M. Pinkston and José Duato. Appendix F, updated by Krste Asanovic, includes a description of vector processors. We think these two appendices are some of the best material we know of on each topic.
Appendix H describes parallel processing applications and coherence protocols for larger-scale, shared-memory multiprocessing. Appendix I, by David Goldberg, describes computer arithmetic.
Appendix K collects the “Historical Perspective and References” from each chapter of the third edition into a single appendix. It attempts to give proper credit for the ideas in each chapter and a sense of the history surrounding the inventions. We like to think of this as presenting the human drama of computer design. It also supplies references that the student of architecture may want to pursue. If you have time, we recommend reading some of the classic papers in the field that are mentioned in these sections. It is both enjoyable and educational to hear the ideas directly from the creators. “Historical Perspective” was one of the most popular sections of prior editions.
Appendix L (available at textbooks.elsevier.com/0123704901) contains solutions to the case study exercises in the book.
Navigating the Text
There is no single best order in which to approach these chapters and appendices, except that all readers should start with Chapter 1. If you don’t want to read everything, here are some suggested sequences:
ILP: Appendix A, Chapters 2 and 3, and Appendices F and G
Memory Hierarchy: Appendix C and Chapters 5 and 6
Thread- and Data-Level Parallelism: Chapter 4, Appendix H, and Appendix E
ISA: Appendices B and J
Appendix D can be read at any time, but it might work best if read after the ISA and cache sequences. Appendix I can be read whenever arithmetic moves you.
The material we have selected has been stretched upon a consistent framework that is followed in each chapter. We start by explaining the ideas of a chapter. These ideas are followed by a “Crosscutting Issues” section, a feature that shows how the ideas covered in one chapter interact with those given in other chapters. This is followed by a “Putting It All Together” section that ties these ideas together by showing how they are used in a real machine.
Next in the sequence is “Fallacies and Pitfalls,” which lets readers learn from the mistakes of others. We show examples of common misunderstandings and architectural traps that are difficult to avoid even when you know they are lying in wait for you. The “Fallacies and Pitfalls” sections are among the most popular sections of the book. Each chapter ends with a “Concluding Remarks” section.
Case Studies with Exercises
Each chapter ends with case studies and accompanying exercises. Authored by experts in industry and academia, the case studies explore key chapter concepts and verify understanding through increasingly challenging exercises. Instructors should find the case studies sufficiently detailed and robust to allow them to create their own additional exercises.
Brackets for each exercise (<chapter.section>) indicate the text sections of primary relevance to completing the exercise. We hope this helps readers to avoid exercises for which they haven’t read the corresponding section, in addition to providing the source for review. Note that we provide solutions to the case study exercises in Appendix L. Exercises are rated, to give the reader a sense of the amount of time required to complete an exercise:
Less than 5 minutes (to read and understand)
5–15 minutes for a full answer
15–20 minutes for a full answer
1 hour for a full written answer
Short programming project: less than 1 full day of programming
Significant programming project: 2 weeks of elapsed time
[Discussion] Topic for discussion with others
A second set of alternative case study exercises is available for instructors who register at textbooks.elsevier.com/0123704901. This second set will be revised every summer, so that early every fall, instructors can download a new set of exercises and solutions to accompany the case studies in the book.
The accompanying CD contains a variety of resources, including the following:
Reference appendices—some guest authored by subject experts—covering a range of advanced topics
Historical Perspectives material that explores the development of the key ideas presented in each of the chapters in the text
Search engine for both the main text and the CD-only content
Additional resources are available at textbooks.elsevier.com/0123704901. The instructor site (accessible to adopters who register at textbooks.elsevier.com) includes:
Alternative case study exercises with solutions (updated yearly)
Instructor slides in PowerPoint
Figures from the book in JPEG and PPT formats
The companion site (accessible to all readers) includes:
Solutions to the case study exercises in the text
Links to related material on the Web
List of errata
New materials and links to other resources available on the Web will be added on a regular basis.
Helping Improve This Book
Finally, it is possible to make money while reading this book. (Talk about cost-performance!) If you read the Acknowledgments that follow, you will see that we went to great lengths to correct mistakes. Since a book goes through many printings, we have the opportunity to make even more corrections. If you uncover any remaining resilient bugs, please contact the publisher by electronic mail (firstname.lastname@example.org). The first reader to report an error with a fix that we incorporate in a future printing will be rewarded with a $1.00 bounty. Please check the errata sheet on the home page (textbooks.elsevier.com/0123704901) to see if the bug has already been reported. We process the bugs and send the checks about once a year or so, so please be patient.
We welcome general comments on the text and invite you to send them to a separate email address at email@example.com.
Once again this book is a true co-authorship, with each of us writing half the chapters and an equal share of the appendices. We can’t imagine how long it would have taken without someone else doing half the work, offering inspiration when the task seemed hopeless, providing the key insight to explain a difficult concept, supplying reviews over the weekend of chapters, and commiserating when the weight of our other obligations made it hard to pick up the pen. (These obligations have escalated exponentially with the number of editions, as one of us was President of Stanford and the other was President of the Association for Computing Machinery.) Thus, once again we share equally the blame for what you are about to read.
John Hennessy David Patterson
Although this is only the fourth edition of this book, we have actually created nine different versions of the text: three versions of the first edition (alpha, beta, and final) and two versions of the second, third, and fourth editions (beta and final). Along the way, we have received help from hundreds of reviewers and users. Each of these people has helped make this book better. Thus, we have chosen to list all of the people who have made contributions to some version of this book.
Contributors to the Fourth Edition
Like prior editions, this is a community effort that involves scores of volunteers. Without their help, this edition would not be nearly as polished.
Krste Asanovic, Massachusetts Institute of Technology; Mark Brehob, University of Michigan; Sudhanva Gurumurthi, University of Virginia; Mark D. Hill, University of Wisconsin–Madison; Wen-mei Hwu, University of Illinois at Urbana–Champaign; David Kaeli, Northeastern University; Ramadass Nagarajan, University of Texas at Austin; Karthikeyan Sankaralingam, University of Texas at Austin; Mark Smotherman, Clemson University; Gurindar Sohi, University of Wisconsin–Madison; Shyamkumar Thoziyoor, University of Notre Dame, Indiana; Dan Upton, University of Virginia; Sotirios G. Ziavras, New Jersey Institute of Technology
Krste Asanovic, Massachusetts Institute of Technology; José Duato, Universitat Politècnica de València and Simula; Antonio González, Intel and Universitat Politècnica de Catalunya; Mark D. Hill, University of Wisconsin–Madison; Lev G. Kirischian, Ryerson University; Timothy M. Pinkston, University of Southern California
Krste Asanovic, Massachusetts Institute of Technology (Appendix F); Thomas M. Conte, North Carolina State University (Appendix D); José Duato, Universitat Politècnica de València and Simula (Appendix E); David Goldberg, Xerox PARC (Appendix I); Timothy M. Pinkston, University of Southern California (Appendix E)
Case Studies with Exercises
Andrea C. Arpaci-Dusseau, University of Wisconsin–Madison (Chapter 6); Remzi H. Arpaci-Dusseau, University of Wisconsin–Madison (Chapter 6); Robert P. Colwell, R&E Colwell & Assoc., Inc. (Chapter 2); Diana Franklin, California Polytechnic State University, San Luis Obispo (Chapter 1); Wen-mei W. Hwu, University of Illinois at Urbana–Champaign (Chapter 3); Norman P. Jouppi, HP Labs (Chapter 5); John W. Sias, University of Illinois at Urbana–Champaign (Chapter 3); David A. Wood, University of Wisconsin–Madison (Chapter 4)
John Mashey (geometric means and standard deviations in Chapter 1); Chenming Hu, University of California, Berkeley (wafer costs and yield parameters in Chapter 1); Bill Brantley and Dan Mudgett, AMD (Opteron memory hierarchy evaluation in Chapter 5); Mendel Rosenblum, Stanford and VMware (virtual machines in Chapter 5); Aravind Menon, EPFL Switzerland (Xen measurements in Chapter 5); Bruce Baumgart and Brewster Kahle, Internet Archive (IA cluster in Chapter 6); David Ford, Steve Kleiman, and Steve Miller, Network Appliances (FAS6000 information in Chapter 6); Alexander Thomasian, Rutgers (queueing theory in Chapter 6)
Finally, a special thanks once again to Mark Smotherman of Clemson University, who gave a final technical reading of our manuscript. Mark found numerous bugs and ambiguities, and the book is much cleaner as a result.
This book could not have been published without a publisher, of course. We wish to thank all the Morgan Kaufmann/Elsevier staff for their efforts and support. For this fourth edition, we particularly want to thank Kimberlee Honjo, who coordinated surveys, focus groups, manuscript reviews, and appendices, and Nate McFadden, who coordinated the development and review of the case studies. Our warmest thanks to our editor, Denise Penrose, for her leadership in our continuing writing saga.
We must also thank our university staff, Margaret Rowland and Cecilia Pracher, for countless express mailings, as well as for holding down the fort at Stanford and Berkeley while we worked on the book.
Our final thanks go to our wives for their suffering through increasingly early mornings of reading, thinking, and writing.