




























































































GPU Computing Gems Emerald Edition: another book on GPU programming.





























































































Wen-mei W. Hwu
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD • PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann Publishers is an imprint of Elsevier
Acquiring Editor: Todd Green
Assistant Editor: Robyn Day
Project Manager: Paul Gottehrer
Designer: Dennis Schaefer

Morgan Kaufmann is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA

© 2011 NVIDIA Corporation and Wen-mei W. Hwu. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
GPU computing gems / editor, Wen-mei W. Hwu.
p. cm.
Includes bibliographical references.
ISBN 978-0-12-384988-
For information on all MK publications visit our website at www.mkp.com
Printed in the United States of America 11 12 13 14 15 11 10 9 8 7 6 5 4 3 2 1
Parallel Hybrid Architectures
Luigi Genovese, Matthieu Ospici, Brice Videau, Thierry Deutsch, Jean-François Méhaut
SECTION 2 LIFE SCIENCES Bertil Schmidt
Algorithm
Łukasz Ligowski, Witold R. Rudnicki, Yongchao Liu, Bertil Schmidt
Ben Weiss, Mike Bailey
Ali Khajeh-Saeed, J. Blair Perot
Guillaume Rizk, Dominique Lavenier, Sanjay Rajopadhye
Wu-chun Feng, Yong Cao, Debprakash Patnaik, Naren Ramakrishnan
SECTION 3 STATISTICAL MODELING Mike Giles
Thomas Bradley, Jacques du Toit, Robert Tong, Mike Giles, Paul Woodhams
László Szirmay-Kalos, Balázs Tóth, Milán Magdics
Christoph Schied, Johannes Hanika, Holger Dammertz, Hendrik P. A. Lensch
SECTION 4 EMERGING DATA-INTENSIVE APPLICATIONS Volodymyr Kindratenko
Jerod J. Weinman, Augustus Lidaka, Shitanshu Aggarwal
Sergio Herrero-Lopez
Paul Richmond, Daniela Romano
Robin M. Weiss
SECTION 5 ELECTRONIC DESIGN AUTOMATION
Sunil P. Khatri
Debapriya Chatterjee, Andrew DeOrio, Valeria Bertacco
Yifang Liu, Jiang Hu
SECTION 6 RAY TRACING AND RENDERING
Austin Robison
Robert Geist, James Westall
Jan Novák, Vlastimil Havran, Carsten Dachsbacher
Bernd Bickel, Manuel Lang
Transparency
Mengcheng Huang, Fang Liu, Xuehui Liu, Enhua Wu
SECTION 7 COMPUTER VISION
James Fung
P.J. Narayanan, Vibhav Vineet, Timo Stich
Anis Rahman, Dominique Houzet, Denis Pellerin
on Generic Multi-Core Processors and GPGPU
Abderrahim Benquassmi, Eric Fontaine, Hsien-Hsin S. Lee
with CUDA
Guillem Pratx, Jing-Yu Cui, Sven Prevrhal, Craig S. Levin
Iterative CT Reconstruction Algorithms
Wei Xu, Klaus Mueller
Inhomogeneity Compensation
Yue Zhuo, Xiao-Long Wu, Justin P. Haldar, Thibault Marin, Wen-mei W. Hwu, Zhi-Pei Liang, Bradley P. Sutton
ℓ1 Minimization in ℓ1-SPIRiT Compressed Sensing MRI Reconstruction
Mark Murphy, Miki Lustig
Won-Ki Jeong, Hanspeter Pfister, Massimiliano Fatica
James Shackleford, Nagarajan Kandasamy, Gregory Sharp
Linh Ha, Jens Krüger, Sarang Joshi, Cláudio T. Silva
Visualization in Large-Scale Electron Micrographs
Won-Ki Jeong, Hanspeter Pfister, Johanna Beyer, Markus Hadwiger
Transport Algorithm Implemented in CUDA
Andreu Badal, Aldo Badano
Index
Reza Farivar, University of Illinois at Urbana-Champaign
Vladimir Frolov, NVIDIA
Vladimir Glavtchev, BMW Technology Office
Kanupriya Gulati, Intel Corporation
Trym Vegard Haavardsholm, Norwegian Defense Research Establishment
Ken Hawick, University of Auckland, New Zealand
Jared Hoberock, NVIDIA
Tim Kaldewey, Oracle
Vinay Karkala, Advanced Micro Devices
Christian Linz, Technical University, Braunschweig
Christian Lipski, Technical University, Braunschweig
Weiguo Liu, Nanyang Technological University
Dave Luebke, NVIDIA
W. James MacLean, Google
Corey Manders, A*STAR Institute for Infocomm Research
Morgan McGuire, Williams College, Massachusetts
Derek Nowrouzezahrai, Disney Research Zurich
Ming Ouyang, University of Louisville, Kentucky
Steven Parker, NVIDIA
Kalyan Perumalla, Oak Ridge National Laboratory
Nicolas Pinto, Massachusetts Institute of Technology
Tobias Preis, Johannes Gutenberg University
Ramtin Shams, Australian National University
Craig Steffen, University of Illinois at Urbana-Champaign
Andrei Tatarinov, NVIDIA
Cristina Nader Vasconcelos, Instituto de Computação, Universidade Federal Fluminense, Brazil
Ben Weiss, Shell and Slate Software
Ruediger Westermann, Technical University, Munich
Jan Woetzel, MeVis Medical Solutions, AG
Kesheng Wu, Berkeley Lab, University of California
Ren Wu, HP Labs
Weihang Zhu, Lamar University, Texas
Shitanshu Aggarwal, Grinnell College, Iowa (Chapter 19)
Mike Bailey, Oregon State University (Chapter 12)
Andreu Badal, US Food and Drug Administration (CDRH/OSEL/DIAM) (Chapter 50)
Aldo Badano, US Food and Drug Administration (CDRH/OSEL/DIAM) (Chapter 50)
Lorena A. Barba, Boston University (Chapter 9)
Bedřich Beneš, Purdue University, Indiana (Chapter 35)
Abderrahim Benquassmi, Georgia Institute of Technology (Chapter 41)
Valeria Bertacco, University of Michigan (Chapter 23)
Johanna Beyer, King Abdullah University of Science and Technology (KAUST) (Chapter 49)
Bernd Bickel, Disney Research, Zurich (Chapter 27)
Thomas Bradley, NVIDIA (Chapter 16)
Benjamin Brown, Northeastern University (Chapter 40)
Martin Burtscher, Texas State University, San Marcos (Chapter 6)
Yong Cao, Virginia Tech (Chapter 15)
Debapriya Chatterjee, University of Michigan (Chapter 23)
Yifeng Chen, Peking University (Chapter 39)
Jike Chong, University of California, Berkeley (Chapter 37)
Jing-Yu Cui, Stanford University (Chapter 42)
Xiang Cui, Peking University (Chapter 39)
Carsten Dachsbacher, Karlsruhe Institute of Technology (Chapter 26)
Holger Dammertz, Ulm University (Chapter 18)
Andrew DeOrio, University of Michigan (Chapter 23)
Thierry Deutsch, Laboratoire de Simulation Atomistique (Chapter 10)
Rodrigo Dominguez, Northeastern University (Chapter 40)
Jacques Du Toit, Numerical Algorithms Group (Chapter 16)
Gabriel Falcao, University of Coimbra (Chapter 38)
Massimiliano Fatica, NVIDIA (Chapter 46)
Wu-chun Feng, Virginia Tech and Wake Forest University (Chapter 15)
Eric Fontaine, Georgia Institute of Technology (Chapter 41)
James Fung, NVIDIA (Chapter 36)
Robert Geist, Clemson University (Chapter 25)
Dominique Lavenier, École Normale Supérieure de Cachan (Chapter 14)
Hsien-Hsin S. Lee, Georgia Institute of Technology (Chapter 41)
Hendrik Lensch, Ulm University (Chapter 18)
Craig S. Levin, Stanford University (Chapter 42)
Zhi-Pei Liang, University of Illinois at Urbana-Champaign (Chapter 44)
Augustus Lidaka, Grinnell College (Chapter 19)
Łukasz Ligowski, University of Warsaw (Chapter 11)
Fang Liu, Chinese Academy of Sciences (Chapter 28)
Xuehui Liu, Chinese Academy of Sciences (Chapter 28)
Yifang Liu, Texas A&M University (Chapter 24)
Yongchao Liu, Nanyang Technological University (Chapter 11)
Berker Logoglu, Middle East Technical University (Chapter 34)
Nathan Luehr, Stanford University and SLAC National Accelerator Laboratory (Chapter 3)
Miki Lustig, University of California, Berkeley (Chapter 45)
Milán Magdics, Budapest University of Technology and Economics (Chapter 17)
Thibault Marin, Illinois Institute of Technology (Chapter 44)
Todd Martinez, Stanford University and SLAC National Accelerator Laboratory (Chapter 3)
Jean-François Méhaut, Université Joseph Fourier (Chapter 10)
Hong Mei, Peking University (Chapter 39)
Perhaad Mistry, Northeastern University (Chapter 40)
Richard Moore, Massachusetts General Hospital (Chapter 40)
Keiji Morokuma, Kyoto University (Chapter 5)
Klaus Mueller, State University of New York, Stony Brook (Chapter 43)
Mark Murphy, University of California, Berkeley (Chapter 45)
Pinar Muyan-Özçelik, University of California, Davis (Chapter 32)
P. J. Narayanan, International Institute of Information Technology Hyderabad (Chapter 29)
Jan Novák, Karlsruhe Institute of Technology (Chapter 26)
Anton Obukhov, NVIDIA (Chapter 33)
Fatih Omruuzun, Middle East Technical University (Chapter 34)
Matthieu Ospici, Laboratoire d’Informatique de Grenoble (Chapter 10)
Jeffery M. Ota, BMW Group Technology Office (Chapter 32)
John D. Owens, University of California, Davis (Chapter 32)
Vijay S. Pande, Stanford University (Chapter 2)
Debprakash Patnaik, Virginia Tech (Chapter 15)
Denis Pellerin, GIPSA-lab (Chapter 30)
J. Blair Perot, University of Massachusetts, Amherst (Chapter 13)
Hanspeter Pfister, Harvard University (Chapters 46 and 49)
Keshav Pingali, Texas State University, San Marcos (Chapter 6)
Guillem Pratx, Stanford University (Chapter 42)
Sven Prevrhal, Philips Healthcare (Chapter 42)
Anis Rahman, GIPSA-lab (Chapter 30)
Sanjay Rajopadhye, Colorado State University (Chapter 14)
Naren Ramakrishnan, Virginia Tech (Chapter 15)
Paul Richmond, University of Sheffield (Chapter 21)
Guillaume Rizk, Institut de Recherche en Informatique et Systèmes Aléatoires, Université de Rennes (Chapter 14)
Christopher Rodrigues, University of Illinois at Urbana-Champaign (Chapter 4)
Daniela Romano, University of Sheffield (Chapter 21)
Witold R. Rudnicki, University of Warsaw (Chapter 11)
Jan Saam, University of Illinois at Urbana-Champaign (Chapter 1)
Karthikeyan Sankaralingam, University of Wisconsin-Madison (Chapter 7)
Dana Schaa, Northeastern University (Chapter 40)
Christoph Schied, Ulm University (Chapter 18)
Bertil Schmidt, Nanyang Technological University (Chapter 11)
Klaus Schulten, University of Illinois at Urbana-Champaign (Chapters 1 and 4)
James Shackleford, Drexel University (Chapter 47)
Gregory Sharp, Massachusetts General Hospital (Chapter 47)
John Silberholz, University of Maryland (Chapter 8)
Claudio Silva, University of Utah (Chapter 48)
Vitor Silva, University of Coimbra (Chapter 38)
Matthew D. Sinclair, University of Wisconsin-Madison (Chapter 7)
Leonel Sousa, Technical University of Lisbon (Chapter 38)
Joe Stam, NVIDIA (Chapter 36)
Ondřej Štava, Purdue University (Chapter 35)
Introduction
Wen-mei W. Hwu
We are entering the golden age of GPU computing. Since the introduction of CUDA in 2007, more than 100 million computers with CUDA-capable GPUs have been shipped to end users. Unlike the previous GPGPU shader programming models, CUDA supports parallel programming in C. From my own experience in teaching CUDA programming, C programmers can begin to write basic CUDA programs after attending only one lecture and reading one textbook chapter. With such a low barrier to entry, researchers all over the world have been developing new algorithms and applications to take advantage of the extreme floating-point execution throughput of these GPUs.

Today, there is a large community of GPU computing practitioners. Many of them have reported a 10 to 100 times speedup of their applications with GPU computing. To put this into perspective, with the historical 2X performance growth every 2 years, these researchers are experiencing the equivalent of time travel of 8 to 12 years. That is, they are getting today the performance that they would otherwise have had to wait 8 to 12 years for if they had relied on the “free-ride” advancement of microprocessor performance. Interestingly, such “free-ride” advancement is no longer available. Furthermore, once they develop their application in CUDA, they will likely see continued performance growth of 2X every two years from this day forward.

After discussions with numerous researchers, I have concluded that many of them are solving similar algorithmic problems in their programming efforts. Although they are working on diverse applications, they often end up developing similar algorithmic strategies. The idea of GPU Computing Gems is to provide a convenient means for application developers in diverse application areas to benefit from each other’s experience. In this volume, we have collected 50 gem articles written by researchers in 10 diverse areas. Each gem article reports a successful application experience in GPU computing. These articles describe the techniques or “secret sauce” that contributed to the success. The authors highlight the potential applicability of their techniques to other application areas. In our editorial process, we have emphasized the accessibility of these gems to researchers in other areas.

When we issued the call for proposals for the first GPU Computing Gems, we received more than 280 submissions, an overwhelming response. After careful review, we accepted 110 proposals that have a high likelihood of making valuable contributions to other application developers. Many high-quality proposals were not accepted because of concerns that they might not be accessible to a large audience. With so many accepted proposals, we were forced to divide these gems into two volumes. This volume covers 50 gems in the application areas of scientific simulation, life sciences, statistical modeling, emerging data-intensive applications, electronic design automation, ray tracing and rendering, computer vision, video and image processing, signal and audio processing, and medical imaging.
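The “time travel” comparison earlier in this introduction can be checked with a little arithmetic. The sketch below is not from the book: it assumes performance doubles exactly once every two years, so a speedup S corresponds to 2 · log2(S) years of waiting. The function name `equivalent_years` and its parameters are illustrative choices only.

```python
import math


def equivalent_years(speedup, doubling_period_years=2.0):
    """Years of steady doubling needed to reach `speedup`.

    Assumes (hypothetically) that performance doubles exactly once
    every `doubling_period_years` years.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return doubling_period_years * math.log2(speedup)


if __name__ == "__main__":
    # 10x and 100x are the speedups quoted in the introduction;
    # 16x and 64x are the speedups that map exactly to 8 and 12 years.
    for s in (10, 16, 64, 100):
        print(f"{s:>4}x speedup ~ {equivalent_years(s):.1f} years")
```

Under this assumption, the quoted 10 to 100 times range maps to roughly 6.6 to 13.3 years, which brackets the 8 to 12 years stated in the text. The 8 and 12 year endpoints correspond to speedups of 16 and 64 times.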