


A course on High Performance Machine Learning (HPML) that focuses on applying High-Performance Computing (HPC) techniques to ML algorithms. The course covers topics such as PyTorch, CUDA programming, MPI, and DNN architectures (CNN, RNN, LSTM, Attention, Transformers) in PyTorch. The course aims to teach students how to use HPC techniques to find and solve performance bottlenecks, evaluate the performance of different ML software stacks and hardware systems, and develop high-performance distributed ML algorithms. Prerequisites include intermediate programming skills in C/C++ and Python, as well as an understanding of computer architecture, operating systems, machine learning concepts, and neural network algorithms.
Typology: Lecture notes



Over the past decades, the field of High-Performance Computing (HPC) has focused on building supercomputers to solve some of the biggest challenges in science. HPC is where cutting-edge technology (GPUs, low-latency interconnects, etc.) is applied to scientific and data-driven problems. One of the key ingredients of the current success of Machine Learning (ML) is the ability to perform computations on very large amounts of training data. Today, the application of HPC techniques to ML algorithms is a fundamental driver of progress in Artificial Intelligence. In this course, you will learn HPC techniques that are typically applied to supercomputing software, and how they are applied to obtain maximum performance from ML algorithms. You will also learn techniques for building efficient ML systems. The course is based on PyTorch, CUDA programming, and MPI.
At the end of the course, you will be able to:
● Use HPC techniques to find and solve performance bottlenecks
● Do performance measurements and profiling of ML software
● Evaluate the performance of different ML software stacks and hardware systems
● Develop high-performance distributed ML algorithms
● Use fast math libraries, CUDA, and C++ to accelerate high-performance ML algorithms
● Apply model compression techniques
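Performance measurement, one of the objectives above, starts with timing a step correctly. The sketch below is a minimal, framework-agnostic example. It assumes a hypothetical `measure_throughput` helper and a stand-in workload rather than a real model. It discards warm-up iterations, then reports the median step time and the resulting samples per second.

```python
import statistics
import time
from typing import Callable, Dict


def measure_throughput(step: Callable[[], object], batch_size: int,
                       warmup: int = 3, iters: int = 20) -> Dict[str, float]:
    """Time repeated calls to `step`; one call processes `batch_size` samples.

    Warm-up calls are discarded so one-time costs (memory allocation, lazy
    initialization, cold caches) do not skew the result. Assumption: `step`
    blocks until its work is done. CUDA kernels launch asynchronously, so a
    GPU `step` would have to call torch.cuda.synchronize() before returning.
    """
    if iters < 1:
        raise ValueError("iters must be >= 1")
    for _ in range(warmup):
        step()
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        times.append(time.perf_counter() - start)
    median_s = statistics.median(times)
    # Guard against a zero reading from a very fast step on a coarse timer.
    samples_per_s = batch_size / median_s if median_s > 0 else float("inf")
    return {"median_s": median_s, "samples_per_s": samples_per_s}


if __name__ == "__main__":
    # Hypothetical stand-in for a model step: a pure-Python sum of squares.
    data = [float(i) for i in range(10_000)]
    result = measure_throughput(lambda: sum(x * x for x in data), batch_size=64)
    print(f"median step time: {result['median_s'] * 1e3:.3f} ms, "
          f"throughput: {result['samples_per_s']:.1f} samples/s")
```

The median is reported instead of the mean because a few slow outliers, such as garbage-collection pauses, would otherwise dominate a short run.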
Prerequisites
● Knowledge of computer architecture and operating systems
● C/C++: intermediate programming skills
● Python: intermediate programming skills
● Understanding of machine learning concepts and neural network algorithms: the course focuses on system performance rather than on the algorithms themselves, and a basic explanation of the algorithms will be part of the course. However, it is strongly recommended to start the course with a good understanding of the following algorithms: logistic regression, feed…
Textbooks
● Introduction to High Performance Computing for Scientists and Engineers. Authors: Georg Hager, Gerhard Wellein. Publisher: CRC Press. ISBN: 9781439811924
● Introduction to High Performance Scientific Computing (online). Authors: Victor Eijkhout, with Edmond Chow and Robert van de Geijn
● Computer Architecture: A Quantitative Approach, 5th Edition. Authors: John Hennessy, David Patterson. Publisher: Morgan Kaufmann. ISBN: 9780123838728
● Efficient Processing of Deep Neural Networks. Authors: Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer. Publisher: Morgan & Claypool Publishers. ISBN-13: 978-

Topics covered
● ML/DL and PyTorch basics
● PyTorch performance
● Performance optimization in PyTorch
● Parallel performance modeling
● Intro to CUDA
● Math libraries for ML (cuDNN)
● DNN architectures (CNN, RNN, LSTM, Attention, Transformers) in PyTorch
● Intro to MPI
● Distributed ML
● Distributed PyTorch algorithms, parallel data loading, and ring reduction
● Hardware acceleration for ML and AI
● Quantization and model compression

Course Information
● Instructors: Dr. Parijat Dube and Dr. Kaoutar El Maghraoui
● Grading: Homework (50%) + Final Project (20%) + Final Exam (20%) + Quizzes (10%)
● Homework: There will be five homework assignments, mostly involving programming and experiments on GPUs. Assignments will be based on C/C++, Python, and PyTorch.
● Course project
  ○ Project proposals are due by midterm.
  ○ Final presentations of all projects take place towards the end of the course.

Weekly Lesson Plan
● Week-1: Introduction to HPC and ML. Course introduction and organization; HPC and ML technology; ML/DL success drivers; HPC for ML; hardware overview: CPUs, accelerators, high-speed networks; software overview: algorithms, math libraries, frameworks
● Week-2: ML performance optimization. Factors affecting ML performance; software performance optimization for ML; performance optimization methodology: measurement, analysis, optimization; measurement: metrics, benchmarking workloads, time/resources, throughput, time to…
…Determining bit-width; Mixed and varying precision; Quantization: post-training quantization, static vs. dynamic quantization, quantization-aware training, graph mode quantization; hardware-aware quantization
● Week-13: Sparsity and Model Compression. Activation sparsity, weight sparsity; Compression; Sparse dataflow; Low-rank approximation; Knowledge distillation; Distilled architectures in convolutional and recurrent networks
● Week-14: Designing Efficient DNNs. Improving efficiency in manual network design; Neural architecture search (NAS), hardware-aware NAS; Near-memory and in-memory processing; Analog AI
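The post-training quantization item in the lesson plan can be illustrated with the simplest scheme: symmetric, per-tensor int8 quantization of a weight list. This is a minimal sketch that assumes plain Python floats and hypothetical `quantize_int8`/`dequantize` helpers. It is not the PyTorch quantization API, and it ignores activations, zero points, and per-channel scales. Each weight maps to q = clamp(round(w / s), -127, 127) with scale s = max|w| / 127, so the reconstruction error per weight is at most s/2.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 post-training quantization (illustrative only).

    scale = max|w| / 127, q = clamp(round(w / scale), -127, 127).
    The range is kept symmetric, so -128 is never used and 0.0 maps exactly to 0.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # all-zero tensor: any scale works
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.0, 1.0, 0.333]
    q, scale = quantize_int8(w)
    w_hat = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(w, w_hat))
    print("codes:", q)
    print(f"scale={scale:.5f}  max abs error={max_err:.5f}  "
          f"(bound: scale/2={scale / 2:.5f})")
```

A single scale computed once from the trained weights is what makes this "post-training". Quantization-aware training would instead simulate this rounding during training so the model learns to tolerate it.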
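The ring reduction listed under Topics covered can be illustrated with a single-process simulation. This is a minimal sketch that assumes equal-length Python lists as each worker's gradient buffer and a hypothetical `ring_allreduce` function. It is not an MPI or NCCL implementation. Each of the N workers splits its buffer into N chunks and only ever sends to its right-hand neighbour. A reduce-scatter phase of N-1 steps leaves each worker owning one fully summed chunk. An all-gather phase of N-1 steps then circulates those chunks so every worker ends with the complete sum.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a sum ring all-reduce over len(buffers) >= 1 workers in one process.

    Worker i only ever sends to worker (i + 1) % n. Inputs are not modified.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all buffers must have the same length")
    # Split each buffer into n contiguous chunks (some may be empty if length < n).
    bounds = [(k * length // n, (k + 1) * length // n) for k in range(n)]
    data = [list(b) for b in buffers]

    # Phase 1, reduce-scatter: after n-1 steps, worker i holds the full sum
    # of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot every outgoing chunk first so all sends in a step are simultaneous.
        msgs = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            msgs.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(msgs):
            dst = (i + 1) % n
            lo, _ = bounds[c]
            for j, v in enumerate(payload):
                data[dst][lo + j] += v

    # Phase 2, all-gather: pass the reduced chunks around, overwriting partial sums.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            msgs.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(msgs):
            lo, hi = bounds[c]
            data[(i + 1) % n][lo:hi] = payload
    return data


if __name__ == "__main__":
    workers = [[1.0, 2.0, 3.0, 4.0],
               [10.0, 20.0, 30.0, 40.0],
               [100.0, 200.0, 300.0, 400.0]]
    for rank, buf in enumerate(ring_allreduce(workers)):
        print(rank, buf)  # every rank prints [111.0, 222.0, 333.0, 444.0]
```

Across both phases, each worker sends 2(N-1) chunks of size 1/N of its buffer. That total approaches twice the buffer size regardless of N, which is why the ring scheme suits bandwidth-bound gradient averaging. In practice, PyTorch exposes this collective as `torch.distributed.all_reduce` and MPI as `MPI_Allreduce`. The algorithm those libraries choose internally may differ from this sketch.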