







Edgar Solomonik
University of Illinois at Urbana-Champaign
May 16, 2017
“Engineering FLOPs is not a design constraint – data movement presents the most daunting engineering and computer architecture challenge.”
– Shalf, Dosanjh, Morrison, VECPAR 2010

numerical computations are prevalent in all data sciences
Goal: design fundamental numerical algorithms that achieve better scalability by avoiding data movement
Extend the CholeskyQR2 algorithm, which obtains ideal accuracy for well-conditioned matrices ($\kappa = O(1/\sqrt{\epsilon})$), to a general parallel QR algorithm
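The slides name CholeskyQR2 without spelling it out. As a rough illustration, here is a minimal sequential sketch of the commonly described formulation: form the Gram matrix, take its Cholesky factor as R, recover Q by a triangular solve, then repeat the pass once on Q to restore orthogonality. It assumes NumPy is available; the function names `cholesky_qr` and `cholesky_qr2` are illustrative and not taken from the source, and nothing here reflects the parallel data layout or communication pattern of the algorithm the slides describe.

```python
import numpy as np


def cholesky_qr(A):
    """One CholeskyQR pass: A = Q R with R upper triangular."""
    G = A.T @ A                      # Gram matrix (n x n)
    L = np.linalg.cholesky(G)        # G = L L^T, L lower triangular
    R = L.T                          # so G = R^T R, R upper triangular
    # Q = A R^{-1}, computed from L Q^T = A^T.
    # A general solver is used for brevity; a triangular solver would be
    # the natural choice in a tuned implementation.
    Q = np.linalg.solve(L, A.T).T
    return Q, R


def cholesky_qr2(A):
    """Two CholeskyQR passes; the second pass repairs loss of orthogonality."""
    Q1, R1 = cholesky_qr(A)
    Q, R2 = cholesky_qr(Q1)
    return Q, R2 @ R1                # A = Q (R2 R1)


# Small well-conditioned test matrix (tall and skinny).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 8))
Q, R = cholesky_qr2(A)

residual = np.linalg.norm(Q @ R - A) / np.linalg.norm(A)
orth_error = np.linalg.norm(Q.T @ Q - np.eye(A.shape[1]))
print("relative residual:", residual)
print("orthogonality error:", orth_error)
```

The sketch only behaves well when the Gram matrix is numerically positive definite, which is why the method is stated for well-conditioned inputs; for ill-conditioned matrices the Cholesky step can fail outright.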
new practical parallel algorithm reduces bandwidth cost by $O(p^{1/6})$ with respect to the best existing implementation
analysis and development by Edward Hutter (BS ECE 2017, starting PhD in CS at UIUC in Fall 2017)
QR and SVD are critical to data-fitting and compression
new algorithms for QR factorization and eigenvalue computation for symmetric matrices are faster by $O(p^{1/6})$ in communication cost
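To give a rough sense of how the $O(p^{1/6})$ factor grows, the sketch below evaluates $p^{1/6}$ at the core counts that appear on the weak-scaling plot's axis. It assumes $p$ denotes the number of processors and ignores the constants hidden by the O-notation, so the printed values show only how slowly the factor grows and are not predicted speedups. The helper name `bandwidth_reduction_factor` is illustrative.

```python
def bandwidth_reduction_factor(p):
    """Return p**(1/6): the asymptotic factor with all constants omitted."""
    return p ** (1.0 / 6.0)


# Core counts taken from the x-axis of the QR weak-scaling plot.
core_counts = [144, 288, 576, 1152, 2304, 4608, 9216]

for p in core_counts:
    print(f"p = {p:5d}   p^(1/6) = {bandwidth_reduction_factor(p):.2f}")
```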
[Figure: QR weak scaling on Cray XE6 (n=15K to n=131K). x-axis: #cores (144, 288, 576, 1152, 2304, 4608, 9216); y-axis: Teraflops (0 to 20); series: Two-Level CAQR-HR, Elemental QR, ScaLAPACK QR]
ongoing work on SVD factorization via QR with pivoting and randomized projections
Ongoing work and future directions in CTF
integration with faster parallel numerical solvers
development of new (sparse) tensor applications
    algebraic multigrid
    finite/spectral element methods
    FFT, bitonic sort, parallel scan, HSS matrix computations
    tensor factorizations and tensor networks
existing CTF applications
    Aquarius (led by Devin Matthews)
    QChem via Libtensor (led by Evgeny Epifanovsky)
    QBall DFT for metallic systems (led by Eric Draeger)
    CC4S (led by Andreas Grüneis)
early collaborations involving Lattice QCD and DMRG
faster methods for shortest-path and graph centrality computations