









Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Various divide-and-conquer algorithms for multiplication and matrix multiplication, including karatsuba multiplication and fast matrix multiplication. It covers recursive calls, additions, shifts, and the use of submatrices. The document also discusses the time complexity and recursion trees of these algorithms.
Typology: Slides
1 / 16
This page cannot be seen from the preview
Don't miss anything!










1
integers, matrices, and polynomials 2
Complex multiplication. ( a + bi ) ( c + di ) = x + yi. Grade-school. x = ac - bd , y = bc + ad. Q. Is it possible to do with fewer multiplications? 4 multiplications, 2 additions 3
Complex multiplication. ( a + bi ) ( c + di ) = x + yi. Grade-school. x = ac - bd , y = bc + ad. Q. Is it possible to do with fewer multiplications? A. Yes. [Gauss] x = ac - bd , y = ( a + b ) ( c + d ) - ac - bd. Remark. Improvement if no hardware multiply. 4 multiplications, 2 additions 3 multiplications, 5 additions
5 Addition. Given two n -bit integers a and b , compute a + b. Grade-school. Θ( n ) bit operations. Remark. Grade-school addition algorithm is optimal.
**1 1 1 0 1
Multiplication. Given two n -bit integers a and b , compute a × b. Grade-school. Θ( n^2 ) bit operations. Q. Is grade-school multiplication algorithm optimal? 1 1 1 0 0 0 1 1 1 0 1 0 1 1 1 0 1 0 1 1 1 1 0 1 00000000 10101010 10101010 10101010 10101010 10101010 00000000 110100000000001 0 1 1 1 1 1 0 0 × 7 To multiply two n -bit integers a and b : Multiply four ½ n -bit integers, recursively. Add and shift to obtain result. Ex.
!
recursive calls 14 2 43 +^ "( n ) add, shift 123 #^ T^ ( n )^ =^ "( n (^2) ) ! a = 2 n^ /^2 " a 1 + a 0 b = 2 n^ /^2 " b 1 + b 0 a b = (^) ( 2 n^ /^2 " a 1 + a 0 ) ( 2 n^ /^2 " b 1 + b 0 ) = 2 n^ " a 1 b 1 + 2 n^ /^2 " (^) ( a 1 b 0 + a 0 b 1 ) + a 0 b 0 a = 10001101 b = 11100001 a 1 a 0 b 1 b 0 8
! T ( n ) = 0 if^ n^ =^0 4 T ( n / 2 ) + n otherwise " # $ n 4(n/2) 16(n/4) 4 k^ (n / 2 k) 4 lg^ n^ (1) T(n) T(n/2) T(n/4) T(n/4) T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(2) T(n / 2 k) T(n/4) T(n/2) T(n/4) T(n/4) T(n/2) T(n/4) T(n/4) T(n/4) ... ... ! T ( n ) = n 2 k k = 0 lg n " = n^2 1 + lg n (^) # 1 2 # 1 $ % & ' ( ) =^2 n^2 #^ n T(n/2) ... ... ... ...
Matrix Multiplication 14 Dot product. Given two length n vectors a and b , compute c = a ⋅ b. Grade-school. Θ( n ) arithmetic operations. Remark. Grade-school dot product algorithm is optimal.
! a " b = ai bi i = 1 n
a = (^) [ .70 .20 .10] b = (^) [ .30 .40 .30] a " b = (. 70 # .30) + (.20 # .40) + (. 10 # .30) =. 15 Matrix multiplication. Given two n -by- n matrices A and B , compute C = AB. Grade-school. Θ( n^3 ) arithmetic operations. Q. Is grade-school matrix multiplication algorithm optimal?
! cij = aik bkj k = 1 n " ! c 11 c 12 L c 1 n c 21 c 22 L c 2 n M M O M cn 1 cn 2 L cnn "
$ $ $ $ % & ' ' ' ' = a 11 a 12 L a 1 n a 21 a 22 L a 2 n M M O M an 1 an 2 L ann "
$ $ $ $ % & ' ' ' ' ( b 11 b 12 L b 1 n b 21 b 22 L b 2 n M M O M bn 1 bn 2 L bnn "
$ $ $ $ % & ' ' ' ' ! .59 .32. .31 .36. .45 .31. "
$ $ $ % & ' ' ' = .70 .20. .30 .60. .50 .10. "
$ $ $ % & ' ' ' ( .80 .30. .10 .40. .10 .30. "
$ $ $ % & ' ' ' 16
! C 11 = A 11 " B 11 + A 12 " B 21 = 0 1 4 5
$^ % & '^ (^ " 16 17 20 21
$^ % & '^ (^
2 3 6 7
$^ % & '^ (^ " 24 25 28 29
$^ % & '^ (^ = 152 158 504 526
$^ % & '^ ( ! 152 158 164 170 504 526 548 570 856 894 932 970 1208 1262 1316 1370 "
$ $ $ $ % & ' ' ' ' = 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 "
$ $ $ $ % & ' ' ' ' ( 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 "
$ $ $ $ % & ' ' ' ' C 11^ A^11 A^12 B^11 B 11
17
To multiply two n -by- n matrices A and B : Divide: partition A and B into ½ n -by-½ n blocks. Conquer: multiply 8 pairs of ½ n -by-½ n matrices, recursively. Combine: add appropriate products using 4 matrix additions. !
T ( n ) = 8 T (^) ( n / (^2) ) recursive calls
18
Key idea. multiply 2 -by- 2 blocks with only 7 multiplications. 7 multiplications. 18 = 8 + 10 additions and subtractions.
! C 11 C 12 C 21 C 22 " #^ $ % &^ '^ = A 11 A 12 A 21 A 22 " #^ $ % &^ '^ ( B 11 B 12 B 21 B 22 " #^ $ % &^ ' 19
To multiply two n -by- n matrices A and B : [Strassen 1969] Divide: partition A and B into ½ n -by-½ n blocks. Compute: 14 ½ n -by-½ n matrices via 10 matrix additions. Conquer: multiply 7 pairs of ½ n -by-½ n matrices, recursively. Combine: 7 products into 4 terms using 8 matrix additions. Analysis. Assume n is a power of 2. T ( n ) = # arithmetic operations. !
recursive calls 14 2 43 +^ "( n (^2) ) add, subtract 1 42 43 #^ T^ ( n )^ =^ "( n log (^2 7) ) = O ( n 2.81 (^) ) 20
Implementation issues. Sparsity. Caching effects. Numerical stability. Odd matrix dimensions. Crossover to classical algorithm around n = 128. Common misperception. “Strassen is only a theoretical curiosity.” Apple reports 8 x speedup on G4 Velocity Engine when n ≈ 2,500. Range of instances where it's useful is a subject of controversy. Remark. Can "Strassenize" Ax = b , determinant, eigenvalues, SVD, ….
25
Sinusoids. Sum of sine an cosines. Sinusoids. Sum of complex exponentials. eix^ = cos x + i sin x Euler's identity 26
Signal. [touch tone button 1] Time domain. Frequency domain. ! y ( t ) = 12 sin( 2 " # 697 t ) + 12 sin( 2 " # 1209 t ) Reference: Cleve Moler, Numerical Computing with MATLAB frequency (Hz) amplitude 0. time (seconds) pressure^ sound 27
Signal. [recording, 8192 samples per second] Magnitude of discrete Fourier transform. Reference: Cleve Moler, Numerical Computing with MATLAB 28
FFT. Fast way to convert between time-domain and frequency-domain. Alternate viewpoint. Fast way to multiply and evaluate polynomials. If you speed up any nontrivial algorithm by a factor of a million or so the world will beat a path towards finding useful applications for it. -Numerical Recipes we take this approach
29
Applications. Optics, acoustics, quantum physics, telecommunications, radar, control systems, signal processing, speech recognition, data compression, image processing, seismology, mass spectrometry… Digital media. [DVD, JPEG, MP3, H.264] Medical diagnostics. [MRI, CT, PET scans, ultrasound] Numerical solutions to Poisson's equation. Shor's quantum factoring algorithm. … The FFT is one of the truly great computational developments of [the 20th] century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT. -Charles van Loan 30
Gauss (1805, 1866). Analyzed periodic motion of asteroid Ceres. Runge-König (1924). Laid theoretical groundwork. Danielson-Lanczos (1942). Efficient algorithm, x-ray crystallography. Cooley-Tukey (1965). Monitoring nuclear tests in Soviet Union and tracking submarines. Rediscovered and popularized FFT. Importance not fully realized until advent of digital computers. 31
Polynomial. [coefficient representation] Add. O ( n ) arithmetic operations. Evaluate. O ( n ) using Horner's method. Multiply (convolve). O ( n^2 ) using brute force.
A ( x ) = a 0 + a 1 x + a 2 x^2 +L+ an " 1 xn "^1 ! B ( x ) = b 0 + b 1 x + b 2 x^2 +L+ bn " 1 xn "^1 ! A ( x ) + B ( x ) = ( a 0 + b 0 ) + ( a 1 + b 1 ) x +L+ ( an " 1 + bn " 1 ) xn "^1 ! A ( x ) = a 0 + ( x ( a 1 + x ( a 2 +L+ x ( an " 2 + x ( an " 1 ))L)) ! A ( x ) " B ( x ) = ci xi i = 0 2 n # 2 $ ,^ where^ ci =^ a^ j bi # j j = 0 i $ 32
"New Proof of the Theorem That Every Algebraic Rational Integral Function In One Variable can be Resolved into Real Factors of the First or the Second Degree."
37
Point-value ⇒ coefficient. Given n distinct points x 0 , ... , xn -1 and values y 0 , ... , yn -1, find unique polynomial a 0 + a 1 x + ... + an -1 xn -1, that has given values at given points. Running time. O ( n^3 ) for Gaussian elimination.
y 0 y 1 y 2 M yn " 1
1 x 0 x 02 L x 0^ n "^1 1 x 1 x 12 L x 1^ n "^1 1 x 2 x 22 L x 2^ n "^1 M M M O M 1 xn " 1 xn^2 "^1 L xn^ n "" 11
a 0 a 1 a 2 M an " 1
Vandermonde matrix is invertible iff xi distinct or O ( n 2.376) via fast matrix multiplication 38
Decimation in frequency. Break up polynomial into low and high powers. A ( x ) = a 0 + a 1 x + a 2 x^2 + a 3 x^3 + a 4 x^4 + a 5 x^5 + a 6 x^6 + a 7 x^7_._ Alow ( x ) = a 0 + a 1 x + a 2 x^2 + a 3 x^3_._ Ahigh ( x ) = a 4 + a 5 x + a 6 x^2 + a 7 x^3_._ A ( x ) = Alow ( x ) + x^4 Ahigh ( x ). Decimation in time. Break polynomial up into even and odd powers. A ( x ) = a 0 + a 1 x + a 2 x^2 + a 3 x^3 + a 4 x^4 + a 5 x^5 + a 6 x^6 + a 7 x^7_._ Aeven ( x ) = a 0 + a 2 x + a 4 x^2 + a 6 x^3_._ Aodd ( x ) = a 1 + a 3 x + a 5 x^2 + a 7 x^3_._ A ( x ) = Aeven ( x^2 ) + x Aodd ( x^2 ). 39
Coefficient ⇒ point-value. Given a polynomial a 0 + a 1 x + ... + an -1 xn -1, evaluate it at n distinct points x 0 , ..., xn -1. Divide. Break polynomial up into even and odd powers. A ( x ) = a 0 + a 1 x + a 2 x^2 + a 3 x^3 + a 4 x^4 + a 5 x^5 + a 6 x^6 + a 7 x^7_._ Aeven ( x ) = a 0 + a 2 x + a 4 x^2 + a 6 x^3_._ Aodd ( x ) = a 1 + a 3 x + a 5 x^2 + a 7 x^3_._ A ( x ) = Aeven ( x^2 ) + x Aodd ( x^2 ). A ( -x ) = Aeven ( x^2 ) - x Aodd ( x^2 ). Intuition. Choose two points to be ±1. A ( 1) = Aeven (1) + 1 Aodd (1). A (-1) = Aeven (1) - 1 Aodd (1). (^) Can evaluate polynomial of degree ≤ n at 2 points by evaluating two polynomials of degree ≤ ½ n at 1 point. we get to choose which ones! 40
Coefficient ⇒ point-value. Given a polynomial a 0 + a 1 x + ... + an -1 xn -1, evaluate it at n distinct points x 0 , ..., xn -1. Divide. Break polynomial up into even and odd powers. A ( x ) = a 0 + a 1 x + a 2 x^2 + a 3 x^3 + a 4 x^4 + a 5 x^5 + a 6 x^6 + a 7 x^7_._ Aeven ( x ) = a 0 + a 2 x + a 4 x^2 + a 6 x^3_._ Aodd ( x ) = a 1 + a 3 x + a 5 x^2 + a 7 x^3_._ A ( x ) = Aeven ( x^2 ) + x Aodd ( x^2 ). A ( -x ) = Aeven ( x^2 ) - x Aodd ( x^2 ). Intuition. Choose four complex points to be ±1, ± i. A (1) = Aeven ( 1 ) + 1 Aodd (1). A (-1) = Aeven ( 1 ) - 1 Aodd ( 1 ). A ( i ) = Aeven (-1) + i Aodd (-1). A ( -i ) = Aeven (-1) - i Aodd (-1). Can evaluate polynomial of degree ≤ n at 4 points by evaluating two polynomials of degree ≤ ½ n at 2 points. we get to choose which ones!
41
Coefficient ⇒ point-value. Given a polynomial a 0 + a 1 x + ... + an -1 xn -1, evaluate it at n distinct points x 0 , ..., xn -1. Key idea. Choose xk = ω k^ where ω is principal nth^ root of unity. DFT ! y 0 y 1 y 2 y 3 M yn " 1
1 )^1 )^2 )^3 L ) n "^1 1 )^2 )^4 )^6 L )^2 ( n "^1 ) 1 )^3 )^6 )^9 L )3( n "^1 ) M M M M O M 1 ) n "^1 )^2 ( n "^1 )^ )3( n "^1 )^ L )( n "^1 )( n "^1 )
a 0 a 1 a 2 a 3 M an " 1
Fourier matrix Fn 42
Def. An nth^ root of unity is a complex number x such that xn^ = 1. Fact. The nth^ roots of unity are: ω^0 , ω^1 , …, ω n -1^ where ω = e^2 π^ i / n. Pf. (ω k ) n^ = ( e^2 π^ i k / n ) n^ = (e π^ i^ ) 2 k^ = (-1) 2 k^ = 1. Fact. The ½ nth^ roots of unity are: ν^0 , ν^1 , …, ν n /2-1^ where ν = ω^2 = e^4 π^ i / n. ω^0 = ν^0 = 1 ω^1 ω^2 = ν^1 = i ω^3 ω^4 = ν^2 = - ω^5 ω^6 = ν^3 = - i ω^7 n = 8 43
Goal. Evaluate a degree n -1 polynomial A ( x ) = a 0 + ... + an -1 xn -1^ at its nth^ roots of unity: ω^0 , ω^1 , …, ω n -1. Divide. Break up polynomial into even and odd powers. Aeven ( x ) = a 0 + a 2 x + a 4 x^2 + … + an -2 x n /2 - 1_._ Aodd ( x ) = a 1 + a 3 x + a 5 x^2 + … + an -1 x n /2 - 1_._ A ( x ) = Aeven ( x^2 ) + x Aodd ( x^2 ). Conquer. Evaluate Aeven ( x ) and Aodd ( x ) at the ½ nth roots of unity: ν^0 , ν^1 , …, ν n /2-1. Combine. A ( ω k ) = Aeven ( ν k ) + ω k^ Aodd ( ν k ), 0 ≤ k < n / A ( ω k+^ ½n ) = Aeven ( ν k ) – ω k^ Aodd ( ν k ), 0 ≤ k < n / ω k +^ ½ n^ = - ω k ν k^ = (ω k^ )^2 ν k^ = (ω k^ +^ ½ n^ )^2 44 fft(n, a 0 ,a 1 ,…,an-1) { if (n == 1) return a 0 (e 0 ,e 1 ,…,en/2-1) ← FFT(n/2, a 0 ,a 2 ,a 4 ,…,an-2) (d 0 ,d 1 ,…,dn/2-1) ← FFT(n/2, a 1 ,a 3 ,a 5 ,…,an-1) for k = 0 to n/2 - 1 { ω k^ ← e^2 π ik/n yk+n/2 ← ek + ω k^ dk yk+n/2 ← ek - ω k^ dk } return (y 0 ,y 1 ,…,yn-1) }
49
Claim. Fn and Gn are inverses. Pf. Summation lemma. Let ω be a principal nth^ root of unity. Then Pf. If k is a multiple of n then ω k^ = 1 ⇒ series sums to n. Each nth^ root of unity ω k^ is a root of xn^ - 1 = ( x - 1) (1 + x + x^2 + ... + xn -1). if ω k^ ≠ 1 we have: 1 + ω k^ + ω k (2)^ + … + ω k ( n -1)^ = 0 ⇒ series sums to 0. ▪
" k^ j j = 0 n # 1 $ = n if k % 0 mod n 0 otherwise
(^ Fn Gn ) (^) k k "=^1 n
j = 0 n $ 1 % =^
n #( k $^ k^ ")^ j j = 0 n $ 1 % =^ 1 if k = k " 0 otherwise
summation lemma 50
ifft(n, a 0 ,a 1 ,…,an-1) { if (n == 1) return a 0 (e 0 ,e 1 ,…,en/2-1) ← FFT(n/2, a 0 ,a 2 ,a 4 ,…,an-2) (d 0 ,d 1 ,…,dn/2-1) ← FFT(n/2, a 1 ,a 3 ,a 5 ,…,an-1) for k = 0 to n/2 - 1 { ω k^ ← e-2 π ik/n yk+n/2 ← (ek + ω k^ dk) / n yk+n/2 ← (ek - ω k^ dk) / n } return (y 0 ,y 1 ,…,yn-1) } 51
Theorem. Inverse FFT algorithm interpolates a degree n -1 polynomial given values at each of the nth^ roots of unity in O ( n log n ) steps. assumes n is a power of 2 ! a 0 , a 1 , K, an - 1 ! ( " 0 , y 0 ), K, ( " n #^1 , yn # 1 ) O ( n log n ) coefficient representation O ( n log n ) (^) point-value representation 52
Theorem. Can multiply two degree n -1 polynomials in O ( n log n ) steps. ! a 0 , a 1 , K, an - 1 b 0 , b 1 , K, bn - 1 ! c 0 , c 1 , K, c 2 n - 2 ! A ( " 0 ), ..., A ( " 2 n #^1 ) B ( " 0 ), ..., B ( " 2 n #^1 ) ! C ( " 0 ), ..., C ( " 2 n #^1 ) O ( n ) point-value multiplication 2 FFTs O ( n log n ) inverse FFT O ( n log n ) coefficient representation (^) coefficient representation pad with 0 s to make n a power of 2 docsity.com
53
54
Fastest Fourier transform in the West. [Frigo and Johnson] Optimized C library. Features: DFT, DCT, real, complex, any size, any dimension. Won 1999 Wilkinson Prize for Numerical Software. Portable, competitive with vendor-tuned code. Implementation details. Instead of executing predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to "shape" of the problem. Core algorithm is nonrecursive version of Cooley-Tukey. O ( n log n ), even for prime sizes. Reference: http://www.fftw.org 55
Integer multiplication. Given two n bit integers a = an-1 … a 1 a 0 and b = bn-1 … b 1 b 0 , compute their product a ⋅ b. Convolution algorithm. Form two polynomials. Note: a = A (2) , b = B (2). Compute C ( x ) = A ( x ) ⋅ B ( x ). Evaluate C (2) = a ⋅ b. Running time: O(n log n) complex arithmetic operations. Theory. [Schönhage-Strassen 1971] O ( n log n log log n ) bit operations. Theory. [Fürer 2007] O( n log n 2 O(log * n )) bit operations.
A ( x ) = a 0 + a 1 x + a 2 x^2 +L+ an " 1 xn "^1 ! B ( x ) = b 0 + b 1 x + b 2 x^2 +L+ bn " 1 xn "^1 56
Integer multiplication. Given two n bit integers a = an-1 … a 1 a 0 and b = bn-1 … b 1 b 0 , compute their product a ⋅ b. Practice. [GNU Multiple Precision Arithmetic Library] It uses brute force, Karatsuba, and FFT, depending on the size of n. "the fastest bignum library on the planet"
61
Period finding. Theorem. [Euler] Let p and q be prime, and let N = p q. Then, the following sequence repeats with a period divisible by ( p -1) ( q -1): Consequence. If we can learn something about the period of the sequence, we can learn something about the divisors of ( p -1) ( q -1). by using random values of x , we get the divisors of ( p -1) ( q -1), and from this, can get the divisors of N = p q 2 i 1 2 4 8 16 32 64 128 … 2 i^ mod 15 1 2 4 8 1 2 4 8 … 2 i^ mod 21 1 2 4 8 16 11 1 2 … period = 4 period = 6 x mod N , x^2 mod N , x^3 mod N , x^4 mod N , … Extra Slides 63
! y = Fn a = In / 2 Dn / 2 In / 2 " Dn / 2
$ % & ' ( Fn / 2 aeven Fn / 2 aodd
$ % & ' ! (
! Fn = 1 1 1 1 L 1 1 "^1 "^2 "^3 L " n #^1 1 "^2 "^4 "^6 L "^2 ( n #^1 ) 1 "^3 "^6 "^9 L "^3 ( n #^1 ) M M M M O M 1 " n #^1 "^2 ( n #^1 )^ "^3 ( n #^1 )^ L "( n #^1 )( n #^1 ) $ % & & & & & & & ' (
a = a 0 a 1 a 2 a 3
docsity.com