Thap Panitanarak
CS6260 Spring 2009
Solving Linear Systems using
Parallel Gaussian Elimination
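\[
\begin{aligned}
a_{0,0}x_0 + a_{0,1}x_1 + a_{0,2}x_2 + \cdots + a_{0,n-1}x_{n-1} &= b_0 \\
a_{1,0}x_0 + a_{1,1}x_1 + a_{1,2}x_2 + \cdots + a_{1,n-1}x_{n-1} &= b_1 \\
&\;\;\vdots \\
a_{n-1,0}x_0 + a_{n-1,1}x_1 + a_{n-1,2}x_2 + \cdots + a_{n-1,n-1}x_{n-1} &= b_{n-1}
\end{aligned}
\]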
Linear System
Linear System or System of Linear Equations
n equations in n variables
$a_{i,j}$ is the coefficient of $x_j$ in equation i
Linear System
Matrix representation
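\[
\underbrace{\begin{bmatrix}
a_{0,0} & a_{0,1} & a_{0,2} & \cdots & a_{0,n-1} \\
a_{1,0} & a_{1,1} & a_{1,2} & \cdots & a_{1,n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_{n-1,0} & a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n-1}
\end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{n-1} \end{bmatrix}}_{b}
\]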
Linear System
Matrix representation – Augmented matrix
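\[
[A:b] =
\left[\begin{array}{ccccc|c}
a_{0,0} & a_{0,1} & a_{0,2} & \cdots & a_{0,n-1} & b_0 \\
a_{1,0} & a_{1,1} & a_{1,2} & \cdots & a_{1,n-1} & b_1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
a_{n-1,0} & a_{n-1,1} & a_{n-1,2} & \cdots & a_{n-1,n-1} & b_{n-1}
\end{array}\right]
\]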
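As a small sketch of the augmented-matrix representation, the snippet below appends b as an extra column to A using plain Python lists. The helper name `augment` is an illustrative assumption, and the sample values are the four-equation system used in the Gaussian elimination slides.

```python
def augment(A, b):
    """Return the augmented matrix [A:b] as a new list of rows."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    # Copy each row of A and append the matching right-hand-side entry.
    return [list(row) + [rhs] for row, rhs in zip(A, b)]


A = [[4.0, 6.0, 2.0, -2.0],
     [2.0, 0.0, 5.0, -2.0],
     [-4.0, -3.0, -5.0, 4.0],
     [8.0, 18.0, -2.0, 3.0]]
b = [8.0, 4.0, 1.0, 40.0]

aug = augment(A, b)
for row in aug:
    print(row)
```

Column n of each row (index `n` in a 0-based row of length n+1) then plays the role of b[i], which is how the pseudocode later refers to it as a[i,n].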
Back Substitution
Solve a linear system Tx = c, where T is upper triangular

\[
\begin{aligned}
1x_0 + 1x_1 - 1x_2 + 4x_3 &= \\
-2x_1 - 3x_2 + 1x_3 &= \\
2x_2 - 3x_3 &= \\
2x_3 &=
\end{aligned}
\]

- Substitute $x_3$ =

\[
\begin{aligned}
1x_0 + 1x_1 - 1x_2 &= \\
-2x_1 - 3x_2 &= \\
2x_2 &= \\
2x_3 &=
\end{aligned}
\]

- Next $x_2$ and $x_1$
- Substitute $x_2$ =

\[
\begin{aligned}
1x_0 + 1x_1 &= \\
-2x_1 &= \\
2x_2 &= \\
2x_3 &=
\end{aligned}
\]

- Substitute $x_1$ = -

\[
\begin{aligned}
1x_0 &= \\
-2x_1 &= \\
2x_2 &= \\
2x_3 &=
\end{aligned}
\]

- We get $x_0$ = 3, $x_1$ = -6, $x_2$ = 3 and $x_3$ =
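The substitution steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming T is stored as a list of rows; the function name `back_substitute` and the 3x3 values are hypothetical and chosen for the example, not taken from the slides.

```python
def back_substitute(T, c):
    """Solve T x = c where T is upper triangular (list of rows)."""
    n = len(c)
    c = list(c)                      # work on a copy of the right-hand side
    x = [0.0] * n
    for i in range(n - 1, -1, -1):   # last equation first
        if T[i][i] == 0:
            raise ZeroDivisionError("zero on the diagonal of T")
        x[i] = c[i] / T[i][i]
        # Move the now-known term x[i] * T[j][i] to the right-hand side
        # of every equation above row i.
        for j in range(i):
            c[j] -= x[i] * T[j][i]
    return x


# Hypothetical upper triangular system (illustrative values only).
T = [[2.0, 1.0, -1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
c = [3.0, 13.0, 8.0]
print(back_substitute(T, c))  # [1.0, 3.0, 2.0]
```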
Gaussian Elimination
General Concept
"Well-known" algorithm for solving the linear system
Ax = b
Reduce Ax = b to Tx = c, where T is an upper
triangular matrix
Apply back substitution to Tx = c to solve for x
Row operations used
Multiply a row by a nonzero constant
Swap two rows
Add a multiple of one row to another (or subtract it)
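Gaussian Elimination
Consider the linear system
\[
\begin{aligned}
4x_0 + 6x_1 + 2x_2 - 2x_3 &= 8 \\
2x_0 + 5x_2 - 2x_3 &= 4 \\
-4x_0 - 3x_1 - 5x_2 + 4x_3 &= 1 \\
8x_0 + 18x_1 - 2x_2 + 3x_3 &= 40
\end{aligned}
\]
$m_{1,0} = a_{1,0}/a_{0,0} = 0.5$, $m_{2,0} = a_{2,0}/a_{0,0} = -1$ and $m_{3,0} = a_{3,0}/a_{0,0} = 2$
Subtracting $m_{j,0}$ times row 0 from each row j below it gives
\[
\begin{aligned}
4x_0 + 6x_1 + 2x_2 - 2x_3 &= 8 \\
-3x_1 + 4x_2 - 1x_3 &= 0 \\
3x_1 - 3x_2 + 2x_3 &= 9 \\
6x_1 - 6x_2 + 7x_3 &= 24
\end{aligned}
\]
Lower elements of column 0 are eliminated!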
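To make the multiplier step concrete, here is a minimal Python sketch that eliminates the entries below the pivot in one column of the augmented matrix. The function name `eliminate_column` is an assumption for illustration; the data is the four-equation system from this slide, and no pivoting is done here.

```python
def eliminate_column(aug, i):
    """Zero the entries below aug[i][i] in place; return the multipliers."""
    n = len(aug)
    multipliers = {}
    for j in range(i + 1, n):
        m = aug[j][i] / aug[i][i]          # m[j,i] = a[j,i] / a[i,i]
        multipliers[j] = m
        for k in range(i, n + 1):          # column n holds the right-hand side
            aug[j][k] -= m * aug[i][k]
    return multipliers


aug = [[4.0, 6.0, 2.0, -2.0, 8.0],
       [2.0, 0.0, 5.0, -2.0, 4.0],
       [-4.0, -3.0, -5.0, 4.0, 1.0],
       [8.0, 18.0, -2.0, 3.0, 40.0]]

m = eliminate_column(aug, 0)
print(m)        # {1: 0.5, 2: -1.0, 3: 2.0}
for row in aug:
    print(row)
```

After the call, rows 1 to 3 read 0, -3, 4, -1 | 0, then 0, 3, -3, 2 | 9, then 0, 6, -6, 7 | 24.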
Gaussian Elimination
Partial Pivoting
Use the row with the largest absolute value in the pivot
column as the pivot row (a row swap may be needed)
Also makes the computation more accurate
Before the swap:
\[
\begin{aligned}
4x_0 + 6x_1 + 2x_2 - 2x_3 &= 8 \\
2x_0 + 5x_2 - 2x_3 &= 4 \\
-4x_0 - 3x_1 - 5x_2 + 4x_3 &= 1 \\
8x_0 + 18x_1 - 2x_2 + 3x_3 &= 40
\end{aligned}
\]
After swapping rows 0 and 3:
\[
\begin{aligned}
8x_0 + 18x_1 - 2x_2 + 3x_3 &= 40 \\
2x_0 + 5x_2 - 2x_3 &= 4 \\
-4x_0 - 3x_1 - 5x_2 + 4x_3 &= 1 \\
4x_0 + 6x_1 + 2x_2 - 2x_3 &= 8
\end{aligned}
\]
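The pivot selection can be sketched as a search for the largest absolute value in the pivot column, followed by a row swap. The helper name `pivot_row` is hypothetical; the data is the example system from this slide.

```python
def pivot_row(aug, i):
    """Index of the row at or below i with the largest |aug[row][i]|."""
    return max(range(i, len(aug)), key=lambda r: abs(aug[r][i]))


aug = [[4.0, 6.0, 2.0, -2.0, 8.0],
       [2.0, 0.0, 5.0, -2.0, 4.0],
       [-4.0, -3.0, -5.0, 4.0, 1.0],
       [8.0, 18.0, -2.0, 3.0, 40.0]]

p = pivot_row(aug, 0)              # column 0 holds 4, 2, -4, 8
aug[0], aug[p] = aug[p], aug[0]    # row swap
print(p)                           # 3
print(aug[0])                      # [8.0, 18.0, -2.0, 3.0, 40.0]
```

When two rows tie on absolute value, `max` keeps the first one it sees, so this sketch prefers the lower row index.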
Gaussian Elimination
Sequential Algorithm
for i from 0 to n-1 do
    //Find Pivot Row
    pmax = 0
    prow = i
    for j from i to n-1 do
        if pmax < |a[j,i]| then
            pmax = |a[j,i]|
            prow = j
        end if
    end do
    rswap(i,prow)
    //Gaussian Elimination
    for j from i+1 to n-1 do
        m = a[j,i]/a[i,i]
        for k from i to n do
            a[j,k] = a[j,k]-a[i,k]*m
        end do
    end do
end do

//Back Substitution
for i from n-1 down to 0 do
    x[i] = a[i,n]/a[i,i]
    for j from 0 to i-1 do
        a[j,n] = a[j,n]-x[i]*a[j,i]
    end do
end do

Note: To make the row swap more efficient, we can also use an indirect index loc[i] = j, where "physical row" j is indexed by "virtual row" i
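A runnable Python version of the sequential algorithm might look like the sketch below. It follows the pseudocode (pivot search, row swap, elimination on the augmented matrix, then back substitution); the function name `gauss_solve` and the singularity check are additions for illustration.

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A:b]; column n holds the right-hand side.
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]

    for i in range(n):
        # Find pivot row: largest |a[j,i]| for j >= i.
        prow = max(range(i, n), key=lambda r: abs(aug[r][i]))
        if aug[prow][i] == 0.0:
            raise ValueError("matrix is singular")
        aug[i], aug[prow] = aug[prow], aug[i]
        # Eliminate column i from the rows below.
        for j in range(i + 1, n):
            m = aug[j][i] / aug[i][i]
            for k in range(i, n + 1):
                aug[j][k] -= m * aug[i][k]

    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = aug[i][n] / aug[i][i]
        for j in range(i):
            aug[j][n] -= x[i] * aug[j][i]
    return x


A = [[4, 6, 2, -2],
     [2, 0, 5, -2],
     [-4, -3, -5, 4],
     [8, 18, -2, 3]]
b = [8, 4, 1, 40]
x = gauss_solve(A, b)
print(x)  # approximately [-13.5, 8.6667, 7.0, 2.0]
```

Substituting the result back into the four equations reproduces the right-hand side up to floating-point rounding.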
Gaussian Elimination
Indirect index for row swap
\[
\begin{aligned}
4x_0 + 6x_1 + 2x_2 - 2x_3 &= 8 \\
2x_0 + 5x_2 - 2x_3 &= 4 \\
-4x_0 - 3x_1 - 5x_2 + 4x_3 &= 1 \\
8x_0 + 18x_1 - 2x_2 + 3x_3 &= 40
\end{aligned}
\]
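The indirect index can be sketched as follows: rows never move, and only the entries of `loc` are swapped, so virtual row i always refers to physical row `loc[i]`. The function name `gauss_solve_indirect` is an assumption for illustration. In Python, swapping two list rows is already a cheap reference swap, so the indirection mainly matters for contiguous 2D arrays or for rows owned by different processors.

```python
def gauss_solve_indirect(A, b):
    """Gaussian elimination with partial pivoting using an index array."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    loc = list(range(n))            # loc[i] = physical row of virtual row i

    for i in range(n):
        p = max(range(i, n), key=lambda v: abs(aug[loc[v]][i]))
        if aug[loc[p]][i] == 0.0:
            raise ValueError("matrix is singular")
        loc[i], loc[p] = loc[p], loc[i]      # swap indices, not rows
        pivot = aug[loc[i]]
        for j in range(i + 1, n):
            row = aug[loc[j]]
            m = row[i] / pivot[i]
            for k in range(i, n + 1):
                row[k] -= m * pivot[k]

    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        row = aug[loc[i]]
        x[i] = row[n] / row[i]
        for j in range(i):
            aug[loc[j]][n] -= x[i] * aug[loc[j]][i]
    return x, loc


A = [[4, 6, 2, -2],
     [2, 0, 5, -2],
     [-4, -3, -5, 4],
     [8, 18, -2, 3]]
b = [8, 4, 1, 40]
x, loc = gauss_solve_indirect(A, b)
print(loc)  # [3, 2, 1, 0]
```

For the example system, physical row 3 becomes the first pivot row and physical row 2 the second, which is why `loc` ends up reversed.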
Gaussian Elimination
Sequential Algorithm - Analysis
Gaussian elimination with partial pivoting
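In iteration k (column k):
Finding the pivot row uses (n-k) steps
The elimination step uses (n-k-1)(n-k) steps
Over n iterations (k = 0 to n-1):
\[
\sum_{k=0}^{n-1} (n-k) + \sum_{k=0}^{n-1} (n-k-1)(n-k) = \frac{n(n+1)}{2} + \frac{n(n-1)(n+1)}{3} = O(n^3)
\]
Back substitution
In iteration k, it takes (n-k-1) steps
Over n iterations:
\[
\sum_{k=0}^{n-1} (n-k-1) = \frac{n(n-1)}{2} = O(n^2)
\]
All in O(n^3)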
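The closed forms for these step counts, n(n+1)/2 for the pivot search, n(n-1)(n+1)/3 for elimination and n(n-1)/2 for back substitution, can be checked numerically by summing the per-iteration costs. The function names below are hypothetical.

```python
def pivot_search_cost(n):
    """Sum of (n-k) comparisons over iterations k = 0..n-1."""
    return sum(n - k for k in range(n))


def elimination_cost(n):
    """Sum of (n-k-1)(n-k) updates over iterations k = 0..n-1."""
    return sum((n - k - 1) * (n - k) for k in range(n))


def back_substitution_cost(n):
    """Sum of (n-k-1) updates over iterations k = 0..n-1."""
    return sum(n - k - 1 for k in range(n))


for n in (4, 16, 100):
    assert pivot_search_cost(n) == n * (n + 1) // 2
    assert elimination_cost(n) == n * (n - 1) * (n + 1) // 3
    assert back_substitution_cost(n) == n * (n - 1) // 2

print(pivot_search_cost(4), elimination_cost(4), back_substitution_cost(4))
# 10 20 6
```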
Gaussian Elimination
Parallel Design
Back Substitution – Algorithm & Analysis
for i from n-1 down to 1 do
    P(i) computes x[i]
    P(i) broadcasts x[i] to the other processors
    for j from 0 to i-1 do in parallel
        P(j) updates b[j]
    end do
end do
P(0) computes x[0]
With n processors, the running time is O(n log_2 n): n iterations, each dominated by an O(log_2 n) broadcast
Speedup S = n^2 / (n log_2 n) = n / log_2 n
Example: n = 16, S = 16/4 = 4
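The speedup estimate can be reproduced with a tiny cost model. This is only a sketch of the asymptotic argument, assuming a sequential cost of n^2 and a parallel cost of n broadcasts of log_2 n steps each, with all constant factors ignored; the function name `modeled_speedup` is hypothetical.

```python
import math


def modeled_speedup(n):
    """Speedup of parallel back substitution under the n^2 vs n*log2(n) model."""
    sequential = n * n                 # O(n^2) sequential back substitution
    parallel = n * math.log2(n)        # n iterations, one log2(n) broadcast each
    return sequential / parallel


for n in (4, 16, 1024):
    print(n, modeled_speedup(n))
```

For n = 16 the model gives 4.0, matching the example on the slide. The model is undefined for n = 1, where log_2 n is zero.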
Gaussian Elimination
Parallel Design
Gaussian elimination with partial pivoting
Processor assignment (row-wise)
P0: $4x_0 + 6x_1 + 2x_2 - 2x_3 = 8$
P1: $2x_0 + 5x_2 - 2x_3 = 4$
P2: $-4x_0 - 3x_1 - 5x_2 + 4x_3 = 1$
P3: $8x_0 + 18x_1 - 2x_2 + 3x_3 = 40$
Communication is needed to find the pivot row in each iteration: all-reduce (pmax, prow)
Communication is needed to zero off entries (elimination): all processors need to know all elements of the pivot row
Parallel elimination can be done once all processors have the pivot row
All-Reduce Communication
Hypercube (Find MAX)
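A hypercube all-reduce for the maximum can be simulated sequentially: in step d, every processor exchanges its current value with the neighbour whose rank differs in bit d and keeps the larger one. The sketch below assumes the processor count is a power of two and represents each local value as a (pmax, prow) tuple; the function name `hypercube_allreduce_max` is hypothetical.

```python
def hypercube_allreduce_max(values):
    """Simulate an all-reduce (max) on a hypercube.

    values[r] is the local value on processor r; returns the list of
    values held by every processor after log2(p) exchange steps.
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of processors must be a power of two")
    current = list(values)
    dims = p.bit_length() - 1              # log2(p) dimensions
    for d in range(dims):
        # Each processor r pairs with r XOR 2^d and keeps the maximum.
        current = [max(current[r], current[r ^ (1 << d)]) for r in range(p)]
    return current


# Local (|a[j,0]|, row) pairs for column 0 of the example system.
local = [(4.0, 0), (2.0, 1), (4.0, 2), (8.0, 3)]
print(hypercube_allreduce_max(local))
# [(8.0, 3), (8.0, 3), (8.0, 3), (8.0, 3)]
```

Because tuples compare lexicographically, a tie on pmax is broken here in favour of the larger row index; any consistent rule works as long as all processors apply the same one.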
Gaussian Elimination
Parallel Design
Gaussian elimination with partial pivoting – Algorithm
for i from 0 to n-1 do
//find pivot row
find max(pmax,prow) using all-reduce
P(i) swaps its row (or index) with P(prow)
//elimination
P(i) broadcasts its row to others
for j from i+1 to n-1 do in parallel
P(j) computes m[j]
for k from i to n do
P(j) computes a[j,k]
end do
end do
end do
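The structure of the parallel algorithm can be mimicked on one machine with a thread pool, where each submitted task plays the role of P(j) updating its own row. This is a sketch of the control flow only: the local `max` stands in for the all-reduce, copying the pivot row stands in for the broadcast, and in standard CPython the global interpreter lock means these threads do not give a real speedup. The function name `parallel_eliminate` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_eliminate(aug, workers=4):
    """Reduce the augmented matrix to upper triangular form in place."""
    n = len(aug)

    def update_row(j, i, pivot):
        # Work done by P(j): compute m[j], then update a[j,k] for k = i..n.
        row = aug[j]
        m = row[i] / pivot[i]
        for k in range(i, n + 1):
            row[k] -= m * pivot[k]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n):
            # Find pivot row (stand-in for the all-reduce on (pmax, prow)).
            prow = max(range(i, n), key=lambda r: abs(aug[r][i]))
            if aug[prow][i] == 0.0:
                raise ValueError("matrix is singular")
            aug[i], aug[prow] = aug[prow], aug[i]
            pivot = list(aug[i])               # stand-in for the broadcast
            futures = [pool.submit(update_row, j, i, pivot)
                       for j in range(i + 1, n)]
            for f in futures:
                f.result()                     # wait before the next column
    return aug


aug = [[4.0, 6.0, 2.0, -2.0, 8.0],
       [2.0, 0.0, 5.0, -2.0, 4.0],
       [-4.0, -3.0, -5.0, 4.0, 1.0],
       [8.0, 18.0, -2.0, 3.0, 40.0]]
for row in parallel_eliminate(aug):
    print(row)
```

Each task writes only to its own row and reads a private copy of the pivot row, so the row updates within one iteration are independent of each other.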
Question
What is the purpose of all-reduce
communication? Give an example.
At the end of the communication, all processors have
the same reduced value (min, max, etc.).
Example: finding the pivot row, where every processor
obtains the same (pmax, prow).