



























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Impact of cache pointers, Impact of Memory reference patterns, Matrix Multiply, Transpose, Memory Mountain Range
Typology: Slides
1 / 35
This page cannot be seen from the preview
Don't miss anything!




























Impact of cache parameters
class20.ppt
CS 213 F’
class20.ppt
t
s
b
Cache (C = S x E x B bytes)
s sets
E blocks/set
(cache line) Cache block
Address space (
n bytes)
Valid bit
data
1 bit
b bytes (block size)
t bitstag
(n = t + s + b bits) Address
CS 213 F’
class20.ppt
(misses/references)fraction of memory references not found in cache
1-2% for L2 5-10% for L1 Typical numbers:
to determine whether the block is in the cache)time to deliver a block in the cache to the processor (includes time
3-8 clock cycles for L2 1 clock cycle for L1 Typical numbers
CS 213 F’
class20.ppt
CS 213 F’
class20.ppt
Write-through or write-back?
Read misses are cheaper. Why?
Simpler to implement.
Requires a write buffer to pipeline writes
Individual writes performed at the processor rate
CS 213 F’
class20.ppt
First access to line not in cache
Also called “Cold start” misses
Active portion of memory exceeds cache size
map to same cache entryActive portion of address space fits in cache, but too many lines
Direct mapped and set associative placement only
CS 213 F’
class20.ppt
Multiply n X n matrices
O(n (^) ) total operations (^3)
/ ijk*
(^) */
for (i=0; i<n; i++)
{
for (j=0; j<n; j++) { for (k=0; k<n; k++)sum = 0.0; sum += a[i][k] * b[k][j];
c[i][j] = sum;
}
} / ijk*
(^) */
for (i=0; i<n; i++)
{
for (j=0; j<n; j++) { for (k=0; k<n; k++)sum = 0.0; sum += a[i][k] * b[k][j];
c[i][j] = sum;
}
}
Variable
(^) sum
held in register
CS 213 F’
class20.ppt
n n n n n n n l l l l l l l s s s s s s s u u u u u u u q q q q q q q m m m m m m m 50
75
100
125
150
175
200
0 2 4 6 8 10 12 14 16 18 20
mflops (d.p.)
matrix size (n)
n ikj
l kij
s ijk
u jik
q jki
m kji
As matrices grow in size, exceed cache capacity
CS 213 F’
class20.ppt
for (i=0; i<n; i++) / ijk /
for (j=0; j<n; j++)
for (k=0; k<n;sum = 0.0;
(^) k++)
sum += a[i][k]
b[k][j];
c[i][j] = sum;
} for (i=0; i<n; i++)/ ijk /
for (j=0; j<n; j++)
for (k=0; k<n; k++)sum = 0.0; sum += a[i][k]
b[k][j];
c[i][j] = sum;
(i,*)
(*,j)
(i,j)
Inner loop:
wiseColumn-
Row-wise
Fixed
a
b
c
CS 213 F’
class20.ppt
for (j=0; j<n; j++) / jik /
for (i=0; i<n; i++)
for (k=0; k<n; k++)sum = 0.0; sum += a[i][k]
b[k][j];
c[i][j] = sum
} for (j=0; j<n; j++)/ jik /
for (i=0; i<n; i++)
for (k=0; k<n; k++)sum = 0.0; sum += a[i][k]
b[k][j];
c[i][j] = sum
(i,*)
(*,j)
(i,j)
Inner loop:
Row-wise
wiseColumn-
Fixed
a
b
c
CS 213 F’
class20.ppt
for (i=0; i<n; i++) / ikj /
for (k=0; k<n; k++)
for (j=0; j<n;r = a[i][k];
j++)
c[i][j] += r
b[k][j];
(i,*)
(i,k)
(k,*)
Inner loop:
Row-wise
Row-wise
Fixed
a
b
c
CS 213 F’
class20.ppt
for (j=0; j<n; j++) / jki /
for (k=0; k<n; k++)
for (i=0; i<n;r = b[k][j];
i++)
c[i][j] += a[i][k]
r;
(*,j)
(k,j)
Inner loop: (*,k)
wiseColumn -
wiseColumn-
Fixed
a
b
c
CS 213 F’
for (j=0; j<n; j++) {for (i=0; i<n; i++) { for (k=0; k<n; k++)sum = 0.0; sum += a[i][k] * b[k][j];
c[i][j] = sum;
}
} ijk (L=2, S=0, MR=1.25)
for (j=0; j<n; j++) { for (i=0; i<n; i++) { for (k=0; k<n; k++)sum = 0.0; sum += a[i][k] * b[k][j];
c[i][j] = sum
}
}
for (k=0; k<n; k++) { for (i=0; i<n; i++) { for (j=0; j<n; j++)r = a[i][k]; c[i][j] += r * b[k][j];
}
}
jik (L=2, S=0, MR=1.25) kij (L=2, S=1, MR=0.5)
for (i=0; i<n; i++) { for (j=0; j<n; j++)r = a[i][k];for (k=0; k<n; k++) { c[i][j] += rb[k][j];*
} } ikj (L=2, S=1, MR=0.5)
for (j=0; j<n; j++) { for (k=0; k<n; k++) { for (i=0; i<n; i++)r = b[k][j]; c[i][j] += a[i][k] * r;
}
} jki (L=2, S=1, MR=2.0)
for (k=0; k<n; k++) { for (j=0; j<n; j++) { for (i=0; i<n; i++)r = b[k][j]; c[i][j] += a[i][k] * r;
} } kji (L=2, S=1, MR=2.0)
CS 213 F’
class20.ppt
n n n n n n
l l l l l l l s s s s s s s u u u u u u u q q q q q q q m m m m m m m 50
75
100
125
150
175
200
0
1
2
3
mflops (d.p.)
matrix size (n)
n ikj
l kij
s ijk
u jik
q jki
m kji