






These notes cover the optimization of the compress algorithm's memory cache usage. The study focuses on the hash table accesses, which show little reuse in current caches, and suggests bypassing the cache for infrequently accessed data. The notes also introduce memory macroblocks and their use in caching decisions.
Typology: Study notes







Case Study: 026.compress
    while ( (c = getchar()) != EOF ) {
        in_count++;
        fcode = (long) (((long) c << maxbits) + ent);
        i = ((c << hshift) ^ ent);              /* xor hashing */

        if ( htabof (i) == fcode ) {
            ent = codetabof (i);
            continue;
        } else if ( (long)htabof (i) < 0 )      /* empty slot */
            goto nomatch;
        disp = hsize_reg - i;                   /* secondary hash */
        if ( i == 0 )
            disp = 1;
    probe:
        if ( (i -= disp) < 0 )
            i += hsize_reg;

        if ( htabof (i) == fcode ) {
            ent = codetabof (i);
            continue;
        }
        if ( (long)htabof (i) > 0 )
            goto probe;
    nomatch:
        output ( (code_int) ent );
        out_count++;
        ent = c;
        if ( free_ent < maxmaxcode ) {
            codetabof (i) = free_ent++;         /* code -> hashtable */
            htabof (i) = fcode;
        }
        else if ( (count_int)in_count >= checkpoint && block_compress )
            cl_block ();
    }
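The index arithmetic in the loop above can be isolated into a few helpers to see how widely one lookup can move through htab. This is a minimal sketch, not the benchmark's code: the helper names are hypothetical, and the table size (69001) and shift (8) are assumed values matching compress configured for 16-bit codes.

```c
#include <assert.h>

/* Assumed parameters: 69001 htab entries and hshift = 8, as compress
   uses for 16-bit codes. Both are fixed here for illustration only. */
#define HSIZE  69001L
#define HSHIFT 8

/* Primary hash: xor of the shifted input byte with the current prefix code. */
static long primary_index(int c, long ent)
{
    assert(c >= 0 && c < 256 && ent >= 0 && ent < 65536L);
    return ((long)c << HSHIFT) ^ ent;
}

/* Secondary hash: the displacement reused for every re-probe of one lookup. */
static long probe_displacement(long i)
{
    return (i == 0) ? 1 : HSIZE - i;
}

/* Next slot to inspect after a collision, wrapping around the table. */
static long next_probe(long i, long disp)
{
    i -= disp;
    if (i < 0)
        i += HSIZE;
    return i;
}
```

With these assumed values, a lookup for byte 'a' with prefix code 0 starts at slot 24832 and, on a collision, jumps to slot 49664. Jumps of this size are consistent with the low reuse of htab lines described in these notes.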
Htab Memory Access Distribution
[Figure: htab load hits and htab load misses plotted by cycle and by address (offset from 1G).]
Compress Cache Bypass
Infrequently accessed data should bypass cache
Keep track of usage patterns
[Figure labels: Infrequently Accessed, RegFile, Frequently Accessed]
Memory Macroblocks
accessing behavior
Group cache blocks into larger regions called macroblocks
[Figure label: Main Memory]
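The grouping of cache blocks into macroblocks can be sketched as simple address arithmetic. The 1024-byte macroblock size comes from the L1 hit ratio chart in these notes; the 32-byte cache block size and the function names are assumptions made for illustration.

```c
#include <stdint.h>

#define MACROBLOCK_BYTES  1024u  /* size quoted in the L1 hit ratio chart */
#define CACHE_BLOCK_BYTES 32u    /* assumed cache block size */

/* Number of the macroblock that contains addr. */
static uint32_t macroblock_of(uint32_t addr)
{
    return addr / MACROBLOCK_BYTES;
}

/* Number of the cache block that contains addr. */
static uint32_t cache_block_of(uint32_t addr)
{
    return addr / CACHE_BLOCK_BYTES;
}

/* Two addresses share usage statistics when they share a macroblock. */
static int same_macroblock(uint32_t a, uint32_t b)
{
    return macroblock_of(a) == macroblock_of(b);
}
```

Under these assumptions one macroblock covers 32 cache blocks, so a single usage record stands in for all of them.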
MAT Bypassing Operation
lookup counter in MAT
value
[Figure: MAT counter lookup and increment (++ctr) for blocks A and B. Labels: Reg File, Infrequently Accessed, Frequently Accessed.]
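The counter lookup and increment on this slide can be sketched as a small direct-mapped table indexed by macroblock. The table size, the 8-bit saturating counter width, and all names below are assumptions; the slide only shows that a counter is looked up in the MAT and incremented.

```c
#include <stdint.h>

#define MACROBLOCK_BYTES 1024u  /* macroblock size quoted elsewhere in these notes */
#define MAT_ENTRIES      1024u  /* assumed table size */
#define CTR_MAX          255u   /* assumed 8-bit saturating counter */

typedef struct {
    uint8_t ctr[MAT_ENTRIES];   /* one access counter per MAT entry */
} Mat;

/* Map an address to its MAT entry through its macroblock number. */
static uint32_t mat_index(uint32_t addr)
{
    return (addr / MACROBLOCK_BYTES) % MAT_ENTRIES;
}

/* Look up the counter for addr, increment it (saturating), return it. */
static uint8_t mat_touch(Mat *mat, uint32_t addr)
{
    uint8_t *ctr = &mat->ctr[mat_index(addr)];
    if (*ctr < CTR_MAX)
        ++*ctr;
    return *ctr;
}
```

Every access to any address in a macroblock raises the same counter, so heavily used regions accumulate high values while sparsely used regions stay near zero.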
MAT Bypassing Operation (cont.)
normally
counter for cache block that would be replaced
set-associative bypass buffer
[Figure: counters for blocks A and B are compared (CMP) and one counter is decremented (ctr2--). Labels: Reg File, Infrequently Accessed, Frequently Accessed.]
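The comparison shown on this slide can be sketched as a single decision made on a cache miss. This is an assumed reading of the figure, not a confirmed specification: the missing data bypasses the cache when its macroblock counter is lower than the counter of the block that would be replaced, and the resident block's counter is then decremented. The type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    PLACE_IN_CACHE,          /* handle the miss normally */
    PLACE_IN_BYPASS_BUFFER   /* keep the data out of the main cache */
} Placement;

/*
 * ctr_missing: MAT counter for the macroblock of the missing address.
 * ctr_victim:  MAT counter for the macroblock of the cache block that
 *              would be replaced.
 * Assumption: on a bypass the victim's counter is decremented, so a
 * resident block cannot hold its place forever.
 */
static Placement mat_on_miss(uint8_t ctr_missing, uint8_t *ctr_victim)
{
    if (ctr_missing < *ctr_victim) {
        --*ctr_victim;       /* safe: *ctr_victim > ctr_missing >= 0 */
        return PLACE_IN_BYPASS_BUFFER;
    }
    return PLACE_IN_CACHE;
}
```

Under this reading, repeated misses against the same resident block lower its counter until the incoming data is cached normally.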
Performance Improvement
[Figure: bar chart of % improvement over base, with sampling and with no sampling, for benchmarks 026.compress, 072.sc, 099.go, 147.vortex, Pcode, lmdes2_customizer, 085.gcc, 130.li, 134.perl, 124.m88ksim, word, excel, photo.]
L1 Hit Ratios
[Figure: bar chart of L1 hit ratio for benchmarks 026.compress, 072.sc, 099.go, 147.vortex, Pcode, lmdes2_customizer, 085.gcc, 130.li, 134.perl, 124.m88ksim, word, excel, photo. Legend: Base, Simulator (1024-byte macroblocks), Upper Bound.]
Example: 085.gcc Routine
int rtx_renumbered_equal_p (rtx x, rtx y)
to y->code?
Cache Organization Alternative: Variable Fetch
8-byte lines & 32-byte virtual lines
[Figure: two cache diagrams with 8-byte lines. First cache: entries A, B; miss C. Second cache: entries A, C, B, B+; hit B+, hit C, hit A.]
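The relationship between 8-byte lines and 32-byte virtual lines can be sketched as a fetch plan computed on a miss. This sketch assumes fetch sizes are powers of two between 8 and 32 bytes and that a fetch is aligned to its own size; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BYTES         8u   /* physical cache line */
#define VIRTUAL_LINE_BYTES 32u  /* largest fetch: one virtual line */

typedef struct {
    uint32_t first_line_addr;   /* address of the first 8-byte line fetched */
    unsigned line_count;        /* number of 8-byte lines fetched */
} FetchPlan;

/* Plan a fetch of fetch_bytes that covers miss_addr (assumed alignment). */
static FetchPlan plan_fetch(uint32_t miss_addr, unsigned fetch_bytes)
{
    FetchPlan plan;

    /* fetch_bytes must be a power of two in [8, 32]. */
    assert(fetch_bytes >= LINE_BYTES && fetch_bytes <= VIRTUAL_LINE_BYTES);
    assert((fetch_bytes & (fetch_bytes - 1u)) == 0u);

    plan.first_line_addr = miss_addr & ~(uint32_t)(fetch_bytes - 1u);
    plan.line_count = fetch_bytes / LINE_BYTES;
    return plan;
}
```

A miss at address 0x1014 fetches one line at 0x1010 with an 8-byte fetch, or four lines starting at 0x1000 with a full 32-byte virtual line.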
[Figure: a memory address is presented to the L1 Data Cache (8-byte blocks), the MAT, and the SLDT. Labeled signals: fetch size?, hit?, spatial reuse?; sctr is updated with the hit and spatial reuse results. SLDT entry format: tag, sz, vc, sr; fi bit.]
Table columns: Cache Access Result, SLDT Access Result, fi Value, sz Value, vc Value, Action
Surviving row: miss, hit, -, 0; Action: sr = 1; sctr++
SLDT entry: tag, sz, vc, sr
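The sctr update can be sketched as a saturating counter that steers the fetch size. Only the increment on detected spatial reuse (sctr++) and the 4-bit width come from these notes; the decrement when no reuse is seen, the midpoint threshold, and the two fetch sizes are assumptions made for illustration.

```c
#include <stdint.h>

#define SCTR_BITS   4u
#define SCTR_MAX    ((1u << SCTR_BITS) - 1u)  /* 15 for a 4-bit counter */
#define SMALL_FETCH 8u    /* one 8-byte line */
#define LARGE_FETCH 32u   /* one 32-byte virtual line */

/* Saturating update: up on spatial reuse, down otherwise (assumed). */
static uint8_t sctr_update(uint8_t sctr, int spatial_reuse)
{
    if (spatial_reuse)
        return (uint8_t)(sctr < SCTR_MAX ? sctr + 1u : sctr);
    return (uint8_t)(sctr > 0u ? sctr - 1u : 0u);
}

/* Assumed policy: fetch a full virtual line once sctr passes the midpoint. */
static unsigned fetch_size(uint8_t sctr)
{
    return (sctr > SCTR_MAX / 2u) ? LARGE_FETCH : SMALL_FETCH;
}
```

With a 1-bit counter the same logic reduces to remembering only the most recent reuse outcome, which is the other configuration named in the L2 varying fetch results.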
L2 Varying Fetch Sizes
[Figure: bar chart of % improvement over 64-byte L2 lines with L1 varying fetches, for benchmarks compress, 072.sc, go, 147.vortex, Pcode, lmdes2_customizer, gcc, 130.li, 134.perl, 124.m88ksim, word, excel, photo. Series: 32-byte L2 lines (L1 vary fetch); 128-byte L2 lines (L1 vary fetch); 256-byte L2 lines (L1 vary fetch); L2 vary fetch (1-bit sctr); L2 vary fetch (4-bit sctr).]
Performance Improvements
[Figure: bar chart of % improvement over base for benchmarks compress, 072.sc, go, 147.vortex, Pcode, lmdes2_customizer, gcc, 130.li, 134.perl, 124.m88ksim, word, excel, photo. Series: (sampling) L1/L2 varying fetch (4-bit sctrs); (no sampling) L1/L2 varying fetch (4-bit sctrs).]
085.gcc Example Revisited
selecting the correct fetch size for the example
Related Work
techniques for numeric codes
prefetching of pointer targets [MeHa95][LuMo96]
[LiSKR95]
coherence traffic in multiprocessors
with long lines for spatial data [GoAV95][MiMTT96]