




University of Wisconsin - Madison ... The exam is open book and open note. ... make it difficult to determine the data or key used for encryption simply by.





Last (family) name: _________________________
First (given) name: _________________________
Student I.D. #: _____________________________
Tuesday, October 20, 2009 11:00AM--12:30PM (90 minutes)
exam before starting it.
minutes of the examination. If you need to leave the room before then, ask for permission.
writing and turn your exam face down. I will come by to pick them up.
finished with the exam or by permission of the instructor.
For several problems, other answers may be correct.
Problem Points Score
1 30
2 20
3 28
4 22
Total 100
[1] (30 points) Short answers.
a) (4 points) Explain why a single centralized register file typically becomes inefficient when a VLIW processor has a large number of functional units. What are two techniques for overcoming this inefficiency in VLIW processors?
A centralized register file needs a large number of read and write ports when there are a large number of functional units to prevent it from becoming a bottleneck in the system. However, the large number of read and write ports increases the register file area, delay and power.
Two techniques for overcoming the inefficiencies of large register files are: (1) Dividing the VLIW architecture into clusters, where each cluster has its own register file and set of functional units. (2) Utilizing data buffers to store temporary results, as was done in AnySP.
b) (4 points) What are two types of security attacks that dynamic voltage and frequency scaling (DVFS) can help prevent? Explain how DVFS can be used to help prevent each type of attack.
DVFS can help prevent timing and power attacks. It can prevent timing attacks by adjusting the frequency in a pseudo-random manner to make it difficult to determine the data or key used for encryption simply by monitoring the time it takes for different encryptions. It can prevent power attacks by adjusting the frequency and/or supply voltage in a pseudo-random manner to make it difficult to determine the data or key used for encryption simply by monitoring the current used for different encryptions.
c) (3 points) What are three ways in which the instruction set architecture (ISA) of programmable digital signal processors typically differs from the ISA of general purpose processors?
Unlike the ISA of general purpose processors, the ISAs of digital signal processors typically: (1) Use specialized addressing modes (2) Have fewer general purpose registers (3) Utilize VLIW architectures
d) (3 points) What are two instruction set architecture features used by processors to help reduce code size? For each feature, list a processor that we have studied in class that utilizes this feature.
Two instruction set architecture features used by processors to help reduce code size are (1) Complex instructions, such as those utilized by the Sandblaster processor (2) Vector instructions, such as those utilized by AnySP and Sandblaster
e) (4 points) What are two characteristics of video processing algorithms that the AnySP processor is designed to handle efficiently? For each characteristic, explain how the AnySP processor is designed to handle it.
Two characteristics of video processing algorithms that the AnySP processor is designed to handle are: (1) Instruction pairs, such as multiply-add and shift-add are frequently executed. To handle this, AnySP uses flexible functional units that allow pairs of instructions to be merged. (2) Many variables have short lifetimes. To take advantage of this, AnySP utilizes buffers that store temporary variables with short lifetimes so that these variables do not need to be read from and written to the register file.
b) (2 points) What compression ratio is achieved for the compression described in problem (2a)? Show your work.
Since the table has 32 entries, each symbol in the compressed code should be 5 bits. A total of 7 symbols were needed to transmit the ten 4-bit hexadecimal digits 0FAF0FAF0F. With LZW, the table is generated on the fly and should not be counted as part of the compressed code size.
Compression ratio = compressed code/uncompressed code = (7 x 5)/(10 x 4) = 35/40 = 87.5%
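The symbol count above can be checked with a short sketch. The statement of problem (2a) is not reproduced here, so the starting dictionary is an assumption: it is taken to hold the 16 single hexadecimal digits, with new entries added from index 16 upward. The function name `lzw_encode` is illustrative only.

```python
def lzw_encode(symbols, alphabet):
    """Textbook LZW encoder; returns the emitted codes and the final table."""
    # ASSUMPTION: the initial table holds one entry per single hex digit.
    table = {s: i for i, s in enumerate(alphabet)}
    codes = []
    current = ""
    for s in symbols:
        if current + s in table:
            current += s                          # keep extending the match
        else:
            codes.append(table[current])          # emit the longest known match
            table[current + s] = len(table)       # learn the new string
            current = s
    if current:
        codes.append(table[current])              # flush the final match
    return codes, table


HEX_DIGITS = "0123456789ABCDEF"
codes, table = lzw_encode("0FAF0FAF0F", HEX_DIGITS)

bits_per_code = 5                  # a 32-entry table needs 5-bit codes
compressed_bits = len(codes) * bits_per_code
original_bits = 10 * 4             # ten 4-bit hexadecimal digits
ratio = compressed_bits / original_bits
print(codes, len(table), ratio)
```

Under that assumption the encoder emits 7 codes and the table grows to 22 entries, which fits in the 32-entry table and agrees with the 35/40 figure.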
c) (6 points) If you use Huffman encoding to encode 0FAF0FAF0F, what is the compression ratio? Assume each input symbol is one byte.
With Huffman coding and one-byte input symbols, the only input symbols are 0F and AF. With Huffman encoding, 0F can be encoded as "1" and AF can be encoded as "0". Thus transmitting the five bytes requires only five bits. However, we also need to include the table, which stores AF at location 0 and 0F at location 1. This requires a 2-byte table.
Compression ratio = compressed code/uncompressed code = (5 + 2 x 8)/(10 x 4) = 21/40 = 52.5%
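As a cross-check of the arithmetic, the following is a minimal sketch that derives Huffman code lengths for the byte sequence and adds the table cost. It assumes, as the solution does, that the table costs one byte per distinct symbol; the helper name `huffman_code_lengths` is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {s: 1 for s in freq}               # degenerate one-symbol input
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


data = bytes.fromhex("0FAF0FAF0F")                # five one-byte symbols
lengths = huffman_code_lengths(data)
payload_bits = sum(lengths[b] for b in data)
table_bits = 8 * len(lengths)                     # ASSUMPTION: one byte per table entry
ratio = (payload_bits + table_bits) / (len(data) * 8)
print(lengths, payload_bits, table_bits, ratio)
```

With only two distinct symbols, each receives a 1-bit code, so the payload is 5 bits and the table is 16 bits, giving 21/40.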
d) (4 points) Explain how branch patching works. What are one advantage and one disadvantage that branch patching has compared to using branch tables?
With branch patching, branch instructions are compressed using a special format and a second pass through the compressed code is utilized to patch the branch instructions so that they point to addresses in the compressed code, rather than addresses in the uncompressed code. Compared to using branch tables, branch patching avoids the area, power, and delay needed to store and access the branch table. However, the compression ratio with branch patching may not be as high, since branch instructions are not compressed as much.
[3] (28 points) Models of Computation and VLIW Scheduling
a) (6 points) Draw a dataflow graph for the following computations. Label each variable.
u = a + 3; v = b - 4; w = c - 5; x = u + v; y = w * v; z = x * y;
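The drawn graph is not reproduced here, but its structure can be sketched in code. The example below lists one edge per variable dependence and groups the operations by earliest possible level. The operators for v and w are read as subtraction, which is an assumption; the edges do not depend on that reading. The names `ops`, `edges`, and `levels` are illustrative.

```python
# Each node: result variable -> (operator, operand, operand).
# ASSUMPTION: v and w use subtraction.
ops = {
    "u": ("+", "a", 3),
    "v": ("-", "b", 4),
    "w": ("-", "c", 5),
    "x": ("+", "u", "v"),
    "y": ("*", "w", "v"),
    "z": ("*", "x", "y"),
}


def edges(ops):
    """One (source, destination) edge per variable operand; constants are skipped."""
    return [(src, dst)
            for dst, (_, *srcs) in ops.items()
            for src in srcs if isinstance(src, str)]


def levels(ops):
    """Earliest level of each operation; primary inputs sit at level 0."""
    memo = {}

    def depth(name):
        if name not in ops:
            return 0                               # primary input such as a, b, c
        if name not in memo:
            operands = [s for s in ops[name][1:] if isinstance(s, str)]
            memo[name] = 1 + max((depth(s) for s in operands), default=0)
        return memo[name]

    return {name: depth(name) for name in ops}


graph_edges = edges(ops)
node_levels = levels(ops)
print(graph_edges)
print(node_levels)
```

The levels show that u, v, and w are independent of each other, x and y can both start once those finish, and z depends on both x and y.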
b) (8 points) Write a control data flow graph for the following code.
while (a > 0) {
    if (a > c) a = c * b;
    else if (a > b) b = c + 6;
    else c = c - b;
    a = b + 7;
}
d = a + c;
[4] (22 points) Code Generation and Back-end Compilation
a) (8 points) Complete the variable lifetime analysis chart given below. Assume that the program completes at the end of the code.
Chart columns: y1 y2 y3 y4 y5 y6 y7 y8
y1 = a + 7; y2 = b + 4;      t = 0
y3 = c + a; y4 = y1 * y2;    t = 1
y6 = y4 - y3; y7 = y3 - y5;  t = 3
b) (8 points) For the lifetime analysis chart given in part (4a), draw the conflict graph and provide a register assignment that only uses five registers (r1 through r5) for variables y1 through y8. Assume that a variable cannot be both read and written in the same cycle.
The conflict graph connects variables with overlapping lifetimes. Variables with non-overlapping lifetimes have no edge between them and can be assigned to the same register. One possible register assignment is: r1 gets y1 and y7, r2 gets y2 and y6, r3 gets y3 and y8, r4 gets y4, r5 gets y5.
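The general procedure can be sketched as follows. Because the lifetime chart is incomplete in this copy, the example uses made-up variables p, q, r, s with hypothetical lifetimes rather than y1 through y8. Lifetimes are treated as closed intervals [first cycle, last cycle], so two variables conflict if they share any cycle; this is one reading of the rule that a variable cannot be both read and written in the same cycle. The greedy assignment shown is one simple heuristic, not necessarily the method intended by the exam.

```python
def conflicts(lifetimes):
    """Edges between variables whose closed lifetimes share at least one cycle."""
    names = sorted(lifetimes)
    return {(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if lifetimes[a][0] <= lifetimes[b][1]
            and lifetimes[b][0] <= lifetimes[a][1]}


def assign_registers(lifetimes):
    """Greedy assignment: visit variables by start time, take the lowest free register."""
    edge_set = conflicts(lifetimes)
    assignment = {}
    for var in sorted(lifetimes, key=lambda v: lifetimes[v]):
        taken = {assignment[other] for other in assignment
                 if (min(var, other), max(var, other)) in edge_set}
        reg = 1
        while reg in taken:
            reg += 1
        assignment[var] = reg
    return assignment


# HYPOTHETICAL lifetimes (first cycle, last cycle); not taken from the exam.
lifetimes = {"p": (0, 1), "q": (0, 2), "r": (2, 3), "s": (3, 4)}
edge_set = conflicts(lifetimes)
assignment = assign_registers(lifetimes)
print(sorted(edge_set))
print(assignment)
```

Here p and r share a register because p's last cycle precedes r's first, while q and r conflict because both are live in cycle 2.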
c) (6 points) Explain how placement of subroutines can be used to improve energy and performance in embedded systems. How does one determine which subroutines to move and where to move them?
To improve energy and performance, subroutines should be placed such that they will not have cache conflicts with the routines that call them or with other routines that are frequently accessed close in time. To determine which routines to move and where to move them, the code can be profiled with representative inputs. A call graph can then be constructed to identify which routines frequently call other routines and which routines have cache conflicts. The profiling information can then be used to move routines that frequently conflict to other locations so that they no longer conflict. Reducing conflict misses in the instruction cache improves both the energy and the performance of the system.
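The conflict check behind this placement decision can be sketched for a direct-mapped instruction cache. All addresses, sizes, and cache parameters below are hypothetical and chosen only for illustration, as are the function names.

```python
def cache_sets(start, size, line_size, num_sets):
    """Set indices touched by a routine occupying [start, start + size)."""
    first_line = start // line_size
    last_line = (start + size - 1) // line_size
    return {line % num_sets for line in range(first_line, last_line + 1)}


def conflicting_sets(f, g, line_size, num_sets):
    """Sets that two routines, given as (start, size), both map to."""
    return cache_sets(*f, line_size, num_sets) & cache_sets(*g, line_size, num_sets)


# HYPOTHETICAL direct-mapped I-cache: 1 KiB with 32-byte lines, so 32 sets.
LINE_SIZE = 32
NUM_SETS = 32

caller = (0x0000, 256)        # (start address, size in bytes), made up
callee = (0x0400, 128)        # exactly one cache size away, so it aliases the caller
moved_callee = (0x0500, 128)  # candidate new location

before = conflicting_sets(caller, callee, LINE_SIZE, NUM_SETS)
after = conflicting_sets(caller, moved_callee, LINE_SIZE, NUM_SETS)
print(sorted(before), sorted(after))
```

In this made-up layout the callee initially shares sets 0 through 3 with its caller, and moving it by 256 bytes removes the overlap. A real tool would weight each conflicting pair by the call frequencies gathered during profiling.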