Assignment 4 Solutions - Advanced Computer Architecture - Fall 2006 | ECE 4100, Assignments of Computer Architecture and Organization

Material Type: Assignment; Class: Adv Computer Architecture; Subject: Electrical & Computer Engr; University: Georgia Institute of Technology-Main Campus; Term: Fall 2006;


Uploaded on 08/05/2009

Fall 2006 ECE 4100/6100
Assignment 4
Due Date: 9 pm, Friday December 8th, 2006
For this assignment you must pick only one of two optional components. Option 1
below is comprised of three parts. Option 2 is comprised of a programming assignment on
IBM’s Cell processor using a cycle-accurate simulator. Note the requirements for Option
2.
Option 1:
Answer all of the following questions.
1. Compute the bisection bandwidth of an N = 2^n node binary hypercube with single-bit
channels. Assume full duplex links, each of width 1 bit.
a. Assuming full duplex channels of width W bits in each direction, find the
channel width of a k-ary n-cube that will saturate this bisection width.
b. Assuming L bits/message and switching, routing and wire latency of 1 cycle,
and L a perfect multiple of W, construct an analytic model of latency, and then
use this expression to find the number of dimensions that will minimize
latency for a fixed bisection bandwidth when using wormhole switching.
Make any assumptions you feel you have to.
2. Read the Cell paper “Cell Multiprocessor Network: Built For Speed” (posted on the
class website). From this paper, provide a detailed description (not a reproduction of
paragraphs from the paper) of the transfer of blocks of data, in both directions,
between the PowerPC and the SPE co-processor. Your description should be
supported by a figure that labels the sequence of steps involved in each transfer.
3. The URL http://www.pcisig.com/home provides information on the PCI Express
communication protocol. Browse the site, find the relevant documents, and compare
and contrast this protocol with the HyperTransport protocol (www.hypertransport.org)
according to the following features:
a. The goals of the protocol: What are the anticipated application domains and
what technical needs is the protocol intended to fill?
b. Packet structure: Describe and differentiate the purpose of the various packet
fields. Describe the need for the fields in the context of the preceding bullet.
c. Addressing: Range and type of devices that can be addressed.
d. Physical link and physical link protocol operation.
e. If you had to construct a shared memory, cache coherent multiprocessor,
packaged over many boards in a rack, and had to layer a coherency protocol
over a physical layer, which of the above two communication standards
(HyperTransport and PCI Express) would you pick and why? Structure your
answer as a sequence of bullets rather than essay form – be concise.
4. Submit the assignment electronically to the TA.
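
As a starting point for question 1, the two quantities it asks about can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model answer: the function names are hypothetical, the hypercube is cut along its highest dimension (the usual bisection for a hypercube), each link is counted once per direction, and the latency expression is the common contention-free wormhole model (per-hop delay times hop count, plus L/W cycles to stream the message). How the hop count depends on k and n is left open, since that is the part the question asks you to derive.

```python
def hypercube_bisection_links(n):
    """Count links crossing the cut that splits nodes by their top address bit.

    Nodes are numbered 0 .. 2^n - 1; two nodes are linked when their
    addresses differ in exactly one bit. Each link is counted once.
    """
    num_nodes = 1 << n
    top_bit = 1 << (n - 1)
    crossing = 0
    for u in range(num_nodes):
        for d in range(n):
            v = u ^ (1 << d)          # neighbour of u in dimension d
            if u < v and (u & top_bit) != (v & top_bit):
                crossing += 1
    return crossing


def wormhole_latency(hops, message_bits, channel_width, cycles_per_hop=1):
    """Contention-free wormhole latency in cycles (assumed model).

    The header takes cycles_per_hop cycles per hop; the rest of the
    message follows in a pipeline, needing message_bits / channel_width
    cycles to drain.
    """
    if message_bits % channel_width != 0:
        raise ValueError("message_bits must be a multiple of channel_width")
    return hops * cycles_per_hop + message_bits // channel_width


if __name__ == "__main__":
    # With 1-bit channels, the link count equals bits/cycle in each direction.
    print(hypercube_bisection_links(4))      # 16 nodes -> 8 links
    print(wormhole_latency(4, 128, 8))       # 4 hops + 16 flits -> 20 cycles
```

The brute-force count agrees with the closed form N/2 for small n, which is a useful check before substituting a closed form into the latency expression.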

Option 2:

In this option you will implement the RC5 encryption algorithm on the IBM Cell processor. A few cautionary notes.

  1. You will be responsible for downloading and installing the Cell simulator on your own Linux installation. You will need root access on this machine. Installation and use instructions as well as access to relevant documentation can be found at http://www.ece.gatech.edu/academic/courses/fall2006/ece6100/Lab4/mambo.htm.
  2. For a variety of reasons, there is limited support this semester for helping you debug the installation or your programs (this is why this alternative is optional!).
  3. While all of the documentation is made available, be aware of the potential time commitment.

The assignment itself has the following elements.

  1. Develop an implementation of the RC5 encryption algorithm. The basic algorithm is provided in the paper “The RC5 Encryption Algorithm” (paper provided on the class webpage). Note that the purpose of this assignment is to develop some experience in programming the Cell processor. Exact, faithful implementation of key expansion and other activities conformant to the standard are unimportant. It is only the implementation of the algorithm in Sections 4.1 and 4.2 that is required.
  2. Encrypt a vector of 1024 integers using one SPE.
  3. Decrypt the same vector using the second SPE. Compare to ensure your implementation is correct.
  4. You are only required to use PPE → SPE and SPE → PPE communication. You are not required to use SPE → SPE communication (for example, to pipe the encryption output to the decryption code).
  5. Submit the following electronically to the TA in a single report:
     a. Cover Sheet: Assignment Title, Name, Class and Date of Submission
     b. Brief description of your design/implementation (1 page)
     c. Performance (1-2 pages max).
        i. Compute the speedup of the SPE relative to the PPE.
        ii. Compute the speedup of encryption and decryption as a function of vector length, 1024 to 4096 elements.
     d. SPE and PPE source code
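
Before porting to the SPEs, it can help to have a small reference model of the encryption and decryption loops to check results against. The sketch below is one such model in Python, assuming RC5-32/12 (32-bit words, 12 rounds) and treating each pair of consecutive integers in the vector as one two-word block. The helper names are hypothetical, and `placeholder_table` is not the RC5 key expansion: it only fills the round-key table with the P32/Q32 arithmetic progression, which is acceptable here only because the assignment states that faithful key expansion is unimportant.

```python
MASK = 0xFFFFFFFF                      # keep values to 32 bits
W = 32                                 # word size in bits (assumed)
ROUNDS = 12                            # number of rounds (assumed)
P32, Q32 = 0xB7E15163, 0x9E3779B9      # RC5 magic constants for w = 32


def rotl(x, s):
    """Rotate a 32-bit word left by the low 5 bits of s."""
    s &= W - 1
    return ((x << s) | (x >> (W - s))) & MASK


def rotr(x, s):
    """Rotate a 32-bit word right by the low 5 bits of s."""
    s &= W - 1
    return ((x >> s) | (x << (W - s))) & MASK


def placeholder_table():
    """Stand-in round-key table: NOT the real key expansion (no key mixing)."""
    return [(P32 + i * Q32) & MASK for i in range(2 * (ROUNDS + 1))]


def encrypt_block(a, b, S):
    """Encrypt one two-word block (A, B) with round-key table S."""
    a = (a + S[0]) & MASK
    b = (b + S[1]) & MASK
    for i in range(1, ROUNDS + 1):
        a = (rotl(a ^ b, b) + S[2 * i]) & MASK
        b = (rotl(b ^ a, a) + S[2 * i + 1]) & MASK
    return a, b


def decrypt_block(a, b, S):
    """Invert encrypt_block by undoing the rounds in reverse order."""
    for i in range(ROUNDS, 0, -1):
        b = rotr((b - S[2 * i + 1]) & MASK, a) ^ a
        a = rotr((a - S[2 * i]) & MASK, b) ^ b
    return (a - S[0]) & MASK, (b - S[1]) & MASK


def process_vector(words, S, block_fn):
    """Apply block_fn to consecutive word pairs; len(words) must be even."""
    out = []
    for i in range(0, len(words), 2):
        out.extend(block_fn(words[i], words[i + 1], S))
    return out


if __name__ == "__main__":
    S = placeholder_table()
    plain = list(range(1024))
    cipher = process_vector(plain, S, encrypt_block)
    assert process_vector(cipher, S, decrypt_block) == plain
    print("round trip ok")
```

The round-trip check at the end mirrors steps 2 and 3 above: encrypt the 1024-element vector, decrypt it, and compare with the original.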