
Application Performance on the MIT Alewife Multiprocessor

Frederic T. Chong, Beng-Hong Lim, Ricardo Bianchini, John Kubiatowicz, and Anant Agarwal



Dept. of Computer Science, University of California at Davis

IBM T.J. Watson Research Center

COPPE Systems Engineering, UFRJ/Brazil



Lab. for Computer Science, Massachusetts Institute of Technology

Abstract

This study reports on the performance of several applications on the Alewife machine, focusing on emerging applications and evolving architectural mechanisms. It shows that low-latency cache miss handling mechanisms for both local and remote accesses in Alewife make these emerging applications viable candidates for shared-memory parallel processing. The results show that efficient shared memory is an excellent communication mechanism, even for fine-grain applications that do not re-use data. Such applications are thought to favor message passing. As expected, traditional coarse-grain applications perform well with Alewife's mechanisms. The results also confirm that hardware support for limited sharing is adequate for a broad range of applications, even on large numbers of processors. Additionally, modeling local cache-miss behavior is important for machines such as Alewife, where remote-miss latencies are only five times longer than local miss latencies. We introduce two novel performance metrics that account for the effect of local misses and are more accurate than previously proposed metrics. We conclude that most applications perform well on Alewife. In particular, fine-grain applications can take advantage of Alewife's high integration and efficiency to achieve a new level of performance on scalable shared-memory machines.

Keywords: distributed shared memory, multiprocessor, performance metrics, applications, fine grain

1 Introduction

Developments in the architecture of parallel machines influence the evolution of the structure of parallel programs; emerging parallel applications, in turn, impact the future directions in parallel machine architecture. Benchmark suites and architectural mechanisms constantly evolve from the dynamics of the architecture-applications symbiosis.

This study reports on the performance of several applications on the Alewife machine ABC

95]

(see Sidebar A), focusing on ne-grain applications and evolving architectural mechanisms. The

results show that low-latency miss-handling mechanisms for both local and remote accesses in Alewife

make ne-grain applications viable candidates for shared-memory parallel processing. We discover

that ecient shared memory is an excellent communication mechanism for ne-grain applications,

even without data re-use. This is a very interesting result, given that such applications havelong

been thoughttofavor message passing over shared memory.

Not surprisingly, Alewife's mechanisms allow traditional coarse-grain applications from the SPLASH and NAS benchmark suites to perform well. The results confirm that hardware support for limited sharing is adequate for a broad range of applications, even on large numbers of processors. Local cache miss behavior turns out to be important on multiprocessors with low remote miss latencies. To account for the effect of local misses, we introduce two performance metrics that provide more accurate and revealing results for Alewife than previously proposed metrics.
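To see why local misses matter when remote misses are cheap, the following minimal sketch estimates the share of miss-stall time caused by local misses. It is an illustration only and is not one of the two metrics introduced in this study. The function name, the miss counts, and the simple additive stall model are assumptions; only the remote-to-local latency ratio of five comes from the text.

```python
def local_stall_fraction(local_misses, remote_misses,
                         local_latency=1.0, remote_to_local_ratio=5.0):
    """Return the fraction of total miss-stall time due to local misses.

    Assumes a simple additive model (hypothetical, for illustration):
    every miss stalls the processor for its full latency, with no overlap.
    """
    local_time = local_misses * local_latency
    remote_time = remote_misses * local_latency * remote_to_local_ratio
    total = local_time + remote_time
    if total == 0:
        return 0.0
    return local_time / total


if __name__ == "__main__":
    # Hypothetical workload: five local misses for every remote miss.
    alewife_like = local_stall_fraction(500, 100, remote_to_local_ratio=5.0)
    # Same workload on a hypothetical machine with a 50x remote penalty.
    high_penalty = local_stall_fraction(500, 100, remote_to_local_ratio=50.0)
    print(f"ratio 5:  local share of stall time = {alewife_like:.2f}")
    print(f"ratio 50: local share of stall time = {high_penalty:.2f}")
```

Under these assumed counts, local misses account for half of the stall time at a ratio of five, but less than a tenth at a ratio of fifty, which is why a model that ignores local misses can mislead on a machine with low remote-miss latency.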
