I-Structures: Data Structures for Parallel Computing

Arvind† (MIT)
Rishiyur S. Nikhil† (MIT)
Keshav K. Pingali‡ (Cornell University)

Abstract

It is difficult to achieve elegance, efficiency and parallelism simultaneously in functional programs that manipulate large data structures. We demonstrate this through careful analysis of program examples using three common functional data-structuring approaches: lists using Cons and arrays using Update (both fine-grained operators), and arrays using make_array (a "bulk" operator). We then present I-structures as an alternative, and show elegant, efficient and parallel solutions for the program examples in Id, a language with I-structures. The parallelism in Id is made precise by means of an operational semantics for Id as a parallel reduction system. I-structures make the language nonfunctional, but do not lose determinacy. Finally, we show that even in the context of purely functional languages, I-structures are invaluable for implementing functional data abstractions.

Categories and Subject Descriptors: D.3.2 [Programming Languages]: Language Classifications: Applicative languages, Data-flow languages; D.3.3 [Programming Languages]: Language Constructs: Concurrent programming structures; E.1 [Data]: Data Structures: Arrays; F.3.2 [Logics and Meanings of Programs]: Semantics of Programming Languages: Operational Semantics

General Terms: Languages

Additional Key Words and Phrases: Functional Languages, Parallelism

Authors' addresses: † MIT Lab. for Computer Science, 545 Technology Square, Cambridge, MA 02139, USA; ‡ Dept. of Computer Science, 303A Upson Hall, Cornell University, Ithaca, NY 14850, USA.

A preliminary version of this paper was published in Proceedings of the Workshop on Graph Reduction, Santa Fe, New Mexico, USA, Springer-Verlag LNCS 279, pages 336-369, September/October 1986.

This research was done at the MIT Laboratory for Computer Science. Funding for this project is provided in part by the Advanced Research Projects Agency of the Department of Defense under the Office of Naval Research contract N00014-84-K-0099. Keshav Pingali was also supported by an IBM Faculty Development Award.
Introduction

There is widespread agreement that only parallelism can bring about significant improvements in computing speed (several orders of magnitude faster than today's supercomputers). Functional languages have received much attention as appropriate vehicles for programming parallel machines, for several reasons. They are high-level, declarative languages, insulating the programmer from architectural details. Their operational semantics in terms of rewrite rules offers plenty of exploitable parallelism, freeing the programmer from having to identify parallelism explicitly. They are determinate, freeing the programmer from details of scheduling and synchronization of parallel activities.

In this paper we focus on the issue of data structures. We first demonstrate some difficulties in the treatment of data structures in functional languages, and then propose an alternative, called I-structures. Our method will be to take some test applications and compare their solutions using functional data structures and using I-structures. We study the solutions from the point of view of:

- efficiency: amount of unnecessary copying, speed of access, number of reads and writes, overheads in construction, etc.;
- parallelism: amount of unnecessary sequentialization;
- ease of coding.

We hope to show that it is very difficult to achieve all three objectives using functional data structures.

Since our ideas about I-structures evolved in the context of scientific computing, most of the discussion will be couched in terms of arrays. All our program examples are written in Id, which is a functional language augmented with I-structures. It is the language we use in our research on parallel architectures. Of course, the efficiency and parallelism of a program also depend on the underlying implementation model. Our findings are based on our own extensive experience with dataflow architectures, in particular the MIT Tagged-Token Dataflow Architecture, the centerpiece of our research. We have also carefully studied other published implementations of functional languages. However, it is beyond the scope of this paper to delve into such levels of implementation detail, and so we conduct our analyses at a level which does not require any knowledge of dataflow on the part of the reader. Later in the paper we present an abbreviated version of the rewrite-rule semantics of Id, which captures precisely the parallelism of the dataflow machine; we leave it to the intuition of the reader to follow the purely functional examples prior to that section.

While the addition of I-structures takes us beyond functional languages, Id does not lose any of the properties that make functional languages attractive for parallel machines. In particular, Id remains a higher-order, determinate language, i.e., its rewrite-rule semantics remains confluent. In the final section of the paper we discuss the implications of such an extension to a functional language. We also show that I-structures are not enough: there are some applications that are not solved efficiently whether we use functional data structures or I-structures. This class of applications is a subject of current research.

Footnote: However, it would be erroneous to infer that our conclusions are relevant only to programs with arrays.

Example C: Inverse Permutation

This problem was posed to one of us (Arvind) by Henk Barendregt, and it illustrates the difficulties of dealing with computed indices. Given a vector B of size n containing a permutation of the integers 1..n, build a new vector A of size n such that

    A[B[i]] = i

The computation for each of A's components is independent of the others. This is called an inverse permutation because the result A also contains a permutation of 1..n, and when the operation is repeated with A as argument, the original permutation is returned.
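As a reference point for the solutions that follow, here is a minimal sequential sketch of the specification in Python (Id itself is not assumed to be available). The function name `inverse_permutation` is our own. The 1-based indices of the text are mapped onto 0-based Python lists. The sketch also checks the property stated above, that inverting twice gives back B.

```python
def inverse_permutation(b):
    """Return a with A[B[i]] = i, for B holding a permutation of 1..n.

    Position k of a Python list stands for index k+1 of the text.
    """
    n = len(b)
    a = [None] * n
    for i in range(1, n + 1):
        j = b[i - 1]                     # the computed index B[i]
        if not 1 <= j <= n:
            raise IndexError(f"B[{i}] = {j} is out of bounds")
        if a[j - 1] is not None:
            raise ValueError(f"B is not a permutation: {j} appears twice")
        a[j - 1] = i                     # A[B[i]] = i
    return a


b = [3, 1, 2]
a = inverse_permutation(b)
print(a)                                 # [2, 3, 1]
print(inverse_permutation(a) == b)       # True: inverting twice gives B back
```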

Example D: Shared Computation

Build two arrays A and B of size n such that

    A[i] = f (h i)
    B[i] = g (h i)

such that the (h i) part of the computation for every i-th element of the two arrays is shared.

This example illustrates shared computation across arrays. Sharing could also occur across indices in a single array; for example, the computations for A[i] and A[i+1] may have a common subcomputation. And, of course, in other applications the two types of sharing may be combined.
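Below is a sequential Python sketch of the sharing requirement. The functions `f`, `g` and `h` are arbitrary placeholders chosen for illustration, and `build_shared` is a hypothetical name. Each call of `h` is recorded so that the sharing is visible: there is one call per index, and its result is used for both arrays. The sketch says nothing about parallelism; it only pins down what must be shared.

```python
def build_shared(n, f, g, h):
    """A[i] = f(h(i)) and B[i] = g(h(i)) for i = 1..n, calling h once per i."""
    a, b = [], []
    for i in range(1, n + 1):
        z = h(i)              # the shared subcomputation
        a.append(f(z))
        b.append(g(z))
    return a, b


calls = []


def h(i):
    calls.append(i)           # record each call so the sharing is visible
    return i * i


a, b = build_shared(4, lambda z: z + 1, lambda z: 2 * z, h)
print(a, b)          # [2, 5, 10, 17] [2, 8, 18, 32]
print(len(calls))    # 4: one call of h per index, not two
```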

Footnote: Here we use juxtaposition to indicate function application, a notation that is common in functional languages. Application associates to the left, so that f x y stands for (f x) y.

Fine-Grained Functional Data Structure Operations

We begin by looking at two data-structuring operations traditionally found in functional languages. We first look at Cons, a pairing operation, and then at Update, an operation that specifies a single incremental change in an array. We call them fine-grained operations because more useful operations such as a vector sum, matrix multiplication, etc., must be programmed in terms of a number of uses of these primitives.

Cons: Simulating Large Data Structures Using Lists

Functional languages have traditionally had a two-place Cons constructor as a basic data-structuring mechanism. Given Cons, one can of course write suitable array abstractions as a first step towards solving our examples. In this section we quickly reject this as a serious solution.

A typical representation for arrays using Cons would be to maintain an array as a list of elements, a matrix as a list of arrays, and so on. An abstraction for general access to an array component may be defined as follows:

    Def select A i = If i == 1 Then hd A Else select (tl A) (i-1) ;

Because of the list traversal, selection takes O(n) reads, where n is the length of the array.

Now consider a vector sum programmed in terms of select:

    Def vectorsum A B i = If i > n Then nil
                          Else cons (select A i + select B i) (vectorsum A B (i+1)) ;

This function performs O(n^2) reads, where a corresponding FORTRAN program would perform only O(n) reads.

This problem can be mitigated at the expense of ignoring the select abstraction and taking advantage of the underlying list representation, so that the list-traversing overhead is not cumulative:

    Def vectorsum A B = If null A Then nil
                        Else cons (hd A + hd B) (vectorsum (tl A) (tl B)) ;

This solution performs O(n) reads, though it is still inefficient because it is not tail-recursive.
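The read counts above can be checked with a small Python sketch that models cons cells as pairs and counts every hd and tl read. The helpers `Counter`, `from_list` and `to_list` are our own scaffolding. Under this model, `select` costs i reads for the i-th element, so the select-based sum costs n(n+1) reads in total. The direct traversal costs four reads per element.

```python
class Counter:
    """Counts hd/tl reads on cons cells."""
    def __init__(self):
        self.reads = 0


def cons(head, tail):
    return (head, tail)          # a cons cell; None plays the role of nil


def hd(cell, c):
    c.reads += 1
    return cell[0]


def tl(cell, c):
    c.reads += 1
    return cell[1]


def from_list(xs):
    lst = None
    for x in reversed(xs):
        lst = cons(x, lst)
    return lst


def to_list(lst):
    out = []
    while lst is not None:       # not counted: only used to inspect results
        out.append(lst[0])
        lst = lst[1]
    return out


def select(a, i, c):
    """1-based access by walking i-1 tails: i reads in total."""
    return hd(a, c) if i == 1 else select(tl(a, c), i - 1, c)


def vectorsum_select(a, b, n, i, c):
    """Vector sum written in terms of select: O(n^2) reads."""
    if i > n:
        return None
    return cons(select(a, i, c) + select(b, i, c),
                vectorsum_select(a, b, n, i + 1, c))


def vectorsum_direct(a, b, c):
    """Vector sum that follows the list structure: O(n) reads."""
    if a is None:
        return None
    return cons(hd(a, c) + hd(b, c), vectorsum_direct(tl(a, c), tl(b, c), c))


n = 100
xs, ys = list(range(n)), list(range(n, 2 * n))
A, B = from_list(xs), from_list(ys)
c1, c2 = Counter(), Counter()
s1 = to_list(vectorsum_select(A, B, n, 1, c1))
s2 = to_list(vectorsum_direct(A, B, c2))
print(c1.reads, c2.reads)   # 10100 400
```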

Unfortunately, every new abstraction must be carefully recoded like this, because combinations of given abstractions are not efficient. For example,

    vectorsum A (vectorsum B C)

creates and traverses an intermediate list unnecessarily.

Coding new abstractions efficiently is difficult because the list representation dictates a preferred order in which arrays should be constructed and traversed, an order that is extremely

    matrix ((mi,ni),(mj,nj))

    update A (i,j) v

    A[i,j]

These operations leave a lot of room for choosing the internal representation of arrays. To achieve constant-time access, at the expense of O(n) allocation and update, we will only look at representations that allocate arrays as contiguous chunks of memory. Other researchers have looked at implementations based on trees, where selection and update are both O(log n) and where it is possible to have extensible arrays. Ackerman studied implementations based on binary trees, and Thomas studied implementations based on another kind of tree.

But none of these implementations are adequate in and of themselves: they all involve far too much unnecessary copying and unnecessary sequentialization, as we will demonstrate in the next section. Thus they are always considered along with some major compile-time and/or run-time optimizations to recoup efficiency and parallelism, and these are discussed in subsequent sections.

Copying and Sequentialization of Update

A direct implementation of the update A i v operator would:

1. allocate an array with the same index bounds as A;
2. copy all elements from A to the result array, except at location i;
3. store value v in location i of the result array;
4. return the pointer to the result array.

The array selection operation would simply read a memory location at an appropriate offset from the pointer to the array argument.
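Here is a Python sketch of this direct implementation. A Python list stands in for a contiguous block of memory, and indices are 1-based as in the text. The names `update` and `select` follow the text; the bounds check is our own addition.

```python
def update(a, i, v):
    """Direct functional update of a 1-based vector: a is left unchanged."""
    if not 1 <= i <= len(a):
        raise IndexError(f"index {i} out of bounds")
    result = [None] * len(a)          # 1. allocate with the same bounds as a
    for k in range(len(a)):           # 2. copy every element except at i
        if k != i - 1:
            result[k] = a[k]
    result[i - 1] = v                 # 3. store v at location i
    return result                     # 4. return the new array


def select(a, i):
    """Selection: a single read at offset i-1 from the start of the array."""
    return a[i - 1]


old = [10, 20, 30]
new = update(old, 2, 99)
print(old, new)      # [10, 20, 30] [10, 99, 30]
```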

Example A will suffice to demonstrate that such a direct implementation is grossly inefficient. Here is a solution that allocates an array and then uses tail recursion to traverse and fill it with the appropriate contents:

    A = { A = matrix ((1,m),(1,n))
          In
            traverse A 1 1 } ;

    Def traverse A i j =
        { nextA = update A (i,j) (i+j)
          In
            If j < n Then traverse nextA i (j+1)
            Else If i < m Then traverse nextA (i+1) 1
            Else nextA } ;

We use the syntax

    { BINDING ; ... ; BINDING In EXPRESSION }

for blocks, which are like letrec blocks in other functional languages and follow the usual static scoping rules.

We prefer to use the following loop syntax to express the tail recursions:

    { A = matrix ((1,m),(1,n))
      In
        For i <- 1 To m Do
            Next A = For j <- 1 To n Do
                         Next A = update A (i,j) (i+j)
                     Finally A
        Finally A }

In the first iteration of the inner loop body, the A on the right-hand side refers to its value in the surrounding scope, in this case the matrix of nils allocated at the top of the block. In each iteration of the loop, the phrase Next A binds the value of A for the next iteration. The phrase Finally A specifies the ultimate value to be returned at the end of the iteration.

There are two major difficulties with such a program. The first is its profligate use of storage. With a direct implementation of update, we would create a new array for each of the mn updates, of which only one, the final one, is of interest. Each intermediate array carries only incrementally more information than the previous intermediate array.
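To quantify the storage cost, the following Python sketch runs Example A through a chain of direct updates. It assumes, as in the update program above, that A[i,j] is filled with i + j, and the counting scaffolding is hypothetical. Each of the mn updates allocates a fresh array and copies the other mn - 1 elements, so the total copying grows quadratically in the size of the matrix.

```python
def example_a_with_update(m, n):
    """Example A (A[i,j] = i + j) built by a chain of direct updates.

    The matrix is an immutable row-major tuple; each update allocates a new
    tuple and copies the other m*n - 1 elements.
    """
    stats = {"arrays": 1, "elements_copied": 0}   # the initial matrix of nils
    a = (None,) * (m * n)

    def update(arr, i, j, v):
        k = (i - 1) * n + (j - 1)
        stats["arrays"] += 1
        stats["elements_copied"] += len(arr) - 1
        return arr[:k] + (v,) + arr[k + 1:]

    for i in range(1, m + 1):      # the updates form one linear chain
        for j in range(1, n + 1):
            a = update(a, i, j, i + j)
    return a, stats


a, stats = example_a_with_update(3, 4)
print(stats)   # {'arrays': 13, 'elements_copied': 132}
```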

The second criticism of this program is that it overspecifies the order of the updates. In the problem specification, each element can be computed independently of the others. However, because of the nature of the update primitive, we must chain all the updates involved in producing the final value into a linear sequence.

The necessity to sequentialize the updates also affects program clarity adversely: it is an extra and unnecessary bit of detail to be considered by the programmer and reader. Consider a solution for the wavefront problem (Example B):

    { A = matrix ((1,m),(1,n))
      In
        For i <- 1 To m Do
            Next A = For j <- 1 To n Do
                         v = If (i == 1) or (j == 1) Then 1
                             Else A[i-1,j] + A[i-1,j-1] + A[i,j-1] ;
                         Next A = update A (i,j) v
                     Finally A
        Finally A }

It takes some careful thought to convince oneself that the above program is correct, i.e., that the array selections in computing v actually read previously computed values and not nil, the original contents of A. For example, if the recurrence had instead made A[i,j] depend on A[i,j+1], the element to its right (with appropriate boundary conditions), the programmer would have to realize that the j iteration would have to be reversed to count down from n to 1. This is a great departure from the declarative nature of the original recurrence specification.
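The ordering hazard can be reproduced with a Python sketch that fills the wavefront matrix in place in a fixed sequential order. It raises an error whenever a still-nil element is read, and the names are hypothetical. Because this recurrence reads the element to the left, the forward j loop works and the reversed one fails. This is the mirror image of the situation above, where a dependence on a later column would force the loop to count down.

```python
def read(A, i, j):
    """Read A[i][j], treating None as the nil left by the initial allocation."""
    v = A[i][j]
    if v is None:
        raise RuntimeError(f"A[{i},{j}] read before it was written")
    return v


def wavefront_in_order(m, n, reverse_j=False):
    """Sequential wavefront fill; the j loop may run forwards or backwards."""
    A = [[None] * (n + 1) for _ in range(m + 1)]    # row and column 0 unused
    js = range(n, 0, -1) if reverse_j else range(1, n + 1)
    for i in range(1, m + 1):
        for j in js:
            if i == 1 or j == 1:
                A[i][j] = 1
            else:
                A[i][j] = read(A, i - 1, j) + read(A, i - 1, j - 1) + read(A, i, j - 1)
    return A


print(wavefront_in_order(3, 3)[3][3])     # 13
try:
    wavefront_in_order(3, 3, reverse_j=True)
except RuntimeError as err:
    print(err)                             # A[2,2] read before it was written
```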

have been completed before executing line 3, and so the reference count of A in line 3 should be 1. Thus the update can be done in place. Similarly, the update in line 4 can also be done in place. But the programmer could easily have written the program with lines 2 and 3 exchanged:

    1   { vj = A[j] In
    2     { B = update A i vj In
    3       { vi = A[i] In
    4         update B j vi }}}

The reference count of A in line 2 is no longer 1, because of the outstanding reference in line 3, and so the update in line 2 cannot be done in place. The update in line 4 can still be done in place.
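The next Python sketch models the decision a reference-counting implementation makes. It assumes a hypothetical runtime in which every array carries an explicit count and a pending read holds one reference. Both functions swap A[i] and A[j]. In the first, both reads finish before any update. The second reproduces the exchanged ordering, in which the read of A[i] is still outstanding when the first update runs, so that update must copy.

```python
class RCArray:
    """An array with an explicit reference count (hypothetical runtime model)."""

    def __init__(self, data):
        self.data = list(data)
        self.refs = 1

    def share(self):
        self.refs += 1                # another reader now holds a reference
        return self

    def release(self):
        self.refs -= 1


def update(arr, i, v):
    """Update 1-based index i, consuming one reference to arr.

    Returns (array, done_in_place)."""
    if arr.refs == 1:                 # sole reference: overwrite in place
        arr.data[i - 1] = v
        return arr, True
    arr.release()                     # drop our reference and copy instead
    new = RCArray(arr.data)
    new.data[i - 1] = v
    return new, False


def swap_reads_first(a, i, j):
    vj = a.data[j - 1]                # read A[j]
    vi = a.data[i - 1]                # read A[i]: both reads finish first
    b, first = update(a, i, vj)
    c, second = update(b, j, vi)
    return c, (first, second)


def swap_read_pending(a, i, j):
    vj = a.data[j - 1]                # read A[j]
    reader = a.share()                # the read of A[i] has not happened yet
    b, first = update(a, i, vj)       # so this update sees two references
    vi = reader.data[i - 1]           # the late read still sees the original A
    reader.release()
    c, second = update(b, j, vi)
    return c, (first, second)


print(swap_reads_first(RCArray([1, 2, 3, 4]), 1, 3)[1])   # (True, True)
print(swap_read_pending(RCArray([1, 2, 3, 4]), 1, 3)[1])  # (False, True)
```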

Now consider a parallel operational semantics for the language. A precise example of such a semantics is given later in the paper, but for now imagine that the bindings of a block can be executed in parallel with the body, with sequencing, if any, based only on data dependencies. All four lines of the program are now initiated in parallel. Since there are no data dependencies between lines 2 and 3, their order of execution is unpredictable. Thus static analysis cannot draw any definite conclusions about the reference count of A in line 2.

Using Subscript Analysis to Increase Parallelism

We have seen that the nature of the update primitive requires the programmer to sequentialize the updates in computing an array. Reference count analysis sometimes determines that these updates may be done in place.

If static analysis could further predict that the subscripts in the sequence of updates were disjoint, then the updates would commute; they could then all be done in parallel. Using such analysis on our update-based program for Example A, the compiler could generate code to perform all the mn updates in parallel.

Subscript analysis has been studied extensively, most notably by Kuck et al. at the University of Illinois and by Kennedy at Rice University. Most of this work was done in the context of vectorizing compilers for FORTRAN. In general this is an intractable problem, but in the commonly occurring case where the subscripts are of the form a*i + b (a and b are constants, i is a loop index), subscript analysis can reveal parallelism. However, there is a significant cost to this analysis, both in terms of compilation speed and in terms of the effort to develop a compiler.
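For the affine case mentioned above, the simplest check is the classical GCD test. The Python sketch below illustrates that test only; it is not necessarily the analysis used in the compilers cited. Loop bounds are ignored, which keeps the test conservative: a True answer only means a conflict cannot be ruled out.

```python
from math import gcd


def writes_disjoint(a):
    """Writes to X[a*i + b] in different iterations i never collide iff a != 0."""
    return a != 0


def may_conflict(a1, b1, a2, b2):
    """GCD test: can a1*i + b1 == a2*j + b2 hold for some integers i and j?

    False means the two subscripts can never name the same element.
    """
    g = gcd(a1, a2)
    if g == 0:                       # both subscripts are constants
        return b1 == b2
    return (b2 - b1) % g == 0


print(writes_disjoint(1))            # True: X[i] writes one element per iteration
print(may_conflict(2, 0, 2, 1))      # False: X[2i] and X[2j+1] never overlap
print(may_conflict(2, 0, 4, 2))      # True: X[2i] and X[4j+2] may overlap
```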

Compared to FORTRAN, subscript analysis is, on the one hand, easier in functional languages due to referential transparency, but on the other hand more difficult because of dynamic storage allocation.

An example of a program where subscript analysis cannot extract any useful information is a solution to Example C, the Inverse Permutation problem:

    B = { B = array (1,n)
          In
            For i <- 1 To n Do
                Next B = update B A[i] i
            Finally B }

In order to parallelize the loop, the compiler needs to know something about the contents of A, such as that it contains a permutation of 1..n. This is in general too much to ask of compile-time analysis. This situation is not artificial or unusual; it occurs all the time in practical codes, such as in sorting algorithms that avoid copying large elements of arrays by manipulating their indices instead, and in Monte Carlo techniques and random walks.

Discussion

We hope we have convinced the reader of the inadequacy of fine-grained functional data-structuring mechanisms such as Cons and Update, especially in a parallel environment. Some of these problems are solved using the make_array primitive, discussed in the next section.

Writing programs directly in terms of these primitives does not result in very perspicuous programs. Cons requires the programmer continuously to keep in mind the list representation, and update requires the programmer to devise a sequential chaining of more abstract operations. In both cases it is advisable first to program some higher-level abstractions and subsequently to use those abstractions.

Both operators normally involve substantial unnecessary copying of intermediate data structures and substantial unnecessary sequentialization. It was possible to avoid these overheads only when the compiler could be assured that (a) reference counts were one, and (b) the subscripts in a chain of updates were disjoint. Automatic detection of these properties does not seem tractable in general.

There is a disquieting analogy with FORTRAN here. Our functional operators force overspecification of a problem solution, and static analysis attempts to relax unnecessary constraints. Parallelizing FORTRAN compilers face the same problem, albeit for a different reason: side effects.

Footnote: Originally, an I-structure was just a functional data structure with these two properties, and not a separate kind of object with its own operations.

location has presence bits to indicate whether the value is present or absent. I-structure storage is discussed in more detail later in the paper.

Another way to achieve this synchronization is by lazy evaluation. The bounds expression is evaluated first, and storage of the appropriate size is allocated. Each location A[i] is then loaded with the suspension for (f i), and the pointer to the array is then returned. A subsequent attempt to read A[i] will force evaluation of the suspension, which is then overwritten by the value. In general, a fundamental activity of lazy evaluators, testing an expression to check whether it is still a suspension, is really a synchronization test and also needs presence bits, although they are not usually referred to with that terminology.
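The lazy-evaluation alternative can be sketched in Python as follows. Each slot starts as a suspension (a thunk) for f i, an explicit presence bit records whether it has been forced, and the first read overwrites the suspension with its value. The class name `LazyArray` is hypothetical. The sketch is single-threaded, so the presence test here is only the synchronization check the text describes, not a lock.

```python
class LazyArray:
    """Array with bounds lo..hi whose slots start as suspensions for f(i)."""

    def __init__(self, lo, hi, f):
        self.lo = lo
        self.present = [False] * (hi - lo + 1)      # presence bit per location
        self.slots = [(lambda i=i: f(i)) for i in range(lo, hi + 1)]
        self.forced = 0                             # suspensions evaluated so far

    def __getitem__(self, i):
        k = i - self.lo
        if not 0 <= k < len(self.present):
            raise IndexError(i)
        if not self.present[k]:          # the synchronization test
            value = self.slots[k]()      # force the suspension ...
            self.slots[k] = value        # ... and overwrite it with its value
            self.present[k] = True
            self.forced += 1
        return self.slots[k]


squares = LazyArray(1, 5, lambda i: i * i)
print(squares[3], squares[3], squares.forced)   # 9 9 1
```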

This kind of nonstrictness permits pipelined parallelism, in that the consumer of an array can begin work on parts of the array while the producer of the array is still working on other parts. Of course, even the Cons and Update operators of the preceding section could benefit from this type of nonstrictness.

Example B: Wavefront

A straightforward solution to the wavefront problem is:

    Def f (i,j) = If (i == 1) or (j == 1) Then 1
                  Else f (i-1,j) + f (i-1,j-1) + f (i,j-1) ;

    A = make_matrix ((1,m),(1,n)) f ;

But this is extremely inefficient, because f (i,j) is evaluated repeatedly for each (i,j): not only to compute the (i,j)-th component, but also during the computation of every component to its right and below. This is the typical exponential behavior of a recursively defined Fibonacci function.
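The cost of the naive solution can be measured with a Python sketch in which a list comprehension plays the role of make_matrix, calling f once per index. Counting every call of f shows the recomputation: for a 3 × 3 matrix, f is already called 42 times for 9 components, and the count grows exponentially with the matrix size. The function name `naive_wavefront` is our own.

```python
def naive_wavefront(m, n):
    """make_matrix with the recursive f: every component recomputes its history."""
    calls = 0

    def f(i, j):
        nonlocal calls
        calls += 1
        if i == 1 or j == 1:
            return 1
        return f(i - 1, j) + f(i - 1, j - 1) + f(i, j - 1)

    matrix = [[f(i, j) for j in range(1, n + 1)] for i in range(1, m + 1)]
    return matrix, calls


matrix, calls = naive_wavefront(3, 3)
print(matrix[2][2], calls)   # 13 42
for size in (4, 5, 6):
    print(size, naive_wavefront(size, size)[1])   # call counts grow rapidly
```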

The trick is to recognize that the array is a cache, or memo, for the function, and to use the array itself to access already-computed values. This can be done with a recursive definition for A:

    Def f X (i,j) = If (i == 1) or (j == 1) Then 1
                    Else X[i-1,j] + X[i-1,j-1] + X[i,j-1] ;

    g = f A ;
    A = make_matrix ((1,m),(1,n)) g ;

Here the function f is a curried function of two arguments, a matrix and a pair of integers. By applying it to A, g becomes a function on a pair of integers, which is a suitable argument for make_matrix. The function g in defining A carries a reference to A itself, so that the computation of a component of A has access to other components of A.

In order for this to achieve the desired caching behavior, the language implementation must handle this correctly, i.e., the A used in g must be the same A produced by make_matrix, and not a new copy of the definition of A.
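The recursive definition can be sketched in Python, assuming a hypothetical `LazyMatrix` that computes each component on first access and caches it. The filling function is a closure over the very object it fills, which is the property the text requires: the A used in g is the same A that make_matrix produces. Index bounds are not checked in this sketch.

```python
class LazyMatrix:
    """Components are computed on first access by fill((i, j)), then cached."""

    def __init__(self, fill):
        self.fill = fill
        self.cells = {}              # (i, j) -> value; presence = key present
        self.evaluations = 0

    def __getitem__(self, ij):
        if ij not in self.cells:
            self.evaluations += 1
            self.cells[ij] = self.fill(ij)
        return self.cells[ij]


def f(X, ij):
    """The curried f of the text: a matrix X, then an index pair."""
    i, j = ij
    if i == 1 or j == 1:
        return 1
    return X[i - 1, j] + X[i - 1, j - 1] + X[i, j - 1]


def wavefront(m, n):
    A = LazyMatrix(lambda ij: f(A, ij))      # g = f A refers to this very A
    rows = [[A[i, j] for j in range(1, n + 1)] for i in range(1, m + 1)]
    return rows, A.evaluations


rows, evaluations = wavefront(4, 4)
print(rows[3][3], evaluations)   # 63 16
```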

Note that in recurrences like this it will be impossible, in general, to predict statically in what order the components must be filled to satisfy the dependencies, and so a compiler cannot always preschedule the computation of the components of an array. Thus any implementation must necessarily use some of the dynamic synchronization techniques mentioned earlier. This is true even for sequential implementations: lazy evaluation is one way to achieve this dynamic synchronization and scheduling.

Assuming the implementation handles such recurrences properly, the main inefficiency that remains is that the If-Then-Else is executed at every location. This problem arises even when there are no recurrences. In scientific codes it is quite common to build a matrix with different filling functions for different regions, e.g., one function for boundary conditions and another for the interior. Even though this structure is known statically, make_matrix forces the use of a single filling function that, by means of a conditional, dynamically selects the appropriate function at each index. Compare this with the FORTRAN solution, which would merely use separate loops to fill separate regions.

Example C: Inverse Permutation

Unfortunately, make_array does not do so well on Example C. Recall that B contains a permutation of its indices and we need to compute A, the inverse permutation:

    Def find B i j = If B[j] == i Then j Else find B i (j+1) ;

    Def g B i = find B i 1 ;

    A = make_array (1,n) (g B) ;

The problem is that each (g B i), which is responsible for filling in the i-th location of A, needs to search B for the location that contains i, and this search must be linear. Thus the cost of the program is O(n^2).
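The following Python sketch of this make_array solution counts the comparisons made by `find`. The names follow the text, and the counting is our own addition. For the reversed permutation of 1..100, each search for i stops at position 101 - i, giving 1 + 2 + ... + 100 = 5050 comparisons, in line with the O(n^2) bound.

```python
def inverse_by_search(b):
    """make_array version of Example C: each component searches B linearly."""
    n = len(b)
    comparisons = 0

    def find(i, j):
        nonlocal comparisons
        while True:                   # the tail-recursive find, as a loop
            comparisons += 1
            if b[j - 1] == i:         # B[j] == i ?
                return j
            j += 1                    # runs off the end if i is not in B

    a = [find(i, 1) for i in range(1, n + 1)]   # A = make_array (1,n) (g B)
    return a, comparisons


print(inverse_by_search([3, 1, 2]))                    # ([2, 3, 1], 6)
print(inverse_by_search(list(range(100, 0, -1)))[1])   # 5050
```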

It is possible to use a slightly different array primitive to address this problem. Consider

    make_array_jv (l,u) f

where each (f i) returns a pair (j, v) such that A[j] = v, i.e., the filling function f is now responsible for computing not only a component value but also its index. Example C may now be written:

    Def g B i = (B[i], i) ;

    A = make_array_jv (1,n) (g B) ;

Of course, if B does not contain a permutation of 1..n, a run-time error must be detected: either two (g B i)'s will attempt to write the same location, or some (g B i) will attempt to write out of bounds.

Footnote: We first heard this solution independently from David Turner and Simon Peyton Jones, in a slightly different form: instead of having a filling function f, they proposed an association list of index-and-value pairs. This solution is also mentioned by Wadler.
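Here is a Python sketch of the make_array_jv idea. The primitive is only specified informally in the text, so this sequential version, its error types and its messages are assumptions. It runs the filling computations in turn, stores each v at the index j that the computation chooses, and reports the two run-time errors described above.

```python
def make_array_jv(lo, hi, f):
    """Each f(i) returns (j, v); store v at index j of an array with bounds lo..hi."""
    size = hi - lo + 1
    slots = [None] * size
    filled = [False] * size
    for i in range(lo, hi + 1):          # stands in for independent computations
        j, v = f(i)
        if not lo <= j <= hi:
            raise IndexError(f"f({i}) writes index {j}, outside ({lo},{hi})")
        if filled[j - lo]:
            raise ValueError(f"index {j} written twice")
        slots[j - lo] = v
        filled[j - lo] = True
    return slots


def g(b, i):
    return b[i - 1], i                   # (B[i], i): write i at index B[i]


B = [3, 1, 2]
print(make_array_jv(1, 3, lambda i: g(B, i)))   # [2, 3, 1]
```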

For large data structures such as arrays, it is obviously not feasible in general to enumerate expressions for all the components as we do with Cons. Thus their functional constructors must specify a regular way to generate the components. The make_array primitive takes a filling parameter f and sets up n independent computations, with the i-th computation responsible for computing and filling the i-th location.

We saw three problems with this fixed control structure. The wavefront example showed that when the filling function is different for different regions of the array, the functions have to be selected dynamically using a conditional, even when the regions are known statically. In the inverse permutation problem, the fixed control structure was totally different from the desired control structure. Finally, there was no convenient way to express shared computation between the filling computations for two data structures.

The variant make_array_jv achieved some flexibility by leaving it up to each of the i computations to decide which index j it was responsible for. However, it still did not address the issue of shared computations, which could only be performed with the overhead of constructing intermediate arrays or lists. In recent correspondence with us, Phil Wadler has conjectured that by using the version of make_array_jv that takes association lists of index-and-value pairs, together with his listless transformer, these problems may indeed be solved without any overhead of intermediate lists. We have yet to investigate the viability of this approach.

All the examples we have seen are quite small and simple; even so, we saw that the first straightforward solution that came to mind was in many cases quite unacceptable, and that the programmer would have to think twice to achieve any efficiency at all. The complications that were introduced to regain efficiency had nothing to do with improving the algorithms; they were introduced to get around language limitations.

We are thus pessimistic about relying on a fixed set of functional data-structuring primitives. We have encountered situations where the problems illustrated above do not occur in isolation: recursive definitions are combined with shared computations across indices and across arrays. In these situations, writing efficient programs using functional array primitives has proven to be very difficult, and is almost invariably at the expense of program clarity. Perhaps, with so many researchers currently looking at this problem, new functional data-structuring primitives will emerge that will allow us to revise our opinion.

I-Structures

In the preceding discussion we saw that the source of inefficiency is the fact that the various functional primitives impose too rigid a control structure on the computations responsible for filling in the components of the data structure. Imperative languages do not suffer from this drawback, because the allocation of a data structure (variable declaration) is decoupled from the filling-in of that data structure (assignment). But imperative languages with unrestricted assignments complicate parallelism because of timing and determinacy issues. I-structures are an attempt to regain that flexibility without losing determinacy.

In the next subsection we present the operations to create and manipulate I-structures, and in the subsection after that we show how to code the programming examples using I-structures. In these subsections we rely on informal and intuitive explanations concerning parallelism and efficiency.

Finally, in the last subsection we make these explanations precise by presenting an operational semantics for a kernel language with I-structures, using a confluent set of rewrite rules. That subsection may be skipped on a first reading; however, the rewrite rules have several novel features not usually found elsewhere in the functional languages literature. Even for the functional subset of Id, they capture precisely the idea of parallel dataflow execution, which is parallel and normalizing; they describe precisely which computations are shared, an issue that is often left unspecified; and, finally, they are invaluable in developing one's intuitions about the read-write synchronization of parallel data structures, both functional and otherwise.

I-Structure Operations

One can think of an I-structure as a special kind of array, each of whose components may be written no more than once. To augment a functional language with I-structures, we introduce three new constructs.

Allocation

An I-structure is allocated by the expression

    I_array (m,n)

which allocates and returns an empty array whose index bounds are (m,n). I-structures are first-class values, and they can contain other I-structures, functions, etc. We can simulate multidimensional arrays by nesting I-structures, but for efficiency reasons Id also provides primitives for directly constructing multidimensional I-structures:

    I_matrix ((mi,ni),(mj,nj))

and so on.

An I-structure selection expression becomes an I-fetch request to the I-structure store. Every request is accompanied by a tag, which can be viewed as the name of the continuation that expects the result. The controller for the I-structure store checks the empty bit at that location. If it is not empty, the value is read and sent to the continuation. If the location is still empty, the controller simply queues the tag at that location.

An I-structure assignment statement becomes an I-store request to the I-structure store. When such a request arrives, the controller for the I-structure store checks the empty bit at that location. If it is empty, the value is stored there, the bit is toggled to nonempty, and if any tags are queued at that location, the value is also sent to all those continuations. If the location is not empty, the controller generates a run-time error.
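The fetch and store protocol can be modelled as a single-threaded Python sketch in which a tag is simply a callback standing for the continuation. The class and method names (`IStructureStore`, `i_fetch`, `i_store`) are hypothetical. The sketch covers only the controller logic described here, not the dataflow machine.

```python
class IStructureStore:
    """Controller model: an empty bit, a value and a deferred-tag queue per location."""

    def __init__(self, size):
        self.present = [False] * size
        self.values = [None] * size
        self.deferred = [[] for _ in range(size)]

    def i_fetch(self, loc, tag):
        if self.present[loc]:
            tag(self.values[loc])            # not empty: send the value now
        else:
            self.deferred[loc].append(tag)   # still empty: queue the tag

    def i_store(self, loc, value):
        if self.present[loc]:
            raise RuntimeError(f"location {loc} written twice")
        self.values[loc] = value
        self.present[loc] = True             # toggle the bit to nonempty
        waiting, self.deferred[loc] = self.deferred[loc], []
        for tag in waiting:                  # release every queued continuation
            tag(value)


received = []
store = IStructureStore(2)
store.i_fetch(0, lambda v: received.append(("early", v)))  # arrives before the write
store.i_store(0, 42)
store.i_fetch(0, lambda v: received.append(("late", v)))
print(received)   # [('early', 42), ('late', 42)]
```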

The Programming Examples

Let us now see how our programming examples are expressed in Id with I-structures.

Example A

The first example is straightforward:

    { A = I_matrix ((1,m),(1,n)) ;
      {For i <- 1 To m Do
          {For j <- 1 To n Do
              A[i,j] = i + j }}
      In
        A }

Recall that the loop is a parallel construct, so in the above program the loop bodies can be executed in any order: sequentially forwards as in FORTRAN, all in parallel, or even sequentially backwards.

The matrix A may be returned as the value of the block as soon as it is allocated. Meanwhile, m × n loop bodies execute in parallel, each filling in one location in A. Any consumer that tries to read A[i,j] will block until the value has been stored by the corresponding loop body.
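This behaviour can be sketched in Python with threads. The sketch assumes a hypothetical `IStructure` class whose cells are write-once and whose reads wait on a `threading.Condition` until the presence bit is set. One thread per loop body stands in for the parallel loop. Python threads give no real parallel speedup for this kind of code, so the sketch shows the synchronization, not the performance.

```python
import threading


class IStructure:
    """Write-once cells whose reads wait until the cell has been written."""

    def __init__(self, size):
        self._values = [None] * size
        self._present = [False] * size       # the presence (empty) bits
        self._cond = threading.Condition()

    def write(self, k, value):
        with self._cond:
            if self._present[k]:
                raise RuntimeError(f"cell {k} written twice")
            self._values[k] = value
            self._present[k] = True
            self._cond.notify_all()          # wake any blocked readers

    def read(self, k):
        with self._cond:
            self._cond.wait_for(lambda: self._present[k])
            return self._values[k]


def example_a(m, n):
    """A = I_matrix ((1,m),(1,n)) filled by m*n independent loop bodies."""
    A = IStructure(m * n)                    # A[i,j] lives at (i-1)*n + (j-1)

    def body(i, j):
        A.write((i - 1) * n + (j - 1), i + j)

    bodies = [threading.Thread(target=body, args=(i, j))
              for i in range(1, m + 1) for j in range(1, n + 1)]
    for t in bodies:
        t.start()
    return A, bodies                         # returned while bodies may still run


A, bodies = example_a(3, 4)
print(A.read(2 * 4 + 3))   # A[3,4] = 7; waits until that body has written it
for t in bodies:
    t.join()
```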

Example B: Wavefront

    { A = I_matrix ((1,m),(1,n)) ;
      {For i <- 1 To m Do
          A[i,1] = 1 } ;
      {For j <- 2 To n Do
          A[1,j] = 1 } ;
      {For i <- 2 To m Do
          {For j <- 2 To n Do
              A[i,j] = A[i-1,j] + A[i-1,j-1] + A[i,j-1] }}
      In
        A }

The matrix A may be returned as the value of the expression as soon as it is allocated. Meanwhile, all the loop bodies are initiated in parallel, but some will be delayed until the loop bodies for the elements to their left and top complete. Thus a wavefront of processes fills the matrix.
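The wavefront can also be sketched in Python with the same hypothetical write-once `IStructure` class, repeated so that the block stands alone. Every loop body is a thread, and the interior bodies are deliberately started in reverse order. Each one blocks on its reads until its neighbours have written, so the result does not depend on the start order.

```python
import threading


class IStructure:
    """Write-once cells whose reads wait until the cell has been written."""

    def __init__(self, size):
        self._values = [None] * size
        self._present = [False] * size
        self._cond = threading.Condition()

    def write(self, k, value):
        with self._cond:
            if self._present[k]:
                raise RuntimeError(f"cell {k} written twice")
            self._values[k] = value
            self._present[k] = True
            self._cond.notify_all()

    def read(self, k):
        with self._cond:
            self._cond.wait_for(lambda: self._present[k])
            return self._values[k]


def cell(i, j, n):
    """Row-major position of A[i,j] in a matrix with n columns (1-based i, j)."""
    return (i - 1) * n + (j - 1)


def wavefront(m, n):
    A = IStructure(m * n)

    def boundary(i, j):
        A.write(cell(i, j, n), 1)

    def interior(i, j):
        # Each read blocks until the body for that neighbour has written it.
        up = A.read(cell(i - 1, j, n))
        diag = A.read(cell(i - 1, j - 1, n))
        left = A.read(cell(i, j - 1, n))
        A.write(cell(i, j, n), up + diag + left)

    jobs = [(boundary, i, 1) for i in range(1, m + 1)]
    jobs += [(boundary, 1, j) for j in range(2, n + 1)]
    # Interior bodies start in reverse order; data dependencies alone
    # decide when each one can finish.
    jobs += [(interior, i, j) for i in range(m, 1, -1) for j in range(n, 1, -1)]
    threads = [threading.Thread(target=fn, args=(i, j)) for fn, i, j in jobs]
    for t in threads:
        t.start()
    return A, threads


A, threads = wavefront(4, 4)
print(A.read(cell(4, 4, 4)))   # 63
for t in threads:
    t.join()
```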

Note that we do not pay the overhead of executing an If-Then-Else expression at each index, as in the functional solution.

It is worth emphasizing again that loops are parallel constructs. In the above example, it makes no difference if we reverse the index sequences:

    {For i <- m Downto 2 Do
        {For j <- n Downto 2 Do ... }}

The data dependencies being the same, the order of execution would be the same. This is certainly not the case in imperative languages such as FORTRAN.

Example C: Inverse Permutation

    { A = I_array (1,n) ;
      {For i <- 1 To n Do
          A[B[i]] = i }
      In
        A }

The array A may be returned as the value of the expression as soon as it is allocated. Meanwhile, all the loop bodies execute in parallel, each filling in one location. If B does not contain a permutation of 1..n, then a run-time error will arise, either because two processes tried to assign to the same location, or because some process tried to write out of bounds.
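The following Python sketch of the I-structure solution surfaces the run-time errors, again assuming the hypothetical write-once `IStructure` class. Unlike the Id program, it waits for all loop bodies to finish before returning, so that an error raised inside a body can be re-raised by `Future.result()`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class IStructure:
    """Write-once cells whose reads wait until the cell has been written."""

    def __init__(self, size):
        self._values = [None] * size
        self._present = [False] * size
        self._cond = threading.Condition()

    def write(self, k, value):
        with self._cond:
            if self._present[k]:
                raise RuntimeError(f"cell {k} written twice")
            self._values[k] = value
            self._present[k] = True
            self._cond.notify_all()

    def read(self, k):
        with self._cond:
            self._cond.wait_for(lambda: self._present[k])
            return self._values[k]


def inverse_permutation(b):
    """A[B[i]] = i for i = 1..n, each loop body running as a separate task."""
    n = len(b)
    A = IStructure(n)

    def body(i):
        j = b[i - 1]
        if not 1 <= j <= n:
            raise IndexError(f"B[{i}] = {j} is out of bounds")
        A.write(j - 1, i)                 # A[B[i]] = i

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(body, i) for i in range(1, n + 1)]
    for fut in futures:
        fut.result()                      # re-raise any run-time error from a body
    return [A.read(k) for k in range(n)]


print(inverse_permutation([3, 1, 2]))    # [2, 3, 1]
try:
    inverse_permutation([1, 1, 2])
except RuntimeError as err:
    print(err)                           # cell 0 written twice
```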

Example D: Shared Computation

    { A = I_array (1,n) ;
      B = I_array (1,n) ;
      {For i <- 1 To n Do
          z = h i ;
          A[i] = f z ;
          B[i] = g z }
      In
        A, B }

The arrays A and B may be returned as the value of the expression as soon as they are allocated. Meanwhile, all the loop bodies execute in parallel, each filling in two locations, one in A and the other in B. In each loop body, the computation of (h i) is performed only once.

Operational Semantics for a Kernel Language with I-Structures

In this section we make the parallelism in the dataflow execution of Id more precise. First, some historical notes. For a long time the parallelism of Id was described only in terms