EE482C: Advanced Computer Organization Lecture #9

Stream Processor Architecture

Stanford University Thursday, 9 May 2002

Stream Programming Languages/Brook Tutorial

Lecture #9: Thursday, 2 May 2002

Lecturer: Prof. Bill Dally

Scribe: Alex Solomatnikov and Jae-Wook Lee

Reviewer: Mattan Erez

There is one handout for StreaMIT today.

Opinions about programming in StreamC/KernelC:

1) Too many "undocumented features".

2) No constant folding. A better C front-end is needed.

The project proposal is due on Tuesday, May 7. Keep it short and to the point.

1 What is a Stream Programming Language?

A stream programming language aims to achieve two goals at the same time: efficiency and convenience in writing a streaming application. Since communication overhead dominates computation in a streaming program, a stream processor exposes communication explicitly to the upper layers so that software can manage it to maximize performance.

Some examples of stream programming languages that have been developed or are under development:



StreamC/KernelC: A language specific to a single machine, Imagine at Stanford

StreaMIT: A language intended for the MIT RAW machine, but not machine-specific

Brook: A 2nd-generation language based on the Imagine concept of streams, but machine-independent

2 Issues on Kernels

2.1 Implicit vs Explicit

In the C code for an FIR filter, it is not easy to tell which array is the input stream and which is the output stream. On the other hand, the position of each element within a stream must be specified explicitly. Here is the C code:


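/* a: input samples, h: filter taps, b: output samples */
for (i = 0; i < MAX - FIRLEN; i++) {
    s = 0;
    for (j = 0; j < FIRLEN; j++)
        s += a[i + FIRLEN - 1 - j] * h[j];
    b[i] = s;   /* one output per window starting at i */
}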

In Brook, input and output streams must be declared explicitly, while the element within a stream is determined implicitly by context rather than by an absolute position index. Here is a kernel declaration in Brook:

kernel fir(floats a[i:0,FIRLEN-1], float h[FIRLEN], out floats b) {
    s = 0;
    for (j = 0; j < FIRLEN; j++)
        s += a[FIRLEN-1-j] * h[j];
    b = s;
}

The index for stream a, (FIRLEN-1-j), is an offset from the current position in the stream, denoted by i, rather than an offset from the first element of the stream. The kernel is invoked from a caller like this: fir(a, h, b). Neither the caller nor the callee uses an absolute index value. The code below shows how a stream is derived in Brook using a stencil.

typedef stream float floats;
typedef stream float floatws[FIRLEN];

floats a, b;
floatws aa;
streamSetLength(a, 1024);
streamSetLength(b, 1024);
streamStencil(aa, a, STREAM_STENCIL_CLAMP, 1, 0, FIRLEN-1);

From the programmer's point of view, the kernel is applied to a stream of stencils.
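The idea of applying a kernel to a stream of stencils can be imitated in plain C. The sketch below is an illustration only, not Brook's implementation: `fir_kernel` and `apply_fir` are hypothetical names, FIRLEN is fixed at 3 for brevity, and out-of-range reads are assumed to be clamped to 0. The kernel sees only its window and never an absolute stream index; the driver loop plays the role of the runtime.

```c
#include <assert.h>
#include <stddef.h>

#define FIRLEN 3   /* assumed filter length for this sketch */

/* Hypothetical stand-in for a Brook kernel: it receives one window of
 * FIRLEN input elements and returns one output element. */
static float fir_kernel(const float window[FIRLEN], const float h[FIRLEN])
{
    float s = 0.0f;
    for (int j = 0; j < FIRLEN; j++)
        s += window[FIRLEN - 1 - j] * h[j];
    return s;
}

/* Hypothetical stand-in for the runtime: for every element it builds the
 * window covering offsets 0..FIRLEN-1 (reads past the end are clamped
 * to 0) and applies the kernel to it. */
static void apply_fir(const float *a, size_t n, const float h[FIRLEN], float *b)
{
    assert(a != NULL && h != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        float window[FIRLEN];
        for (size_t k = 0; k < FIRLEN; k++)
            window[k] = (i + k < n) ? a[i + k] : 0.0f;
        b[i] = fir_kernel(window, h);
    }
}
```

Because the kernel has no notion of where its window came from, the driver is free to process windows in any order or in parallel.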

2.2 Retained State vs Functional

StreamC/KernelC manages communication between clusters manually, which violates abstraction. It adopts "retained state", which keeps the values of variables across the iteration boundaries of kernel execution. Brook, on the other hand, does not retain state between kernel iterations. The advantage of this scheme is that there are no data dependencies between iterations, so more data-level parallelism (DLP) can be exploited. To communicate across iterations, Brook uses a "reduction" variable, which can be of any type.
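The contrast between retained state and a functional kernel with a reduction can be sketched in C. This is an illustration under assumptions, not code from either language: the function names are hypothetical, and the "lanes" stand in for clusters that each reduce an interleaved share of the stream.

```c
#include <assert.h>
#include <stddef.h>

/* Retained-state style: acc survives from one iteration to the next, so
 * iteration i cannot start before iteration i-1 has finished. */
static float sum_squares_retained(const float *x, size_t n)
{
    float acc = 0.0f;   /* state carried across iterations */
    for (size_t i = 0; i < n; i++)
        acc += x[i] * x[i];
    return acc;
}

/* Functional style: the per-element kernel keeps no state. */
static float square_kernel(float x) { return x * x; }

/* The only cross-element communication is the reduction. Each lane
 * reduces its own share independently and the partial results are
 * combined at the end. The reordering is valid only if the combining
 * operation is treated as associative. */
static float sum_squares_reduction(const float *x, size_t n, size_t lanes)
{
    assert(lanes > 0 && lanes <= 8);
    float partial[8] = {0};
    for (size_t lane = 0; lane < lanes; lane++)
        for (size_t i = lane; i < n; i += lanes)
            partial[lane] += square_kernel(x[i]);
    float total = 0.0f;
    for (size_t lane = 0; lane < lanes; lane++)
        total += partial[lane];
    return total;
}
```

For floating-point data the two versions can differ in the last bits, since floating-point addition is not strictly associative; that is the price of the reordering freedom.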

Can the values of the table then be written in a kernel? The answer is "yes" for KernelC and "no" for Brook: global variables are read-only inside a Brook kernel. It is not clear how to implement this: register files have small capacity, and local memory/scratchpad has low bandwidth. If the global data is too big, the only option is global memory. But how can we hide the memory access latency? It may be better to split the work into two kernels and generate an intermediate stream of indices/addresses to amortize the latency. In that case, a lot of intermediate data might need to be stored in other streams.
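The two-kernel split suggested above can be sketched in C as three passes: one kernel emits a stream of indices, a bulk gather fetches the table entries, and a second kernel consumes the fetched stream. All names here are hypothetical, and the index rule (truncate the input value and keep it in range) is just an assumption chosen to make the example concrete.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel 1 (hypothetical): decides which table entry each element needs
 * and emits it as an intermediate stream of indices. */
static void index_kernel(const float *in, size_t n, size_t table_len, size_t *idx)
{
    assert(table_len > 0);
    for (size_t i = 0; i < n; i++) {
        size_t k = 0;
        if (in[i] >= (float)table_len)
            k = table_len - 1;          /* keep the index in range */
        else if (in[i] > 0.0f)
            k = (size_t)in[i];          /* truncate toward zero */
        idx[i] = k;
    }
}

/* Gather: one bulk fetch from global memory, driven by the index stream.
 * Issuing all fetches together is what lets the latency be amortized. */
static void gather(const float *table, const size_t *idx, size_t n, float *fetched)
{
    for (size_t i = 0; i < n; i++)
        fetched[i] = table[idx[i]];
}

/* Kernel 2 (hypothetical): works only on streams, never on the table. */
static void scale_kernel(const float *in, const float *fetched, size_t n, float *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] * fetched[i];
}
```

The cost mentioned in the text is visible here: `idx` and `fetched` are extra intermediate streams, each as long as the input.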

3 Issues on Streams

3.1 Stream Declarations and Derivations

Figure 3 shows how to declare a stream and how to derive one from an existing stream in both StreamC and Brook. A derived stream is not a copy of the original stream but a reference to it. However, StreamC has the "multiple of 8" problem. In contrast, a StreaMIT program never sees a stream, because stream declaration is entirely implicit.

3.2 Communication Pattern

Connecting multiple kernels results in a communication pattern. In StreamC/KernelC as well as in Brook, different kernels can be connected freely, as shown in Figure 4. In StreaMIT, a kernel is basically a filter: it takes exactly one input stream and generates exactly one output stream. There are three ways to connect these filters:

Pipeline: one kernel follows another

Split/Join: takes one input stream, splits it to feed multiple kernels, and joins their outputs into one stream

Feedback Loop: reverses the positions of the splitter and the joiner
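The first two composition patterns in the list above can be mimicked in C with function pointers. This is a rough sketch, not StreaMIT syntax: the filters, the driver functions, and the round-robin splitting policy are all assumptions made for illustration, and the feedback loop is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* A filter in this sketch: one input element in, one output element out. */
typedef float (*filter_fn)(float);

static float scale2(float x) { return 2.0f * x; }
static float add1(float x)   { return x + 1.0f; }
static float negate(float x) { return -x; }

/* Pipeline: every element passes through the stages in order. */
static void run_pipeline(const filter_fn *stages, size_t nstages,
                         const float *in, size_t n, float *out)
{
    for (size_t i = 0; i < n; i++) {
        float v = in[i];
        for (size_t s = 0; s < nstages; s++)
            v = stages[s](v);
        out[i] = v;
    }
}

/* Split/join: a round-robin splitter (an assumed policy) deals elements
 * to the branches; the joiner puts the results back in stream order. */
static void run_split_join(const filter_fn *branches, size_t nbranches,
                           const float *in, size_t n, float *out)
{
    assert(nbranches > 0);
    for (size_t i = 0; i < n; i++)
        out[i] = branches[i % nbranches](in[i]);
}
```

Since every filter has exactly one input and one output, a composed pipeline or split/join again has one input and one output, so the constructs can be nested.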

Figure 2: Access Across Input Streams in StreaMIT

/* StreamC */
// a stream of 1024 "foo" records
im_stream x = newStreamData(1024);

// every third record from stream x
y = x(0, 1024, im_fixed, im_acc_stride, 3);

// these are "references"
// if you change y, x is changed as well

/* Brook */
typedef stream foo foos;
foos x, y;
streamSetSize(x, 1024);

streamStride(y, x, 1, 3); // y is a "reference"

Figure 3: Stream Declarations and Derivations
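The reference semantics of a derived stream can be sketched in C as a view that stores a pointer into the parent rather than a copy. The `stream_view` type and its helpers are hypothetical, and the sketch assumes the stride derivation means "every stride-th element starting at element 0".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view type: a derived stream that refers to elements of
 * its parent instead of owning a copy of them. */
typedef struct {
    float *base;     /* first element of the parent stream */
    size_t stride;   /* distance between consecutive view elements */
    size_t length;   /* number of elements visible through the view */
} stream_view;

/* Derive a view over every stride-th element of the parent. */
static stream_view stride_view(float *parent, size_t parent_len, size_t stride)
{
    assert(parent != NULL && stride > 0);
    stream_view v = { parent, stride, (parent_len + stride - 1) / stride };
    return v;
}

/* Address of the i-th view element; writes through it reach the parent. */
static float *view_at(stream_view v, size_t i)
{
    assert(i < v.length);
    return &v.base[i * v.stride];
}
```

Writing `*view_at(y, 2) = 60.0f` on a stride-3 view of x changes x[6], which mirrors the comment in Figure 3 that changing y changes x as well.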

4 Brook

4.1 What is the purpose of Brook?

Brook is designed to achieve the following goals:

Machine Independent

More Suitable for Parallel Implementation: "functional" kernels with no retained state; a reduction mechanism that can be converted to a tree and evaluated in logarithmic time

Figure 4: An Example for Communication between Kernels in Brook

Figure 6: Stencil

Figure 7: Stencil of a 3x3 Rectangle

streamStencil(y, x, STREAM_STENCIL_CLAMP, 2, 1, -1, 1, -1);

kernel void neighborAvg(floats2 a, out floats b) {
    b = 0.25*(a[0][1]+a[2][1]+a[1][0]+a[1][2]);
}

The stencil in the code above is defined as a 3-by-3 array; currently only rectangular stencils are supported, so it pulls in the 9 elements centered on an original stream element, shaded in Figure 7. In many cases we need only the 4 elements to the east, west, south, and north of a given element, not those on the diagonals. How to fetch only the necessary data is an open question.

To deal with elements beyond the boundary of a 2-D array, like the shaded element in Figure 6, Brook supports the following three "decorations" for a stencil. (Assume that the 2-D array b is derived from the array a in the figure.)

HALO: the array is a subset of a larger array which contains all of the elements in question

CLAMP: uses a specified value (e.g. 0) for the boundary values

PERIODIC: periodic boundary condition, like a torus or a donut

It is possible to use a variable as an index in the stencil.
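The effect of the boundary decorations on a four-neighbor average can be sketched in C. This is not Brook code: `boundary_mode`, `read_cell`, and `neighbor_avg` are hypothetical names, the grid is stored as a flat row-major array, and only CLAMP and PERIODIC are modeled (HALO would simply mean the caller passes a grid that already includes the surrounding elements).

```c
#include <assert.h>
#include <stddef.h>

typedef enum { BOUNDARY_CLAMP, BOUNDARY_PERIODIC } boundary_mode;

/* Read cell (r, c) of a rows x cols row-major grid. Coordinates outside
 * the grid are resolved according to the boundary mode. */
static float read_cell(const float *grid, int rows, int cols, int r, int c,
                       boundary_mode mode, float clamp_value)
{
    if (r >= 0 && r < rows && c >= 0 && c < cols)
        return grid[r * cols + c];
    if (mode == BOUNDARY_CLAMP)
        return clamp_value;             /* fixed value outside the grid */
    r = ((r % rows) + rows) % rows;     /* PERIODIC: wrap like a torus */
    c = ((c % cols) + cols) % cols;
    return grid[r * cols + c];
}

/* Average of the north, south, west, and east neighbors of every cell. */
static void neighbor_avg(const float *in, float *out, int rows, int cols,
                         boundary_mode mode, float clamp_value)
{
    assert(in != NULL && out != NULL && rows > 0 && cols > 0);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            out[r * cols + c] = 0.25f *
                (read_cell(in, rows, cols, r - 1, c, mode, clamp_value) +
                 read_cell(in, rows, cols, r + 1, c, mode, clamp_value) +
                 read_cell(in, rows, cols, r, c - 1, mode, clamp_value) +
                 read_cell(in, rows, cols, r, c + 1, mode, clamp_value));
}
```

Interior cells give the same result in both modes; only cells on the edge of the grid are affected by the choice of decoration.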

4.3 Reduction

Reduction is another major mechanism for communicating between elements in a kernel. Unlike a stream, which is either read-only or write-only, a reduction variable is both readable and writable. However, using a reduction variable assumes that the computation is associative. Restricting a stream to be either read-only or write-only is critical for making communication between streams easy to discover.
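Why associativity matters can be seen in a tree-shaped reduction, sketched here in C. The function is a hypothetical illustration, not Brook's implementation: it combines pairs of elements in passes, so n elements need about log2(n) passes, and all combinations within one pass are independent of each other.

```c
#include <assert.h>
#include <stddef.h>

typedef float (*combine_fn)(float, float);

static float add(float a, float b)   { return a + b; }
static float maxf2(float a, float b) { return a > b ? a : b; }

/* In-place tree reduction. In each pass, elements `stride` apart are
 * combined pairwise; the pairs within a pass do not depend on each
 * other. The number of passes is stored in *passes if it is non-NULL.
 * The result equals a left-to-right reduction only if op is associative. */
static float tree_reduce(float *v, size_t n, combine_fn op, size_t *passes)
{
    assert(v != NULL && op != NULL && n > 0);
    size_t count = 0;
    for (size_t stride = 1; stride < n; stride *= 2) {
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            v[i] = op(v[i], v[i + stride]);
        count++;
    }
    if (passes != NULL)
        *passes = count;
    return v[0];
}
```

With 8 elements the loop runs 3 passes instead of 7 sequential steps, which is the log-time behavior that a serial accumulator cannot offer.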

4.4 Irregular Structures

Handling irregular structures in Brook is still under development.

4.4.1 An Example

For example, consider the following problem: for each node in a graph, sum the values of all neighboring nodes in each iteration. Every vertex has a different number of neighbors, so the kernel takes a correspondingly different amount of time to execute for each vertex. One way to solve the problem is to generate a stream of neighbor indices from the stream of nodes, then fetch the stream of neighbors from memory and sum them (slide 18). A cleaner approach is to specify explicitly what you want and let the compiler decompose and optimize it (slide 19). Figure 9 shows an example structure of vertices and a possible data structure for handling this problem.
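The first approach (index stream, then gather, then sum) can be sketched in C. The data layout is an assumption and not necessarily the one in Figure 9: the neighbors of node i are stored at positions first[i] to first[i+1]-1 of a flat index array, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Gather step: turn the stream of neighbor indices into a stream of
 * neighbor values with one bulk fetch from memory. */
static void gather_neighbors(const float *value, const size_t *nbr_index,
                             size_t nedges, float *nbr_value)
{
    for (size_t e = 0; e < nedges; e++)
        nbr_value[e] = value[nbr_index[e]];
}

/* Sum step: node i owns the segment first[i] .. first[i+1]-1 of the
 * gathered stream. The segment length varies per node, which is the
 * irregularity discussed in the text. */
static void sum_neighbors(const float *nbr_value, const size_t *first,
                          size_t nnodes, float *sum)
{
    for (size_t i = 0; i < nnodes; i++) {
        assert(first[i] <= first[i + 1]);
        float s = 0.0f;
        for (size_t e = first[i]; e < first[i + 1]; e++)
            s += nbr_value[e];
        sum[i] = s;
    }
}
```

For a 4-node graph with edges 0-1, 0-2, 0-3, and 1-2, the arrays would be first = {0, 3, 5, 7, 8} and nbr_index = {1, 2, 3, 0, 2, 0, 1, 0}.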

5 Summary and Open Questions

5.1 Summary

Communication restriction vs. ease of use.
No input-output streams are allowed, to avoid dependencies/serialization. The only way to communicate between iterations is through reduction variables.
Handling complex structures.

bandwidth requirements. Variable latency does not matter. If a separate cache is connected directly to each cluster, then a miss in any of them will stall all clusters.

Q. What does StreamProduct do?
A. StreamProduct(a, b, c) generates all combinations of elements of streams a and b:

     a     b     c     d
p   a,p   b,p   c,p   d,p
q   a,q   b,q   c,q   d,q
r   a,r   b,r   c,r   d,r

StreamProduct has not been implemented yet. An efficient implementation should not generate the whole product, since it may be too large.

Q. Is the Brook environment the same as that of StreamC/KernelC?
A. No, Brook is implemented on Linux.

Q. Is there a link between Brook and Imagine?
A. No, there is only a converter from Brook to C, which allows Brook programs to run on a standard PC/workstation.
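Returning to the StreamProduct question above: one way to avoid materializing the whole product is to compute each element on demand from its position. The C sketch below is a hypothetical illustration, not the Brook implementation; the element type (single characters) and the ordering (all of stream a for the first element of b, then for the second, and so on) are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One element of the product: a pairing of an element of a with an
 * element of b. */
typedef struct {
    char from_a;
    char from_b;
} pair;

/* The k-th element of the product of a (length na) and b (length nb),
 * computed on demand. No na*nb buffer is ever allocated. */
static pair product_at(const char *a, size_t na, const char *b, size_t nb, size_t k)
{
    assert(a != NULL && b != NULL && na > 0 && k < na * nb);
    pair p = { a[k % na], b[k / na] };
    return p;
}
```

A consumer kernel can then iterate k from 0 to na*nb-1 and request elements as needed, so memory use stays proportional to na + nb rather than na * nb.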
