GPU workshop cheatsheet, Slides of Data Structures and Algorithms

Typology: Slides

2022/2023

Uploaded on 05/11/2023

palumi

GPU workshop cheatsheet
OpenACC guide (PGI flags: -acc -ta=nvidia -Minfo=accel)
Kernels Construct
An accelerator kernels construct surrounds loops to be executed on the accelerator, typically as a sequence of
kernel operations.
C
#pragma acc kernels [clause [[,] clause]…] new-line
{ structured block }
Fortran
!$acc kernels [clause [[,] clause]…]
structured block
!$acc end kernels
Any data clause is allowed.
Other clauses
if( condition )
When the condition is nonzero (C) or .TRUE. (Fortran), the kernels region executes on the accelerator; otherwise, it
executes on the host.
async( expression )
The kernels region executes asynchronously with the host.
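As a rough illustration of the C form above, the sketch below wraps a SAXPY loop in a kernels region with an if clause. The function name saxpy, the 1000-element threshold, and the array-section data clauses are assumptions chosen for this example and do not come from the cheatsheet. When the file is built without OpenACC support, the pragma is ignored and the loop simply runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y = a*x + y for n elements. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* if(n > 1000): offload only when the loop is large enough to be
       worth it (the threshold is an arbitrary assumption); otherwise
       the region executes on the host. */
    #pragma acc kernels if(n > 1000) copyin(x[0:n]) copy(y[0:n])
    {
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

Calling saxpy(4, 2.0f, x, y) with x = {1, 2, 3, 4} and y = {10, 20, 30, 40} leaves y = {12, 24, 36, 48}; with only four elements the if clause keeps the work on the host.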
Data Construct
An accelerator data construct defines a region of the program within which data is accessible by the accelerator.
C
#pragma acc data [clause[[,] clause]…] new-line
{ structured block }
Fortran
!$acc data [clause[[,] clause]…]
structured block
!$acc end data
Any data clause is allowed.
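A minimal sketch of why a data region is useful, assuming a hypothetical function scale_then_shift: the array is transferred once for the whole region, and the two kernels regions inside it reuse the same accelerator copy instead of each moving the data again. The names and the two-step computation are illustrative only.

```c
#include <assert.h>

/* Hypothetical example: a[i] = a[i] * s + b, done as two kernels regions. */
void scale_then_shift(int n, float *restrict a, float s, float b)
{
    assert(n >= 0);

    /* a is copied to the accelerator once on entry to the data region
       and copied back once on exit; both kernels regions below work on
       that same accelerator copy. */
    #pragma acc data copy(a[0:n])
    {
        #pragma acc kernels
        {
            for (int i = 0; i < n; ++i) {
                a[i] *= s;
            }
        }

        #pragma acc kernels
        {
            for (int i = 0; i < n; ++i) {
                a[i] += b;
            }
        }
    }
}
```

For example, scale_then_shift(3, a, 2.0f, 1.0f) with a = {1, 2, 3} leaves a = {3, 5, 7}. Without an OpenACC compiler the pragmas are ignored and the loops run on the host with the same result.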
Data Clauses
The description applies to the clauses used on parallel constructs, kernels constructs, data constructs, declare
constructs, and update directives.
copy( list )
Allocates the data in list on the accelerator and copies the data from the host to the accelerator when entering the
region, and copies the data from the accelerator to the host when exiting the region.
copyin( list )
Allocates the data in list on the accelerator and copies the data from the host to the accelerator when entering the
region.
copyout( list )
Allocates the data in list on the accelerator and copies the data from the accelerator to the host when exiting the
region.
create( list )
Allocates the data in list on the accelerator, but does not copy data between the host and device.
present( list )
The data in list must be already present on the accelerator, from some containing data region; that accelerator copy
is found and used.
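The clauses above can be combined in one data region. The following sketch assumes two hypothetical functions, square_into and square_plus_one, and uses copyin for a read-only input, copyout for a write-only result, create for scratch space, and present inside a helper that relies on the caller's data region. It is an illustration under those assumptions, not a recommended pattern for every code.

```c
#include <assert.h>

/* Hypothetical helper: expects in and tmp to be on the accelerator
   already, placed there by a containing data region. */
static void square_into(int n, const float *restrict in, float *restrict tmp)
{
    #pragma acc kernels present(in[0:n], tmp[0:n])
    {
        for (int i = 0; i < n; ++i) {
            tmp[i] = in[i] * in[i];
        }
    }
}

/* out[i] = in[i]^2 + 1, with tmp used as scratch space. */
void square_plus_one(int n, const float *restrict in,
                     float *restrict out, float *restrict tmp)
{
    assert(n >= 0);

    /* copyin:  in is only read, so it is never copied back to the host.
       copyout: out is only written, so it is not copied to the device.
       create:  tmp is allocated on the device with no transfers, so the
                host copy of tmp should not be relied on afterwards. */
    #pragma acc data copyin(in[0:n]) copyout(out[0:n]) create(tmp[0:n])
    {
        square_into(n, in, tmp);

        #pragma acc kernels present(tmp[0:n], out[0:n])
        {
            for (int i = 0; i < n; ++i) {
                out[i] = tmp[i] + 1.0f;
            }
        }
    }
}
```

With in = {1, 2, 3}, the call square_plus_one(3, in, out, tmp) produces out = {2, 5, 10}.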
CUDA

Built-in kernel variables

• gridDim.[x,y,z] -> Three-dimensional vector containing the dimensions of the grid. This is a constant that is set at kernel launch time. If not set explicitly, each dimension defaults to 1.
• blockIdx.[x,y,z] -> Three-dimensional vector containing the block index within the grid. This is a dynamic value that depends on which block calls it.
• blockDim.[x,y,z] -> Three-dimensional vector containing the dimensions of the thread block. This is set at kernel launch time. If not set explicitly, each dimension defaults to 1.
• threadIdx.[x,y,z] -> Three-dimensional vector specifying the thread index within the thread block. This is a dynamic value that depends on which thread calls it.
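These variables are most often combined into a global index, conventionally blockIdx.x * blockDim.x + threadIdx.x in one dimension. The sketch below is plain host-side C that only emulates that arithmetic with ordinary loops; it is not CUDA code, and the function names global_index, blocks_needed, and emulate_launch are hypothetical. It also shows the usual ceiling division for choosing the grid size and the i < n guard for the threads that fall past the end of the array.

```c
#include <assert.h>

/* Global 1D index of a thread: block_idx * block_dim + thread_idx,
   mirroring blockIdx.x * blockDim.x + threadIdx.x. */
int global_index(int block_idx, int block_dim, int thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Smallest grid size such that grid_dim * block_dim >= n. */
int blocks_needed(int n, int block_dim)
{
    assert(n >= 0 && block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

/* Host-side emulation of a 1D launch: every (block, thread) pair writes
   its global index into out, guarded by i < n. Returns the number of
   threads that were outside the array and therefore did nothing. */
int emulate_launch(int n, int block_dim, int *out)
{
    int grid_dim = blocks_needed(n, block_dim);
    int idle = 0;

    for (int b = 0; b < grid_dim; ++b) {
        for (int t = 0; t < block_dim; ++t) {
            int i = global_index(b, block_dim, t);
            if (i < n) {
                out[i] = i;
            } else {
                ++idle;
            }
        }
    }
    return idle;
}
```

For n = 1000 and a block size of 256, blocks_needed returns 4, so 1024 threads are emulated and 24 of them are rejected by the guard.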

Important Functions

• Kernel Launch
  o Kernel_name<<< gridsize, blocksize >>>(arg1,arg2,…);
• Memory Management
  o cudaError_t cudaMalloc( void **devPtr, size_t size );
    ▪ Example: cudaMalloc( (void **) &d_c, numbytes );
  o cudaError_t cudaFree( void *devPtr );
    ▪ Example: cudaFree( d_c );
  o cudaError_t cudaMemcpy( void *dst, const void *src, size_t size, enum cudaMemcpyKind kind );
    ▪ enum cudaMemcpyKind
      - cudaMemcpyHostToDevice
      - cudaMemcpyDeviceToHost
      - cudaMemcpyDeviceToDevice
    ▪ Example: cudaMemcpy( d_c, c, numbytes, cudaMemcpyHostToDevice );
• Error Checking
  o cudaError_t cudaGetLastError(void);
  o const char *cudaGetErrorString( cudaError_t code );
  o printf("%s\n", cudaGetErrorString( cudaGetLastError() ) );