Parallel Computation: HDF-5 and CACTUS for Scientific Data Libraries and Problem Solving, Study notes of Computer Science

An overview of parallel computation, focusing on scientific data libraries and problem solving environments using HDF-5 and Cactus. HDF-5 is a hierarchical data format that supports very large files, a parallel I/O interface, and Fortran, C, and Java bindings. Cactus is a parallel computing environment for enabling large-scale distributed computing in the Grid, with features like parameter parsing, scheduling, and a make system. The notes also cover the architecture of Cactus, writing a sequential HDF-5 file, and using Cactus in your code.


Uploaded on 08/17/2009

COSC 4397 – Parallel Computation
Edgar Gabriel
COSC 4397
Parallel Computation
Scientific Data Libraries and
Problem Solving Environments
Edgar Gabriel
Spring 2006

Motivation

MPI I/O is good
– It knows about data types (=> data conversion)
– It can optimize various access patterns in applications

MPI I/O is bad
– It does not store any information about the data type
  • A file written as MPI_INT can be read as MPI_DOUBLE in another application
  • No information is stored on whether it is a two-dimensional data array or anything else
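
The missing type information can be illustrated without MPI. The following is a minimal sketch in plain C, assuming a raw byte buffer stands in for a file without metadata; the helper names write_ints, read_as_doubles and read_as_ints are hypothetical and not part of MPI or HDF-5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write n ints as raw bytes into buf (stand-in for a file that stores
 * no metadata). Returns the number of bytes written. */
size_t write_ints(unsigned char *buf, const int *values, size_t n)
{
    assert(buf != NULL && values != NULL);
    memcpy(buf, values, n * sizeof(int));
    return n * sizeof(int);
}

/* Reinterpret the same bytes as doubles. Nothing in the byte stream can
 * reject this: the reader simply gets as many doubles as fit. */
size_t read_as_doubles(const unsigned char *buf, size_t nbytes,
                       double *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);
    size_t n = nbytes / sizeof(double);
    if (n > max_out)
        n = max_out;
    memcpy(out, buf, n * sizeof(double));
    return n;
}

/* Reading with the type the writer used recovers the original values. */
size_t read_as_ints(const unsigned char *buf, size_t nbytes,
                    int *out, size_t max_out)
{
    assert(buf != NULL && out != NULL);
    size_t n = nbytes / sizeof(int);
    if (n > max_out)
        n = max_out;
    memcpy(out, buf, n * sizeof(int));
    return n;
}
```

Both readers succeed on the same bytes; only the one that happens to know the writer's type gets meaningful values. A self-describing format avoids this by storing the datatype in the file.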


HDF-5

Hierarchical Data Format (HDF), developed since 1988 at NCSA (University of Illinois)
– http://hdf.ncsa.uiuc.edu/HDF5/

Has gone through a long history of changes; the most recent version, HDF-5, has been available since 1999

HDF-5 supports
– Very large files
– Parallel I/O interface
– Fortran, C, Java bindings


HDF-5 dataset

Multi-dimensional array of basic data elements

A dataset consists of
– Header + data

Header consists of
– Name
– Datatype: basic (e.g. H5T_NATIVE_FLOAT) or compound datatypes
– Dataspace: defines size and shape of a multidimensional array. Dimensions can be fixed or unlimited.
– Storage layout: defines how multidimensional arrays are stored in file. Can be contiguous or chunked.
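
The header fields listed above can be pictured as a small C structure. This is only a sketch for illustration: struct dataset_header, its field names and the helper functions are hypothetical and do not reflect how the HDF-5 library stores headers internally.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RANK 4   /* arbitrary limit chosen for this sketch */

enum storage_layout { LAYOUT_CONTIGUOUS, LAYOUT_CHUNKED };

/* Hypothetical model of a dataset header:
 * name, datatype, dataspace and storage layout. */
struct dataset_header {
    const char *name;             /* dataset name */
    size_t elem_size;             /* size of one element of the datatype in bytes */
    int rank;                     /* number of dimensions of the dataspace */
    size_t dims[MAX_RANK];        /* current extent of each dimension */
    enum storage_layout layout;   /* contiguous or chunked */
};

/* Number of elements described by the dataspace. */
size_t dataset_num_elements(const struct dataset_header *h)
{
    assert(h != NULL && h->rank >= 1 && h->rank <= MAX_RANK);
    size_t n = 1;
    for (int i = 0; i < h->rank; i++)
        n *= h->dims[i];
    return n;
}

/* Size of the data part that follows the header, in bytes. */
size_t dataset_data_bytes(const struct dataset_header *h)
{
    return dataset_num_elements(h) * h->elem_size;
}
```

For an 8 x 8 dataset of float elements, dataset_num_elements yields 64 and dataset_data_bytes yields 64 * sizeof(float).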


Storage layout: contiguous vs. chunked

contiguous

     1   9  17  25  33  41  49  57
     2  10  18  26  34  42  50  58
     3  11  19  27  35  43  51  59
     4  12  20  28  36  44  52  60
     5  13  21  29  37  45  53  61
     6  14  22  30  38  46  54  62
     7  15  23  31  39  47  55  63
     8  16  24  32  40  48  56  64

chunked

     1   5   9  13    33  37  41  45
     2   6  10  14    34  38  42  46
     3   7  11  15    35  39  43  47
     4   8  12  16    36  40  44  48

    17  21  25  29    49  53  57  61
    18  22  26  30    50  54  58  62
    19  23  27  31    51  55  59  63
    20  24  28  32    52  56  60  64

Advantages and disadvantages of chunking
– Accessing rows and columns requires the same number of accesses
– Data can be extended in all dimensions
– Efficient storage of sparse arrays
– Can improve caching
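
The claim about rows and columns can be checked with a small calculation. The sketch below assumes an 8 x 8 array stored column by column, split into 4 x 4 chunks that are themselves stored one after another; all function and macro names are hypothetical, and the code models storage positions only, it does not call HDF-5.

```c
#include <assert.h>

#define ARRAY_N 8   /* array is ARRAY_N x ARRAY_N */
#define CHUNK_N 4   /* chunks are CHUNK_N x CHUNK_N */

/* 1-based storage position of element (row, col) in a contiguous
 * layout that stores the array column by column. */
int contiguous_pos(int row, int col)
{
    assert(row >= 0 && row < ARRAY_N && col >= 0 && col < ARRAY_N);
    return col * ARRAY_N + row + 1;
}

/* Index of the chunk holding (row, col); chunks are numbered going
 * down the first column of chunks, then down the next one. */
int chunk_of(int row, int col)
{
    assert(row >= 0 && row < ARRAY_N && col >= 0 && col < ARRAY_N);
    return (col / CHUNK_N) * (ARRAY_N / CHUNK_N) + (row / CHUNK_N);
}

/* 1-based storage position in the chunked layout; elements inside a
 * chunk are again stored column by column. */
int chunked_pos(int row, int col)
{
    int within = (col % CHUNK_N) * CHUNK_N + (row % CHUNK_N);
    return chunk_of(row, col) * CHUNK_N * CHUNK_N + within + 1;
}

/* Contiguous layout: number of separate runs of consecutive positions
 * needed to read one full row (is_row != 0) or one full column. */
int contiguous_runs(int index, int is_row)
{
    int runs = 0, prev = -1;
    for (int k = 0; k < ARRAY_N; k++) {
        int pos = is_row ? contiguous_pos(index, k) : contiguous_pos(k, index);
        if (pos != prev + 1)
            runs++;
        prev = pos;
    }
    return runs;
}

/* Chunked layout: number of distinct chunks touched by one full row
 * (is_row != 0) or one full column. */
int chunks_touched(int index, int is_row)
{
    int seen[(ARRAY_N / CHUNK_N) * (ARRAY_N / CHUNK_N)] = {0};
    int count = 0;
    for (int k = 0; k < ARRAY_N; k++) {
        int c = is_row ? chunk_of(index, k) : chunk_of(k, index);
        if (!seen[c]) {
            seen[c] = 1;
            count++;
        }
    }
    return count;
}
```

Under these assumptions a column of the contiguous layout is one run while a row needs eight, whereas in the chunked layout both a row and a column touch two chunks.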


HDF-5 groups

An HDF-5 group is a collection of datasets
– Comparable to a directory in a UNIX-like file system

HDF-5 naming convention
– All API functions start with H5
– The next character identifies the category of functions
  • H5F: functions handling files
  • H5G: functions handling groups
  • H5D: functions handling datasets
  • H5S: functions handling dataspaces
  • H5A: functions handling attributes
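
The naming convention can be applied mechanically. The following sketch maps a function name to its category by looking at the third character; h5_category is a hypothetical helper written for this illustration, and it only knows the five prefixes listed above.

```c
#include <assert.h>
#include <string.h>

/* Return the category of an HDF-5 API function based on its prefix.
 * Only the five categories named on the slide are recognized. */
const char *h5_category(const char *fn)
{
    assert(fn != NULL);
    if (strncmp(fn, "H5", 2) != 0 || fn[2] == '\0')
        return "not an HDF-5 API name";
    switch (fn[2]) {
    case 'F': return "files";
    case 'G': return "groups";
    case 'D': return "datasets";
    case 'S': return "dataspaces";
    case 'A': return "attributes";
    default:  return "other";
    }
}
```

For example, h5_category("H5Dread") returns "datasets" and h5_category("H5Fopen") returns "files".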


Reading an HDF-5 file – structure of the file known

– Open the file
– Open the group
– Open each dataset in the group
– Look up dimensions
– Read data
– Read attributes
– Read comments
– Close all objects

    h5file = H5Fopen(…);
    group = H5Gopen(h5file, "tempseries");
    tset = H5Dopen(group, "temperature");
    tspace = H5Dget_space(tset);
    H5Sget_simple_extent_dims(tspace, dims, …);
    H5Dread(tset, H5T_IEEE_F32BE, ttype, tspace, …, buffer);
    tattr = H5Aopen_name(tset, "units");
    attrtype = H5Aget_type(tattr);
    H5Aread(tattr, attrtype, attr);


Compound Datatypes

Abstraction for user structures
– Has a fixed size
– Each member has its own name, datatype, reference, and byte offset

    h5type = H5Tcreate(H5T_class_t class, size_t size);
    H5Tinsert(h5type, const char *name, size_t offset, hid_t field_id);
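
The per-member information of a compound datatype (name, byte offset, size) can be collected in plain C with offsetof. The sketch below is an illustration only: struct particle, struct member_info and describe_particle are hypothetical names, and no HDF-5 call is made.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user structure to be described as a compound datatype. */
struct particle {
    int    id;
    double position[3];
    float  mass;
};

/* The information recorded for each member of the structure. */
struct member_info {
    const char *name;   /* member name */
    size_t offset;      /* byte offset inside the structure */
    size_t size;        /* size of the member in bytes */
};

/* Fill table with the members of struct particle.
 * Returns the number of members written. */
size_t describe_particle(struct member_info *table, size_t max)
{
    assert(table != NULL && max >= 3);
    table[0] = (struct member_info){ "id",
        offsetof(struct particle, id), sizeof(int) };
    table[1] = (struct member_info){ "position",
        offsetof(struct particle, position), sizeof(double[3]) };
    table[2] = (struct member_info){ "mass",
        offsetof(struct particle, mass), sizeof(float) };
    return 3;
}
```

In an HDF-5 program, sizeof(struct particle) would be the size passed to H5Tcreate, and each name and offset pair would go to one H5Tinsert call together with the member's datatype.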


Example using hyperslabs

    /* Define hyperslab in the dataset. */
    offset[0] = …;
    offset[1] = …;
    count[0] = NX_SUB;
    count[1] = NY_SUB;
    status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, offset, NULL,
                                 count, NULL);

    /* Read data from hyperslab in file into hyperslab in memory */
    status = H5Dread(dataset, H5T_NATIVE_INT, memspace, dataspace,
                     H5P_DEFAULT, data_out);

[Figure: hyperslab inside the dataset, positioned by offset[0] and offset[1], with extent count[0] by count[1]]

Examples taken from HDF-5 webpage
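
What an offset/count selection picks out can be shown on an ordinary array. The following sketch copies a count[0] x count[1] block starting at (offset[0], offset[1]) out of a 2-D array; it assumes row-major storage as in C, and copy_hyperslab_2d is a hypothetical helper, not an HDF-5 function.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the block of count[0] x count[1] elements that starts at
 * (offset[0], offset[1]) of a row-major dims[0] x dims[1] array into
 * the dense buffer dst. Returns the number of elements copied. */
size_t copy_hyperslab_2d(const int *src, const size_t dims[2],
                         const size_t offset[2], const size_t count[2],
                         int *dst)
{
    assert(src != NULL && dst != NULL);
    assert(offset[0] + count[0] <= dims[0]);
    assert(offset[1] + count[1] <= dims[1]);
    size_t n = 0;
    for (size_t i = 0; i < count[0]; i++)
        for (size_t j = 0; j < count[1]; j++)
            dst[n++] = src[(offset[0] + i) * dims[1] + (offset[1] + j)];
    return n;
}
```

With a 5 x 6 source array, offset (1, 2) and count (3, 4), the call copies 12 elements, beginning with the element in row 1, column 2.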


More complex example using hyperslabs

    count[0] = …;
    count[1] = dimsmem[1];
    block[0] = dimsfile[0];
    block[1] = …;
    offset[0] = …;
    offset[1] = mpi_rank;
    stride[0] = …;
    stride[1] = …;

[Figure: memory buffers of Process 0 and Process 1 mapped into the file; labels: dimsmem[0], dimsmem[1], block[0], offset[0], offset[1]]

For dimension x: you generate count[x] entries of block[x] elements, starting from offset[x]. The distance between the starts of consecutive entries is stride[x].

Examples taken from HDF-5 webpage
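
The rule for one dimension (count entries of block elements, starting at offset, with entry starts stride apart) can be written out as a loop. This is a sketch in plain C; hyperslab_indices_1d is a hypothetical helper and only lists the selected indices, it does not select anything in an HDF-5 dataspace.

```c
#include <assert.h>
#include <stddef.h>

/* List the indices selected along one dimension: count blocks of block
 * elements each, the first block starting at offset, and the starts of
 * consecutive blocks stride apart. At most max_out indices are written.
 * Returns the number of indices written. */
size_t hyperslab_indices_1d(size_t offset, size_t stride, size_t count,
                            size_t block, size_t *out, size_t max_out)
{
    assert(out != NULL);
    assert(block >= 1 && stride >= block);   /* blocks must not overlap */
    size_t n = 0;
    for (size_t c = 0; c < count; c++) {
        for (size_t b = 0; b < block; b++) {
            if (n == max_out)
                return n;
            out[n++] = offset + c * stride + b;
        }
    }
    return n;
}
```

For offset 1, stride 4, count 2 and block 2 the selected indices are 1, 2, 5, 6. With block 1 and a stride equal to the number of processes, setting offset to the rank interleaves the processes element by element.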


Parallel data access in HDF-5

Application has to define a set of interleaved file dataspaces on the processes that will access the file
– Similar to setting the file view in MPI I/O
– Usually based on defining hyperslabs

Data transfer properties have to be set

    /* example for using file properties */
    fileprops = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(fileprops, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(tset, …, fileprops, …);
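
A set of interleaved file dataspaces must cover the file without overlap. The sketch below simulates this sequentially for a column-wise round-robin distribution (offset = rank, stride = number of processes, block = 1); it is an assumption-laden illustration in plain C without MPI or HDF-5, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Mark the columns that process rank selects when ncols columns are
 * dealt out round-robin to nprocs processes. writes[c] is incremented
 * for every selected column c. Returns the number of selected columns. */
size_t select_interleaved_columns(int rank, int nprocs, size_t ncols,
                                  int *writes)
{
    assert(writes != NULL);
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    size_t n = 0;
    for (size_t c = (size_t)rank; c < ncols; c += (size_t)nprocs) {
        writes[c]++;
        n++;
    }
    return n;
}

/* Return 1 if the selections of all processes together cover every
 * column exactly once, 0 otherwise. writes is used as scratch space. */
int covers_exactly_once(int nprocs, size_t ncols, int *writes)
{
    for (size_t c = 0; c < ncols; c++)
        writes[c] = 0;
    for (int r = 0; r < nprocs; r++)
        select_interleaved_columns(r, nprocs, ncols, writes);
    for (size_t c = 0; c < ncols; c++)
        if (writes[c] != 1)
            return 0;
    return 1;
}
```

In a real parallel program each process would compute only its own selection and pass it to the library as a hyperslab; the check above merely confirms that such a partition is disjoint and complete.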


Problem Solving Environments


More detailed...

Development
– Tools to help problem specification, design, analysis and verification
– Rapid prototyping
– Dependence on a specific domain
– Expert assistance

Execution
– Support online/offline observation
– In the Grid: activities performed on multiple heterogeneous components
  • selection, testing, configuration, activation, monitoring, data management


Overview of some existing PSEs

Name      Application domain    Base technology    Web-site
Cactus    Astrophysics          -                  www.cactuscode.org
TENT      Aerospace             CORBA/Java         www.dlr.de/tent
JACO3     Aerospace             CORBA/Java         www.irisa.fr/paris/nfrancais/jaco3.htm
COVISE    Visualization         -                  www.hlrs.de/organization/vis/covise
SCIRun    Medical simulation    -                  software.sci.utah.edu/scirun.html