COSC 4397 – Parallel Computation, Edgar Gabriel
COSC 4397
Parallel Computation
Scientific Data Libraries and
Problem Solving Environments
Edgar Gabriel
Spring 2006
Motivation
MPI I/O is good
– It knows about data types (=> data conversion)
– It can optimize various access patterns in applications
MPI I/O is bad
– It does not store any information about the data type
• A file written as MPI_INT can be read as MPI_DOUBLE in another application
• No information is stored on whether the data is a two-dimensional array or anything else
HDF-5
Hierarchical Data Format (HDF), developed since 1988 at NCSA (University of Illinois)
– http://hdf.ncsa.uiuc.edu/HDF5/
Has gone through a long history of changes; the most recent version, HDF-5, has been available since 1999
HDF-5 supports
– Very large files
– Parallel I/O interface
– Fortran, C, Java bindings
HDF-5 dataset
Multi-dimensional array of basic data elements
A dataset consists of
– Header + data
Header consists of
– Name
– Datatype: basic (e.g. H5T_NATIVE_FLOAT) or compound datatypes
– Dataspace: defines size and shape of a multidimensional array. Dimensions can be fixed or unlimited.
– Storage layout: defines how multidimensional arrays are stored in the file. Can be contiguous or chunked.
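To make the header fields listed above concrete, here is a minimal sketch in plain C. It is a toy model only: the names `toy_header`, `toy_type`, `toy_layout` and the helper functions are hypothetical, and the layout has nothing to do with HDF-5's actual on-disk format. It only illustrates why recording name, datatype, dataspace and storage layout lets a reader compute the data size and detect a type mismatch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a dataset header; NOT HDF-5's real on-disk format. */
enum toy_type   { TOY_INT32, TOY_FLOAT32, TOY_FLOAT64 };  /* hypothetical type codes */
enum toy_layout { TOY_CONTIGUOUS, TOY_CHUNKED };

#define TOY_MAX_RANK 4

struct toy_header {
    const char     *name;                /* dataset name */
    enum toy_type   type;                /* datatype of one element */
    int             rank;                /* dataspace: number of dimensions */
    size_t          dims[TOY_MAX_RANK];  /* dataspace: extent of each dimension */
    enum toy_layout layout;              /* storage layout */
};

/* Size in bytes of one element of the given type. */
size_t toy_type_size(enum toy_type t)
{
    switch (t) {
    case TOY_INT32:   return 4;
    case TOY_FLOAT32: return 4;
    case TOY_FLOAT64: return 8;
    }
    return 0;
}

/* Total number of elements described by the dataspace. */
size_t toy_num_elements(const struct toy_header *h)
{
    assert(h->rank >= 1 && h->rank <= TOY_MAX_RANK);
    size_t n = 1;
    for (int i = 0; i < h->rank; i++)
        n *= h->dims[i];
    return n;
}

/* Number of data bytes that follow the header. */
size_t toy_data_bytes(const struct toy_header *h)
{
    return toy_num_elements(h) * toy_type_size(h->type);
}

/* Because the type is recorded, a reader can notice a mismatch. */
int toy_type_matches(const struct toy_header *h, enum toy_type requested)
{
    return h->type == requested;
}
```

For an 8 x 8 dataset of 4-byte floats, `toy_data_bytes` yields 256, and a request to read it as `TOY_FLOAT64` is detectable through `toy_type_matches`.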
Storage layout: contiguous vs. chunked
contiguous (numbers give the storage order of each element)
1  9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
8 16 24 32 40 48 56 64
chunked (four 4 x 4 chunks)
 1  5  9 13 33 37 41 45
 2  6 10 14 34 38 42 46
 3  7 11 15 35 39 43 47
 4  8 12 16 36 40 44 48
17 21 25 29 49 53 57 61
18 22 26 30 50 54 58 62
19 23 27 31 51 55 59 63
20 24 28 32 52 56 60 64
Advantages and disadvantages of chunking
Accessing rows and columns requires the same number of accesses
Data can be extended into all dimensions
Efficient storage of sparse arrays
Can improve caching
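The two numbering schemes in the figure can be reproduced with a small sketch in plain C. It assumes the 8 x 8 array and 4 x 4 chunks of the figure and follows the figure's column-by-column numbering; the function names are hypothetical and no HDF-5 call is involved. Counting the chunks touched by one row and by one column illustrates the first advantage listed above.

```c
#include <assert.h>

#define N     8   /* the array is N x N, as in the figure */
#define CHUNK 4   /* chunks are CHUNK x CHUNK */

/* 1-based storage position of element (row, col) in the contiguous
 * layout, numbered column by column as in the figure. */
int pos_contiguous(int row, int col)
{
    assert(row >= 0 && row < N && col >= 0 && col < N);
    return col * N + row + 1;
}

/* Index of the chunk holding (row, col); chunks are numbered
 * column by column as well. */
int chunk_of(int row, int col)
{
    assert(row >= 0 && row < N && col >= 0 && col < N);
    return (col / CHUNK) * (N / CHUNK) + row / CHUNK;
}

/* 1-based storage position of element (row, col) in the chunked layout. */
int pos_chunked(int row, int col)
{
    int within = (col % CHUNK) * CHUNK + row % CHUNK;  /* position inside the chunk */
    return chunk_of(row, col) * CHUNK * CHUNK + within + 1;
}

/* Number of distinct chunks that must be read to obtain one full row. */
int chunks_touched_by_row(int row)
{
    int count = 0, last = -1;
    for (int col = 0; col < N; col++) {
        int id = chunk_of(row, col);
        if (id != last) { count++; last = id; }
    }
    return count;
}

/* Number of distinct chunks that must be read to obtain one full column. */
int chunks_touched_by_col(int col)
{
    int count = 0, last = -1;
    for (int row = 0; row < N; row++) {
        int id = chunk_of(row, col);
        if (id != last) { count++; last = id; }
    }
    return count;
}
```

With these sizes a full row and a full column each touch two chunks, whereas in the contiguous layout a column is one consecutive run and a row is spread over eight separate locations.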
HDF-5 groups
An HDF-5 group is a collection of datasets
– Comparable to a directory in a UNIX-like file system
HDF-5 naming convention
– All API functions start with H5
– The next character identifies the category of functions
• H5F: functions handling files
• H5G: functions handling groups
• H5D: functions handling datasets
• H5S: functions handling dataspaces
• H5A: functions handling attributes
Reading an HDF-5 file – structure of the file known
Open the file
Open the group
Open each dataset in the group
Look up dimensions
Read data
Read attributes
Read comments
Close all objects
    h5file = H5Fopen(…);
    group = H5Gopen(h5file, "tempseries");
    tset = H5Dopen(group, "temperature");
    tspace = H5Dget_space(tset);
    H5Sget_simple_extent_dims(tspace, dims, …);
    H5Dread(tset, H5T_IEEE_F32BE, ttype, tspace, …, buffer);
    tattr = H5Aopen_name(tset, "units");
    attrtype = H5Aget_type(tattr);
    H5Aread(tattr, attrtype, attr);
Compound Datatypes
Abstraction for user structures
– Has a fixed size
– Each member has its own name, datatype, reference, and byte offset
    h5type = H5Tcreate(H5T_class_t class, size_t size);
    H5Tinsert(h5type, const char *name, size_t offset, hid_t field_id);
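The size and per-member byte offsets that a compound datatype needs can be taken directly from a C structure. The sketch below uses only standard C (`sizeof`, `offsetof`); the structure `sample`, the table `sample_members` and the helper `members_fit` are hypothetical names chosen for illustration. In an HDF-5 program, the structure size would be the size passed to H5Tcreate and each name and offset would be passed to one H5Tinsert call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user structure to be described as a compound datatype. */
struct sample {
    int    id;
    double temperature;
    float  pressure;
};

/* One entry per member: the information a compound datatype records. */
struct member_desc {
    const char *name;
    size_t      offset;  /* byte offset inside struct sample */
    size_t      size;    /* size of the member in bytes */
};

static const struct member_desc sample_members[] = {
    { "id",          offsetof(struct sample, id),          sizeof(int)    },
    { "temperature", offsetof(struct sample, temperature), sizeof(double) },
    { "pressure",    offsetof(struct sample, pressure),    sizeof(float)  },
};

#define NUM_MEMBERS (sizeof sample_members / sizeof sample_members[0])

/* Returns 1 if the members do not overlap and all fit into total_size
 * bytes, 0 otherwise. */
int members_fit(size_t total_size)
{
    assert(total_size > 0);
    size_t end_of_previous = 0;
    for (size_t i = 0; i < NUM_MEMBERS; i++) {
        if (sample_members[i].offset < end_of_previous)
            return 0;
        end_of_previous = sample_members[i].offset + sample_members[i].size;
    }
    return end_of_previous <= total_size;
}
```

Using `offsetof` rather than hand-computed offsets matters because the compiler may insert padding between members, so the offset of `temperature` is not necessarily `sizeof(int)`.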
Example using hyperslabs
Define hyperslab in the dataset.
    offset[0] = …;
    offset[1] = …;
    count[0] = NX_SUB;
    count[1] = NY_SUB;
    status = H5Sselect_hyperslab(dataspace, H5S_SELECT_SET, offset, NULL, count, NULL);

    /* Read data from hyperslab in file into hyperslab in memory */
    status = H5Dread(dataset, H5T_NATIVE_INT, memspace, dataspace, H5P_DEFAULT, data_out);
[Figure: hyperslab selected within the dataset; labels: offset[0], offset[1], count[0], count[1]]
Examples taken from HDF-5 webpage
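What a selection with only offset and count picks out can be sketched without the library. The plain C function below copies a count[0] x count[1] region starting at (offset[0], offset[1]) from a row-major array; the function name `read_hyperslab` and the sizes NX, NY, NX_SUB, NY_SUB are assumptions made for this sketch, not values from the slide.

```c
#include <assert.h>
#include <stddef.h>

#define NX     5   /* assumed dataset size: NX rows ... */
#define NY     6   /* ... by NY columns */
#define NX_SUB 3   /* assumed hyperslab size: NX_SUB rows ... */
#define NY_SUB 4   /* ... by NY_SUB columns */

/* Copies the region of count[0] x count[1] elements starting at
 * (offset[0], offset[1]) from 'file' (NX x NY, row-major) into 'out'
 * (count[0] x count[1], row-major).
 * Returns 0 on success, -1 if the region does not fit into the dataset. */
int read_hyperslab(const int *file, const int offset[2],
                   const int count[2], int *out)
{
    assert(file != NULL && out != NULL);
    if (offset[0] < 0 || offset[1] < 0 || count[0] < 0 || count[1] < 0 ||
        offset[0] + count[0] > NX || offset[1] + count[1] > NY)
        return -1;
    for (int i = 0; i < count[0]; i++)
        for (int j = 0; j < count[1]; j++)
            out[i * count[1] + j] =
                file[(offset[0] + i) * NY + (offset[1] + j)];
    return 0;
}
```

Calling it with offset = {1, 2} and count = {NX_SUB, NY_SUB} copies rows 1 to 3 and columns 2 to 5 of the dataset into a 3 x 4 buffer.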
More complex example using hyperslabs
    count[0] = …;
    count[1] = dimsmem[1];
    block[0] = dimsfile[0];
    block[1] = …;
    offset[0] = …;
    offset[1] = mpi_rank;
    stride[0] = …;
    stride[1] = …;
For dimension x: you generate count[x] blocks of block[x] elements each, starting from offset[x]. The distance between the starts of consecutive blocks is stride[x].
[Figure: memory buffers of Process 0 and Process 1 mapped into the file; labels: dimsmem[0], dimsmem[1], block[0], offset[0], offset[1] on rank …]
Examples taken from HDF-5 webpage
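The rule for one dimension can be written out as a short plain C sketch that lists the selected indices. The function name `select_1d` is hypothetical and the code does not call HDF-5; it also assumes stride >= block so that blocks do not overlap.

```c
#include <assert.h>

/* Writes the indices selected along one dimension by
 * (offset, stride, count, block) into 'out', which has room for 'max'
 * entries. Returns the number of indices written, or -1 if 'out' is
 * too small. Assumes stride >= block (blocks do not overlap). */
int select_1d(int offset, int stride, int count, int block,
              int *out, int max)
{
    assert(offset >= 0 && count >= 0 && block >= 1 && stride >= block);
    if (count * block > max)
        return -1;
    int n = 0;
    for (int c = 0; c < count; c++)       /* one iteration per block */
        for (int b = 0; b < block; b++)   /* elements inside the block */
            out[n++] = offset + c * stride + b;
    return n;
}
```

With offset 1, stride 4, count 2 and block 2 the selection is 1, 2, 5, 6. With stride 2, block 1 and the rank as offset, two processes select the even and the odd indices respectively, which is the interleaved pattern of the example.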
Parallel data access in HDF-5
Application has to define a set of interleaved file dataspaces on the processes that will access the file
– Similar technique to setting the file view in MPI I/O
– Usually based on defining hyperslabs
Data transfer properties have to be set
    /* example for using data transfer properties */
    fileprops = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(fileprops, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(tset, …, fileprops, buffer);
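Before a collective write, each process needs its own offset and count for the hyperslab it selects. The plain C sketch below shows one common way to compute them, a block decomposition of rows across ranks. This decomposition and the function name `block_decompose` are assumptions for illustration and are not taken from the slides; the rank and process count would come from MPI in a real program.

```c
#include <assert.h>

/* Splits 'total' rows among 'nprocs' processes as evenly as possible.
 * On return, process 'rank' owns '*count' rows starting at row '*offset'. */
void block_decompose(int total, int nprocs, int rank,
                     int *offset, int *count)
{
    assert(total >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base  = total / nprocs;   /* rows every process gets */
    int extra = total % nprocs;   /* the first 'extra' ranks get one more row */
    *count  = base + (rank < extra ? 1 : 0);
    *offset = rank * base + (rank < extra ? rank : extra);
}
```

The resulting values would be used as offset[0] and count[0] of the file hyperslab on each process, so that the selections are disjoint and together cover the whole dataset.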
Problem Solving Environments
More detailed...
Development
– Tools to help problem specification, design, analysis and verification
– Rapid prototyping
– Dependence on a specific domain
– Expert assistance
Execution
– Support online/offline observation
– In the Grid: activities performed on multiple heterogeneous components
• selection, testing, configuration, activation, monitoring, data management
Overview of some existing PSEs
Name     Application domain   Base technology   Web-site
Cactus   Astrophysics                           www.cactuscode.org
TENT     Aerospace            CORBA/Java        www.dlr.de/tent
JACO3    Aerospace            CORBA/Java        www.irisa.fr/paris/nfrancais/jaco3.htm
COVISE   Visualization                          www.hlrs.de/organization/vis/covise
SCIRun   Medical simulation                     software.sci.utah.edu/scirun.html