HowTo  
Example list demonstrating common operations in h5py
Updated Dec 18, 2011 by andrew.c...@gmail.com

Introduction

This document contains example code snippets for frequently-performed operations in h5py. Examples here are written for h5py 1.2 (the current version), although most also work with 1.1 and older.

Examples in the source distribution

The h5py source tarballs contain a small but growing number of "complete" standalone examples. You can browse this directory directly on Google Code:

Examples

Files and Groups

Open/create a file

With the default driver:

>>> f1 = h5py.File('myfile.hdf5', 'w')  # Default mode is 'a'
>>> f2 = h5py.File(u'myfile.hdf5', 'w')  # Unicode!

With a different HDF5 driver:

>>> f1 = h5py.File('myfile.hdf5', driver='core')  # Use H5FD_CORE driver

Enumerating group members

To get a list or dictionary of group members:

>>> L = list(mygroup)
>>> D = dict(mygroup)

Recursive list or dictionary of group/subgroup members (requires HDF5 1.8):

>>> L = []
>>> mygroup.visit(L.append)

>>> D = {}
>>> def filldict(x, y):
...     D[x] = y
>>> mygroup.visititems(filldict)
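
As a hedged illustration of the recursive traversal above, the sketch below builds a throwaway in-memory file and collects every member name and object. The file name visit_demo.hdf5, the group layout, and the use of driver='core' with backing_store=False are assumptions chosen for the demo. It is written against a recent h5py release rather than 1.2.

```python
import h5py

# In-memory file: nothing is written to disk (an assumption made for this demo).
f = h5py.File('visit_demo.hdf5', 'w', driver='core', backing_store=False)
grp = f.create_group('top')
grp.create_group('sub')
grp.create_dataset('sub/data', (4,), 'f')

# visit() calls the function with each member name, relative to grp.
# list.append returns None, so the traversal keeps going.
names = []
grp.visit(names.append)

# visititems() passes both the relative name and the object itself.
members = {}
def filldict(name, obj):
    members[name] = obj
grp.visititems(filldict)

found = sorted(names)
kinds = {name: type(obj).__name__ for name, obj in members.items()}
f.close()
```

Names are reported relative to the group being visited, so both subgroups and the datasets inside them appear in a single flat result.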

Finding node parents

Because of the way HDF5 is designed, you'll quickly notice that the syntax mygroup[".."] doesn't work. You can use the .parent attribute (new in 1.2) instead:

parent_group = mynode.parent

This property is attached to all high-level objects, including datasets. It is equivalent to:

parent_group = mynode.file[posixpath.dirname(mynode.name)]

However, if multiple hard links to mynode exist, mynode.name may not be what you expect! When using these properties it's best to limit yourself to a strict tree configuration for the file.
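
The name-based lookup above relies only on POSIX-style path handling, which can be tried without opening a file. The helper name parent_path and the example paths are hypothetical. This is a small sketch of the path arithmetic, not of h5py itself.

```python
import posixpath

def parent_path(name):
    """Return the HDF5 path of the group containing `name` (hypothetical helper)."""
    return posixpath.dirname(name)

print(parent_path('/experiment/run1/data'))  # /experiment/run1
print(parent_path('/experiment'))            # /
```

Because the result is derived purely from the name string, an object reachable through several hard links yields whichever parent matches the name it was opened under.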

The File object used to open the HDF5 file is also available via a property:

fileobj = mynode.file

Hard linking

You can create a "hard link" to an existing HDF5 object by simply assigning it to a group:

obj = mygroup.create_dataset('ds', (2,2), 'f')
mygroup['another name for ds'] = obj

Currently you cannot create symlinks via the high-level interface.

Datasets

Using chunks

Data for any array can be stored in chunked format. When compression is used, chunking is automatically activated. You can have h5py guess a chunk layout for you:

>>> ds = myfile.create_dataset('ds', (100,100), dtype, chunks=True)

or specify the chunk shape yourself:

>>> ds = myfile.create_dataset('ds', (100,100), dtype, chunks=(10,10))

In practice, chunks of 10 kB to 300 kB work best, especially for compression. Very small chunks lead to lots of overhead in the file, while very large chunks can result in inefficient I/O.

Chunks must be smaller than 1 megabyte to participate in the HDF5 chunk cache.
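
To make the size guidance above concrete, here is a rough sketch that computes the uncompressed size of one chunk and compares it with those thresholds. The helper names chunk_nbytes and chunk_advice are hypothetical, and treating 1 kB as 1000 bytes and 1 megabyte as 1024*1024 bytes is an assumption.

```python
from math import prod

def chunk_nbytes(chunk_shape, itemsize):
    """Bytes occupied by one uncompressed chunk."""
    return prod(chunk_shape) * itemsize

def chunk_advice(chunk_shape, itemsize):
    """Classify a chunk shape against the rules of thumb stated above."""
    n = chunk_nbytes(chunk_shape, itemsize)
    if n >= 1024 * 1024:          # assumed meaning of "1 megabyte"
        return 'too large for the chunk cache'
    if n < 10 * 1000:
        return 'small: expect overhead'
    if n > 300 * 1000:
        return 'large: I/O may be inefficient'
    return 'within the suggested range'

# A (10, 10) chunk of 4-byte floats is only 400 bytes.
print(chunk_nbytes((10, 10), 4), chunk_advice((10, 10), 4))
print(chunk_nbytes((100, 100), 4), chunk_advice((100, 100), 4))
```

By this measure the (10,10) chunk shape used in the example above is on the small side for 4-byte elements, while (100,100) falls inside the suggested range.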

Using GZIP compression

>>> ds = myfile.create_dataset('ds', shape, dtype, compression='gzip', compression_opts=4)

or (the old way):

>>> ds = myfile.create_dataset('ds', shape, dtype, compression=4)

If compression_opts is omitted, the default GZIP level is 4. You can check:

>>> ds.compression
'gzip'
>>> ds.compression_opts
4

Using SZIP compression

>>> ds1 = myfile.create_dataset('ds1', shape, dtype, compression='szip')
>>> ds2 = myfile.create_dataset('ds2', shape, dtype, compression='szip', compression_opts=('nn', 16))

Compression options are the method (nearest-neighbor 'nn' or entropy-coding 'ec'), and the number of pixels per block (even integer <= 32).
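
The constraints on the options tuple can be checked before calling create_dataset. The function szip_opts_ok below is a hypothetical helper that encodes only the rules stated above. Requiring the block size to be positive is an added assumption, and the SZIP library may impose further limits not modeled here.

```python
def szip_opts_ok(opts):
    """Check a (method, pixels_per_block) tuple against the rules stated above."""
    method, pixels = opts
    return (
        method in ('nn', 'ec')        # nearest-neighbor or entropy-coding
        and isinstance(pixels, int)
        and 0 < pixels <= 32          # positivity is an assumption
        and pixels % 2 == 0           # must be even
    )

print(szip_opts_ok(('nn', 16)))   # True
print(szip_opts_ok(('nn', 15)))   # False
```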

Using LZF compression

>>> ds = myfile.create_dataset('ds', shape, dtype, compression='lzf')

There are no options for the LZF compressor. Please note LZF is currently only available with h5py.

Appending to a dataset (HDF5 1.8 only)

This one's a bit trickier. When you create the dataset, you can specify a "maximum shape" tuple. Values of None indicate unlimited dimensions:

>>> ds = myfile.create_dataset('ds', (20,1000), 'f', maxshape=(None,1000))
>>> ds.shape
(20, 1000)
>>> ds.maxshape
(None, 1000)

To increase the size of the dataset, let's say to (40, 1000), use the "resize" function, and then input your new data:

>>> ds.resize((40,1000))
>>> ds.shape
(40, 1000)
>>> ds[20:40] = newdata
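
The resize-then-write pattern can be wrapped in a small helper. The sketch below assumes a recent h5py release and NumPy. The name append_rows, the in-memory file append_demo.hdf5, and the all-ones test data are hypothetical choices for the demo.

```python
import h5py
import numpy as np

def append_rows(ds, rows):
    """Grow `ds` along its first axis and write `rows` into the new space."""
    old = ds.shape[0]
    ds.resize((old + rows.shape[0],) + ds.shape[1:])
    ds[old:] = rows

# In-memory file so the demo leaves nothing on disk (an assumption).
f = h5py.File('append_demo.hdf5', 'w', driver='core', backing_store=False)
ds = f.create_dataset('ds', (20, 1000), 'f', maxshape=(None, 1000))

append_rows(ds, np.ones((20, 1000), dtype='f'))
new_shape = ds.shape
tail_sum = float(ds[20:40].sum())
f.close()
```

Only the first axis can grow here because the second entry of maxshape is fixed at 1000. Resizing beyond a fixed maximum is rejected by the library.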

Accessing a scalar dataset

Use the following syntax (borrowed from NumPy):

data = mydataset[()]

Types

Compound types

Compound types in h5py work just like ordinary NumPy compound types. Just create a standard NumPy dtype and use it as normal:

>>> my_dtype = numpy.dtype([('field1', 'i'), ('field2', 'f')])
>>> ds = myfile.create_dataset('ds', (10,10), dtype=my_dtype)
>>> ds.dtype
dtype([('field1', '<i4'), ('field2', '<f4')])

As an added bonus, you can index the dataset object using field names as well as indices:

>>> ds[0,0]
(0, 0.0)
>>> ds[0,0,'field1']
0

Variable-length strings

Variable-length strings in HDF5 are handled via NumPy "object" dtypes, with a small amount of additional metadata. Here's how to create a dataset of variable-length strings:

>>> str_type = h5py.new_vlen(str)
>>> ds = myfile.create_dataset('ds', shape, dtype=str_type)

The string type object is a standard NumPy object dtype:

>>> type(str_type)
<type 'numpy.dtype'>
>>> str_type.kind
'O'

It can be used anywhere a NumPy "O" dtype is allowed, including in compound (recarray) types and array types.

Warning: Variable-length strings in HDF5 are like C strings, in that they cannot contain embedded nulls. If a string contains NULL characters, only the portion of the string up to the first NULL will be saved.
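
The truncation described in the warning can be modeled in plain Python, which is handy for checking data before it is written. The helper names as_stored and has_embedded_null are hypothetical. The sketch mirrors the behavior described above and does not call h5py.

```python
def has_embedded_null(s):
    """True if `s` contains a NUL character anywhere."""
    return '\x00' in s

def as_stored(s):
    """Portion of `s` up to the first NUL, as the warning above describes."""
    return s.split('\x00', 1)[0]

value = 'abc\x00def'
if has_embedded_null(value):
    print('would be stored as:', repr(as_stored(value)))  # 'abc'
```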

Enumerated types

Like variable-length strings, enumerated types are handled in h5py as "integer-plus-metadata" dtypes. They are created by a similar convenience function:

>>> enum_type = h5py.new_enum('i', {'RED': 1, 'GREEN': 2, 'BLUE': 42})
>>> enum_type.kind
'i'
>>> ds = myfile.create_dataset('ds', shape, dtype=enum_type)

To obtain the dictionary of enum values associated with one of these types, use the high-level convenience function get_enum:

>>> h5py.get_enum(ds.dtype)
{'BLUE': 42, 'GREEN': 2, 'RED': 1}
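
Since the dataset itself holds plain integers, a common follow-up is to invert the dictionary returned by get_enum so stored values can be turned back into names. The sketch below uses the dictionary from the example above directly. The variable names and the sample values in stored are made up for illustration.

```python
# Dictionary as returned by get_enum in the example above.
enum_values = {'RED': 1, 'GREEN': 2, 'BLUE': 42}

# Invert it: integer value -> name.
names_by_value = {value: name for name, value in enum_values.items()}

# Hypothetical integers read back from an enum dataset.
stored = [1, 42, 2, 1]
decoded = [names_by_value[v] for v in stored]
print(decoded)  # ['RED', 'BLUE', 'GREEN', 'RED']
```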

Attributes

Attributes are small bits of data attached to any file-resident HDF5 object, including groups and datasets.

Reading and writing attributes

Attributes are accessed through a dict-like proxy attached to groups and datasets:

mygroup.attrs['name'] = 42

They are created by direct assignment, with a type determined by the given value. The example above creates an integer attribute; the code grp.attrs['name'] = 1.3 will create a floating-point attribute.

Any existing value is automatically overwritten.

Iteration

Attribute proxies support the same dict-like methods as Group objects:

>>> list_of_attrs = list(grp.attrs)   # or grp.attrs.keys()
>>> 'name' in grp.attrs
True
