Project owners:
  andrew.collette

What is h5py?

HDF5 for Python (h5py) is a general-purpose Python interface to the Hierarchical Data Format library, version 5. HDF5 is a versatile, mature scientific software library designed for the fast, flexible storage of enormous amounts of data.

From a Python programmer's perspective, HDF5 provides a robust way to store data, organized by name in a tree-like fashion. You can create datasets (arrays on disk) hundreds of gigabytes in size, and perform random-access I/O on desired sections. Datasets are organized in a filesystem-like hierarchy using containers called "groups", and accessed using the traditional POSIX /path/to/resource syntax.
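As a minimal sketch of that hierarchy, the snippet below stores one dataset under nested groups and reads a section of it back by path. The file, group and dataset names are hypothetical, and the in-memory "core" driver is an assumption made only so that nothing is written to disk.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store): the name is a placeholder
# and no file appears on disk.
with h5py.File("hierarchy_demo.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups named in the path are created automatically.
    f.create_dataset("/experiment/run1/temperature", data=np.arange(1000.0))

    # Random-access read of one section, addressed by its full path.
    section = f["/experiment/run1/temperature"][100:110]

    # The same dataset is reachable one group at a time.
    run1 = f["experiment"]["run1"]
    dataset_path = run1["temperature"].name
    group_members = list(f["experiment"].keys())
```

Only the ten requested elements are read into memory here; the rest of the dataset stays in the file.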

A generic NumPy interface to HDF5 data

H5py provides a simple, robust read/write interface to HDF5 data from Python. Existing Python and NumPy concepts are used for the interface; for example, datasets on disk are represented by a proxy class that supports slicing, and has dtype and shape attributes. HDF5 groups are presented using a dictionary metaphor, indexed by name.
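The proxy and dictionary behaviour described above might look like this in practice. This is an illustrative sketch: the group name "measurements" and dataset name "pressure" are made up, and the in-memory driver is again an assumption to avoid touching the disk.

```python
import h5py
import numpy as np

with h5py.File("proxy_demo.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("measurements")

    # Dictionary-style assignment stores the array as a new dataset.
    grp["pressure"] = np.ones((100, 20))

    # Indexing the group by name returns a dataset proxy, not the data itself.
    dset = grp["pressure"]
    dset_shape = dset.shape
    dset_dtype = dset.dtype

    # Slicing the proxy reads the selected region into a NumPy array.
    block = dset[20:80, ::2]

    # Groups support the usual dictionary operations.
    has_pressure = "pressure" in grp
    names = list(grp.keys())
```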

A major design goal of h5py is interoperability; you can read your existing data in HDF5 format, and create new files that any HDF5-aware program can understand. No Python-specific extensions are used; you're free to implement whatever file structure your application desires.

Almost all HDF5 features are available from Python, including things like compound datatypes (as used with NumPy recarray types), HDF5 attributes, hyperslab and point-based I/O, and more recent features in HDF5 1.8 like resizable datasets and recursive iteration over entire files.
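A short sketch touching several of these features at once: a compound datatype, an attribute, a resizable dataset, and recursive iteration. The record layout, names and attribute value are hypothetical, chosen only for illustration.

```python
import h5py
import numpy as np

# Hypothetical record layout, stored as an HDF5 compound datatype.
record = np.dtype([("time", "f8"), ("count", "i4")])
rows = np.array([(0.0, 1), (0.5, 2), (1.0, 3)], dtype=record)

with h5py.File("features_demo.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(None,) makes the first axis unlimited, so the dataset can grow.
    events = f.create_dataset("events", data=rows, maxshape=(None,))

    # Attributes attach small pieces of metadata to an object.
    events.attrs["units"] = "seconds"
    units = events.attrs["units"]

    # Grow the dataset from 3 to 5 records.
    events.resize((5,))
    new_shape = events.shape

    # Reading a single field of the compound type.
    times = events["time"][:3]

    # Recursive iteration: visit() calls the function with every object name.
    f.create_group("raw/archive")
    visited = []
    f.visit(visited.append)
```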

A foundation for other HDF5 applications in Python

In addition to the NumPy-like high-level interface, the foundation of h5py is a near-complete wrapping of the HDF5 C API. It includes the majority of the API with the following major improvements:

Applications which want to access HDF5 data in a less-generic manner, or which need a different performance/flexibility ratio, can access HDF5 directly through this Python API. In this sense, the NumPy-like layer is itself an application written to the native HDF5 API.
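To give a rough idea of what working at this lower level can look like, the sketch below creates a dataset through the high-level interface and then reads it back through the identifier objects in h5py.h5d and h5py.h5s. The file and dataset names are placeholders, and this is one possible way to reach the low-level layer, not necessarily the recommended one.

```python
import h5py
import numpy as np

with h5py.File("lowlevel_demo.h5", "w", driver="core", backing_store=False) as f:
    f["MyDataset"] = np.ones((100, 20))

    # .id exposes the low-level identifier object behind a high-level object.
    dsid = f["MyDataset"].id                # an h5py.h5d.DatasetID
    space = dsid.get_space()                # the dataset's dataspace
    dims = space.get_simple_extent_dims()   # dataset extent as a tuple

    # Read the whole dataset into a preallocated, C-contiguous array.
    # h5py.h5s.ALL selects the entire dataspace in memory and in the file.
    out = np.empty(dims, dtype=np.float64)
    dsid.read(h5py.h5s.ALL, h5py.h5s.ALL, out)
    total = float(out.sum())
```

The low-level call takes explicit memory and file selections and fills a caller-supplied buffer, which is the kind of control the high-level slicing syntax hides.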

Compatibility

Here's a trivial example showing how to create a new HDF5 file and store a 100 x 20 array of floats:

>>> import h5py
>>> import numpy
>>> f = h5py.File("myfile.hdf5", 'w')
>>> f["MyDataset"] = numpy.ones((100,20))

And to get your data back:

>>> dset = f["MyDataset"]
>>> subset = dset[20:80,:]

See the links to the right for documentation, download and installation instructions.

Contact email is "h5py" at the domain "alfven dot org".








