|
HowTo
Example list demonstrating common operations in h5py
Introduction

This document contains example code snippets for frequently performed operations in h5py. The examples are written for h5py 1.2 (the current version), although most also work with 1.1 and older.

Examples in the source distribution

The h5py source tarballs contain a small but growing number of "complete" standalone examples. You can browse this directory on Google Code directly: Examples

Files and Groups

Open/create a file

With the default driver:

>>> f1 = h5py.File('myfile.hdf5', 'w') # Default mode is 'a'
>>> f2 = h5py.File(u'myfile.hdf5', 'w') # Unicode!

With a different HDF5 driver:

>>> f1 = h5py.File('myfile.hdf5', driver='core') # Use H5FD_CORE driver

Enumerating group members

To get a list or dictionary of group members:

>>> L = list(mygroup)
>>> D = dict(mygroup)

Recursive list or dictionary of group/subgroup members (requires HDF5 1.8):

>>> L = []
>>> mygroup.visit(L.append)
>>> D = {}
>>> def filldict(x, y):
...     D[x] = y
>>> mygroup.visititems(filldict)

Finding node parents

Because of the way HDF5 is designed, you'll quickly notice that the syntax mygroup[".."] doesn't work. You can use the .parent attribute (new in 1.2) instead:

parent_group = mynode.parent

This property is attached to all high-level objects, including datasets. Under all circumstances, this is equivalent to:

parent_group = mygroup[posixpath.dirname(mygroup.name)]

However, if multiple hard links to mygroup exist, mygroup.name may not be what you expect! When using these properties it's best to limit yourself to a strict tree configuration for the file.

The File object used to open the HDF5 file is also available via a property:

fileobj = mynode.file

Hard linking

You can create a "hard link" to an existing HDF5 object by simply assigning it to a group:

obj = mygroup.create_dataset('ds', (2,2), 'f')
mygroup['another name for ds'] = obj

Currently you cannot create symlinks via the high-level interface.

Datasets

Using chunks

Data for any array can be stored in chunked format. When compression is used, chunking is automatically activated. You can have h5py guess a chunk layout for you:

>>> ds = myfile.create_dataset('ds', (100,100), dtype, chunks=True)

or specify the chunk shape yourself:

>>> ds = myfile.create_dataset('ds', (100,100), dtype, chunks=(10,10))

In the real world, chunks of size 10 kB to 300 kB work best, especially for compression. Very small chunks lead to lots of overhead in the file, while very large chunks can result in inefficient I/O. Chunks must be smaller than 1 megabyte to participate in the HDF5 chunk cache.

Using GZIP compression

>>> ds = myfile.create_dataset('ds', shape, dtype, compression='gzip', compression_opts=4)

or (the old way):

>>> ds = myfile.create_dataset('ds', shape, dtype, compression=4)

If compression_opts is omitted, the default GZIP level is 4. You can check:

>>> ds.compression
'gzip'
>>> ds.compression_opts
4

Using SZIP compression

>>> ds1 = myfile.create_dataset('ds', shape, dtype, compression='szip')
>>> ds2 = myfile.create_dataset('ds', shape, dtype, compression='szip', compression_opts=('nn', 16))

Compression options are the method (nearest-neighbor 'nn' or entropy-coding 'ec'), and the number of pixels per block (even integer <= 32).

Using LZF compression

>>> ds = myfile.create_dataset('ds', shape, dtype, compression='lzf')

There are no options for the LZF compressor. Please note LZF is currently only available with h5py.

Appending to a dataset (HDF5 1.8 only)

This one's a bit trickier. When you create the dataset, you can specify a "maximum shape" tuple. Values of None indicate unlimited dimensions:

>>> ds = myfile.create_dataset('ds', (20,1000), 'f', maxshape=(None,1000))
>>> ds.shape
(20, 1000)
>>> ds.maxshape
(None, 1000)

To increase the size of the dataset, let's say to (40, 1000), use the "resize" function, and then write your new data:

>>> ds.resize((40,1000))
>>> ds.shape
(40, 1000)
>>> ds[20:40] = newdata

Accessing a scalar dataset

Use the following syntax (borrowed from NumPy):

data = mydataset[()]

Types

Compound types

Compound types in h5py work just like ordinary NumPy compound types. Just create a standard NumPy dtype and use it as normal:

>>> my_dtype = numpy.dtype([('field1', 'i'), ('field2', 'f')])
>>> ds = myfile.create_dataset('ds', (10,10), dtype=my_dtype)
>>> ds.dtype
dtype([('field1', '<i4'), ('field2', '<f4')])

As an added bonus, you can index the dataset object using field names as well as indices:

>>> ds[0,0]
(0, 0.0)
>>> ds[0,0,'field1']
0

Variable-length strings

Variable-length strings in HDF5 are handled via NumPy "object" dtypes, with a small amount of additional metadata. Here's how to create a dataset of variable-length strings:

>>> str_type = h5py.new_vlen(str)
>>> ds = myfile.create_dataset('ds', shape, dtype=str_type)

The string type object is a standard NumPy object dtype:

>>> type(str_type)
<type 'numpy.dtype'>
>>> str_type.kind
'O'

It can be used anywhere a NumPy "O" dtype is allowed, including in compound (recarray) types and array types.

Warning: Variable-length strings in HDF5 are like C strings, in that they cannot contain embedded nulls. If a string contains NULL characters, only the portion of the string up to the first NULL will be saved.

Enumerated types

Like variable-length strings, enumerated types are handled in h5py as "integer-plus-metadata" dtypes. They are created by a similar convenience function:

>>> enum_type = h5py.new_enum('i', {'RED': 1, 'GREEN': 2, 'BLUE': 42})
>>> enum_type.kind
'i'
>>> ds = myfile.create_dataset('ds', shape, dtype=enum_type)

To obtain the dictionary of enum values associated with one of these types, use the high-level convenience function get_enum:

>>> h5py.get_enum(ds.dtype)
{'BLUE': 42, 'GREEN': 2, 'RED': 1}

Attributes

Attributes are small bits of data attached to any file-resident HDF5 object, including groups and datasets.

Reading and writing attributes

Attributes are accessed through a dict-like proxy attached to groups and datasets:

mygroup.attrs['name'] = 42

They are created by direct assignment, with a type determined by the given value. The example above creates an integer attribute; the code grp.attrs['name'] = 1.3 will create a floating-point attribute. Any existing value is automatically overwritten.

Iteration

Attribute proxies support the same dict-like methods as Group objects:

>>> list_of_attrs = list(grp.attrs) # or grp.attrs.keys()
>>> 'name' in grp.attrs
True
|
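
The attribute operations described above can be combined into one short script. This is a minimal sketch rather than part of the original examples: it assumes a recent h5py release (the `with` form and the `backing_store=False` option have not been checked against h5py 1.2), and the file name, the group name 'grp', and the attribute names 'name' and 'scale' are illustrative only.

```python
import h5py

# Assumption: a recent h5py release. The 'core' driver with
# backing_store=False keeps the file in memory, so nothing is written to disk.
with h5py.File('attrs_demo.hdf5', 'w', driver='core', backing_store=False) as f:
    grp = f.create_group('grp')          # hypothetical group name

    # Attributes are created by direct assignment; the stored type
    # follows the Python value (integer here, floating point below).
    grp.attrs['name'] = 42
    grp.attrs['scale'] = 1.3

    # Assigning to an existing name overwrites the old value.
    grp.attrs['name'] = 43

    # Dict-like access: iteration yields attribute names.
    attr_names = sorted(grp.attrs)
    has_name = 'name' in grp.attrs

    # Read the values back while the file is still open.
    name_value = int(grp.attrs['name'])
    scale_value = float(grp.attrs['scale'])

print(attr_names, has_name, name_value, scale_value)
```

Reading the values inside the `with` block matters, because the attribute proxy cannot be used once the file has been closed.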