Module H5D

Provides access to the low-level HDF5 “H5D” dataset interface.

Functional API

h5py.h5d.open(ObjectID loc, STRING name, PropID dapl=None) → DatasetID

Open an existing dataset attached to a group or file object, by name.

If specified, dapl may be a dataset access property list.

h5py.h5d.create(ObjectID loc, STRING name or None, TypeID tid, SpaceID space, PropDCID dcpl=None, PropID lcpl=None) → DatasetID

Create a new dataset. If “name” is None, the dataset will be anonymous.
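A minimal sketch of the functional API (the file name is illustrative; note that low-level object names are bytes, not str):

```python
import numpy as np
import h5py

fid = h5py.h5f.create(b"demo.h5", h5py.h5f.ACC_TRUNC)  # new file, truncate if present
space = h5py.h5s.create_simple((4, 5))                 # 4 x 5 dataspace
tid = h5py.h5t.py_create(np.dtype('f8'))               # HDF5 datatype from a NumPy dtype
dsid = h5py.h5d.create(fid, b"data", tid, space)       # named dataset

reopened = h5py.h5d.open(fid, b"data")                 # open the same dataset by name
print(reopened.shape)                                  # (4, 5)
fid.close()
```

Passing None instead of b"data" would create an anonymous dataset, which can later be linked into the file with the H5O/H5L interfaces.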

Dataset Objects

class h5py.h5d.DatasetID

Represents an HDF5 dataset identifier.

Objects of this class may be used in any HDF5 function which expects a dataset identifier. Also, all H5D* functions which take a dataset instance as their first argument are presented as methods of this class.

Properties:

  • dtype: Numpy dtype representing the dataset type

  • shape: Numpy-style shape tuple representing the dataspace

  • rank: Integer giving dataset rank

  • Hashable: Yes, unless anonymous

  • Equality: True HDF5 identity unless anonymous

dtype

Numpy dtype object representing the dataset type

extend(TUPLE shape)

Extend the given dataset so it’s at least as big as “shape”. Note that a dataset may only be extended up to the maximum dimensions of its dataspace, which are fixed when the dataset is created.

flush() → None

Flushes all buffers associated with a dataset to disk.

This function causes all buffers associated with a dataset to be immediately flushed to disk without removing the data from the cache.

Use this in SWMR write mode to allow readers to be updated with the dataset changes.

Feature requires: HDF5 1.9.178

get_access_plist() → PropDAID

Create and return a new copy of the dataset access property list.

get_chunk_info(hsize_t index, SpaceID space=None) → StoreInfo

Retrieve storage information about a chunk specified by its index.

Added in version 3.0.

get_chunk_info_by_coord(TUPLE chunk_offset) → StoreInfo

Retrieve information about a chunk specified by the array address of the chunk’s first element in each dimension.

Added in version 3.0.
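A sketch of the chunk-query methods, using a DatasetID obtained from a high-level dataset's .id attribute (file name is illustrative; the flush makes sure the chunks have reached the file before they are queried):

```python
import numpy as np
import h5py

with h5py.File("chunks.h5", "w") as f:
    dset = f.create_dataset("x", shape=(8, 8), chunks=(4, 4), dtype='i4')
    dset[...] = np.arange(64, dtype='i4').reshape(8, 8)  # touches all 4 chunks
    f.flush()                                            # push chunks to the file

    dsid = dset.id
    print(dsid.get_num_chunks())                 # 4 written chunks
    info = dsid.get_chunk_info_by_coord((0, 0))  # chunk holding element (0, 0)
    print(info.chunk_offset, info.size)          # (0, 0) and 64 bytes (4*4 int32)
```

The returned StoreInfo also carries the chunk's filter_mask and its byte_offset in the file.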

get_create_plist() → PropDCID

Create and return a new copy of the dataset creation property list used when this dataset was created.

get_num_chunks(SpaceID space=None) → INT num_chunks

Retrieve the number of chunks that have a nonempty intersection with the specified dataspace. Currently, the dataspace argument is ignored and the function returns the number of all written chunks.

Added in version 3.0.

get_offset() → LONG offset or None

Get the offset of this dataset in the file, in bytes, or None if it doesn’t have one. This is always the case for datasets which use chunked storage, compact datasets, and datasets for which space has not yet been allocated in the file.

get_space() → SpaceID

Create and return a new copy of the dataspace for this dataset.

get_space_status() → INT space_status_code

Determine if space has been allocated for a dataset. The return value is one of:

  • SPACE_STATUS_NOT_ALLOCATED

  • SPACE_STATUS_PART_ALLOCATED

  • SPACE_STATUS_ALLOCATED

get_storage_size() → LONG storage_size

Report the size of storage, in bytes, that is allocated in the file for the dataset’s raw data. The reported amount is the storage allocated in the written file, which will typically differ from the space required to hold a dataset in working memory (any associated HDF5 metadata is excluded).

For contiguous datasets, the returned size equals the current allocated size of the raw data. For unfiltered chunked datasets, the returned size is the number of allocated chunks times the chunk size. For filtered chunked datasets, the returned size is the space required to store the filtered data.
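A small sketch for the contiguous case, assuming the default layout and that writing the data allocates storage (file name is illustrative):

```python
import numpy as np
import h5py

with h5py.File("storage.h5", "w") as f:
    f["x"] = np.arange(10, dtype='f8')  # contiguous dataset, data written
    dsid = f["x"].id
    print(dsid.get_storage_size())      # 80 bytes = 10 * sizeof(float64)
```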

get_type() → TypeID

Create and return a new copy of the datatype for this dataset.

rank

Integer giving the dataset rank (0 = scalar)

read(SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, TypeID mtype=None, PropDXID dxpl=None)

Read data from an HDF5 dataset into a Numpy array.

It is your responsibility to ensure that the memory dataspace provided is compatible with the shape of the Numpy array. Since a wide variety of dataspace configurations are possible, this is not checked. You can easily crash Python by reading in data from too large a dataspace.

If a memory datatype is not specified, one will be auto-created based on the array’s dtype.

The provided Numpy array must be writable and C-contiguous. If this is not the case, ValueError will be raised and the read will fail. Keyword dxpl may be a dataset transfer property list.
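A sketch of a low-level read into a preallocated buffer, with explicit memory and file dataspaces (file name is illustrative):

```python
import numpy as np
import h5py

with h5py.File("read.h5", "w") as f:
    f["x"] = np.arange(10, dtype='f8')
    dsid = f["x"].id

    arr = np.empty((10,), dtype='f8')       # writable, C-contiguous destination
    mspace = h5py.h5s.create_simple((10,))  # memory dataspace matching arr
    fspace = dsid.get_space()               # file dataspace (whole dataset)
    dsid.read(mspace, fspace, arr)
    print(arr[:3])                          # [0. 1. 2.]
```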

read_direct_chunk(offsets, PropID dxpl=None, out=None)

Read data directly from a chunk at the position specified by the offsets argument, bypassing any filters HDF5 would normally apply when reading. The returned bytes are therefore exactly as stored, and may or may not be compressed depending on how the chunk was written.

Returns a tuple containing the filter_mask and the raw data storing this chunk as bytes if out is None, else as a memoryview.

filter_mask is a bit field of up to 32 values. It records which filters from the dataset's filter pipeline were applied to this chunk. Each bit set to 1 means that the filter in the corresponding position in the pipeline was not applied to compute the raw data. So the default value of 0 means that all defined filters have been applied to the raw data.

If the out argument is not None, it must be a writeable contiguous 1D array-like of bytes (e.g., bytearray or numpy.ndarray) and large enough to contain the whole chunk.
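A sketch with no filters defined, so the raw bytes are the plain chunk contents and filter_mask is 0 (file name is illustrative; the flush makes sure the chunk is on disk before the direct read):

```python
import numpy as np
import h5py

with h5py.File("rawchunk.h5", "w") as f:
    data = np.arange(16, dtype='i4').reshape(4, 4)
    dset = f.create_dataset("x", data=data, chunks=(4, 4))
    f.flush()                                          # push the chunk to the file

    filter_mask, raw = dset.id.read_direct_chunk((0, 0))
    chunk = np.frombuffer(raw, dtype='i4').reshape(4, 4)
    print(filter_mask, np.array_equal(chunk, data))    # 0 True
```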

refresh() → None

Refreshes all buffers associated with a dataset.

This function causes all buffers associated with a dataset to be cleared and immediately re-loaded with updated contents from disk.

This function essentially closes the dataset, evicts all metadata associated with it from the cache, and then re-opens the dataset. The reopened dataset is automatically re-registered with the same ID.

Use this in SWMR read mode to poll for dataset changes.

Feature requires: HDF5 1.9.178

set_extent(TUPLE shape)

Set the size of the dataspace to match the given shape. If the new size is larger in any dimension, it must be compatible with the maximum dataspace size.
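A sketch of growing a resizable dataset; resizing requires chunked storage and a dataspace created with maximum dimensions, here unlimited (file name is illustrative):

```python
import numpy as np
import h5py

with h5py.File("extent.h5", "w") as f:
    space = h5py.h5s.create_simple((2,), (h5py.h5s.UNLIMITED,))
    dcpl = h5py.h5p.create(h5py.h5p.DATASET_CREATE)
    dcpl.set_chunk((2,))                     # chunked layout, required for resizing
    tid = h5py.h5t.py_create(np.dtype('i4'))
    dsid = h5py.h5d.create(f.id, b"grow", tid, space, dcpl=dcpl)

    dsid.set_extent((6,))                    # grow from 2 to 6 elements
    print(dsid.shape)                        # (6,)
```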

shape

Numpy-style shape tuple representing the dataspace

write(SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, TypeID mtype=None, PropDXID dxpl=None)

Write data from a Numpy array to an HDF5 dataset. Keyword dxpl may be a dataset transfer property list.

It is your responsibility to ensure that the memory dataspace provided is compatible with the shape of the Numpy array. Since a wide variety of dataspace configurations are possible, this is not checked. You can easily crash Python by writing data from too large a dataspace.

If a memory datatype is not specified, one will be auto-created based on the array’s dtype.

The provided Numpy array must be C-contiguous. If this is not the case, ValueError will be raised and the write will fail.
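A sketch of a low-level write with explicit memory and file dataspaces (file name is illustrative):

```python
import numpy as np
import h5py

with h5py.File("write.h5", "w") as f:
    space = h5py.h5s.create_simple((3,))
    tid = h5py.h5t.py_create(np.dtype('f8'))
    dsid = h5py.h5d.create(f.id, b"y", tid, space)

    arr = np.array([1.0, 2.0, 3.0])         # C-contiguous source array
    mspace = h5py.h5s.create_simple((3,))   # memory dataspace matching arr
    fspace = dsid.get_space()               # file dataspace (whole dataset)
    dsid.write(mspace, fspace, arr)
    print(f["y"][...])                      # [1. 2. 3.]
```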

write_direct_chunk(offsets, data, uint32_t filter_mask=0x00000000, PropID dxpl=None)

This function bypasses any filters HDF5 would normally apply to written data. However, calling code may apply filters (e.g. gzip compression) itself before writing the data.

data is a Python object that implements the Py_buffer interface. In the case of an ndarray, the shape and dtype are ignored; it is the user's responsibility to ensure they are compatible with the dataset.

filter_mask is a bit field of up to 32 values. It records which filters from the dataset's filter pipeline were applied to this chunk. Each bit set to 1 means that the filter in the corresponding position in the pipeline was not applied. So the default value of 0 means that all defined filters have been applied to the data before calling this function.
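A sketch writing raw bytes straight into a chunk; no filters are defined here, so the bytes are plain native int32 data and the default filter_mask applies (file name is illustrative):

```python
import numpy as np
import h5py

with h5py.File("wdc.h5", "w") as f:
    dset = f.create_dataset("x", shape=(4,), chunks=(4,), dtype='i4')
    payload = np.array([1, 2, 3, 4], dtype='i4').tobytes()  # one full chunk
    dset.id.write_direct_chunk((0,), payload)               # offsets locate the chunk
    print(dset[...])                                        # [1 2 3 4]
```

With a compressed dataset, the calling code would compress the payload itself (e.g. with zlib for the gzip filter) before writing.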

Module constants

Storage strategies

h5py.h5d.COMPACT
h5py.h5d.CONTIGUOUS
h5py.h5d.CHUNKED

Allocation times

h5py.h5d.ALLOC_TIME_DEFAULT
h5py.h5d.ALLOC_TIME_LATE
h5py.h5d.ALLOC_TIME_EARLY
h5py.h5d.ALLOC_TIME_INCR

Allocation status

h5py.h5d.SPACE_STATUS_NOT_ALLOCATED
h5py.h5d.SPACE_STATUS_PART_ALLOCATED
h5py.h5d.SPACE_STATUS_ALLOCATED

Fill time

h5py.h5d.FILL_TIME_ALLOC
h5py.h5d.FILL_TIME_NEVER
h5py.h5d.FILL_TIME_IFSET

Fill values

h5py.h5d.FILL_VALUE_UNDEFINED
h5py.h5d.FILL_VALUE_DEFAULT
h5py.h5d.FILL_VALUE_USER_DEFINED