Module H5D

Provides access to the low-level HDF5 “H5D” dataset interface.

Functional API

h5py.h5d.open(ObjectID loc, STRING name, PropID dapl=None) → DatasetID

Open an existing dataset attached to a group or file object, by name.

If specified, dapl may be a dataset access property list.
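A minimal sketch of opening a dataset by name through the low-level API. It assumes an in-memory file (h5py's "core" driver with backing_store=False) and uses the high-level File object only to create the dataset; the file and dataset names are arbitrary. Low-level h5py functions take object names as bytes.

```python
import h5py
import numpy as np
from h5py import h5d

# In-memory file: nothing is written to disk.
with h5py.File("open_demo.h5", "w", driver="core", backing_store=False) as f:
    f["data"] = np.arange(6, dtype="i4")  # create a dataset with the high-level API

    # Low-level open by name; f.id is the FileID behind the File object.
    dsid = h5d.open(f.id, b"data")
    opened_type = type(dsid)
    opened_shape = dsid.shape
```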

h5py.h5d.create(ObjectID loc, STRING name or None, TypeID tid, SpaceID space, PropDCID dcpl=None, PropID lcpl=None) → DatasetID

Create a new dataset. If “name” is None, the dataset is anonymous, that is, it is not linked into any group.
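The sketch below creates one named and one anonymous dataset from the same dataspace. It assumes an in-memory file; the names "grid" and the chosen shapes and types are illustrative only.

```python
import h5py
import numpy as np
from h5py import h5d, h5s, h5t

with h5py.File("create_demo.h5", "w", driver="core", backing_store=False) as f:
    space = h5s.create_simple((4, 5))  # fixed-size 4x5 dataspace

    # Named dataset: linked into the root group as "grid".
    dsid = h5d.create(f.id, b"grid", h5t.NATIVE_INT32, space)

    # Anonymous dataset: name=None, so no link is created in the file.
    anon = h5d.create(f.id, None, h5t.NATIVE_FLOAT, space)

    grid_shape, grid_dtype = dsid.shape, dsid.dtype
    anon_shape = anon.shape
    links = list(f.keys())  # only the named dataset is visible
```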

Dataset Objects

class h5py.h5d.DatasetID

Represents an HDF5 dataset identifier.

Objects of this class may be used in any HDF5 function which expects a dataset identifier. Also, all H5D* functions which take a dataset instance as their first argument are presented as methods of this class.

Properties:
dtype: Numpy dtype representing the dataset type
shape: Numpy-style shape tuple representing the dataspace
rank: Integer giving dataset rank

Hashable: Yes, unless anonymous
Equality: True HDF5 identity, unless anonymous
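A short illustration of the properties and of identity-based equality, assuming an in-memory file; the dataset name and shape are arbitrary. Two separately opened identifiers for the same named dataset should compare equal and hash alike, even though their raw HDF5 identifiers differ.

```python
import h5py
import numpy as np

with h5py.File("props_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("m", data=np.zeros((3, 7), dtype="f8"))
    dsid = dset.id  # the DatasetID behind the high-level Dataset

    props = (dsid.dtype, dsid.shape, dsid.rank)

    # A second identifier for the same named dataset.
    other = f["m"].id
    same_object = (dsid == other) and (hash(dsid) == hash(other))
    distinct_ids = dsid.id != other.id  # the underlying hid_t values differ
```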

chunk_iter(CALLABLE func, PropDXID dxpl=None) → <Return value from func>

Iterate over each chunk and invoke the user-supplied callable “func” on it. “func” receives the chunk information: logical offset, filter mask, file location, and size. If “func” returns anything other than None, iteration stops and that value is returned.

Feature requires: HDF5 1.10.10 or any later 1.10 release, or HDF5 1.12.3 or later

Added in version 3.8.

dtype

Numpy dtype object representing the dataset type

extend(TUPLE shape)

Extend the given dataset so it is at least as big as “shape”. Note that a dataset may only be extended up to the maximum dimensions of its dataspace, which are fixed when the dataset is created.

flush()

Immediately flush all buffers associated with the dataset to disk, without removing the data from the cache. There is no return value.

Use this in SWMR write mode so that readers can see the changes made to the dataset.
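A sketch of a SWMR writer that appends in batches and flushes after each one. It assumes a file on a local filesystem (a temporary directory here) created with libver="latest", which SWMR requires, and a chunked dataset with an unlimited axis; the batch size and names are illustrative.

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "swmr_demo.h5")

    with h5py.File(path, "w", libver="latest") as f:
        dset = f.create_dataset("log", shape=(0,), maxshape=(None,), dtype="i4", chunks=(16,))
        f.swmr_mode = True  # from here on, no new objects may be created

        for batch in range(3):
            n = dset.shape[0]
            dset.resize((n + 4,))
            dset[n:] = np.full(4, batch, dtype="i4")
            dset.id.flush()  # make this batch visible to concurrent SWMR readers

    with h5py.File(path, "r") as f:
        final = f["log"][:]
```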

get_access_plist() → PropDAID

Create and return a new copy of the dataset access property list.
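A sketch that opens a dataset with a custom raw-data chunk cache and then inspects the access property list. It assumes a temporary file and that the returned copy reports the cache parameters in use for the dataset; the cache values (1009 slots, 4 MiB, w0 = 0.5) are arbitrary. The dataset is reopened in a fresh file handle so that the custom access list applies to the first open.

```python
import os
import tempfile

import h5py
import numpy as np
from h5py import h5d, h5p

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dapl_demo.h5")
    with h5py.File(path, "w") as f:
        f.create_dataset("c", data=np.zeros(1000, dtype="f4"), chunks=(100,))

    with h5py.File(path, "r") as f:
        requested = h5p.create(h5p.DATASET_ACCESS)
        requested.set_chunk_cache(1009, 4 * 1024 * 1024, 0.5)  # nslots, nbytes, w0
        dsid = h5d.open(f.id, b"c", requested)

        # A fresh copy; modifying it would not affect the open dataset.
        cache = dsid.get_access_plist().get_chunk_cache()
```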

get_chunk_info(hsize_t index, SpaceID space=None) → StoreInfo

Retrieve storage information about a chunk specified by its index.

Added in version 3.0.

get_chunk_info_by_coord(TUPLE chunk_offset) → StoreInfo

Retrieve storage information about the chunk whose first element lies at the coordinates given by chunk_offset (one value per dimension).

Added in version 3.0.
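A sketch comparing the two lookups on a dataset where only one chunk has been written, assuming an in-memory file and the StoreInfo fields (chunk_offset, filter_mask, byte_offset, size) of current h5py releases. With 4x4 chunks of int16, the written chunk is 32 bytes.

```python
import h5py
import numpy as np

with h5py.File("chunkinfo_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("c", shape=(8, 8), dtype="i2", chunks=(4, 4))
    dset[0:4, 4:8] = 1  # allocates only the chunk starting at (0, 4)

    info = dset.id.get_chunk_info(0)                  # by index: first written chunk
    same = dset.id.get_chunk_info_by_coord((0, 4))    # by the chunk's first-element coordinates
```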

get_create_plist() → PropDCID

Create and return a new copy of the dataset creation property list used when this dataset was created.
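A sketch of reading back layout, chunk shape, and filter count from the creation property list, assuming an in-memory file and a gzip-compressed chunked dataset created through the high-level API (which adds only the deflate filter when no other filter options are given).

```python
import h5py
import numpy as np
from h5py import h5d

with h5py.File("dcpl_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("z", data=np.arange(100, dtype="i4"), chunks=(25,), compression="gzip")

    dcpl = dset.id.get_create_plist()
    layout = dcpl.get_layout()      # one of h5d.COMPACT, h5d.CONTIGUOUS, h5d.CHUNKED
    chunks = dcpl.get_chunk()       # chunk shape as a tuple
    nfilters = dcpl.get_nfilters()  # number of filters in the pipeline
```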

get_num_chunks(SpaceID space=None) → INT num_chunks

Retrieve the number of chunks that have a nonempty intersection with the given dataspace. Currently, this function returns the total number of written chunks, regardless of the dataspace.

Added in version 3.0.
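A sketch counting written chunks before and after two partial writes, assuming an in-memory file and a chunked dataset whose chunks are only allocated when first written (HDF5's default incremental allocation for chunked storage).

```python
import h5py

with h5py.File("nchunks_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("c", shape=(100,), dtype="f4", chunks=(10,))
    before = dset.id.get_num_chunks()

    dset[0:30] = 1.0  # touches chunks starting at 0, 10, 20
    dset[95] = 2.0    # touches the chunk starting at 90
    after = dset.id.get_num_chunks()
```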

get_offset() → LONG offset or None

Get the offset of this dataset in the file, in bytes, or None if it doesn’t have one. Datasets that use chunked or compact storage never have an offset, and neither do datasets for which space has not yet been allocated in the file.
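A sketch contrasting the three cases, assuming an in-memory file and HDF5's default late allocation for contiguous datasets (so a contiguous dataset that was never written has no storage yet).

```python
import h5py
import numpy as np

with h5py.File("offset_demo.h5", "w", driver="core", backing_store=False) as f:
    contiguous = f.create_dataset("flat", data=np.arange(10, dtype="i4"))
    chunked = f.create_dataset("tiled", data=np.arange(10, dtype="i4"), chunks=(5,))
    unwritten = f.create_dataset("unwritten", shape=(10,), dtype="i4")

    offsets = (
        contiguous.id.get_offset(),  # byte offset of the raw data block
        chunked.id.get_offset(),     # None: chunked storage
        unwritten.id.get_offset(),   # None: no space allocated yet
    )
```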

get_space() → SpaceID

Create and return a new copy of the dataspace for this dataset.
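A sketch inspecting the dataspace copy of a resizable dataset, assuming an in-memory file. An unlimited maximum dimension is reported as h5s.UNLIMITED.

```python
import h5py
from h5py import h5s

with h5py.File("space_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("s", shape=(2, 3), maxshape=(None, 3), dtype="u1")

    space = dset.id.get_space()  # independent copy of the dataspace
    dims = space.shape
    maxdims = space.get_simple_extent_dims(maxdims=True)
    npoints = space.get_simple_extent_npoints()
```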

get_space_status() → INT space_status_code

Determine whether space has been allocated for the dataset. The return value is one of:

SPACE_STATUS_NOT_ALLOCATED
SPACE_STATUS_PART_ALLOCATED
SPACE_STATUS_ALLOCATED
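A sketch producing all three status codes, assuming an in-memory file, HDF5's default late allocation for contiguous datasets, and incremental allocation for chunked ones.

```python
import h5py
from h5py import h5d

with h5py.File("status_demo.h5", "w", driver="core", backing_store=False) as f:
    # Contiguous: storage is allocated at the first write.
    flat = f.create_dataset("flat", shape=(1000,), dtype="f8")
    status_before = flat.id.get_space_status()
    flat[...] = 1.0
    status_after = flat.id.get_space_status()

    # Chunked: only one of ten chunks is written.
    tiled = f.create_dataset("tiled", shape=(100,), dtype="f8", chunks=(10,))
    tiled[0:10] = 1.0
    status_partial = tiled.id.get_space_status()
```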

get_storage_size() → LONG storage_size

Report the size, in bytes, of the storage allocated in the file for the dataset’s raw data. Associated HDF5 metadata is excluded, and the value typically differs from the space required to hold the dataset in memory.

For contiguous datasets, the returned size equals the current allocated size of the raw data. For unfiltered chunked datasets, the returned size is the number of allocated chunks times the chunk size. For filtered chunked datasets, the returned size is the space required to store the filtered data.
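A sketch of the three cases described above, assuming an in-memory file. The contiguous dataset holds 1000 int32 values (4000 bytes); the unfiltered chunked dataset has 3 of its 100-element chunks written (3 × 400 bytes); the gzip-compressed array of zeros should need far less than its in-memory size.

```python
import h5py
import numpy as np

with h5py.File("storage_demo.h5", "w", driver="core", backing_store=False) as f:
    raw = np.arange(1000, dtype="i4")

    contiguous = f.create_dataset("flat", data=raw)

    chunked = f.create_dataset("tiled", shape=(1000,), dtype="i4", chunks=(100,))
    chunked[0:250] = raw[0:250]  # touches chunks starting at 0, 100, 200

    packed = f.create_dataset("gz", data=np.zeros(1000, dtype="i4"), chunks=(100,), compression="gzip")

    sizes = (
        contiguous.id.get_storage_size(),
        chunked.id.get_storage_size(),
        packed.id.get_storage_size(),
    )
```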

get_type() → TypeID

Create and return a new copy of the datatype for this dataset.
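A sketch inspecting the stored datatype, assuming an in-memory file and that h5py keeps the big-endian byte order of the input array in the file type.

```python
import h5py
import numpy as np
from h5py import h5t

with h5py.File("type_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("t", data=np.zeros(3, dtype=">u2"))  # big-endian uint16

    tid = dset.id.get_type()  # an integer TypeID copy
    size = tid.get_size()
    big_endian = tid.get_order() == h5t.ORDER_BE
    as_dtype = tid.dtype
```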

rank

Integer giving the dataset rank (0 = scalar)

read(SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, TypeID mtype=None, PropDXID dxpl=None)

Read data from an HDF5 dataset into a Numpy array.

It is your responsibility to ensure that the memory dataspace provided is compatible with the shape of the Numpy array. Since a wide variety of dataspace configurations are possible, this is not checked. You can easily crash Python by reading in data from too large a dataspace.

If a memory datatype is not specified, one will be auto-created based on the array’s dtype.

The provided Numpy array must be writable and C-contiguous. If this is not the case, ValueError will be raised and the read will fail. Keyword dxpl may be a dataset transfer property list.
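A sketch of a whole-dataset read and a single-row hyperslab read, assuming an in-memory file. In the second read, the memory dataspace is created by hand so that it matches the shape of the destination array, which read() does not check.

```python
import h5py
import numpy as np
from h5py import h5s

with h5py.File("read_demo.h5", "w", driver="core", backing_store=False) as f:
    dsid = f.create_dataset("r", data=np.arange(12, dtype="i4").reshape(3, 4)).id

    # Whole dataset: h5s.ALL for both the memory and file dataspaces.
    full = np.empty((3, 4), dtype="i4")  # writable and C-contiguous
    dsid.read(h5s.ALL, h5s.ALL, full)

    # Middle row only: hyperslab in the file, matching 1x4 memory space.
    fspace = dsid.get_space()
    fspace.select_hyperslab((1, 0), (1, 4))  # start, count
    mspace = h5s.create_simple((1, 4))
    row = np.empty((1, 4), dtype="i4")
    dsid.read(mspace, fspace, row)
```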

read_direct_chunk(offsets, PropID dxpl=None, out=None)

Read the raw data of the chunk at the position given by offsets, bypassing any filters HDF5 would normally apply when reading. The data is returned exactly as stored, so it may or may not be compressed.

Returns a tuple (filter_mask, data), where data holds the raw chunk contents as bytes if out is None, or as a memoryview otherwise.

filter_mask is a 32-bit field recording which filters of the dataset's filter pipeline were applied to this chunk. Each bit set to 1 means that the filter at the corresponding position in the pipeline was not applied when the raw data was produced, so a value of 0 means that all defined filters were applied.

If out is not None, it must be a writable, contiguous, 1D array-like of bytes (e.g., bytearray or numpy.ndarray) large enough to hold the whole chunk.

refresh()

Clear all buffers associated with the dataset and immediately reload them with updated contents from disk. There is no return value.

This essentially closes the dataset, evicts all metadata associated with it from the cache, and then reopens it. The reopened dataset is automatically re-registered with the same ID.

Use this in SWMR read mode to poll for dataset changes.
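A sketch of the reader side of SWMR, assuming a temporary file written with libver="latest". For brevity the writer finishes before the reader starts; a real reader would call refresh() in a polling loop while a writer is active. The reader stays at the low level so that dsid.shape is re-read from the dataspace after each refresh.

```python
import os
import tempfile

import h5py
import numpy as np
from h5py import h5d, h5s

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "refresh_demo.h5")

    with h5py.File(path, "w", libver="latest") as f:
        f.create_dataset("log", data=np.arange(4, dtype="i4"), maxshape=(None,), chunks=(4,))

    with h5py.File(path, "r", swmr=True) as f:
        dsid = h5d.open(f.id, b"log")
        dsid.refresh()  # reload metadata so appended data becomes visible
        snapshot = np.empty(dsid.shape, dtype="i4")
        dsid.read(h5s.ALL, h5s.ALL, snapshot)
```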

set_extent(TUPLE shape)

Set the size of the dataspace to match the given shape. If the new size is larger in any dimension, it must not exceed the maximum dataspace size in that dimension.
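A sketch growing and then shrinking a dataset whose first axis is unlimited, assuming an in-memory file. The second axis has a fixed maximum of 3, so it must stay at or below that value.

```python
import h5py

with h5py.File("extent_demo.h5", "w", driver="core", backing_store=False) as f:
    dsid = f.create_dataset("grow", shape=(2, 3), maxshape=(None, 3), dtype="i4").id

    dsid.set_extent((5, 3))  # grow along the unlimited first axis
    grown = dsid.shape

    dsid.set_extent((1, 3))  # shrinking is also allowed
    shrunk = dsid.shape
```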

shape

Numpy-style shape tuple representing the dataspace

write(SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, TypeID mtype=None, PropDXID dxpl=None)

Write data from a Numpy array to an HDF5 dataset. Keyword dxpl may be a dataset transfer property list.

It is your responsibility to ensure that the memory dataspace provided is compatible with the shape of the Numpy array. Since a wide variety of dataspace configurations are possible, this is not checked. You can easily crash Python by writing data from too large a dataspace.

If a memory datatype is not specified, one will be auto-created based on the array’s dtype.

The provided Numpy array must be C-contiguous. If this is not the case, ValueError will be raised and the write will fail.
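A sketch overwriting a single row through a hyperslab selection, assuming an in-memory file and a fill value of -1.0 (set here via the high-level fillvalue argument) so that unwritten rows are easy to recognize.

```python
import h5py
import numpy as np
from h5py import h5s

with h5py.File("write_demo.h5", "w", driver="core", backing_store=False) as f:
    dsid = f.create_dataset("w", shape=(3, 4), dtype="f8", fillvalue=-1.0).id

    # Select the middle row in the file and describe a matching 1x4 memory space.
    fspace = dsid.get_space()
    fspace.select_hyperslab((1, 0), (1, 4))
    mspace = h5s.create_simple((1, 4))

    dsid.write(mspace, fspace, np.array([[1.0, 2.0, 3.0, 4.0]]))  # C-contiguous float64
    result = f["w"][:]
```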

write_direct_chunk(offsets, data, uint32_t filter_mask=0x00000000, PropID dxpl=None)

Write data directly to the chunk at the position given by offsets, bypassing any filters HDF5 would normally apply to written data. Calling code may instead apply filters itself (e.g. gzip compression) before writing.

data is any Python object that supports the buffer protocol (Py_buffer). If it is an ndarray, its shape and dtype are ignored; it is the caller's responsibility to make sure they are compatible with the dataset.

filter_mask is a 32-bit field recording which filters of the dataset's filter pipeline have been applied to this chunk. Each bit set to 1 means that the filter at the corresponding position in the pipeline was not applied, so the default value of 0 means that all defined filters were applied to the data before calling this function.

Module constants

Storage strategies

h5py.h5d.COMPACT

h5py.h5d.CONTIGUOUS

h5py.h5d.CHUNKED

Allocation times

h5py.h5d.ALLOC_TIME_DEFAULT

h5py.h5d.ALLOC_TIME_LATE

h5py.h5d.ALLOC_TIME_EARLY

h5py.h5d.ALLOC_TIME_INCR

Allocation status

h5py.h5d.SPACE_STATUS_NOT_ALLOCATED

h5py.h5d.SPACE_STATUS_PART_ALLOCATED

h5py.h5d.SPACE_STATUS_ALLOCATED

Fill time

h5py.h5d.FILL_TIME_ALLOC

h5py.h5d.FILL_TIME_NEVER

h5py.h5d.FILL_TIME_IFSET

Fill values

h5py.h5d.FILL_VALUE_UNDEFINED

h5py.h5d.FILL_VALUE_DEFAULT

h5py.h5d.FILL_VALUE_USER_DEFINED
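A sketch of where these constants typically appear: they are passed to and returned from dataset creation property list methods (h5py.h5p) and dataset methods. It assumes an in-memory file and a contiguous dataset created with early allocation, so storage should exist immediately after creation.

```python
import h5py
from h5py import h5d, h5p, h5s, h5t

with h5py.File("consts_demo.h5", "w", driver="core", backing_store=False) as f:
    dcpl = h5p.create(h5p.DATASET_CREATE)
    dcpl.set_alloc_time(h5d.ALLOC_TIME_EARLY)  # allocate storage at creation
    dcpl.set_fill_time(h5d.FILL_TIME_NEVER)    # never write fill values

    dsid = h5d.create(f.id, b"early", h5t.NATIVE_INT32, h5s.create_simple((10,)), dcpl=dcpl)

    plist = dsid.get_create_plist()
    checks = (
        plist.get_layout() == h5d.CONTIGUOUS,
        plist.get_alloc_time() == h5d.ALLOC_TIME_EARLY,
        plist.fill_value_defined() == h5d.FILL_VALUE_DEFAULT,
        dsid.get_space_status() == h5d.SPACE_STATUS_ALLOCATED,
    )
```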
