
How To Save H5py Arrays With Different Sizes?

With this question I am referring to this one. I am making this new thread because I did not really understand the answer given there, and I hope someone can explain it in more detail.

Solution 1:

The basic elements in an HDF5 file are groups (similar to directories) and datasets (similar to arrays).

NumPy will build an array from many different kinds of input. When you try to create an array from elements of different lengths, NumPy falls back to an array of dtype 'O' (see object_ in the NumPy reference guide). At that point there is little advantage to using NumPy, because such an array behaves much like a plain Python list.

HDF5 cannot store arrays of dtype 'O' because it has no generic object datatype (it only offers limited support for C-struct-like compound types).
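You can see both points in a quick sketch (the file name ragged_demo.h5 is just a throwaway example here):

import numpy as np
import h5py

# Two rows of different length: NumPy falls back to an object ('O') array.
# On newer NumPy releases you must request this explicitly with dtype=object,
# otherwise the inhomogeneous shape raises a ValueError.
ragged = np.array([[1.0, 2.0, 3.0], [4.0, 5.0]], dtype=object)
print(ragged.dtype)  # object

# h5py refuses to write it, since dtype('O') has no native HDF5 equivalent.
with h5py.File("ragged_demo.h5", "w") as h5f:
    try:
        h5f.create_dataset("ragged", data=ragged)
    except TypeError as exc:
        print(exc)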

The most obvious solution to your problem is to store your data as one HDF5 dataset per table. You keep the advantage of collecting everything in a single file, and you get dict-like access to the elements.

Try the following code:

import numpy as np
import h5py
import glob
import os

path = "/home/ling/test/"

def runtest():
    h5f = h5py.File("/home/ling/test/2test.h5", "w")
    h5f.create_group('data1')
    h5f.create_group('data2')

    # One dataset per CSV file, named after the file relative to `path`
    # with the .csv extension stripped, so it lands inside the matching
    # group, e.g. 'data1/somefile'.
    # Note: np.loadtxt splits on whitespace by default; pass delimiter=','
    # if your files are truly comma-separated.
    for fname in glob.glob(path + "data1/*.csv"):
        h5f.create_dataset(os.path.relpath(fname, path)[:-4], data=np.loadtxt(fname))
    for fname in glob.glob(path + "data2/*.csv"):
        h5f.create_dataset(os.path.relpath(fname, path)[:-4], data=np.loadtxt(fname))

    h5f.close()

For reading:

h5f = h5py.File("/home/ling/test/2test.h5", "r")
test_data = h5f['data1/thefirstfilenamewithoutcsvextension'][:]
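The trailing [:] reads the whole dataset back into memory as a NumPy array. If you do not know the dataset names in advance, you can also iterate over a group; a minimal sketch, assuming the file written by runtest() above:

import h5py

# Re-open the file and walk the 'data1' group; each key is one of the
# original CSV file names (without the .csv extension).
with h5py.File("/home/ling/test/2test.h5", "r") as h5f:
    for name, dset in h5f["data1"].items():
        arr = dset[:]  # read this dataset into a NumPy array
        print(name, arr.shape)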
