Numpy.concatenate On Record Arrays Fails When Array Has Different Length Strings
Solution 1:
To post a complete answer: as Pierre GM suggested, the module
import numpy.lib.recfunctions
gives a solution. The function that does what you want, however, is:
numpy.lib.recfunctions.stack_arrays((a,b), autoconvert=True, usemask=False)
(usemask=False just avoids creating a masked array, which you are probably not using. The important part is autoconvert=True, which forces a's string dtype "|S3" to be converted to "|S5".)
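A runnable version of that call under Python 3, as a minimal sketch: the sample arrays are assumptions modelled on the ones in the question, and byte strings (b"...") are used so the string fields get the same "|S" dtypes as in the original Python 2 output.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Sample data (assumed): the f1 fields come out as S3 and S5 respectively.
a = np.rec.fromarrays(([1, 2], [b"one", b"two"]))
b = np.rec.fromarrays(([3, 4, 5], [b"three", b"four", b"three"]))

# autoconvert=True widens f1 to S5 instead of raising on the dtype mismatch;
# usemask=False returns a plain ndarray rather than a masked array.
stacked = rfn.stack_arrays((a, b), autoconvert=True, usemask=False)

print(stacked.dtype["f1"])  # |S5
print(stacked["f1"].tolist())  # [b'one', b'two', b'three', b'four', b'three']
```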
Solution 2:
Would numpy.lib.recfunctions.merge_arrays work for you? recfunctions is a little-known subpackage that hasn't been advertised much; it's a bit clunky but can be useful at times.
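Note that merge_arrays merges arrays field by field: it lines its inputs up side by side as the fields of one structured array rather than appending rows. A minimal sketch with two assumed plain input columns:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Two plain (non-structured) columns, assumed for illustration.
ids = np.array([1, 2])
labels = np.array(["one", "two"])

# Each input becomes one field (f0, f1) of the merged records.
merged = rfn.merge_arrays((ids, labels), usemask=False)

print(merged.dtype.names)  # ('f0', 'f1')
print(merged.shape)  # (2,)
```

Because it combines columns rather than rows, appending the rows of two record arrays is the job of the stack_arrays call from Solution 1.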
Solution 3:
When you do not specify the dtype, np.rec.fromarrays (aka np.core.records.fromarrays) tries to guess the dtype for you. Hence:
In [4]: a = np.core.records.fromarrays( ([1,2], ["one","two"]) )
In [5]: a
Out[5]:
rec.array([(1, 'one'), (2, 'two')],
dtype=[('f0', '<i4'), ('f1', '|S3')])
Notice that the dtype of the f1 column is a 3-byte string.
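To see the guessing at work under Python 3, a minimal sketch (sample data assumed; byte strings stand in for the Python 2 str used in the transcript, so the fields get "|S" rather than "<U" dtypes). Each inferred width is the length of the longest string in that input:

```python
import numpy as np

a = np.rec.fromarrays(([1, 2], [b"one", b"two"]))
b = np.rec.fromarrays(([3, 4, 5], [b"three", b"four", b"three"]))

# fromarrays sizes each string field to the longest value it was given.
print(a.dtype["f1"])  # |S3
print(b.dtype["f1"])  # |S5
print(a.dtype == b.dtype)  # False
```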
You can't simply call np.concatenate( (a,b) ) (where b holds longer strings such as "three") because numpy sees that the dtypes of a and b differ and doesn't widen the shorter string field to match the longer one.
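Since numpy won't widen the shorter string field on its own, one option is to do the widening yourself before concatenating. A minimal sketch, assuming both arrays have the same field names in the same order; common_dtype is a hypothetical helper, not a NumPy function, built on np.promote_types. (Recent NumPy releases also promote matching structured dtypes field by field inside np.concatenate, so the plain call may succeed there; the explicit cast works on older and newer versions alike.)

```python
import numpy as np

def common_dtype(x, y):
    # Hypothetical helper: assumes x and y share field names and order,
    # and promotes each field to a type that can hold values from both.
    return np.dtype([
        (name, np.promote_types(x.dtype[name], y.dtype[name]))
        for name in x.dtype.names
    ])

a = np.rec.fromarrays(([1, 2], [b"one", b"two"]))
b = np.rec.fromarrays(([3, 4, 5], [b"three", b"four", b"three"]))

target = common_dtype(a, b)  # f1 becomes S5, wide enough for b"three"
combined = np.concatenate((a.astype(target), b.astype(target)))
```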
If you know a maximum string size that would work with all your arrays, you could specify the dtype from the beginning:
In [9]: a = np.rec.fromarrays( ([1,2], ["one","two"]), dtype = [('f0', '<i4'), ('f1', '|S8')])
In [10]: b = np.core.records.fromarrays( ([3,4,5], ["three","four","three"]), dtype = [('f0', '<i4'), ('f1', '|S8')])
and then concatenation will work as desired:
In [11]: np.concatenate( (a,b))
Out[11]:
array([(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'three')],
dtype=[('f0', '<i4'), ('f1', '|S8')])
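The same idea as a runnable Python 3 sketch: one shared dtype (RECORD_DTYPE is a hypothetical name) passed to every fromarrays call. The catch, and the reason the maximum has to be known in advance, is that values longer than the chosen width are truncated silently rather than rejected.

```python
import numpy as np

# Hypothetical shared dtype; 8 bytes must cover the longest string you expect.
RECORD_DTYPE = [("f0", "<i4"), ("f1", "S8")]

a = np.rec.fromarrays(([1, 2], [b"one", b"two"]), dtype=RECORD_DTYPE)
b = np.rec.fromarrays(([3, 4, 5], [b"three", b"four", b"three"]), dtype=RECORD_DTYPE)

# Identical dtypes, so concatenation needs no conversion.
combined = np.concatenate((a, b))

# A longer value is cut to 8 bytes without any error.
c = np.rec.fromarrays(([6], [b"seventeen"]), dtype=RECORD_DTYPE)
print(c["f1"][0])  # b'seventee'
```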
If you do not know in advance the maximum length of the strings, you could specify the dtype as 'object':
In [35]: a = np.core.records.fromarrays( ([1,2], ["one","two"]), dtype = [('f0', '<i4'), ('f1', 'object')])
In [36]: b = np.core.records.fromarrays( ([3,4,5], ["three","four","three"]), dtype = [('f0', '<i4'), ('f1', 'object')])
In [37]: np.concatenate( (a,b))
Out[37]:
array([(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'three')],
dtype=[('f0', '<i4'), ('f1', '|O4')])
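The object-dtype variant as a minimal Python 3 sketch (sample data assumed, including a longer value to show that nothing is truncated). Each f1 entry holds a reference to an ordinary Python string, so no width has to be chosen up front; on current builds the dtype shows as "O" rather than the 32-bit "|O4" above.

```python
import numpy as np

OBJ_DTYPE = [("f0", "<i4"), ("f1", "object")]

a = np.rec.fromarrays(([1, 2], ["one", "two"]), dtype=OBJ_DTYPE)
b = np.rec.fromarrays(([3, 4, 5], ["three", "four", "seventeen"]), dtype=OBJ_DTYPE)

# Identical dtypes, so np.concatenate has nothing to reconcile.
combined = np.concatenate((a, b))
print(combined["f1"].tolist())
```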
This will not be as space-efficient as a dtype of '|Sn' (for some integer n), but at least it will allow you to perform the concatenate operation.
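If the space cost matters, one possible follow-up (an assumption, not part of the original answers) is to concatenate with object fields and then, once every string is known, cast back to a fixed width equal to the longest value. A minimal sketch; "U" is used because the sample values are Python 3 str:

```python
import numpy as np

OBJ_DTYPE = [("f0", "<i4"), ("f1", "object")]
a = np.rec.fromarrays(([1, 2], ["one", "two"]), dtype=OBJ_DTYPE)
b = np.rec.fromarrays(([3, 4, 5], ["three", "four", "three"]), dtype=OBJ_DTYPE)
combined = np.concatenate((a, b))

# Width of the longest string actually present, known only after concatenating.
width = max(len(s) for s in combined["f1"])

# Cast the object field back to a fixed-width unicode field of that width.
compact = combined.astype([("f0", "<i4"), ("f1", f"U{width}")])
print(compact.dtype["f1"])  # <U5
```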