Arrays Of Strings From Uproot
Solution 1:
String-handling is a weak point in uproot. It uses a custom ObjectArray (not even the StringArray in awkward-array), which generates bytes objects on demand. What you'd like is an array-of-strings class with == overloaded to mean "compare each variable-length string, broadcasting a single string to an array if necessary." Unfortunately, neither uproot's ObjectArray of strings nor awkward-array's StringArray class does that yet.
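To make the wished-for behavior concrete, here is a minimal sketch of such a class in plain NumPy. StringColumn is a hypothetical name, not part of uproot or awkward-array. It assumes the strings are stored as one flat byte buffer plus an offsets array (similar in spirit to how a jagged array stores them), and it only handles comparison against a single bytes value.

```python
import numpy as np


class StringColumn:
    """Hypothetical array-of-strings class (not part of uproot or awkward-array).

    Variable-length byte strings are kept as one flat uint8 buffer plus an
    offsets array, so string i is content[offsets[i]:offsets[i + 1]].
    """

    def __init__(self, strings):
        strings = list(strings)
        self.offsets = np.cumsum([0] + [len(s) for s in strings])
        self.content = np.array(bytearray(b"".join(strings)), dtype=np.uint8)

    def __len__(self):
        return len(self.offsets) - 1

    def __eq__(self, other):
        # Only "compare against one bytes string" is sketched here.
        if not isinstance(other, bytes):
            return NotImplemented
        starts = self.offsets[:-1]
        lengths = self.offsets[1:] - starts
        # A string can only match if it has the same length as `other`.
        result = lengths == len(other)
        candidates = np.nonzero(result)[0]
        if len(other) > 0 and len(candidates) > 0:
            target = np.array(bytearray(other), dtype=np.uint8)
            # Gather the bytes of every same-length candidate in one step:
            # row k holds the len(other) bytes starting at that string's offset.
            idx = starts[candidates, np.newaxis] + np.arange(len(other))
            result[candidates] = (self.content[idx] == target).all(axis=1)
        return result


col = StringColumn([b"hey-0", b"hey-1", b"hey-10", b"hey-0"])
print(col == b"hey-0")  # elementwise boolean mask, one entry per string
```

Checking lengths first means only same-length candidates have their bytes gathered, and no per-string Python loop runs at comparison time (the constructor still loops once to build the buffer).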
So here's how you can do it, admittedly through an implicit Python for loop.
>>> import uproot, numpy
>>> f = uproot.open("http://scikit-hep.org/uproot/examples/sample-6.10.05-zlib.root")
>>> t = f["sample"]
>>> t["str"].array()
<ObjectArray [b'hey-0' b'hey-1' b'hey-2' ... b'hey-27' b'hey-28' b'hey-29'] at 0x7fe835b54588>
>>> numpy.array(list(t["str"].array()))
array([b'hey-0', b'hey-1', b'hey-2', b'hey-3', b'hey-4', b'hey-5',
       b'hey-6', b'hey-7', b'hey-8', b'hey-9', b'hey-10', b'hey-11',
       b'hey-12', b'hey-13', b'hey-14', b'hey-15', b'hey-16', b'hey-17',
       b'hey-18', b'hey-19', b'hey-20', b'hey-21', b'hey-22', b'hey-23',
       b'hey-24', b'hey-25', b'hey-26', b'hey-27', b'hey-28', b'hey-29'],
      dtype='|S6')
>>> numpy.array(list(t["str"].array())) == b"hey-0"
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])
The loop is implicit in the list constructor, which iterates over the ObjectArray and turns each element into a bytes string. A Python list is not good for array-at-a-time operations, so we then construct a NumPy array, which is, at the cost of padding: NumPy gives every element the width of the longest string (dtype '|S6' here) and fills shorter strings out with null bytes.
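Here is the same list-then-array route in a self-contained form, so the padding is visible. It doesn't open the ROOT file: a generator of bytes objects stands in for the ObjectArray (an assumption for illustration), producing the same values hey-0 through hey-29.

```python
import numpy as np

# Stand-in for the uproot ObjectArray (assumption: no ROOT file is opened here);
# any iterable that yields bytes objects behaves the same way with list().
branch_values = (b"hey-%d" % i for i in range(30))

# list() is where the implicit Python loop happens: one bytes object per entry.
as_list = list(branch_values)

# NumPy picks a fixed-width bytes dtype wide enough for the longest string
# (b"hey-29" is 6 bytes), so shorter strings are padded with null bytes.
as_array = np.array(as_list)
print(as_array.dtype)            # fixed-width bytes dtype
print(as_array.tobytes()[:6])    # first slot, including its null padding

# The comparison broadcasts the single string against every element.
mask = as_array == b"hey-0"
print(np.nonzero(mask)[0])       # indices of the matching entries
```

NumPy ignores the trailing null bytes when comparing fixed-width bytes, which is why b"hey-0" stored in a six-byte slot still compares equal to b"hey-0".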
Alternative, probably better:
While writing this, I remembered that uproot's ObjectArray is implemented using an awkward JaggedArray, so the transformation above can be performed with JaggedArray's regular method, which is probably much faster (no intermediate Python bytes objects, no Python for loop).
>>> t["str"].array().regular()
array([b'hey-0', b'hey-1', b'hey-2', b'hey-3', b'hey-4', b'hey-5',
       b'hey-6', b'hey-7', b'hey-8', b'hey-9', b'hey-10', b'hey-11',
       b'hey-12', b'hey-13', b'hey-14', b'hey-15', b'hey-16', b'hey-17',
       b'hey-18', b'hey-19', b'hey-20', b'hey-21', b'hey-22', b'hey-23',
       b'hey-24', b'hey-25', b'hey-26', b'hey-27', b'hey-28', b'hey-29'],
      dtype=object)
>>> t["str"].array().regular() == b"hey-0"
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])
(The functionality described above wasn't created intentionally, but it works because the right pieces compose in a fortuitous way.)
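For intuition about why a jagged representation lets you skip the per-string Python objects, here is a sketch of the general technique: padding a jagged byte buffer (flat content plus offsets) into a rectangular table with vectorized NumPy indexing. This is not uproot's or awkward-array's actual code, and pad_jagged_bytes is a hypothetical helper. Note also that it returns a fixed-width bytes array, whereas regular() in the session above returned dtype=object.

```python
import numpy as np


def pad_jagged_bytes(content, offsets):
    """Hypothetical helper: pad a jagged byte buffer into a fixed-width bytes array.

    content: flat uint8 array holding all strings back to back.
    offsets: string i is content[offsets[i]:offsets[i + 1]].
    No per-string Python loop runs; all work is done by NumPy indexing.
    """
    content = np.asarray(content, dtype=np.uint8)
    offsets = np.asarray(offsets)
    starts = offsets[:-1]
    lengths = offsets[1:] - starts
    # Width of the longest string (at least 1, to avoid a zero-width dtype).
    width = max(int(lengths.max(initial=0)), 1)
    cols = np.arange(width)
    # True where column c lies inside string i; False marks padding cells.
    inside = cols[np.newaxis, :] < lengths[:, np.newaxis]
    padded = np.zeros((len(lengths), width), dtype=np.uint8)
    # Gather every real byte from the flat buffer in one fancy-indexing step.
    padded[inside] = content[(starts[:, np.newaxis] + cols[np.newaxis, :])[inside]]
    # Reinterpret each row of `width` bytes as one fixed-width bytes string.
    return padded.view(f"S{width}").ravel()


strings = [b"hey-0", b"hey-1", b"hey-10"]
content = np.array(bytearray(b"".join(strings)), dtype=np.uint8)
offsets = np.cumsum([0] + [len(s) for s in strings])
fixed_width = pad_jagged_bytes(content, offsets)
print(fixed_width == b"hey-0")
```

The design choice here is to build one boolean mask for "real byte vs. padding" and do a single gather, so the cost scales with the total number of bytes rather than with a Python-level iteration per string.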