
Arrays Of Strings From Uproot

I have a tree with one branch storing a string. When I read using uproot.open() and then the method arrays() I get the following: >>> array_train['backtracked_end_process'

Solution 1:

String handling is a weak point in uproot. It uses a custom ObjectArray (not even the StringArray in awkward-array), which generates bytes objects on demand. What you'd like is an array-of-strings class with == overloaded to mean "compare each variable-length string, broadcasting a single string to an array if necessary." Unfortunately, neither the uproot ObjectArray of strings nor the StringArray class in awkward-array does that yet.

So here's how you can do it, admittedly through an implicit Python for loop.

>>> import uproot, numpy
>>> f = uproot.open("http://scikit-hep.org/uproot/examples/sample-6.10.05-zlib.root")
>>> t = f["sample"]

>>> t["str"].array()
<ObjectArray [b'hey-0' b'hey-1' b'hey-2' ... b'hey-27' b'hey-28' b'hey-29'] at 0x7fe835b54588>

>>> numpy.array(list(t["str"].array()))
array([b'hey-0', b'hey-1', b'hey-2', b'hey-3', b'hey-4', b'hey-5',
       b'hey-6', b'hey-7', b'hey-8', b'hey-9', b'hey-10', b'hey-11',
       b'hey-12', b'hey-13', b'hey-14', b'hey-15', b'hey-16', b'hey-17',
       b'hey-18', b'hey-19', b'hey-20', b'hey-21', b'hey-22', b'hey-23',
       b'hey-24', b'hey-25', b'hey-26', b'hey-27', b'hey-28', b'hey-29'],
      dtype='|S6')

>>> numpy.array(list(t["str"].array())) == b"hey-0"
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])

The loop is implicit in the list constructor, which iterates over the ObjectArray and turns each element into a bytes object. A Python list is not suited to array-at-a-time operations, so we then construct a NumPy array, which is (at the cost of padding every string to the length of the longest one).
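To make the padding cost concrete, here is a minimal sketch that needs only NumPy. The list `values` is a hypothetical stand-in for the bytes objects read from the "str" branch, so nothing here touches a ROOT file.

```python
import numpy

# Hypothetical stand-in for the bytes objects read from the "str" branch.
values = [("hey-%d" % i).encode("ascii") for i in range(30)]

# NumPy picks a fixed-width bytes dtype wide enough for the longest entry,
# so shorter entries such as b"hey-0" are padded to that width in memory.
fixed = numpy.array(values)
print(fixed.dtype, fixed.itemsize)

# The comparison broadcasts the single bytes value across the whole array.
mask = fixed == b"hey-0"
print(numpy.flatnonzero(mask))
```

If one string in the branch is much longer than the rest, every element pays for that width, which is the trade-off mentioned above.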

Alternative, probably better:

While writing this, I remembered that uproot's ObjectArray is implemented using an awkward JaggedArray, so the transformation above can be performed with JaggedArray's regular method, which is probably much faster (no intermediate Python bytes objects, no Python for loop).

>>> t["str"].array().regular()
array([b'hey-0', b'hey-1', b'hey-2', b'hey-3', b'hey-4', b'hey-5',
       b'hey-6', b'hey-7', b'hey-8', b'hey-9', b'hey-10', b'hey-11',
       b'hey-12', b'hey-13', b'hey-14', b'hey-15', b'hey-16', b'hey-17',
       b'hey-18', b'hey-19', b'hey-20', b'hey-21', b'hey-22', b'hey-23',
       b'hey-24', b'hey-25', b'hey-26', b'hey-27', b'hey-28', b'hey-29'],
      dtype=object)

>>> t["str"].array().regular() == b"hey-0"
array([ True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False])
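Note that the output above reports dtype=object rather than a fixed-width bytes dtype. The following sketch, which uses a small hand-built array as a hypothetical stand-in for the regular() result, suggests why the comparison still works element by element and how such an array could be converted to a fixed-width one if needed.

```python
import numpy

# Hypothetical object-dtype array, mimicking the shape of the regular() result.
as_objects = numpy.array([b"hey-0", b"hey-1", b"hey-10"], dtype=object)

# With dtype=object, == is applied to each stored bytes object in turn.
mask = as_objects == b"hey-1"
print(mask)

# Optional conversion to a fixed-width bytes dtype (assumes every entry is bytes).
fixed = as_objects.astype("S")
print(fixed.dtype)
```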

(The functionality described above wasn't created intentionally, but it works because the right pieces compose in a fortuitous way.)
