
NumPy: Using Loadtxt Or Genfromtxt To Read A Ragged Structure

I need to read an ASCII file into Python, where an excerpt of the file looks like this: E M S T N... ... 9998 1 1 128 10097 10098 10199 10198 20298 20299 20400 20399 9999 1

Solution 1:

As far as I know, you do need a custom "split-cast" for loop.

NumPy can read nested structures, but only when every row has the same fixed shape, as in

import numpy
numpy.loadtxt('data.txt', dtype=[ ('time', numpy.uint64), ('pos', [('x', float), ('y', float)]) ])  # the np.float alias was removed from NumPy; the builtin float is equivalent

When trying to read your data with the dtype that you need, NumPy only reads the first number of each tuple:

dt = [('E', '<i4'), ('M', '<i4'), ('S', '<i4'), ('T', '<i4'), ('N', 'O')]  # 'O' is the platform-independent object dtype
print(numpy.loadtxt('data.txt', dtype=dt))

thus prints

[(9998, 1, 1, 128, '10097')
 (9999, 1, 1, 128, '10098')
 (10000, 1, 1, 128, '10099')…]

So, I would say go ahead and use a for loop instead of numpy.loadtxt().
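A minimal sketch of such a "split-cast" loop follows. The function name parse_ragged_lines is hypothetical, and the sketch assumes that any header line has already been skipped and that every data line holds at least four whitespace-separated integers (a shorter line raises ValueError). The sample rows are taken from the expected output in this answer.

```python
def parse_ragged_lines(lines):
    """Split each line on whitespace and cast every token to int.

    Returns a list of (E, M, S, T, N) tuples, where N is a tuple holding
    however many trailing integers the line contains.
    """
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = [int(token) for token in fields]
        # The first four columns are fixed; a shorter line raises ValueError here.
        e, m, s, t = values[:4]
        records.append((e, m, s, t, tuple(values[4:])))
    return records


# Two rows of different lengths, as in the ragged file.
sample = [
    "9998 1 1 128 10097 10098 10199 10198 20298 20299 20400 20399",
    "10001 1 2 44 2071 2172 12373 12272",
]
records = parse_ragged_lines(sample)
```

To read from disk, an open file object can be passed directly, since iterating over it yields one line at a time. The result is a plain Python list, so the ragged 'N' tuples need no fixed shape.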

An intermediate approach might be faster: let NumPy load the file with the numpy.loadtxt() call and dtype shown earlier, then manually "correct" the 'N' field:

dt = [('E', '<i4'), ('M', '<i4'), ('S', '<i4'), ('T', '<i4'), ('N', 'O')]  # 'O' is the platform-independent object dtype
arr = numpy.loadtxt('data.txt', dtype=dt)  # Correctly reads the first 4 columns

with open('data.txt') as input_file:
    for line_num, line in enumerate(input_file):
        # Overwrite the 'N' field with every integer after the fourth column
        arr[line_num]['N'] = tuple(int(x) for x in line.split()[4:])

Here NumPy still parses the four fixed columns, and only the ragged column is handled in Python. This produces the result you want:

[(9998, 1, 1, 128, (10097, 10098, 10199, 10198, 20298, 20299, 20400, 20399))
 (9999, 1, 1, 128, (10098, 10099, 10200, 10199, 20299, 20300, 20401, 20400))
 (10000, 1, 1, 128, (10099, 10100, 10201, 10200, 20300, 20301, 20402, 20401))
 (10001, 1, 2, 44, (2071, 2172, 12373, 12272))
 (10002, 1, 2, 44, (2172, 2273, 12474, 1237))]
