
Pandas Read_csv Ignore Separator In Last Column

I have a file with the following structure (first row is the header, filename is test.dat):

ID_OBS LAT LON ALT TP TO LT_min LT_max STATIONNAME
ALT_NOA_000 82.45

Solution 1:

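I often approach problems like this by writing my own little parser. There are usually ways to bend pandas to your will, but I find a small parser is often easier. Here the line is split on whitespace at most 8 times, so everything after the eighth separator stays together as the last column: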

Code:

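import re
import pandas as pd

def parse_my_file(filename):
    with open(filename) as f:
        for line in f:
            # split on whitespace at most 8 times, so the 9th field
            # (STATIONNAME) keeps its embedded spaces
            yield re.split(r'\s+', line.strip(), maxsplit=8)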

# build the generator        
my_parser = parse_my_file('test.dat')

# first element returned is the columns
columns = next(my_parser)

# build the data frame
df = pd.DataFrame(my_parser, columns=columns)
print(df)

Results:

        ID_OBS    LAT     LON    ALT  TP TO LT_min LT_max  \
0  ALT_NOA_000  82.45  -62.52  210.0  FM  0    0.0   24.0   
              STATIONNAME  
0  Alert, Nunavut, Canada 
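
To see why the split limit matters, here is a minimal sketch that splits one data row with and without it. The row is assembled from the values shown in the results above; the variable names are only for illustration.

```python
import re

# Sample row rebuilt from the values in the results above.
line = "ALT_NOA_000 82.45 -62.52 210.0 FM 0 0.0 24.0 Alert, Nunavut, Canada"

# No limit: the station name is broken up at every space.
unlimited = re.split(r'\s+', line.strip())

# At most 8 splits: 9 fields, the last one holding the rest of the line.
limited = re.split(r'\s+', line.strip(), maxsplit=8)

print(len(unlimited))  # 11
print(len(limited))    # 9
print(limited[-1])     # Alert, Nunavut, Canada
```

The limit is the number of columns minus one, so it would need adjusting for a file with a different number of columns.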

Solution 2:

Your pasted sample file is a bit ambiguous: it's not possible to tell by eye if something that looks like a few spaces is a tab or not, for example.

In general, though, note that plain old Python is more expressive than Pandas or the CSV modules (Pandas's strength is elsewhere). E.g., there are even Python modules for recursive-descent parsers, which Pandas obviously lacks. You can use regular Python to manipulate the file into an easier form for Pandas to parse. For example:

>>> import re
>>> ['@'.join(re.split(r'[ \t]+', l.strip(), maxsplit=8)) for l in open('stuff.tsv') if l.strip()]
['ID_OBS@LAT@LON@ALT@TP@TO@LT_min@LT_max@STATIONNAME',
 'ALT_NOA_000@82.45@-62.52@210.0@FM@0@0.0@24.0@Alert, Nunavut, Canada']

This changes the delimiter to '@'. If you write the result back to a file, for example, you can then parse it using delimiter='@'.
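
As a rough sketch of the full round trip, the rewritten lines can also be handed to pandas through an in-memory buffer instead of a file. The sample text below is assembled from the values shown in this thread, and it assumes '@' never occurs in the data itself; names such as `raw` and `buffer` are illustrative only.

```python
import io
import re

import pandas as pd

# Sample content rebuilt from the values shown above (an assumption about
# the real file, which may use tabs or several spaces between fields).
raw = (
    "ID_OBS LAT LON ALT TP TO LT_min LT_max STATIONNAME\n"
    "ALT_NOA_000 82.45 -62.52 210.0 FM 0 0.0 24.0 Alert, Nunavut, Canada\n"
)

# Re-join the first 9 fields of every non-empty line with '@'.
lines = [
    '@'.join(re.split(r'[ \t]+', l.strip(), maxsplit=8))
    for l in raw.splitlines()
    if l.strip()
]

# Let pandas parse the rewritten text with the new delimiter.
buffer = io.StringIO('\n'.join(lines))
df = pd.read_csv(buffer, delimiter='@')

print(df.shape)
print(df.loc[0, 'STATIONNAME'])
```

Unlike the hand-written parser in Solution 1, this route lets read_csv infer the column types, so numeric columns such as LAT come back as numbers rather than strings.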
