Pandas Read_csv Ignore Separator In Last Column
Solution 1:
I often approach these by writing my own little parser. In general there are ways to bend pandas to your will, but I find this way is often easier:
Code:
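import re
import pandas as pd

def parse_my_file(filename):
    with open(filename) as f:
        for line in f:
            # split on whitespace at most 8 times, so the 9th field keeps its spaces
            yield re.split(r'\s+', line.strip(), maxsplit=8)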
# build the generator
my_parser = parse_my_file('test.dat')
# first element returned is the columns
columns = next(my_parser)
# build the data frame
df = pd.DataFrame(my_parser, columns=columns)
print(df)
Results:
        ID_OBS    LAT     LON    ALT  TP TO LT_min LT_max  \
0  ALT_NOA_000  82.45  -62.52  210.0  FM  0    0.0   24.0

              STATIONNAME
0  Alert, Nunavut, Canada
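As for bending pandas itself to the task, here is one possible sketch, not part of the original answer. It leans on Series.str.split with a split limit, so that only the first eight runs of whitespace act as separators. The sample lines are copied from the output above; the list of numeric columns is an assumption made for illustration.

```python
import pandas as pd

# Sample lines taken from the output shown above; for a real file you would
# read them with something like open(path).read().splitlines().
lines = [
    "ID_OBS LAT LON ALT TP TO LT_min LT_max STATIONNAME",
    "ALT_NOA_000 82.45 -62.52 210.0 FM 0 0.0 24.0 Alert, Nunavut, Canada",
]

# Split each line on whitespace at most 8 times; the 9th piece keeps its spaces.
parts = pd.Series(lines).str.split(n=8, expand=True)

# The first row holds the column names, the remaining rows are data.
df = parts.iloc[1:].reset_index(drop=True)
df.columns = parts.iloc[0].tolist()

# Every field is still a string at this point. Converting these columns to
# numbers is an assumption about what the data means, not something the
# file format guarantees.
for col in ["LAT", "LON", "ALT", "LT_min", "LT_max"]:
    df[col] = pd.to_numeric(df[col])

print(df)
```

The same caveat applies to the generator version: everything comes back as strings, so a conversion step like the loop above is needed if you want numeric columns.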
Solution 2:
Your pasted sample file is a bit ambiguous: it's not possible to tell by eye if something that looks like a few spaces is a tab or not, for example.
In general, though, note that plain old Python is more expressive than Pandas or the CSV modules (Pandas's strength lies elsewhere). For instance, there are Python modules for recursive-descent parsers, which Pandas lacks. You can use regular Python to manipulate the file into a form that is easier for Pandas to parse. For example:
>>> import re
>>> ['@'.join(re.split(r'[ \t]+', l.strip(), maxsplit=8)) for l in open('stuff.tsv') if l.strip()]
['ID_OBS@LAT@LON@ALT@TP@TO@LT_min@LT_max@STATIONNAME',
'ALT_NOA_000@82.45@-62.52@210.0@FM@0@0.0@24.0@Alert, Nunavut, Canada']
This changes the delimiter to '@'. If you write the result back to a file, for example, you can then parse it using delimiter='@'.
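To make the round trip concrete, here is a minimal sketch that feeds the '@'-delimited text to read_csv. An in-memory io.StringIO buffer stands in for the rewritten file, and the sample data is the two lines shown above. It assumes '@' never occurs in the data itself.

```python
import io
import re

import pandas as pd

# Two sample lines standing in for the contents of 'stuff.tsv'.
raw = (
    "ID_OBS LAT LON ALT TP TO LT_min LT_max STATIONNAME\n"
    "ALT_NOA_000 82.45 -62.52 210.0 FM 0 0.0 24.0 Alert, Nunavut, Canada\n"
)

# Replace only the first 8 runs of spaces/tabs on each line with '@',
# skipping blank lines. Assumes '@' does not appear anywhere in the data.
rewritten = "\n".join(
    "@".join(re.split(r"[ \t]+", line.strip(), maxsplit=8))
    for line in raw.splitlines()
    if line.strip()
)

# StringIO stands in for a file written to disk; a path would work the same way.
df = pd.read_csv(io.StringIO(rewritten), delimiter="@")
print(df)
print(df.dtypes)
```

One side effect of going through read_csv is that it infers column types, so the numeric columns do not need a separate conversion step.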