Pandas, Turn List Of Lists Of Tuples Into Dataframe Awkward Column Headers.
I have data from parsed addresses that I obtained from the usaddress Python library: https://github.com/datamade/usaddress The data is a list of lists of tuples. Each address has a
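If the list of lists of tuples has the shape that usaddress.parse produces, each inner list holds (token, label) pairs for one address. The sketch below assumes that shape and uses made-up sample data. It joins the tokens that share a label and builds one DataFrame row per address, so each label becomes a column header.

```python
import pandas as pd

# Hypothetical sample data: one inner list per address, each tuple is (token, label).
parsed = [
    [("123", "AddressNumber"), ("Main", "StreetName"), ("St", "StreetNamePostType")],
    [("456", "AddressNumber"), ("Martin", "StreetName"), ("Luther", "StreetName"),
     ("King", "StreetName"), ("Blvd", "StreetNamePostType"),
     ("Apt", "OccupancyType"), ("7", "OccupancyIdentifier")],
]

def to_record(tokens):
    """Join all tokens that share a label, keeping their original order."""
    record = {}
    for token, label in tokens:
        record[label] = f"{record[label]} {token}" if label in record else token
    return record

# Each dict becomes one row; labels an address lacks become NaN in that row.
df = pd.DataFrame([to_record(address) for address in parsed])
print(df)
```

Multi-token components such as "Martin Luther King" end up in a single StreetName cell instead of being spread over repeated columns.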
Solution 1:
Assuming the following:
- you use usaddress.tag
- you have ways to handle the errors that may be raised from usaddress.tag
- you only want the first part of the value returned by usaddress.tag (the OrderedDict of tagged address components)
Then you can do the following:
import usaddress
import pandas as pd

# your dataframe of addresses
df = pd.read_csv('PATH_TO_ADDRESS_CSV')

# list of OrderedDicts, one per address
ordered_dicts = []

# loop through addresses and get the respective information
for index, row in df.iterrows():
    # here you should try/except for cases that fail
    addr = usaddress.tag(row['FullAddress'])
    # usaddress.tag returns (OrderedDict, address_type); keep the OrderedDict
    ordered_dicts.append(addr[0])

# **get all relevant keys in your list (sorted into a list, since pandas rejects a set as columns)
cols = sorted(set().union(*(d.keys() for d in ordered_dicts)))

# create new dataframe; keys missing from an address become NaN
df_new = pd.DataFrame(ordered_dicts, columns=cols)
df_new.to_csv('PATH_TO_DESIRED_CSV_ENDPOINT')
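To try the DataFrame-building step without usaddress or a CSV file, the sketch below uses hand-written OrderedDicts as stand-ins for the first element of usaddress.tag's return value; the addresses and tags are made up. It also turns the union of keys into a sorted list, because recent pandas versions raise an error when a set is passed as columns.

```python
from collections import OrderedDict

import pandas as pd

# Stand-ins for usaddress.tag(...)[0]; the values are made up for illustration.
ordered_dicts = [
    OrderedDict([("AddressNumber", "123"), ("StreetName", "Main"),
                 ("StreetNamePostType", "St")]),
    OrderedDict([("AddressNumber", "456"), ("StreetName", "Oak"),
                 ("OccupancyType", "Apt"), ("OccupancyIdentifier", "7")]),
]

# Union of every key seen in any address, sorted so the column order is deterministic.
cols = sorted(set().union(*(d.keys() for d in ordered_dicts)))

df_new = pd.DataFrame(ordered_dicts, columns=cols)
print(df_new)
```

Sorting is only one choice of order; any list works, as long as it is an ordered sequence rather than a set.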
The ** marks the step that has an alternative solution. Because the full set of labels that the .tag function can return is known in advance, you can simply set the columns up front instead (see the list of all tags and the API in the usaddress documentation):
cols = ['AddressNumberPrefix', 'AddressNumber', ...]
I hope this helps! Note that when you build a pd.DataFrame from dictionaries and specify the exact columns, pandas automatically fills in any missing keys with NaN.
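A short sketch of that behaviour, using made-up records and a hypothetical three-tag column list (not the full usaddress tag set): a column that no record has is still created and filled with NaN, and keys that are not in the list are left out of the result.

```python
import pandas as pd

# Hypothetical tagged addresses (made-up values).
records = [
    {"AddressNumber": "123", "StreetName": "Main"},
    {"AddressNumber": "456", "PlaceName": "Springfield"},
]

# Hypothetical subset of tags; "AddressNumberPrefix" appears in no record.
cols = ["AddressNumberPrefix", "AddressNumber", "StreetName"]

df = pd.DataFrame(records, columns=cols)
# Columns follow `cols` exactly: missing keys become NaN,
# and keys outside `cols` (here "PlaceName") are dropped.
print(df)
```

So if the column list is the complete set of tags, every output file has the same headers regardless of which components each address contains.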