
Pandas, Turn List Of Lists Of Tuples Into Dataframe Awkward Column Headers.

I have data from parsed addresses that I obtained from the usaddress python library: https://github.com/datamade/usaddress The data is a list of lists of tuples. Each address has a

Solution 1:

Assuming the following:

  • you use usaddress.tag
  • you have a way to handle the errors that usaddress.tag may raise
  • you only want the first element of the tuple that usaddress.tag returns (the tagged fields, not the address type)

Then you can do the following:

import usaddress
import pandas as pd

# your list of addresses dataframe
df = pd.read_csv('PATH_TO_ADDRESS_CSV')

# list of orderedDict
ordered_dicts = []

# loop through addresses and get respective information
for index, row in df.iterrows():
    # here you should try/except for cases that fail
    addr = usaddress.tag(row['FullAddress'])

    # append to list
    ordered_dicts.append(addr[0])

# **get all relevant keys in your list (sorted into a list so the column order is stable)
cols = sorted(set().union(*(d.keys() for d in ordered_dicts)))

# create new dataframe
df_new = pd.DataFrame(ordered_dicts, columns=cols)

df_new.to_csv('PATH_TO_DESIRED_CSV_ENDPOINT')
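The comment in the loop says to try/except the cases that fail, so here is a rough sketch of one way to do that. It collects the failures instead of stopping at the first bad address. Note that tag_all and fake_tag are names I made up for illustration; fake_tag is only a stand-in so the snippet runs without usaddress installed, and it does not imitate the real parser.

```python
def tag_all(addresses, tagger, error_type=Exception):
    """Tag every address; collect failures instead of aborting the loop."""
    tagged, failed = [], []
    for address in addresses:
        try:
            # tagger is assumed to return (tagged_fields, address_type),
            # the same shape that usaddress.tag returns
            fields, _address_type = tagger(address)
        except error_type as exc:
            # remember what failed and why, then move on
            failed.append((address, str(exc)))
            continue
        tagged.append(dict(fields))
    return tagged, failed


def fake_tag(address):
    """Hypothetical stand-in for usaddress.tag (illustration only)."""
    if not address.strip():
        raise ValueError("empty address")
    number, _, rest = address.partition(" ")
    return {"AddressNumber": number, "StreetName": rest}, "Street Address"


tagged, failed = tag_all(["123 Main", "   ", "45 Oak"], fake_tag, ValueError)
print(tagged)
print(failed)
```

With the real library you would pass usaddress.tag as the tagger. As far as I know the error it raises for ambiguous addresses is usaddress.RepeatedLabelError, but check the usaddress documentation before relying on that.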

The ** marks the step (computing cols) that has an alternative. Because the full set of labels that usaddress.tag can return is known in advance, you can skip that computation and set the columns explicitly up front (the usaddress documentation lists all the tags):

cols = ['AddressNumberPrefix', 'AddressNumber', ...]

I hope this helps! Note that when you build a pd.DataFrame from a list of dictionaries and specify the exact columns, any key missing from a dictionary is automatically filled in with NaN.
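To see the NaN filling in action, here is a small sketch with made-up records (the values and the short column list are just for illustration, not real usaddress output):

```python
import pandas as pd

# Hypothetical tagged addresses; the second one has no StreetNamePostType key
records = [
    {"AddressNumber": "123", "StreetName": "Main", "StreetNamePostType": "St"},
    {"AddressNumber": "45", "StreetName": "Broadway"},
]

# Explicit column list; ZipCode appears in none of the records
cols = ["AddressNumber", "StreetName", "StreetNamePostType", "ZipCode"]

demo = pd.DataFrame(records, columns=cols)
print(demo)
```

The columns come out in the order given in cols, the missing StreetNamePostType in the second row is NaN, and the whole ZipCode column is NaN because no record has that key.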
