How Can I Merge Pandas DataFrames Based On A Substring From One Of The Columns?
I have 2 dataframes: df1 and df2

df1
                  School Conference
0              Air Force   Mt. West
1                  Akron        MAC
2  Alabama at Birmingham      C-USA
Solution 1:
You can use a list comprehension to check whether a value from one dataframe's
column is a substring of a value in the other's (in either direction), keep the
first match, and then merge:
df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
if x in y or y in x]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[1]:
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0
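For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The df1 rows and RATE values come from the output above; the SCHOOL_NAME strings in df2 are hypothetical stand-ins, since the original df2 is not shown in the question.

```python
import pandas as pd

# df1 follows the question and the output above.
df1 = pd.DataFrame({
    "School": ["Air Force", "Akron", "Alabama at Birmingham", "Auburn"],
    "Conference": ["Mt. West", "MAC", "C-USA", "Sun Belt"],
})

# Hypothetical df2: the SCHOOL_NAME values are assumptions for illustration.
df2 = pd.DataFrame({
    "SCHOOL_NAME": [
        "Air Force Academy",
        "University of Akron",
        "University of Alabama at Birmingham",
        "Auburn University",
    ],
    "RATE": [53.0, 77.0, 75.0, 93.0],
})

# For each School, list every SCHOOL_NAME that contains it or is contained in it.
matches = df1["School"].apply(
    lambda x: [y for y in df2["SCHOOL_NAME"] if x in y or y in x]
)

# Keep only the first hit; .str[0] gives NaN where the list is empty.
df1["SCHOOL_NAME"] = matches.str[0]

# Merge on the helper column, then drop it again.
result = df1.merge(df2, how="left", on="SCHOOL_NAME").drop("SCHOOL_NAME", axis=1)
print(result)
```

Note that only the first match is kept, so a short name that is a substring of several SCHOOL_NAME values will silently pick whichever comes first in df2.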
You could also search case-insensitively by adding .lower() to x and y:
df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
if x.lower() in y.lower()
or y.lower() in x.lower()]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[2]:
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0
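If the lambda gets hard to read, the matching rule can be pulled into a small helper. The sketch below is one possible way to do that, not part of the original answer: first_match is a hypothetical name, the df2 strings and the "Example Tech" row are made up, and str.casefold() is swapped in for .lower() as a slightly more aggressive case-insensitive comparison.

```python
import pandas as pd

def first_match(name, candidates):
    """Return the first candidate that contains name, or is contained in it,
    ignoring case. Returns None when nothing matches."""
    needle = name.casefold()
    for cand in candidates:
        hay = cand.casefold()
        if needle in hay or hay in needle:
            return cand
    return None

# Hypothetical data: "Example Tech" has no counterpart in df2 on purpose.
df1 = pd.DataFrame({
    "School": ["Akron", "AUBURN", "Example Tech"],
    "Conference": ["MAC", "Sun Belt", "Independent"],
})
df2 = pd.DataFrame({
    "SCHOOL_NAME": ["university of akron", "Auburn University"],
    "RATE": [77.0, 93.0],
})

candidates = df2["SCHOOL_NAME"].tolist()
df1["SCHOOL_NAME"] = df1["School"].map(lambda s: first_match(s, candidates))

# Rows without a match keep a missing key, so the left merge leaves RATE as NaN.
result = df1.merge(df2, how="left", on="SCHOOL_NAME").drop(columns="SCHOOL_NAME")
print(result)
```

Unmatched schools stay in the result with a missing RATE, which makes them easy to find afterwards with result[result["RATE"].isna()].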
As a single chained expression, as requested in a comment:
df1 = (df1.assign(SCHOOL_NAME = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
if x.lower() in y.lower()
or y.lower() in x.lower()]).str[0])
.merge(df2, how='left').drop('SCHOOL_NAME', axis=1))
df1
Out[3]:
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0