Merge Dataframes That Have Indices That One Contains Another (but Not The Same)
For example, df1 has shape (533, 2176) and index labels such as Elkford (5901003) DM 01010, while df2 has shape (743, 12) and index labels such as 5901003; the number in the brackets of the df1 index wil
Solution 1:
file1.csv:
,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10
Here df1.shape = (10, 2).
file2.csv:
,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01) 20000,4
Fork (5901005) DM 01010,3
England (34) 20000,4
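Here df2.shape = (3, 1) once the two rows without a 7-digit id in parentheses (Canada and England) are filtered out; the raw file has 5 rows.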
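To see which labels in file2.csv carry a usable id, here is a minimal sketch using only the standard library. The names labels, ID_PATTERN and ids are made up for this illustration; the pattern is the same one the script below relies on (exactly seven digits inside parentheses).

```python
import re

# Index labels copied from file2.csv above
labels = [
    "Elkford (Part 1) (5901003) DM 01010",
    "Ahia (5901004) DM 01010",
    "Canada (01) 20000",
    "Fork (5901005) DM 01010",
    "England (34) 20000",
]

# Exactly seven digits enclosed in parentheses
ID_PATTERN = re.compile(r"\((\d{7})\)")

# Map each label that contains an id to that id as an integer
ids = {}
for label in labels:
    match = ID_PATTERN.search(label)
    if match:
        ids[label] = int(match.group(1))

print(ids)
```

Only three of the five labels match; "(Part 1)", "(01)" and "(34)" are ignored because they are not seven digits.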
Run this script:
import re
import pandas as pd

def extract_id(s):
    # Return the 7-digit id found in parentheses, or None if there is none
    m = re.search(r'\((\d{7})\)', s)
    if m:
        return int(m.group(1))

df1 = pd.read_csv('file1.csv', index_col=0)
df2 = pd.read_csv('file2.csv', index_col=0)
indexes = df2.index.map(extract_id)
# True where an id was found
mask = indexes.notna()
# filter incorrect rows (without id)
df2 = df2[mask]
# convert index to integer ids so it aligns with df1
df2.index = indexes[mask].astype(int)
# align on the index; ids missing from df2 get NaN in col_3
df = pd.concat([df1, df2], axis=1)
print(df)
Output:
        col_1  col_2  col_3
5901001     a     -1    NaN
5901002     b     -2    NaN
5901003     c     -3    1.0
5901004     d     -4    2.0
5901005     e     -5    3.0
5901006     f     -6    NaN
5901007     g     -7    NaN
5901008     h     -8    NaN
5901009     i     -9    NaN
5901010     k    -10    NaN
Here df.shape = (10, 2 + 1)
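As an alternative to mapping a Python function over the index, the same result can probably be reached with pandas' vectorized string methods and a left join. This is only a sketch under a few assumptions: the two CSV files are inlined as strings so the example runs on its own, and df2_clean, ids and keep are names made up here, not part of the original answer.

```python
import io
import pandas as pd

# Inline copies of file1.csv and file2.csv so the sketch is self-contained
file1_csv = """,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10
"""
file2_csv = """,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01) 20000,4
Fork (5901005) DM 01010,3
England (34) 20000,4
"""

df1 = pd.read_csv(io.StringIO(file1_csv), index_col=0)
df2 = pd.read_csv(io.StringIO(file2_csv), index_col=0)

# Extract the 7-digit id from every label at once;
# labels without a match come back as missing values
ids = df2.index.str.extract(r"\((\d{7})\)", expand=False)

# Keep only the rows where an id was found and use it as an integer index
keep = ids.notna()
df2_clean = df2[keep].copy()
df2_clean.index = ids[keep].astype("int64")

# Left join keeps every row of df1 and fills col_3 with NaN where df2 has no id
df = df1.join(df2_clean, how="left")
print(df)
```

A left join keeps exactly the rows of df1, whereas pd.concat(..., axis=1) takes the union of both indexes; with this sample data the two give the same 10 rows.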