
Merge Dataframes That Have Indices That One Contains Another (but Not The Same)

For example, df1 has shape (533, 2176) with indices such as Elkford (5901003) DM 01010, and df2 has shape (743, 12) with indices such as 5901003; the number in the brackets of df1's indices wil

Solution 1:

file1.csv:

,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10

Here df1.shape = (10, 2).

file2.csv:

,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01)   20000,4
Fork (5901005) DM 01010,3
England (34)   20000,4

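Here df2.shape = (5, 1) as read from the file; only the 3 rows whose label contains a 7-digit id in parentheses are kept by the script, leaving (3, 1).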
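To see which labels in file2.csv carry a usable id, here is a minimal sketch that applies the same kind of pattern (seven digits inside parentheses) to the five sample labels. The names labels, ID_PATTERN and find_id are illustrative only and are not part of the answer's script.

```python
import re

# The five row labels from file2.csv above.
labels = [
    "Elkford (Part 1) (5901003) DM 01010",
    "Ahia (5901004) DM 01010",
    "Canada (01)   20000",
    "Fork (5901005) DM 01010",
    "England (34)   20000",
]

# Exactly seven digits inside parentheses; "(Part 1)" and "(01)" do not match.
ID_PATTERN = re.compile(r"\((\d{7})\)")


def find_id(label):
    """Return the 7-digit id as an int, or None when the label has none."""
    m = ID_PATTERN.search(label)
    return int(m.group(1)) if m else None


ids = [find_id(label) for label in labels]
print(ids)
```

Three of the five labels yield an id; the Canada and England rows have only a two-digit code in parentheses and are therefore dropped later.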

Run this script:

import re

import pandas as pd
import numpy as np


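def extract_id(s):
    # find a 7-digit id in parentheses, e.g. "(5901003)"
    m = re.search(r'\((\d{7})\)', s)
    if m:
        return int(m.group(1))
    # no match: the function implicitly returns None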


df1 = pd.read_csv('file1.csv', index_col=0)
df2 = pd.read_csv('file2.csv', index_col=0)


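# extract the numeric id from each label (missing where there is none)
indexes = df2.index.map(extract_id)
# True for rows whose label contained a 7-digit id
mask = indexes.notna()
# filter incorrect rows (without id)
df2 = df2[mask]
# convert index to integer ids so it aligns with df1's index
df2.index = indexes[mask].astype('int64')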

df = pd.concat([df1, df2], axis=1)

print(df)

Output:

        col_1  col_2  col_3
5901001     a     -1    NaN
5901002     b     -2    NaN
5901003     c     -3    1.0
5901004     d     -4    2.0
5901005     e     -5    3.0
5901006     f     -6    NaN
5901007     g     -7    NaN
5901008     h     -8    NaN
5901009     i     -9    NaN
5901010     k    -10    NaN

Here df.shape = (10, 2 + 1)
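As an alternative, the same result can probably be reached without a Python-level helper by using the vectorized string accessor on the index and a left join. This is only a sketch under the assumption that every usable label contains exactly one 7-digit id in parentheses; the sample files are inlined with io.StringIO so it runs on its own, and the names FILE1, FILE2, has_id, df2_by_id and merged are illustrative.

```python
import io

import pandas as pd

# Inline copies of the sample files so the sketch is self-contained.
FILE1 = """,col_1,col_2
5901001,a,-1
5901002,b,-2
5901003,c,-3
5901004,d,-4
5901005,e,-5
5901006,f,-6
5901007,g,-7
5901008,h,-8
5901009,i,-9
5901010,k,-10
"""
FILE2 = """,col_3
Elkford (Part 1) (5901003) DM 01010,1
Ahia (5901004) DM 01010,2
Canada (01)   20000,4
Fork (5901005) DM 01010,3
England (34)   20000,4
"""

df1 = pd.read_csv(io.StringIO(FILE1), index_col=0)
df2 = pd.read_csv(io.StringIO(FILE2), index_col=0)

# Pull the 7-digit id out of every label; labels without one become NaN.
ids = df2.index.str.extract(r"\((\d{7})\)", expand=False)

# Keep only the rows that had an id, then use the integer id as the index.
has_id = ids.notna()
df2_by_id = df2[has_id].copy()
df2_by_id.index = ids[has_id].astype("int64")

# A left join keeps every row of df1 and fills col_3 where an id matched.
merged = df1.join(df2_by_id, how="left")
print(merged)
```

A left join is used here because the example keeps all ten rows of the id-indexed frame; how="inner" would instead keep only the three rows present in both.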
