Matching Elements Of Pandas Column With Column Of Another Pandas Dataframe
I have a pandas dataframe A with a column keywords:
keywords
['loans','mercedez','bugatti','a4']
['trump','usa','election','president']
['galaxy','7s','canon','macbook']
['
Solution 1:
One way is to use Series.transform, testing each keyword against the comma-separated phrases in B['words']:
import numpy as np
import pandas as pd

A = pd.DataFrame({'keywords': [['loans', 'mercedez', 'bugatti', 'a4'],
                               ['trump', 'usa', 'election', 'president']]})
B = pd.DataFrame({'category': ['audi', 'finance'],
                  'words': ['audi a4,audi a6', 'sales,loans,sales price']})

def match_category_to_keywords(kws):
    ret = []
    for kw in kws:
        # boolean mask: rows of B where kw is a substring of any comma-separated phrase
        m = B['words'].transform(lambda words: any(kw in w for w in words.split(',')))
        ret.extend(B['category'].loc[m].tolist())
    # de-duplicate and sort the matched categories
    return np.unique(ret)

A['matched_category'] = A['keywords'].transform(match_category_to_keywords)
print(A)
Output:
                            keywords  matched_category
0     [loans, mercedez, bugatti, a4]   [audi, finance]
1  [trump, usa, election, president]                []
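Note that the test kw in w above is a substring check on each comma-separated phrase, so a short keyword can match inside a longer word. The following is a minimal sketch of that rule next to a stricter whole-word variant; phrase_matches and token_matches are hypothetical helper names used only for illustration, not part of pandas or of the answer above.

```python
import re

def phrase_matches(keyword, words):
    """Substring rule used in Solution 1: keyword occurs anywhere in a phrase."""
    return any(keyword in phrase for phrase in words.split(','))

def token_matches(keyword, words):
    """Stricter variant (an assumption, not the original rule): whole words only."""
    tokens = re.split(r'[,\s]+', words)
    return keyword in tokens

words = 'audi a4,audi a6'
print(phrase_matches('a4', words), token_matches('a4', words))
print(phrase_matches('a', words), token_matches('a', words))
```

If partial matches such as 'a' hitting 'audi a4' are unwanted, the whole-word variant may be the safer choice.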
Solution 2:
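Alternatively, build a dictionary that maps each word to its category and look every keyword up in it (df1 and df2 here stand for the dataframes A and B):
# build a dictionary of category -> words by splitting on commas and whitespace
d = df2.set_index('category')['words'].str.split(r',\s*|\s+').to_dict()
# invert it into a flat word -> category dictionary
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
print(d1)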
{'audi': 'audi', 'a4': 'audi', 'a6': 'audi', 'bugatti': 'bugatti',
'veyron': 'bugatti', 'chiron': 'bugatti', 'mercedez': 'mercedez',
's-class': 'mercedez', 'e-class': 'mercedez', 'canon': 'dslr',
'nikon': 'dslr', 'iphone': 'apple', '7s': 'apple', '6s': 'apple',
'5': 'apple', 'sales': 'finance', 'loans': 'finance', 'price': 'finance',
'donald': 'politics', 'trump': 'politics', 'election': 'politics',
'votes': 'politics', 'spiderman': 'entertainment', 'captain': 'entertainment',
'america': 'entertainment', 'ironmen': 'entertainment', 'justin': 'music',
'beiber': 'music', 'rihana': 'music', 'drake': 'music'}
# map each keyword to its category, skipping keywords that are not in the dictionary
df1['new'] = [[d1[y] for y in x if y in d1] for x in df1['keywords']]
print(df1)
                                keywords                                    new
0         [loans, mercedez, bugatti, a4]     [finance, mercedez, bugatti, audi]
1      [trump, usa, election, president]                   [politics, politics]
2           [galaxy, 7s, canon, macbook]                          [apple, dslr]
3  [beiber, spiderman, marvels, ironmen]  [music, entertainment, entertainment]
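The snippet in Solution 2 does not define df1 and df2, so here is a self-contained sketch of the same dictionary approach. It assumes the two-row sample frames from Solution 1 rather than the full data of the question, so the result is smaller than the output shown above.

```python
import pandas as pd

# Sample data borrowed from Solution 1 (an assumption; the question's full data is larger)
df1 = pd.DataFrame({'keywords': [['loans', 'mercedez', 'bugatti', 'a4'],
                                 ['trump', 'usa', 'election', 'president']]})
df2 = pd.DataFrame({'category': ['audi', 'finance'],
                    'words': ['audi a4,audi a6', 'sales,loans,sales price']})

# category -> list of single words, splitting on commas and whitespace
d = df2.set_index('category')['words'].str.split(r',\s*|\s+').to_dict()

# invert into word -> category
d1 = {word: cat for cat, words in d.items() for word in words}

# look each keyword up, dropping keywords with no category
df1['new'] = [[d1[kw] for kw in kws if kw in d1] for kws in df1['keywords']]
print(df1)
```

Unlike Solution 1, this lookup matches whole words only and keeps one entry per matched keyword, so a category can appear more than once in a row.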