
How To Extract A Dataframe Using Start And End Dates With Pandas

How can we extract the DataFrame using start and end dates and achieve this output?

Input

id  start  end
1   2009   2014
2   2010   2012

Output

id  data
1   2009
1   2010
1   2011
...
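The solutions below all operate on a DataFrame named df, which the question does not construct. A minimal sketch of building it from the input table, assuming the three columns hold plain integers:

```python
import pandas as pd

# Rebuild the question's input table; integer dtypes are an assumption.
df = pd.DataFrame({
    "id": [1, 2],
    "start": [2009, 2010],
    "end": [2014, 2012],
})
print(df)
```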

Solution 1:

Enumerate the years from start to end for each group, grouping by 'id'. Reformatting the index afterwards is optional.

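import numpy as np
import pandas as pd

# Inclusive range of years per id; .iloc[0] extracts the scalar from the one-row group.
melted = df.groupby('id').apply(
    lambda x: pd.Series(np.arange(x['start'].iloc[0], x['end'].iloc[0] + 1)))

# Optional: drop the positional inner level so only 'id' remains in the index.
melted.index = melted.index.droplevel(1)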

id
1    2009
1    2010
1    2011
1    2012
1    2013
1    2014
2    2010
2    2011
2    2012

Solution 2:

Use:

df1 = (pd.concat([pd.Series(r.id, np.arange(r.start, r.end + 1)) for r in df.itertuples()])
        .reset_index())
df1.columns = ['data', 'id']
df1 = df1[['id', 'data']]
print(df1)

   id  data
0   1  2009
1   1  2010
2   1  2011
3   1  2012
4   1  2013
5   1  2014
6   2  2010
7   2  2011
8   2  2012

Solution 3:

This one is a little harder to understand, but I think it should be slightly faster than apply.

Using reindex and repeat:

df.reindex(df.index.repeat(df['end'] - df['start'] + 1)).assign(year=lambda x: x['start'] + x.groupby('id').cumcount())
Out[453]:
   id  start   end  year
0   1   2009  2014  2009
0   1   2009  2014  2010
0   1   2009  2014  2011
0   1   2009  2014  2012
0   1   2009  2014  2013
0   1   2009  2014  2014
1   2   2010  2012  2010
1   2   2010  2012  2011
1   2   2010  2012  2012
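Since the reindex and repeat chain packs several steps into one line, it may help to unpack it. The following is only a sketch of the same idea: the intermediate names n_years, repeated and offset are illustrative and not part of the original answer, and df is rebuilt from the question's input.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "start": [2009, 2010], "end": [2014, 2012]})

# Step 1: how many years each row spans, counting both endpoints.
n_years = df["end"] - df["start"] + 1

# Step 2: repeat each index label n_years times, then reindex so that
# every original row is duplicated once per year it covers.
repeated = df.reindex(df.index.repeat(n_years))

# Step 3: cumcount numbers the duplicated rows 0, 1, 2, ... within each id.
offset = repeated.groupby("id").cumcount()

# Step 4: start + running offset gives the year for each duplicated row.
result = repeated.assign(year=repeated["start"] + offset)
print(result[["id", "year"]])
```

The index of result keeps the original row labels, so they repeat; call reset_index(drop=True) if a clean 0..n-1 index is wanted.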
