Get Week Start Date (monday) From A Date Column In Python (pandas)?

I have seen a lot of posts about how to do this with a date string, but I am trying to do it for a DataFrame column and haven't had any luck so far. My current method is : Get t

Solution 1:

Another alternative:

df['week_start'] = df['myday'].dt.to_period('W').apply(lambda r: r.start_time)

This will set 'week_start' to midnight on the Monday on or before the time in 'myday' (the default 'W' period runs from Monday through Sunday).
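As a small sketch of the same idea without apply: a period column also exposes start_time through the .dt accessor. The sample dates below are made up for illustration (a Monday, a Wednesday and a Sunday of the same week).

```python
import pandas as pd

# Hypothetical sample data: Monday, Wednesday and Sunday of one week.
df = pd.DataFrame({
    "myday": pd.to_datetime(
        ["2020-03-02 00:00", "2020-03-04 15:30", "2020-03-08 23:59"]
    )
})

# The default 'W' period is anchored as W-SUN, so every period starts on a Monday.
# .dt.start_time reads the period start for the whole column at once.
df["week_start"] = df["myday"].dt.to_period("W").dt.start_time

print(df)
```

Note that the time of day is dropped here: every row ends up at 00:00 on the Monday.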

Solution 2:

While both @knightofni's and @Paul's solutions work, I try to stay away from using apply in Pandas because it is usually quite slow compared to array-based methods. To avoid it, after casting the column to datetime (via pd.to_datetime), we can modify the weekday-based method and convert the day of the week to a numpy timedelta64[D], either by casting it directly:

df['week_start'] = df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')

or by using to_timedelta as @ribitskiyb suggested:

df['week_start'] = df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
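A minimal, self-contained sketch of the to_timedelta variant, using made-up sample dates. One thing worth knowing: the subtraction keeps the time of day, so the extra week_start_midnight column (a name chosen here for illustration) uses .dt.normalize() in case midnight is wanted instead.

```python
import pandas as pd

# Hypothetical sample data: Monday, Wednesday and Sunday of one week.
df = pd.DataFrame({
    "myday": pd.to_datetime(
        ["2020-03-02 00:00", "2020-03-04 15:30", "2020-03-08 23:59"]
    )
})

# weekday is 0 for Monday ... 6 for Sunday; turn it into a day-based timedelta.
offset = pd.to_timedelta(df["myday"].dt.weekday, unit="D")

# Subtracting the offset lands on Monday but keeps the original time of day.
df["week_start"] = df["myday"] - offset

# Optional: truncate to midnight if only the date matters.
df["week_start_midnight"] = df["week_start"].dt.normalize()

print(df)
```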

Using test data with 60,000 datetimes I got the following times using the suggested answers using the newly released Pandas 1.0.1.

%timeit df.apply(lambda x: x['myday'] - datetime.timedelta(days=x['myday'].weekday()), axis=1)
>>> 1.33 s ± 28.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df['myday'].dt.to_period('W').apply(lambda r: r.start_time)
>>> 5.59 ms ± 138 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['myday'] - df['myday'].dt.weekday.astype('timedelta64[D]')
>>> 3.44 ms ± 106 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit df['myday'] - pd.to_timedelta(df['myday'].dt.weekday, unit='D')
>>> 3.47 ms ± 170 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

These results show that Pandas 1.0.1 has dramatically improved the speed of the to_period apply-based method (vs Pandas <= 0.25), but converting directly to a timedelta (either by casting the type directly with .astype('timedelta64[D]') or by using pd.to_timedelta) is still faster. Based on these results I would suggest using pd.to_timedelta going forward.
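If you want to repeat a comparison like this outside IPython, here is a rough sketch using the standard timeit module. The test data (60,000 timestamps one minute apart) and the helper names via_period and via_to_timedelta are assumptions made for this example, not the data behind the numbers above, and the timings will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 60,000 timestamps, one per minute.
df = pd.DataFrame(
    {"myday": pd.date_range("2020-01-01", periods=60_000, freq="min")}
)


def via_period(s):
    # Period-based method; the result is midnight on the Monday.
    return s.dt.to_period("W").apply(lambda r: r.start_time)


def via_to_timedelta(s):
    # Array-based method; the result keeps the time of day.
    return s - pd.to_timedelta(s.dt.weekday, unit="D")


runs = 3
for fn in (via_period, via_to_timedelta):
    seconds = timeit.timeit(lambda: fn(df["myday"]), number=runs) / runs
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per call")
```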

Solution 3:

(Just adding to n8yoder's answer)

Using .astype('timedelta64[D]') seems not so readable to me -- found an alternative using just the functionality of pandas:

df['myday'] - pd.to_timedelta(arg=df['myday'].dt.weekday, unit='D')

Solution 4:

It fails because pd.DateOffset expects a single integer as a parameter (and you are feeding it an array). You can only use DateOffset to shift every value in a date column by the same offset.
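For illustration, this is the kind of use DateOffset does support: one scalar offset applied to every row. The sample dates and the one_week_earlier column name are made up for this sketch.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"myday": pd.to_datetime(["2020-03-04", "2020-03-08"])})

# A single offset (7 days) is subtracted from every row alike.
df["one_week_earlier"] = df["myday"] - pd.DateOffset(days=7)

print(df)
```

A per-row offset, such as each row's own weekday, needs a different tool.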

Try this:

import datetime as dt
import pandas as pd
# Change 'myday' to contain dates as datetime objects
df['myday'] = pd.to_datetime(df['myday'])
# 'daysoffset' will contain the weekday, as integers
df['daysoffset'] = df['myday'].apply(lambda x: x.weekday())
# We apply, row by row (axis=1), a timedelta operation
df['week_start'] = df.apply(lambda x: x['myday'] - dt.timedelta(days=int(x['daysoffset'])), axis=1)

I haven't actually tested this code, (there was no sample data), but that should work for what you have described.

However, you might want to look at pandas' resample (DataFrame.resample), which might provide a better solution, depending on exactly what you are looking for.
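As a rough sketch of what the resample route could look like, assuming the goal is to aggregate rows per Monday-based week rather than to add a column: the value column and the sample dates are hypothetical, and closed='left' with label='left' on a 'W-MON' frequency is one way to get bins that run from Monday up to (not including) the next Monday, labelled by their Monday.

```python
import pandas as pd

# Hypothetical sample data: three dates in one week, one in the next.
df = pd.DataFrame({
    "myday": pd.to_datetime(
        ["2020-03-02", "2020-03-04", "2020-03-08", "2020-03-09"]
    ),
    "value": [1, 2, 3, 4],
})

# Weekly bins anchored on Monday: [Monday, next Monday), labelled by the Monday.
weekly = (
    df.resample("W-MON", on="myday", closed="left", label="left")["value"]
    .sum()
)

print(weekly)
```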

Solution 5:

from datetime import timedelta
import pandas as pd

# Convert column to pandas datetime equivalent
df['myday'] = pd.to_datetime(df['myday'])

# Create function to calculate Start Week date
week_start_date = lambda date: date - timedelta(days=date.weekday())

# Apply above function on DataFrame column
df['week_start_date'] = df['myday'].apply(week_start_date)
