
I have a dataset with logins and date and time of users' posts.

import pandas as pd

posts = {'Login':['User1', 'User2', 'User2', 'User1', 'User2', 'User1', 'User2', 'User2'], 'Posted':['17.02.2020 12:32', '19.02.2020 10:11', '21.02.2020 07:08', '22.02.2020 14:00', '23.02.2020 11:02', '25.02.2020 18:19', '27.02.2020 00:03', '29.02.2020 15:56']}
df_posts = pd.DataFrame(posts)

    Login   Posted
0   User1   17.02.2020 12:32
1   User2   19.02.2020 10:11
2   User2   21.02.2020 07:08
3   User1   22.02.2020 14:00
4   User2   23.02.2020 11:02
5   User1   25.02.2020 18:19
6   User2   27.02.2020 00:03
7   User2   29.02.2020 15:56

I need to calculate the average number of posts per week for each user. For example, in this dataset User1 made 2 posts during the week from Feb 17 to Feb 23 and 1 post during the next week (Feb 24 - Mar 1). So, on average, User1 makes 1.5 posts per week.

I need to get the following result:

    Login   Average number of posts per week
0   User1   1.5
1   User2   2.5

I tried to write the following code:

# Assigning a week column to each post
posts['Week'] = posts['Posted'].dt.strftime('%U')

# Calculating total number of posts per week for each user
total_posts = posts.groupby(['Login', 'Week']).size().reset_index(name ='Total_posts')

# Estimating average frequency of making posts for each user
freq_posts_per_week = total_posts.groupby('Login')['Total_posts'].mean().reset_index(name ='Avg_posts_per_week')

But I got the following error:

/Users/username/opt/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Also, my results look wrong to me.

Could anyone help me to fix the problem?

Aigerim
  • That is a warning, not an error - which means your code got executed but there might be unexpected results – Mortz Feb 20 '20 at 09:15
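A minimal sketch of how this warning usually arises and one way to avoid it. It assumes that `posts` in the question's code was a filtered subset of a larger DataFrame, which the question does not show; the filtering step and the name `recent` below are hypothetical. Note also that `'%U'` numbers weeks starting on Sunday and drops the year, while the weeks in the example run Monday to Sunday, so the sketch uses `'%Y-%W'` (Monday-based week number prefixed with the year).

```python
import pandas as pd

df_posts = pd.DataFrame({
    'Login': ['User1', 'User2', 'User1'],
    'Posted': pd.to_datetime(
        ['17.02.2020 12:32', '19.02.2020 10:11', '25.02.2020 18:19'],
        format='%d.%m.%Y %H:%M',
    ),
})

# Hypothetical filtering step (an assumption, not shown in the question).
# Taking an explicit copy makes `recent` an independent DataFrame, so adding
# a column to it does not trigger SettingWithCopyWarning.
recent = df_posts[df_posts['Posted'] >= pd.Timestamp('2020-02-01')].copy()

# '%W' counts weeks from Monday; including '%Y' keeps different years apart.
recent['Week'] = recent['Posted'].dt.strftime('%Y-%W')
print(recent)
```

The original `df_posts` is left untouched; only the copy gets the new `Week` column.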

1 Answer


Try:

df_posts.Posted = pd.to_datetime(df_posts.Posted)
(df_posts.groupby(['Login', pd.Grouper(key='Posted',freq='W')]).size()
         .groupby('Login').mean()
         .reset_index(name ='Avg_posts_per_week'))

   Login  Avg_posts_per_week
0  User1                 1.5
1  User2                 2.5
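For reference, here is a self-contained sketch of the same approach using the sample data from the question. The explicit `format` argument is my addition: the timestamps are day-first, and spelling the format out avoids relying on pandas guessing it, which could go wrong for dates such as `03.02.2020`. The intermediate name `weekly_counts` is only illustrative.

```python
import pandas as pd

posts = {
    'Login': ['User1', 'User2', 'User2', 'User1',
              'User2', 'User1', 'User2', 'User2'],
    'Posted': ['17.02.2020 12:32', '19.02.2020 10:11', '21.02.2020 07:08',
               '22.02.2020 14:00', '23.02.2020 11:02', '25.02.2020 18:19',
               '27.02.2020 00:03', '29.02.2020 15:56'],
}
df_posts = pd.DataFrame(posts)

# Parse the day-first strings explicitly instead of letting pandas infer it.
df_posts['Posted'] = pd.to_datetime(df_posts['Posted'], format='%d.%m.%Y %H:%M')

# Count posts per user per week; freq='W' uses weekly bins ending on Sunday,
# which matches the Monday-to-Sunday weeks described in the question.
weekly_counts = df_posts.groupby(
    ['Login', pd.Grouper(key='Posted', freq='W')]
).size()

# Average the weekly counts for each user.
avg_posts = (weekly_counts.groupby(level='Login').mean()
             .reset_index(name='Avg_posts_per_week'))
print(avg_posts)
```

As far as I can tell, this averages only over weeks in which a user posted at least once; weeks with no posts are not counted as zeros, so treat the result as "posts per active week" unless you fill the missing weeks yourself.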
luigigi
  • I got the following error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index' – Aigerim Feb 20 '20 at 09:53
  • Then you have to convert `Posted` to datetime first using: `df_posts.Posted = pd.to_datetime(df_posts.Posted)`. I added it to the answer – luigigi Feb 20 '20 at 09:54