I have a DataFrame that I want to extend with columns containing data from previous rows.
This script does the job:
#!/usr/bin/env python3
import numpy as np
import pandas as pd
n = 2
df = pd.DataFrame({'A': [1,2,3,4,5], 'B': [0,1,1,0,0]}, columns=['A', 'B'])
df2 = df[df['B'] == 0]
print(df2)
for i in range(1, n+1):
    df2['A_%d' % i] = df2['A'].shift(i)
print(df2)
It outputs:
   A  B
0  1  0
3  4  0
4  5  0
   A  B  A_1  A_2
0  1  0  NaN  NaN
3  4  0  1.0  NaN
4  5  0  4.0  1.0
which is exactly what I want. The DataFrame now has two additional columns, A_1 and A_2, that contain the value of column A from one and two rows earlier, respectively.
However, I also get the warning:
./my_script.py:14: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
df2['A_%d' % i] = df2['A'].shift(i)
The problem clearly comes from the earlier filtering step where I create df2. If I work on df directly, the warning does not occur.
In my application I need to work on multiple parts of my original DataFrame separately, and therefore the filtering is definitely required. All the different parts (like df2 here) get concatenated later.
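A minimal sketch of this split, extend and concatenate workflow, for illustration only. It assumes the parts are formed by the distinct values of column B and that taking an explicit `.copy()` of each part is acceptable; an explicit copy is the commonly suggested way to make the filtered part an independent DataFrame. The names `parts`, `part` and `result` are made up for the example and are not from the real application.

```python
import pandas as pd

n = 2  # number of lagged columns, as in the script above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [0, 1, 1, 0, 0]}, columns=['A', 'B'])

parts = []
for value in sorted(df['B'].unique()):
    # Assumption: .copy() turns the filtered part into an independent
    # DataFrame, so adding columns is not an assignment on a slice of df.
    part = df[df['B'] == value].copy()
    for i in range(1, n + 1):
        # Shift within this part only, so lags never cross part boundaries.
        part['A_%d' % i] = part['A'].shift(i)
    parts.append(part)

# Reassemble the parts and restore the original row order.
result = pd.concat(parts).sort_index()
print(result)
```

The original `df` is left unchanged here; only the copies gain the `A_1` and `A_2` columns before they are concatenated.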
I found similar issues in "How to deal with SettingWithCopyWarning in Pandas?" and "Pandas SettingWithCopyWarning", but the solutions there do not fix the problem.
Even when I follow the suggestion in the warning and write, e.g.,
    df2.loc[:, 'A_%d' % i] = df2['A'].shift(i)
the same warning still occurs.
I am working with Python 3.5.2 and pandas 0.19.2.