I need to perform the following operations in a loop: select a slice of a pandas DataFrame, modify the values of the slice (specifically, winsorize the data), and write the modified values back to the slice. What is the best practice for this? I have tried several approaches, but the resulting column is usually full of NaNs.

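import pandas as pd
from scipy.stats import mstats

for value in list_values:
    # Select the rows where Column_a equals the current value
    temp_df = df.loc[df["Column_a"] == value]
    transformed_data = pd.Series(mstats.winsorize(temp_df["Column_b"], limits=[0.05, 0.05]))
    df.loc[df["Column_a"] == value, "Column_b"] = transformed_data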

Any help is greatly appreciated. Thanks!

Irina
  • This [answer](https://stackoverflow.com/questions/45093241/how-to-replace-part-of-dataframe-in-pandas) might be of some help – Shaikh Abuzar Feb 25 '21 at 17:58
  • @ShaikhAbuzar - Thanks, although these solutions are all used to combine dataframes, and I am trying to assign a series to a slice of a dataframe (the series has the same length as the dataframe slice). I have tried several suggestions there and was not successful; any suggestions? Thanks! – Irina Feb 25 '21 at 18:43

1 Answer

I think pandas.DataFrame.combine_first or pandas.DataFrame.update should solve this issue. There are examples here: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
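As a rough sketch of how the update route could look for the loop in the question: the NaNs most likely come from index alignment, because a Series built from a bare array gets a fresh 0..n-1 index, while assignment through .loc matches on the slice's original row labels. The example below keeps the slice's index and sets the Series name so that DataFrame.update can align on both. The toy data and the variable names mask, winsorized and transformed are assumptions made for illustration; only Column_a, Column_b, list_values and the winsorize limits come from the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import mstats

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "Column_a": ["x"] * 20 + ["y"] * 20,
    "Column_b": [float(v) for v in range(20)] + [float(v) for v in range(100, 120)],
})
list_values = ["x", "y"]

for value in list_values:
    mask = df["Column_a"] == value

    # winsorize returns a NumPy masked array, which carries no pandas index.
    winsorized = mstats.winsorize(
        df.loc[mask, "Column_b"].to_numpy(), limits=[0.05, 0.05]
    )

    # Re-attach the slice's own row labels and the target column name,
    # so update() knows which rows and which column to overwrite.
    transformed = pd.Series(
        np.asarray(winsorized), index=df.index[mask], name="Column_b"
    )

    # update() modifies df in place, aligning on index and column name.
    df.update(transformed)
```

If update is not needed, assigning the plain array directly, as in df.loc[mask, "Column_b"] = np.asarray(winsorized), should also avoid the NaNs, since a NumPy array is written positionally rather than aligned by label.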

xkudsraw
  • Thank you, although these functions seem to be used to combine two dataframes (or update based on another dataframe). Since I am trying to assign a series to a slice of a dataframe (the series has the same length as the dataframe slice), I have tried with those and have not been successful. Could you give me an example of how I could make my code work? I appreciate it! – Irina Feb 25 '21 at 18:46
  • There is an example with a series here; maybe you can try it: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.update.html – xkudsraw Feb 25 '21 at 20:47