
Let's say we have a DataFrame like the one below:

import pandas as pd
df = pd.DataFrame({'A': [3, 9, 3, 4], 'B': [7, 1, 6, 0], 'C': [9, 0, 3, 4], 'D': [1, 8, 0, 0]})

Starting df:

   A  B  C  D
0  3  7  9  1
1  9  1  0  8
2  3  6  3  0
3  4  0  4  0

If I want to assign new values to column A, I would expect the following to work:

d = {0:10,1:20,2:30,3:40}

df.loc[:,'A'] = d

Output:

   A  B  C  D
0  0  7  9  1
1  1  1  0  8
2  2  6  3  0
3  3  0  4  0

The values that are assigned instead are the keys of the dictionary.

If, however, we assign the dictionary to a new column instead of an existing one, we get the same result (the keys) the first time we run the assignment, but running the same code a second time gives the expected result (the values). After that, assigning the dictionary to any column produces the expected output.

First time running df.loc[:,'E'] = {0:10,1:20,2:30,3:40}

Output:

   A  B  C  D  E
0  0  7  9  1  0
1  1  1  0  8  1
2  2  6  3  0  2
3  3  0  4  0  3

Second time running df.loc[:,'E'] = {0:10,1:20,2:30,3:40}

   A  B  C  D   E
0  0  7  9  1  10
1  1  1  0  8  20
2  2  6  3  0  30
3  3  0  4  0  40

Then if we run the same code as we did at first, we get a different result:

df.loc[:,'A'] = {0:10,1:20,2:30,3:40}

Output:

    A  B  C  D   E
0  10  7  9  1  10
1  20  1  0  8  20
2  30  6  3  0  30
3  40  0  4  0  40

Is this the intended behavior? (I am running pandas version 1.4.2)

rhug123
  • 1
    This is odd.... –  Jun 03 '22 at 20:49
  • 1
    Should file an issue in pandas github page. – rafaelc Jun 03 '22 at 20:50
  • The second assignment to the E column can be removed from the reproducer, the surprising behaviour still remains when assigning to the A column a second time. Definitely time for a bug report. – creanion Jun 03 '22 at 20:57
  • do you recommend I delete this question and submit on github page? – rhug123 Jun 03 '22 at 20:59
  • 2
    Calling `list` on a dictionary returns its keys, fwiw. –  Jun 03 '22 at 21:01
  • @rhug123 No, don't delete the question. Leave it until you get an answer on GH (be it workaround, bug fix, etc.), and then post that here as an answer. –  Jun 03 '22 at 21:01
  • It's a legitimate and well-written programming question, and interesting if nothing else. It's perfectly valid and in my opinion you'll be doing the community a disservice to remove it. :) –  Jun 03 '22 at 21:02
  • Submit on github sure. I've confirmed the bug exists on pandas 1.4.2 – creanion Jun 03 '22 at 21:02
  • It also happens for me on 1.3.4. –  Jun 03 '22 at 21:03
  • FWIW. Following this: https://stackoverflow.com/a/20250996/16407480, when I try "df["A"].replace(d, inplace=True)" or "df["A"].replace({'A':d})" I got "A:40,9,40,4" and when I try "df['A'].map(d)" I got "A:40,NaN,40,NaN". – Drakax Jun 05 '22 at 01:03
  • I use pandas version 1.5.1 and don't see this behavior. It immediately replaces the column with the values of the dict. – Lukas Hestermeyer Feb 16 '23 at 09:41

2 Answers


In short:

It's a bug that has since been fixed: pandas 1.4.4 still shows it, while pandas 1.5.3 assigns the values as expected.

How to work around it, with some explanation of the behavior:

First, to assign values from a dictionary to a DataFrame column so that the keys are matched against the index, wrap the dictionary in pd.Series(d).

Here is an example using pandas 1.4.4:

import pandas as pd

df = pd.DataFrame({'A': [3, 9, 3, 4], 'B': [7, 1, 6, 0], 'C': [9, 0, 3, 4], 'D': [1, 8, 0, 0]})

d = {0:10,1:20,2:30,3:40}

df.loc[:,'E'] = pd.Series({0:10,1:20,2:30,3:40})
df.loc[:,'F'] = {0:10,1:20,2:30,3:40}
df.loc[:,'G'] = {0:10,1:20,2:30,3:40}
df.loc[:,'G'] = {0:10,1:20,2:30,3:40}
df

Output:

    A   B   C   D   E   F   G
0   3   7   9   1   10  0   10
1   9   1   0   8   20  1   20
2   3   6   3   0   30  2   30
3   4   0   4   0   40  3   40

The reason you get the keys the first time is that assigning to a DataFrame with loc coerces the dictionary to a list, and converting a dictionary to a list yields its keys:

list({1:10,2:20,3:30,4:40})

Output:

[1, 2, 3, 4]

And:

import pandas as pd
df = pd.DataFrame({'A': [3, 9, 3, 4], 'B': [7, 1, 6, 0], 'C': [9, 0, 3, 4], 'D': [1, 8, 0, 0]})
df.loc[:,'A'] = {11:10,12:20,13:30,14:40}
df

Output:

    A   B   C   D
0   11  7   9   1
1   12  1   0   8
2   13  6   3   0
3   14  0   4   0
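
If a purely positional assignment is what you want, one way to sidestep the ambiguity is to pass the values explicitly instead of the dictionary itself. This is only a sketch of that alternative, not something from the pandas changelog; it relies on a dictionary preserving insertion order, and it ignores the keys entirely.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 9, 3, 4], 'B': [7, 1, 6, 0]})
d = {11: 10, 12: 20, 13: 30, 14: 40}

# Iterating over a dict yields its keys, not its values
print(list(d))           # [11, 12, 13, 14]
print(list(d.values()))  # [10, 20, 30, 40]

# Assign the values by position; the keys play no role here
df.loc[:, 'A'] = list(d.values())
print(df['A'].tolist())  # [10, 20, 30, 40]
```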

The output using pandas 1.5.3:

import pandas as pd
df = pd.DataFrame({'A': [3, 9, 3, 4], 'B': [7, 1, 6, 0], 'C': [9, 0, 3, 4], 'D': [1, 8, 0, 0]})
df.loc[:,'A'] = {0:10,1:20,2:30,3:40}
df

Output:

    A   B   C   D
0   10  7   9   1
1   20  1   0   8
2   30  6   3   0
3   40  0   4   0
Matan Bendak

If the dictionary keys are the same as the index labels, you can simply map the index through the dictionary:

df['D'] = df.index.map({0:10,1:20,2:30,3:40})
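
A slightly fuller sketch of the same idea, including what happens when some index labels have no key in the dictionary (the `partial` dictionary and the column name `E` are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 9, 3, 4], 'B': [7, 1, 6, 0]})

# Every index label has a matching key: each row gets its value.
full = {0: 10, 1: 20, 2: 30, 3: 40}
df['D'] = df.index.map(full)
print(df['D'].tolist())  # [10, 20, 30, 40]

# Hypothetical dictionary with keys 1 and 3 missing: those rows become NaN.
partial = {0: 10, 2: 30}
df['E'] = df.index.map(partial)
print(df['E'].isna().tolist())  # [False, True, False, True]
```

So this works cleanly when the keys cover the whole index; otherwise you may want to fill or check the missing rows afterwards.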
ReinholdN