18

In the following expression, the result of the left shift operation is assigned to the variable i.

int i;
i = 7 << 32;
printf("i = %d\n",i);

In the following expression, the left shift assignment operation is carried out.

int x = 7;
x <<= 32;
printf("x = %d\n",x);

The two expressions above gave different results. That is not the case with the following two expressions, which both gave the same result. So what could be the reason for the shift expressions above to return different values?

int a;
a = 1 + 1;
printf("a = %d\n",a);

int b = 1;
b += 1;
printf("b = %d\n",b);
Peter Mortensen

3 Answers

26

The C standard says:

The result is undefined if the right operand is negative, or greater than or equal to the number of bits in the left expression’s type.

So, it is undefined behavior because int is normally 32 bits in size, which means that only shifts of 0 through 31 steps can be well-defined.
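The range rule quoted above can be enforced before shifting. The following is only a minimal sketch: `checked_shl` is a hypothetical helper name, it works on `unsigned int` to stay clear of the signed-overflow cases, and it assumes `unsigned int` has no padding bits so that its width is `sizeof(unsigned int) * CHAR_BIT`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: performs value << count only when count lies in
   the range for which the shift is defined, i.e. 0 .. width-1. */
static bool checked_shl(unsigned int value, int count, unsigned int *out)
{
    /* Width in bits, assuming unsigned int has no padding bits. */
    const int width = (int)(sizeof(unsigned int) * CHAR_BIT);

    assert(out != NULL);
    if (count < 0 || count >= width) {
        return false;   /* the shift would be undefined behaviour */
    }
    *out = value << count;
    return true;
}
```

With a 32-bit `unsigned int`, `checked_shl(7u, 32, &r)` returns `false` and leaves `r` untouched, while `checked_shl(7u, 3, &r)` stores 56 in `r`.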

msc
  • 1
    31 steps is also undefined unless input was zero (shifting 1 into sign bit) – M.M May 30 '17 at 06:49
  • 1
    "because size of `int` normally 32 bits": that depends on the compiler. – glglgl May 30 '17 at 07:03
  • 1
    @M.M No, 1's dropping off on the left side is ok. It's really only a shift by more than (or equal) the number of bits in the underlying type that's undefined, regardless of the actual values of those bits. – Marc Schütz May 30 '17 at 10:30
  • 1
    @M.M ISO99 6.5.7: says it's well-defined for unsigned types. It does state that it's undefined for signed integers, but IMO that's just an instance of the signed overflow rule, not specific to left shifts. – Marc Schütz May 30 '17 at 11:05
  • 1
    @MarcSchütz it still applies regardless of how you want to categorize it - assuming 32-bit int, `1 << 31`, `7 << 29` etc. are undefined behaviour, and an easy way to remember the rules is that trying to left-shift a 1 into the sign-bit is always UB – M.M May 30 '17 at 11:12
5

I agree with Cody Gray's comments. For people who end up here in the future, the way to resolve this ambiguity is to use unsigned long long.

// The ULL suffix is important: it gives the constant the type
// unsigned long long (at least 64 bits), so a shift by 32 is in range.
unsigned long long int b = 7ULL << 32;

unsigned long long int a = 7;
a <<= 32;
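The same idea can be written with a fixed-width type, which states the 64-bit assumption explicitly. This is a sketch that assumes `<stdint.h>` is available on the target; `shl64` is a hypothetical helper name, not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shifts in a 64-bit unsigned type, so any count
   from 0 to 63 is within the defined range. */
static uint64_t shl64(uint64_t value, unsigned count)
{
    assert(count < 64u);   /* counts of 64 or more would be undefined */
    return value << count;
}
```

Here `shl64(7, 32)` evaluates to 30064771072 (7 × 2^32), the same value as the constant expression `UINT64_C(7) << 32`.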
Peter Mortensen
Shuvam
  • In this case, is the situation the same for a 64-bit shift? – alinsoar May 30 '17 at 10:03
  • Any magic number in code usually has word length unless it is cast or suffixed otherwise. In this case, as you are trying to left shift by 32 bits, i.e. by the size of the word length, we need to tell the compiler not to treat it as one-word-wide (32-bit) data. – Shuvam May 30 '17 at 10:23
  • 3
    Probably diligent when thinking about shifting by `n bits` to recommend the use of `int64_t`, `uint64_t`, etc from `stdint` that explicitly ensure bit length – cat May 30 '17 at 15:21
  • 1
    Yes +1 and reiterating: ALWAYS use the _stdint_ types when you are making assumptions about the number of bits. – paddy May 30 '17 at 23:32
  • @paddy stdint is applicable only when you have access to the standard library! Sometimes (like in embedded) you have to write your own lib. So, I don't think "ALWAYS" holds good here! – Shuvam Jun 08 '17 at 06:20
1

The abstract operational semantics from ISO/IEC 9899 says:

6.5.7 Bitwise shift operators --- Semantics

3 ... If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.

In your case, disassembling the binary shows what happens:

[root@arch stub]# objdump -d a.out | sed '/ <main>/,/^$/ !d'
00000000004004f6 <main>:
  4004f6:       55                      push   %rbp
  4004f7:       48 89 e5                mov    %rsp,%rbp
  4004fa:       48 83 ec 10             sub    $0x10,%rsp
  4004fe:       c7 45 fc 07 00 00 00    movl   $0x7,-0x4(%rbp)
  400505:       b8 20 00 00 00          mov    $0x20,%eax
  40050a:       89 c1                   mov    %eax,%ecx
  40050c:       d3 65 fc                shll   %cl,-0x4(%rbp)  <<== HERE IS THE PROBLEM
  40050f:       8b 45 fc                mov    -0x4(%rbp),%eax
  400512:       89 c6                   mov    %eax,%esi
  400514:       bf b4 05 40 00          mov    $0x4005b4,%edi
  400519:       b8 00 00 00 00          mov    $0x0,%eax
  40051e:       e8 cd fe ff ff          callq  4003f0 <printf@plt>
  400523:       b8 00 00 00 00          mov    $0x0,%eax
  400528:       c9                      leaveq 
  400529:       c3                      retq   
  40052a:       66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

The generated code does indeed try to shift, but the shll %cl,-0x4(%rbp) (shift left of a long) has no effect.

The undefined behaviour in this case lies in assembly, namely in SHL operation.

alinsoar
  • 3
    *"The undefined behaviour in this case lies in assembly, namely in SHL operation."* This is incorrect. Shifting by an excess of bits is **well-defined** on x86, as you can see in [the documentation](http://x86.renejeschke.de/html/file_module_x86_id_285.html). If you were writing the code in assembly, there would be no problem. The reason it is undefined behavior is because of the requirements set out in the C language standard. *"...the `shll %cl,-0x4(%rbp)` (shift left of a long) has no effect."* I have no idea why you say that. It absolutely does have an effect. – Cody Gray - on strike May 30 '17 at 08:20
  • The SHL operation does not return the `0` value in `-0x4(%rbp)` as one expects. – alinsoar May 30 '17 at 08:23
  • Why would one expect 0? Also, returning an "unexpected" value is very different from having "no effect". The only reason it's unexpected to you is because you aren't familiar enough with the x86 ISA. As I said before, it's not undefined in assembly language; it's only undefined in C. – Cody Gray - on strike May 30 '17 at 08:25
  • SHL keeps the value at `-0x4(%rbp)` as before the shift. – alinsoar May 30 '17 at 08:27
  • I have some experience with X86, perhaps not enough for some purposes, but in this case `SHL` does not modify the value at `0x4(%rbp)`. – alinsoar May 30 '17 at 08:28
  • @CodyGray I said so because SHL has no effect on the stack location allocated for the variable ``x``. – alinsoar May 30 '17 at 08:30
  • 1
    Technically, `shll %cl, -0x4(%rbp)` *does* modify the value on the stack. Why wouldn't it? The instruction says to shift the value at an offset of -4 from `rbp` by the value in `cl`, and that's precisely what it does. Now, the x86 ISA is specifically documented as masking shift counts for 32-bit operands to 5 bits, which (A) is what makes this well-defined behavior, and (B) means that shifting a value by 32 is essentially an identity operation in that it evaluates the original value. So in that sense, no, the value at `-0x4(%rbp)` isn't modified, but it isn't supposed to be. – Cody Gray - on strike May 30 '17 at 08:42
  • Technically it modifies the stack at that offset if %CL is different from 32. Otherwise, as one can see, it does not modify it. The code generated by GCC for 32 is the same code generated for other values, so the problem lies in assembly, etc. – alinsoar May 30 '17 at 09:27
  • 1
    It is difficult for me to understand the point that you are making. It is very clear from the documentation that shifting by an excess of bits is *not* undefined behavior on x86; in fact, it is quite well-defined so we can predict exactly what the result will be of shifting 7 by 32. There is no "problem" in the assembly. The problem lies in the C source code that the compiler is translating to machine code, and the C source code is invalid because *it* exhibits undefined behavior. The top half of the answer is correct; the bottom half is misleading or just downright wrong. – Cody Gray - on strike May 30 '17 at 09:38
  • @CodyGray if it's undefined in C, there's no requirement that a compiler follow the default behavior of a single CPU architecture. This is especially relevant for compilers that target multiple CPU architectures instead of just one; in that case doing the same thing in all cases instead of implementing different machine specific behaviors eliminates large numbers of subtle cross architecture difference bugs in applications. Undefined means the compiler can do whatever it wants. From what I've read elsewhere this is increasingly likely to be the equivalent of `return 0;` in optimized builds. – Dan Is Fiddling By Firelight May 30 '17 at 14:20
  • @DanNeely in this case the compiler tries to shift left exactly as it tries to shift for 10 bits and it does not care about the result of SHL. Because 32 is unspecified in C the compiler does not care what the assembler will do. The assembler in this case will not modify the address `-0x4(%rbp)` allocated for that variable, so the problem is in assembler in this case, this `unexpected behaviour` is in fact strange as it lies on x86 definition of shift, which is strange, etc. – alinsoar May 30 '17 at 14:39
  • The right thing to say would be that the compiler would be allowed to generate exactly the same code _no matter what_ `shll %cl, -0x4(%rbp)` happens to do when `%cl` has a value outside the range 0 .. 31. What matters is not that this x86 machine instruction does have defined behavior for any shift-count argument, but that the C standard doesn't care what that behavior is. – zwol May 30 '17 at 16:11
  • 1
    To put it another way, if a human wrote this program by hand in x86 assembly language, we might say that it has a _bug_ if the human meant the left shift to do something other than what it does, but we wouldn't say that it had undefined behavior, because all of the machine instructions have well-defined behavior. (We might not _like_ that Intel specified `shll` to reduce the shift count mod 32, but they did specify it.) However, the C program that got translated into this assembly dump does have undefined behavior, because the C standard _doesn't_ specify what an over-wide shift count does. – zwol May 30 '17 at 16:16
  • 2
    @dan I think you are misunderstanding my point, as I'm afraid that alinsoar was. I fully agree that this is undefined behavior in C, and it doesn't make it any less undefined just because one particular compiler when targeting one particular architecture seems to consistently translate the code into a particular machine instruction. My issue was with the claim made in the answer that *"The undefined behaviour in this case lies in assembly, namely in SHL operation."* This is wrong, the behavior is *not* undefined in assembly language, it is undefined in *C*. – Cody Gray - on strike May 31 '17 at 07:01
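The count masking described in the comments (32-bit x86 shifts use only the low 5 bits of the count) can be reproduced in portable C without relying on undefined behaviour. This is only an illustrative sketch: `shl_masked32` is a hypothetical helper name, and the widening to 64 bits is an added assumption that keeps the shift defined even on platforms where `int` is wider than 32 bits.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdint.h>

/* Hypothetical helper: mimics the 32-bit x86 SHL behaviour by using
   only the low 5 bits of the shift count. */
static uint32_t shl_masked32(uint32_t value, unsigned count)
{
    /* The masked count is 0..31; shifting in 64 bits and truncating
       keeps the operation defined for every input. */
    return (uint32_t)((uint64_t)value << (count & 31u));
}
```

With this helper, `assert(shl_masked32(7u, 32u) == 7u);` holds, because a count of 32 is masked to 0. That matches the unchanged value observed in the disassembled program, but here the result is specified by the C code itself rather than by what one compiler happened to emit.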