3

When I declare char ch = 'ab', ch contains only 'b'. Why is 'a' not stored?

#include <stdio.h>

int main()
{
  char ch = 'ab';
  printf("%c",ch);
  return 0;
}

What is the actual process involved here?

Jabberwocky
  • 3
    `char ch = 'ab';` should give warnings such as `warning: multi-character character constant` and `warning: overflow in conversion from 'int' to 'char'`. Don't do it; it's implementation-specific behavior you don't need as a beginner. – Jabberwocky Oct 22 '20 at 12:58
  • @SrujanLanderi what compiler / IDE are you using? – rares985 Oct 22 '20 at 13:02
  • 1
    Compile with all warnings enabled (`-Wall` with gcc and clang, but it might be different in your environment). – Jabberwocky Oct 22 '20 at 13:03
  • 1
    [C11 6.4.4.4](http://port70.net/~nsz/c/c11/n1570.html#6.4.4.4) p10: "The value of an integer character constant containing more than one character (e.g., 'ab') [...] is implementation-defined." Apparently, in your implementation `'ab'` maps to `'b'`. – pmg Oct 22 '20 at 13:03
  • 1
    @pmg The fact that he gets 'b' is mostly due to the truncation of a >8-bit value to a char. – Jabberwocky Oct 22 '20 at 13:06
  • @andrei985 I use Code Blocks – Srujan Landeri Oct 22 '20 at 13:07
  • 1
    @Jabberwocky: that's my guess about his implementation too. I believe `if ('ab' == ('a' * 256 + 'b')) puts("I'm right");` prints "I'm right" on his computer :) --- https://ideone.com/Vfxsun – pmg Oct 22 '20 at 13:08
  • @SrujanLanderi for setting compiler flags in codeblocks read [this](https://stackoverflow.com/questions/33208733/how-to-add-compiler-flags-on-codeblocks). You absolutely need to use `-Wall` and consider most if not all warnings as errors. – Jabberwocky Oct 22 '20 at 13:08
  • @SrujanLanderi Try enabling the warnings as shown [here](https://www.learncpp.com/cpp-tutorial/configuring-your-compiler-warning-and-error-levels/) or how other people suggested in the comments – rares985 Oct 22 '20 at 13:11
  • @andrei985 forget my comment, I deleted it, you can delete your last comment too, and I'll delete this comment in a few minutes – Jabberwocky Oct 22 '20 at 13:15

2 Answers

4

Enclosing more than one character in single quotes, as in 'ab', creates what is known as a multi-character constant. Its value is implementation-defined, meaning that with a different compiler or OS I might see the value a, even though you see the value b.

This is not a good practice and the compiler should issue a warning. If you do not see the warning, try compiling with warnings enabled by using the flags -Wall and -Wextra if using gcc or clang.

You can read more about this here in another answer.

rares985
1

A character constant, for example 'A', has type int and the value of 'A', most likely 0x41 (the ASCII value of A). When a character constant contains more than one byte, for example 'ab', its value is implementation-defined. Note that the same applies to a Unicode code point that takes more than one byte in UTF-8, that is, every code point outside ASCII; for example, '✓' is a multi-character constant.

On my system it is still of type int, and each character occupies 8 bits of that int; this is probably the case on your system as well. 'ab' results in the value 0x6162, where 0x61 is the value of 'a' and 0x62 the value of 'b'. You then assign 0x6162 to an 8-bit char. If char is signed, the value does not fit and the result of the conversion is implementation-defined (C11 6.3.1.3), so do not do that. In your case the lower 8 bits of the value are stored in the char, which is also what an unsigned char is guaranteed to do. (Do not rely on this behavior for signed char, or for char on systems where char is signed.)

You can test the value of 'ab':

#include <stdio.h>

int main(void)
{
  unsigned ch = 'ab';
  printf("0x%04X\n",ch);
  return 0;
}

This outputs 0x6162 on my system.