Long ago, but I'll likely need this for my own future reference...
As Chris's fine answer says, the key is to put a modifier between the '%' and the operand number. For example, "MOV %1, %0" might become "MOV %q1, %w0".
I couldn't find anything in constraints.md, but /gcc/config/i386/i386.c had this potentially useful comment in the source for print_reg():
/* Print the name of register X to FILE based on its machine mode and number.
If CODE is 'w', pretend the mode is HImode.
If CODE is 'b', pretend the mode is QImode.
If CODE is 'k', pretend the mode is SImode.
If CODE is 'q', pretend the mode is DImode.
If CODE is 'x', pretend the mode is V4SFmode.
If CODE is 't', pretend the mode is V8SFmode.
If CODE is 'h', pretend the reg is the 'high' byte register.
If CODE is 'y', print "st(0)" instead of "st", if the reg is stack op.
If CODE is 'd', duplicate the operand for AVX instruction.
*/
A comment further down, for ix86_print_operand(), offers an example:
b -- print the QImode name of the register for the indicated operand.
%b0 would print %al if operands[0] is reg 0.
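To make the size modifiers concrete, here is a small sketch, assuming GCC or Clang targeting x86-64 with the default AT&T syntax. The function names are made up for illustration, and the plain-C fallbacks are only there so the file still compiles on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Return the low 8 bits of src, zero-extended to 32 bits. */
static inline uint32_t low_byte(uint64_t src)
{
#if defined(__x86_64__)
    uint32_t dst;
    /* %b1 prints the 8-bit name of operand 1 (e.g. %al or %sil),
       %k0 prints the 32-bit name of operand 0 (e.g. %eax). */
    __asm__("movzbl %b1, %k0" : "=r"(dst) : "r"(src));
    return dst;
#else
    return (uint32_t)(src & 0xFFu);      /* portable fallback, no asm */
#endif
}

/* Return the low 16 bits of src, zero-extended to 32 bits. */
static inline uint32_t low_word(uint64_t src)
{
#if defined(__x86_64__)
    uint32_t dst;
    /* %w1 prints the 16-bit name of operand 1 (e.g. %ax). */
    __asm__("movzwl %w1, %k0" : "=r"(dst) : "r"(src));
    return dst;
#else
    return (uint32_t)(src & 0xFFFFu);    /* portable fallback, no asm */
#endif
}

/* Swap the two bytes of a 16-bit value. */
static inline uint16_t swap_bytes16(uint16_t x)
{
#if defined(__x86_64__)
    /* "Q" restricts the operand to a, b, c or d, the only registers
       with a high-byte name; %h0 prints e.g. %ah, %b0 prints %al. */
    __asm__("xchgb %b0, %h0" : "+Q"(x));
    return x;
#else
    return (uint16_t)((x << 8) | (x >> 8));  /* portable fallback */
#endif
}

/* Usage: each helper sees a different view of the same register. */
static inline void check_size_modifiers(void)
{
    assert(low_byte(0x1234567890ABCDEFULL) == 0xEFu);
    assert(low_word(0x1234567890ABCDEFULL) == 0xCDEFu);
    assert(swap_bytes16(0x1234u) == 0x3412u);
}
```

Note that 'h' only makes sense together with a constraint such as "Q"; with a plain "r" the compiler could pick a register that has no high-byte name.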
A few more useful options are listed under Output Template of the GCC Internals documentation:
‘%cdigit’ can be used to substitute an operand that is a constant
value without the syntax that normally indicates an immediate operand.
‘%ndigit’ is like ‘%cdigit’ except that the value of the constant is
negated before printing.
‘%adigit’ can be used to substitute an operand as if it were a memory
reference, with the actual operand treated as the address. This may be
useful when outputting a “load address” instruction, because often the
assembler syntax for such an instruction requires you to write the
operand as if it were a memory reference.
‘%ldigit’ is used to substitute a label_ref into a jump instruction.
‘%=’ outputs a number which is unique to each instruction in the
entire compilation. This is useful for making local labels to be
referred to more than once in a single template that generates
multiple assembler instructions.
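As an illustration of '%=', here is a minimal sketch, again assuming GCC or Clang on x86-64 with AT&T syntax. The function name and label names are hypothetical. Because the function can be inlined several times into one caller, fixed label names would be defined more than once and the assembler would reject them; '%=' gives each copy its own suffix.

```c
#include <assert.h>

/* Count n down to zero in assembly; return how many iterations ran. */
static inline unsigned count_iterations(unsigned n)
{
    unsigned count = 0;
#if defined(__x86_64__)
    __asm__ volatile(
        "testl %1, %1\n\t"        /* skip the loop entirely if n == 0   */
        "jz .Ldone%=\n"
        ".Lloop%=:\n\t"           /* %= expands to a per-asm number     */
        "incl %0\n\t"
        "decl %1\n\t"
        "jnz .Lloop%=\n"
        ".Ldone%=:"
        : "+r"(count), "+r"(n)    /* both operands are read and written */
        :
        : "cc");                  /* test/inc/dec modify the flags      */
#else
    count = n;                    /* portable fallback, no asm          */
#endif
    return count;
}

/* Usage: two calls in one function, so the template is emitted twice. */
static inline void check_unique_labels(void)
{
    assert(count_iterations(0) == 0);
    assert(count_iterations(5) == 5);
}
```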
The '%cdigit' construct ('%c1' in the macro below) allows one to properly format an LEA instruction using a constant offset:
#define ASM_LEA_ADD_BYTES(ptr, bytes) \
__asm volatile("lea %c1(%0), %0" : \
/* reads/writes %0 */ "+r" (ptr) : \
/* reads */ "i" (bytes));
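Note the crucial but sparsely documented 'c' in '%c1'. Without it, the constant would be printed in immediate syntax (for example "$4(%rax)"), which the assembler rejects as a displacement. This macro is equivalent to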
ptr = (char *)ptr + bytes
but does the addition with LEA rather than an ordinary ADD. Which execution ports that actually uses depends on the microarchitecture.
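A possible way to use the macro, sketched under the assumption of GCC or Clang on x86-64. HEADER_BYTES and skip_header() are hypothetical names, and the non-x86 branch is only a fallback for this example. The byte count has to be a compile-time constant, because the "i" constraint only accepts immediates.

```c
#include <assert.h>

#if defined(__x86_64__)
/* Same template as the macro above, restated without the trailing ';'. */
#define ASM_LEA_ADD_BYTES(ptr, bytes) \
    __asm volatile("lea %c1(%0), %0" : "+r"(ptr) : "i"(bytes))
#else
/* Fallback for other targets: ordinary pointer arithmetic. */
#define ASM_LEA_ADD_BYTES(ptr, bytes) \
    ((ptr) = (const void *)((const char *)(ptr) + (bytes)))
#endif

#define HEADER_BYTES 4   /* hypothetical fixed header length */

/* Advance past a fixed-size header; on x86-64 this assembles to
   something like "lea 4(%rdi), %rdi". */
static inline const char *skip_header(const char *buf)
{
    ASM_LEA_ADD_BYTES(buf, HEADER_BYTES);
    return buf;
}

/* Usage: the result matches plain pointer arithmetic. */
static inline void check_lea_add(void)
{
    static const char msg[] = "HDR:payload";
    assert(skip_header(msg) == msg + HEADER_BYTES);
    assert(*skip_header(msg) == 'p');
}
```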
Edit to add:
Making direct calls in x64 can be difficult, as it requires yet another undocumented modifier: '%P0' (which seems to be for PIC)
#define ASM_CALL_FUNC(func) \
__asm volatile("call %P0" : \
/* no writes */ : \
/* reads %0 */ "i" (func))
A lower case 'p' modifier also seems to function the same in GCC, although only the capital 'P' is recognized by ICC. More details are probably available at /gcc/config/i386/i386.c. Search for "'p'".