
I'm wondering: if I do the following,

{
    register uint32_t v1 asm("r6"), v2 asm("r7");
    register uint32_t v3 asm("r8"), v4 asm("r10");

    asm volatile (
        /* Move data */
        "   ldm     %[src], {%[v1],%[v2],%[v3],%[v4]};"
        "   stm     %[dst], {%[v1],%[v2],%[v3],%[v4]};"
        : /* output constraints */
          "=m"(*(uint64_t (*)[2])dst),
          [v1]"=&r"(v1), [v2]"=&r"(v2),
          [v3]"=&r"(v3), [v4]"=&r"(v4)
        : /* input constraints */
          "m"(*(const uint64_t (*)[2])src),
          [dst]"r"(read_dst),
          [src]"r"((const uint64_t* )src)
        : /* clobber constraints */
    );
}

is `v1` guaranteed to be in `r6`, or is the compiler free to use another register as an optimization? Is there a way to tie a name to a particular register (without manually specifying `%r6` everywhere)?

Also, is there a difference between using an output constraint vs a clobber constraint for a temporary variable? (assuming it is not referenced after the inline assembly call?)

I'm using gcc and clang, so a solution would have to work for both. This is of course a simplified example for the purposes of posting a question.

HardcoreHenry
    Yes, `v1` will be in `r6`. To quote the [manual](https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html): _"To force an operand into a register, create a local variable and specify the register name after the variable’s declaration."_ – Jester Aug 27 '18 at 20:31

1 Answer


Yes, this is safe. That's exactly what `register ... asm("regname");` is for: forcing an operand of an Extended asm statement into a specific register. It is in fact the only supported use of `register ... asm` local variables.

In practice, gcc strongly prefers that register if you keep using the variable later, even when that costs extra instructions (see the related question "Using a specific zmm register in inline asm"). I would hope that once the variable is dead the register gets allocated to other variables. Still, you might want to limit the scope of those temporaries, or just wrap this in an inline function.


In `asm volatile`, there's no difference AFAIK between an output operand and a clobber for a temporary. The benefit of an output operand is that it lets the compiler pick a register for you, but if you're forcing the register allocation manually then there's no benefit either way.
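To make the comparison concrete, here is a hedged sketch of the clobber-based spelling of the same copy. The function name `copy16_clobber` is hypothetical, and the `memcpy` fallback is only there so the snippet builds on non-ARM targets; it is not part of the question. It assumes ARM or Thumb-2 mode (Thumb-1 `ldm` cannot encode `r8`/`r10`) and that `r7` is not needed as a frame pointer in the build configuration.

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t[2]) == 16, "this sketch copies exactly 16 bytes");

/* Hypothetical helper: temporaries named directly in the template and
 * listed as clobbers instead of being declared as dummy output operands. */
static inline void copy16_clobber(uint64_t dst[2], const uint64_t src[2])
{
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    __asm__ volatile (
        "ldm %[src], {r6, r7, r8, r10}\n\t"
        "stm %[dst], {r6, r7, r8, r10}"
        : /* outputs: tell the compiler which memory is written */
          "=m"(*(uint64_t (*)[2])dst)
        : /* inputs: memory that is read, plus the two pointers */
          "m"(*(const uint64_t (*)[2])src),
          [dst]"r"(dst),
          [src]"r"(src)
        : /* clobbers: the compiler keeps operands out of these registers */
          "r6", "r7", "r8", "r10"
    );
#else
    /* Fallback (an assumption for portability, not from the question). */
    memcpy(dst, src, 16);
#endif
}
```

The trade-off is the one described above: with clobbers the register names are hard-coded in the template, whereas `register ... asm` variables let the template keep using named operands.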

If you disabled optimization, the compiler would actually reserve stack space for the locals and spill them. (Or maybe not since they're register variables?)


An asm statement with no output operands is implicitly `volatile`. One that does have outputs is not, so without `volatile` the compiler may delete it when none of its outputs are used. So if all the outputs are unused, but you want the asm statement to run anyway for some side effect (with a "memory" clobber), you need to use `volatile` explicitly.
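A small, architecture-neutral sketch of that rule, using empty asm templates so it builds with gcc or clang on any target. Both function names are hypothetical, chosen only for illustration.

```c
/* No output operands: treated as implicitly volatile, so the statement
 * is kept even though nothing consumes a result. The "memory" clobber
 * makes it act as a compiler-level barrier. */
static inline void compiler_barrier(void)
{
    __asm__("" ::: "memory");
}

/* One read-write operand and no volatile: the compiler may drop the
 * statement entirely if the returned value is never used. When the
 * value is used, the compiler has to assume the asm may have changed it. */
static inline int opaque_value(int x)
{
    __asm__("" : "+r"(x));
    return x;
}
```

Calling `opaque_value(5)` and ignoring the result leaves the compiler free to emit nothing for it; adding `volatile` would force the (empty) statement to stay.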

In your case, you only want the copy to happen if the memory output is used. So you should probably omit `volatile`, so the compiler can optimize away a copy to a temporary when it can prove that nothing can care about the result. A copy to a global or through an unknown pointer can't be optimized away, any more than the store in `void foo(int*p) { *p=1; }` can be. It's a potentially-visible side effect of the function that the caller could observe.
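Putting the suggestions so far together, a hedged sketch might look like the following: the register variables are scoped inside an inline function and `volatile` is omitted. The name `copy16_regvars` is hypothetical, `read_dst` from the question is replaced by the `dst` parameter, and the `memcpy` fallback is an addition so the snippet builds on non-ARM targets. It assumes ARM or Thumb-2 mode and a build where `r7` is available (not reserved as a frame pointer).

```c
#include <assert.h>   /* static_assert */
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t[2]) == 16, "this sketch copies exactly 16 bytes");

/* Hypothetical wrapper: the temporaries live only inside this function,
 * so the r6/r7/r8/r10 preference does not leak into the caller's code. */
static inline void copy16_regvars(uint64_t dst[2], const uint64_t src[2])
{
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    register uint32_t v1 __asm__("r6"), v2 __asm__("r7");
    register uint32_t v3 __asm__("r8"), v4 __asm__("r10");

    /* No volatile: if the "=m" output is provably unused, the whole
     * statement may be removed. */
    __asm__ (
        "ldm %[src], {%[v1], %[v2], %[v3], %[v4]}\n\t"
        "stm %[dst], {%[v1], %[v2], %[v3], %[v4]}"
        : "=m"(*(uint64_t (*)[2])dst),
          [v1]"=&r"(v1), [v2]"=&r"(v2),
          [v3]"=&r"(v3), [v4]"=&r"(v4)
        : "m"(*(const uint64_t (*)[2])src),
          [dst]"r"(dst),
          [src]"r"(src)
    );
#else
    /* Fallback (an assumption for portability, not from the question). */
    memcpy(dst, src, 16);
#endif
}
```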


This use-case for copying 16 bytes

This looks a bit questionable. Does gcc really make worse code than this for copying 16 bytes? Or are you trying to optimize for size over speed? Normally you'd want to schedule instructions so the load result isn't used right away, especially for in-order CPUs (which are not rare in the ARM world).
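For comparison, a plain-C baseline is worth compiling at your usual optimization level and inspecting before committing to inline asm. This is only a sketch; `copy16_plain` is a hypothetical name, and whether the compiler turns the fixed-size `memcpy` into `ldm`/`stm`, `ldrd`/`strd`, or something else depends on the target, alignment information, and flags.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical baseline: a fixed-size memcpy, which gcc and clang
 * usually expand inline rather than calling the library function. */
static inline void copy16_plain(void *dst, const void *src)
{
    memcpy(dst, src, 16);
}
```

Because the compiler sees an ordinary copy here, it can schedule the loads and stores around neighbouring code, which the asm version prevents.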

Congratulations on getting all the constraints right, though. Well over 90% of SO questions about GNU C inline asm have unsafe or sub-optimal constraints, maybe even 99%.

Your early-clobber output constraints, and dummy "m" input/output operands, are necessary for this to be safe.

Peter Cordes