Short question: Are there compiler options or function attributes in g++ that force the compiler to pass the members of a struct in registers instead of on the stack?
Long question: My application has a list of function handles that I essentially call in a loop. Since each function does only a small amount of work, the function call overhead needs to be minimized.
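To make the call pattern concrete, here is a minimal sketch of the kind of loop I mean. The names `Handler`, `bump`, and `runAll` are hypothetical placeholders for illustration, not the real application code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical handler signature: each handler does a small amount of work
// on a shared buffer and advances the write position.
using Handler = void (*)(std::size_t& dataPos, char* data);

// Illustrative handler: writes one byte and advances the position.
inline void bump(std::size_t& dataPos, char* data) {
    data[dataPos] = 'x';
    ++dataPos;
}

// Calls every handler in order. With handlers this small, the cost of each
// indirect call is a noticeable share of the total run time.
inline void runAll(const std::vector<Handler>& handlers,
                   std::size_t& dataPos, char* data) {
    assert(data != nullptr);
    for (Handler h : handlers) {
        h(dataPos, data);
    }
}
```

Calling `runAll` with three copies of `bump` on a zeroed buffer would advance the position by three.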
I now want to pass the arguments in a struct. This has the advantage that a change to the arguments only needs to be made in one place instead of in roughly 20 places across the code base. Another advantage is that some arguments depend on template parameters, which add or remove arguments; a struct can absorb that variation.
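A rough sketch of what I have in mind, assuming a single boolean template parameter that decides whether the extra argument exists. `Args`, `repeatCount`, and `writeMarker` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical argument bundle; the primary template has no count field.
template <bool WithCount>
struct Args {
    std::size_t& dataPos;
    char* data;
};

// Specialization that adds the extra argument `n`.
template <>
struct Args<true> {
    std::uint8_t n;
    std::size_t& dataPos;
    char* data;
};

// Overloads hide whether `n` exists, so the handler body stays the same.
inline std::size_t repeatCount(const Args<false>&) { return 1; }
inline std::size_t repeatCount(const Args<true>& a) { return a.n; }

// The handler takes the bundle by value; adding or removing a member
// changes only the struct definition, not the handler signature.
template <bool WithCount>
void writeMarker(Args<WithCount> a) {
    assert(a.data != nullptr);
    const std::size_t count = repeatCount(a);
    for (std::size_t i = 0; i < count; ++i) {
        a.data[a.dataPos++] = '#';
    }
}
```

With this layout, `writeMarker(Args<false>{pos, buf})` writes one marker and `writeMarker(Args<true>{3, pos, buf})` writes three, without any change to the handler itself.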
The problem is that as soon as the struct has more than two members, g++ passes the struct on the stack instead of passing its members in registers. This reduces performance by 50%. I wrote a small example that demonstrates the problem:
    #include <cstddef>   // size_t
    #include <cstdint>   // uint8_t
    #include <iostream>

    // Three members: the case that ends up on the stack.
    struct A {
        uint8_t n;
        size_t& __restrict__ dataPos;
        char* const __restrict__ data;
    };

    // Two members: the case that is passed in registers.
    struct B {
        size_t& __restrict__ dataPos;
        char* const __restrict__ data;
    };

    // noinline keeps the calls visible in the generated assembly.
    // Unary + prints n as a number instead of as a character.
    __attribute__((noinline)) void funcStructA(A a) {
        std::cout << "out struct A: n: " << +a.n << " dataPos: " << a.dataPos << " data: " << a.data << std::endl;
    }

    __attribute__((noinline)) void funcStructB(uint8_t n, B b) {
        std::cout << "out struct B: n: " << +n << " dataPos: " << b.dataPos << " data: " << b.data << std::endl;
    }

    __attribute__((noinline)) void funcDirect(uint8_t n, size_t& __restrict__ dataPos, char* const __restrict__ data) {
        std::cout << "out direct: n: " << +n << " dataPos: " << dataPos << " data: " << data << std::endl;
    }

    int main(int nargs, char** args) {
        char data[1000] = "example";  // initialized so that printing it is well defined
        size_t pos = 100;
        funcStructA(A{10, pos, data});
        funcStructB(10, B{pos, data});
        funcDirect(10, pos, data);
        return 0;
    }
The assembly generated for the three calls in main (g++ -std=c++14 -O3, version 11.2.1 20220127 (Red Hat 11.2.1-9)) is:
    401119: push QWORD PTR [rsp+0x10]
    40111d: push QWORD PTR [rsp+0x10]
    401121: push QWORD PTR [rsp+0x38]
    401125: call 401280 <funcStructA(A)>
    40112a: add rsp,0x20
    40112e: mov rsi,rbp
    401131: mov rdx,r12
    401134: mov edi,0xa
    401139: call 4013a0 <funcStructB(unsigned char, B)>
    40113e: mov rdx,r12
    401141: mov rsi,rbp
    401144: mov edi,0xa
    401149: call 4014c0 <funcDirect(unsigned char, unsigned long&, char*)>
For funcStructA the struct is pushed onto the stack, whereas for funcStructB the members are passed in registers.
I tried moving n around within the struct and passing it by reference, but the behavior is always the same.
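One likely explanation is the size of the struct, not the number of members: as far as I understand the x86-64 System V calling convention, an aggregate larger than 16 bytes is passed in memory, and neither moving n nor turning it into a reference shrinks A below that. The sketch below mirrors the layouts of A and B; `ThreeMembers`, `TwoMembers`, and `fitsInTwoEightbytes` are hypothetical names, and the helper is a deliberate simplification of the ABI classification, not the full rule.

```cpp
#include <cstddef>
#include <cstdint>

// Same member layout as A above (qualifiers dropped; they do not change
// the size).
struct ThreeMembers {
    std::uint8_t n;
    std::size_t& dataPos;
    char* data;
};

// Same member layout as B above.
struct TwoMembers {
    std::size_t& dataPos;
    char* data;
};

// Hypothetical helper: simplified form of the System V rule that a
// trivially copyable aggregate of at most two eightbytes (16 bytes) can be
// passed in integer registers. It ignores floating-point members,
// alignment, and non-trivial types.
constexpr bool fitsInTwoEightbytes(std::size_t bytes) {
    return bytes <= 16;
}

// On a typical 64-bit target the larger struct is
// 1 byte + 7 bytes padding + 8 + 8 = 24 bytes, the smaller one 16 bytes.
static_assert(sizeof(ThreeMembers) > sizeof(TwoMembers),
              "adding n grows the struct");
```

On x86-64 Linux, `fitsInTwoEightbytes(sizeof(TwoMembers))` should hold while `fitsInTwoEightbytes(sizeof(ThreeMembers))` should not, which matches the assembly above.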
I read through the function attributes documented for GCC (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes, https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes) but could not find one that addresses my problem. I tried cdecl, fastcall, and ms_abi, but none of them made much of a difference.
Passing the struct by reference causes the same problem.
clang++ seems to have the same problem; I will run a test in the next few days.
Any help would be appreciated.