I have two variables that emulate the x86 XMM and YMM registers:
uint64_t xmm_value[2];
uint64_t ymm_value[4];
Now I want to use inline assembly to read & write to/from XMM/YMM registers.
- How do I write GCC inline assembly to copy xmm_value to register XMM0?
- How do I write GCC inline assembly to copy register YMM0 to ymm_value?
I already tried to search for sample inline assembly doing this, but could not find any good answer. Thanks!
With some help, I wrote the code below, and it compiles without errors. I use movups for XMM and vmovups for YMM. Is this correct, and can it be optimized further?
__m128 xmm0;
__m256 ymm0;
// write to XMM0, and read from YMM0
__asm__("movups %1, %%xmm0\n\t"
"vmovups %%ymm0, %0"
: "=m"(ymm0)
: "m"(xmm0)
: "xmm0", "ymm0");
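For comparison, here is a minimal sketch of the same round trip using the uint64_t arrays from the top of the question. It is an assumption-laden illustration, not a definitive answer: the helper name xmm_to_ymm_roundtrip is made up, the VEX-encoded vmovdqu is swapped in for movups so that the upper half of YMM0 is zeroed rather than left undefined, and a plain C fallback runs when the CPU (or compiler) does not support AVX.

```c
#include <stdint.h>

/* Hypothetical helper (not from any library): copies xmm_value into XMM0,
 * then stores all of YMM0 into ymm_value, inside ONE asm statement so the
 * compiler cannot reuse the register in between.
 * Returns 1 if the AVX path ran, 0 if the plain C fallback ran. */
static int xmm_to_ymm_roundtrip(const uint64_t xmm_value[2],
                                uint64_t ymm_value[4])
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx")) {
        __asm__(
            /* VEX-encoded 128-bit load: also zeroes bits 128..255 of YMM0 */
            "vmovdqu %1, %%xmm0\n\t"
            /* unaligned 256-bit store of the whole register */
            "vmovdqu %%ymm0, %0"
            : "=m"(*(uint64_t (*)[4])ymm_value)      /* 32 bytes written */
            : "m"(*(const uint64_t (*)[2])xmm_value) /* 16 bytes read */
            : "xmm0"); /* XMM0 and YMM0 are the same physical register */
        return 1;
    }
#endif
    /* Fallback that mirrors what the AVX path produces. */
    ymm_value[0] = xmm_value[0];
    ymm_value[1] = xmm_value[1];
    ymm_value[2] = 0;
    ymm_value[3] = 0;
    return 0;
}
```

After a call such as xmm_to_ymm_roundtrip(xmm_value, ymm_value), the low two elements of ymm_value hold the contents of xmm_value and the high two are zero. Casting the pointers to array types in the operands is one way to tell the compiler exactly how many bytes the asm reads and writes.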
Update 2: here is my full code (with vpbroadcastb added)
__m128 xmm0;
__m256 ymm0;
// write to XMM0, and read from YMM0
__asm__("movups %1, %%xmm0\n\t"
"vpbroadcastb %%xmm0, %%ymm0\n\t"
"vmovups %%ymm0, %0"
: "=m"(ymm0)
: "m"(xmm0)
: "xmm0", "ymm0");
The idea is to copy xmm0 (the variable) into XMM0, run vpbroadcastb, and then copy the result from YMM0 out to ymm0 (the variable). I now realize that XMM0 is the lower half of YMM0, so can this code still be improved?
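One possible simplification, sketched under assumptions: vpbroadcastb only uses the lowest byte of its source and also accepts a memory operand, so the separate load into XMM0 may not be needed at all. The helper name broadcast_low_byte and the byte-array types are hypothetical, chosen so the sketch compiles without the intrinsic headers; the AVX2 path is guarded by a runtime check with a memset fallback.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from any library): broadcasts the lowest byte
 * of src into all 32 bytes of dst.
 * Returns 1 if the AVX2 path ran, 0 if the memset fallback ran. */
static int broadcast_low_byte(const uint8_t src[16], uint8_t dst[32])
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx2")) {
        __asm__(
            /* load one byte from memory and replicate it across YMM0 */
            "vpbroadcastb %1, %%ymm0\n\t"
            /* unaligned 256-bit store of the result */
            "vmovdqu %%ymm0, %0"
            : "=m"(*(uint8_t (*)[32])dst) /* 32 bytes written */
            : "m"(src[0])                 /* only the lowest byte is read */
            : "xmm0"); /* covers YMM0, which shares the same register */
        return 1;
    }
#endif
    memset(dst, src[0], 32); /* same result without AVX2 */
    return 0;
}
```

Hard-coding YMM0 keeps the sketch close to the code above. Letting the compiler choose the register through an "x" constraint is another option, but that needs the vector types and an AVX-enabled build, which this sketch deliberately avoids.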