There is movdqu available via _mm_loadu_si128 that requires SSE2.
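There is also vmovdqu8 (and vmovdqu16, vmovdqu32, vmovdqu64), available via _mm_loadu_epi8 (and _mm_loadu_epi16, _mm_loadu_epi32, _mm_loadu_epi64), which requires AVX512BW + AVX512VL for the 8- and 16-bit forms, or AVX512F + AVX512VL for the 32- and 64-bit forms.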
What is the purpose of the latter if they apparently do the same thing?
If the purpose is masking, then why are the unmasked _mm_loadu_epi8 variants exposed as intrinsics at all?
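
To make "apparently do the same" concrete, here is a minimal sketch of the comparison I have in mind. The helper names (load16_sse2, load16_avx512, loads_agree) are made up for illustration, and the AVX-512 path is only compiled when the build enables AVX512BW and AVX512VL (for example with -mavx512bw -mavx512vl on GCC or Clang, which is an assumption about the toolchain). On such a build I would expect both loads to yield identical bytes.

```c
#include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <immintrin.h>   /* AVX-512 intrinsics, when enabled at compile time */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: 16-byte unaligned load via the SSE2 intrinsic (movdqu). */
static void load16_sse2(const void *src, uint8_t out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)out, v);
}

#if defined(__AVX512BW__) && defined(__AVX512VL__)
/* Hypothetical helper: the same 16-byte unaligned load via the unmasked
 * AVX-512 intrinsic (vmovdqu8). Only built when AVX512BW + AVX512VL are on. */
static void load16_avx512(const void *src, uint8_t out[16])
{
    __m128i v = _mm_loadu_epi8(src);
    _mm_storeu_si128((__m128i *)out, v);
}
#endif

/* Returns 1 if every load path available in this build reproduces the
 * source bytes exactly, 0 otherwise. */
static int loads_agree(void)
{
    uint8_t buf[32];
    for (int i = 0; i < 32; i++)
        buf[i] = (uint8_t)(i * 7 + 1);

    /* Offset by one byte so the address is not guaranteed 16-byte aligned. */
    const uint8_t *src = buf + 1;

    uint8_t a[16];
    load16_sse2(src, a);
    if (memcmp(a, src, 16) != 0)
        return 0;

#if defined(__AVX512BW__) && defined(__AVX512VL__)
    uint8_t b[16];
    load16_avx512(src, b);
    if (memcmp(a, b, 16) != 0)
        return 0;
#endif
    return 1;
}
```

Without the AVX-512 flags only the SSE2 path is exercised, so the sketch still builds on any x86-64 target with SSE2.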