I know how to create a global device array on the host with np.array, np.zeros, or np.empty(shape, dtype) and then copy it over with cuda.to_device.
One can also declare a shared array with cuda.shared.array(shape, dtype).
But how do I create an array of constant size in the registers of a particular thread, inside the GPU kernel itself?
I tried cuda.device_array and np.array there, but neither worked.
I simply want to do this inside a thread:
x = array(CONSTANT, int32)  # should create a private x for each thread