I'm seeing some bizarre behavior in cython code. I'm writing code to compute a forward Kalman filter, but I have a state transition model which has many 0s in it, so it would be nice to be able to calculate only certain elements of the covariance matrices.
So to test this out, I wanted to fill individual array elements using cython. To my surprise, I found
that writing output to specific array locations is very slow
(function fill(...)), compared to just assigning it to a scalar variable every time(function nofill(...))(essentially forgetting the results), andsetting
C=0.1or31, while not affecting how longnofill(...)takes to run, the latter choice for C makesfill(...)run 2x as slowly. This is baffling to me. Can anyone explain why I'm seeing this?
Code:-
################# file way_too_slow.pyx
from libc.math cimport sin
# Setting C=0.1 or 31 doesn't change affect performance of calling nofill(...), but it makes the fill(...) slower. I have no clue why.
cdef double C = 0.1
# This function just throws away its output.
def nofill(double[::1] x, double[::1] y, long N):
cdef int i
cdef double *p_x = &x[0]
cdef double *p_y = &y[0]
cdef double d
with nogil:
for 0 <= i < N:
d = ((p_x[i] + p_y[i])*3 + p_x[i] - p_y[i]) + sin(p_x[i]*C) # C appears here
# Same function keeps its output.
# However: #1 - MUCH slower than
def fill(double[::1] x, double[::1] y, double[::1] out, long N):
cdef int i
cdef double *p_x = &x[0]
cdef double *p_y = &y[0]
cdef double *p_o = &out[0]
cdef double d
with nogil:
for 0 <= i < N:
p_o[i] = ((p_x[i] + p_y[i])*3 + p_x[i] - p_y[i]) + sin(p_x[i]*C) # C appears here
The above code is called by the python program
#################### run_way_too_slow.py
import way_too_slow as _wts
import time as _tm
N = 80000
x = _N.random.randn(N)
y = _N.random.randn(N)
out = _N.empty(N)
t1 = _tm.time()
_wts.nofill(x, y, N)
t2 = _tm.time()
_wts.fill(x, y, out, N)
t3 = _tm.time()
print "nofill() ET: %.3e" % (t2-t1)
print "fill() ET: %.3e" % (t3-t2)
print "fill() is slower by factor %.3f" % ((t3-t2)/(t2-t1))
The cython was compiled using the setup.py file
################# setup.py
from distutils.core import setup, Extension
from distutils.sysconfig import get_python_inc
from distutils.extension import Extension
from Cython.Distutils import build_ext
incdir=[get_python_inc(plat_specific=1)]
libdir = ['/usr/local/lib']
cmdclass = {'build_ext' : build_ext}
ext_modules = Extension("way_too_slow",
["way_too_slow.pyx"],
include_dirs=incdir, # include_dirs for Mac
library_dirs=libdir)
setup(
name="way_too_slow",
cmdclass = cmdclass,
ext_modules = [ext_modules]
)
Here is a typical output of running "run_way_too_slow.py" using C=0.1
>>> exf("run_way_too_slow.py")
nofill() ET: 6.700e-05
fill() ET: 6.409e-04
fill() is slower by factor 9.566
A typical run with C=31.
>>> exf("run_way_too_slow.py")
nofill() ET: 6.795e-05
fill() ET: 1.566e-03
fill() is slower by factor 23.046
As we can see
Assigning into specified array location is quite slow compared to assigning to a double.
For some reason, assigning speed seems to depend on what operation was done in the calculation - which makes no sense to me.
Any insight would be greatly appreciated.