Writing C in Cython

· by Matthew Honnibal · ~4 min. read

For the last two years, I’ve done almost all of my work in Cython. And I don’t mean, I write Python, and then “Cythonize” it, with various type-declarations etc. I just, write Cython.

I use “raw” C structs and arrays, and occasionally C++ vectors, with a thin wrapper around malloc/free that I wrote myself. The code is almost always exactly as fast as C/C++, because it really is just C/C++ with some syntactic sugar — but with Python “right there”, should I need/want it.

This is basically the inverse of the old promise that languages like Python came with: that you would write your whole application in Python, optimise the “hot spots” with C, and voila! C speed, Python convenience, and money in the bank.

This was always much nicer in theory than practice. In practice, your data structures have a huge influence on both the efficiency of your code, and how annoying it is to write. Arrays are a pain and fast; lists are blissfully convenient, and very slow. Python loops and function calls are also quite slow, so the part you have to write in C tends to wriggle its way up the stack, until it’s almost your whole application.

Today a post came up on HN, on writing C extensions for Python. The author wrote both a pure Python implementation, and a C implementation, using the Numpy C API. This seemed a good opportunity to demonstrate the difference, so I wrote a Cython implementation for comparison:

import random
from cymem.cymem cimport Pool

from libc.math cimport sqrt

cimport cython

cdef struct Point:
    double x
    double y

cdef class World:
    cdef Pool mem
    cdef int N
    cdef double* m
    cdef Point* r
    cdef Point* v
    cdef Point* F
    cdef readonly double dt
    def __init__(self, N, threads=1, m_min=1, m_max=30.0, r_max=50.0, v_max=4.0, dt=1e-3):
        self.mem = Pool()
        self.N = N
        self.m = <double*>self.mem.alloc(N, sizeof(double))
        self.r = <Point*>self.mem.alloc(N, sizeof(Point))
        self.v = <Point*>self.mem.alloc(N, sizeof(Point))
        self.F = <Point*>self.mem.alloc(N, sizeof(Point))
        for i in range(N):
            self.m[i] = random.uniform(m_min, m_max)
            self.r[i].x = random.uniform(-r_max, r_max)
            self.r[i].y = random.uniform(-r_max, r_max)
            self.v[i].x = random.uniform(-v_max, v_max)
            self.v[i].y = random.uniform(-v_max, v_max)
            self.F[i].x = 0
            self.F[i].y = 0
        self.dt = dt


@cython.cdivision(True)
def compute_F(World w):
    """Compute the force on each body in the world, w."""
    cdef int i, j
    cdef double s3, tmp
    cdef Point s
    cdef Point F
    for i in range(w.N):
        # Set all forces to zero.
        w.F[i].x = 0
        w.F[i].y = 0
        for j in range(i+1, w.N):
            s.x = w.r[j].x - w.r[i].x
            s.y = w.r[j].y - w.r[i].y

            s3 = sqrt(s.x * s.x + s.y * s.y)
            s3 *= s3 * s3;

            tmp = w.m[i] * w.m[j] / s3
            F.x = tmp * s.x
            F.y = tmp * s.y

            w.F[i].x += F.x
            w.F[i].y += F.y

            w.F[j].x -= F.x
            w.F[j].y -= F.y


@cython.cdivision(True)
def evolve(World w, int steps):
    """Evolve the world, w, through the given number of steps."""
    cdef int _, i
    for _ in range(steps):
        compute_F(w)
        for i in range(w.N):
            w.v[i].x += w.F[i].x * w.dt / w.m[i]
            w.v[i].y += w.F[i].y * w.dt / w.m[i]
            w.r[i].x += w.v[i].x * w.dt
            w.r[i].y += w.v[i].y * w.dt

The Cython version took about 30 minutes to write, and it runs just as fast as the C code — because, why wouldn’t it? It is C code, really, with just some syntactic sugar. And you don’t even have to learn or think about a foreign, complicated C API… You just, write C. Or C++ — although that’s a little more awkward. Both the Cython version and the C version are about 70x faster than the pure Python version, which uses Numpy arrays.

One difference from C: I wrote a little wrapper around malloc/free, cymem. All it does is remember the addresses it served, and when the Pool is garbage collected, it frees the memory it allocated. I’ve had no trouble with memory leaks since I started using this.

The “intermediate” way of writing Cython, using typed memory-views, allows you to use the Numpy multi-dimensional array features. However, to me it feels more complicated, and the applications I tend to write involve very sparse arrays — where, once again, I want to define my own data structures.

Note

I found a Russian translation of this post here. I don’t know how accurate it is.

About the author

  • Matthew Honnibal

    Matthew Honnibal CTO, Founder