Writing C in Cython

Oct 21, 2014
4 minute read
Blog
Matthew Honnibal

For the last two years, I’ve done almost all of my work in Cython. And I don’t mean, I write Python, and then “Cythonize” it, with various type-declarations etc. I just, write Cython.

I use “raw” C structs and arrays, and occasionally C++ vectors, with a thin wrapper around malloc/free that I wrote myself. The code is almost always exactly as fast as C/C++, because it really is just C/C++ with some syntactic sugar — but with Python “right there”, should I need/want it.

This is basically the inverse of the old promise that languages like Python came with: that you would write your whole application in Python, optimise the “hot spots” with C, and voila! C speed, Python convenience, and money in the bank.

This was always much nicer in theory than practice. In practice, your data structures have a huge influence on both the efficiency of your code, and how annoying it is to write. Arrays are a pain and fast; lists are blissfully convenient, and very slow. Python loops and function calls are also quite slow, so the part you have to write in C tends to wriggle its way up the stack, until it’s almost your whole application.

Today a post came up on HN, on writing C extensions for Python. The author wrote both a pure Python implementation, and a C implementation, using the Numpy C API. This seemed a good opportunity to demonstrate the difference, so I wrote a Cython implementation for comparison:

import random
from cymem.cymem cimport Pool

from libc.math cimport sqrt

cimport cython

cdef struct Point:
    double x
    double y

cdef class World:
    cdef Pool mem
    cdef int N
    cdef double* m
    cdef Point* r
    cdef Point* v
    cdef Point* F
    cdef readonly double dt
    def __init__(self, N, threads=1, m_min=1, m_max=30.0, r_max=50.0, v_max=4.0, dt=1e-3):
        self.mem = Pool()
        self.N = N
        self.m = <double*>self.mem.alloc(N, sizeof(double))
        self.r = <Point*>self.mem.alloc(N, sizeof(Point))
        self.v = <Point*>self.mem.alloc(N, sizeof(Point))
        self.F = <Point*>self.mem.alloc(N, sizeof(Point))
        for i in range(N):
            self.m[i] = random.uniform(m_min, m_max)
            self.r[i].x = random.uniform(-r_max, r_max)
            self.r[i].y = random.uniform(-r_max, r_max)
            self.v[i].x = random.uniform(-v_max, v_max)
            self.v[i].y = random.uniform(-v_max, v_max)
            self.F[i].x = 0
            self.F[i].y = 0
        self.dt = dt


@cython.cdivision(True)
def compute_F(World w):
    """Compute the force on each body in the world, w."""
    cdef int i, j
    cdef double s3, tmp
    cdef Point s
    cdef Point F
    for i in range(w.N):
        # Set all forces to zero.
        w.F[i].x = 0
        w.F[i].y = 0
        for j in range(i+1, w.N):
            s.x = w.r[j].x - w.r[i].x
            s.y = w.r[j].y - w.r[i].y

            s3 = sqrt(s.x * s.x + s.y * s.y)
            s3 *= s3 * s3;

            tmp = w.m[i] * w.m[j] / s3
            F.x = tmp * s.x
            F.y = tmp * s.y

            w.F[i].x += F.x
            w.F[i].y += F.y

            w.F[j].x -= F.x
            w.F[j].y -= F.y


@cython.cdivision(True)
def evolve(World w, int steps):
    """Evolve the world, w, through the given number of steps."""
    cdef int _, i
    for _ in range(steps):
        compute_F(w)
        for i in range(w.N):
            w.v[i].x += w.F[i].x * w.dt / w.m[i]
            w.v[i].y += w.F[i].y * w.dt / w.m[i]
            w.r[i].x += w.v[i].x * w.dt
            w.r[i].y += w.v[i].y * w.dt

The Cython version took about 30 minutes to write, and it runs just as fast as the C code — because, why wouldn’t it? It is C code, really, with just some syntactic sugar. And you don’t even have to learn or think about a foreign, complicated C API… You just, write C. Or C++ — although that’s a little more awkward. Both the Cython version and the C version are about 70x faster than the pure Python version, which uses Numpy arrays.

One difference from C: I wrote a little wrapper around malloc/free, cymem. All it does is remember the addresses it served, and when the Pool is garbage collected, it frees the memory it allocated. I’ve had no trouble with memory leaks since I started using this.

The “intermediate” way of writing Cython, using typed memory-views, allows you to use the Numpy multi-dimensional array features. However, to me it feels more complicated, and the applications I tend to write involve very sparse arrays — where, once again, I want to define my own data structures.

Note

I found a Russian translation of this post here. I don’t know how accurate it is.

How to advocate for modular NLP in the age of Generative AI

Writing C in Cython

Note

How to advocate for modular NLP in the age of Generative AI

How Love Without Sound helps the music industry recover millions in revenue for artists with NLP, spaCy and Prodigy

What the history of the web can teach us about the future of AI

From PDFs to AI-ready structured data: a deep dive