
Cython nogil extension on multi-core: an introduction

In this blog post, we show a proof of concept that makes multi-core processing possible in CPython. It adds a new data type to Cython that is not limited by the CPython GIL. It also uses a new coroutine library. Together, these make real multi-core processing possible.
  • Last Update:2019-01-11
  • Version:003
  • Language:en

For a long time, multi-core processing has been a sore point in CPython, and a variety of solutions have been proposed to address it. This article does not describe or compare these solutions. Instead, it introduces a new attempt: a proof of concept based on Cython. The approach adds a new data type to Cython that is not limited by the CPython GIL. It then uses a new coroutine library, described below, to achieve real multi-core support.

The solution is composed of two parts:

  • The first part is a primitive coroutine library named lwan_coro. Bryton Lacquement extracted it from lwan and wrapped it with Cython. For more information, please refer to: Cython Multithreaded Coroutines. With this library, we could create a true multi-core program if we released the GIL in Cython. However, Cython currently has no data structure that can be used without the GIL. Users can only release the GIL for a small portion of code, using the with nogil statement or a nogil function.
  • The second part modifies the Cython compiler to provide a new construct called "nogil extension". The extension type is a Cython feature that lets users define C structs directly in the Cython environment (see Extension Types). With the "nogil extension", users can define such a C struct in the Cython environment without the GIL.

By combining these two parts, developers can create new objects inside lwan coroutines. This opens up more possibilities than the parallel module in Cython.

Background and basic ideas

CPython is a relatively slow interpreter with a GIL. The slowness mainly comes from two sources. First, every time Python code runs, CPython executes compiled bytecode inside its interpreter evaluation loop, instead of running machine code as a C program does. Second, Python is dynamically typed. For each operation, the interpreter spends time checking the operand types and dispatching the operation to the right implementation.
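The dynamic dispatch described above can be observed from Python itself. The following sketch uses the standard dis module to show that a single generic "add" instruction serves integers, floats and strings alike. The actual C routine is chosen at run time from the operand types. The opcode name differs between CPython versions, and the code accounts for that.

```python
import dis

def add(a, b):
    # Compiles to one generic "add" instruction; the interpreter decides at
    # run time, from the types of a and b, which C routine performs it.
    return a + b

print(add(1, 2), add(1.5, 2.5), add("ab", "cd"))  # 3 4.0 abcd

# The opcode is BINARY_ADD before CPython 3.11 and BINARY_OP from 3.11 on.
add_ops = [ins.opname for ins in dis.get_instructions(add)
           if ins.opname in ("BINARY_ADD", "BINARY_OP")]
print(add_ops)
```

The same bytecode handles all three calls. A compiler that knows the static types in advance, as Cython does with cdef declarations, can skip this run-time decision.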

It would be promising if one day we could use Python, or a Python-like language, with full multi-core support. A JIT that accelerates Python code does not by itself meet this goal. So we started an experiment that approaches the problem in a way that had not been tried before. Before we dive into the details, please allow me to introduce some background of this project.


From a developer's perspective, Cython is a compiler. It translates Python/Cython code into C code that uses the CPython C API, and this C code is then built as a C extension. Compiling a pure Python module this way already gives a slight performance improvement (Cython Overview). In the CPython implementation, the generated C code uses the CPython API to communicate with the Python environment.

To get a significant performance improvement, Cython needs type information in the code. Cython translates the code (Python grammar with some extensions) to C. With that type information, it can skip the type checking and dispatching and call the corresponding slot directly. That is why statically typed Cython code can be significantly faster.

Cython Extension type

An extension type is a way to define a Python type in the Cython environment (Extension Types). Unlike in a normal Python class, users can use the cdef keyword to define attributes and methods in an extension type. The cdef attributes become fields of a C struct. This C struct starts with PyObject_HEAD, which means that each instance is still a Python object.

lwan coro

My colleague Bryton Lacquement evaluated several popular coroutine libraries and found that none could fulfill our needs (Multicore Python HTTP Server). He then found lwan (lwan github repository), a powerful HTTP server written in C, whose coroutine implementation fits our needs well. He extracted this coroutine implementation and wrapped it with Cython (cython lwan coroutine wrapper), which makes its API usable from Python. The lwan coroutine implementation uses pthreads behind the scenes. Since lwan coro runs on multiple cores, the code running inside it must not depend on the GIL. This means we have to use it in nogil mode.


From the background above, we know that the main drawback is that Cython, just like CPython, is limited by the GIL. This led to a basic idea: make part of the Cython code independent of the CPython runtime, so that it can take advantage of multiple cores.

There is no strict definition, but Cython relates to three environments: the Python environment, the Cython environment and the C environment. Roughly, the Python environment is made of *.py files and the Cython environment of *.pyx and *.pxd files. As you would expect, the C environment is made of *.c files.

Based on the LEGB scope definition, it is not a good idea to detach an entire module from the CPython runtime. In that case, it would no longer be a "Python module" but essentially a C environment with no bridge to the CPython environment. Cython has the with nogil statement and the nogil keyword for cdef functions. But until now there was no way to define a nogil extension, because a nogil extension raises many complex issues. These include exception generation and handling, slot dispatch, memory management and so on. To keep the focus on our purpose, most of them are not addressed for now.

So Cython already supports leaving the CPython runtime at the function level. But every extension type instance still carries a PyObject_HEAD. This means the problem we want to solve, getting rid of CPython memory management in order to remove the GIL, is still there. Based on this analysis, a good approach is to leave the CPython runtime at the class level, using the Cython feature called Extension Type. Seen from the .pyx side, an Extension Type uses the cdef keyword to define a class, like below:

cdef class Foo:
    pass

This creates a struct in the C environment with PyObject_HEAD as its first member. The extension type was originally designed to define cdef attributes and cdef methods inside a class; a pure Python class cannot contain cdef declarations. As mentioned above, cdef with static types can bring a significant performance improvement.

An extension type can already create a C struct and define C attributes and C functions. We therefore remove its PyObject_HEAD to detach it from the CPython runtime. Without the CPython runtime there is no GIL, which clears the way to take advantage of multiple cores.

We were inspired by the with nogil statement and the nogil keyword in Cython. These two features allow developers to enter a "non-CPython mode". Everything in the "nogil" domain is translated to pure C code that does not call the CPython C API. No Python object is involved, so no GIL is needed. The first step is therefore to add the nogil keyword to the Cython extension type, turning it into a "nogil extension", like below:

cdef class SomeMemory nogil:
  cdef double a
  cdef double b

  cdef void foo(self, double a) nogil:
    self.a = a

From a semantic point of view, this data structure does not hold the GIL, which implies that it can be used in multi-core code. From a code generation point of view, the nogil extension generates a C struct without PyObject_HEAD. This means it is no longer a Python object. This is very important, as we will see below.
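To make the layout difference concrete, the sketch below models the two structs with ctypes. It is an approximation, not Cython's generated code. It assumes a standard (non-debug, non-free-threaded) CPython build, where PyObject_HEAD is a reference count followed by a type pointer. It also leaves out the vtable pointer that Cython adds to extension types with cdef methods. The struct names are hypothetical.

```python
import ctypes

class PyObjectHeadModel(ctypes.Structure):
    # Model of PyObject_HEAD in a standard CPython build:
    # reference count first, then a pointer to the type object.
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t),
                ("ob_type", ctypes.c_void_p)]

class SomeMemoryWithHead(ctypes.Structure):
    # Regular extension type: the header precedes the cdef attributes.
    _fields_ = [("ob_base", PyObjectHeadModel),
                ("a", ctypes.c_double),
                ("b", ctypes.c_double)]

class SomeMemoryNogil(ctypes.Structure):
    # nogil extension as described above: cdef attributes only, no header,
    # so there is no reference count or type pointer for CPython to manage.
    _fields_ = [("a", ctypes.c_double),
                ("b", ctypes.c_double)]

header = ctypes.sizeof(PyObjectHeadModel)
print("header bytes:", header)
print("with head:", ctypes.sizeof(SomeMemoryWithHead),
      "nogil:", ctypes.sizeof(SomeMemoryNogil))
print("offset of a:", SomeMemoryWithHead.a.offset, "vs", SomeMemoryNogil.a.offset)
```

On a typical 64-bit platform, the header adds 16 bytes per instance. The first attribute of the nogil struct sits at offset 0, so the struct carries nothing the CPython runtime would need to track.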

To achieve this, I first modified the Cython parser to accept the nogil keyword on a cdef class (aka Cython extension type). I then modified the code generator to produce the desired C code. I also added some checks to avoid semantic conflicts with existing Cython usages. The nogil extension comes with several restrictions. First, every method of a nogil extension must be defined with cdef along with the nogil keyword. This reminds developers that they are in a "no-Python" environment. Second, an explicit return type is required.
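The two method restrictions can be illustrated with a toy checker. This is a hypothetical, regex-based sketch that works on single declaration lines. The function name check_nogil_method is invented for illustration. The real checks live in the patched Cython compiler and are not reproduced here.

```python
import re

# Hypothetical toy: matches "<kind> <return type and name>(<args>) <tail>:"
METHOD = re.compile(
    r"^\s*(?P<kind>cdef|cpdef|def)\s+(?P<head>[^(]*?)\s*"
    r"\((?P<args>[^)]*)\)\s*(?P<tail>[^:]*):\s*$"
)

def check_nogil_method(line):
    """Return a list of rule violations for one method declaration line."""
    m = METHOD.match(line)
    if m is None:
        return ["not a method declaration"]
    errors = []
    # Rule 1: methods must be cdef ... nogil.
    if m["kind"] != "cdef":
        errors.append("methods of a nogil extension must be declared with cdef")
    if "nogil" not in m["tail"].split():
        errors.append("methods of a nogil extension must be marked nogil")
    # Rule 2: head is "<return type> <name>"; a bare name means no return type.
    if len(m["head"].split()) < 2:
        errors.append("an explicit return type is required")
    return errors

print(check_nogil_method("    cdef void foo(self, double a) nogil:"))  # []
print(check_nogil_method("cdef foo(self) nogil:"))
```

The method from the SomeMemory example passes. A cdef method without a return type, or a plain def method, is rejected.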

Combine with lwan coro

Since the lwan coro library only supports a nogil environment, we have to release the GIL before we can use it for multi-core processing.

With the nogil extension in place, we combined it with the lwan coroutine library. Some readers may know that Cython has a parallel module, which currently only has an OpenMP backend. The post Cython Multithreaded Coroutines already gave an introduction to it. Basically, the parallel module lets you write parallel code that is translated to C code using the OpenMP API. But the parallel module is still rather primitive and offers no fine-grained control. Its main use is concurrent numeric processing, and we cannot start, stop or join threads dynamically.
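For contrast, here is the kind of explicit thread lifecycle control the OpenMP-based parallel module does not expose. The sketch uses Python's standard threading module, which wraps native threads (pthreads on POSIX). Note that in CPython these threads still share the GIL when running bytecode. That is exactly the limitation the nogil extension targets. The example therefore shows control over threads, not a speedup.

```python
import threading

def partial_sum(start, stop, results, slot):
    # Each worker writes only to its own slot, so no extra locking is needed.
    results[slot] = sum(range(start, stop))

N, WORKERS = 1_000_000, 4
step = N // WORKERS
results = [0] * WORKERS
threads = [threading.Thread(target=partial_sum,
                            args=(k * step, (k + 1) * step, results, k))
           for k in range(WORKERS)]

for t in threads:
    t.start()   # the caller decides when each worker starts...
for t in threads:
    t.join()    # ...and when to wait for it to finish

print(sum(results) == sum(range(N)))  # True
```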

So we looked for a better way to write multi-core code and found the lwan coroutine library, a pthread-based coroutine library. Thanks to Bryton Lacquement's creative work, we can use its API from Cython to create concurrent programs. This is more promising than using OpenMP.

By combining the lwan coro library with the nogil environment provided by the nogil extension, developers can, in theory, do more than before:

def main():
    cdef:
        scheduler_t s
        unsigned int i

    with nogil:
        scheduler_init(&s)
        for i in range(5):
            scheduler_coro_add(&s, task)
        scheduler_run(&s)
        scheduler_destroy(&s)

# Example task
cdef int task(coro_t *coroutine, void *arg) nogil:
    cdef int a = 5
    coro_yield(coroutine, coro_yield_value.MAY_RESUME)
    cdef SomeMemory bar = SomeMemory(1, 2)
    coro_yield(coroutine, coro_yield_value.FINISHED)
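The control flow of the example can be modeled in pure Python with generators. A scheduler keeps a queue of coroutines. Each coroutine either asks to be resumed later (MAY_RESUME) or reports that it is done (FINISHED). All names here are hypothetical stand-ins for the wrapper's API. The model runs on a single thread, so it shows only the yield/resume protocol, not lwan's implementation or its multi-core behavior.

```python
from collections import deque
from enum import Enum

class CoroYieldValue(Enum):
    # Hypothetical stand-in for coro_yield_value in the Cython wrapper.
    MAY_RESUME = 0
    FINISHED = 1

class Scheduler:
    """Toy single-threaded round-robin scheduler (not the lwan implementation)."""

    def __init__(self):
        self.ready = deque()

    def coro_add(self, task, arg=None):
        # Calling the generator function creates a suspended coroutine.
        self.ready.append(task(arg))

    def run(self):
        while self.ready:
            coro = self.ready.popleft()
            # Resume until the next yield; a plain return counts as FINISHED.
            state = next(coro, CoroYieldValue.FINISHED)
            if state is CoroYieldValue.MAY_RESUME:
                self.ready.append(coro)  # requeue so other coroutines get a turn
            else:
                coro.close()  # FINISHED: never resume it again

trace = []

def task(arg):
    trace.append(("start", arg))
    yield CoroYieldValue.MAY_RESUME
    trace.append(("resume", arg))
    yield CoroYieldValue.FINISHED

s = Scheduler()
for i in range(3):
    s.coro_add(task, i)
s.run()
print(trace)
```

The trace shows every coroutine starting before any of them resumes: [('start', 0), ('start', 1), ('start', 2), ('resume', 0), ('resume', 1), ('resume', 2)].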

Closing words

This proof of concept, the nogil extension prototype combined with the lwan coroutine library, reveals an opportunity. Developers can use a Python-like language to write truly multi-core code in Cython. Admittedly, the nogil extension type introduces some limitations. These may confuse developers who are familiar with Python but not with C. We can discuss this further with the Cython community.

Further thoughts

The lwan coroutine library was originally part of the lwan HTTP server and is not intended to serve as a general-purpose coroutine library. As a result, the wrapped lwan coroutine API cannot smoothly pass in or receive data when a computation starts or finishes. It needs to be improved before it can be used in a production environment. To make it more usable, we should improve its API. Alternatively, we could continue modifying the Cython parser and code generator to provide new semantics such as "go" and "channel" in Golang.
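For readers unfamiliar with Go, the "go" plus "channel" style can be sketched in Python with the standard queue and threading modules. The go helper and the _CLOSED sentinel are hypothetical names invented for this illustration. This is not a proposal for Cython syntax. It only shows the pattern of passing data into a running task and receiving results back, which the current wrapper does not offer smoothly.

```python
import queue
import threading

_CLOSED = object()  # hypothetical sentinel marking the end of a stream

def go(fn, *args):
    # Hypothetical helper named after Go's "go" statement:
    # run fn in a new thread and return the thread handle.
    t = threading.Thread(target=fn, args=args)
    t.start()
    return t

def square_worker(inbox, outbox):
    # Receive values until the input channel is closed; send back results.
    while (item := inbox.get()) is not _CLOSED:
        outbox.put(item * item)
    outbox.put(_CLOSED)

inbox, outbox = queue.Queue(), queue.Queue()
worker = go(square_worker, inbox, outbox)
for n in range(5):
    inbox.put(n)          # pass data in while the worker is running
inbox.put(_CLOSED)

results = []
while (item := outbox.get()) is not _CLOSED:
    results.append(item)  # receive data as it is produced
worker.join()
print(results)  # [0, 1, 4, 9, 16]
```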

The nogil extension is still a bit clumsy, because Python builtin functions and CPython slots cannot be used in it. It also does not support inheritance yet. Reference counting for the nogil extension type is disabled, because without scope analysis and a symbol table it cannot serve its purpose for now. The current plan is to provide an object system that is independent of, but similar to, the CPython object system, and that hides the differences from developers.

Pure Python magic methods cannot be used in a nogil extension. I am working on some patches that bring Cython special methods such as __cinit__ to the nogil extension. That is another topic, however: this post is about running truly controllable multi-core code in Cython.

Source Code

Patched Cython in Nexedi gitlab

SlapOS profile in Nexedi gitlab

Test script in Nexedi gitlab

Amended lwan coroutine Cython wrapper


Extension Types

Cython internal introduction

Bryton's blog post on Cython multithreaded coroutines

PyConFr paper about multicore