article thumbnail
Python FFI
Call native C from Python
#python, #programming

Python FFI is how you wire pure Python to native code without writing a CPython extension by hand. You get to keep Python for orchestration while borrowing the speed and reach of C libraries. This deep dive focuses on two battle-tested tools: ctypes and cffi.

No magic here: FFI works because your OS loader can map a shared library (ELF on Linux, Mach-O on macOS, PE on Windows) into the process and expose its exported symbols. Python then calls those symbols through the platform calling convention. Your job is to describe the function prototypes and data layouts accurately and to manage memory ownership with care.

This article covers:

If you already know the basics, jump straight to the two examples.

ABI vs API modes

Pick ABI mode for exploration and light bindings. Pick API mode for production-grade, wide-surface bindings, especially when the header uses lots of typedefs and structs.

Finding and loading libraries

Portable loading is half the battle.

Precise type mapping

You must match sizes and signedness.

Strings, buffers, and ownership

Error handling and diagnostics

Callbacks and reentrancy

Arrays, slicing, and NumPy

Threading and the GIL

Variadic functions

Packaging and deployment checklist

Debugging strategies


Example A: a safe ctypes wrapper around zlib compress and uncompress

This shows how to map function prototypes, handle output buffers, and raise errors early.

import ctypes
import ctypes.util

# Resolve the zlib shared library name across platforms
def load_zlib():
    """Locate and load the zlib shared library across platforms.

    Tries ctypes.util.find_library first, then a short list of
    well-known sonames. Returns a ctypes.CDLL handle with errno
    tracking enabled; raises OSError if no zlib can be found.
    """
    libname = ctypes.util.find_library("z") or ctypes.util.find_library("zlib")
    if libname:
        return ctypes.CDLL(libname, use_errno=True)
    # find_library came up empty (no ldconfig/compiler available, or an
    # unusual install); probe the common platform-specific names directly.
    for fallback in ("zlib1.dll", "libz.so.1", "libz.dylib"):
        try:
            return ctypes.CDLL(fallback, use_errno=True)
        except OSError:
            continue
    raise OSError("could not find zlib on this system")

# Load zlib once at import time; every wrapper below shares this handle.
z = load_zlib()

# zlib types vary by platform, but uLong and uLongf are commonly unsigned long.
# ctypes.c_ulong follows the platform ABI (32 bits on Windows LLP64, 64 bits
# on most Unix LP64 systems), matching what zlib's own headers resolve to.
c_uLong = ctypes.c_ulong
c_uLong_p = ctypes.POINTER(c_uLong)  # uLongf *: in/out length parameters
c_Byte = ctypes.c_ubyte              # Bytef
c_Byte_p = ctypes.POINTER(c_Byte)    # Bytef *

# Declaring argtypes/restype up front lets ctypes validate and convert each
# call instead of silently passing wrong-sized integers.

# int compress2(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen, int level);
z.compress2.argtypes = [c_Byte_p, c_uLong_p, c_Byte_p, c_uLong, ctypes.c_int]
z.compress2.restype = ctypes.c_int

# uLong compressBound(uLong sourceLen);
z.compressBound.argtypes = [c_uLong]
z.compressBound.restype = c_uLong

# int uncompress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
z.uncompress.argtypes = [c_Byte_p, c_uLong_p, c_Byte_p, c_uLong]
z.uncompress.restype = ctypes.c_int

# zlib return codes
Z_OK = 0
Z_MEM_ERROR = -4
Z_BUF_ERROR = -5
Z_STREAM_ERROR = -2

class ZlibError(RuntimeError):
    pass

def _errcheck(code, where):
    if code == Z_OK:
        return
    err = ctypes.get_errno()
    raise ZlibError(f"{where} failed with code {code}, errno {err}")

def zlib_compress(data: bytes, level: int = 6) -> bytes:
    """Compress *data* with zlib's compress2.

    Parameters:
        data: bytes-like payload (bytes, bytearray, or memoryview).
        level: zlib compression level, 0 (store) through 9 (best); default 6.

    Returns the compressed payload as bytes.
    Raises ZlibError if compress2 reports a failure (e.g. an invalid level).
    """
    if not isinstance(data, (bytes, bytearray, memoryview)):
        raise TypeError("data must be bytes like")
    # Normalize a memoryview so from_buffer_copy always sees a contiguous
    # buffer. bytes/bytearray already expose one, so passing them directly
    # avoids the redundant intermediate copy the bytearray(data) form made.
    if isinstance(data, memoryview):
        data = data.tobytes()
    src = (c_Byte * len(data)).from_buffer_copy(data)
    src_len = c_uLong(len(data))

    # compressBound gives the worst-case output size for this input length.
    dest_cap = z.compressBound(src_len)
    dest = (c_Byte * dest_cap)()
    dest_len = c_uLong(dest_cap)  # in: capacity; out: bytes actually written

    rc = z.compress2(dest, ctypes.byref(dest_len), src, src_len, int(level))
    _errcheck(rc, "compress2")

    # Only the first dest_len bytes of the buffer are valid output.
    return bytes(memoryview(dest)[:dest_len.value])

def zlib_uncompress(data: bytes, expected_size: int | None = None) -> bytes:
    """Decompress a zlib stream produced by compress/compress2.

    If *expected_size* is supplied, the output buffer starts at exactly
    that size. Otherwise a heuristic starting size is used and doubled on
    each Z_BUF_ERROR, up to a fixed number of attempts.
    """
    if not isinstance(data, (bytes, bytearray, memoryview)):
        raise TypeError("data must be bytes like")
    src = (c_Byte * len(data)).from_buffer_copy(bytearray(data))
    src_len = c_uLong(len(data))

    # Heuristic when the caller does not know the uncompressed size.
    cap = expected_size or max(64, len(data) * 4)
    attempts = 6  # hard cap on buffer-growth retries
    while attempts:
        out = (c_Byte * cap)()
        out_len = c_uLong(cap)
        rc = z.uncompress(out, ctypes.byref(out_len), src, src_len)
        if rc == Z_OK:
            return bytes(memoryview(out)[:out_len.value])
        if rc != Z_BUF_ERROR:
            # Any failure other than "buffer too small" is fatal.
            _errcheck(rc, "uncompress")
        cap *= 2  # output did not fit; double and retry
        attempts -= 1
    raise ZlibError("uncompress ran out of attempts to grow buffer")

# Quick check: round-trip a repetitive payload and verify it survives intact.
payload = b"ffi " * 1000
packed = zlib_compress(payload, level=6)
unpacked = zlib_uncompress(packed, expected_size=len(payload))
assert unpacked == payload

Key takeaways:


Example B: cffi with callbacks, arrays, and automatic GC

This example binds the C standard library's qsort and uses a Python comparator. It also shows using ffi.gc to attach a destructor to a pointer that must be freed.

import sys
from cffi import FFI

ffi = FFI()

# Declare only the prototypes we call; cffi parses this C fragment at runtime.
ffi.cdef(
    """
    typedef int (*cmp_fn)(const void *a, const void *b);

    void qsort(void *base, size_t nmemb, size_t size, cmp_fn compar);

    // We will also use malloc and free to show ffi.gc
    void *malloc(size_t size);
    void free(void *ptr);
    """
)

# Resolve the C runtime that provides qsort, malloc, free.
if sys.platform.startswith("win"):
    libc = ffi.dlopen("msvcrt.dll")
else:
    # On POSIX, dlopen(None) returns a handle covering the main program and
    # the already-loaded C runtime, so qsort/malloc/free resolve without
    # guessing the libc soname (libc.so.6 on glibc, but different on musl
    # and macOS, where the old hard-coded name failed).
    libc = ffi.dlopen(None)

# Prepare an array of 32 bit ints
N = 10
arr = ffi.new("int[]", [7, 1, 2, 9, 5, 3, 8, 4, 6, 0])

# Create a comparator that sorts ascending
@ffi.callback("int(const void *, const void *)")
def cmp(a, b):
    # Cast the void* pointers back to int*
    ia = ffi.cast("const int *", a)[0]
    ib = ffi.cast("const int *", b)[0]
    # qsort expects negative, zero, or positive
    return -1 if ia < ib else (1 if ia > ib else 0)

# Call qsort. sizeof(int) is computed by cffi
libc.qsort(arr, N, ffi.sizeof("int"), cmp)

print([arr[i] for i in range(N)])  # sorted values 0..9

# Show ffi.gc with malloc: tie free() to the lifetime of the Python handle.
nbytes = 1024
raw = libc.malloc(nbytes)
if raw == ffi.NULL:
    # malloc can fail; an unchecked NULL here would crash on first write.
    raise MemoryError("libc.malloc returned NULL")
# free(buf) runs when the last reference to buf is dropped (refcount in
# CPython, collector on other interpreters) -- not merely "when GC runs".
buf = ffi.gc(raw, libc.free)

# Typed view for convenient byte access. NOTE: the cast pointer does not
# keep buf alive on its own; buf must stay referenced while p is in use.
p = ffi.cast("unsigned char *", buf)
for i in range(nbytes):
    p[i] = i % 256
print(int(p[0]), int(p[255]))

Important notes:


Building a tidy Pythonic wrapper

Clean bindings are pleasant to use. Tips:

When to prefer a C shim

Sometimes the safest path is to write 20 lines of C:

Your cffi API mode or a tiny CPython extension can expose a flat, FFI friendly API that you then bind from Python cleanly.

Final thoughts

FFI turns Python into a control plane for high performance C. The craft is in the details: correct prototypes, careful ownership, clear errors, and boring predictability around loading and packaging. Start with ABI bindings to learn the surface, then promote the hot path or tricky pieces to a minimal API mode shim. With that split, you get the best of both worlds: Python ergonomics with native speed where it matters.