Mojo

The Python Superset That Runs at the Speed of C

Mojo is a compiled programming language designed to grow into a superset of Python, with a focus on high-performance numerical and AI workloads. In benchmarks like computing the Mandelbrot set, Mojo runs significantly faster than CPython, sometimes by several orders of magnitude, while keeping Python-style syntax and the ability to import existing Python libraries.

Mojo is developed by Modular, an AI infrastructure company co-founded by Chris Lattner and Tim Davis. Lattner previously created LLVM and Clang, led the development of Swift at Apple, and led TensorFlow infrastructure work at Google. Mojo is built on top of MLIR, a compiler infrastructure project Lattner also helped create.

Why Another Language?

Python dominates AI and data science because of its accessibility and ecosystem: NumPy, PyTorch, TensorFlow, scikit-learn. However, Python is dynamically typed and, in its standard CPython implementation, interpreted, which limits raw execution speed. The performance-critical parts of major Python libraries are therefore implemented in C, C++, or Fortran, and Python serves as an interface to that native code rather than executing the numerics itself.

This creates the two-language problem. Developers prototype in Python, then rewrite performance-critical sections as C extensions when speed becomes necessary. This adds complexity, maintenance burden, and requires expertise in two very different languages.

Mojo's design goal is to address this directly: allow developers to write performance-critical code in the same language as their Python glue code, without switching to a separate systems language.

What Exactly Is Mojo?

Mojo is designed to become a superset of Python over time, and it already borrows most of Python 3's syntax. In practice, much simple Python code ports with little change, but not every Python feature is implemented yet (Python-style classes, for example, are still missing), so arbitrary scripts will not run unmodified. You can import Python libraries through Mojo's Python interop layer, and Mojo adds systems-level features (explicit types, memory ownership, SIMD operations, and ahead-of-time compilation) that you can adopt incrementally.

Under the hood, Mojo compiles via MLIR, a compiler framework built for targeting heterogeneous hardware such as CPUs, GPUs, and custom AI accelerators. This architecture is one of the reasons Mojo can produce competitive performance while keeping Python-style syntax.

Mojo files use the .mojo extension.

Installation

Mojo is installed via Modular's Magic package manager.

macOS and Linux:

curl -ssL https://magic.modular.com/install.sh | bash

After installing, create a new Mojo project:

magic init my-project --format mojoproject
cd my-project
magic run mojo

Windows: Mojo currently requires WSL (Windows Subsystem for Linux). Native Windows support is on the roadmap.

Verify installation:

mojo --version

You can also experiment without installing at playground.modular.com.

Hello, Mojo

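Mojo's hello world looks almost exactly like Python's. The one difference is that a Mojo source file needs a main() entry point; top-level statements are only accepted in the REPL and in notebooks:

def main():
    print("Hello, Mojo!")

Save it as hello.mojo and run it:

mojo hello.mojo

That's valid Mojo, and the function body is unchanged Python.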

The Core Idea: `def` vs `fn`

The most important conceptual distinction in Mojo is between def and fn.

Both kinds of function are compiled to native machine code. What differs is how strict the compiler is. Functions declared with def follow Python-like conventions and may raise errors without declaring it. Functions declared with fn are strict: argument and return types must be declared, and a function that can raise must be marked with raises.

# def: Python-style, may raise without declaring it
def greet(name: String):
    print("Hello, " + name)

# fn: strict, fully typed, raising must be declared
fn greet_fast(name: String) -> None:
    print("Hello, " + name)

The compiler uses the declared types to generate optimized native code. Native Mojo code does not run on the CPython interpreter, so it avoids interpreter dispatch, boxed-object overhead, and the Global Interpreter Lock (GIL). Both styles can be used freely in the same file: def for general code, fn where you want the compiler to enforce types and error handling.
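
For contrast, here is a minimal sketch of what dynamic typing means at run time. It is plain CPython, not Mojo, and the function name add is only an illustration. The same + is resolved against the operand types on every call, which is the per-operation work that a typed, compiled function can skip.

```python
def add(a, b):
    # CPython picks the matching __add__ implementation at run time,
    # based on the types of a and b for this particular call.
    return a + b

int_sum = add(2, 3)          # integer addition
text = add("Mo", "jo")       # string concatenation
merged = add([1, 2], [3])    # list concatenation

print(int_sum, text, merged)
```

One function body serves three unrelated types here. That flexibility is convenient, but it also means the interpreter cannot know ahead of time which machine operation a given + will need.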

Variables and Mutability

Variables can be declared explicitly with var or implicitly by first assignment. Both forms are mutable; omitting var does not make a variable immutable. The difference is scope: a var declaration is scoped to the enclosing block, while an implicitly declared variable is scoped to the whole function.

fn calculate() -> Float64:
    pi = 3.14159        # implicit declaration: still mutable
    var radius = 5.0    # explicit declaration with var
    radius = 6.0        # reassignment is fine for either form
    return pi * radius * radius

Note: Mojo previously had a let keyword for declaring immutable variables, but it was removed from the language. For values that must be fixed at compile time, use an alias declaration, as in the SIMD example later in this article.

Structs: Stack Allocation Without the Pain

Mojo provides struct for high-performance data types. Unlike Python class instances, which are heap-allocated, reference-counted objects with dynamic attributes, a Mojo struct has a fixed set of fields and a memory layout known at compile time, and its values can live directly on the stack. This avoids per-object heap allocation and reference-counting overhead, which matters in tight inference loops.

struct Point:
    var x: Float64
    var y: Float64

    fn __init__(out self, x: Float64, y: Float64):
        self.x = x
        self.y = y

    fn distance(self) -> Float64:
        return (self.x ** 2 + self.y ** 2) ** 0.5

fn main():
    var p = Point(3.0, 4.0)
    print(p.distance())  # 5.0

The out annotation on self in __init__ marks self as an uninitialized value that the constructor must fully initialize. A method that modifies an existing value in place takes mut self instead, and a method that only reads, like distance, takes plain self. These argument conventions enable precise memory management without a garbage collector. The approach is conceptually similar to Rust's borrow checker but with less ceremony. (Earlier versions of Mojo used inout for both roles; it was split into out and mut to better reflect what each one does.)
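
As a point of comparison, the sketch below writes the same Point as an ordinary Python class (plain CPython, not Mojo). Every instance is a separate heap object reached through a reference, so assigning it to a second name creates an alias rather than a copy, and the runtime keeps a reference count for it. That bookkeeping is what a struct avoids.

```python
import math
import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def distance(self):
        # Euclidean distance from the origin
        return math.hypot(self.x, self.y)

p = Point(3.0, 4.0)
q = p        # q is a second reference to the same heap object
q.x = 6.0    # so this change is visible through p as well
q.y = 8.0

same_object = p is q
dist = p.distance()
refs = sys.getrefcount(p)   # CPython tracks a reference count per object

print(same_object, dist)
```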

Calling Python Libraries

Mojo can import Python libraries directly and use them alongside native Mojo code:

from python import Python

fn main() raises:
    var np = Python.import_module("numpy")
    var arr = np.array([1, 2, 3, 4, 5])
    print(np.sum(arr))  # 15

The raises annotation indicates this function may propagate a Python exception. This interop means existing Python libraries — PyTorch, NumPy, Pandas, Matplotlib — are available from within Mojo code. Python-facing code runs through the Python runtime; code written with fn and explicit types runs natively.
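
For readers coming from Python, Python.import_module plays a role comparable to importlib.import_module in the standard library: the module is looked up by name at run time, and a failed lookup surfaces as an exception. The sketch below is plain Python and only illustrates that behavior; load_or_none is a hypothetical helper, not part of any Mojo or Python API.

```python
import importlib

def load_or_none(name):
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None

math_mod = load_or_none("math")
missing = load_or_none("surely_not_a_real_module_name")

print(math_mod.sqrt(16.0))   # call into the dynamically imported module
print(missing)
```

Because the lookup can fail at run time, a Mojo function that performs it has to be allowed to raise, which is what raises declares on main.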

Where the Speed Comes From: SIMD

Modern CPUs process multiple values in a single instruction via SIMD (Single Instruction, Multiple Data). This is the mechanism behind NumPy's vectorized operations. Normally, accessing SIMD requires C intrinsics or external libraries. Mojo exposes SIMD operations directly in high-level code:

from algorithm.functional import vectorize
from memory import UnsafePointer
from sys import simd_width_of

fn dot_product(a: UnsafePointer[Float32],
               b: UnsafePointer[Float32], n: Int) -> Float32:
    var result: Float32 = 0.0
    alias simd_width = simd_width_of[DType.float32]()

    fn mul_and_add[width: Int](i: Int):
        result += (a.load[width=width](i) *
                   b.load[width=width](i)).reduce_add()

    vectorize[simd_width](n, mul_and_add)
    return result

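The vectorize function batches the loop into SIMD-width chunks matched to the CPU's registers and handles any leftover elements. On an AVX2 machine (256-bit registers), that is 8 Float32 values per instruction; on AVX-512, 16. The same code runs on both because Mojo selects the width at compile time. The exact signature of vectorize, and how the nested function must be annotated to capture result, has changed between Mojo releases, so check the documentation for your installed version.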
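
To make the chunking concrete, here is a pure-Python sketch of the idea. It is an illustration only: simd_lanes and chunked_dot are hypothetical names, and real SIMD hardware multiplies each chunk in one instruction rather than in a Python loop. The first function reproduces the lane arithmetic above (register width divided by element width). The second walks the inputs width elements at a time and finishes the remainder one element at a time.

```python
def simd_lanes(register_bits, element_bits):
    """How many elements of a given size fit in one SIMD register."""
    return register_bits // element_bits

def chunked_dot(a, b, width):
    """Dot product computed width elements at a time, then a scalar tail."""
    n = len(a)
    main_end = n - (n % width)   # last index covered by full chunks
    total = 0.0
    for i in range(0, main_end, width):
        # One "SIMD step": multiply a whole chunk, then reduce it by summing
        products = [x * y for x, y in zip(a[i:i + width], b[i:i + width])]
        total += sum(products)
    for i in range(main_end, n):
        # Leftover elements that do not fill a whole chunk
        total += a[i] * b[i]
    return total

avx2_lanes = simd_lanes(256, 32)     # 32-bit floats in a 256-bit register
avx512_lanes = simd_lanes(512, 32)   # 32-bit floats in a 512-bit register
result = chunked_dot([1.0, 2.0, 3.0, 4.0, 5.0], [2.0] * 5, width=4)

print(avx2_lanes, avx512_lanes, result)
```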

A Concrete Speed Comparison

Matrix multiplication is a core operation in AI. Here is the same computation implemented three ways:

Pure Python (readable, slow):

def matmul_python(A, B):
    result = [[0] * len(B[0]) for _ in range(len(A))]
    for i in range(len(A)):
        for j in range(len(B[0])):
            for k in range(len(B)):
                result[i][j] += A[i][k] * B[k][j]
    return result

NumPy (fast, but implemented in C):

import numpy as np
result = np.matmul(A, B)

Mojo with SIMD vectorization (fast, and written in the language itself; Matrix is a user-defined struct with SIMD load and store methods, not shown here):

fn matmul[M: Int, N: Int, K: Int](C: Matrix, A: Matrix, B: Matrix):
    for i in range(M):
        for k in range(K):
            @parameter
            fn dot[nelts: Int](j: Int):
                C.store[nelts](i, j,
                    C.load[nelts](i, j) + A[i, k] * B.load[nelts](k, j))
            vectorize[dot, nelts](N)

On Modular's benchmarks, the Mojo version matches or exceeds NumPy's performance. The difference from NumPy is that this code is written in Mojo itself rather than in a compiled C extension, which means it can be read, modified, and extended directly.
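
The Mojo kernel above iterates in i, k, j order, so the innermost loop walks along one row of B and one row of C, which is the access pattern that vectorizes well. The pure-Python sketch below (an illustration only, using the hypothetical names matmul_ijk and matmul_ikj) applies the same loop order and checks it against the textbook i, j, k version.

```python
def matmul_ijk(A, B):
    """Textbook order: each output cell is a full dot product."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B):
    """Reordered: row i of C accumulates A[i][k] times row k of B."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            a_ik = A[i][k]               # scalar reused across the row
            for j in range(cols):        # contiguous walk along a row
                C[i][j] += a_ik * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
product = matmul_ikj(A, B)

print(product)
```

Both orders perform the same multiplications and additions; only the traversal differs. In the reordered form the inner loop touches neighboring elements, which is what lets a vectorized version load several of them at once.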

Mojo's Current State

Mojo is actively developed and the feature set is still expanding. Here's a practical picture of where things stand:

What works today:

Python-style syntax and interop: import Python libraries through the CPython runtime
Systems-level features: fn, struct, SIMD, ownership model
Compiling to standalone native binaries (no Python runtime required)
AI inference and numerical computing at production scale
Tooling: the magic package manager, MAX platform integration

Still in progress:

Full Python language compatibility: some features, such as classes, are not yet implemented
Native Windows support: WSL is the current path
Mojo's own standard library is growing but not yet complete

Open source: The standard library source is available on GitHub, and the language itself is free to download and use. Modular has committed to open-sourcing the language.

The language launched as a closed beta in 2023 and now has a growing community, a package ecosystem, and users running it in production AI inference workloads.

Who Might Find Mojo Useful?

AI/ML engineers who need to write custom kernels, operators, or data pipelines in Python but are hitting performance limits. Mojo allows writing those performance-sensitive paths without switching languages.

Python developers who need native performance but don't want to learn Rust or C++. Mojo's syntax is modeled on Python's, so existing Python knowledge carries over, and the additional concepts can be picked up incrementally.

Systems programmers interested in a language with strong hardware targeting, GPU support, and the ability to write code for AI accelerators without low-level C boilerplate.

Developers following the AI infrastructure space who want hands-on experience with the tooling being built around modern ML hardware.

Going Further

Mojo Documentation — actively maintained, with many examples
Mojo GitHub — standard library source, examples, open issues
Modular Playground — run Mojo in your browser without installing anything
Discord Community — active community, beginner-friendly

Summary

Mojo occupies an interesting position: it is compatible with Python and its ecosystem, but compiles to native code and exposes low-level hardware features when needed. It is not the first language to attempt this space — Julia and Cython have addressed related problems — but its approach of targeting Python compatibility while compiling through MLIR is distinct.

The language is young and some rough edges remain, but the core features are functional and the development velocity has been high. For developers working at the intersection of Python and performance-sensitive AI code, it's worth understanding how it works.