Apache Age

The graph database extension that turns PostgreSQL into a multi-model powerhouse

16 min read

#databases, #postgres

Prelude

I have been on a graph database journey for a while now. It started with Memgraph (which I covered in a previous article), moved through Neo4j comparisons, and eventually led me to Apache AGE. Why? Because I was already running PostgreSQL and I needed to traverse a 13+ million node distributor tree without adding another database to manage. AGE lets me write Cypher queries alongside my existing SQL - same database, same backups, same connection string. I installed it from source on Amazon Linux with PG 17, loaded my real production data, and put it through its paces. This article is what I learned along the way.

Apache AGE

If you are running PostgreSQL and your data has relationships that go deeper than a simple JOIN can handle - hierarchies, social networks, dependency graphs - you have probably felt the pain of recursive CTEs. They work, but they get ugly fast and slower even faster.

Apache AGE (A Graph Extension) solves this by bolting a full graph database engine directly onto PostgreSQL. You get openCypher query support (the same language Neo4j and MemGraph uses), graph storage, and the ability to JOIN graph results with your regular relational tables - all inside a single database. No data migration. No second database to manage. No sync headaches.

Think of it this way: plain PostgreSQL answering "who are all of Jane's descendants?" is like searching an alphabetical phone book over and over. AGE is like having the family tree on the wall and tracing the branches with your finger.

A Brief History

Apache AGE was born out of AgensGraph, a multi-model database that Bitnine Global built as a fork of PostgreSQL 10. Led by developer Junseok Yang, the team realized that while a full fork was powerful, most PostgreSQL users wanted graph capabilities without abandoning their existing database.

Key milestones:

2019 - Development begins at Bitnine Global as a PostgreSQL extension inspired by AgensGraph
2020 - Donated to the Apache Software Foundation; enters the Apache Incubator in April
2022 - Graduates as an Apache Top-Level Project in May; releases v1.0.0; first formal graph database project in the ASF
2024 - Support expands across PostgreSQL 11 through 17; LangChain integration proposed for RAG pipelines
2025 - PostgreSQL 18 support added; Azure Database for PostgreSQL includes AGE as a managed extension.

Recently Bitnine Co., Ltd. (the Korean parent company of Bitnine Global) was acquired by Directors Company in December 2024 and renamed itself to "SKAI Worldwide Co., Ltd." in January 2025. Their new website is skaiworldwide.com. They're pivoting toward AI advertising and content production while still maintaining their graph database products.

Why Apache AGE?

The Core Value Proposition

AGE differentiates itself through three pillars:

Zero Migration: Your existing PostgreSQL data stays exactly where it is. AGE adds graph capabilities alongside it. You can JOIN Cypher query results with regular SQL tables in a single statement.
One Database to Manage: No separate graph database process, no data synchronization, no additional backups. AGE is a PostgreSQL extension - it lives inside your existing database.
openCypher Compatibility: If you know Cypher from Neo4j or Memgraph, you already know how to query AGE. The learning curve is minimal.

The Trade-Off

AGE is not an in-memory graph engine like Memgraph. It stores graph data in regular PostgreSQL heap tables with B-tree index lookups per hop, not index-free adjacency. For shallow queries (2-3 levels) the difference is negligible. For deep traversals on millions of nodes, a dedicated graph database will be faster. But if your priority is operational simplicity and tight relational integration, AGE is hard to beat.

Use Cases

AGE excels in scenarios where graph data coexists with relational data:

MLM / Distributor Trees

Network marketing companies need to traverse sponsor and enroller hierarchies across millions of distributors. AGE handles upline, downline, and team size queries without recursive CTE gymnastics.

Fraud Detection

Banks and payment processors can model transaction flows as graphs and identify suspicious patterns by traversing connections between accounts, devices, and merchants.

Knowledge Graphs & RAG

Organizations building knowledge graphs for AI/LLM augmentation can store graph relationships directly in PostgreSQL and query them alongside vector embeddings (using pgvector).

Identity & Access Management

Model users, roles, permissions, and organizational hierarchies as a graph. Answering "does this user have access to this resource through any path?" becomes a single Cypher query.

Network & Infrastructure

Map infrastructure dependencies, trace outage impacts, and perform root cause analysis by traversing device-to-device relationships.

Supply Chain

Trace component dependencies and supplier relationships through deep, multi-hop graphs without the complexity of recursive SQL.

Installation

Option 1: Docker (Fastest)

docker run \
  --name age \
  -p 5455:5432 \
  -e POSTGRES_USER=postgresUser \
  -e POSTGRES_PASSWORD=postgresPW \
  -e POSTGRES_DB=postgresDB \
  -d \
  apache/age

Connect:

docker exec -it age psql -d postgresDB -U postgresUser

Option 2: Build from Source (Production)

This is the route I took on Amazon Linux 2023 with PostgreSQL 17. Here is what actually works:

# 1. Install dependencies
sudo dnf install -y git gcc make readline-devel zlib-devel \
  bison flex perl-FindBin perl-lib

# 2. Install PG devel package (PGDG version, not Amazon's)
sudo dnf install -y postgresql17-devel

# 3. Clone and checkout the correct branch
git clone https://github.com/apache/age.git
cd age
git checkout release/PG17/1.7.0

# 4. Build against the correct pg_config
make PG_CONFIG=/usr/pgsql-17/bin/pg_config
sudo make PG_CONFIG=/usr/pgsql-17/bin/pg_config install

# 5. Verify
ls /usr/pgsql-17/lib/age.so

Edit your postgresql.conf:

shared_preload_libraries = 'age'

Restart PostgreSQL, then enable:

CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT ag_catalog.create_graph('my_graph');

Gotchas I hit:

Install perl-FindBin and perl-lib before running make - the build will fail without them
Always use ag_catalog.create_graph() with the explicit schema - bare create_graph() does not resolve
If you have both Amazon and PGDG PostgreSQL packages, make will target the wrong pg_config - always specify PG_CONFIG= explicitly
PGDG and Amazon server-devel packages conflict; use the PGDG version

Option 3: Azure Managed

Azure Database for PostgreSQL supports AGE as a built-in extension. Enable it through the Azure portal under Server Parameters by adding age to shared_preload_libraries.

Connecting to AGE

Using psql (CLI)

psql -d mydb -U postgres

Every session requires loading AGE:

LOAD 'age';
SET search_path = ag_catalog, "$user", public;

Using AGE Viewer (Web UI)

Apache AGE Viewer provides graph visualization and a Cypher query editor:

git clone https://github.com/apache/age-viewer.git
cd age-viewer
npm install && npm run setup
npm run dev

Navigate to http://localhost:3000.

Using Drivers

Since AGE is a PostgreSQL extension, any PostgreSQL driver works for SQL operations. For Cypher query parsing and result handling, AGE provides dedicated drivers:

Language	Driver	Notes
Python	`apache-age-python`	Built on psycopg2; parses agtype results
Go	`apache/age/drivers/golang`	Official driver in the AGE repo
Java	`apache/age/drivers/jdbc`	JDBC-based driver
Node.js	`apache/age/drivers/nodejs`	Built on pg driver
Rust	`apache_age`	Sync and async (Tokio) support
C	Built into AGE	Native access via libpq

Python Example:

import age

connection = age.connect(
    graph="my_graph",
    dsn="host=127.0.0.1 port=5432 dbname=mydb user=postgres"
)

cursor = connection.execCypher(
    "MATCH (d:Distributor {id: 1})-[:SPONSORS*]->(downline) "
    "RETURN downline.name, downline.level"
)

for row in cursor:
    print(row[0], row[1])

connection.commit()
connection.close()

You can also use any standard PostgreSQL driver (psycopg2, pg, JDBC) and call Cypher through the cypher() function wrapper in SQL - this is how I do it in production.

Core Concepts

AGE uses the property graph model, the same model used by Neo4j and Memgraph.

Nodes (Vertices): Entities with one or more labels and a map of properties.

Edges (Relationships): Directed connections between nodes with a type and optional properties.

agtype: AGE's custom data type for storing graph data. It handles vertices, edges, paths, maps, and lists. Think of it as JSONB with graph semantics.

Graphs and Labels: AGE stores each graph as a PostgreSQL schema. Each vertex label and edge label becomes a table within that schema.

(:Distributor {id: 1, name: "Alice"})-[:SPONSORS {since: "2024-01"}]->(:Distributor {id: 2, name: "Bob"})

Under the hood, AGE creates:

my_graph."Distributor" (id graphid, properties agtype)
my_graph."SPONSORS"    (id graphid, start_id graphid, end_id graphid, properties agtype)

This is both AGE's strength (you can query these tables with regular SQL) and its limitation (traversals require index lookups, not pointer chasing).

Cypher Query Examples

Cypher queries in AGE are wrapped in a cypher() function call:

Creating Data

SELECT * FROM cypher('my_graph', $$
    CREATE (:Person {name: "Alice", age: 30})
$$) AS (v agtype);

SELECT * FROM cypher('my_graph', $$
    CREATE (:Person {name: "Bob", age: 25})
$$) AS (v agtype);

-- Create a relationship
SELECT * FROM cypher('my_graph', $$
    MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
    CREATE (a)-[:KNOWS {since: 2023}]->(b)
    RETURN a, b
$$) AS (a agtype, b agtype);

Reading Data

-- Find all people
SELECT * FROM cypher('my_graph', $$
    MATCH (p:Person) RETURN p.name, p.age
$$) AS (name agtype, age agtype);

-- Traverse relationships
SELECT * FROM cypher('my_graph', $$
    MATCH (a:Person)-[r:KNOWS]->(b:Person)
    RETURN a.name, b.name, r.since
$$) AS (from_name agtype, to_name agtype, since agtype);

-- Variable-length paths (1 to 5 hops)
SELECT * FROM cypher('my_graph', $$
    MATCH (a:Person {name: "Alice"})-[:KNOWS*1..5]->(friend)
    RETURN DISTINCT friend.name
$$) AS (name agtype);

Updating and Deleting

-- Update a property
SELECT * FROM cypher('my_graph', $$
    MATCH (p:Person {name: "Alice"})
    SET p.age = 31
    RETURN p
$$) AS (v agtype);

-- Delete a node and its relationships
SELECT * FROM cypher('my_graph', $$
    MATCH (p:Person {name: "Bob"})
    DETACH DELETE p
$$) AS (v agtype);

The Killer Feature: SQL + Cypher Joins

This is what sets AGE apart from standalone graph databases:

-- Join graph results with a regular relational table
SELECT c.email, c.phone, g.team_size
FROM customers c
JOIN (
    SELECT * FROM cypher('mlm', $$
        MATCH (d:Distributor)-[:SPONSORS*]->(down)
        RETURN d.id AS dist_id, count(down) AS team_size
    $$) AS (dist_id agtype, team_size agtype)
) g ON c.dist_id = g.dist_id::int;

No other graph database gives you this without application-level orchestration.

Real-World Example: Distributor Tree

This is the use case that brought me to AGE - modeling an MLM sponsor/enroller hierarchy with monthly snapshots.

Set Up the Graph

SELECT ag_catalog.create_graph('mlm');

-- Create vertex and edge labels
SELECT ag_catalog.create_vlabel('mlm', 'Distributor');
SELECT ag_catalog.create_elabel('mlm', 'SPONSORS');

-- Create indexes for performance
CREATE INDEX ON mlm."Distributor" USING gin (properties);

Load Distributors

SELECT * FROM cypher('mlm', $$
    CREATE (:Distributor {id: 1, name: "Alice", status: "Active"})
$$) AS (v agtype);

SELECT * FROM cypher('mlm', $$
    CREATE (:Distributor {id: 2, name: "Bob", status: "Active"})
$$) AS (v agtype);

SELECT * FROM cypher('mlm', $$
    CREATE (:Distributor {id: 3, name: "Charlie", status: "Active"})
$$) AS (v agtype);

-- Create sponsor relationships
SELECT * FROM cypher('mlm', $$
    MATCH (a:Distributor {id: 1}), (b:Distributor {id: 2})
    CREATE (a)-[:SPONSORS]->(b)
$$) AS (e agtype);

SELECT * FROM cypher('mlm', $$
    MATCH (a:Distributor {id: 1}), (c:Distributor {id: 3})
    CREATE (a)-[:SPONSORS]->(c)
$$) AS (e agtype);

Common Queries

-- Direct downline
SELECT * FROM cypher('mlm', $$
    MATCH (d:Distributor {id: 1})-[:SPONSORS]->(down)
    RETURN down.name, down.status
$$) AS (name agtype, status agtype);

-- Full downline (all levels)
SELECT * FROM cypher('mlm', $$
    MATCH (d:Distributor {id: 1})-[:SPONSORS*]->(down)
    RETURN down.name, down.id
$$) AS (name agtype, id agtype);

-- Upline (sponsor chain to root)
SELECT * FROM cypher('mlm', $$
    MATCH (d:Distributor {id: 3})<-[:SPONSORS*]-(upline)
    RETURN upline.name, upline.id
$$) AS (name agtype, id agtype);

-- Team size count
SELECT * FROM cypher('mlm', $$
    MATCH (d:Distributor {id: 1})-[:SPONSORS*]->(down)
    RETURN count(down) AS team_size
$$) AS (team_size agtype);

-- Shortest path between two distributors
SELECT * FROM cypher('mlm', $$
    MATCH p = shortestPath(
        (a:Distributor {id: 1})-[:SPONSORS*]-(b:Distributor {id: 3})
    )
    RETURN p
$$) AS (path agtype);

Performance Tips

1. Create Indexes

-- GIN index on properties (most useful)
CREATE INDEX ON mlm."Distributor" USING gin (properties);

-- B-tree on specific property for exact lookups
CREATE INDEX ON mlm."Distributor" ((properties->>'id'));

2. Bound Your Traversals

-- Unbounded - potentially catastrophic on large graphs
MATCH (d)-[:SPONSORS*]->(x) RETURN x;

-- Bounded - predictable performance
MATCH (d)-[:SPONSORS*1..10]->(x) RETURN x;

3. Load Data in Batches via SQL

For bulk loading millions of nodes, bypass Cypher and insert directly into the underlying tables. This is significantly faster than individual CREATE statements.

4. Use EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT * FROM cypher('mlm', $$
    MATCH (d:Distributor {id: 1})-[:SPONSORS*1..5]->(down)
    RETURN down
$$) AS (v agtype);

5. Remember the Session Setup

Every connection needs LOAD 'age' and the search_path set. Forget this and you will get cryptic errors about unhandled cipher function calls.

When to Choose Apache AGE

Choose AGE when:

You already run PostgreSQL and want graph capabilities without adding infrastructure
Your graph queries need to JOIN with relational data
Operational simplicity matters more than raw traversal speed
Your traversals are bounded (under 10 levels deep) or your dataset is moderate
You want ACID transactions across both graph and relational operations
You need a single backup and recovery strategy

Consider alternatives when:

You need sub-millisecond deep traversals on millions of nodes (use Memgraph)
Your graph workload is the primary workload, not secondary (use a dedicated graph DB)
You need streaming graph ingestion (Memgraph has native Kafka support)
You need mature graph visualization tools (Neo4j Bloom, Memgraph Lab)

Resources

Official Site: age.apache.org
GitHub: github.com/apache/age
AGE Viewer: github.com/apache/age-viewer
Documentation: age.apache.org/age-manual
Docker Hub: hub.docker.com/r/apache/age

Conclusion

Apache AGE occupies a unique position in the graph database landscape. It is not the fastest graph engine - Memgraph and Neo4j will outrun it on deep traversals. But it is the only one that lives inside PostgreSQL, and that changes the entire operational equation.

If you are already invested in PostgreSQL, AGE means you do not need a second database, a second backup strategy, a second monitoring system, or application-level joins to combine graph and relational data. You get Cypher queries that can sit right next to your SQL, sharing the same transaction, the same connection, the same data.

I loaded my 13+ million node distributor tree into AGE and it handled the bounded queries I needed without breaking a sweat. For unbounded deep traversals across the entire tree, Memgraph is still faster. But for the 90% of queries that are bounded, well-indexed, and need to JOIN back to relational data? AGE is the right tool.