I have been on a graph database journey for a while now. It started with Memgraph (which I covered in a previous article), moved through Neo4j comparisons, and eventually led me to Apache AGE. Why? Because I was already running PostgreSQL and I needed to traverse a 13+ million node distributor tree without adding another database to manage. AGE lets me write Cypher queries alongside my existing SQL - same database, same backups, same connection string. I installed it from source on Amazon Linux with PG 17, loaded my real production data, and put it through its paces. This article is what I learned along the way.
If you are running PostgreSQL and your data has relationships that go deeper than a simple JOIN can handle - hierarchies, social networks, dependency graphs - you have probably felt the pain of recursive CTEs. They work, but they get ugly fast and slower even faster.
Apache AGE (A Graph Extension) solves this by bolting a full graph database engine directly onto PostgreSQL. You get openCypher query support (the same language Neo4j and MemGraph uses), graph storage, and the ability to JOIN graph results with your regular relational tables - all inside a single database. No data migration. No second database to manage. No sync headaches.
Think of it this way: plain PostgreSQL answering "who are all of Jane's descendants?" is like searching an alphabetical phone book over and over. AGE is like having the family tree on the wall and tracing the branches with your finger.
Apache AGE was born out of AgensGraph, a multi-model database that Bitnine Global built as a fork of PostgreSQL 10. Led by developer Junseok Yang, the team realized that while a full fork was powerful, most PostgreSQL users wanted graph capabilities without abandoning their existing database.
Key milestones:
Recently Bitnine Co., Ltd. (the Korean parent company of Bitnine Global) was acquired by Directors Company in December 2024 and renamed itself to "SKAI Worldwide Co., Ltd." in January 2025. Their new website is skaiworldwide.com. They're pivoting toward AI advertising and content production while still maintaining their graph database products.
AGE differentiates itself through three pillars:
AGE is not an in-memory graph engine like Memgraph. It stores graph data in regular PostgreSQL heap tables with B-tree index lookups per hop, not index-free adjacency. For shallow queries (2-3 levels) the difference is negligible. For deep traversals on millions of nodes, a dedicated graph database will be faster. But if your priority is operational simplicity and tight relational integration, AGE is hard to beat.
AGE excels in scenarios where graph data coexists with relational data:
Network marketing companies need to traverse sponsor and enroller hierarchies across millions of distributors. AGE handles upline, downline, and team size queries without recursive CTE gymnastics.
Banks and payment processors can model transaction flows as graphs and identify suspicious patterns by traversing connections between accounts, devices, and merchants.
Organizations building knowledge graphs for AI/LLM augmentation can store graph relationships directly in PostgreSQL and query them alongside vector embeddings (using pgvector).
Model users, roles, permissions, and organizational hierarchies as a graph. Answering "does this user have access to this resource through any path?" becomes a single Cypher query.
Map infrastructure dependencies, trace outage impacts, and perform root cause analysis by traversing device-to-device relationships.
Trace component dependencies and supplier relationships through deep, multi-hop graphs without the complexity of recursive SQL.
docker run \
--name age \
-p 5455:5432 \
-e POSTGRES_USER=postgresUser \
-e POSTGRES_PASSWORD=postgresPW \
-e POSTGRES_DB=postgresDB \
-d \
apache/age
Connect:
docker exec -it age psql -d postgresDB -U postgresUser
This is the route I took on Amazon Linux 2023 with PostgreSQL 17. Here is what actually works:
# 1. Install dependencies
sudo dnf install -y git gcc make readline-devel zlib-devel \
bison flex perl-FindBin perl-lib
# 2. Install PG devel package (PGDG version, not Amazon's)
sudo dnf install -y postgresql17-devel
# 3. Clone and checkout the correct branch
git clone https://github.com/apache/age.git
cd age
git checkout release/PG17/1.7.0
# 4. Build against the correct pg_config
make PG_CONFIG=/usr/pgsql-17/bin/pg_config
sudo make PG_CONFIG=/usr/pgsql-17/bin/pg_config install
# 5. Verify
ls /usr/pgsql-17/lib/age.so
Edit your postgresql.conf:
shared_preload_libraries = 'age'
Restart PostgreSQL, then enable:
CREATE EXTENSION age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
SELECT ag_catalog.create_graph('my_graph');
Gotchas I hit:
perl-FindBin and perl-lib before running make - the build will fail without themag_catalog.create_graph() with the explicit schema - bare create_graph() does not resolvemake will target the wrong pg_config - always specify PG_CONFIG= explicitlyAzure Database for PostgreSQL supports AGE as a built-in extension. Enable it through the Azure portal under Server Parameters by adding age to shared_preload_libraries.
psql -d mydb -U postgres
Every session requires loading AGE:
LOAD 'age';
SET search_path = ag_catalog, "$user", public;
Apache AGE Viewer provides graph visualization and a Cypher query editor:
git clone https://github.com/apache/age-viewer.git
cd age-viewer
npm install && npm run setup
npm run dev
Navigate to http://localhost:3000.
Since AGE is a PostgreSQL extension, any PostgreSQL driver works for SQL operations. For Cypher query parsing and result handling, AGE provides dedicated drivers:
| Language | Driver | Notes |
|---|---|---|
| Python | apache-age-python |
Built on psycopg2; parses agtype results |
| Go | apache/age/drivers/golang |
Official driver in the AGE repo |
| Java | apache/age/drivers/jdbc |
JDBC-based driver |
| Node.js | apache/age/drivers/nodejs |
Built on pg driver |
| Rust | apache_age |
Sync and async (Tokio) support |
| C | Built into AGE | Native access via libpq |
Python Example:
import age
connection = age.connect(
graph="my_graph",
dsn="host=127.0.0.1 port=5432 dbname=mydb user=postgres"
)
cursor = connection.execCypher(
"MATCH (d:Distributor {id: 1})-[:SPONSORS*]->(downline) "
"RETURN downline.name, downline.level"
)
for row in cursor:
print(row[0], row[1])
connection.commit()
connection.close()
You can also use any standard PostgreSQL driver (psycopg2, pg, JDBC) and call Cypher through the cypher() function wrapper in SQL - this is how I do it in production.
AGE uses the property graph model, the same model used by Neo4j and Memgraph.
Nodes (Vertices): Entities with one or more labels and a map of properties.
Edges (Relationships): Directed connections between nodes with a type and optional properties.
agtype: AGE's custom data type for storing graph data. It handles vertices, edges, paths, maps, and lists. Think of it as JSONB with graph semantics.
Graphs and Labels: AGE stores each graph as a PostgreSQL schema. Each vertex label and edge label becomes a table within that schema.
(:Distributor {id: 1, name: "Alice"})-[:SPONSORS {since: "2024-01"}]->(:Distributor {id: 2, name: "Bob"})
Under the hood, AGE creates:
my_graph."Distributor" (id graphid, properties agtype)
my_graph."SPONSORS" (id graphid, start_id graphid, end_id graphid, properties agtype)
This is both AGE's strength (you can query these tables with regular SQL) and its limitation (traversals require index lookups, not pointer chasing).
Cypher queries in AGE are wrapped in a cypher() function call:
SELECT * FROM cypher('my_graph', $$
CREATE (:Person {name: "Alice", age: 30})
$$) AS (v agtype);
SELECT * FROM cypher('my_graph', $$
CREATE (:Person {name: "Bob", age: 25})
$$) AS (v agtype);
-- Create a relationship
SELECT * FROM cypher('my_graph', $$
MATCH (a:Person {name: "Alice"}), (b:Person {name: "Bob"})
CREATE (a)-[:KNOWS {since: 2023}]->(b)
RETURN a, b
$$) AS (a agtype, b agtype);
-- Find all people
SELECT * FROM cypher('my_graph', $$
MATCH (p:Person) RETURN p.name, p.age
$$) AS (name agtype, age agtype);
-- Traverse relationships
SELECT * FROM cypher('my_graph', $$
MATCH (a:Person)-[r:KNOWS]->(b:Person)
RETURN a.name, b.name, r.since
$$) AS (from_name agtype, to_name agtype, since agtype);
-- Variable-length paths (1 to 5 hops)
SELECT * FROM cypher('my_graph', $$
MATCH (a:Person {name: "Alice"})-[:KNOWS*1..5]->(friend)
RETURN DISTINCT friend.name
$$) AS (name agtype);
-- Update a property
SELECT * FROM cypher('my_graph', $$
MATCH (p:Person {name: "Alice"})
SET p.age = 31
RETURN p
$$) AS (v agtype);
-- Delete a node and its relationships
SELECT * FROM cypher('my_graph', $$
MATCH (p:Person {name: "Bob"})
DETACH DELETE p
$$) AS (v agtype);
This is what sets AGE apart from standalone graph databases:
-- Join graph results with a regular relational table
SELECT c.email, c.phone, g.team_size
FROM customers c
JOIN (
SELECT * FROM cypher('mlm', $$
MATCH (d:Distributor)-[:SPONSORS*]->(down)
RETURN d.id AS dist_id, count(down) AS team_size
$$) AS (dist_id agtype, team_size agtype)
) g ON c.dist_id = g.dist_id::int;
No other graph database gives you this without application-level orchestration.
This is the use case that brought me to AGE - modeling an MLM sponsor/enroller hierarchy with monthly snapshots.
SELECT ag_catalog.create_graph('mlm');
-- Create vertex and edge labels
SELECT ag_catalog.create_vlabel('mlm', 'Distributor');
SELECT ag_catalog.create_elabel('mlm', 'SPONSORS');
-- Create indexes for performance
CREATE INDEX ON mlm."Distributor" USING gin (properties);
SELECT * FROM cypher('mlm', $$
CREATE (:Distributor {id: 1, name: "Alice", status: "Active"})
$$) AS (v agtype);
SELECT * FROM cypher('mlm', $$
CREATE (:Distributor {id: 2, name: "Bob", status: "Active"})
$$) AS (v agtype);
SELECT * FROM cypher('mlm', $$
CREATE (:Distributor {id: 3, name: "Charlie", status: "Active"})
$$) AS (v agtype);
-- Create sponsor relationships
SELECT * FROM cypher('mlm', $$
MATCH (a:Distributor {id: 1}), (b:Distributor {id: 2})
CREATE (a)-[:SPONSORS]->(b)
$$) AS (e agtype);
SELECT * FROM cypher('mlm', $$
MATCH (a:Distributor {id: 1}), (c:Distributor {id: 3})
CREATE (a)-[:SPONSORS]->(c)
$$) AS (e agtype);
-- Direct downline
SELECT * FROM cypher('mlm', $$
MATCH (d:Distributor {id: 1})-[:SPONSORS]->(down)
RETURN down.name, down.status
$$) AS (name agtype, status agtype);
-- Full downline (all levels)
SELECT * FROM cypher('mlm', $$
MATCH (d:Distributor {id: 1})-[:SPONSORS*]->(down)
RETURN down.name, down.id
$$) AS (name agtype, id agtype);
-- Upline (sponsor chain to root)
SELECT * FROM cypher('mlm', $$
MATCH (d:Distributor {id: 3})<-[:SPONSORS*]-(upline)
RETURN upline.name, upline.id
$$) AS (name agtype, id agtype);
-- Team size count
SELECT * FROM cypher('mlm', $$
MATCH (d:Distributor {id: 1})-[:SPONSORS*]->(down)
RETURN count(down) AS team_size
$$) AS (team_size agtype);
-- Shortest path between two distributors
SELECT * FROM cypher('mlm', $$
MATCH p = shortestPath(
(a:Distributor {id: 1})-[:SPONSORS*]-(b:Distributor {id: 3})
)
RETURN p
$$) AS (path agtype);
-- GIN index on properties (most useful)
CREATE INDEX ON mlm."Distributor" USING gin (properties);
-- B-tree on specific property for exact lookups
CREATE INDEX ON mlm."Distributor" ((properties->>'id'));
-- Unbounded - potentially catastrophic on large graphs
MATCH (d)-[:SPONSORS*]->(x) RETURN x;
-- Bounded - predictable performance
MATCH (d)-[:SPONSORS*1..10]->(x) RETURN x;
For bulk loading millions of nodes, bypass Cypher and insert directly into the underlying tables. This is significantly faster than individual CREATE statements.
EXPLAIN ANALYZE
SELECT * FROM cypher('mlm', $$
MATCH (d:Distributor {id: 1})-[:SPONSORS*1..5]->(down)
RETURN down
$$) AS (v agtype);
Every connection needs LOAD 'age' and the search_path set. Forget this and you will get cryptic errors about unhandled cipher function calls.
Choose AGE when:
Consider alternatives when:
Apache AGE occupies a unique position in the graph database landscape. It is not the fastest graph engine - Memgraph and Neo4j will outrun it on deep traversals. But it is the only one that lives inside PostgreSQL, and that changes the entire operational equation.
If you are already invested in PostgreSQL, AGE means you do not need a second database, a second backup strategy, a second monitoring system, or application-level joins to combine graph and relational data. You get Cypher queries that can sit right next to your SQL, sharing the same transaction, the same connection, the same data.
I loaded my 13+ million node distributor tree into AGE and it handled the bounded queries I needed without breaking a sweat. For unbounded deep traversals across the entire tree, Memgraph is still faster. But for the 90% of queries that are bounded, well-indexed, and need to JOIN back to relational data? AGE is the right tool.