You already have the data. It's sitting in PostgreSQL, MySQL, Snowflake, or a dozen other stores. The problem isn't the data -- it's the relationships buried inside it. Finding how your customers connect, how fraud propagates, or how your supply chain is linked requires graph traversal. And traditionally, that meant standing up a dedicated graph database, migrating data via complex ETL pipelines, keeping everything in sync, and paying the operational cost forever.
PuppyGraph takes a different approach: it sits on top of your existing relational databases and lets you query them as a graph -- zero ETL, zero data migration, operational in under 10 minutes.
PuppyGraph is a graph query engine, not a graph database. That distinction matters. Traditional graph databases (Neo4j, TigerGraph, etc.) require you to load your data into their proprietary storage. PuppyGraph performs the graph computation and pushes down the data fetch to your existing data sources via JDBC or a Parquet file reader, reading directly from tables you already have.
You define a schema -- a JSON document that maps your relational tables to graph vertices and edges -- and PuppyGraph only reads the necessary data related to a query based on the defined schema.
Key capabilities:

- Zero-ETL graph querying directly against your existing relational stores
- Support for both Gremlin and openCypher, interchangeably
- Cross-database federation: one graph schema can span multiple catalogs
- Deployment as a single Docker container, operational in under 10 minutes
Real-world users include Coinbase (crypto graph analytics), Netskope (cybersecurity), and AMD.
| Term | Meaning |
|---|---|
| Catalog | A connection to a data source (your PostgreSQL instance, Snowflake account, etc.) |
| Vertex (Node) | A graph node, mapped from a database table |
| Edge | A graph relationship between vertices, also mapped from a table |
| Schema | A JSON document that defines catalogs, vertices, and edges |
| Gremlin | Apache TinkerPop's graph traversal language |
| openCypher | Neo4j's declarative graph query language |
PuppyGraph exposes three ports:
| Port | Purpose |
|---|---|
| 8081 | Web UI + REST API |
| 8182 | Gremlin WebSocket endpoint |
| 7687 | Bolt protocol (openCypher / Neo4j-compatible drivers) |
PuppyGraph supports both query languages and you can switch between them freely. Understanding what each is optimized for will save you a lot of time.
Gremlin is the query language of Apache TinkerPop, an open-source graph computing framework. It's a procedural, traversal-based language -- you describe how to walk the graph step by step, chaining operations like a pipeline.
// "Start at marko, walk outward along 'knows' edges,
// then walk outward along 'created' edges, return names"
g.V().has("name", "marko")
.out("knows")
.out("created")
.values("name")
Every step (`out()`, `has()`, `values()`, `where()`, etc.) narrows or transforms the traversal. It reads almost like a stream pipeline. Gremlin is:

- Procedural: you spell out the walk step by step
- Composable: chained steps keep deep multi-hop traversals natural
- Portable: it runs on any TinkerPop-compatible engine
Gremlin's weakness is readability for people who think in SQL. Pattern matching (finding all nodes that fit a shape) requires more verbose syntax than Cypher.
openCypher originated at Neo4j and is now an open standard. It's a declarative, pattern-matching language -- you describe what you want the graph to look like, and the engine figures out how to find it. The syntax is inspired by SQL and ASCII art.
-- "Find all people marko knows, and what software those people created"
MATCH (p:person {name: 'marko'})-[:knows]->(:person)-[:created]->(s:software)
RETURN DISTINCT s.name
The `-->` arrow notation visually represents relationships, making queries easy to read at a glance. Cypher is:

- Declarative: you state the pattern and the engine plans the traversal
- SQL-like: a gentler learning curve for teams who think in SQL
- Visual: ASCII-art patterns mirror the shape of the data
Cypher's weakness is that very long traversal chains can become harder to express than the equivalent Gremlin.
The same query in both languages:
```groovy
// Gremlin: all software created by people marko knows, with edge weights > 0.5
g.V().has("name", "marko")
  .outE("knows").has("weight", gt(0.5)).inV()
  .out("created")
  .values("name")
```

```cypher
// Cypher: same query
MATCH (p:person {name: 'marko'})-[k:knows]->(friend)-[:created]->(s:software)
WHERE k.weight > 0.5
RETURN s.name
```
| Factor | Gremlin | openCypher |
|---|---|---|
| Style | Procedural / traversal | Declarative / pattern matching |
| Learning curve | Steeper (new paradigm) | Gentler (SQL-like) |
| Deep multi-hop traversals | Natural | Workable but verbose |
| Pattern matching | Verbose | Concise and visual |
| Ecosystem | TinkerPop-compatible engines | Neo4j ecosystem + growing open standard |
| PuppyGraph CLI access | `console` command | `cypher-console` command |
| External driver | gremlin-python | Neo4j Python driver (Bolt) |
Rule of thumb: If your team knows SQL and you're doing structural pattern matching, start with openCypher. If you're doing complex multi-hop traversals or working in a TinkerPop ecosystem, use Gremlin. With PuppyGraph you can mix both -- Gremlin for traversal-heavy work, Cypher for quick exploratory queries.
PuppyGraph runs as a Docker container. The only prerequisite is Docker with Compose.
```shell
brew install --cask docker
# Launch Docker Desktop from Applications, then verify:
docker compose version
```
All `docker` commands in this guide work identically in PowerShell or WSL 2 bash.
```shell
docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 \
  -e PUPPYGRAPH_USERNAME=puppygraph \
  -e PUPPYGRAPH_PASSWORD=puppygraph123 \
  --name puppygraph \
  puppygraph/puppygraph:stable
```
Open http://localhost:8081 and log in with puppygraph / puppygraph123.
The examples below assume PuppyGraph is running and your databases are already accessible over the network.
Every PuppyGraph deployment is driven by a schema.json with three top-level sections:
```json
{
  "catalogs": [ ... ],
  "graph": {
    "vertices": [ ... ],
    "edges": [ ... ]
  }
}
```
Vertices need at minimum a `label`, a `tableSource`, and an `id` field. Edges additionally need `fromVertex`, `toVertex`, `fromId`, and `toId` to wire them to vertex tables.
The only thing that changes between database types is the catalog block. The graph section is identical regardless of whether the data lives in PostgreSQL, MySQL, or Snowflake.
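To make those required fields concrete, here is a small Python sketch that assembles a minimal schema document with one vertex and one edge. The helper names (`vertex`, `edge`) are illustrative, not a PuppyGraph API; the dict shapes follow the examples later in this guide:

```python
import json

def vertex(label, catalog, schema, table, id_field="id"):
    """Minimal vertex mapping: a label, a tableSource, and an id field."""
    return {
        "label": label,
        "oneToOne": {
            "tableSource": {"catalog": catalog, "schema": schema, "table": table},
            "id": {"fields": [{"type": "String", "field": id_field, "alias": id_field}]},
        },
    }

def edge(label, from_vertex, to_vertex, catalog, schema, table):
    """Minimal edge mapping: vertex wiring plus id/fromId/toId columns."""
    def col(field):
        return {"fields": [{"type": "String", "field": field, "alias": field}]}
    return {
        "label": label,
        "fromVertex": from_vertex,
        "toVertex": to_vertex,
        "tableSource": {"catalog": catalog, "schema": schema, "table": table},
        "id": col("id"),
        "fromId": col("from_id"),
        "toId": col("to_id"),
    }

schema = {
    "catalogs": [{"name": "pg_data", "type": "postgresql"}],  # jdbc block omitted here
    "graph": {
        "vertices": [vertex("person", "pg_data", "modern", "person")],
        "edges": [edge("knows", "person", "person", "pg_data", "modern", "knows")],
    },
}
print(json.dumps(schema, indent=2))
```

Generating the schema programmatically like this pays off once you have dozens of tables to map.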
Upload schema via REST:
```shell
curl -X POST -H "content-type: application/json" \
  --data-binary @./schema.json \
  --user "puppygraph:puppygraph123" \
  localhost:8081/schema
# {"Status":"OK","Message":"Schema uploaded and gremlin server restarted"}
```
Assume you have tables structured like this in an existing PostgreSQL instance:
```
modern.person   (id, name, age)
modern.software (id, name, lang)
modern.knows    (id, from_id, to_id, weight)
modern.created  (id, from_id, to_id, weight)
```
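To see what PuppyGraph's query pushdown amounts to, the sketch below recreates these four tables in an in-memory SQLite database (standing in for PostgreSQL), loads the classic TinkerPop "modern" sample data, and runs the SQL join equivalent of the two-hop traversal "software created by people marko knows":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id TEXT PRIMARY KEY, name TEXT, age INTEGER);
CREATE TABLE software(id TEXT PRIMARY KEY, name TEXT, lang TEXT);
CREATE TABLE knows   (id TEXT, from_id TEXT, to_id TEXT, weight REAL);
CREATE TABLE created (id TEXT, from_id TEXT, to_id TEXT, weight REAL);
""")
conn.executemany("INSERT INTO person VALUES (?,?,?)", [
    ("1", "marko", 29), ("2", "vadas", 27), ("4", "josh", 32), ("6", "peter", 35)])
conn.executemany("INSERT INTO software VALUES (?,?,?)", [
    ("3", "lop", "java"), ("5", "ripple", "java")])
conn.executemany("INSERT INTO knows VALUES (?,?,?,?)", [
    ("7", "1", "2", 0.5), ("8", "1", "4", 1.0)])
conn.executemany("INSERT INTO created VALUES (?,?,?,?)", [
    ("9", "1", "3", 0.4), ("10", "4", "5", 1.0),
    ("11", "4", "3", 0.4), ("12", "6", "3", 0.2)])

# SQL equivalent of: g.V().has("name","marko").out("knows").out("created").values("name")
rows = conn.execute("""
    SELECT DISTINCT s.name
    FROM person p
    JOIN knows k    ON k.from_id = p.id
    JOIN created c  ON c.from_id = k.to_id
    JOIN software s ON s.id = c.to_id
    WHERE p.name = 'marko'
""").fetchall()
print(sorted(r[0] for r in rows))  # ['lop', 'ripple']
```

Each hop in the Gremlin traversal becomes a join against an edge table; PuppyGraph generates and pushes down queries of this shape so only the needed rows leave the source database.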
schema.json:
```json
{
  "catalogs": [
    {
      "name": "pg_data",
      "type": "postgresql",
      "jdbc": {
        "username": "myuser",
        "password": "mypassword",
        "jdbcUri": "jdbc:postgresql://my-db-host:5432/my_database",
        "driverClass": "org.postgresql.Driver"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": { "catalog": "pg_data", "schema": "modern", "table": "person" },
          "id": { "fields": [{ "type": "String", "field": "id", "alias": "id" }] },
          "attributes": [
            { "type": "String", "field": "name", "alias": "name" },
            { "type": "Int", "field": "age", "alias": "age" }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": { "catalog": "pg_data", "schema": "modern", "table": "software" },
          "id": { "fields": [{ "type": "String", "field": "id", "alias": "id" }] },
          "attributes": [
            { "type": "String", "field": "name", "alias": "name" },
            { "type": "String", "field": "lang", "alias": "lang" }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": { "catalog": "pg_data", "schema": "modern", "table": "knows" },
        "id": { "fields": [{ "type": "String", "field": "id", "alias": "id" }] },
        "fromId": { "fields": [{ "type": "String", "field": "from_id", "alias": "from_id" }] },
        "toId": { "fields": [{ "type": "String", "field": "to_id", "alias": "to_id" }] },
        "attributes": [{ "type": "Double", "field": "weight", "alias": "weight" }]
      },
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": { "catalog": "pg_data", "schema": "modern", "table": "created" },
        "id": { "fields": [{ "type": "String", "field": "id", "alias": "id" }] },
        "fromId": { "fields": [{ "type": "String", "field": "from_id", "alias": "from_id" }] },
        "toId": { "fields": [{ "type": "String", "field": "to_id", "alias": "to_id" }] },
        "attributes": [{ "type": "Double", "field": "weight", "alias": "weight" }]
      }
    ]
  }
}
```
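Schema mistakes usually surface only at upload time. A quick local sanity check like the one below (plain Python, not a PuppyGraph utility) catches dangling references before you POST; here it flags a deliberate typo in a `toVertex` value:

```python
def validate_schema(schema: dict) -> list[str]:
    """Return problems: edges or vertices pointing at undefined labels/catalogs."""
    problems = []
    catalogs = {c["name"] for c in schema.get("catalogs", [])}
    graph = schema.get("graph", {})
    labels = {v["label"] for v in graph.get("vertices", [])}
    for v in graph.get("vertices", []):
        cat = v["oneToOne"]["tableSource"]["catalog"]
        if cat not in catalogs:
            problems.append(f"vertex {v['label']}: unknown catalog {cat}")
    for e in graph.get("edges", []):
        for end in ("fromVertex", "toVertex"):
            if e[end] not in labels:
                problems.append(f"edge {e['label']}: {end} '{e[end]}' is not a defined vertex")
        if e["tableSource"]["catalog"] not in catalogs:
            problems.append(f"edge {e['label']}: unknown catalog {e['tableSource']['catalog']}")
    return problems

schema = {
    "catalogs": [{"name": "pg_data"}],
    "graph": {
        "vertices": [{"label": "person",
                      "oneToOne": {"tableSource": {"catalog": "pg_data"}}}],
        "edges": [{"label": "knows", "fromVertex": "person", "toVertex": "persn",
                   "tableSource": {"catalog": "pg_data"}}],
    },
}
print(validate_schema(schema))  # flags the 'persn' typo
```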
Start the Gremlin console inside the container:

```shell
docker exec -it puppygraph ./bin/puppygraph console
```

Then try a few traversals:

```groovy
// All of marko's properties
g.V().has("name", "marko").valueMap()

// Names of people marko knows
g.V().has("name", "marko").out("knows").values("name")

// Software created by people marko knows
g.V().has("name", "marko").out("knows").out("created").values("name")

// People older than 30
g.V().hasLabel("person").has("age", gt(30)).valueMap("name", "age")

// Full traversal paths, rendered by name
g.V().has("name", "marko").out("knows").out("created").path().by("name")
```
Switch to the Cypher console for the equivalent queries:

```
[PuppyGraph]> cypher-console
:> MATCH (p:person {name: 'marko'})-[:knows]->(friend)
   RETURN friend.name, friend.age
:> MATCH (p:person {name: 'marko'})-[:knows]->()-[:created]->(s:software)
   RETURN DISTINCT s.name
:> MATCH (p:person)-[:created]->(s:software)
   RETURN p.name, count(s) AS projects
   ORDER BY projects DESC
:> MATCH (p:person)-[k:knows]->(friend)
   WHERE k.weight > 0.5
   RETURN p.name, friend.name, k.weight
```
The MySQL catalog block differs only in `type`, `driverClass`, `jdbcUri`, and `driverUrl`. The graph section is identical to PostgreSQL.
```json
{
  "name": "mysql_data",
  "type": "mysql",
  "jdbc": {
    "username": "myuser",
    "password": "mypassword",
    "jdbcUri": "jdbc:mysql://my-db-host:3306/my_database",
    "driverClass": "com.mysql.cj.jdbc.Driver",
    "driverUrl": "https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar"
  }
}
```
Note: The `driverUrl` field tells PuppyGraph where to download the MySQL JDBC driver at startup. PostgreSQL's driver is bundled by default; MySQL's must be specified.
Swap this catalog block into the PostgreSQL schema.json, updating "catalog": "pg_data" references in each tableSource to "catalog": "mysql_data". All the same Gremlin and Cypher queries work without modification.
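That find-and-replace can be scripted. The helper below is illustrative (not a PuppyGraph utility): it swaps in the new catalog block and repoints every `tableSource` reference, leaving the original schema dict untouched:

```python
import copy

def retarget_catalog(schema: dict, new_catalog: dict, old_name: str) -> dict:
    """Replace the catalog block and repoint every tableSource at the new name."""
    out = copy.deepcopy(schema)
    out["catalogs"] = [new_catalog]
    for v in out["graph"]["vertices"]:
        ts = v["oneToOne"]["tableSource"]
        if ts["catalog"] == old_name:
            ts["catalog"] = new_catalog["name"]
    for e in out["graph"]["edges"]:
        if e["tableSource"]["catalog"] == old_name:
            e["tableSource"]["catalog"] = new_catalog["name"]
    return out

pg_schema = {
    "catalogs": [{"name": "pg_data", "type": "postgresql"}],
    "graph": {
        "vertices": [{"label": "person",
                      "oneToOne": {"tableSource": {"catalog": "pg_data", "table": "person"}}}],
        "edges": [{"label": "knows",
                   "tableSource": {"catalog": "pg_data", "table": "knows"}}],
    },
}
mysql_schema = retarget_catalog(
    pg_schema, {"name": "mysql_data", "type": "mysql"}, "pg_data")
print(mysql_schema["graph"]["edges"][0]["tableSource"]["catalog"])  # mysql_data
```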
Snowflake uses key-pair authentication rather than a plain password, which requires a small amount of upfront setup.
```shell
# Generate an encrypted private key
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8

# Derive the public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub
```
Open a SQL worksheet in Snowsight and run:
```sql
ALTER USER <your_snowflake_user>
SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';
-- paste the body of rsa_key.pub, excluding the -----BEGIN/END----- lines
```
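The ALTER USER statement wants only the base64 body of rsa_key.pub. A few lines of Python (pure string handling, nothing Snowflake-specific; the key material below is dummy data) extract it:

```python
def pem_body(pem_text: str) -> str:
    """Strip the -----BEGIN/END----- lines and join the base64 body."""
    lines = [line.strip() for line in pem_text.strip().splitlines()]
    return "".join(line for line in lines if not line.startswith("-----"))

# Dummy PEM content for illustration; use the real rsa_key.pub file
pub = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqh
AAAAB3NzaC1yc2E
-----END PUBLIC KEY-----"""
print(f"ALTER USER my_user SET RSA_PUBLIC_KEY = '{pem_body(pub)}';")
```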
```shell
docker run -p 8081:8081 -p 8182:8182 -p 7687:7687 \
  -e PUPPYGRAPH_USERNAME=puppygraph \
  -e PUPPYGRAPH_PASSWORD=puppygraph123 \
  -v /path/to/your/keys:/home \
  --name puppygraph \
  puppygraph/puppygraph:stable
```
The private key will be available inside the container at /home/rsa_key.p8.
The Snowflake catalog is best configured through the Schema Builder UI at http://localhost:8081. Click Create graph schema → Add more catalogs and fill in:
| Field | Value |
|---|---|
| Catalog type | Snowflake |
| Catalog name | Any label you choose |
| Auth type | Key-pair authentication |
| Server URL | <account_identifier>.snowflakecomputing.com |
| User name | Your Snowflake username |
| Database name | Your Snowflake database |
| Warehouse name | Your Snowflake warehouse |
| Private key file path | /home/rsa_key.p8 |
| Private key file password | The passphrase set during key generation |
Click Save → Submit.
The equivalent schema.json:

```json
{
  "catalogs": [
    {
      "name": "snowflake_data",
      "type": "snowflake",
      "snowflake": {
        "username": "my_sf_user",
        "serverUrl": "myorg-myaccount.snowflakecomputing.com",
        "database": "MY_DATABASE",
        "warehouse": "MY_WAREHOUSE",
        "privateKeyPath": "/home/rsa_key.p8",
        "privateKeyPassword": "my_passphrase"
      }
    }
  ],
  "graph": {
    "vertices": [
      {
        "label": "person",
        "oneToOne": {
          "tableSource": { "catalog": "snowflake_data", "schema": "MODERN", "table": "PERSON" },
          "id": { "fields": [{ "type": "String", "field": "ID", "alias": "id" }] },
          "attributes": [
            { "type": "String", "field": "NAME", "alias": "name" },
            { "type": "Int", "field": "AGE", "alias": "age" }
          ]
        }
      },
      {
        "label": "software",
        "oneToOne": {
          "tableSource": { "catalog": "snowflake_data", "schema": "MODERN", "table": "SOFTWARE" },
          "id": { "fields": [{ "type": "String", "field": "ID", "alias": "id" }] },
          "attributes": [
            { "type": "String", "field": "NAME", "alias": "name" },
            { "type": "String", "field": "LANG", "alias": "lang" }
          ]
        }
      }
    ],
    "edges": [
      {
        "label": "knows",
        "fromVertex": "person",
        "toVertex": "person",
        "tableSource": { "catalog": "snowflake_data", "schema": "MODERN", "table": "KNOWS" },
        "id": { "fields": [{ "type": "String", "field": "ID", "alias": "id" }] },
        "fromId": { "fields": [{ "type": "String", "field": "FROM_ID", "alias": "from_id" }] },
        "toId": { "fields": [{ "type": "String", "field": "TO_ID", "alias": "to_id" }] },
        "attributes": [{ "type": "Double", "field": "WEIGHT", "alias": "weight" }]
      },
      {
        "label": "created",
        "fromVertex": "person",
        "toVertex": "software",
        "tableSource": { "catalog": "snowflake_data", "schema": "MODERN", "table": "CREATED" },
        "id": { "fields": [{ "type": "String", "field": "ID", "alias": "id" }] },
        "fromId": { "fields": [{ "type": "String", "field": "FROM_ID", "alias": "from_id" }] },
        "toId": { "fields": [{ "type": "String", "field": "TO_ID", "alias": "to_id" }] },
        "attributes": [{ "type": "Double", "field": "WEIGHT", "alias": "weight" }]
      }
    ]
  }
}
```
Snowflake identifiers are uppercase by default. Field names in your schema JSON must match the case Snowflake uses -- usually all caps unless you created columns with quoted lowercase names.
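A common failure mode is copying a PostgreSQL schema and forgetting that case change. A small transform like this (illustrative, assuming the field names used in the examples above) uppercases the physical identifiers while leaving the graph-facing aliases lowercase:

```python
import copy

def uppercase_fields(mapping: dict) -> dict:
    """Uppercase table, schema, and column names for Snowflake; aliases stay as-is."""
    out = copy.deepcopy(mapping)

    def fix(node):
        if isinstance(node, dict):
            for key in ("field", "table", "schema"):
                if key in node and isinstance(node[key], str):
                    node[key] = node[key].upper()
            for value in node.values():
                fix(value)
        elif isinstance(node, list):
            for item in node:
                fix(item)

    fix(out)
    return out

vertex = {
    "label": "person",
    "oneToOne": {
        "tableSource": {"catalog": "snowflake_data", "schema": "modern", "table": "person"},
        "id": {"fields": [{"type": "String", "field": "id", "alias": "id"}]},
    },
}
print(uppercase_fields(vertex)["oneToOne"]["tableSource"]["table"])  # PERSON
```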
After uploading the schema, all the same Gremlin and Cypher queries from the PostgreSQL example work identically.
One of PuppyGraph's most powerful features is cross-database graph federation. A single schema can reference multiple catalogs, letting you traverse relationships that span different databases in one query.
```json
{
  "catalogs": [
    { "name": "pg_data", "type": "postgresql", "jdbc": { "jdbcUri": "jdbc:postgresql://pg-host:5432/crm", "...": "..." } },
    { "name": "sf_data", "type": "snowflake", "snowflake": { "serverUrl": "myorg-myaccount.snowflakecomputing.com", "...": "..." } }
  ],
  "graph": {
    "vertices": [
      { "label": "customer", "oneToOne": { "tableSource": { "catalog": "pg_data", "schema": "public", "table": "customers" }, "...": "..." } },
      { "label": "order", "oneToOne": { "tableSource": { "catalog": "sf_data", "schema": "SALES", "table": "ORDERS" }, "...": "..." } }
    ],
    "edges": [
      { "label": "placed", "fromVertex": "customer", "toVertex": "order",
        "tableSource": { "catalog": "pg_data", "schema": "public", "table": "customer_orders" }, "...": "..." }
    ]
  }
}
```
A single Gremlin or Cypher query can now traverse from a customer vertex in PostgreSQL to an order vertex in Snowflake in one hop -- no join logic needed in the application layer.
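To see the federation at a glance, you can map each label to the catalog that backs it with a plain dict walk over the schema (field names as in the examples above; this is a local inspection helper, not a PuppyGraph API):

```python
def catalog_map(schema: dict) -> dict:
    """Map each vertex/edge label to the catalog its tableSource reads from."""
    graph = schema["graph"]
    out = {}
    for v in graph.get("vertices", []):
        out[v["label"]] = v["oneToOne"]["tableSource"]["catalog"]
    for e in graph.get("edges", []):
        out[e["label"]] = e["tableSource"]["catalog"]
    return out

federated = {
    "graph": {
        "vertices": [
            {"label": "customer",
             "oneToOne": {"tableSource": {"catalog": "pg_data", "table": "customers"}}},
            {"label": "order",
             "oneToOne": {"tableSource": {"catalog": "sf_data", "table": "ORDERS"}}},
        ],
        "edges": [
            {"label": "placed",
             "tableSource": {"catalog": "pg_data", "table": "customer_orders"}}],
    },
}
print(catalog_map(federated))
# {'customer': 'pg_data', 'order': 'sf_data', 'placed': 'pg_data'}
```

More than one distinct catalog in the result means a single query will federate across databases.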
One-to-One (most common): One table row = one vertex or edge. All examples above use "oneToOne": { ... }.
Many-to-One: Multiple rows from different tables coalesce into a single vertex type. Useful for denormalized data where the same conceptual entity is spread across several tables.
Edge Tables: Always backed by a table with at minimum three columns -- an ID, a from_id, and a to_id. These reference the ID columns of the vertex tables. Any additional columns become traversable edge properties.
The web UI at http://localhost:8081 provides a visual Schema Builder, an interactive query console, and graph visualization of query results.
For Snowflake specifically, the Schema Builder's Auto Suggestion feature can inspect your table structure and propose the edge wiring automatically -- a significant time saver on large schemas.
Python -- Gremlin:
```python
from gremlin_python.driver import client, serializer

c = client.Client(
    "ws://localhost:8182/gremlin",
    "g",
    username="puppygraph",        # credentials from the docker run above
    password="puppygraph123",
    message_serializer=serializer.GraphSONSerializersV2d0()
)

result = c.submit(
    "g.V().has('name', 'marko').out('knows').values('name')"
).all().result()
print(result)  # ['vadas', 'josh']
```
Python -- openCypher (neo4j driver):
```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("puppygraph", "puppygraph123"))
with driver.session() as session:
    result = session.run(
        "MATCH (p:person {name: $name})-[:knows]->(friend) RETURN friend.name",
        name="marko"
    )
    for record in result:
        print(record["friend.name"])
```
| Factor | PuppyGraph | Neo4j / TigerGraph |
|---|---|---|
| Data location | Your existing store | Migrated to graph DB |
| ETL required | No | Yes |
| Setup time | ~10 minutes | Hours to days |
| Query languages | Gremlin + Cypher | Cypher / GSQL |
| Write operations | Read-only | Full read/write |
| Multi-source federation | Yes | No (single store) |
| Best for | Graph analytics on existing data | Transactional graph workloads requiring graph-native writes |
The primary trade-off is write semantics. PuppyGraph is a query engine -- you read through it, you write through your existing databases using their normal clients. If your use case is analytics on data that already lives in relational stores, PuppyGraph removes an entire category of operational overhead.
Fraud detection -- model accounts, transactions, devices, and IP addresses as a graph. Multi-hop traversal reveals rings and shared infrastructure that flat SQL can't surface efficiently.
Network & IT security -- graph traversal across asset, vulnerability, and connection tables identifies blast radius and lateral movement paths.
Supply chain analysis -- model suppliers, factories, distributors, and customers as connected nodes. Path queries expose single points of failure and find alternate routes.
Knowledge graphs for LLMs -- wire PuppyGraph's Gremlin endpoint into a RAG pipeline to give LLMs structured relationship traversal over your enterprise data.
Organizational hierarchies -- distributor downline trees, org charts, and referral networks traversed directly from your existing CRM or ERP tables.