Quick Start
CrabGraph is an embedded graph database. Add it as a dependency, point it at a SQL schema file, and you're traversing graphs in minutes — no separate server process, no config files, no CLI to install.
Add the dependency to your `pom.xml` — it starts an embedded Gremlin server automatically on initialization.

```xml
<dependency>
  <groupId>io.crabgraph</groupId>
  <artifactId>crabgraph-embedded</artifactId>
  <version>1.0.0</version>
</dependency>
```
Create a `schema.sql` file on the classpath with standard CREATE TABLE statements for vertices and edges, and CREATE VIEW for derived traversals.

```sql
-- Vertex: person
CREATE TABLE person (
    id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    age  INT
);

-- Edge: person → person
CREATE TABLE knows (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    from_id UUID REFERENCES person(id),
    to_id   UUID REFERENCES person(id),
    since   INT
);

-- Derived view: friends-of-friends
CREATE VIEW friends_of_friends AS
SELECT a.from_id, b.to_id
FROM knows a
JOIN knows b ON a.to_id = b.from_id;
```
Start the database with `CrabGraph.start()` — the schema is applied and the Gremlin traversal source is ready immediately.

```java
import io.crabgraph.CrabGraph;
import org.apache.tinkerpop.gremlin.process.traversal.P;

import java.util.List;
import java.util.Map;

var crab = CrabGraph.start(); // loads schema.sql from classpath
var g = crab.traversal();

g.addV("person").property("name", "Alice").property("age", 34).next();

List<Map<Object, Object>> people = g.V()
    .hasLabel("person")
    .has("age", P.gt(30))
    .valueMap("name", "age")
    .toList();
```
```shell
pip install crabgraph
```
Point `start()` at your `schema.sql`. CrabGraph applies it on startup — vertices, edges, and views are all ready.

```python
from crabgraph import CrabGraph
# P ships with the TinkerPop Python driver
from gremlin_python.process.traversal import P

crab = CrabGraph.start(schema="./schema.sql")
g = crab.traversal()

g.addV("person").property("name", "Alice").property("age", 34).next()

people = (g.V().hasLabel("person")
           .has("age", P.gt(30))
           .valueMap("name", "age")
           .toList())
```
Install via npm — the embedded server boots on the first call to `.start()`.

```shell
npm install crabgraph
```
```javascript
import { CrabGraph } from 'crabgraph';
import { P } from 'gremlin';

const crab = await CrabGraph.start({ schema: './schema.sql' });
const g = crab.traversal();

await g.addV('person').property('name', 'Alice').next();

const people = await g.V()
  .hasLabel('person')
  .has('age', P.gt(30))
  .valueMap('name', 'age')
  .toList();
```
```shell
go get io.crabgraph/crabgraph-go@v1.0.0
```
```go
package main

import (
	"context"
	"log"

	crab "io.crabgraph/crabgraph-go"
)

func main() {
	ctx := context.Background()

	db, err := crab.Start(ctx, crab.Options{SchemaPath: "./schema.sql"})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	g := db.Traversal()
	g.AddV("person").Property("name", "Alice").Next(ctx)
}
```
Add the crate to your `Cargo.toml` — no `build.rs` needed. The embedded server starts when you call `CrabGraph::start()`.

```toml
[dependencies]
crabgraph = "1.0"
```
The crate is async-first on `tokio`. The schema path is resolved relative to the crate root at compile time via `include_str!`, or passed at runtime.

```rust
use crabgraph::{CrabGraph, P};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Schema embedded at compile time
    let schema = include_str!("../schema.sql");

    let crab = CrabGraph::builder().schema_str(schema).start().await?;
    let g = crab.traversal();

    g.add_v("person")
        .property("name", "Alice")
        .property("age", 34_i32)
        .next()
        .await?;

    let people = g.v()
        .has_label("person")
        .has("age", P::gt(30))
        .value_map("name", "age")
        .to_list()
        .await?;

    println!("{people:?}");
    Ok(())
}
```
Zero config. CrabGraph discovers schema.sql on the classpath (Java) or at the path you pass. No XML, no YAML, no environment variables needed to get started.
Installation
CrabGraph is distributed exclusively through language-native package managers. There is no standalone binary or CLI to install.
| Language | Package Manager | Package |
|---|---|---|
| Java / Kotlin | Maven, Gradle | io.crabgraph:crabgraph-embedded:1.0.0 |
| Python ≥ 3.9 | pip, Poetry, uv | crabgraph==1.0.0 |
| Node.js ≥ 18 | npm, yarn, pnpm | crabgraph@1.0.0 |
| Go ≥ 1.21 | go get | io.crabgraph/crabgraph-go v1.0.0 |
| Rust ≥ 1.75 | Cargo | crabgraph = "1.0" |
Gradle (Kotlin DSL)
```kotlin
dependencies {
    implementation("io.crabgraph:crabgraph-embedded:1.0.0")
}
```
Define a Schema
CrabGraph uses standard SQL DDL in a single schema.sql file. No proprietary schema language to learn. Tables become graph labels; views become computed traversals.
Vertices
Any table with a UUID PRIMARY KEY is a vertex label. Each row is a vertex; columns are its properties.
```sql
CREATE TABLE movie (
    id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title  VARCHAR(500) NOT NULL,
    year   INT,
    rating FLOAT
);
```
Edges
A table with from_id and to_id foreign-key columns becomes an edge label. CrabGraph detects this convention automatically — no annotation needed. Any other columns are edge properties.
```sql
CREATE TABLE acted_in (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    from_id UUID REFERENCES person(id) ON DELETE CASCADE,
    to_id   UUID REFERENCES movie(id) ON DELETE CASCADE,
    role    VARCHAR(255)  -- edge property
);
```
Derived Views
Use CREATE VIEW … AS SELECT to define computed edge types. Views follow the same from_id / to_id convention and are traversable like any other edge label.
```sql
-- People who appeared in the same movie
CREATE VIEW co_star AS
SELECT a.from_id AS from_id,
       b.from_id AS to_id,
       m.title   AS movie_title
FROM acted_in a
JOIN acted_in b
  ON a.to_id = b.to_id
 AND a.from_id <> b.from_id
JOIN movie m ON m.id = a.to_id;
```
Views are read-only. You cannot addV or addE on a label backed by a CREATE VIEW. Writes must go through the underlying tables.
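For instance, to create a new `co_star` relationship you write to the underlying `acted_in` table instead, and the view reflects the change on the next read. A sketch against the movie schema above (the specific title and role values are illustrative):

```java
// This would throw: co_star is backed by a CREATE VIEW and is read-only
// g.addE("co_star").from(alice).to(bob).next();

// Instead, insert the underlying acted_in edges; co_star picks them up
var alice = g.V().has("person", "name", "Alice").next();
var movie = g.V().has("movie", "title", "Big").next();

g.addE("acted_in")
    .from(alice).to(movie)
    .property("role", "Josh") // regular edge property on the real table
    .next();
```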
Your First Query
CrabGraph supports the full Gremlin traversal language via standard Apache TinkerPop drivers. If you've used Gremlin before, everything works identically — CrabGraph just maps it to your SQL-defined schema.
```java
// Movies an actor appeared in
g.V().has("person", "name", "Tom Hanks")
    .out("acted_in")
    .values("title")
    .toList();

// Co-stars via the derived view
g.V().has("person", "name", "Tom Hanks")
    .out("co_star")
    .values("name")
    .dedup()
    .toList();

// Shortest path between two people
g.V().has("person", "name", "Alice")
    .repeat(__.bothE().otherV().simplePath())
    .until(__.has("name", "Bob"))
    .path()
    .limit(1)
    .next();
```
New to Gremlin? See the Gremlin Primer for the most useful steps, or the TinkerPop reference for the full spec.
Embedded Server
When you call CrabGraph.start(), a Gremlin-compatible WebSocket server boots inside your process on 127.0.0.1:8182. Nothing to install separately — the server and storage engine are bundled in the dependency.
Lifecycle
```java
// Minimal start — schema.sql loaded from classpath root
var crab = CrabGraph.start();

// Builder — full options
var crab = CrabGraph.builder()
    .port(9182)
    .dataDir(Path.of("/var/myapp/graph"))
    .schemaResource("db/schema.sql")
    .start();

// Graceful shutdown
crab.close();
```
Storage modes
| Mode | How to enable | Use for |
|---|---|---|
| In-memory | Default (no dataDir) | Tests, development — clean slate every start |
| Persistent | Set dataDir | Production — WAL-backed, survives restarts |
Testing tip. Omit dataDir in tests — in-memory mode gives each test a clean graph with no teardown required.
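A minimal JUnit 5 sketch of that pattern — each test gets a fresh in-memory instance, and `close()` is the only teardown (only `start()`, `traversal()`, and `close()` come from the documented API; the rest is standard JUnit):

```java
import io.crabgraph.CrabGraph;
import org.junit.jupiter.api.*;

class GraphTest {

    CrabGraph crab;

    @BeforeEach
    void setUp() {
        crab = CrabGraph.start(); // no dataDir → in-memory, empty graph
    }

    @AfterEach
    void tearDown() {
        crab.close(); // nothing persisted, no files to clean up
    }

    @Test
    void addsAVertex() {
        var g = crab.traversal();
        g.addV("person").property("name", "Alice").next();
        Assertions.assertEquals(1L, g.V().hasLabel("person").count().next());
    }
}
```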
Schema Reference
CrabGraph interprets your DDL according to a small set of conventions. No annotations, no special comments — just column names and foreign key relationships.
Type mapping
| SQL type | Gremlin property type | Notes |
|---|---|---|
| UUID | String | Auto-generated when column has DEFAULT gen_random_uuid() |
| VARCHAR, TEXT | String | |
| INT, BIGINT, SMALLINT | Long | |
| FLOAT, DOUBLE, NUMERIC | Double | |
| BOOLEAN | Boolean | |
| TIMESTAMP, DATE | Date (ISO-8601 string in Gremlin) | |
| JSONB | Map<String, Object> | Traversable as nested properties |
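In practice the mapping means numeric columns widen on the way out — for example, the INT `age` column from the Quick Start schema surfaces as a `Long` in traversal results. A sketch, assuming the table above:

```java
// age is declared INT in schema.sql, but arrives as Long per the mapping
Long age = (Long) g.V().has("person", "name", "Alice").values("age").next();

// Predicates compare against the mapped type, so plain ints still work
g.V().has("person", "age", P.gt(30)).count().next();
```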
Edge detection rules
CrabGraph marks a table as an edge type when all three conditions hold:
- The table has a column named `from_id` with a foreign key to another table's primary key.
- The table has a column named `to_id` with a foreign key (to the same or a different table).
- The `from_id` and `to_id` columns reference `UUID PRIMARY KEY` columns.
All other columns on an edge table are treated as edge properties and are accessible via `.values()` and `.valueMap()` in traversals.
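For example, with the `acted_in` edge table defined earlier, the `role` column is readable directly off the edges (a sketch using the Quick Start schema):

```java
// Step onto the edges themselves with outE(), then read the role property
g.V().has("person", "name", "Tom Hanks")
    .outE("acted_in")
    .valueMap("role")
    .toList();
```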
Indices
Standard SQL indices are respected by CrabGraph's query planner. Add a CREATE INDEX on any property column you filter on frequently:
```sql
CREATE INDEX idx_person_name ON person(name);
CREATE INDEX idx_person_age  ON person(age);
```
Gremlin Primer
Gremlin is a functional, data-flow traversal language. A traversal starts at a set of elements and threads through a pipeline of steps. Each step transforms the current traversers.
Starting a traversal
```java
g.V()                                  // all vertices
g.V("some-uuid")                       // vertex by ID
g.E()                                  // all edges
g.V().hasLabel("person")               // vertices of a specific type
g.V().has("person", "name", "Alice")   // filter by property
```
Step reference
| Step | Description |
|---|---|
| `has(key, value)` | Filters by property value; accepts `P.*` predicates |
| `valueMap(keys…)` | Projects properties into a `Map`. Good for final projection |
| `project(keys…)` | Alternative to `valueMap` for complex projections |
| `select(label)` | Refers back to a step named with `.as()` |
| `repeat(t).until(c)` | Repeats traversal t until condition c is met. Use `.times(n)` for fixed depth |
| `path()` | Returns the full `Path` object for each traverser |
| `group().by(key)` | Returns a `Map` grouped by a key |
| `order().by(key)` | Sorts results; pass `Order.desc` for descending |
| `limit(n)` | Keeps only the first n traversers. Always prefer over an unbounded `toList()` |

Predicate reference (P)
| Predicate | Meaning |
|---|---|
| P.eq(x) | Equal to x |
| P.neq(x) | Not equal |
| P.gt(x) / P.lt(x) | Greater / less than |
| P.gte(x) / P.lte(x) | Greater or equal / less or equal |
| P.between(lo, hi) | lo ≤ value < hi |
| P.within(x, y, …) | Value is one of the listed options |
| P.without(x, y, …) | Value is none of the listed options |
| TextP.containing(s) | String contains s |
| TextP.startingWith(s) | String starts with s |
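Predicates compose with `has` like any other filter. A couple of combinations against the movie schema from the Quick Start:

```java
// 1990s movies whose title contains "Story"
g.V().hasLabel("movie")
    .has("year", P.between(1990, 2000))      // 1990 ≤ year < 2000
    .has("title", TextP.containing("Story"))
    .values("title")
    .toList();

// People whose name is one of a fixed set
g.V().has("person", "name", P.within("Alice", "Bob"))
    .valueMap("name", "age")
    .toList();
```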
Traversal Patterns
Common graph query patterns expressed in Gremlin, using the movie graph schema from the Quick Start.
Neighbourhood queries
```java
// Direct neighbours
g.V().has("person", "name", "Alice").out("knows").values("name")

// N-hop expansion (BFS up to 3 hops)
g.V().has("person", "name", "Alice")
    .repeat(__.out("knows").simplePath())
    .times(3)
    .dedup()
    .values("name")
```
Shortest path
```java
g.V().has("person", "name", "Alice")
    .repeat(__.bothE().otherV().simplePath())
    .until(__.has("person", "name", "Bob"))
    .path()
    .limit(1)
    .next()
```
Aggregation and grouping
```java
// Movies grouped by year, sorted descending
g.V().hasLabel("movie")
    .group()
    .by("year")
    .by(__.values("title").fold())
    .order(Scope.local).by(Column.keys, Order.desc)
    .next()

// Actors with the most credits
g.V().hasLabel("person")
    .project("name", "credits")
    .by("name")
    .by(__.out("acted_in").count())
    .order().by(__.select("credits"), Order.desc)
    .limit(10)
    .toList()
```
Filtering with where
```java
// People who know someone older than them
g.V().hasLabel("person").as("a")
    .out("knows").as("b")
    .where("a", P.lt("b")).by("age")
    .select("a").values("name")
    .toList()
```
Java / Kotlin
The Java SDK is the reference implementation. It supports both Maven and Gradle, runs on JVM 11+, and integrates with Spring Boot via an autoconfiguration module.
Spring Boot autoconfiguration
```xml
<dependency>
  <groupId>io.crabgraph</groupId>
  <artifactId>crabgraph-spring-boot-starter</artifactId>
  <version>1.0.0</version>
</dependency>
```
```yaml
crabgraph:
  schema: classpath:schema.sql
  data-dir: /var/myapp/graph   # omit for in-memory
  port: 8182
```
```java
@Service
public class PersonService {

    private final GraphTraversalSource g;

    public PersonService(CrabGraph crab) {
        this.g = crab.traversal();
    }

    public List<String> friendNames(String name) {
        return g.V().has("person", "name", name)
            .out("knows")
            .<String>values("name")
            .toList();
    }
}
```
Kotlin
```kotlin
val crab = CrabGraph.start()
val g = crab.traversal()

val friends: List<String> = g.V()
    .has("person", "name", "Alice")
    .out("knows")
    .values<String>("name")
    .toList()
```
Python
The Python package wraps the native CrabGraph binary and exposes a synchronous API that mirrors the TinkerPop Python driver. An asyncio variant is available via crabgraph[async].
Async usage
```shell
pip install "crabgraph[async]"
```
```python
import asyncio

from crabgraph import CrabGraph


async def main():
    crab = await CrabGraph.start(schema="./schema.sql")
    g = crab.traversal()

    await g.addV("person").property("name", "Alice").next()

    friends = await (g.V().has("person", "name", "Alice")
                     .out("knows")
                     .valueMap("name")
                     .toList())
    print(friends)


asyncio.run(main())
```
Using with Django / Flask
Create the CrabGraph instance once at application startup and reuse it across requests. The embedded server is thread-safe.
```python
from crabgraph import CrabGraph
from flask import Flask, jsonify

crab = CrabGraph.start(schema="schema.sql", data_dir="/var/graph")
g = crab.traversal()

app = Flask(__name__)


@app.route("/people")
def people():
    names = g.V().hasLabel("person").values("name").toList()
    return jsonify(names)
```
Node.js
Full TypeScript types are included. The package ships with native binaries for Linux (x64, arm64), macOS (x64, arm64), and Windows (x64) via optional dependencies — no build step required.
TypeScript types
```typescript
import { CrabGraph, GraphTraversalSource } from 'crabgraph';

let g: GraphTraversalSource;

export async function initGraph(): Promise<void> {
  const crab = await CrabGraph.start({
    schema: './schema.sql',
    dataDir: process.env.GRAPH_DIR,
  });
  g = crab.traversal();
}

export async function getFriends(name: string): Promise<string[]> {
  return g.V().has('person', 'name', name)
    .out('knows').values('name')
    .toList() as Promise<string[]>;
}
```
Using with Express
```javascript
import express from 'express';
import { CrabGraph } from 'crabgraph';

const app = express();

const crab = await CrabGraph.start({ schema: './schema.sql' });
const g = crab.traversal();

app.get('/people', async (req, res) => {
  const people = await g.V().hasLabel('person').valueMap(true).toList();
  res.json(people);
});

app.listen(3000);
```
Go
The Go SDK uses CGo to link the embedded library. The API is idiomatic Go — struct options, error returns, context propagation. Requires CGo-enabled builds (CGO_ENABLED=1).
```go
package graph

import (
	"context"
	"fmt"

	crab "io.crabgraph/crabgraph-go"
)

func Example(ctx context.Context) error {
	db, err := crab.Start(ctx, crab.Options{
		SchemaPath: "./schema.sql",
		DataDir:    "/var/graph", // omit for in-memory
	})
	if err != nil {
		return err
	}
	defer db.Close()

	g := db.Traversal()

	_, err = g.AddV("person").
		Property("name", "Alice").
		Property("age", 34).
		Next(ctx)
	if err != nil {
		return err
	}

	results, err := g.V().
		HasLabel("person").
		Values("name").
		ToList(ctx)
	fmt.Println(results)
	return err
}
```
Rust
The Rust crate is async-first, built on tokio. The schema can be embedded at compile time with include_str! for zero-runtime-dependency deployments, or loaded from a file path at startup.
Cargo features
| Feature | Default | Description |
|---|---|---|
| tokio | ✓ | Async runtime (tokio 1.x) |
| serde | ✓ | Serialize/deserialize traversal results |
| persistent | | Enable WAL-backed persistent storage |
| metrics | | Expose Prometheus metrics on :9090/metrics |
```toml
[dependencies]
crabgraph = { version = "1.0", features = ["persistent", "serde"] }
```
Schema embedded at compile time
```rust
use crabgraph::{CrabGraph, P};
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Person {
    name: String,
    age: i32,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let crab = CrabGraph::builder()
        .schema_str(include_str!("../schema.sql"))
        .data_dir("/var/graph")
        .start()
        .await?;

    let g = crab.traversal();

    // Typed result deserialization
    let people: Vec<Person> = g.v()
        .has_label("person")
        .has("age", P::gt(30))
        .value_map("name", "age")
        .into_vec::<Person>()
        .await?;

    println!("{people:#?}");
    Ok(())
}
```
The Rust crate requires a C linker. On Linux, install gcc. On macOS, Xcode Command Line Tools are sufficient.
CrabGraph Cloud
The embedded server is everything you need to ship. When you're ready to scale, CrabGraph Cloud is a fully managed hosted graph with zero code changes — same API, same schema format.
Skip the ops. Keep the Gremlin.
Swap your connection string and your embedded DB becomes a managed cluster — backups, replication, and monitoring included.
Migrate from Embedded
No code changes required. Update your connection config to point at your cloud endpoint — the traversal API is identical.
```java
// Before — embedded
var crab = CrabGraph.start();

// After — cloud (same traversal API)
var crab = CrabGraph.cloud()
    .endpoint("wss://my-cluster.crabgraph.io/gremlin")
    .apiKey(System.getenv("CRAB_API_KEY"))
    .connect();

var g = crab.traversal(); // identical from here
```
Data export
Export your embedded graph to a portable format and import it into Cloud with two SDK calls:
```java
// Export from embedded
var embedded = CrabGraph.start();
embedded.export(Path.of("export.graphson"));

// Import into Cloud
var cloud = CrabGraph.cloud().endpoint("wss://...").connect();
cloud.importFrom(Path.of("export.graphson"));
```
Configuration
All options can be passed via the builder API or, for Java, via application.yml when using the Spring Boot starter.
| Option | Type | Default | Description |
|---|---|---|---|
| schema | String / Path | schema.sql (classpath) | Path or classpath resource to DDL schema file |
| dataDir | Path | none (in-memory) | Directory for persistent storage. Creates if absent. |
| port | int | 8182 | Gremlin WebSocket server port (loopback only) |
| queryTimeout | Duration | 30s | Max time for a single traversal before cancellation |
| maxConnections | int | 16 | Max concurrent Gremlin connections |
| cacheSize | long (bytes) | 256 MB | In-memory query result cache. Set to 0 to disable. |
| logLevel | String | WARN | DEBUG, INFO, WARN, ERROR |
| metricsPort | int | none | If set, exposes Prometheus metrics on this port |
```java
CrabGraph.builder()
    .schemaResource("db/schema.sql")
    .dataDir(Path.of("/var/myapp/graph"))
    .port(8182)
    .queryTimeout(Duration.ofSeconds(60))
    .cacheSize(512 * 1024 * 1024) // 512 MB
    .logLevel("INFO")
    .metricsPort(9090)
    .start();
```
API Reference
Core methods available on the CrabGraph instance across all language SDKs.
| Method | Returns | Description |
|---|---|---|
| CrabGraph.start() | CrabGraph | Start with defaults. Schema loaded from classpath schema.sql. |
| CrabGraph.builder() | Builder | Fluent builder for all options before starting. |
| CrabGraph.cloud() | CloudBuilder | Connect to a CrabGraph Cloud endpoint instead of starting locally. |
| .traversal() | GraphTraversalSource | Returns the Gremlin traversal source g. |
| .export(path) | void | Export the entire graph to GraphSON 3.0 format. |
| .importFrom(path) | void | Import a GraphSON file, merging into existing data. |
| .schema() | Schema | Inspect the loaded schema — vertex labels, edge labels, properties. |
| .metrics() | GraphMetrics | Query counts, cache hit rate, active connections. |
| .close() | void | Gracefully shut down the embedded server and flush writes. |
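Putting a few of these together — a sketch only: the table above pins down the top-level methods, but the accessors on the returned `Schema` and `GraphMetrics` objects are not specified here, so the printlns below just rely on their string representations.

```java
var crab = CrabGraph.start();

// Inspect what the DDL produced — vertex labels, edge labels, properties
var schema = crab.schema();
System.out.println(schema);

// Snapshot runtime metrics (query counts, cache hit rate, connections)
var metrics = crab.metrics();
System.out.println(metrics);

// Export a backup, then shut down cleanly, flushing writes
crab.export(Path.of("backup.graphson")); // GraphSON 3.0
crab.close();
```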
GraphTraversalSource (g)
The traversal source g is a standard TinkerPop GraphTraversalSource. All standard Gremlin steps are supported. See the TinkerPop docs for the complete step library.
Changelog
Initial release
- Embedded Gremlin-compatible graph server in a single dependency
- SQL DDL schema definition — `CREATE TABLE` for vertices and edges, `CREATE VIEW` for derived traversals
- Persistent (WAL-backed) and in-memory storage modes
- SDKs for Java, Kotlin, Python, Node.js, Go, and Rust
- Spring Boot autoconfiguration starter for Java
- Full TinkerPop 3.7 step compatibility
- Prometheus metrics endpoint (opt-in)
Public beta
- Added `CREATE VIEW` support for derived edge types
- JSONB column support — nested properties traversable with `.has()`
- Query timeout and connection pool configuration
- Improved error messages for schema parse failures
Private beta
- Initial Java, Python, and Node.js SDKs
- In-memory mode for testing
- Core Gremlin step support: `V`, `E`, `out`, `in`, `has`, `repeat`, `path`, `project`