Quick Start
CrabGraph is an embedded graph database. Add it as a dependency, point it at a SQL schema file, and you're traversing graphs in minutes — no separate server process, no config files, no CLI to install.
Add the dependency to your `pom.xml` — it starts an embedded Gremlin server automatically on initialization.

```xml
<dependency>
  <groupId>io.crabgraph</groupId>
  <artifactId>crabgraph-embedded</artifactId>
  <version>1.0.0</version>
</dependency>
```
Create a `schema.sql` file on the classpath with standard CREATE TABLE statements for vertices and edges, and CREATE VIEW for derived traversals.

```sql
-- Vertex: person
CREATE TABLE person (
    id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    age  INT
);

-- Edge: person → person
CREATE TABLE knows (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    from_id UUID REFERENCES person(id),
    to_id   UUID REFERENCES person(id),
    since   INT
);

-- Derived view: friends-of-friends
CREATE VIEW friends_of_friends AS
SELECT a.from_id, b.to_id
FROM knows a
JOIN knows b ON a.to_id = b.from_id;
```
Start the database with `CrabGraph.start()` — the schema is applied and the Gremlin traversal source is ready immediately.

```java
import io.crabgraph.CrabGraph;
import org.apache.tinkerpop.gremlin.process.traversal.P;

import java.util.List;
import java.util.Map;

var crab = CrabGraph.start(); // loads schema.sql from classpath
var g = crab.traversal();

g.addV("person").property("name", "Alice").property("age", 34).next();

List<Map<Object, Object>> people = g.V()
    .hasLabel("person")
    .has("age", P.gt(30))
    .valueMap("name", "age")
    .toList();
```
```shell
pip install crabgraph
```
Point `start()` at your `schema.sql`. CrabGraph applies it on startup — vertices, edges, and views are all ready.

```python
from crabgraph import CrabGraph
# P ships with the TinkerPop Python driver
from gremlin_python.process.traversal import P

crab = CrabGraph.start(schema="./schema.sql")
g = crab.traversal()

g.addV("person").property("name", "Alice").property("age", 34).next()

people = (g.V().hasLabel("person")
           .has("age", P.gt(30))
           .valueMap("name", "age")
           .toList())
```
Install via npm — the embedded server boots on the first call to `.start()`.

```shell
npm install crabgraph
```
```javascript
import { CrabGraph } from 'crabgraph';
import { P } from 'gremlin';

const crab = await CrabGraph.start({ schema: './schema.sql' });
const g = crab.traversal();

await g.addV('person').property('name', 'Alice').next();

const people = await g.V()
  .hasLabel('person')
  .has('age', P.gt(30))
  .valueMap('name', 'age')
  .toList();
```
```shell
go get io.crabgraph/crabgraph-go@v1.0.0
```
```go
package main

import (
	"context"
	"log"

	crab "io.crabgraph/crabgraph-go"
)

func main() {
	ctx := context.Background()

	db, err := crab.Start(ctx, crab.Options{SchemaPath: "./schema.sql"})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	g := db.Traversal()
	g.AddV("person").Property("name", "Alice").Next(ctx)
}
```
Add the crate to your `Cargo.toml` — no `build.rs` needed. The embedded server starts when you call `CrabGraph::start()`.

```toml
[dependencies]
crabgraph = "1.0"
```
The crate is async-first on `tokio`. The schema path is resolved relative to the crate root at compile time via `include_str!`, or passed at runtime.

```rust
use crabgraph::{CrabGraph, P};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Schema embedded at compile time
    let schema = include_str!("../schema.sql");

    let crab = CrabGraph::builder().schema_str(schema).start().await?;
    let g = crab.traversal();

    g.add_v("person")
        .property("name", "Alice")
        .property("age", 34_i32)
        .next()
        .await?;

    let people = g.v()
        .has_label("person")
        .has("age", P::gt(30))
        .value_map("name", "age")
        .to_list()
        .await?;

    println!("{people:?}");
    Ok(())
}
```
Zero config. CrabGraph discovers schema.sql on the classpath (Java) or at the path you pass. No XML, no YAML, no environment variables needed to get started.
Installation
CrabGraph is distributed exclusively through language-native package managers. There is no standalone binary or CLI to install.
| Language | Package Manager | Package |
|---|---|---|
| Java / Kotlin | Maven, Gradle | io.crabgraph:crabgraph-embedded:1.0.0 |
| Python ≥ 3.9 | pip, Poetry, uv | crabgraph==1.0.0 |
| Node.js ≥ 18 | npm, yarn, pnpm | crabgraph@1.0.0 |
| Go ≥ 1.21 | go get | io.crabgraph/crabgraph-go v1.0.0 |
| Rust ≥ 1.75 | Cargo | crabgraph = "1.0" |
Gradle (Kotlin DSL)
```kotlin
dependencies {
    implementation("io.crabgraph:crabgraph-embedded:1.0.0")
}
```
Define a Schema
CrabGraph uses standard SQL DDL in a single schema.sql file. No proprietary schema language to learn. Tables become graph labels; views become computed traversals.
Vertices
Any table with a UUID PRIMARY KEY is a vertex label. Each row is a vertex; columns are its properties.
```sql
CREATE TABLE movie (
    id     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title  VARCHAR(500) NOT NULL,
    year   INT,
    rating FLOAT
);
```
Edges
A table with from_id and to_id foreign-key columns becomes an edge label. CrabGraph detects this convention automatically — no annotation needed. Any other columns are edge properties.
```sql
CREATE TABLE acted_in (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    from_id UUID REFERENCES person(id) ON DELETE CASCADE,
    to_id   UUID REFERENCES movie(id) ON DELETE CASCADE,
    role    VARCHAR(255)  -- edge property
);
```
Derived Views
Use CREATE VIEW … AS SELECT to define computed edge types. Views follow the same from_id / to_id convention and are traversable like any other edge label.
```sql
-- People who appeared in the same movie
CREATE VIEW co_star AS
SELECT a.from_id AS from_id,
       b.from_id AS to_id,
       m.title   AS movie_title
FROM acted_in a
JOIN acted_in b
  ON a.to_id = b.to_id
 AND a.from_id <> b.from_id
JOIN movie m ON m.id = a.to_id;
```
Views are read-only. You cannot addV or addE on a label backed by a CREATE VIEW. Writes must go through the underlying tables.
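For instance, to create a new `co_star` relationship you write to the underlying `acted_in` table instead, and the view reflects the change on the next read. A sketch against the movie schema above (the specific title and role values are illustrative):

```java
// This would throw: co_star is backed by a CREATE VIEW and is read-only
// g.addE("co_star").from(alice).to(bob).next();

// Instead, insert the underlying acted_in edges; co_star picks them up
var alice = g.V().has("person", "name", "Alice").next();
var movie = g.V().has("movie", "title", "Big").next();

g.addE("acted_in")
    .from(alice).to(movie)
    .property("role", "Josh") // regular edge property on the real table
    .next();
```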
Your First Query
CrabGraph supports the full Gremlin traversal language via standard Apache TinkerPop drivers. If you've used Gremlin before, everything works identically — CrabGraph just maps it to your SQL-defined schema.
```java
// Movies an actor appeared in
g.V().has("person", "name", "Tom Hanks")
    .out("acted_in")
    .values("title")
    .toList();

// Co-stars via the derived view
g.V().has("person", "name", "Tom Hanks")
    .out("co_star")
    .values("name")
    .dedup()
    .toList();

// Shortest path between two people
g.V().has("person", "name", "Alice")
    .repeat(__.bothE().otherV().simplePath())
    .until(__.has("name", "Bob"))
    .path()
    .limit(1)
    .next();
```
New to Gremlin? See the Gremlin Primer for the most useful steps, or the TinkerPop reference for the full spec.
Embedded Server
When you call CrabGraph.start(), a Gremlin-compatible WebSocket server boots inside your process on 127.0.0.1:8182. Nothing to install separately — the server and storage engine are bundled in the dependency.
Lifecycle
```java
// Minimal start — schema.sql loaded from classpath root
var crab = CrabGraph.start();

// Builder — full options
var crab = CrabGraph.builder()
    .port(9182)
    .dataDir(Path.of("/var/myapp/graph"))
    .schemaResource("db/schema.sql")
    .start();

// Graceful shutdown
crab.close();
```
Storage modes
| Mode | How to enable | Use for |
|---|---|---|
| In-memory | Default (no dataDir) | Tests, development — clean slate every start |
| Persistent | Set dataDir | Production — WAL-backed, survives restarts |
Testing tip. Omit dataDir in tests — in-memory mode gives each test a clean graph with no teardown required.
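A minimal JUnit 5 sketch of that pattern — each test gets a fresh in-memory instance, and `close()` is the only teardown (only `start()`, `traversal()`, and `close()` come from the documented API; the rest is standard JUnit):

```java
import io.crabgraph.CrabGraph;
import org.junit.jupiter.api.*;

class GraphTest {

    CrabGraph crab;

    @BeforeEach
    void setUp() {
        crab = CrabGraph.start(); // no dataDir → in-memory, empty graph
    }

    @AfterEach
    void tearDown() {
        crab.close(); // nothing persisted, no files to clean up
    }

    @Test
    void addsAVertex() {
        var g = crab.traversal();
        g.addV("person").property("name", "Alice").next();
        Assertions.assertEquals(1L, g.V().hasLabel("person").count().next());
    }
}
```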
Schema Reference
CrabGraph interprets your DDL according to a small set of conventions. No annotations, no special comments — just column names and foreign key relationships.
Type mapping
| SQL type | Gremlin property type | Notes |
|---|---|---|
| UUID | String | Auto-generated when column has DEFAULT gen_random_uuid() |
| VARCHAR, TEXT | String | |
| INT, BIGINT, SMALLINT | Long | |
| FLOAT, DOUBLE, NUMERIC | Double | |
| BOOLEAN | Boolean | |
| TIMESTAMP, DATE | Date (ISO-8601 string in Gremlin) | |
| JSONB | Map<String, Object> | Traversable as nested properties |
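In practice the mapping means numeric columns widen on the way out — for example, the INT `age` column from the Quick Start schema surfaces as a `Long` in traversal results. A sketch, assuming the table above:

```java
// age is declared INT in schema.sql, but arrives as Long per the mapping
Long age = (Long) g.V().has("person", "name", "Alice").values("age").next();

// Predicates compare against the mapped type, so plain ints still work
g.V().has("person", "age", P.gt(30)).count().next();
```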
Edge detection rules
CrabGraph marks a table as an edge type when all three conditions hold:
- The table has a column named `from_id` with a foreign key to another table's primary key.
- The table has a column named `to_id` with a foreign key (to the same or a different table).
- The `from_id` and `to_id` columns reference `UUID PRIMARY KEY` columns.
All other columns on an edge table are treated as edge properties and are accessible via `.values()` and `.valueMap()` in traversals.
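For example, with the `acted_in` edge table defined earlier, the `role` column is readable directly off the edges (a sketch using the Quick Start schema):

```java
// Step onto the edges themselves with outE(), then read the role property
g.V().has("person", "name", "Tom Hanks")
    .outE("acted_in")
    .valueMap("role")
    .toList();
```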
Indices
Standard SQL indices are respected by CrabGraph's query planner. Add a CREATE INDEX on any property column you filter on frequently:
```sql
CREATE INDEX idx_person_name ON person(name);
CREATE INDEX idx_person_age  ON person(age);
```
Gremlin Primer
Gremlin is a functional, data-flow traversal language. A traversal starts at a set of elements and threads through a pipeline of steps. Each step transforms the current traversers.
Starting a traversal
```java
g.V()                                  // all vertices
g.V("some-uuid")                       // vertex by ID
g.E()                                  // all edges
g.V().hasLabel("person")               // vertices of a specific type
g.V().has("person", "name", "Alice")   // filter by property
```
Step reference
| Step | Description |
|---|---|
| `has(key, value)` | Filters by property value; accepts `P.*` predicates |
| `valueMap(keys…)` | Projects properties into a `Map`. Good for final projection |
| `project(keys…)` | Alternative to `valueMap` for complex projections |
| `select(label)` | Refers back to a step named with `.as()` |
| `repeat(t).until(c)` | Repeats traversal t until condition c is met. Use `.times(n)` for fixed depth |
| `path()` | Returns the full `Path` object for each traverser |
| `group().by(key)` | Returns a `Map` grouped by a key |
| `order().by(key)` | Sorts results; pass `Order.desc` for descending |
| `limit(n)` | Keeps only the first n traversers. Always prefer over an unbounded `toList()` |

Predicate reference (P)
| Predicate | Meaning |
|---|---|
| P.eq(x) | Equal to x |
| P.neq(x) | Not equal |
| P.gt(x) / P.lt(x) | Greater / less than |
| P.gte(x) / P.lte(x) | Greater or equal / less or equal |
| P.between(lo, hi) | lo ≤ value < hi |
| P.within(x, y, …) | Value is one of the listed options |
| P.without(x, y, …) | Value is none of the listed options |
| TextP.containing(s) | String contains s |
| TextP.startingWith(s) | String starts with s |
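Predicates compose with `has` like any other filter. A couple of combinations against the movie schema from the Quick Start:

```java
// 1990s movies whose title contains "Story"
g.V().hasLabel("movie")
    .has("year", P.between(1990, 2000))      // 1990 ≤ year < 2000
    .has("title", TextP.containing("Story"))
    .values("title")
    .toList();

// People whose name is one of a fixed set
g.V().has("person", "name", P.within("Alice", "Bob"))
    .valueMap("name", "age")
    .toList();
```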
Traversal Patterns
Common graph query patterns expressed in Gremlin, using the movie graph schema from the Quick Start.
Neighbourhood queries
```java
// Direct neighbours
g.V().has("person", "name", "Alice").out("knows").values("name")

// N-hop expansion (BFS up to 3 hops)
g.V().has("person", "name", "Alice")
    .repeat(__.out("knows").simplePath())
    .times(3)
    .dedup()
    .values("name")
```
Shortest path
```java
g.V().has("person", "name", "Alice")
    .repeat(__.bothE().otherV().simplePath())
    .until(__.has("person", "name", "Bob"))
    .path()
    .limit(1)
    .next()
```
Aggregation and grouping
```java
// Movies grouped by year, sorted descending
g.V().hasLabel("movie")
    .group()
    .by("year")
    .by(__.values("title").fold())
    .order(Scope.local).by(Column.keys, Order.desc)
    .next()

// Actors with the most credits
g.V().hasLabel("person")
    .project("name", "credits")
    .by("name")
    .by(__.out("acted_in").count())
    .order().by(__.select("credits"), Order.desc)
    .limit(10)
    .toList()
```
Filtering with where
```java
// People who know someone older than them
g.V().hasLabel("person").as("a")
    .out("knows").as("b")
    .where("a", P.lt("b")).by("age")
    .select("a").values("name")
    .toList()
```
Java / Kotlin
The Java SDK is the reference implementation. It supports both Maven and Gradle, runs on JVM 11+, and integrates with Spring Boot via an autoconfiguration module.
Spring Boot autoconfiguration
```xml
<dependency>
  <groupId>io.crabgraph</groupId>
  <artifactId>crabgraph-spring-boot-starter</artifactId>
  <version>1.0.0</version>
</dependency>
```
```yaml
crabgraph:
  schema: classpath:schema.sql
  data-dir: /var/myapp/graph   # omit for in-memory
  port: 8182
```
```java
@Service
public class PersonService {

    private final GraphTraversalSource g;

    public PersonService(CrabGraph crab) {
        this.g = crab.traversal();
    }

    public List<String> friendNames(String name) {
        return g.V().has("person", "name", name)
            .out("knows")
            .<String>values("name")
            .toList();
    }
}
```
Kotlin
```kotlin
val crab = CrabGraph.start()
val g = crab.traversal()

val friends: List<String> = g.V()
    .has("person", "name", "Alice")
    .out("knows")
    .values<String>("name")
    .toList()
```
Python
The Python package wraps the native CrabGraph binary and exposes a synchronous API that mirrors the TinkerPop Python driver. An asyncio variant is available via crabgraph[async].
Async usage
```shell
pip install "crabgraph[async]"
```
```python
import asyncio

from crabgraph import CrabGraph


async def main():
    crab = await CrabGraph.start(schema="./schema.sql")
    g = crab.traversal()

    await g.addV("person").property("name", "Alice").next()

    friends = await (g.V().has("person", "name", "Alice")
                     .out("knows")
                     .valueMap("name")
                     .toList())
    print(friends)


asyncio.run(main())
```
Using with Django / Flask
Create the CrabGraph instance once at application startup and reuse it across requests. The embedded server is thread-safe.
```python
from crabgraph import CrabGraph
from flask import Flask, jsonify

crab = CrabGraph.start(schema="schema.sql", data_dir="/var/graph")
g = crab.traversal()

app = Flask(__name__)


@app.route("/people")
def people():
    names = g.V().hasLabel("person").values("name").toList()
    return jsonify(names)
```
Node.js
Full TypeScript types are included. The package ships with native binaries for Linux (x64, arm64), macOS (x64, arm64), and Windows (x64) via optional dependencies — no build step required.
TypeScript types
```typescript
import { CrabGraph, GraphTraversalSource } from 'crabgraph';

let g: GraphTraversalSource;

export async function initGraph(): Promise<void> {
  const crab = await CrabGraph.start({
    schema: './schema.sql',
    dataDir: process.env.GRAPH_DIR,
  });
  g = crab.traversal();
}

export async function getFriends(name: string): Promise<string[]> {
  return g.V().has('person', 'name', name)
    .out('knows').values('name')
    .toList() as Promise<string[]>;
}
```
Using with Express
```javascript
import express from 'express';
import { CrabGraph } from 'crabgraph';

const app = express();

const crab = await CrabGraph.start({ schema: './schema.sql' });
const g = crab.traversal();

app.get('/people', async (req, res) => {
  const people = await g.V().hasLabel('person').valueMap(true).toList();
  res.json(people);
});

app.listen(3000);
```
Go
The Go SDK uses CGo to link the embedded library. The API is idiomatic Go — struct options, error returns, context propagation. Requires CGo-enabled builds (CGO_ENABLED=1).
```go
package graph

import (
	"context"
	"fmt"

	crab "io.crabgraph/crabgraph-go"
)

func Example(ctx context.Context) error {
	db, err := crab.Start(ctx, crab.Options{
		SchemaPath: "./schema.sql",
		DataDir:    "/var/graph", // omit for in-memory
	})
	if err != nil {
		return err
	}
	defer db.Close()

	g := db.Traversal()

	_, err = g.AddV("person").
		Property("name", "Alice").
		Property("age", 34).
		Next(ctx)
	if err != nil {
		return err
	}

	results, err := g.V().
		HasLabel("person").
		Values("name").
		ToList(ctx)
	fmt.Println(results)
	return err
}
```
Rust
The Rust crate is async-first, built on tokio. The schema can be embedded at compile time with include_str! for zero-runtime-dependency deployments, or loaded from a file path at startup.
Cargo features
| Feature | Default | Description |
|---|---|---|
| tokio | ✓ | Async runtime (tokio 1.x) |
| serde | ✓ | Serialize/deserialize traversal results |
| persistent | | Enable WAL-backed persistent storage |
| metrics | | Expose Prometheus metrics on :9090/metrics |
```toml
[dependencies]
crabgraph = { version = "1.0", features = ["persistent", "serde"] }
```
Schema embedded at compile time
```rust
use crabgraph::{CrabGraph, P};
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Person {
    name: String,
    age: i32,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let crab = CrabGraph::builder()
        .schema_str(include_str!("../schema.sql"))
        .data_dir("/var/graph")
        .start()
        .await?;

    let g = crab.traversal();

    // Typed result deserialization
    let people: Vec<Person> = g.v()
        .has_label("person")
        .has("age", P::gt(30))
        .value_map("name", "age")
        .into_vec::<Person>()
        .await?;

    println!("{people:#?}");
    Ok(())
}
```
The Rust crate requires a C linker. On Linux, install gcc. On macOS, Xcode Command Line Tools are sufficient.
CrabGraph Cloud
The embedded server is everything you need to ship. When you're ready to scale, CrabGraph Cloud is a fully managed hosted graph with zero code changes — same API, same schema format.
Skip the ops. Keep the Gremlin.
Swap your connection string and your embedded DB becomes a managed cluster — backups, replication, and monitoring included.
Migrate from Embedded
No code changes required. Update your connection config to point at your cloud endpoint — the traversal API is identical.
```java
// Before — embedded
var crab = CrabGraph.start();

// After — cloud (same traversal API)
var crab = CrabGraph.cloud()
    .endpoint("wss://my-cluster.crabgraph.io/gremlin")
    .apiKey(System.getenv("CRAB_API_KEY"))
    .connect();

var g = crab.traversal(); // identical from here
```
Data export
Export your embedded graph to a portable format and import it into Cloud with two SDK calls:
```java
// Export from embedded
var embedded = CrabGraph.start();
embedded.export(Path.of("export.graphson"));

// Import into Cloud
var cloud = CrabGraph.cloud().endpoint("wss://...").connect();
cloud.importFrom(Path.of("export.graphson"));
```
Configuration
All options can be passed via the builder API or, for Java, via application.yml when using the Spring Boot starter.
| Option | Type | Default | Description |
|---|---|---|---|
| schema | String / Path | schema.sql (classpath) | Path or classpath resource to DDL schema file |
| dataDir | Path | none (in-memory) | Directory for persistent storage. Creates if absent. |
| port | int | 8182 | Gremlin WebSocket server port (loopback only) |
| queryTimeout | Duration | 30s | Max time for a single traversal before cancellation |
| maxConnections | int | 16 | Max concurrent Gremlin connections |
| cacheSize | long (bytes) | 256 MB | In-memory query result cache. Set to 0 to disable. |
| logLevel | String | WARN | DEBUG, INFO, WARN, ERROR |
| metricsPort | int | none | If set, exposes Prometheus metrics on this port |
```java
CrabGraph.builder()
    .schemaResource("db/schema.sql")
    .dataDir(Path.of("/var/myapp/graph"))
    .port(8182)
    .queryTimeout(Duration.ofSeconds(60))
    .cacheSize(512 * 1024 * 1024) // 512 MB
    .logLevel("INFO")
    .metricsPort(9090)
    .start();
```
API Reference
Core methods available on the CrabGraph instance across all language SDKs.
| Method | Returns | Description |
|---|---|---|
| CrabGraph.start() | CrabGraph | Start with defaults. Schema loaded from classpath schema.sql. |
| CrabGraph.builder() | Builder | Fluent builder for all options before starting. |
| CrabGraph.cloud() | CloudBuilder | Connect to a CrabGraph Cloud endpoint instead of starting locally. |
| .traversal() | GraphTraversalSource | Returns the Gremlin traversal source g. |
| .export(path) | void | Export the entire graph to GraphSON 3.0 format. |
| .importFrom(path) | void | Import a GraphSON file, merging into existing data. |
| .schema() | Schema | Inspect the loaded schema — vertex labels, edge labels, properties. |
| .metrics() | GraphMetrics | Query counts, cache hit rate, active connections. |
| .close() | void | Gracefully shut down the embedded server and flush writes. |
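Putting a few of these together — a sketch only: the table above pins down the top-level methods, but the accessors on the returned `Schema` and `GraphMetrics` objects are not specified here, so the printlns below just rely on their string representations.

```java
var crab = CrabGraph.start();

// Inspect what the DDL produced — vertex labels, edge labels, properties
var schema = crab.schema();
System.out.println(schema);

// Snapshot runtime metrics (query counts, cache hit rate, connections)
var metrics = crab.metrics();
System.out.println(metrics);

// Export a backup, then shut down cleanly, flushing writes
crab.export(Path.of("backup.graphson")); // GraphSON 3.0
crab.close();
```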
GraphTraversalSource (g)
The traversal source g is a standard TinkerPop GraphTraversalSource. All standard Gremlin steps are supported. See the TinkerPop docs for the complete step library.
Changelog
Initial release
- Embedded Gremlin-compatible graph server in a single dependency
- SQL DDL schema definition — `CREATE TABLE` for vertices and edges, `CREATE VIEW` for derived traversals
- Persistent (WAL-backed) and in-memory storage modes
- SDKs for Java, Kotlin, Python, Node.js, Go, and Rust
- Spring Boot autoconfiguration starter for Java
- Full TinkerPop 3.7 step compatibility
- Prometheus metrics endpoint (opt-in)
Public beta
- Added `CREATE VIEW` support for derived edge types
- JSONB column support — nested properties traversable with `.has()`
- Query timeout and connection pool configuration
- Improved error messages for schema parse failures
Private beta
- Initial Java, Python, and Node.js SDKs
- In-memory mode for testing
- Core Gremlin step support: `V`, `E`, `out`, `in`, `has`, `repeat`, `path`, `project`