Get Started

Quick Start

CrabGraph is an embedded graph database. Add it as a dependency, point it at a SQL schema file, and you're traversing graphs in minutes — no separate server process, no config files, no CLI to install.

1. Add the Maven dependency
CrabGraph ships as a single JAR. Add it to your pom.xml — it starts an embedded Gremlin server automatically on initialization.
pom.xml
<dependency>
  <groupId>io.crabgraph</groupId>
  <artifactId>crabgraph-embedded</artifactId>
  <version>1.0.0</version>
</dependency>
2. Define your schema
Create a schema.sql file on the classpath with standard CREATE TABLE statements for vertices and edges, and CREATE VIEW for derived traversals.
src/main/resources/schema.sql
-- Vertex: person
CREATE TABLE person (
  id    UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
  name  VARCHAR(255) NOT NULL,
  age   INT
);

-- Edge: person → person
CREATE TABLE knows (
  id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  from_id UUID REFERENCES person(id),
  to_id   UUID REFERENCES person(id),
  since   INT
);

-- Derived view: friends-of-friends
CREATE VIEW friends_of_friends AS
  SELECT a.from_id, b.to_id
  FROM   knows a JOIN knows b ON a.to_id = b.from_id;
3. Connect and traverse
Call CrabGraph.start() — the schema is applied and the Gremlin traversal source is ready immediately.
Java
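import io.crabgraph.CrabGraph;
import org.apache.tinkerpop.gremlin.process.traversal.P;

import java.util.List;
import java.util.Map;

var crab = CrabGraph.start(); // loads schema.sql from classpath
var g = crab.traversal();

// Insert one vertex; next() executes the traversal.
g.addV("person").property("name", "Alice").property("age", 34).next();

// valueMap() yields one Map<Object, Object> per matching vertex.
List<Map<Object, Object>> people = g.V()
  .hasLabel("person")
  .has("age", P.gt(30))
  .valueMap("name", "age")
  .toList();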
1. Install via pip
The package bundles a platform-native binary. The embedded server starts in a subprocess on first connection.
shell
pip install crabgraph
2. Connect and traverse
Pass the path to your schema.sql. CrabGraph applies it on startup — vertices, edges, and views are all ready.
Python
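from crabgraph import CrabGraph
from gremlin_python.process.traversal import P  # predicates; assumes the TinkerPop Python driver (gremlinpython) is installed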

crab = CrabGraph.start(schema="./schema.sql")
g = crab.traversal()

g.addV("person").property("name", "Alice").property("age", 34).next()

people = (g.V().hasLabel("person")
           .has("age", P.gt(30))
           .valueMap("name", "age")
           .toList())
1. Install via npm
Ships with a platform-specific native binary via optional dependencies. The embedded server starts on first .start().
shell
npm install crabgraph
TypeScript
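import { CrabGraph } from 'crabgraph';
import { process as gremlinProcess } from 'gremlin';

// The gremlin package exposes P under its `process` namespace.
const { P } = gremlinProcess;

const crab = await CrabGraph.start({ schema: './schema.sql' });
const g = crab.traversal();

// Set age as well, so the P.gt(30) filter below matches Alice.
await g.addV('person').property('name', 'Alice').property('age', 34).next();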

const people = await g.V()
  .hasLabel('person')
  .has('age', P.gt(30))
  .valueMap('name', 'age')
  .toList();
1. Add the module
The Go module uses CGo bindings to the embedded CrabGraph library. The server runs in-process.
shell
go get io.crabgraph/crabgraph-go@v1.0.0
Go
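package main

import (
  "context"
  "log"

  crab "io.crabgraph/crabgraph-go"
)

func main() {
  ctx := context.Background()
  db, err := crab.Start(ctx, crab.Options{SchemaPath: "./schema.sql"})
  if err != nil {
    log.Fatal(err)
  }
  defer db.Close()

  g := db.Traversal()
  if _, err := g.AddV("person").Property("name", "Alice").Next(ctx); err != nil {
    log.Print(err) // log.Print rather than log.Fatal, so the deferred Close still runs
  }
}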
1. Add the crate
The Rust crate links against the CrabGraph native library via build.rs. The embedded server starts when you call CrabGraph::start().
Cargo.toml
[dependencies]
crabgraph = "1.0"
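2. Connect and traverse
CrabGraph exposes an async Rust API built on tokio. The schema can be embedded at compile time with include_str!, which resolves its path relative to the source file that invokes it (so "../schema.sql" in src/main.rs points at the crate root), or passed as a path at runtime. The example uses #[tokio::main], so tokio must also be declared as a dependency with its macros and rt-multi-thread features enabled.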
Rust
use crabgraph::{CrabGraph, P};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
  // Schema embedded at compile time
  let schema = include_str!("../schema.sql");
  let crab = CrabGraph::start().schema_str(schema).await?;
  let g = crab.traversal();

  g.add_v("person")
    .property("name", "Alice")
    .property("age", 34_i32)
    .next().await?;

  let people = g.v()
    .has_label("person")
    .has("age", P::gt(30))
    .value_map("name", "age")
    .to_list().await?;

  println!("{people:?}");
  Ok(())
}

Zero config. CrabGraph discovers schema.sql on the classpath (Java) or at the path you pass. No XML, no YAML, no environment variables needed to get started.

Setup

Installation

CrabGraph is distributed exclusively through language-native package managers. There is no standalone binary or CLI to install.

| Language | Package Manager | Package |
| --- | --- | --- |
| Java / Kotlin | Maven, Gradle | io.crabgraph:crabgraph-embedded:1.0.0 |
| Python ≥ 3.9 | pip, Poetry, uv | crabgraph==1.0.0 |
| Node.js ≥ 18 | npm, yarn, pnpm | crabgraph@1.0.0 |
| Go ≥ 1.21 | go get | io.crabgraph/crabgraph-go v1.0.0 |
| Rust ≥ 1.75 | Cargo | crabgraph = "1.0" |

Gradle (Kotlin DSL)

build.gradle.kts
dependencies {
  implementation("io.crabgraph:crabgraph-embedded:1.0.0")
}

Define a Schema

CrabGraph uses standard SQL DDL in a single schema.sql file. No proprietary schema language to learn. Tables become graph labels; views become computed traversals.

Vertices

Any table with a UUID PRIMARY KEY that does not match the edge convention described below is a vertex label. Each row is a vertex; columns are its properties.

schema.sql
CREATE TABLE movie (
  id      UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
  title   VARCHAR(500) NOT NULL,
  year    INT,
  rating  FLOAT
);

Edges

A table with from_id and to_id foreign-key columns becomes an edge label. CrabGraph detects this convention automatically — no annotation needed. Any other columns are edge properties.

schema.sql
CREATE TABLE acted_in (
  id       UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  from_id  UUID REFERENCES person(id) ON DELETE CASCADE,
  to_id    UUID REFERENCES movie(id)  ON DELETE CASCADE,
  role     VARCHAR(255)  -- edge property
);

Derived Views

Use CREATE VIEW … AS SELECT to define computed edge types. Views follow the same from_id / to_id convention and are traversable like any other edge label.

schema.sql
-- People who appeared in the same movie
CREATE VIEW co_star AS
  SELECT
    a.from_id               AS from_id,
    b.from_id               AS to_id,
    m.title                 AS movie_title
  FROM  acted_in a
  JOIN  acted_in b ON a.to_id = b.to_id AND a.from_id <> b.from_id
  JOIN  movie m    ON m.id = a.to_id;

Views are read-only. You cannot addV or addE on a label backed by a CREATE VIEW. Writes must go through the underlying tables.
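
To see which rows a derived view such as co_star produces, the same SQL can be run against any SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a stand-in; it is not CrabGraph's storage engine. SQLite has no UUID type or gen_random_uuid(), so the ids are short text values, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movie  (id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE acted_in (
  id      TEXT PRIMARY KEY,
  from_id TEXT REFERENCES person(id),
  to_id   TEXT REFERENCES movie(id),
  role    TEXT
);
-- Same view definition as in the schema above
CREATE VIEW co_star AS
  SELECT a.from_id AS from_id, b.from_id AS to_id, m.title AS movie_title
  FROM   acted_in a
  JOIN   acted_in b ON a.to_id = b.to_id AND a.from_id <> b.from_id
  JOIN   movie m    ON m.id = a.to_id;

INSERT INTO person VALUES ('p1', 'Tom Hanks'), ('p2', 'Meg Ryan'), ('p3', 'Tim Allen');
INSERT INTO movie  VALUES ('m1', 'Sleepless in Seattle'), ('m2', 'Toy Story');
INSERT INTO acted_in VALUES
  ('e1', 'p1', 'm1', 'Sam'),
  ('e2', 'p2', 'm1', 'Annie'),
  ('e3', 'p1', 'm2', 'Woody'),
  ('e4', 'p3', 'm2', 'Buzz');
""")

# Follow co_star "edges" out of p1 and resolve the target vertex names.
co_stars = conn.execute("""
  SELECT p.name, c.movie_title
  FROM   co_star c JOIN person p ON p.id = c.to_id
  WHERE  c.from_id = 'p1'
  ORDER  BY p.name
""").fetchall()
print(co_stars)
```

Each co-starring pair appears twice in the view, once in each direction, because the join matches both orderings of the two acted_in rows.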


Your First Query

CrabGraph supports the full Gremlin traversal language via standard Apache TinkerPop drivers. If you've used Gremlin before, everything works identically — CrabGraph just maps it to your SQL-defined schema.

Gremlin
// Movies an actor appeared in
g.V().has("person", "name", "Tom Hanks")
  .out("acted_in")
  .values("title")
  .toList()

// Co-stars via the derived view
g.V().has("person", "name", "Tom Hanks")
  .out("co_star")
  .values("name")
  .dedup()
  .toList()

// Shortest path between two people
g.V().has("person", "name", "Alice")
  .repeat(__.bothE().otherV().simplePath())
  .until(__.has("name", "Bob"))
  .path()
  .limit(1)
  .next()

New to Gremlin? See the Gremlin Primer for the most useful steps, or the TinkerPop reference for the full spec.


Embedded Server

When you call CrabGraph.start(), a Gremlin-compatible WebSocket server boots inside your process on 127.0.0.1:8182 (the Python package runs it in a subprocess instead). Nothing to install separately: the server and storage engine are bundled in the dependency.
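
One quick way to confirm that something is listening on the loopback port is to attempt a TCP connection. The sketch below uses only the Python standard library; port_is_open is a hypothetical helper, not part of any CrabGraph SDK, and 8182 is simply the default port mentioned above.

```python
import socket

def port_is_open(host: str = "127.0.0.1", port: int = 8182, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling port_is_open() after CrabGraph.start() should return True if the server came up on the default port; a False result usually means the port was changed or startup failed.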

Lifecycle

Java
// Minimal start — schema.sql loaded from classpath root
var crab = CrabGraph.start();

// Builder: full options (a second instance, on its own port)
var custom = CrabGraph.builder()
  .port(9182)
  .dataDir(Path.of("/var/myapp/graph"))
  .schemaResource("db/schema.sql")
  .start();

// Graceful shutdown
crab.close();
custom.close();

Storage modes

| Mode | How to enable | Use for |
| --- | --- | --- |
| In-memory | Default (no dataDir) | Tests, development: clean slate every start |
| Persistent | Set dataDir | Production: WAL-backed, survives restarts |

Testing tip. Omit dataDir in tests — in-memory mode gives each test a clean graph with no teardown required.


Schema Reference

CrabGraph interprets your DDL according to a small set of conventions. No annotations, no special comments — just column names and foreign key relationships.

Type mapping

| SQL type | Gremlin property type | Notes |
| --- | --- | --- |
| UUID | String | Auto-generated when column has DEFAULT gen_random_uuid() |
| VARCHAR, TEXT | String | |
| INT, BIGINT, SMALLINT | Long | |
| FLOAT, DOUBLE, NUMERIC | Double | |
| BOOLEAN | Boolean | |
| TIMESTAMP, DATE | Date (ISO-8601 string in Gremlin) | |
| JSONB | Map<String, Object> | Traversable as nested properties |

Edge detection rules

CrabGraph marks a table as an edge type when all three conditions hold:

  • The table has a column named from_id with a foreign key to another table's primary key.
  • The table has a column named to_id with a foreign key (to the same or different table).
  • The from_id and to_id columns reference UUID PRIMARY KEY columns.

All other columns on an edge table are treated as edge properties and accessible via .values() and .valueMap() in traversals.
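
The three conditions above can be sketched as a small classifier. This is an illustration in plain Python, not CrabGraph's actual detection code: the Column and Table classes and the is_edge function are hypothetical, and the schema is described by hand rather than parsed from DDL.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Column:
    name: str
    sql_type: str
    primary_key: bool = False
    references: Optional[str] = None  # "table.column" this column points at, if any

@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)

    def column(self, name: str) -> Optional[Column]:
        return next((c for c in self.columns if c.name == name), None)

def is_uuid_pk(schema: dict[str, Table], ref: str) -> bool:
    """True if "table.column" names a UUID PRIMARY KEY column in the schema."""
    table_name, column_name = ref.split(".")
    table = schema.get(table_name)
    col = table.column(column_name) if table else None
    return col is not None and col.primary_key and col.sql_type.upper() == "UUID"

def is_edge(schema: dict[str, Table], table: Table) -> bool:
    from_col, to_col = table.column("from_id"), table.column("to_id")
    # Conditions 1 and 2: both columns exist and each carries a foreign key.
    if not (from_col and from_col.references and to_col and to_col.references):
        return False
    # Condition 3: both foreign keys point at UUID PRIMARY KEY columns.
    return is_uuid_pk(schema, from_col.references) and is_uuid_pk(schema, to_col.references)

person = Table("person", [Column("id", "UUID", primary_key=True), Column("name", "VARCHAR")])
knows = Table("knows", [
    Column("id", "UUID", primary_key=True),
    Column("from_id", "UUID", references="person.id"),
    Column("to_id", "UUID", references="person.id"),
    Column("since", "INT"),
])
schema = {t.name: t for t in (person, knows)}
print({t.name: "edge" if is_edge(schema, t) else "vertex" for t in schema.values()})
```

With the Quick Start tables, knows satisfies all three conditions and is classified as an edge, while person has no from_id or to_id and stays a vertex.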

Indices

Standard SQL indices are respected by CrabGraph's query planner. Add a CREATE INDEX on any property column you filter on frequently:

schema.sql
CREATE INDEX idx_person_name ON person(name);
CREATE INDEX idx_person_age  ON person(age);
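
As a rough illustration of what an index changes, the sketch below asks SQLite (via Python's built-in sqlite3 module) how it would run a filter on name. SQLite is only a stand-in here: CrabGraph's query planner is a separate implementation and may report or choose plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT NOT NULL, age INT);
CREATE INDEX idx_person_name ON person(name);
""")

# EXPLAIN QUERY PLAN returns rows whose last column describes each plan step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM person WHERE name = ?", ("Alice",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan mentions idx_person_name, which shows the equality filter being served by an index search rather than a full table scan.
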

Gremlin Primer

Gremlin is a functional, data-flow traversal language. A traversal starts at a set of elements and threads through a pipeline of steps. Each step transforms the current traversers.

Starting a traversal

Gremlin
g.V()                          // all vertices
g.V("some-uuid")              // vertex by ID
g.E()                          // all edges
g.V().hasLabel("person")      // vertices of a specific type
g.V().has("person", "name", "Alice")  // filter by property

Step reference

.out(label?)
Move to outgoing adjacent vertices. Label narrows to a specific edge type.
.in(label?)
Move to incoming adjacent vertices.
.both(label?)
Move to adjacent vertices in either direction.
.outE() / .inE()
Move to incident edges instead of vertices.
.has(key, val)
Filter traversers where the property matches. Supports P.* predicates.
.hasNot(key)
Filter traversers where the property is absent.
.values(key…)
Extract property values as the new traverser stream.
.valueMap(key…)
Extract properties as a Map. Good for final projection.
.project(k, …)
Build a named result map from sub-traversals. Preferred over valueMap for complex projections.
.select(k, …)
Retrieve labelled steps previously tagged with .as().
.repeat(t).until(c)
Loop traversal t until condition c is met. Use .times(n) for fixed depth.
.path()
Emit the full traversal history (vertices and edges) as a Path object.
.group().by()
Aggregate traversers into a Map grouped by a key.
.order().by()
Sort traversers by a property. Order.desc for descending.
.limit(n)
Take the first n traversers. Prefer a bounded limit(n) to calling toList() on an unbounded traversal.
.dedup()
Remove duplicate traversers from the stream.
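
The data-flow idea behind these steps can be sketched with plain Python generators: each step consumes a stream of traversers and yields a new one. This is a toy model over a hand-written in-memory graph, not how CrabGraph or TinkerPop execute traversals; every name in it is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator

# Toy graph: vertex id -> properties, plus outgoing "knows" edges.
VERTICES = {
    1: {"label": "person", "name": "Alice", "age": 34},
    2: {"label": "person", "name": "Bob", "age": 41},
    3: {"label": "person", "name": "Carol", "age": 29},
}
KNOWS = {1: [2, 3], 2: [3], 3: []}

def out(traversers: Iterable[int]) -> Iterator[int]:
    """Like .out("knows"): replace each vertex with its outgoing neighbours."""
    for v in traversers:
        yield from KNOWS[v]

def has(traversers: Iterable[int], key: str, predicate) -> Iterator[int]:
    """Like .has(key, P...): keep vertices whose property passes the predicate."""
    return (v for v in traversers if predicate(VERTICES[v].get(key)))

def values(traversers: Iterable[int], key: str) -> Iterator[object]:
    """Like .values(key): swap each vertex for one of its property values."""
    return (VERTICES[v][key] for v in traversers)

def dedup(items: Iterable[object]) -> Iterator[object]:
    """Like .dedup(): drop items already seen, preserving order."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item

# Roughly: g.V().out("knows").has("age", P.gt(25)).values("name").dedup().limit(2)
names = list(islice(dedup(values(has(out(VERTICES), "age", lambda a: a > 25), "name")), 2))
print(names)
```

Because every step is lazy, limit (islice here) stops the whole pipeline as soon as it has enough results, which is why a bounded limit is cheaper than materialising everything.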

Predicate reference (P)

| Predicate | Meaning |
| --- | --- |
| P.eq(x) | Equal to x |
| P.neq(x) | Not equal |
| P.gt(x) / P.lt(x) | Greater / less than |
| P.gte(x) / P.lte(x) | Greater or equal / less or equal |
| P.between(lo, hi) | lo ≤ value < hi |
| P.within(x, y, …) | Value is one of the listed options |
| P.without(x, y, …) | Value is none of the listed options |
| TextP.containing(s) | String contains s |
| TextP.startingWith(s) | String starts with s |
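
The range and membership predicates are the ones most often misread, so here is a minimal sketch of their meaning as plain Python functions. These are stand-ins written for illustration, not the driver's P class.

```python
def between(lo, hi):
    """P.between(lo, hi): lower bound inclusive, upper bound exclusive."""
    return lambda value: lo <= value < hi

def within(*options):
    """P.within(x, y, ...): value is one of the options."""
    return lambda value: value in options

def without(*options):
    """P.without(x, y, ...): value is none of the options."""
    return lambda value: value not in options

ages = [18, 30, 34, 40]
in_range = [a for a in ages if between(30, 40)(a)]
listed = [a for a in ages if within(18, 40)(a)]
unlisted = [a for a in ages if without(18, 40)(a)]
print(in_range, listed, unlisted)
```

Note that 40 is excluded from between(30, 40): to include the upper bound, combine gte and lte instead.
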

Traversal Patterns

Common graph query patterns expressed in Gremlin, using the person and knows tables from the Quick Start together with the movie and acted_in tables from Define a Schema.

Neighbourhood queries

Gremlin
// Direct neighbours
g.V().has("person", "name", "Alice").out("knows").values("name")

// N-hop expansion (BFS up to 3 hops)
g.V().has("person", "name", "Alice")
  .repeat(__.out("knows").simplePath())
  .times(3)
  .emit()   // keep vertices from every hop, not only the third
  .dedup()
  .values("name")

Shortest path

Gremlin
g.V().has("person", "name", "Alice")
  .repeat(__.bothE().otherV().simplePath())
  .until(__.has("person", "name", "Bob"))
  .path()
  .limit(1)
  .next()
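
To make the repeat/until pattern concrete, here is a minimal sketch in plain Python of the breadth-first search it amounts to. It is not CrabGraph code: the adjacency dict and shortest_path function are hypothetical, the returned path holds vertices only (Gremlin's path() would also include the edges), and a single visited set stands in for simplePath().

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the first path found from start to goal.

    BFS explores paths in order of length, so the first hit is a shortest one.
    Returns None if the two vertices are not connected.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in adjacency.get(path[-1], ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# "knows" edges listed in both directions, like bothE().otherV()
knows = {
    "Alice": ["Carol", "Dave"],
    "Carol": ["Alice", "Bob"],
    "Dave": ["Alice", "Erin"],
    "Erin": ["Dave", "Bob"],
    "Bob": ["Carol", "Erin"],
}
route = shortest_path(knows, "Alice", "Bob")
print(route)
```

The two-hop route through Carol is found before the three-hop route through Dave and Erin, which is the same reason limit(1) on the Gremlin traversal yields a shortest path.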

Aggregation and grouping

Gremlin
// Movies grouped by year, sorted descending
g.V().hasLabel("movie")
  .group()
  .by("year")
  .by(__.values("title").fold())
  .order(Scope.local).by(Column.keys, Order.desc)
  .next()

// Top 10 actors by number of credits
g.V().hasLabel("person")
  .project("name", "credits")
  .by("name")
  .by(__.out("acted_in").count())
  .order().by("credits", Order.desc)
  .limit(10)
  .toList()
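
For readers more familiar with ordinary collections, the two aggregations above correspond roughly to the following plain Python. The sample rows are invented, and this models only the shape of the results, not how the traversal is executed.

```python
from collections import Counter, defaultdict

movies = [
    {"title": "Toy Story", "year": 1995},
    {"title": "Apollo 13", "year": 1995},
    {"title": "Cast Away", "year": 2000},
]
acted_in = [("Tom Hanks", "Toy Story"), ("Tom Hanks", "Apollo 13"),
            ("Tom Hanks", "Cast Away"), ("Tim Allen", "Toy Story")]

# group().by("year").by(values("title").fold()), then order keys descending
by_year = defaultdict(list)
for movie in movies:
    by_year[movie["year"]].append(movie["title"])
by_year_desc = dict(sorted(by_year.items(), reverse=True))

# project("name", "credits") ... order().by("credits", desc).limit(10)
credits = Counter(actor for actor, _ in acted_in)
top_actors = [{"name": n, "credits": c} for n, c in credits.most_common(10)]

print(by_year_desc)
print(top_actors)
```

One difference worth knowing: the Gremlin version projects every person, including those with zero credits, whereas this Counter only sees actors that appear in at least one acted_in row.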

Filtering with where

Gremlin
// People who know someone older than them
g.V().hasLabel("person").as("a")
  .out("knows").as("b")
  .where("a", P.lt("b")).by("age")
  .select("a").values("name")
  .toList()
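
Since the schema is plain SQL, the where() comparison above has a direct relational reading: a self-join on person through knows. The sketch below runs that join in SQLite via Python's built-in sqlite3 module, with invented sample rows and text ids in place of UUIDs. It illustrates the semantics only and says nothing about how CrabGraph translates traversals internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT NOT NULL, age INT);
CREATE TABLE knows  (id TEXT PRIMARY KEY,
                     from_id TEXT REFERENCES person(id),
                     to_id   TEXT REFERENCES person(id));
INSERT INTO person VALUES ('a', 'Alice', 34), ('b', 'Bob', 41), ('c', 'Carol', 29);
INSERT INTO knows  VALUES ('k1', 'a', 'b'), ('k2', 'b', 'c'), ('k3', 'c', 'a');
""")

# a -knows-> b where a.age < b.age, returning a's name
rows = conn.execute("""
  SELECT a.name
  FROM   knows k
  JOIN   person a ON a.id = k.from_id
  JOIN   person b ON b.id = k.to_id
  WHERE  a.age < b.age
  ORDER  BY a.name
""").fetchall()
younger = [name for (name,) in rows]
print(younger)
```

Bob is left out because the only person he knows, Carol, is younger than he is.
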

Java / Kotlin

The Java SDK is the reference implementation. It supports both Maven and Gradle, runs on JVM 11+, and integrates with Spring Boot via an autoconfiguration module.

Spring Boot autoconfiguration

pom.xml
<dependency>
  <groupId>io.crabgraph</groupId>
  <artifactId>crabgraph-spring-boot-starter</artifactId>
  <version>1.0.0</version>
</dependency>
application.yml
crabgraph:
  schema: classpath:schema.sql
  data-dir: /var/myapp/graph   # omit for in-memory
  port: 8182
Java
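import io.crabgraph.CrabGraph;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class PersonService {

  private final GraphTraversalSource g;

  // The starter provides the CrabGraph bean for constructor injection.
  public PersonService(CrabGraph crab) {
    this.g = crab.traversal();
  }

  public List<String> friendNames(String name) {
    return g.V().has("person", "name", name)
             .out("knows")
             .<String>values("name")  // type witness so toList() yields List<String>
             .toList();
  }
}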

Kotlin

Kotlin
val crab = CrabGraph.start()
val g = crab.traversal()

val friends: List<String> = g.V()
  .has("person", "name", "Alice")
  .out("knows")
  .values<String>("name")
  .toList()

Python

The Python package wraps the native CrabGraph binary and exposes a synchronous API that mirrors the TinkerPop Python driver. An asyncio variant is available via crabgraph[async].

Async usage

shell
pip install "crabgraph[async]"
Python
import asyncio
from crabgraph import CrabGraph

async def main():
    crab = await CrabGraph.start(schema="./schema.sql")
    g = crab.traversal()

    await g.addV("person").property("name", "Alice").next()

    # Parentheses let the chained call span several lines.
    friends = await (
        g.V().has("person", "name", "Alice")
        .out("knows")
        .valueMap("name")
        .toList()
    )
    print(friends)

asyncio.run(main())

Using with Django / Flask

Create the CrabGraph instance once at application startup and reuse it across requests. The embedded server is thread-safe.

app.py
from crabgraph import CrabGraph
from flask import Flask, jsonify

crab = CrabGraph.start(schema="schema.sql", data_dir="/var/graph")
g    = crab.traversal()
app  = Flask(__name__)

@app.route("/people")
def people():
    names = g.V().hasLabel("person").values("name").toList()
    return jsonify(names)

Node.js

Full TypeScript types are included. The package ships with native binaries for Linux (x64, arm64), macOS (x64, arm64), and Windows (x64) via optional dependencies — no build step required.

TypeScript types

TypeScript
import { CrabGraph, GraphTraversalSource } from 'crabgraph';
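import { process as gremlinProcess } from 'gremlin';

// The gremlin package exposes predicates and sort orders under its `process`
// namespace; the alias avoids shadowing Node's global `process` used below.
const { P, order } = gremlinProcess;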

let g: GraphTraversalSource;

export async function initGraph(): Promise<void> {
  const crab = await CrabGraph.start({
    schema: './schema.sql',
    dataDir: process.env.GRAPH_DIR,
  });
  g = crab.traversal();
}

export async function getFriends(name: string): Promise<string[]> {
  return g.V().has('person', 'name', name)
            .out('knows').values('name')
            .toList() as Promise<string[]>;
}

Using with Express

server.ts
import express from 'express';
import { CrabGraph } from 'crabgraph';

const app = express();

const crab = await CrabGraph.start({ schema: './schema.sql' });
const g    = crab.traversal();

app.get('/people', async (req, res) => {
  const people = await g.V().hasLabel('person').valueMap(true).toList();
  res.json(people);
});

app.listen(3000);

Go

The Go SDK uses CGo to link the embedded library. The API is idiomatic Go — struct options, error returns, context propagation. Requires CGo-enabled builds (CGO_ENABLED=1).

Go
package graph

import (
  "context"
  "fmt"
  crab "io.crabgraph/crabgraph-go"
)

func Example(ctx context.Context) error {
  db, err := crab.Start(ctx, crab.Options{
    SchemaPath: "./schema.sql",
    DataDir:    "/var/graph",    // omit for in-memory
  })
  if err != nil { return err }
  defer db.Close()

  g := db.Traversal()

  _, err = g.AddV("person").
    Property("name", "Alice").
    Property("age", 34).
    Next(ctx)
  if err != nil { return err }

  results, err := g.V().
    HasLabel("person").
    Values("name").
    ToList(ctx)
  if err != nil { return err }

  // Only print once the query is known to have succeeded.
  fmt.Println(results)
  return nil
}

Rust

The Rust crate is async-first, built on tokio. The schema can be embedded at compile time with include_str! for zero-runtime-dependency deployments, or loaded from a file path at startup.

Cargo features

| Feature | Default | Description |
| --- | --- | --- |
| tokio | | Async runtime (tokio 1.x) |
| serde | | Serialize/deserialize traversal results |
| persistent | | Enable WAL-backed persistent storage |
| metrics | | Expose Prometheus metrics on :9090/metrics |
Cargo.toml
[dependencies]
crabgraph = { version = "1.0", features = ["persistent", "serde"] }

Schema embedded at compile time

Rust
use crabgraph::{CrabGraph, P, Order};
use serde::{Deserialize, Serialize};

#[derive(Debug, Serialize, Deserialize)]
struct Person {
  name: String,
  age:  i32,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
  let crab = CrabGraph::builder()
    .schema_str(include_str!("../schema.sql"))
    .data_dir("/var/graph")
    .start()
    .await?;

  let g = crab.traversal();

  // Typed result deserialization
  let people: Vec<Person> = g.v()
    .has_label("person")
    .has("age", P::gt(30))
    .value_map("name", "age")
    .into_vec::<Person>()
    .await?;

  println!("{people:#?}");
  Ok(())
}

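The Rust crate requires a C linker. On Linux, install gcc. On macOS, Xcode Command Line Tools are sufficient. The example above also uses tokio, serde, and anyhow directly, so each must be declared in Cargo.toml alongside crabgraph.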

Cloud

CrabGraph Cloud

The embedded server is everything you need to ship. When you're ready to scale, CrabGraph Cloud is a fully managed hosted graph that needs no changes to your traversal code: same API, same schema format.

Skip the ops. Keep the Gremlin.

Swap your connection string and your embedded DB becomes a managed cluster — backups, replication, and monitoring included.


Migrate from Embedded

No changes to traversal code are required. Replace the embedded start call with a cloud connection that points at your endpoint; the traversal API is identical from there.

Java
// Before: embedded
// var crab = CrabGraph.start();

// After: cloud (same traversal API)
var crab = CrabGraph.cloud()
  .endpoint("wss://my-cluster.crabgraph.io/gremlin")
  .apiKey(System.getenv("CRAB_API_KEY"))
  .connect();

var g = crab.traversal(); // identical from here

Data export

Export your embedded graph to a portable format (GraphSON 3.0), then import it into Cloud with a second SDK call:

Java
// Export from embedded
var embedded = CrabGraph.start();
embedded.export(Path.of("export.graphson"));

// Import into Cloud
var cloud = CrabGraph.cloud().endpoint("wss://...").connect();
cloud.importFrom(Path.of("export.graphson"));

Configuration

All options can be passed via the builder API or, for Java, via application.yml when using the Spring Boot starter.

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| schema | String / Path | schema.sql (classpath) | Path or classpath resource to DDL schema file |
| dataDir | Path | none (in-memory) | Directory for persistent storage. Creates if absent. |
| port | int | 8182 | Gremlin WebSocket server port (loopback only) |
| queryTimeout | Duration | 30s | Max time for a single traversal before cancellation |
| maxConnections | int | 16 | Max concurrent Gremlin connections |
| cacheSize | long (bytes) | 256 MB | In-memory query result cache. Set to 0 to disable. |
| logLevel | String | WARN | DEBUG, INFO, WARN, ERROR |
| metricsPort | int | none | If set, exposes Prometheus metrics on this port |
Java
CrabGraph.builder()
  .schemaResource("db/schema.sql")
  .dataDir(Path.of("/var/myapp/graph"))
  .port(8182)
  .queryTimeout(Duration.ofSeconds(60))
  .cacheSize(512 * 1024 * 1024)  // 512 MB
  .logLevel("INFO")
  .metricsPort(9090)
  .start();
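
When configuration is assembled from several sources, it can help to hold the options in one typed record before handing them to the builder. The sketch below mirrors the option table as a Python dataclass. GraphConfig, its snake_case field names, and the validation rules are all hypothetical and not part of any CrabGraph SDK; only the default values are taken from the table.

```python
from dataclasses import dataclass
from datetime import timedelta
from pathlib import Path
from typing import Optional

@dataclass(frozen=True)
class GraphConfig:
    """Hypothetical mirror of the option table; defaults copied from it."""
    schema: str = "schema.sql"
    data_dir: Optional[Path] = None              # None means in-memory
    port: int = 8182
    query_timeout: timedelta = timedelta(seconds=30)
    max_connections: int = 16
    cache_size: int = 256 * 1024 * 1024          # bytes; 0 disables the cache
    log_level: str = "WARN"
    metrics_port: Optional[int] = None           # None means metrics are off

    def __post_init__(self) -> None:
        # Assumed sanity checks, not documented CrabGraph behaviour.
        if self.log_level not in {"DEBUG", "INFO", "WARN", "ERROR"}:
            raise ValueError(f"unknown log level: {self.log_level}")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")

    @property
    def persistent(self) -> bool:
        return self.data_dir is not None

cfg = GraphConfig(data_dir=Path("/var/myapp/graph"), cache_size=512 * 1024 * 1024)
print(cfg.persistent, cfg.cache_size)
```

Writing cache sizes as a product of 1024s, as the Java example does, keeps the unit visible: 512 * 1024 * 1024 is 536870912 bytes.
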

API Reference

Core methods available on the CrabGraph instance across all language SDKs.

| Method | Returns | Description |
| --- | --- | --- |
| CrabGraph.start() | CrabGraph | Start with defaults. Schema loaded from classpath schema.sql. |
| CrabGraph.builder() | Builder | Fluent builder for all options before starting. |
| CrabGraph.cloud() | CloudBuilder | Connect to a CrabGraph Cloud endpoint instead of starting locally. |
| .traversal() | GraphTraversalSource | Returns the Gremlin traversal source g. |
| .export(path) | void | Export the entire graph to GraphSON 3.0 format. |
| .importFrom(path) | void | Import a GraphSON file, merging into existing data. |
| .schema() | Schema | Inspect the loaded schema: vertex labels, edge labels, properties. |
| .metrics() | GraphMetrics | Query counts, cache hit rate, active connections. |
| .close() | void | Gracefully shut down the embedded server and flush writes. |

GraphTraversalSource (g)

The traversal source g is a standard TinkerPop GraphTraversalSource. All standard Gremlin steps are supported. See the TinkerPop docs for the complete step library.


Changelog

v1.0.0
Apr 2026
Major

Initial release

  • Embedded Gremlin-compatible graph server in a single dependency
  • SQL DDL schema definition — CREATE TABLE for vertices and edges, CREATE VIEW for derived traversals
  • Persistent (WAL-backed) and in-memory storage modes
  • SDKs for Java, Kotlin, Python, Node.js, Go, and Rust
  • Spring Boot autoconfiguration starter for Java
  • Full TinkerPop 3.7 step compatibility
  • Prometheus metrics endpoint (opt-in)
v0.9.0
Feb 2026
Beta

Public beta

  • Added CREATE VIEW support for derived edge types
  • JSONB column support — nested properties traversable with .has()
  • Query timeout and connection pool configuration
  • Improved error messages for schema parse failures
v0.7.0
Dec 2025
Beta

Private beta

  • Initial Java, Python, and Node.js SDKs
  • In-memory mode for testing
  • Core Gremlin step support: V, E, out, in, has, repeat, path, project
