CAMEODB
Core Documentation

CameoDB Engine

A high-performance, distributed, shared-nothing hybrid-search database built in Rust 2024 Edition.

Combines the reliability of ACID-compliant key-value storage (redb), flexible document modeling, and full-text search (Tantivy) in a multi-tenant, horizontally scalable architecture.

Key Features

  • Multi-Tenant Architecture: complete index isolation with dynamic scaling.
  • Atomic Batch Operations: high-throughput bulk processing with ACID guarantees.
  • Hybrid Storage: combined KV store (redb) + full-text search (Tantivy).
  • Schema Management: dynamic schema evolution with type validation.
  • Distributed Ready: actor-based architecture with consistent hashing.
  • Performance Optimized: Smart Commits, memory budgets, and adaptive batching.
  • Query Language & Syntax: a strict-typed query parser supporting complex boolean logic, phrase matching, wildcards, and deep JSON traversal.

Quick Start References

For detailed, step-by-step instructions, visit our Interactive Quickstart Guide. Below are the raw commands for rapid setup.

Option 1: Docker Hub (Recommended)

1. Start Server

# Create data directory with proper permissions
mkdir -p $(pwd)/data/cameodb

# Pull and run CameoDB from Docker Hub
docker run -d \
  --name cameodb-server \
  --user $(id -u):$(id -g) \
  -p 9480:9480 \
  -p 9580:9580 \
  -v $(pwd)/data/cameodb:/data/cameodb \
  -e RUST_LOG=error \
  goranc/cameodb:latest

2. Run Client

# Run interactive client
docker run --rm -it \
  --name cameodb-client \
  --network host \
  goranc/cameodb:latest \
  client --interactive

Option 2: Build from Source

# Build and start CameoDB from a source checkout
cargo run --bin cameodb

# CameoDB starts on http://localhost:9480 by default

Distributed Architecture Overview

CameoDB is designed as a distributed, shared-nothing cluster:

  • Per-node storage is handled by the server crate with actors (NodeOrchestrator, MicroshardActor) on top of redb + Tantivy.
  • Routing & clustering use a ClusterCoordinator actor with a consistent hash ring and libp2p Kademlia DHT.
  • Remote execution is powered by Kameo remote actors over a custom libp2p swarm (TCP/QUIC/Noise/Yamux, no mDNS).
  • Scatter–gather search and multi-node writes are implemented via a RouterActor that fans out to peers and aggregates results.
  • Event-driven metadata: cluster state transitions and persistence are triggered purely by actor messages, with no background polling or timeouts.
  • State reconciliation: on boot, nodes compare the expected cluster topology from snapshots against actual peer reports, logging discrepancies and converging on the distributed reality.
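The consistent hash ring used for routing and clustering can be illustrated with a minimal sketch. This is not CameoDB's actual ConsistentRing; the node labels, replica count, and use of the standard library's DefaultHasher are illustrative assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Minimal consistent hash ring: virtual nodes mapped onto a u64 keyspace.
struct Ring {
    vnodes: BTreeMap<u64, String>, // hash point -> node label
}

impl Ring {
    fn new(nodes: &[&str], replicas: u32) -> Self {
        let mut vnodes = BTreeMap::new();
        for node in nodes {
            // Each physical node gets `replicas` points on the ring,
            // which smooths out key distribution across nodes.
            for r in 0..replicas {
                vnodes.insert(hash(&format!("{node}:{r}")), node.to_string());
            }
        }
        Ring { vnodes }
    }

    /// Owner of a routing key: the first vnode clockwise from the key's hash.
    fn owner(&self, key: &str) -> &str {
        let h = hash(key);
        self.vnodes
            .range(h..)
            .next()
            .or_else(|| self.vnodes.iter().next()) // wrap around the ring
            .map(|(_, node)| node.as_str())
            .unwrap()
    }
}

fn hash<T: Hash + ?Sized>(t: &T) -> u64 {
    let mut s = DefaultHasher::new();
    t.hash(&mut s);
    s.finish()
}
```

Adding or removing a node moves only the keys adjacent to its vnodes, which is what makes rebalancing cheap when the cluster topology changes.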

Connection Pool & Cache Invalidation

The RemotePeerPool eliminates repeated swarm registry/DHT lookups on every remote operation:

                    ┌───────────────────────────────────┐
                    │          RemotePeerPool           │
                    │  RwLock<HashMap<(Uuid, Channel),  │
                    │         RemoteActorRef>>          │
                    ├───────────────────────────────────┤
                    │ get_orchestrator(node, channel)   │──→ cache hit: clone ref
                    │ get_coordinator(node)             │──→ cache miss: lookup + cache
                    │ invalidate_peer(node)             │──→ evict all refs for node
                    │ invalidate_all()                  │──→ full cache clear
                    └───────────────────────────────────┘
                                    ▲
                                    │ invalidate_peer()
                    ┌───────────────┴───────────────┐
                    │      ClusterCoordinator       │
                    │  handle(PeerLost { node_id }) │
                    └───────────────────────────────┘
                                    ▲
                                    │ swarm event
                              Peer disconnected
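The caching pattern above can be sketched in a few lines of Rust. Everything here is simplified: NodeId, Channel, and ActorRef are stand-ins for the real Uuid, channel enum, and RemoteActorRef types, and the expensive swarm/DHT lookup is replaced by a placeholder string.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type NodeId = u64;      // stand-in for the real Uuid
type Channel = u8;      // stand-in for the real channel enum
type ActorRef = String; // stand-in for a clonable RemoteActorRef

struct PeerPool {
    refs: RwLock<HashMap<(NodeId, Channel), ActorRef>>,
}

impl PeerPool {
    fn new() -> Self {
        PeerPool { refs: RwLock::new(HashMap::new()) }
    }

    /// Cache hit: clone the ref under a shared read lock.
    /// Cache miss: perform the (expensive) lookup, then insert under a write lock.
    fn get_orchestrator(&self, node: NodeId, chan: Channel) -> ActorRef {
        if let Some(r) = self.refs.read().unwrap().get(&(node, chan)) {
            return r.clone(); // fast path, no exclusive locking
        }
        let fresh = format!("actor@{node}/{chan}"); // placeholder for swarm/DHT lookup
        self.refs.write().unwrap().insert((node, chan), fresh.clone());
        fresh
    }

    /// Evict every cached ref for a node, e.g. on a PeerLost event.
    fn invalidate_peer(&self, node: NodeId) {
        self.refs.write().unwrap().retain(|(n, _), _| *n != node);
    }
}
```

The read-mostly RwLock layout means steady-state remote operations only ever take the shared lock; the write lock is needed only on the first lookup per peer or on invalidation.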

Operation Routing Workflows

Every client request follows the same top-level path: HTTP handler → RouterActor → ClusterCoordinator routing decision → execute. The routing decision determines whether the operation runs locally, is forwarded to a single remote node (unicast), or is fanned out to all nodes (broadcast).

Routing Decision Logic

                         ┌──────────────────────┐
                         │  ClusterCoordinator  │
                         │  RouteOperation msg  │
                         └─────────┬────────────┘
                                   │
                         routing_key present?
                           ┌───────┴─────────┐
                          YES                NO
                           │                 │
                    Hash ring lookup    RoutingDecision::
                           │              Broadcast
                    owner == local?
                     ┌─────┴─────┐
                    YES          NO
                     │            │
              RoutingDecision   RoutingDecision::Remote
                ::Local         { node_id, peer_addr }

  • Local: The owning shard lives on this node. Execute directly.
  • Remote: The owning shard lives on another node. Forward via cached RemoteActorRef.
  • Broadcast: No routing key (e.g. search). Fan out to local + all known peers, merge results.
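The three-way decision above can be sketched as a small Rust function. The names and signature are illustrative, not CameoDB's actual API; the hash-ring lookup is assumed to be available as a closure.

```rust
/// Simplified mirror of the three routing outcomes described above.
#[derive(Debug, PartialEq)]
enum RoutingDecision {
    Local,
    Remote { node_id: u64 },
    Broadcast,
}

/// No routing key -> Broadcast; otherwise the hash-ring owner decides Local vs Remote.
fn route(
    routing_key: Option<&str>,
    local_node: u64,
    owner_of: impl Fn(&str) -> u64, // stand-in for the consistent hash ring lookup
) -> RoutingDecision {
    match routing_key {
        None => RoutingDecision::Broadcast,
        Some(key) => {
            let owner = owner_of(key);
            if owner == local_node {
                RoutingDecision::Local
            } else {
                RoutingDecision::Remote { node_id: owner }
            }
        }
    }
}
```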

Read (Search) Workflow

Searches have no routing key, so they always broadcast to gather results from all nodes.

HTTP POST /api/{index}/search
  │
  ▼
RouterActor::route_and_handle(routing_key=None)
  │
  │ RoutingDecision::Broadcast
  │
  ├── LOCAL ──→ Worker Pool (or actor mailbox fallback)
  │              └── OrchestratorEngine::orch_search()
  │                    └── Fan out to all local MicroshardActors
  │                          └── spawn_blocking { store.search() }
  │
  └── REMOTE (per peer, up to fanout_limit) ──→ try_remote()
        │
        ▼
      RemotePeerPool::get_orchestrator(node_id)    ◄── cache hit: O(1)
        ├── RwLock read → HashMap lookup           ◄── cache miss: swarm lookup, then cached
        │
        ▼
      remote_ref.ask(&ClientOp::Search)
        │
        ▼
      Remote node executes same local search path
        │
        ▼
  ┌────────────────────────────────────────────┐
  │  Merge: bounded score-aware top-K merge,   │
  │  then truncate to the requested limit      │
  └────────────────────────────────────────────┘
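The final merge step can be sketched as a bounded top-K merge over per-shard hit lists. This is an illustrative Rust sketch, not CameoDB's implementation; integer scores and u64 doc ids are simplifying assumptions.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Score-aware top-K merge: keep at most `limit` hits across all shard result
/// lists, using a min-heap so each insertion costs O(log limit).
fn merge_top_k(shard_results: Vec<Vec<(u64, i64)>>, limit: usize) -> Vec<(u64, i64)> {
    // Heap of (score, doc_id); Reverse turns the max-heap into a min-heap,
    // so the root is always the weakest hit currently retained.
    let mut heap: BinaryHeap<Reverse<(i64, u64)>> = BinaryHeap::with_capacity(limit + 1);
    for hits in shard_results {
        for (doc, score) in hits {
            heap.push(Reverse((score, doc)));
            if heap.len() > limit {
                heap.pop(); // evict the current lowest-scoring hit
            }
        }
    }
    // Collect the survivors and order them best-first.
    let mut out: Vec<(u64, i64)> = heap.into_iter().map(|Reverse((s, d))| (d, s)).collect();
    out.sort_by(|a, b| b.1.cmp(&a.1));
    out
}
```

Because the heap never holds more than `limit` entries, memory stays bounded regardless of how many shards or peers contribute results.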

Bulk Write Workflow

Bulk writes are the most complex path: documents are routed individually, then grouped by owning node for batched forwarding.

HTTP POST /api/{index}/_bulk
  │
  ▼
RouterActor::route_and_handle(routing_hint=first_doc.id)
  │
  │ Routed to one node (usually local for the first doc)
  ▼
NodeOrchestrator::orch_bulk_write(index, docs[])
  │
  ├── 1. Schema Resolution
  │      └── Fingerprint cache → shard fallback
  │
  ├── 2. Staged Schema Validation
  │      └── Parallel Rayon validation + sequential evolution
  │
  ├── 3. Per-Document Routing (spawn_blocking + Rayon par_iter)
  │      └── For each doc: hash(routing_key) → ConsistentRing → target shard
  │
  ├── 4. Separate Local vs Remote
  │      ├── shard in self.shards → local_docs
  │      └── shard owned by other node → remote_docs (grouped by node_id)
  │
  ├── 5. Phase 3.1: Parallel Local Shard Processing
  │      └── Per-shard MicroshardActor::write_batch()
  │            └── writer_thread → redb WAL + Tantivy index
  │
  └── 6. Phase 3.2: Parallel Remote Forwarding (futures::join_all)
        │
        for each (node_id, docs_for_remote):
          │
          ▼
        NodeOrchestrator::forward_bulk_to_remote()
          │
          ▼
        RemotePeerPool::get_orchestrator(node_id)    ◄── cached lookup
          │
          ▼
        remote_ref.ask(&ClientOp::BulkWrite)
          │
          ▼
        Remote node runs orch_bulk_write() (recursive, same path)
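Step 4 above, separating local from remote documents, amounts to a partition keyed by owning node. A hedged sketch, with doc ids simplified to String and node ids to u64 (CameoDB's actual types differ):

```rust
use std::collections::HashMap;

/// Split routed docs into local work and per-node remote batches, so each
/// peer receives exactly one forwarded bulk request for its group.
fn partition_docs(
    routed: Vec<(String, u64)>, // (doc_id, owning node), already resolved via the ring
    local_node: u64,
) -> (Vec<String>, HashMap<u64, Vec<String>>) {
    let mut local_docs = Vec::new();
    let mut remote_docs: HashMap<u64, Vec<String>> = HashMap::new();
    for (doc, owner) in routed {
        if owner == local_node {
            local_docs.push(doc);
        } else {
            // Group by owning node so forwarding stays batched, not per-doc.
            remote_docs.entry(owner).or_default().push(doc);
        }
    }
    (local_docs, remote_docs)
}
```

Grouping before forwarding is what keeps the remote phase to one request per peer rather than one per document.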

HTTP API Reference

CameoDB provides a comprehensive REST API for document management, search, and system administration.

Search Operations

POST /api/{index}/search

Search documents within an index with relevance scoring. Returns a single JSON payload.

curl -s -X POST http://localhost:9480/api/books/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "science fiction space",
    "limit": 10
  }'

Return fields list: you can ask CameoDB to return only a subset of document fields by either:
  1. Supplying an explicit list in the payload: "fields": ["title", "author", "year"]
  2. Embedding a return clause at the end of the query: "query": "space opera return title,author"
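The embedded return clause can be peeled off a query with a simple string split. This sketch is illustrative only, not CameoDB's parser, and assumes the clause, when present, is introduced by the literal word "return" followed by a comma-separated field list at the end of the query.

```rust
/// Split a trailing `return f1,f2` clause off a query string.
/// Returns the remaining query text and the requested field list, if any.
fn split_return_clause(query: &str) -> (String, Option<Vec<String>>) {
    if let Some((head, fields)) = query.rsplit_once(" return ") {
        let fields: Vec<String> = fields
            .split(',')
            .map(|f| f.trim().to_string())
            .filter(|f| !f.is_empty())
            .collect();
        if !fields.is_empty() {
            return (head.trim().to_string(), Some(fields));
        }
    }
    // No clause found: the whole input is the query.
    (query.trim().to_string(), None)
}
```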

POST /api/{index}/search/stream

Get search results as a real-time stream (NDJSON) for large result sets.

curl -s -X POST http://localhost:9480/api/books/search/stream \
  -H "Content-Type: application/json" \
  -d '{"query": "fantasy adventure"}' \
  --no-buffer

Document Operations

PUT /api/{index}/document

Insert or update a single document.

curl -s -X PUT http://localhost:9480/api/books/document \
  -H "Content-Type: application/json" \
  -d '{
    "id": "book_001",
    "routing_key": "book_001",
    "doc": {
      "title": "The Rust Programming Language",
      "author": "Steve Klabnik",
      "publication_year": 2018,
      "genres": ["Programming", "Technical"]
    }
  }'

POST /api/{index}/_bulk

Insert or update multiple documents in a single atomic operation.

curl -s -X POST http://localhost:9480/api/books/_bulk \
  -H "Content-Type: application/json" \
  -d '[
    {
      "id": "book_002",
      "doc": {
        "title": "Clean Code",
        "author": "Robert C. Martin"
      }
    }
  ]'

POST /api/{index}/document/stream

Insert or update multiple documents using NDJSON streaming for large datasets.

cat << 'EOF' | curl -s -X POST http://localhost:9480/api/books/document/stream \
  -H "Content-Type: application/json" \
  --data-binary @-
{"id": "book_002", "doc": {"title": "Clean Code", "author": "Robert C. Martin", "genres": ["Programming"]}}
{"id": "book_003", "doc": {"title": "Design Patterns", "author": "Gang of Four", "genres": ["Programming", "Software Engineering"]}}
EOF

Index Management & System

GET /api/{index}/_config

Retrieve current schema.

curl -s http://localhost:9480/api/books/_config

DELETE /api/{index}

Permanently delete an index.

curl -s -X DELETE http://localhost:9480/api/books

GET /_indexes

List all available indexes.

curl -s http://localhost:9480/_indexes

GET /_cluster/health

Cluster health check.

curl -s http://localhost:9480/_cluster/health

Configuration Options

CameoDB configuration via cameodb.toml mirrors the runtime struct layout:

[node]
label = "cameo-node-01"
zone = "default"

[network.http]
bind_address = "0.0.0.0"
port = 9480
request_timeout_secs = 30
max_body_size_mb = 200
cors_allowed_origins = ["*"]

[network.cluster]
enabled = true
bind_address = "0.0.0.0"
port = 9580
cluster_name = "cameodb-cluster"
seed_nodes = []
# cluster_nodes = ["/ip4/10.0.1.5/tcp/9580"] # Optional validation list

[storage]
data_paths = ["./data/cameodb"]
disk_usage_threshold_percent = 90
wal_sync = true
wal_segment_size_mb = 64
default_batch_size = 1000
num_shards_init = 4
max_shards_per_node = 8

[search]
indexer_memory_min_mb = 32
indexer_memory_max_mb = 512
total_memory_limit_mb = 4096
memory_pressure_threshold_percent = 80
search_threads = 8
enable_streaming_search = true
max_concurrent_shard_searches = 32
max_concurrent_remote_searches = 8
enable_early_termination = true
supervisor_timeout_secs = 5
default_search_limit = 10

  • node provides human-friendly identity fields (label, zone).
  • network separates HTTP and cluster transport while clarifying bind_address.
  • storage centralizes shard configuration plus disk thresholds.
  • search exposes indexer memory budgets, streaming search settings, concurrency caps, supervisor timeout, and default_search_limit.

Docker Deployment

CameoDB provides configurations for both single-node and multi-node cluster deployments using Docker Compose.

1. Single-Node

Ideal for local development. Uses docker-compose.yml.

mkdir -p data/cameodb
docker-compose -f docker/docker-compose.yml up -d

Access: http://localhost:9480

2. Multi-Node Cluster

Runs a 3-node cluster with NGINX load balancer.

mkdir -p data/cameodb/node{1,2,3}
docker-compose -f docker/docker-compose-cluster.yml up -d

Load Balanced: http://localhost:9480

Direct: ports 9481, 9482, 9483

Docker Run vs Compose Equivalent

Docker Run Flag                         Docker Compose Equivalent
-p 9480:9480 -p 9580:9580               ports: ["9480:9480", "9580:9580"]
-v $(pwd)/data/cameodb:/data/cameodb    volumes: ["../data/cameodb:/data/cameodb"]
-e RUST_LOG=info                        environment: ["RUST_LOG=info"]
--restart unless-stopped                restart: unless-stopped
--user 65532:65532                      user: "65532:65532"

RPM / DEB Package Building

CameoDB supports building RPM and DEB packages for x86_64 Linux distributions using cargo-zigbuild or Docker for cross-compilation.

Automated Build Script (Recommended for CI/CD)

This script handles both RPM and DEB package generation in one run with persistent caching.

chmod +x build-dist.sh
./build-dist.sh

Manual RPM Generation (cargo-zigbuild)

cargo install cargo-zigbuild cargo-generate-rpm

RUSTFLAGS="-C target-feature=+crt-static -C relocation-model=pie -C relro-level=full -C link-arg=-pie -C link-arg=-static" \
cargo zigbuild --release --target x86_64-unknown-linux-musl --no-default-features

cargo generate-rpm -p crates/server --target x86_64-unknown-linux-musl --auto-req disabled \
  -o target/x86_64-unknown-linux-musl/release/cameodb-0.2.2-1.x86_64.rpm \
  --set-metadata 'package.name="cameodb"'