You've Got CameoDB Running. Now What?
If you followed the Quickstart, you already have CameoDB running and a books index loaded. That's your proof-of-concept. Now let's go from a handful of records to one million.
The dataset: SocialGrep/one-million-reddit-jokes on Hugging Face. A single CSV file with joke titles, body text, scores, subreddits, and timestamps. Perfect for demonstrating CameoDB's ability to detect schemas and ingest data directly from a URL.
Step 1: Detect the Schema
Point CameoDB's CLI at the raw CSV URL. The schema detect command reads the header row and samples data to infer field types automatically:
schema detect https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes/resolve/main/one-million-reddit-jokes.csv
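The post doesn't spell out CameoDB's inference rules. This is a minimal Python sketch of the general header-plus-sample technique instead. The function names, the 100-row sample size, and the rule order (Boolean, then I64, then a Text fallback) are assumptions for illustration, not CameoDB's actual implementation.

```python
import csv
import io
from itertools import islice
from typing import Dict, List

def _is_int(value: str) -> bool:
    try:
        int(value)
        return True
    except ValueError:
        return False

def infer_type(values: List[str]) -> str:
    """Pick a field type that fits every non-empty sample value (hypothetical rules)."""
    non_empty = [v for v in values if v != ""]
    if non_empty and all(v.lower() in ("true", "false") for v in non_empty):
        return "Boolean"
    if non_empty and all(_is_int(v) for v in non_empty):
        return "I64"
    return "Text"  # fallback: anything else is treated as searchable text

def detect_schema(csv_text: str, sample_rows: int = 100) -> Dict[str, str]:
    """Read the header row, sample up to `sample_rows` rows, infer one type per column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = list(islice(reader, sample_rows))
    columns = list(zip(*samples)) if samples else [()] * len(header)
    return {name: infer_type(list(col)) for name, col in zip(header, columns)}

# Tiny hypothetical CSV; `is_nsfw` is an illustrative column, not part of the jokes schema.
print(detect_schema("id,title,score,is_nsfw\nabc,Football,12,false\ndef,Fart Football,-3,true\n"))
# {'id': 'Text', 'title': 'Text', 'score': 'I64', 'is_nsfw': 'Boolean'}
```

A real detector also has to choose the indexed, stored, and fast flags and the tokenizers that appear in the output below. This sketch stops at the field type.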
CameoDB returns a full schema definition with 10 detected fields:
{
"routing_field_name": "id",
"fields": {
"id": { "field_type": "Text", "indexed": true, "stored": true, "tokenizer": "raw" },
"title": { "field_type": "Text", "indexed": true, "stored": false },
"selftext": { "field_type": "Text", "indexed": true, "stored": false },
"score": { "field_type": "I64", "indexed": true, "fast": true },
"created_utc": { "field_type": "I64", "indexed": true, "fast": true },
"subreddit": { "field_type": "Boolean", "indexed": true },
"type": { "field_type": "Text", "indexed": true },
"permalink": { "field_type": "Text", "indexed": true },
"domain": { "field_type": "Text", "indexed": true },
"url": { "field_type": "Text", "indexed": true }
}
}
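Notice: score and created_utc are detected as I64 with fast fields enabled, meaning they support range queries and sorting. Text fields are indexed for full-text search, while id uses the raw tokenizer, so each ID is indexed as a single exact token rather than split into words.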
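One way to picture a fast field is as a dense column holding one I64 per document, read by document ID without loading the stored document. This Python sketch illustrates that general columnar idea only. It is not CameoDB's storage format, and the function names and sample scores are hypothetical.

```python
from typing import List

def range_filter(column: List[int], lo: int, hi: int) -> List[int]:
    """Return doc ids whose column value lies in the inclusive range [lo, hi]."""
    return [doc_id for doc_id, value in enumerate(column) if lo <= value <= hi]

def sort_by_column(column: List[int], doc_ids: List[int], k: int, descending: bool = True) -> List[int]:
    """Order candidate doc ids by their column value and keep the first k."""
    return sorted(doc_ids, key=lambda d: column[d], reverse=descending)[:k]

# Hypothetical `score` column: list index = doc id, value = I64 score.
scores = [5, 120, -3, 48, 300]
matches = range_filter(scores, 0, 150)     # docs with 0 <= score <= 150
print(matches)                             # [0, 1, 3]
print(sort_by_column(scores, matches, 2))  # [1, 3]
```

A real engine would usually narrow candidates with its term index before reading the column. The sketch only shows why keeping numeric values in a per-document column lets you filter and sort without touching stored document bodies.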
Step 2: Load One Million Records
Next comes the single command that does everything: it creates the index, applies the schema, downloads the CSV, and streams all rows in batches:
data load jokes https://huggingface.co/datasets/SocialGrep/one-million-reddit-jokes/resolve/main/one-million-reddit-jokes.csv
Schema was missing; detected and applied schema to index 'jokes'
Ingestion complete for index 'jokes': loaded=1000000 failed=0 (batch size 4000)
That's it. One million records, zero failures. CameoDB auto-detected the schema on first contact and streamed the data in batches of 4,000. The jokes index didn't exist before this command—it was created on the fly.
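Streaming in batches means a loader never has to hold all one million rows in memory. At 4,000 rows per batch, 1,000,000 rows come to 250 batches. Here is a minimal Python sketch of that pattern using the standard csv module. `iter_batches` is a hypothetical helper, and how CameoDB actually sends each batch to the index is not shown.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def iter_batches(rows: Iterable[T], batch_size: int = 4000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# Small demo: a 3-row CSV split into batches of 2.
reader = csv.DictReader(io.StringIO("id,title\n1,a\n2,b\n3,c\n"))
print([len(b) for b in iter_batches(reader, batch_size=2)])  # [2, 1]

# Batch count for the jokes dataset at the batch size from the load log.
print(sum(1 for _ in iter_batches(range(1_000_000), 4000)))  # 250
```

Because `csv.DictReader` is itself lazy, wrapping it in `iter_batches` keeps only the current batch in memory.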
Step 3: Search Instantly
The index is immediately queryable. Let's find football jokes:
search jokes title:football limit 5 return title, selftext
{
"hits": [
{ "_score": 10.50, "title": "Football", "selftext": "[removed]" },
{ "_score": 10.41, "title": "Football", "selftext": "As a woman passed her daughter's closed bedroom door..." },
{ "_score": 9.91, "title": "Fart Football", "selftext": "An old married couple no sooner hit the pillows..." },
// ... 2 more results
],
"hits_returned": 5,
"total_hits": 1330,
"took_ms": 11,
"stats": { "shards": { "total": 8, "responded": 8, "failed": 0 } }
}
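If you capture this output in a script, the stats.shards block tells you whether the result set is complete. If a shard failed or didn't respond, total_hits may undercount. This Python sketch assumes the CLI emits plain JSON with the keys shown above. The displayed output is abridged, and its `// ...` line is not valid JSON. `summarize` is a hypothetical helper.

```python
import json

def summarize(response_text: str) -> dict:
    """Pull the useful fields out of a search response and flag partial results."""
    resp = json.loads(response_text)
    shards = resp["stats"]["shards"]
    complete = shards["failed"] == 0 and shards["responded"] == shards["total"]
    return {
        "titles": [hit.get("title") for hit in resp["hits"]],
        "hits_returned": resp["hits_returned"],
        "total_hits": resp["total_hits"],
        "took_ms": resp["took_ms"],
        "complete": complete,
    }

# Abridged copy of the response above (selftext and the last two hits omitted).
sample = """{
  "hits": [
    {"_score": 10.50, "title": "Football"},
    {"_score": 10.41, "title": "Football"},
    {"_score": 9.91, "title": "Fart Football"}
  ],
  "hits_returned": 5,
  "total_hits": 1330,
  "took_ms": 11,
  "stats": {"shards": {"total": 8, "responded": 8, "failed": 0}}
}"""

summary = summarize(sample)
print(summary["titles"], summary["total_hits"], summary["complete"])
# ['Football', 'Football', 'Fart Football'] 1330 True
```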
That's 1,330 jokes with "football" in the title, found across 8 shards in 11 milliseconds. The data was distributed automatically: no configuration, no manual sharding, no external tooling.
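The post doesn't describe how CameoDB places documents or combines shard results, so this is a sketch of one common design, offered as an assumption. Each document is routed by a stable hash of its routing field (the detected schema's routing_field_name is id). A query is answered by scatter-gather: every shard returns its own top hits and hit count, and the coordinator keeps the global top `limit` hits and sums the counts. The CRC32 hash and both function names are hypothetical.

```python
import heapq
import zlib
from typing import Dict, List, Tuple

NUM_SHARDS = 8  # shard count reported in the search stats above

def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Route a document to a shard by a stable hash of its id.

    Python's built-in hash() is randomized per process for str, so a
    deterministic hash such as CRC32 is used instead.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def merge_shard_results(
    shard_results: List[Tuple[int, List[Dict]]], limit: int
) -> Tuple[int, List[Dict]]:
    """Gather step: sum per-shard hit counts and keep the best `limit` hits overall.

    Each shard only needs to send its own top `limit` hits, because the global
    top `limit` is always contained in the union of the per-shard top lists.
    """
    total_hits = sum(count for count, _ in shard_results)
    candidates = (hit for _, hits in shard_results for hit in hits)
    top = heapq.nlargest(limit, candidates, key=lambda h: h["_score"])
    return total_hits, top

# Two hypothetical shards answering the same query with limit=2.
results = [
    (3, [{"_score": 10.5, "id": "a"}, {"_score": 2.0, "id": "b"}]),
    (2, [{"_score": 9.9, "id": "c"}, {"_score": 1.2, "id": "d"}]),
]
total, top = merge_shard_results(results, limit=2)
print(total, [h["id"] for h in top])  # 5 ['a', 'c']
```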
The Takeaway
Three commands. That's the entire workflow from discovering a dataset on Hugging Face to running full-text search queries against a million records. CameoDB handles schema inference, index creation, batch ingestion, and distributed search—all from the CLI.