
Slow Indexing

  • canopy index takes more than 2-3 minutes for a typical codebase.
  • Progress appears to stall with no output for long periods.
  • canopy index --with-embeddings takes considerably longer than indexing alone.

For reference, the Pith monorepo (622 TypeScript/Rust source files) indexes in approximately 2.6 seconds for an incremental run. A full re-index from scratch takes under 10 seconds on modern hardware.


Cause | Indicator
Large repository with many files | canopy status shows > 50,000 files
Binary and generated files being indexed | Unusual file types in index
No .gitignore in the repository | No automatic exclusions
--full flag used unnecessarily | Wipes and rebuilds entire index
Embedding generation active | --with-embeddings flag, or slow Ollama
First-time embedding model download | Downloading nomic-embed-text-v1.5 (~270 MB)

Step 1: Check how many files are being indexed

canopy status --repo .

Expected output includes a file count. If the count is unexpectedly high (tens of thousands), proceed to Step 2.
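If the count is high, a per-directory breakdown shows where the files come from. The sketch below is a hypothetical helper, count_by_top_dir (not part of Canopy), that reads newline-separated paths on stdin and prints a count for each top-level directory, largest first. It assumes you feed it Git's view of the repository; Canopy's own file discovery may include or exclude different files, so treat the numbers as an approximation.

```sh
# Hypothetical helper: summarize a newline-separated list of file paths
# by top-level directory, largest count first. Files at the repo root
# are grouped under ".".
count_by_top_dir() {
  awk -F/ '{ top = (NF > 1) ? ($1 "/") : "."; n[top]++ }
           END { for (d in n) printf "%d %s\n", n[d], d }' |
    LC_ALL=C sort -k1,1nr -k2,2   # C locale keeps tie ordering stable
}

# Example: the ten largest top-level directories among files Git does not ignore.
#   git ls-files --cached --others --exclude-standard | count_by_top_dir | head -n 10
```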

Step 2: Add ignored paths to your per-repo config

Create or edit <repo>/.canopy/config.toml:

ignored_paths = [
"dist/",
"build/",
".next/",
"coverage/",
"node_modules/",
"vendor/",
"generated/",
"__pycache__/",
"target/",
".cache/",
"*.min.js",
"*.bundle.js",
]

Canopy respects .gitignore automatically, so you only need ignored_paths for directories and files that exist in your repo but are not listed in .gitignore.
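To preview what a set of patterns would exclude before paying for a full re-index, you can approximate the matching in the shell. The helper below, would_ignore, is hypothetical, and its matching rules are an assumption rather than Canopy's documented behavior: a pattern ending in / matches that directory at any depth, and any other pattern is a glob matched against the file name only.

```sh
# Hypothetical helper. Assumed matching rules (not taken from Canopy):
#   "dist/"     -> matches any path with a "dist" directory segment
#   "*.min.js"  -> glob matched against the file name only
# Usage: would_ignore PATH PATTERN...   (succeeds if any pattern matches)
would_ignore() {
  wi_path=$1; shift
  wi_base=${wi_path##*/}
  for wi_pat in "$@"; do
    case $wi_pat in
      */)
        # Prefix "/" so a top-level directory also counts as a full segment.
        case "/$wi_path" in
          */"$wi_pat"*) return 0 ;;
        esac
        ;;
      *)
        # $wi_pat is left unquoted so its wildcards act as a glob.
        case $wi_base in
          $wi_pat) return 0 ;;
        esac
        ;;
    esac
  done
  return 1
}

# Example: count non-gitignored files these patterns would additionally exclude.
#   git ls-files --cached --others --exclude-standard |
#     while IFS= read -r f; do
#       would_ignore "$f" "dist/" "build/" "*.min.js" && echo "$f"
#     done | wc -l
```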

After adding ignored paths, run a full re-index to apply them:

canopy index . --full --with-search

Step 3: Use incremental indexing for subsequent runs

After the initial full index, subsequent runs should be incremental (the default):

canopy index .

Only files modified since the last index are re-processed. Reserve --full for when you suspect the index is stale or corrupt.
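Conceptually, an incremental run only has to look at files that changed since the previous run. The sketch below illustrates that idea with file modification times; Canopy may detect changes differently (for example with content hashes), so it only approximates which files an incremental run will re-process. The helper name changed_since and the marker file are hypothetical.

```sh
# Hypothetical helper: list regular files under DIR (default ".") whose
# modification time is newer than MARKER, skipping the .git directory.
# This only illustrates mtime-based change detection.
changed_since() {
  find "${2:-.}" -type f -newer "$1" ! -path '*/.git/*' -print
}

# Example:
#   touch /tmp/canopy-last-run            # right after an index run
#   ...edit some files...
#   changed_since /tmp/canopy-last-run . | wc -l
```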

Slow embedding generation

If --with-embeddings is slow, check that your Ollama server is running and reachable:

curl http://localhost:11434/api/tags

If Ollama is not running, start it:

ollama serve
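The reachability check can be scripted, together with a check that an embedding model is present. The sketch below assumes Ollama's default address (http://localhost:11434) and the JSON shape returned by its /api/tags endpoint: a models array whose entries carry a name field. The helpers ollama_up and has_model are hypothetical, and the model name nomic-embed-text in the example is an assumption based on the model named in the causes table; check which model your Canopy setup actually requests.

```sh
# Hypothetical helper: succeeds if an Ollama server answers at BASE_URL
# (default: the local Ollama port). Gives up after 5 seconds.
ollama_up() {
  curl -sf --max-time 5 "${1:-http://localhost:11434}/api/tags" >/dev/null
}

# Hypothetical helper: reads an /api/tags JSON response on stdin and
# succeeds if any model's "name" starts with the given prefix.
# Whitespace is stripped first so compact and pretty-printed JSON both match.
has_model() {
  tr -d ' \t\n' | grep -qF "\"name\":\"$1"
}

# Example:
#   if ollama_up; then
#     curl -s http://localhost:11434/api/tags | has_model nomic-embed-text ||
#       echo "embedding model not found in Ollama"
#   else
#     echo "Ollama is not reachable; start it with: ollama serve"
#   fi
```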

Embedding generation is optional. The AST index and keyword search work without embeddings. Only semantic (vector) search requires them. Consider indexing without embeddings first:

canopy index . --with-search

Then add embeddings as a separate step:

canopy index . --with-embeddings
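Splitting the work also makes it easy to see which phase is slow. The sketch below is a hypothetical timed helper (not part of Canopy) that runs any command and reports its wall-clock duration in whole seconds on stderr, while passing the command's exit status through unchanged.

```sh
# Hypothetical helper: run a command, then print its wall-clock duration
# (whole seconds) to stderr. Returns the command's own exit status.
timed() {
  t0=$(date +%s)
  rc=0
  "$@" || rc=$?          # capture failure without aborting under `set -e`
  echo "[$(( $(date +%s) - t0 ))s] $*" >&2
  return "$rc"
}

# Example:
#   timed canopy index . --with-search
#   timed canopy index . --with-embeddings
```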

If the first canopy index --with-embeddings run is slow because it is downloading the embedding model, run this once to pre-download and verify the model:

canopy warmup-model

Subsequent runs skip the download.

Git history ingestion

--with-git ingests commit history (up to 1000 commits by default), which adds time proportional to the number of commits ingested. Run it once as part of the initial setup, not on every subsequent run:

# First setup
canopy index . --with-search --with-git
# Daily incremental updates (no need to re-ingest git history)
canopy index .
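If you script your daily updates, a small wrapper can pass --with-git only on the first run. The sketch below is hypothetical: canopy_index_args and the marker file .canopy/.git-history-ingested are illustrations that Canopy itself neither creates nor reads.

```sh
# Hypothetical helper: print the canopy index arguments to use, ingesting
# git history only when the (hypothetical) marker file does not exist yet.
canopy_index_args() {
  marker=${1:-.canopy/.git-history-ingested}
  if [ -e "$marker" ]; then
    echo "."                            # daily incremental update
  else
    echo ". --with-search --with-git"   # first setup
  fi
}

# Example (the unquoted $(...) is intentional so the arguments split):
#   canopy index $(canopy_index_args) && touch .canopy/.git-history-ingested
# To see how many commits are reachable from HEAD, for comparison with the
# 1000-commit default:
#   git rev-list --count HEAD
```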

When to contact support

Contact support at [email protected] if:

  • Indexing a codebase of fewer than 10,000 source files takes more than 5 minutes without embedding generation.
  • canopy status shows significantly fewer files than expected (the index may be silently skipping files).

Include the output of:

FORGE_LOG=debug canopy index . 2>forge-debug.log
canopy status --repo .

Attach forge-debug.log to your support request.
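If you need to repeat the collection, the two commands above can be wrapped in a function. The sketch below is a hypothetical helper, collect_canopy_diagnostics; the output directory name and the extra status.txt file are arbitrary choices, not a format that support requires.

```sh
# Hypothetical helper: capture the debug log and status output that support
# asks for into one directory (default name is an arbitrary choice).
collect_canopy_diagnostics() {
  out=${1:-canopy-diagnostics}
  mkdir -p "$out" || return 1
  # Same command as above: debug logging goes to stderr, saved to a file.
  FORGE_LOG=debug canopy index . 2>"$out/forge-debug.log" ||
    echo "canopy index exited with status $?" >&2
  # Record the index state after the run.
  canopy status --repo . >"$out/status.txt" 2>&1
  echo "Attach the files in $out/ to your support request." >&2
}

# Example:
#   collect_canopy_diagnostics
```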