Slow Indexing
Symptoms
- canopy index takes more than 2-3 minutes for a typical codebase.
- Progress appears to stall with no output for long periods.
- canopy index --with-embeddings takes considerably longer than indexing alone.
For reference, the Pith monorepo (622 TypeScript/Rust source files) indexes in approximately 2.6 seconds for an incremental run. A full re-index from scratch takes under 10 seconds on modern hardware.
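To compare a run against the reference timings above, it can help to measure wall-clock time in a repeatable way. The following is a minimal sketch; elapsed_seconds is a hypothetical helper written for this page, not a Canopy command, and it assumes a date implementation that supports +%s.

```shell
# Hypothetical helper (not part of Canopy): run a command and print the
# number of whole seconds it took. The command's own stdout is sent to
# stderr so that $(...) captures only the elapsed time.
elapsed_seconds() {
  start=$(date +%s)
  "$@" >&2
  status=$?
  end=$(date +%s)
  echo $((end - start))
  return "$status"
}

# Example: seconds=$(elapsed_seconds canopy index .)
```

Whole-second resolution is coarse for runs that finish in a few seconds, but it is enough to tell whether a run is near the 2-3 minute range described under Symptoms.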
Likely Causes
| Cause | Indicator |
|---|---|
| Large repository with many files | canopy status shows > 50,000 files |
| Binary and generated files being indexed | Unusual file types in index |
| No .gitignore in the repository | No automatic exclusions |
| --full flag used unnecessarily | Wipes and rebuilds entire index |
| Embedding generation active | --with-embeddings flag, or slow Ollama |
| First-time embedding model download | Downloading nomic-embed-text-v1.5 (~270 MB) |
Step-by-Step Fix
Step 1: Check how many files are being indexed

    canopy status --repo .

The output includes a file count. If the count is unexpectedly high (tens of thousands), proceed to Step 2.
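If the count is high, a breakdown by file extension can show whether binary or generated files account for it. The sketch below uses only standard tools. count_by_extension is a hypothetical helper, not a Canopy command, and the list of git-tracked files is only an approximation of what Canopy indexes (an assumption based on Canopy respecting .gitignore).

```shell
# Hypothetical helper (not part of Canopy): read file paths on stdin and
# print "<count> <extension>" lines, most common extension first.
count_by_extension() {
  awk -F/ '
    NF == 0 { next }                       # skip empty lines
    {
      n = split($NF, parts, ".")           # $NF is the basename
      # Dotfiles (".gitignore") and names without a dot count as "(none)".
      if (n > 1 && parts[n-1] != "" && parts[n] != "") ext = parts[n]
      else ext = "(none)"
      counts[ext]++
    }
    END { for (e in counts) print counts[e], e }
  ' | sort -k1,1nr
}

# Example: git ls-files | count_by_extension | head -n 20
```

Extensions such as map, lock, png, or snap near the top of the list are likely candidates for exclusion.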
Step 2: Add ignored paths to your per-repo config
Create or edit <repo>/.canopy/config.toml:

    ignored_paths = [
      "dist/",
      "build/",
      ".next/",
      "coverage/",
      "node_modules/",
      "vendor/",
      "generated/",
      "__pycache__/",
      "target/",
      ".cache/",
      "*.min.js",
      "*.bundle.js",
    ]

Canopy already respects .gitignore automatically. Use ignored_paths for directories that exist in your repo but are not in .gitignore.
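To decide which directories belong in ignored_paths, one option is to count the files under each top-level directory of the repository. This is a rough sketch using find; files_per_top_dir is a hypothetical helper, not a Canopy command. It counts everything on disk, so directories already covered by .gitignore appear in the output even though Canopy skips them.

```shell
# Hypothetical helper (not part of Canopy): print "<file count> <dir>/"
# for each top-level directory under $1 (default: current directory),
# largest first. Hidden directories such as .next/ are included; .git/
# is skipped.
files_per_top_dir() {
  root=${1:-.}
  for dir in "$root"/*/ "$root"/.[!.]*/; do
    [ -d "$dir" ] || continue              # unmatched glob stays literal
    name=$(basename "$dir")
    if [ "$name" = ".git" ]; then continue; fi
    count=$(find "$dir" -type f | wc -l | tr -d ' ')
    printf '%s %s/\n' "$count" "$name"
  done | sort -k1,1nr
}

# Example: files_per_top_dir . | head -n 10
```

A directory with a large count that holds build output or vendored code is a reasonable entry for ignored_paths.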
After adding ignored paths, run a full re-index to apply them:

    canopy index . --full --with-search

Step 3: Use incremental indexing for subsequent runs
After the initial full index, subsequent runs should be incremental (the default):

    canopy index .

Only files modified since the last index are re-processed. Reserve --full for when you suspect the index is stale or corrupt.
Step 4: Address slow embedding generation
If --with-embeddings is slow, check your Ollama server:

    curl http://localhost:11434/api/tags

If Ollama is not running, start it:

    ollama serve

Embedding generation is optional. The AST index and keyword search work without embeddings. Only semantic (vector) search requires them. Consider indexing without embeddings first:

    canopy index . --with-search

Then add embeddings as a separate step:

    canopy index . --with-embeddings

Step 5: Pre-download the embedding model
If the first canopy index --with-embeddings run is slow because of the model download, run this once to pre-download and verify the model:

    canopy warmup-model

Subsequent runs skip the download.
Step 6: Use --with-git only when needed
--with-git ingests the commit history (up to 1000 commits by default). This adds time proportional to the depth of your commit history. Run it once after the initial index, not on every subsequent run:

    # First setup
    canopy index . --with-search --with-git

    # Daily incremental updates (no need to re-ingest git history)
    canopy index .

When to Escalate
Contact support at [email protected] if:
- Indexing a codebase of fewer than 10,000 source files takes more than 5 minutes with no embedding generation.
- canopy status shows significantly fewer files than expected (the index may be silently skipping files).
Include the output of:

    FORGE_LOG=debug canopy index . 2>forge-debug.log
    canopy status --repo .

Attach forge-debug.log to your support request.