Index Command

Bulk index GitHub issues and pull requests to the vector database for semantic search.

Syntax

simili index [OPTIONS]

Options

| Option | Short | Type | Description | Default |
|---|---|---|---|---|
| --repo | -r | string | Repository, as owner/name | Required |
| --config | -c | file | Configuration file | .github/simili.yaml |
| --since | -s | string | Index issues since an ISO date or issue number | - |
| --limit | | number | Maximum number of issues to index | unlimited |
| --workers | -w | number | Number of parallel workers | 5 |
| --include-prs | | bool | Include pull requests | true |
| --token | | string | GitHub token (falls back to GITHUB_TOKEN) | - |
| --dry-run | | bool | Simulate without writing | false |
| --help | -h | bool | Show help message | - |

Examples

Index the last 30 days

simili index --repo owner/repo --since "$(date -d '30 days ago' +%F)"

(date -d is GNU syntax; on macOS, use date -v-30d +%F instead.)

Index since a specific date

simili index --repo owner/repo --since 2024-01-01

Limit results

Index at most 200 issues:
simili index --repo owner/repo --limit 200

Parallel processing

Use more workers for faster indexing:
simili index --repo owner/repo --workers 10
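Conceptually, --workers bounds how many issues are processed at the same time and --limit caps how many are taken in total. The sketch below shows that pattern with a Python thread pool; it is not simili's implementation, and index_issue is a hypothetical stand-in for the per-issue fetch, embed, and upsert work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, List, Optional, TypeVar

R = TypeVar("R")


def index_all(
    issue_numbers: Iterable[int],
    index_issue: Callable[[int], R],
    workers: int = 5,
    limit: Optional[int] = None,
) -> List[R]:
    """Run the hypothetical index_issue callback over at most `limit` issues.

    Results come back in input order.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # islice(..., None) takes every item, matching an "unlimited" default.
    selected = islice(issue_numbers, limit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # At most `workers` callbacks run concurrently.
        return list(pool.map(index_issue, selected))


# Example: index at most 200 of 523 issues with 10 workers (stub callback).
results = index_all(range(1, 524), lambda n: f"indexed #{n}", workers=10, limit=200)
print(len(results))  # 200
```

Threads suit this kind of workload because most of the time is spent waiting on GitHub, the embedding provider, and Qdrant rather than on local computation.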

Index all issues since 2020, excluding pull requests

simili index --repo owner/repo --since 2020-01-01 --include-prs=false

Index with PR collection

When qdrant.pr_collection is set in the configuration file, PRs are routed to that dedicated collection:
simili index --repo owner/repo --config .github/simili.yaml
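The routing rule can be expressed as a small function over the parsed configuration. This is a sketch: only qdrant.pr_collection is named in these docs, so the "collection" key for the main collection is a hypothetical name used for illustration.

```python
from typing import Any, Dict


def target_collection(config: Dict[str, Any], is_pr: bool) -> str:
    """Pick the Qdrant collection for one issue or PR.

    Assumes a parsed config dict; "collection" is a hypothetical key for the
    main collection, while "pr_collection" comes from the docs.
    """
    qdrant = config.get("qdrant", {})
    main = qdrant["collection"]
    pr_collection = qdrant.get("pr_collection")
    # PRs go to the dedicated collection only when one is configured.
    if is_pr and pr_collection:
        return pr_collection
    return main
```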

Output

Fetching issues from owner/repo...
  Found 523 issues

Creating Qdrant collection...
  ✓ Collection created

Indexing with 5 workers...
  [====>----] 45% (235/523) - Speed: 12 issues/sec

Date format

The --since option accepts an ISO 8601 date in either form below, or an issue number:

  • 2024-01-01
  • 2024-01-01T12:00:00Z
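A hedged sketch of how a --since value could be told apart from an issue number and normalized. Treating a bare date as midnight UTC is an assumption; these docs do not say which time zone simili uses. Invalid input raises ValueError.

```python
from datetime import datetime, timezone
from typing import Tuple, Union


def parse_since(value: str) -> Union[Tuple[str, int], Tuple[str, datetime]]:
    """Classify a --since value as an issue number or an ISO 8601 date."""
    if value.isdigit():
        return ("number", int(value))
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: a value with no offset (e.g. 2024-01-01) means UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return ("date", parsed)
```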

Performance

Typical speed:
  • With 1 worker: ~2-3 issues/second
  • With 5 workers: ~10-15 issues/second
  • With 10 workers: ~20-25 issues/second
Actual speed depends on:
  • Issue complexity and comment count
  • API response times
  • Network latency
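To turn these rates into a rough wall-clock estimate, divide the issue count by the rate. For the 523 issues in the sample output, 5 workers at 10-15 issues/second works out to roughly 35 to 52 seconds:

```python
from typing import Tuple


def estimate_seconds(issue_count: int, low_rate: float, high_rate: float) -> Tuple[float, float]:
    """Return (fastest, slowest) estimates in seconds for a rate range in issues/second."""
    if low_rate <= 0 or high_rate <= 0:
        raise ValueError("rates must be positive")
    return issue_count / high_rate, issue_count / low_rate


fast, slow = estimate_seconds(523, low_rate=10, high_rate=15)
print(f"{fast:.0f}-{slow:.0f} s")  # 35-52 s
```

The same 523 issues with a single worker (2-3 issues/second) would take about 174 to 262 seconds, i.e. roughly 3 to 4.5 minutes.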

Process

For each issue:
  1. Fetch issue details from GitHub
  2. Fetch all comments
  3. Combine title + body + comments
  4. Split into chunks (recursive character splitter)
  5. Generate embeddings for each chunk
  6. Upsert to Qdrant with metadata
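Steps 3 and 4 for issues can be sketched as follows. The "\n\n" joiner, the separator order, and chunk_size=1000 are illustrative assumptions, not simili's documented settings. The splitter follows the usual recursive-character approach: try the coarsest separator (paragraphs) first and fall back to finer ones (lines, words, then single characters) only for pieces that are still too long.

```python
from typing import List, Sequence

# Assumed order: paragraphs, lines, words, characters. Must end with "".
SEPARATORS = ("\n\n", "\n", " ", "")


def build_issue_text(title: str, body: str, comments: Sequence[str]) -> str:
    """Combine title, body and comments into one document (format assumed)."""
    parts = [title, body, *comments]
    return "\n\n".join(p for p in parts if p)


def recursive_split(text: str, chunk_size: int = 1000,
                    separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Use the coarsest separator that actually occurs in the text.
    idx = next(i for i, s in enumerate(separators) if s == "" or s in text)
    sep = separators[idx]
    if sep == "":
        # No separator left: fall back to a hard character split.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into this chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too big: recurse with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, separators[idx + 1:]))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk would then be embedded and upserted (steps 5 and 6).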
For PRs (when --include-prs is enabled):
  1. Fetch PR details and changed file paths
  2. Embed: Title: ...\n\nBody: ...\n\nChanged Files:\n- path/a
  3. Route to pr_collection if configured, otherwise to main collection
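The PR embedding text from step 2 can be written as a small function. This sketch extrapolates one "- path" line per changed file from the single path/a example; how simili handles empty bodies or very long file lists is not documented, so none of that is handled here.

```python
from typing import Sequence


def build_pr_text(title: str, body: str, changed_files: Sequence[str]) -> str:
    """Build the text embedded for one PR: title, body, then changed file paths."""
    files = "\n".join(f"- {path}" for path in changed_files)
    return f"Title: {title}\n\nBody: {body}\n\nChanged Files:\n{files}"
```

Including file paths lets two PRs that touch the same files land close together in vector space even when their titles are worded differently.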

Configuration

Requires a valid configuration with:
  • Qdrant connection details
  • Gemini or OpenAI API key
  • Target repository
See Configuration Overview.

Tips

  1. Start with a small batch
    simili index --repo owner/repo --since 2026-01-01 --limit 50
    
  2. Use multiple workers for large repos
    simili index --repo owner/repo --workers 10
    
  3. Run during off-hours to avoid API rate limits
  4. Set a PR collection to separate PR and issue search results

Troubleshooting

Rate limited

Error: 429 Too Many Requests
Reduce the number of workers:
simili index --repo owner/repo --workers 2
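If you script repeated runs or call the same APIs from your own tooling, the usual complement to fewer workers is retrying 429 responses with exponential backoff. simili's own retry behavior is not documented here; this is a generic pattern, and RateLimitError and the wrapped callable are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error a client raises on HTTP 429."""


def with_backoff(call: Callable[[], T], max_retries: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimitError, doubling the delay after each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the 429
            # Exponential backoff with jitter: base * 2^attempt, scaled by 0.5-1.5.
            delay = base_delay * (2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.5))
    raise AssertionError("unreachable")
```

The jitter keeps parallel workers from retrying in lockstep after they all hit the limit at once.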

Out of memory

Reduce --workers, or use --limit to index fewer issues per run.

Collection already exists

Re-indexing is safe: the upsert updates existing vectors in place.
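Upserts only overwrite cleanly if each chunk maps to the same point ID on every run. Qdrant point IDs are unsigned integers or UUIDs, and a name-based UUID (uuid5) is one common way to make them stable. The key layout below (repository, issue number, chunk index) and the namespace are assumptions for illustration, not simili's documented scheme.

```python
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://github.com")


def point_id(repo: str, number: int, chunk_index: int) -> str:
    """Stable UUID for one chunk, so re-indexing overwrites instead of duplicating."""
    return str(uuid.uuid5(NAMESPACE, f"{repo}#{number}/{chunk_index}"))
```

One consequence of this scheme: if an edited issue now yields fewer chunks, the leftover higher-index points are not removed by an upsert alone.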

Next steps

  • Process command: process individual issues
  • PR duplicate command: detect duplicate pull requests