Index Command

Bulk-index GitHub issues and pull requests into the vector database for semantic search.

Syntax

simili index [OPTIONS]

Options

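| Option        | Short | Type   | Description                                   | Default             |
|---------------|-------|--------|-----------------------------------------------|---------------------|
| --repo        | -r    | string | Repository: owner/name                        | Required            |
| --config      | -c    | file   | Configuration file                            | .github/simili.yaml |
| --since       | -s    | string | Index issues since (ISO date or issue number) | -                   |
| --limit       |       | number | Maximum issues to index                       | unlimited           |
| --workers     | -w    | number | Parallel workers                              | 5                   |
| --include-prs |       | bool   | Include pull requests                         | true                |
| --token       |       | string | GitHub token (falls back to GITHUB_TOKEN)     | -                   |
| --dry-run     |       | bool   | Simulate without writing                      | false               |
| --help        | -h    | bool   | Show help message                             | -                   |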
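
The --token fallback can be pictured as a simple precedence rule. The sketch below is illustrative only: resolve_token is a hypothetical helper, not part of simili, and it assumes that an explicit --token value wins over the GITHUB_TOKEN environment variable.

```python
import os
from typing import Mapping, Optional


def resolve_token(cli_token: Optional[str],
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer the --token value; otherwise fall back to GITHUB_TOKEN."""
    # Hypothetical helper: the precedence order is an assumption.
    return cli_token or env.get("GITHUB_TOKEN")


# Explicit flag wins; with no flag, the environment value is used.
print(resolve_token("from-flag", {"GITHUB_TOKEN": "from-env"}))
print(resolve_token(None, {"GITHUB_TOKEN": "from-env"}))
```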

Examples

Index last 30 days

Pass the date 30 days before today as --since (2026-01-01 is a placeholder):
simili index --repo owner/repo --since 2026-01-01
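
Since the --since value is an absolute date, a rolling 30-day window has to be computed before each run. A minimal sketch, assuming a small wrapper script builds the command line (since_date and index_command are hypothetical helper names, not part of simili):

```python
from datetime import date, timedelta
from typing import List, Optional


def since_date(days_back: int, today: Optional[date] = None) -> str:
    """Return the ISO 8601 calendar date `days_back` days before `today`."""
    today = today or date.today()
    return (today - timedelta(days=days_back)).isoformat()


def index_command(repo: str, days_back: int,
                  today: Optional[date] = None) -> List[str]:
    """Build the argument list for `simili index` over a rolling window."""
    return ["simili", "index", "--repo", repo,
            "--since", since_date(days_back, today)]


# Prints the command for the 30 days ending today.
print(" ".join(index_command("owner/repo", 30)))
```

The resulting list can be passed to subprocess.run as-is, which avoids shell quoting issues.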

Index since specific date

simili index --repo owner/repo --since 2024-01-01

Limit results

Index at most 200 issues:
simili index --repo owner/repo --limit 200

Parallel processing

Use more workers for faster indexing:
simili index --repo owner/repo --workers 10

Index everything (issues only)

Use an early --since date to cover the full history, and exclude pull requests:
simili index --repo owner/repo --since 2020-01-01 --include-prs=false

Index with PR collection

When qdrant.pr_collection is set in config, PRs are routed to a dedicated collection:
simili index --repo owner/repo --config .github/simili.yaml

Output

Fetching issues from owner/repo...
  Found 523 issues

Creating Qdrant collection...
  ✓ Collection created

Indexing with 5 workers...
  [====>----] 45% (235/523) - Speed: 12 issues/sec

Date format

--since accepts ISO 8601 dates, either a calendar date or a full UTC timestamp:

  • 2024-01-01
  • 2024-01-01T12:00:00Z
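
A value can be checked before it is passed to --since. The sketch below is a client-side check under stated assumptions: classify_since is a hypothetical helper, and simili's own parser may accept or reject values differently. It covers the two date forms listed above plus the issue-number form mentioned in the options table.

```python
from datetime import date, datetime


def classify_since(value: str) -> str:
    """Return 'issue', 'date', or 'datetime'; raise ValueError otherwise."""
    if value.isdigit():
        return "issue"  # plain issue number, e.g. 523
    try:
        date.fromisoformat(value)  # calendar date, e.g. 2024-01-01
        return "date"
    except ValueError:
        pass
    # Older Python versions reject a trailing "Z", so rewrite it as an
    # explicit UTC offset before parsing the timestamp.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    datetime.fromisoformat(value)  # raises ValueError if malformed
    return "datetime"


for candidate in ("2024-01-01", "2024-01-01T12:00:00Z", "523"):
    print(candidate, "->", classify_since(candidate))
```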

Performance

Typical speed:
  • With 1 worker: ~2-3 issues/second
  • With 5 workers: ~10-15 issues/second
  • With 10 workers: ~20-25 issues/second
Actual throughput depends on:
  • Issue complexity and comment count
  • API response times
  • Network latency
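
The figures above translate into a rough run-time estimate. A minimal sketch, assuming the quoted ranges hold for your repository (estimate_seconds is a hypothetical helper, and only the three worker counts listed above are covered):

```python
from typing import Tuple

# Throughput ranges (issues/second) quoted above, keyed by worker count.
TYPICAL_SPEED = {1: (2, 3), 5: (10, 15), 10: (20, 25)}


def estimate_seconds(issue_count: int, workers: int) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds to index `issue_count` issues."""
    slow, fast = TYPICAL_SPEED[workers]
    return issue_count / fast, issue_count / slow


best, worst = estimate_seconds(523, 5)
print(f"523 issues with 5 workers: about {best:.0f}-{worst:.0f} seconds")
```

For the 523-issue repository in the sample output, this gives roughly 35 to 52 seconds with the default 5 workers.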

Process

For each issue:
  1. Fetch issue details from GitHub
  2. Fetch all comments
  3. Combine title + body + comments
  4. Split into chunks (recursive character splitter)
  5. Generate embeddings for each chunk
  6. Upsert to Qdrant with metadata
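
Steps 3 and 4 can be sketched as follows. This is a simplified illustration, not simili's actual implementation: the blank-line joiner, the separator order, the lack of chunk overlap, and the chunk size are all assumptions, and combine and split_recursive are hypothetical names.

```python
from typing import List, Sequence


def combine(title: str, body: str, comments: Sequence[str]) -> str:
    """Join title, body, and comments with blank lines (assumed joiner)."""
    return "\n\n".join(part for part in [title, body, *comments] if part)


def split_recursive(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ")) -> List[str]:
    """Split into chunks of at most chunk_size characters, trying coarse
    separators first and finer ones only for pieces that are still too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits in the open chunk
            continue
        if current:
            chunks.append(current)  # close the open chunk
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


document = combine("Crash on start", "aaa bbb ccc", ["ok"])
print(split_recursive(document, 20))
```

Each resulting chunk would then be embedded and upserted separately (steps 5 and 6).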
For PRs (when --include-prs is enabled):
  1. Fetch PR details and changed file paths
  2. Embed: Title: ...\n\nBody: ...\n\nChanged Files:\n- path/a
  3. Route to pr_collection if configured, otherwise to main collection
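
The PR text template and the routing rule above can be sketched as follows. The function names and the collection names in the usage lines are hypothetical; only the template layout and the fallback to the main collection come from the steps above.

```python
from typing import List, Optional


def pr_embed_text(title: str, body: str, changed_files: List[str]) -> str:
    """Build the text to embed for a PR: title, body, then changed file paths."""
    files = "\n".join(f"- {path}" for path in changed_files)
    return f"Title: {title}\n\nBody: {body}\n\nChanged Files:\n{files}"


def target_collection(main_collection: str,
                      pr_collection: Optional[str]) -> str:
    """Use pr_collection when it is configured, otherwise the main collection."""
    return pr_collection or main_collection


print(pr_embed_text("Fix crash", "Guards a nil pointer.", ["path/a", "path/b"]))
print(target_collection("issues", None))      # falls back to the main collection
print(target_collection("issues", "prs"))     # dedicated PR collection
```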

Configuration

Requires a valid configuration with:
  • Qdrant connection details
  • Gemini or OpenAI API key
  • Target repository
See Configuration Overview.

Tips

  1. Start with a small batch
    simili index --repo owner/repo --since 2026-01-01 --limit 50
    
  2. Use multiple workers for large repos
    simili index --repo owner/repo --workers 10
    
  3. Run during off-hours to avoid API rate limits
  4. Set a PR collection to separate PR and issue search results

Troubleshooting

Rate limited

Error: 429 Too Many Requests
Reduce worker count:
simili index --repo owner/repo --workers 2
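
Reducing the worker count can also be automated in a wrapper script. The sketch below halves --workers whenever the output contains the 429 message. It assumes that the message appears in stdout or stderr, which this page does not confirm; index_with_backoff and run_simili are hypothetical helpers.

```python
import subprocess
from typing import Callable

RATE_LIMIT_MARKER = "429 Too Many Requests"  # error text shown above


def index_with_backoff(run: Callable[[int], str], workers: int = 5) -> int:
    """Call run(workers), halving the worker count while the output reports
    a 429. Returns the worker count of the run that was not rate limited."""
    while True:
        output = run(workers)
        if RATE_LIMIT_MARKER not in output:
            return workers
        if workers == 1:
            raise RuntimeError("rate limited with one worker; retry later")
        workers = max(1, workers // 2)


def run_simili(workers: int) -> str:
    """Run the index command and return its combined output.
    Assumption: the 429 message is written to stdout or stderr."""
    proc = subprocess.run(
        ["simili", "index", "--repo", "owner/repo", "--workers", str(workers)],
        capture_output=True, text=True,
    )
    return proc.stdout + proc.stderr
```

Calling index_with_backoff(run_simili, workers=10) would try 10, 5, 2, and finally 1 worker. Because re-indexing uses upsert, repeating a partially completed run is safe.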

Out of memory

Reduce workers or use --limit to process in batches.

Collection already exists

Re-indexing is safe: upsert updates existing vectors.

Next steps

  • Process command: process individual issues
  • PR duplicate command: detect duplicate pull requests