Index Command
Bulk-index GitHub issues and pull requests into the vector database for semantic search.
Syntax
Options
| Option | Short | Type | Description | Default |
|---|---|---|---|---|
| `--repo` | `-r` | string | Repository: `owner/name` | Required |
| `--config` | `-c` | file | Configuration file | `.github/simili.yaml` |
| `--since` | `-s` | string | Index issues since (ISO date or issue number) | - |
| `--limit` | | number | Maximum issues to index | unlimited |
| `--workers` | `-w` | number | Parallel workers | 5 |
| `--include-prs` | | bool | Include pull requests | true |
| `--token` | | string | GitHub token (falls back to `GITHUB_TOKEN`) | - |
| `--dry-run` | | bool | Simulate without writing | false |
| `--help` | `-h` | bool | Show help message | - |
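The `--token` fallback is a simple precedence rule: an explicit flag value wins, and otherwise `GITHUB_TOKEN` is read from the environment. The Python sketch below illustrates that order. `resolve_token` is a hypothetical helper, not part of Simili's code. Treating an empty flag value as "not given" is also an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_token(flag_value: Optional[str],
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return --token if set, else GITHUB_TOKEN, else None."""
    if flag_value:  # an explicit --token value takes precedence (empty counts as unset)
        return flag_value
    return env.get("GITHUB_TOKEN")  # None when the variable is not set


print(resolve_token("ghp_flag", {"GITHUB_TOKEN": "ghp_env"}))  # ghp_flag
print(resolve_token(None, {"GITHUB_TOKEN": "ghp_env"}))        # ghp_env
```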
Examples
Index last 30 days
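`--since` takes a concrete date, so "last 30 days" must first be turned into one. The Python sketch below computes that date. `since_days_ago` is a hypothetical helper, and it uses the local calendar date; its result would be passed as the `--since` value.

```python
from datetime import date, timedelta
from typing import Optional


def since_days_ago(days: int, today: Optional[date] = None) -> str:
    """Hypothetical helper: ISO 8601 date `days` days before `today` (default: today)."""
    if today is None:
        today = date.today()
    return (today - timedelta(days=days)).isoformat()


print(since_days_ago(30, today=date(2024, 3, 1)))  # 2024-01-31
```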
Index since specific date
Limit results
Index at most 200 issues with `--limit 200`.
Parallel processing
Use more workers (`--workers`) for faster indexing.
Index everything (issues only)
Index with PR collection
When `qdrant.pr_collection` is set in the config, PRs are routed to a dedicated collection.
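The routing rule is a single fallback: PRs go to `pr_collection` when it is set, and everything else goes to the main collection. The Python sketch below shows the rule. It assumes the config is loaded into a nested dict. `pick_collection` is hypothetical, and so is the `qdrant.collection` key used for the main collection; only `qdrant.pr_collection` appears on this page.

```python
from typing import Any, Dict


def pick_collection(config: Dict[str, Any], is_pull_request: bool) -> str:
    """Hypothetical helper: choose the Qdrant collection for an issue or PR."""
    qdrant = config.get("qdrant", {})
    main = qdrant["collection"]  # assumed key name for the main collection
    if is_pull_request and qdrant.get("pr_collection"):
        return qdrant["pr_collection"]  # dedicated PR collection when configured
    return main


cfg = {"qdrant": {"collection": "issues", "pr_collection": "pulls"}}
print(pick_collection(cfg, is_pull_request=True))   # pulls
print(pick_collection(cfg, is_pull_request=False))  # issues
```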
Output
Date format
ISO 8601 dates, with or without a time component:
- `2024-01-01`
- `2024-01-01T12:00:00Z`
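Because `--since` accepts either an ISO date or an issue number, the two forms have to be told apart. The Python sketch below does this under stated assumptions. `parse_since` is a hypothetical helper, not Simili's parser. Treating date-only values as UTC midnight is an assumption.

```python
from datetime import datetime, timezone
from typing import Union


def parse_since(value: str) -> Union[int, datetime]:
    """Hypothetical helper: interpret a --since value as an issue number or ISO 8601 time."""
    value = value.strip()
    if value.isdigit():
        return int(value)  # e.g. "1200" -> start from issue #1200
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # raises ValueError on malformed input
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC for bare dates
    return parsed


print(parse_since("1200"))                  # 1200
print(parse_since("2024-01-01T12:00:00Z"))  # 2024-01-01 12:00:00+00:00
```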
Performance
Typical speed:
- With 1 worker: ~2-3 issues/second
- With 5 workers: ~10-15 issues/second
- With 10 workers: ~20-25 issues/second

Actual speed depends on:
- Issue complexity and comment count
- API response times
- Network latency
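The speed figures above reflect several issues being processed at the same time. The Python sketch below shows a bounded worker pool. `workers` plays the role of `--workers`, and `index_issue` is a hypothetical stand-in for the per-issue fetch, embed, and upsert steps. It illustrates the idea rather than Simili's actual scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def index_all(issue_numbers: Iterable[int],
              index_issue: Callable[[int], int],
              workers: int = 5) -> List[int]:
    """Run index_issue on every issue using at most `workers` threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even if calls finish out of order.
        return list(pool.map(index_issue, issue_numbers))


def fake_index_issue(number: int) -> int:
    # Hypothetical placeholder: pretend each issue produced 2 chunks.
    return 2


print(sum(index_all(range(1, 11), fake_index_issue, workers=5)))  # 20
```

Threads fit here because the per-issue work is mostly waiting on the GitHub, embedding, and Qdrant APIs. This is also why adding workers helps only until API latency or rate limits become the bottleneck.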
Process
For each issue:
- Fetch issue details from GitHub
- Fetch all comments
- Combine title + body + comments
- Split into chunks (recursive character splitter)
- Generate embeddings for each chunk
- Upsert to Qdrant with metadata
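The chunking and upsert steps can be sketched together in Python. The first part is a recursive character splitter: it tries paragraph, line, and word boundaries before falling back to hard cuts. The second part is a deterministic point ID, so that re-running the command upserts onto the same Qdrant points. Several details are illustrative assumptions, not Simili's documented settings:
- the 500-character chunk size
- the separator list
- the absence of chunk overlap
- the UUIDv5 ID scheme (`point_id`)

```python
import uuid
from typing import List, Sequence

# Assumed separator order, coarsest to finest; "" means hard character cuts.
SEPARATORS = ("\n\n", "\n", " ", "")


def split_text(text: str, chunk_size: int = 500,
               separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Recursively split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: cut into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    if sep not in text:
        return split_text(text, chunk_size, finer)  # try a finer separator
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too large: recurse with finer separators.
            chunks.extend(split_text(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


def point_id(repo: str, number: int, chunk_index: int) -> str:
    """Hypothetical ID scheme: the same chunk always maps to the same UUID."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{repo}#{number}/{chunk_index}"))


text = "Title: Crash on start\n\nBody: " + "word " * 150
chunks = split_text(text, chunk_size=200)
print(chunks[0])                           # Title: Crash on start
print(all(len(c) <= 200 for c in chunks))  # True
```

Deterministic IDs are one common way to make re-indexing idempotent, since an upsert with an existing ID replaces that point rather than adding a new one. Qdrant accepts UUID strings as point IDs.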
For each pull request (when `--include-prs` is enabled):
- Fetch PR details and changed file paths
- Embed text of the form `Title: ...\n\nBody: ...\n\nChanged Files:\n- path/a`
- Route to `pr_collection` if configured, otherwise to the main collection
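The PR embedding text follows the template in the list above. The Python sketch below builds that string. `pr_embedding_text` is a hypothetical helper, and the sample title, body, and paths are illustrative. How Simili handles an empty body or a very long file list is not covered here.

```python
from typing import Sequence


def pr_embedding_text(title: str, body: str, changed_files: Sequence[str]) -> str:
    """Hypothetical helper: build the Title/Body/Changed Files text for a PR."""
    files = "\n".join(f"- {path}" for path in changed_files)  # one bullet per path
    return f"Title: {title}\n\nBody: {body}\n\nChanged Files:\n{files}"


print(pr_embedding_text("Fix crash", "Guards nil config.", ["path/a", "path/b"]))
```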
Configuration
Requires a valid configuration with:
- Qdrant connection details
- Gemini or OpenAI API key
- Target repository
Tips
- Start with a small batch (`--limit`)
- Use multiple workers (`--workers`) for large repos
- Run during off-hours to avoid API rate limits
- Set a PR collection to separate PR and issue search results
Troubleshooting
Rate limited
Out of memory
Reduce `--workers` or use `--limit` to process in batches.
Collection already exists
Re-indexing is safe: the upsert updates existing vectors.
Next steps
Process command
Process individual issues
PR duplicate command
Detect duplicate pull requests

