Index Command
Bulk-index GitHub issues into the vector database for semantic search.
Syntax
Options
| Option | Short | Type | Description | Default |
|---|---|---|---|---|
| --repo | -r | string | Repository: owner/name | Required |
| --config | -c | file | Configuration file | simili.yaml |
| --since | -s | string | Index issues since (duration or ISO date) | 30d |
| --workers | -w | number | Parallel workers | 5 |
| --help | -h | bool | Show help message | - |
Examples
Index Last 30 Days
Index Since Specific Date
Parallel Processing
Use more workers for faster indexing:
Index Everything
Output
Time Format
Duration Strings
- 1d: 1 day ago
- 7d: 7 days ago
- 30d: 30 days ago
- 1w: 1 week ago
- 1m: 1 month ago
ISO 8601 Dates
- 2024-01-01
- 2024-01-01T12:00:00Z
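To make the two accepted formats concrete, here is a minimal Python sketch of how a `--since` value could be turned into a cutoff timestamp. It is not the tool's actual parser: the function name `parse_since` is hypothetical, and treating `1m` as 30 days and bare dates as UTC are assumptions made only for illustration.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed unit lengths in days; "m" as 30 days is an approximation, not confirmed by the tool.
_UNIT_DAYS = {"d": 1, "w": 7, "m": 30}


def parse_since(value: str, now: Optional[datetime] = None) -> datetime:
    """Return a UTC cutoff for a duration string ("7d") or an ISO 8601 date."""
    now = now or datetime.now(timezone.utc)
    match = re.fullmatch(r"(\d+)([dwm])", value)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(days=amount * _UNIT_DAYS[unit])
    # Older Python versions reject a trailing "Z", so normalise it to an offset.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: bare dates are UTC
    return parsed


print(parse_since("2024-01-01T12:00:00Z"))
```

Anything that is neither a duration nor a valid ISO 8601 value raises `ValueError` from `datetime.fromisoformat`, which is a reasonable place to report a bad argument.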
Performance
Typical speed:
- With 1 worker: ~2-3 issues/second
- With 5 workers: ~10-15 issues/second
- With 10 workers: ~20-25 issues/second
Actual speed depends on:
- Issue complexity
- Comment count
- API response times
- Network latency
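The throughput figures above translate directly into a rough run-time estimate. The sketch below is only arithmetic on those figures; the helper name `estimate_seconds` and the 3000-issue repository are made up for illustration, and real runs will vary with the factors just listed.

```python
from typing import Tuple

# Throughput ranges (issues/second) copied from the typical speeds listed above.
TYPICAL_RATES = {1: (2, 3), 5: (10, 15), 10: (20, 25)}


def estimate_seconds(issue_count: int, workers: int) -> Tuple[float, float]:
    """Return (best_case, worst_case) seconds for one of the listed worker counts."""
    slow, fast = TYPICAL_RATES[workers]
    return issue_count / fast, issue_count / slow


best, worst = estimate_seconds(3000, 5)
print(f"3000 issues with 5 workers: roughly {best:.0f}-{worst:.0f} seconds")
# prints: 3000 issues with 5 workers: roughly 200-300 seconds
```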
Process
For each issue:
- Fetch issue details
- Fetch all comments
- Combine title + body + comments
- Split into chunks (recursive character splitter)
- Generate embeddings for each chunk
- Upsert to Qdrant with metadata
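The combine and split steps can be sketched in plain Python. This is a simplified illustration of a recursive character splitter, not the tool's own implementation: the separator order, the function names, and the chunk size are assumptions, chunk overlap is not modelled, and the embedding and Qdrant upsert steps are left out because they depend on external services.

```python
from typing import List, Sequence

# Assumed separator order, coarse to fine; "" means fall back to fixed-width slices.
SEPARATORS = ("\n\n", "\n", " ", "")


def combine(title: str, body: str, comments: Sequence[str]) -> str:
    """Join title, body, and comments into one text, skipping empty parts."""
    return "\n\n".join(part for part in [title, body, *comments] if part)


def split_recursive(text: str, chunk_size: int,
                    separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with the next, finer separator.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


text = combine("Crash on start", "App fails.", ["Same here"])
print(split_recursive(text, chunk_size=20))
# prints: ['Crash on start', 'App fails.', 'Same here']
```

Splitting on paragraph breaks first keeps related sentences together, and the finer separators are only used when a single piece is still too large.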
Configuration
Requires a valid configuration with:
- Qdrant connection details
- Gemini API key
- Target repository
Tips
- Start with a small batch
- Use multiple workers for large repos
- Run during off-hours to avoid rate limits
- Check progress - output updates regularly
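As a rough mental model for why more workers help on large repositories, the sketch below runs a per-issue function across a fixed-size thread pool. It is an assumption about how parallel workers might be organised, not the tool's code: `index_all` and `fake_index` are hypothetical names, and the stand-in worker does no real fetching or embedding.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def index_all(issue_numbers: Iterable[int],
              index_one: Callable[[int], int],
              workers: int = 5) -> List[int]:
    """Run index_one for every issue using a fixed-size thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns results in input order even though calls run concurrently.
        return list(pool.map(index_one, issue_numbers))


def fake_index(issue_number: int) -> int:
    # Hypothetical stand-in: a real worker would fetch, chunk, embed, and upsert.
    return issue_number


print(index_all(range(1, 11), fake_index, workers=5))
```

Because the work is dominated by network calls, threads spend most of their time waiting, which is why throughput scales with the worker count until API rate limits are reached.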