PR Duplicate Detection
Simili Bot v0.2.0 can detect duplicate pull requests by indexing PR content in a dedicated Qdrant collection and running a semantic search when a PR is opened.
How it works
- PR content is embedded: title + body + changed file paths
- Qdrant is searched for similar items in both the issues collection and the PR collection
- Candidates are ranked by similarity score
- Optionally, an LLM gives a duplicate verdict on the top candidates
- Results are returned as JSON (or posted as a comment, when run via GitHub Actions)
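The search and ranking steps above are essentially a nearest-neighbour lookup. The Python sketch below mimics that ranking in memory. It assumes cosine similarity as the Qdrant distance metric, an inclusive threshold (score >= threshold), and that the PR being checked is excluded from its own results once it has been indexed. Candidate, cosine, and rank_candidates are hypothetical names, the toy 3-dimensional vectors stand in for real embeddings, and none of this is Simili Bot's actual code.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    kind: str            # "issue" or "pull_request"
    number: int          # GitHub issue or PR number
    vector: List[float]  # stored embedding (hypothetical field)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query: List[float], pool: List[Candidate], pr_number: int,
                    threshold: float = 0.80, top_k: int = 10) -> List[Tuple[Candidate, float]]:
    """Score every candidate, skip the PR itself, drop scores below the
    threshold, and return the top_k (candidate, score) pairs, best first.
    The default threshold/top_k mirror the CLI example, not the bot's defaults."""
    scored = [
        (c, cosine(query, c.vector))
        for c in pool
        if not (c.kind == "pull_request" and c.number == pr_number)
    ]
    kept = [(c, s) for c, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Toy data: PR #42 is the incoming PR; it is already in the index too.
query = [1.0, 0.1, 0.0]
pool = [
    Candidate("pull_request", 42, [1.0, 0.1, 0.0]),
    Candidate("pull_request", 38, [0.9, 0.1, 0.0]),
    Candidate("issue", 123, [0.7, 0.3, 0.1]),
    Candidate("issue", 7, [0.0, 0.2, 1.0]),
]

for cand, score in rank_candidates(query, pool, pr_number=42):
    print(cand.kind, cand.number, round(score, 3))
```

In the real pipeline Qdrant performs the similarity search server-side; the optional LLM step then only needs to look at the few top-ranked candidates.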
Setup
1. Configure the PR collection
In .github/simili.yaml, set pr_collection under qdrant:
```yaml
qdrant:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection: "my-issues"
  pr_collection: "my-prs"  # Dedicated PR collection
```
If pr_collection is omitted, PRs are stored alongside issues in the main collection.
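The collection rule above can be expressed as a small lookup. This is a sketch, assuming the configuration has been parsed into a dict with the same keys as simili.yaml (environment variables already expanded); collection_for_prs and collections_to_search are illustrative names, not part of Simili Bot.

```python
from typing import List


def collection_for_prs(config: dict) -> str:
    """Collection PRs are written to: pr_collection if set, else the main collection."""
    qdrant = config["qdrant"]
    return qdrant.get("pr_collection") or qdrant["collection"]


def collections_to_search(config: dict) -> List[str]:
    """Collections queried for duplicate candidates: issues, plus PRs if separate."""
    qdrant = config["qdrant"]
    issues = qdrant["collection"]
    prs = qdrant.get("pr_collection")
    return [issues, prs] if prs and prs != issues else [issues]


with_prs = {"qdrant": {"collection": "my-issues", "pr_collection": "my-prs"}}
shared = {"qdrant": {"collection": "my-issues"}}

print(collection_for_prs(with_prs), collections_to_search(with_prs))
print(collection_for_prs(shared), collections_to_search(shared))
```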
2. Index existing PRs
Before the bot can find duplicates, your PRs must be indexed:
```bash
simili index --repo owner/repo --include-prs=true
```
Alternatively, run the bulk index workflow in GitHub Actions.
3. Create a PR triage workflow
Create .github/workflows/simili-pr.yml:
```yaml
name: Simili PR Triage

on:
  pull_request:
    types: [opened, edited, reopened]

jobs:
  pr-triage:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: similigh/simili-bot@v0.2.0
        with:
          command: "pr-duplicate"
          config_path: ".github/simili.yaml"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          QDRANT_URL: ${{ secrets.QDRANT_URL }}
          QDRANT_API_KEY: ${{ secrets.QDRANT_API_KEY }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
```
Running from the CLI
Check a specific PR for duplicates:
```bash
simili pr-duplicate --repo owner/repo --number 42
```
With a custom similarity threshold and number of candidates:
```bash
simili pr-duplicate --repo owner/repo --number 42 --threshold 0.80 --top-k 10
```
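To check several PRs in one go, you could wrap the CLI in a short script. The sketch below uses only the flags shown above (--repo, --number, --threshold, --top-k). pr_duplicate_cmd and check_prs are hypothetical helpers, and the script assumes the command prints just the JSON report to stdout and exits with status 0 on success.

```python
import json
import subprocess
from typing import Dict, List, Optional


def pr_duplicate_cmd(repo: str, number: int,
                     threshold: Optional[float] = None,
                     top_k: Optional[int] = None) -> List[str]:
    """Build the argv for `simili pr-duplicate` from the documented flags."""
    cmd = ["simili", "pr-duplicate", "--repo", repo, "--number", str(number)]
    if threshold is not None:
        cmd += ["--threshold", str(threshold)]
    if top_k is not None:
        cmd += ["--top-k", str(top_k)]
    return cmd


def check_prs(repo: str, numbers: List[int], **opts) -> Dict[int, dict]:
    """Run the check for each PR and parse its report.
    Assumes stdout contains only the JSON report (an assumption, not documented)."""
    reports = {}
    for number in numbers:
        proc = subprocess.run(pr_duplicate_cmd(repo, number, **opts),
                              capture_output=True, text=True, check=True)
        reports[number] = json.loads(proc.stdout)
    return reports


# Example (requires the simili CLI on PATH):
# reports = check_prs("owner/repo", [41, 42], threshold=0.80, top_k=10)
print(" ".join(pr_duplicate_cmd("owner/repo", 42, threshold=0.80, top_k=10)))
```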
Example output
```json
{
  "pr": {
    "repo": "owner/repo",
    "number": 42,
    "title": "Fix authentication timeout in middleware"
  },
  "candidates": [
    {
      "type": "pull_request",
      "number": 38,
      "title": "Fix session expiry in auth middleware",
      "score": 0.93,
      "url": "https://github.com/owner/repo/pull/38"
    },
    {
      "type": "issue",
      "number": 123,
      "title": "Login session expires unexpectedly",
      "score": 0.88,
      "url": "https://github.com/owner/repo/issues/123"
    }
  ],
  "duplicate_detected": true,
  "duplicate_of": 38,
  "confidence": 0.91,
  "reasoning": "PR #38 addresses the exact same authentication timeout issue with an overlapping fix."
}
```
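A downstream script can act on this report, for example to flag only high-confidence verdicts. This sketch parses a trimmed copy of the output above. strong_matches, should_flag, and the 0.85 confidence cutoff are illustrative assumptions, not Simili defaults, and the code treats a report without a confidence field as confidence 0.

```python
import json
from typing import List

# Trimmed copy of the example output above.
EXAMPLE_REPORT = """{
  "candidates": [
    {"type": "pull_request", "number": 38, "score": 0.93},
    {"type": "issue", "number": 123, "score": 0.88}
  ],
  "duplicate_detected": true,
  "duplicate_of": 38,
  "confidence": 0.91
}"""


def strong_matches(report: dict, min_score: float) -> List[int]:
    """Numbers of candidates whose similarity score is at least min_score."""
    return [c["number"] for c in report["candidates"] if c["score"] >= min_score]


def should_flag(report: dict, min_confidence: float = 0.85) -> bool:
    """True only if the verdict is 'duplicate' with enough confidence.
    The 0.85 cutoff is an arbitrary example value."""
    return bool(report.get("duplicate_detected")) and report.get("confidence", 0.0) >= min_confidence


report = json.loads(EXAMPLE_REPORT)
print(strong_matches(report, 0.90), should_flag(report))
```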
What gets embedded
The following content is combined for the PR embedding:
```text
Title: Fix authentication timeout in middleware
Body: This PR fixes the issue where sessions expire after 30 seconds...
Changed Files:
- internal/middleware/auth.go
- internal/middleware/session.go
- tests/middleware_test.go
```
Including changed file paths improves matching accuracy for code-level duplicate detection.
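The combination step can be sketched as plain string assembly that reproduces the layout above. build_embedding_text is a hypothetical helper; the bot's actual formatting (separators, truncation of long bodies, limits on the number of file paths) may differ.

```python
from typing import List, Optional


def build_embedding_text(title: str, body: Optional[str], changed_files: List[str]) -> str:
    """Combine title, body, and changed file paths in the layout shown above.
    A missing body (None) is rendered as an empty string."""
    lines = [f"Title: {title}", f"Body: {body or ''}", "Changed Files:"]
    lines += [f"- {path}" for path in changed_files]
    return "\n".join(lines)


text = build_embedding_text(
    "Fix authentication timeout in middleware",
    "This PR fixes the issue where sessions expire after 30 seconds...",
    [
        "internal/middleware/auth.go",
        "internal/middleware/session.go",
        "tests/middleware_test.go",
    ],
)
print(text)
```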
Tips
- Index PRs regularly: run simili index --include-prs=true on a schedule to keep the PR collection fresh.
- Set a dedicated pr_collection: this keeps PR and issue search results cleanly separated.
- Tune the threshold: for stricter duplicate detection, raise --threshold to 0.80 or higher.
- Use LLM reasoning: the bot's LLM verdict (the reasoning field in the JSON output) explains in plain language why two PRs are considered duplicates.
Next steps
- PR duplicate CLI reference: full command options
- Index command: index issues and PRs
- Configuration schema: configure the PR collection