PR Duplicate Detection
Simili Bot v0.2.0 can detect duplicate pull requests by indexing PR content in a dedicated Qdrant collection and performing semantic search at PR open time.
How it works
- PR content is embedded: title + body + changed file paths
- Qdrant is searched for similar items in both the issues collection and the PR collection
- Candidates are ranked by similarity score
- Optionally, an LLM gives a duplicate verdict on the top candidates
- Results are returned as JSON (or posted as a comment, when run via GitHub Actions)
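The embedding and ranking steps can be sketched in a few lines of Python. This is a minimal illustration, not Simili's actual implementation: the function names `build_pr_text`, `cosine_similarity`, and `rank_candidates` are hypothetical, the vectors stand in for whatever the embedding model returns, and a plain in-memory dictionary replaces the Qdrant search.

```python
import math


def build_pr_text(title: str, body: str, changed_files: list[str]) -> str:
    """Combine title, body, and changed file paths into one string to embed."""
    parts = [title.strip(), body.strip(), "\n".join(changed_files)]
    # Skip empty parts so a PR without a body does not add blank sections.
    return "\n\n".join(p for p in parts if p)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query: list[float], indexed: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every indexed item against the query and return them best first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In practice the vector database performs the similarity search itself; the loop here only makes the ranking step visible.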
Setup
1. Configure a PR collection
In .github/simili.yaml, set pr_collection to the name of the collection that should hold PRs.
If pr_collection is omitted, PRs are stored alongside issues in the main collection.
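The fallback behaviour amounts to the following. This is a minimal sketch, assuming the parsed configuration is available as a flat dictionary; the helper name `resolve_pr_collection` and the collection names in the usage lines are hypothetical, and the real configuration file may nest this key differently.

```python
def resolve_pr_collection(config: dict, main_collection: str) -> str:
    """Return the collection PRs are stored in (hypothetical helper).

    Falls back to the main collection when pr_collection is missing or empty.
    """
    name = config.get("pr_collection")
    return name if name else main_collection


# Illustrative collection names, not defaults taken from the tool.
dedicated = resolve_pr_collection({"pr_collection": "my-prs"}, "my-issues")
shared = resolve_pr_collection({}, "my-issues")
```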
2. Index existing PRs
Before the bot can find duplicates, your PRs must be indexed, for example by running simili index --include-prs=true.
3. Create a PR triage workflow
Create .github/workflows/simili-pr.yml:
Running from the CLI
Check a specific PR for duplicates:
Example output
What gets embedded
The following content is combined for the PR embedding: the PR title, the PR body, and the changed file paths.
Tips
- Index PRs regularly: run simili index --include-prs=true on a schedule to keep the PR collection fresh.
- Set a dedicated pr_collection: this keeps PR and issue search results cleanly separated.
- Tune the threshold: for stricter duplicate detection, raise --threshold to 0.80 or higher.
- Use LLM reasoning: the bot's LLM verdict provides human-readable reasoning about why two PRs are considered duplicates.
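To see why raising the threshold makes detection stricter, consider this small Python sketch. The candidate IDs and scores are made up for illustration, `filter_by_threshold` is a hypothetical helper, and it assumes a candidate is kept when its score is at least the threshold; the tool's exact comparison may differ.

```python
def filter_by_threshold(candidates: list[tuple[str, float]], threshold: float) -> list[tuple[str, float]]:
    """Keep candidates whose similarity score is at least the threshold."""
    return [(item_id, score) for item_id, score in candidates if score >= threshold]


# Made-up similarity scores for three indexed items.
candidates = [("pr-12", 0.91), ("pr-7", 0.78), ("issue-3", 0.66)]

loose = filter_by_threshold(candidates, 0.65)   # all three candidates pass
strict = filter_by_threshold(candidates, 0.80)  # only pr-12 passes
```

A higher threshold trades recall for precision: fewer false alarms, but near-duplicates that are worded differently may be missed.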

