PR Duplicate Command

Detect duplicate pull requests using semantic search across your issue and PR collections.

Syntax

simili pr-duplicate [OPTIONS]

Options

| Option | Short | Type | Description | Default |
|---|---|---|---|---|
| --repo | - | string | Repository (owner/name) | Required |
| --number | -n | number | PR number to check | Required |
| --top-k | - | number | Maximum candidates to return | 5 |
| --threshold | - | number | Minimum similarity score (0.0-1.0) | 0.65 |
| --token | - | string | GitHub token (falls back to GITHUB_TOKEN) | - |
| --config | -c | file | Path to configuration file | .github/simili.yaml |
| --dry-run | - | bool | Skip Qdrant search (returns empty candidates) | false |
| --help | -h | bool | Show help message | - |
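
The --token option falls back to the GITHUB_TOKEN environment variable. As a rough illustration of that precedence, here is a minimal Python sketch. The function name resolve_token is hypothetical, and treating an empty value as "not set" is an assumption, not documented simili behavior.

```python
import os
from typing import Mapping, Optional


def resolve_token(flag_value: Optional[str],
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the explicit --token value if given, else GITHUB_TOKEN, else None.

    Hypothetical helper: it only mirrors the fallback order described
    in the options table.
    """
    if flag_value:
        return flag_value
    # Assumption: an empty GITHUB_TOKEN is treated the same as an unset one.
    return env.get("GITHUB_TOKEN") or None
```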

Examples

Check PR for duplicates

simili pr-duplicate --repo owner/repo --number 42

Adjust similarity threshold

simili pr-duplicate --repo owner/repo --number 42 --threshold 0.80

Return more candidates

simili pr-duplicate --repo owner/repo --number 42 --top-k 10
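
To see how --threshold and --top-k interact, the following Python sketch filters a list of scored hits and then truncates it. It is an illustration only: select_candidates is a hypothetical name, and both the inclusive comparison (score >= threshold) and the filter-then-truncate order are assumptions about the tool's behavior.

```python
from typing import Dict, List


def select_candidates(hits: List[Dict],
                      threshold: float = 0.65,
                      top_k: int = 5) -> List[Dict]:
    """Keep hits scoring at or above the threshold, best first, at most top_k."""
    # Assumption: the threshold is inclusive.
    kept = [h for h in hits if h["score"] >= threshold]
    kept.sort(key=lambda h: h["score"], reverse=True)
    return kept[:top_k]
```

With the defaults, a hit scoring 0.50 is dropped. Raising the threshold to 0.80 leaves only the strongest matches, and a larger top_k has no effect once the threshold has already removed the extra hits.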

Dry-run mode

simili pr-duplicate --repo owner/repo --number 42 --dry-run

How it works

  1. Fetches PR details and changed file paths from GitHub
  2. Embeds the PR content: Title: ...\n\nBody: ...\n\nChanged Files:\n- path/a
  3. Searches both the collection (issues) and pr_collection (PRs) in Qdrant
  4. Deduplicates and sorts candidates by similarity score
  5. Optionally runs an LLM-based duplicate verdict on the top-3 issue candidates
  6. Returns a JSON result with candidates and duplicate assessment
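
Steps 2 and 4 above can be sketched in Python. This is not simili's implementation. The text layout follows the template shown in step 2. The function names and the choice of (type, number) as the deduplication key are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def build_embedding_text(title: str, body: str,
                         changed_files: Iterable[str]) -> str:
    """Assemble the text to embed, following the layout shown in step 2."""
    file_lines = "\n".join(f"- {path}" for path in changed_files)
    return f"Title: {title}\n\nBody: {body}\n\nChanged Files:\n{file_lines}"


def merge_candidates(*result_sets: List[Dict]) -> List[Dict]:
    """Merge hits from several collections, keep the best score per item,
    and sort by descending similarity (step 4)."""
    best: Dict[Tuple[str, int], Dict] = {}
    for hits in result_sets:
        for hit in hits:
            # Assumption: an item is identified by its type and number.
            key = (hit["type"], hit["number"])
            if key not in best or hit["score"] > best[key]["score"]:
                best[key] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)
```

Keying on both type and number keeps issue #1 and PR #1 apart, which matters when issues and PRs share a single collection.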

Output

{
  "pr": {
    "repo": "owner/repo",
    "number": 42,
    "title": "Fix authentication timeout"
  },
  "candidates": [
    {
      "type": "issue",
      "number": 123,
      "title": "Login session expires unexpectedly",
      "score": 0.92,
      "url": "https://github.com/owner/repo/issues/123"
    },
    {
      "type": "pull_request",
      "number": 38,
      "title": "Fix session expiry in auth middleware",
      "score": 0.88,
      "url": "https://github.com/owner/repo/pull/38"
    }
  ],
  "duplicate_detected": true,
  "duplicate_of": 38,
  "confidence": 0.91,
  "reasoning": "PR #38 addresses the same auth timeout issue with an identical fix approach."
}
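
A script that consumes this result might look like the Python sketch below. It reads only fields shown in the example above. The summarize function is hypothetical, and the sketch assumes the JSON has already been captured as a string.

```python
import json


def summarize(result_json: str) -> str:
    """Turn a pr-duplicate JSON result into a one-line summary."""
    result = json.loads(result_json)
    pr = result["pr"]
    label = f"{pr['repo']}#{pr['number']}"
    if not result.get("duplicate_detected"):
        return f"{label}: no duplicate detected"
    return (f"{label} looks like a duplicate of #{result['duplicate_of']} "
            f"(confidence {result['confidence']:.2f})")
```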

Configuration

To enable a dedicated PR collection, set qdrant.pr_collection in your config:

qdrant:
  url: "${QDRANT_URL}"
  api_key: "${QDRANT_API_KEY}"
  collection: "my-issues"
  pr_collection: "my-prs"   # PRs indexed here

If pr_collection is not set, PRs are stored alongside issues in the main collection.
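
The fallback described above can be illustrated with a short Python sketch that works on an already-parsed qdrant section. Both helpers are hypothetical. Whether simili expands ${VAR} placeholders the way os.path.expandvars does is an assumption.

```python
import os
from typing import Dict, List


def collections_to_search(qdrant_cfg: Dict[str, str]) -> List[str]:
    """Return the collections to query: the main one, plus the PR
    collection when it is set and differs from the main one."""
    main = qdrant_cfg["collection"]
    pr = qdrant_cfg.get("pr_collection")
    if pr and pr != main:
        return [main, pr]
    # No dedicated PR collection: PRs live alongside issues in the main one.
    return [main]


def expand_env(value: str) -> str:
    """Expand ${VAR} placeholders from the environment.

    Unset variables are left unchanged by os.path.expandvars.
    """
    return os.path.expandvars(value)
```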

Indexing PRs first

Before running pr-duplicate, make sure PRs are indexed:

simili index --repo owner/repo --include-prs=true

GitHub Actions integration

Run the duplicate check whenever a pull request is opened, edited, or reopened:

name: Simili PR Duplicate Check

on:
  pull_request:
    types: [opened, edited, reopened]

jobs:
  pr-duplicate:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      contents: read
    steps:
      - uses: actions/checkout@v4

      - uses: similigh/simili-bot@v0.2.0
        with:
          command: "pr-duplicate"
          config_path: ".github/simili.yaml"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          QDRANT_URL: ${{ secrets.QDRANT_URL }}
          QDRANT_API_KEY: ${{ secrets.QDRANT_API_KEY }}
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
