Pipeline Steps

Simili Bot’s modular pipeline consists of 13 composable steps.

Step overview

| # | Step | Purpose | Speed | Dependencies |
|---|------|---------|-------|--------------|
| 1 | gatekeeper | Check repo enabled | <1 ms | Config |
| 2 | command_handler | Parse commands | <10 ms | GitHub |
| 3 | vectordb_prep | Ensure collection | 10-100 ms | Qdrant |
| 4 | similarity_search | Find related | 500 ms-1 s | Qdrant, Embedder |
| 5 | transfer_check | Rule-based route | <100 ms | Transfer |
| 6 | llm_router | AI routing | 2-5 s | LLM |
| 7 | duplicate_detector | Identify dupes | 2-5 s | LLM |
| 8 | quality_checker | Score issue | 1-3 s | LLM |
| 9 | triage | Suggest labels | 1-2 s | LLM |
| 10 | response_builder | Build comment | <100 ms | Results |
| 11 | action_executor | Post to GitHub | 100-500 ms | GitHub API |
| 12 | indexer | Add to vector DB | 500 ms-1 s | Qdrant, Embedder |
| 13 | pending_actions | Schedule ops | <10 ms | State |
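The speed column can be turned into a rough end-to-end latency budget. The sketch below sums the ranges from the table under a few assumptions that the page does not state: every step runs exactly once, steps run sequentially, `<N ms` is read as the range 0 to N, and `action_executor` performs a single action.

```python
# Speed ranges in milliseconds, copied from the overview table.
# Assumption: "<N ms" is treated as the range 0..N.
STEP_SPEEDS_MS = {
    "gatekeeper": (0, 1),
    "command_handler": (0, 10),
    "vectordb_prep": (10, 100),
    "similarity_search": (500, 1000),
    "transfer_check": (0, 100),
    "llm_router": (2000, 5000),
    "duplicate_detector": (2000, 5000),
    "quality_checker": (1000, 3000),
    "triage": (1000, 2000),
    "response_builder": (0, 100),
    "action_executor": (100, 500),  # assumption: one action
    "indexer": (500, 1000),
    "pending_actions": (0, 10),
}

# Steps whose only listed dependency is the LLM.
LLM_STEPS = ["llm_router", "duplicate_detector", "quality_checker", "triage"]


def latency_bounds_ms(speeds):
    """Return (best case, worst case) for one sequential pass, in ms."""
    low = sum(lo for lo, _ in speeds.values())
    high = sum(hi for _, hi in speeds.values())
    return low, high


def llm_share(speeds, llm_steps):
    """Return the fraction of worst-case time spent in LLM-backed steps."""
    _, total = latency_bounds_ms(speeds)
    llm_total = sum(speeds[name][1] for name in llm_steps)
    return llm_total / total


low, high = latency_bounds_ms(STEP_SPEEDS_MS)
print(f"one pass: {low / 1000:.1f} s to {high / 1000:.1f} s")
print(f"LLM share of worst case: {llm_share(STEP_SPEEDS_MS, LLM_STEPS):.0%}")
```

Under these assumptions a full pass takes roughly 7 to 18 seconds, and the four LLM-backed steps account for most of it, so disabling an LLM step is the most effective way to shorten a run.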

Detailed steps

1. gatekeeper

Checks whether the repository is enabled in the configuration.
  • Input: Issue, Config
  • Output: Skip if disabled
  • Speed: <1 ms
  • Dependencies: None

2. command_handler

Processes @simili-bot commands in issue comments.
  • Input: Comments
  • Output: Commands parsed
  • Speed: <10 ms
  • Dependencies: GitHub API
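As an illustration of what parsing a comment for commands might involve, here is a minimal sketch. The command grammar is an assumption, not Simili Bot's documented syntax: a command is taken to be a line that starts with the `@simili-bot` mention, followed by a command word and optional arguments. The command names `recheck` and `transfer` in the usage example are hypothetical.

```python
import re

# Assumption: one command per line, in the form
#   @simili-bot <command> [arguments...]
# [ \t] is used instead of \s so a match never spans a line break.
COMMAND_RE = re.compile(
    r"^[ \t]*@simili-bot[ \t]+(\S+)(.*)$",
    re.MULTILINE | re.IGNORECASE,
)


def parse_commands(comment_body):
    """Return a list of (command, args) tuples found in a comment body."""
    commands = []
    for match in COMMAND_RE.finditer(comment_body):
        name = match.group(1).lower()
        args = match.group(2).split()
        commands.append((name, args))
    return commands


# Hypothetical command names, used only to exercise the parser.
body = "Thanks for the report!\n@simili-bot recheck\n  @simili-bot transfer org/repo"
print(parse_commands(body))
```

Lines that merely mention the bot mid-sentence are ignored in this sketch, which keeps ordinary discussion from being read as a command.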

3. vectordb_prep

Ensures the Qdrant collection exists, creating it if needed.
  • Input: Config
  • Output: Collection ready
  • Speed: 10-100 ms
  • Dependencies: Qdrant

4. similarity_search

Finds related issues using semantic search.
  • Input: Issue embedding
  • Output: Related issues list
  • Speed: 500 ms-1 s
  • Dependencies: Qdrant, Embedder
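In Simili Bot the search itself is delegated to Qdrant. The sketch below is a pure-Python stand-in that shows the underlying idea of ranking indexed issues by similarity to the new issue's embedding. Cosine similarity, the `top_k` and `min_score` parameters, and the in-memory `indexed` dictionary are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def find_related(query_embedding, indexed, top_k=3, min_score=0.5):
    """Return up to top_k (issue_number, score) pairs, best match first.

    `indexed` maps issue numbers to embeddings (a stand-in for the
    vector database). Matches scoring below min_score are dropped.
    """
    scored = [
        (cosine_similarity(query_embedding, vector), number)
        for number, vector in indexed.items()
    ]
    scored = [(score, number) for score, number in scored if score >= min_score]
    scored.sort(reverse=True)
    return [(number, score) for score, number in scored[:top_k]]


# Toy two-dimensional embeddings; real embeddings have hundreds of dimensions.
indexed = {101: [1.0, 0.0], 102: [0.0, 1.0], 103: [1.0, 1.0]}
print(find_related([1.0, 0.0], indexed))
```

A vector database does the same ranking with an approximate index, which is what keeps the step within its listed 500 ms-1 s range as the number of indexed issues grows.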

5. transfer_check

Evaluates the issue against the configured rule-based routing rules.
  • Input: Issue metadata
  • Output: Target repository (or empty)
  • Speed: <100 ms
  • Dependencies: Transfer matcher
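A rule-based check of this kind can be sketched as a first-match-wins scan over a rule list. The rule schema here (`labels`, `title_contains`, `target`) and the repository names are hypothetical; the page does not describe how Simili Bot's transfer rules are written.

```python
from dataclasses import dataclass, field


@dataclass
class TransferRule:
    """Hypothetical rule: route to `target` when every set condition holds."""
    target: str
    labels: set = field(default_factory=set)          # any one label must match
    title_contains: list = field(default_factory=list)  # any one keyword must match

    def matches(self, issue):
        issue_labels = set(issue.get("labels", []))
        title = issue.get("title", "").lower()
        if self.labels and not (self.labels & issue_labels):
            return False
        if self.title_contains and not any(
            keyword.lower() in title for keyword in self.title_contains
        ):
            return False
        # A rule with no conditions never matches, to avoid routing everything.
        return bool(self.labels or self.title_contains)


def transfer_check(issue, rules):
    """Return the target of the first matching rule, or "" if none match."""
    for rule in rules:
        if rule.matches(issue):
            return rule.target
    return ""


rules = [
    TransferRule(target="org/docs", labels={"documentation"}),
    TransferRule(target="org/cli", title_contains=["cli"]),
]
print(transfer_check({"title": "CLI crashes on start", "labels": ["bug"]}, rules))
```

Returning an empty string when nothing matches mirrors the "Target repository (or empty)" output listed above, and leaves the decision to a later step such as llm_router.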

6. llm_router

Uses AI to determine the correct repository based on the issue content.
  • Input: Issue and repository descriptions
  • Output: Routing recommendation
  • Speed: 2-5 s
  • Dependencies: LLM (Gemini)

7. duplicate_detector

Analyzes similar issues to identify duplicates.
  • Input: Similar issues, current issue
  • Output: Duplicate info and confidence score
  • Speed: 2-5 s
  • Dependencies: LLM

8. quality_checker

Assesses the quality of the issue description.
  • Input: Issue content
  • Output: Quality score and improvement suggestions
  • Speed: 1-3 s
  • Dependencies: LLM

9. triage

Suggests appropriate labels based on the issue content.
  • Input: Issue and existing labels
  • Output: Suggested labels with confidence
  • Speed: 1-2 s
  • Dependencies: LLM

10. response_builder

Constructs a comprehensive analysis comment from the results of the earlier steps.
  • Input: All previous step results
  • Output: Formatted comment text
  • Speed: <100 ms
  • Dependencies: None (uses previous results)
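To make the idea of assembling one comment from earlier results concrete, here is a minimal sketch. The keys of the `results` dictionary (`related`, `duplicate`, `quality_score`, `labels`) and the comment layout are assumptions; the actual comment format produced by Simili Bot is not shown on this page.

```python
def build_comment(results):
    """Assemble a comment from whichever step results are present.

    `results` is a hypothetical dictionary filled in by earlier steps.
    Sections for missing results are simply left out.
    """
    sections = []

    related = results.get("related", [])
    if related:
        lines = ["Related issues:"]
        lines += [f"- #{number} (similarity {score:.2f})" for number, score in related]
        sections.append("\n".join(lines))

    duplicate = results.get("duplicate")
    if duplicate:
        sections.append(
            f"Possible duplicate of #{duplicate['number']} "
            f"(confidence {duplicate['confidence']:.0%})"
        )

    if "quality_score" in results:
        sections.append(f"Quality score: {results['quality_score']}/10")

    labels = results.get("labels", [])
    if labels:
        sections.append("Suggested labels: " + ", ".join(labels))

    return "\n\n".join(sections)


print(build_comment({
    "related": [(101, 0.91)],
    "duplicate": {"number": 101, "confidence": 0.9},
    "labels": ["bug"],
}))
```

Because each section is optional, the same builder works whether or not the LLM-backed steps ran, which fits the step's listing of no dependencies beyond previous results.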

11. action_executor

Posts the comment to GitHub and applies the suggested actions.
  • Input: Comment text, labels, transfer target
  • Output: GitHub updates
  • Speed: 100-500 ms per action
  • Dependencies: GitHub API

12. indexer

Adds or updates the issue in the vector database for semantic search.
  • Input: Issue text and metadata
  • Output: Indexed in Qdrant
  • Speed: 500 ms-1 s
  • Dependencies: Embedder, Qdrant

13. pending_action_scheduler

Schedules actions for later execution.
  • Input: Pending actions
  • Output: Scheduled operations
  • Speed: <10 ms
  • Dependencies: State management

Execution flow

START
  ↓
gatekeeper (Check enabled)
  ↓
command_handler (Parse commands)
  ↓
vectordb_prep (Create collection)
  ↓
similarity_search (Find related)
  ↓
transfer_check (Rule routing)
  ↓
llm_router (AI routing)
  ↓
duplicate_detector (Find dupes)
  ↓
quality_checker (Score)
  ↓
triage (Suggest labels)
  ↓
response_builder (Build message)
  ↓
action_executor (Post actions)
  ↓
indexer (Add to DB)
  ↓
pending_action_scheduler (Schedule)
  ↓
END (Output results)

Each step processes the same Context object, passing data forward.
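The shared-context pattern can be sketched as a list of functions that each read from and write to one object. The `Context` fields, the `enabled_repos` config key, and the keyword-based `triage` placeholder below are assumptions for illustration; they do not reflect Simili Bot's actual types or its LLM calls.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical shared state handed to every step in turn."""
    issue: dict
    config: dict
    results: dict = field(default_factory=dict)
    skipped: bool = False


def gatekeeper(ctx):
    # Stop the run when the repository is not enabled.
    if ctx.issue["repo"] not in ctx.config.get("enabled_repos", []):
        ctx.skipped = True


def triage(ctx):
    # Placeholder for the LLM call: a keyword check stands in for it here.
    title = ctx.issue["title"].lower()
    ctx.results["labels"] = ["bug"] if "crash" in title else []


def response_builder(ctx):
    # Reads what earlier steps wrote and adds its own result.
    labels = ctx.results.get("labels", [])
    ctx.results["comment"] = ("Suggested labels: " + ", ".join(labels)) if labels else ""


def run_pipeline(ctx, steps):
    """Run steps in order on the same Context, stopping once one skips."""
    for step in steps:
        step(ctx)
        if ctx.skipped:
            break
    return ctx


ctx = run_pipeline(
    Context(
        issue={"repo": "org/app", "title": "Crash on save"},
        config={"enabled_repos": ["org/app"]},
    ),
    [gatekeeper, triage, response_builder],
)
print(ctx.results["comment"])
```

Keeping all state on one object means a step needs no knowledge of which steps ran before it, only of the keys it expects to find.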

Step dependencies

Steps can be reordered, but these dependencies must be respected:
  • similarity_search needs vectordb_prep
  • duplicate_detector needs similarity_search
  • llm_router is optional (it needs only the LLM)
  • response_builder should be near the end
  • action_executor should be near the end
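The two hard dependencies in the list above can be checked mechanically before a reordered pipeline is run. This is a sketch of one way to do that, not a feature the page attributes to Simili Bot; the `REQUIRES` mapping and `validate_order` function are hypothetical names.

```python
# Hard ordering constraints taken from the list above.
REQUIRES = {
    "similarity_search": ["vectordb_prep"],
    "duplicate_detector": ["similarity_search"],
}


def validate_order(steps, requires=REQUIRES):
    """Return a list of problems found in a proposed step order.

    An empty list means every configured step has its prerequisites
    present and placed earlier in the order.
    """
    position = {name: index for index, name in enumerate(steps)}
    problems = []
    for step, needed in requires.items():
        if step not in position:
            continue  # the step is not used, so nothing to check
        for dependency in needed:
            if dependency not in position:
                problems.append(f"{step} requires {dependency}, which is missing")
            elif position[dependency] > position[step]:
                problems.append(f"{step} must run after {dependency}")
    return problems


print(validate_order(["similarity_search", "vectordb_prep", "duplicate_detector"]))
```

The softer guidance (response_builder and action_executor "near the end") is left out of the check because the page does not define it precisely.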

Next steps