Semantic similarity judgment: does this thing belong in that bucket / match that target?

12 task types in this category.

Task-by-task breakdown

Relevance Scoring (POST)

Scores RetrievedContent against synthesis capability description (stage 40 relevance_analysis). Split from pooled relevance_scoring on 2026-05-17 to remove inter-family σ inflation. …

Rank  Model                          Quality  Cost / call  vs best
1     Gemini 3.1 Pro Preview (best)  5.69     $0.004000

Relevance Scoring (Topic Report)

Scores TOPIC_REPORT PartialSyntheses against analysis template (stage 132) and report chapters (stage 134). Split from pooled relevance_scoring on 2026-05-17. …

Rank  Model                  Quality  Cost / call  vs best
1     Qwen 3.5 Flash (best)  8.48     $0.000160
2     MiniMax M2.5           8.44     $0.000900    -462%
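
The page does not say how the "vs best" column is computed. The -462% shown for MiniMax M2.5 is consistent with a relative cost difference against the best-ranked model, so the sketch below assumes that definition. The function name `cost_vs_best` is hypothetical, not something taken from the dashboard.

```python
def cost_vs_best(best_cost: float, cost: float) -> float:
    """Relative cost difference versus the best-ranked model, in percent.

    Assumed definition (not confirmed by the source):
    (best_cost - cost) / best_cost * 100.
    A negative value means the model costs more per call than the best one.
    """
    return (best_cost - cost) / best_cost * 100.0


# Cost-per-call values taken from the Topic Report table.
qwen_cost = 0.000160      # Qwen 3.5 Flash (best)
minimax_cost = 0.000900   # MiniMax M2.5

delta = cost_vs_best(qwen_cost, minimax_cost)
print(f"{delta:.1f}%")
```

Under this assumption MiniMax M2.5 costs roughly 5.6 times as much per call as Qwen 3.5 Flash for a quality score 0.04 lower, which matches the displayed figure up to how the dashboard rounds it.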

Relevance Scoring (X Post)

Scores batched X-com posts against synthesis capability (x_post_relevance stage). Split from pooled relevance_scoring on 2026-05-17. GENERIC_RELEVANCE_SCORE_{SYSTEM,USER}_PROMPT, batched input (≥20k …

No model has reached MEDIUM confidence yet — accumulating evidence.