Anthropic, 2026-04-29

Anthropic study: "Judging the Judges" benchmarks 9 debiasing strategies for LLM-as-a-Judge

AI Analysis

Anthropic researchers benchmarked nine debiasing strategies across five judge models from Google, Anthropic, OpenAI, and Meta, and uncovered systematic biases that undermine the reliability of LLM-as-a-Judge pipelines.