Tag: benchmarks

All the articles with the tag "benchmarks".

LHC v0.2: A Benchmark for Long-Horizon Agent Coherence (and the Methodology That Got It Honest)
Published: May 10, 2026 at 08:00 PM
I just published LHC v0.2, an open benchmark for long-horizon coherence in 8B-class agent models, plus a deterministic parser baseline that puts a useful floor on what fine-tuning is worth for structured-state tasks. This post explains what they're for, how to use them, and the methodology arc that produced them across five rounds of external review.
