Claude Fable 5 does not appear to have been deliberately weakened, despite two benchmarks producing sharply different conclusions about its capabilities. The discrepancy is reportedly explained by a routing layer that behaves too cautiously, rather than by a decline in the underlying model.
The distinction matters because benchmark scores can shape how users and companies judge AI systems. If an intermediary routing mechanism restricts or redirects requests, tests may measure that layer’s behavior instead of the model’s actual performance.
The contrasting results also show why a single benchmark can provide an incomplete picture. Two evaluations of the same system may diverge when prompts are handled differently before reaching the model.
In this case, the evidence described points to an overprotective router as the source of the apparent regression. The episode is therefore less about Claude Fable 5 becoming less capable and more about how surrounding infrastructure can influence perceived model quality.