Fine-grained evaluation of Large Reasoning Models that fail at reasoning because of reasoning rigidity.
ConditionedMath (AIME & MATH500) · PuzzleTrivial · Zero-shot pipelines


📜 Why ReasoningTrap?

Current RL-tuned reasoning LLMs excel at producing answers but often ignore explicit user constraints.
ReasoningTrap surfaces these failure modes with carefully crafted, conditioned problems.