Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning
Violet Xiang, Chase Blagden, Rafael Rafailov, Nathan Lile, Sang Truong, Chelsea Finn, Nick Haber
Large reasoning models achieve higher performance on challenging tasks by generating more tokens, but this verbosity wastes computation on easy problems. We introduce Adaptive Length Penalty (ALP), which tailors generation length to per-prompt difficulty and cuts average token usage by ~50% with minimal performance loss.
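To make the idea of a difficulty-aware length penalty concrete, here is a minimal sketch. It is not taken from the abstract. It assumes that difficulty is estimated from the empirical solve rate over several sampled rollouts of the same prompt, and that the penalty is linear in normalized length. The function names, `beta`, `max_tokens`, and the floor of `1 / num_rollouts` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def length_penalized_reward(
    correct: bool,
    n_tokens: int,
    solve_rate: float,
    num_rollouts: int,
    max_tokens: int = 4096,  # hypothetical generation budget
    beta: float = 0.1,       # hypothetical penalty strength
) -> float:
    """Correctness reward minus a length penalty scaled by how easy the prompt looks.

    Assumption: prompts with a high solve rate are "easy", so extra tokens on them
    are penalized more; the penalty on hard prompts is floored at 1 / num_rollouts.
    """
    norm_len = min(n_tokens, max_tokens) / max_tokens
    weight = max(solve_rate, 1.0 / num_rollouts)
    return float(correct) - beta * norm_len * weight


def rewards_for_prompt(rollouts: List[Tuple[bool, int]]) -> List[float]:
    """Score every rollout of one prompt; each rollout is (correct, n_tokens)."""
    k = len(rollouts)
    solve_rate = sum(1 for ok, _ in rollouts if ok) / k
    return [length_penalized_reward(ok, n, solve_rate, k) for ok, n in rollouts]


# Same answer length, different difficulty: the easy prompt is penalized more.
easy = rewards_for_prompt([(True, 2048)] * 8)
hard = rewards_for_prompt([(True, 2048)] + [(False, 2048)] * 7)
```

Under these assumptions, a correct 2048-token answer to a prompt that every rollout solves scores lower than the same-length correct answer to a prompt that only one rollout in eight solves. That gives a policy trained on such a reward an incentive to shorten its reasoning mainly where the extra tokens are not needed.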



