Overclocking LLM Reasoning: Monitoring and Controlling Thinking Path Lengths in LLMs
Published
Submitted by Itamar Zimerman
Authors:
Roy Eisenstadt, Itamar Zimerman, Lior Wolf
Abstract
Recently, techniques such as explicit structured reasoning have demonstrated
strong test-time scaling behavior by enforcing a separation between the model's
internal "thinking" process and the final response. A key factor influencing
answer quality in this setting is the length of the thinking stage. When the
reasoning is too short, the model may fail to capture the complexity of the
task. Conversely, when it is too long, the model may overthink, leading to
unnecessary computation and degraded performance. This paper explores and
exploits the underlying mechanisms by which LLMs understand and regulate the
length of their reasoning during explicit thought processes. First, we show
that LLMs encode their progress through the reasoning process and introduce an
interactive progress bar visualization, which is then used to reveal insights
on the model's planning dynamics. Second, we manipulate the internal progress
encoding during inference to reduce unnecessary steps and generate a more
concise and decisive chain of thought. Our empirical results demonstrate that
this "overclocking" method mitigates overthinking, improves answer accuracy,
and reduces inference latency. Our code is publicly available.

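The two ideas in the abstract — reading out a "progress" signal from hidden states and then manipulating that signal to shorten the thinking stage — can be illustrated with a toy sketch. The code below is not the authors' implementation: it uses synthetic vectors in place of real LLM activations, a least-squares linear probe to regress progress, and a hypothetical `overclock` function that shifts a hidden state along the probe direction so the model would appear further along in its reasoning.

```python
import numpy as np

# Illustrative sketch (synthetic data, not real LLM activations):
# 1) fit a linear probe that predicts "thinking progress" in [0, 1]
#    from a hidden state;
# 2) steer a hidden state along the probe direction to raise its
#    apparent progress ("overclocking").

rng = np.random.default_rng(0)
d = 64                                   # hidden-state dimensionality (toy)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)   # ground-truth progress axis

# Synthetic "hidden states": progress is encoded linearly along `direction`,
# plus isotropic noise.
n = 2000
progress = rng.uniform(0.0, 1.0, size=n)
H = rng.normal(scale=0.1, size=(n, d)) + progress[:, None] * direction

# Fit the probe by least squares: progress ≈ H @ w + b.
X = np.hstack([H, np.ones((n, 1))])      # append a bias column
w, *_ = np.linalg.lstsq(X, progress, rcond=None)

def predict_progress(h):
    """Probe readout: estimated fraction of the reasoning completed."""
    return float(np.append(h, 1.0) @ w)

def overclock(h, alpha=0.3):
    """Shift a hidden state along the (normalized) probe direction so the
    model 'believes' it is further along in its thinking."""
    return h + alpha * w[:-1] / np.linalg.norm(w[:-1])

h = H[0]
print(round(predict_progress(h), 2))
print(round(predict_progress(overclock(h)), 2))  # strictly larger than before
```

In a real setting, `H` would be residual-stream activations collected during the thinking stage, and the steering vector would be added to the hidden states at chosen layers during inference; the sketch only shows why a linear probe suffices when progress is encoded along a single direction.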