LongCat-Flash-Thinking-2601 Technical Report

Published

Submitted by taesiri
Authors: Meituan LongCat Team, Anchun Gui, Bei Li, Bingyang Tao, Bole Zhou, Borun Chen, Chao Zhang, Chao Zhang, Chen Gao, Chen Zhang, Chengcheng Han, Chenhui Yang, Chuyu Zhang, Cong Chen, Cunguang Wang, Daoru Pan, Defei Bu, Dengchang Zhao, Di Xiu, Dishan Liu, Dongyu Ru, Dunwei Tu, Fan Wu, Fengcheng Yuan, Fengcun Li, Gang Xu, Guanyu Wu, Guoyuan Lin, Haibin Wang, Hansi Yang, Hao Yang, Haonan Yan, Haoxiang Ma, Haoxing Wen, Hongyan Hao, Hongyin Tang, Hongyu Zang, Hongzhi Ni, Hui Su, Jiacheng Zhang, Jiahong Zhou, Jiahuan Li, Jiaming Wang, Jian Yang, Jianfei Zhang, Jianhao Xu, Jianing Wang, Jiapeng Zhu, Jiaqi Sun, Jiarong Shi, Jiarui Zhao, Jingang Wang, Jinluan Yang, Jinrui Ding, Jinwei Xiao, Jiyuan He, Juncan Xu, Kefeng Zhang, Keheng Wang, Li Wei, Lianhui Ma, Lin Qiu, Lingbing Kong, Lingchuan Liu, Linsen Guo, Mengshen Zhu, Mengxia Shen, Mingyang Zhu, Peiguang Li, Peng Pei, Pengcheng Jia, Pengtao Zhang, Peng Zhao, Qi Gu, Qiong Huang, Qiyuan Duan, Quanchi Weng, Rongxiang Weng, Rongzhi Zhang, Rumei Li, Shanglin Lei, Shengnan An, Shijun Dai, Shuaikang Liu, Shuang Zhou, Shuo Wang, Songyuan Zhao, Tao Liang, Tianhao Hu, Tianze Chen, Wei Liu, Wei Shi, Wei Wang, Weifeng Tang, Wenjie Shi, Wenlong Zhu, Wentao Chen, Wentao Shi, Xi Su, Xiangcheng Liu, Xiandi Ma, Xiangyu Xi, Xiangyuan Liu, Xiangzhou Huang, Xiao Liu, Xiaodong Cai, Xiaolong Chen, Xiaowei Shi, Xiaoyu Li, Xin Chen, Xingchen Liu, Xuan Huang, Xuezhi Cao, Xunliang Cai, Yan Chen, Yang Bai, Yang Liu, Yang Yang, Yang Zheng, Yaoming Wang, Yaoming Zhu, Yaqi Huo, Yanyu Chen, Yaorui Shi, Yerui Sun, Yi Zhang, Yihao Chen, Yi-Kai Zhang, Yifan Lu, Yifan Zhao, Yitao Zhai, Yongjing Yin, Yongwei Zhou, Youshao Xiao, Yuchuan Dai, Yuchen Xie, Yuchen Yu, Yufei Zhang, Yuhuai Wei, Yulei Qian, Yunfan Liang, Yunke Zhao, Yuwei Jiang, Yuxin Bian, Yuxin Chen, Yuxin Liu, Yue Xu, Yueqing Sun, Zeyang Yu, Zhao Yang, Zhengsheng Huang, Zhengyu Chen, Zhijian Liu, Zhikang Xia, Zhimin Lin, Zhiyuan Yao, Zhuofan Chen, Zhuowen Han, Zijian Zhang, Ziran Li, Ziwen Wang, Ziyuan Zhuang

Abstract

AI-generated summary
A 560-billion-parameter Mixture-of-Experts (MoE) reasoning model that achieves state-of-the-art performance on agentic benchmarks through a unified training framework combining domain-parallel expert training with fusion, together with enhancements for real-world robustness and complex reasoning.
We introduce LongCat-Flash-Thinking-2601, an open-source 560-billion-parameter Mixture-of-Experts (MoE) reasoning model with exceptional agentic reasoning capabilities. LongCat-Flash-Thinking-2601 achieves leading performance among open-source models across a broad range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark results, the model generalizes strongly to complex tool interactions and behaves robustly in noisy real-world environments. Its advanced capabilities stem from a unified training framework that combines domain-parallel expert training with subsequent fusion, along with end-to-end co-design of data construction, environments, algorithms, and infrastructure from pre-training through post-training. In particular, the model's strong generalization in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To handle long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across more than 10,000 environments spanning over 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Moreover, given the noise inherent in real-world tasks, we systematically analyze and decompose real-world noise patterns and design targeted training procedures that explicitly incorporate these imperfections into training, improving robustness in real applications. To further strengthen performance on complex reasoning tasks, we introduce a "Heavy Thinking" mode that jointly scales the depth and breadth of reasoning via reinforced parallel thinking, enabling effective test-time scaling.
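The abstract describes a "train domain experts in parallel, then fuse" pipeline but does not specify the fusion operator. As one common instantiation, fusion can be sketched as weighted parameter averaging of per-domain checkpoints; the function and coefficient scheme below are illustrative assumptions, not the report's actual method.

```python
# Hypothetical sketch: fusing domain-expert checkpoints by weighted
# parameter averaging. The report does not disclose its fusion method;
# this only illustrates the general "parallel training, then fusion" idea.

def fuse_experts(expert_weights, coeffs):
    """Merge per-domain parameter dicts with the given mixing coefficients."""
    assert abs(sum(coeffs) - 1.0) < 1e-9, "coefficients should sum to 1"
    fused = {}
    for name in expert_weights[0]:
        # Each fused parameter is a convex combination of the experts' values.
        fused[name] = sum(c * w[name] for c, w in zip(coeffs, expert_weights))
    return fused

# Toy example: two "experts" with a single scalar parameter each.
stem_expert = {"layer.w": 1.0}
code_expert = {"layer.w": 3.0}
merged = fuse_experts([stem_expert, code_expert], [0.5, 0.5])  # → {"layer.w": 2.0}
```

In practice the mixing coefficients could be uniform, tuned on validation data, or set per-layer; the sketch keeps them as explicit inputs to make that design choice visible.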
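The "Heavy Thinking" mode scales reasoning breadth by running multiple thinking traces in parallel. A minimal way to illustrate this kind of test-time scaling is to sample several independent traces and aggregate their final answers by majority vote; the function below is an assumed simplification (the report's actual mode is trained via reinforced parallel thinking, not a fixed voting rule).

```python
# Hypothetical sketch of test-time scaling via parallel thinking:
# sample several independent reasoning traces ("breadth"), then pick
# the majority final answer. `sample_trace` stands in for one full
# reasoning rollout that returns a final answer string.
from collections import Counter
from itertools import cycle

def heavy_think(sample_trace, n_parallel=8):
    """Run n_parallel reasoning rollouts and return the majority answer."""
    answers = [sample_trace() for _ in range(n_parallel)]
    majority, _count = Counter(answers).most_common(1)[0]
    return majority

# Toy stand-in for a stochastic reasoner that answers "42" three times
# out of four; majority voting recovers the dominant answer.
traces = cycle(["42", "42", "42", "17"])
result = heavy_think(lambda: next(traces), n_parallel=8)  # → "42"
```

Increasing `n_parallel` trades extra inference compute for a lower chance that a single faulty trace determines the output, which is the breadth axis of the depth-and-breadth scaling the abstract describes.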
View arXiv page · View PDF

Comments

zombie one

Very helpful as a reference.

MLearning

I created a practice problem based on the paper's core idea (fusion of experts); let me know what you think:

https://www.deep-ml.com/problems/348