Seed1.5-VL Technical Report

Published
Submitted by Tianheng Cheng
Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, Weiwei Liu, Wenqian Wang, Xianhan Zeng, Xiao Liu, Xiaobo Qin, Xiaohan Ding, Xiaojun Xiao, Xiaoying Zhang, Xuanwei Zhang, Xuehan Xiong, Yanghua Peng, Yangrui Chen, Yanwei Li, Yanxu Hu, Yi Lin, Yiyuan Hu, Yiyuan Zhang, Youbin Wu, Yu Li, Yudong Liu, Yue Ling, Yujia Qin, Zanbo Wang, Zhiwu He, Aoxue Zhang, Bairen Yi, Bencheng Liao, Can Huang, Can Zhang, Chaorui Deng, Chaoyi Deng, Cheng Lin, Cheng Yuan, Chenggang Li, Chenhui Gou, Chenwei Lou, Chengzhi Wei, Chundian Liu, Chunyuan Li, Deyao Zhu, Donghong Zhong, Feng Li, Feng Zhang, Gang Wu, Guodong Li, Guohong Xiao, Haibin Lin, Haihua Yang, Haoming Wang, Heng Ji, Hongxiang Hao, Hui Shen, Huixia Li, Jiahao Li, Jialong Wu, Jianhua Zhu, Jianpeng Jiao, Jiashi Feng, Jiaze Chen, Jianhui Duan, Jihao Liu, Jin Zeng, Jingqun Tang, Jingyu Sun, Joya Chen, Jun Long, Junda Feng, Junfeng Zhan, Junjie Fang, Junting Lu, Kai Hua, Kai Liu, Kai Shen, Kaiyuan Zhang, Ke Shen, Ke Wang, Keyu Pan, Kun Zhang, Kunchang Li, Lanxin Li, Lei Li, Lei Shi, Li Han, Liang Xiang, Liangqiang Chen, Lin Chen, Lin Li, Lin Yan, Liying Chi, Longxiang Liu, Mengfei Du, Mingxuan Wang, Ningxin Pan, Peibin Chen, Pengfei Chen, Pengfei Wu, Qingqing Yuan, Qingyao Shuai, Qiuyan Tao, Renjie Zheng, Renrui Zhang, Ru Zhang, Rui Wang, Rui Yang, Rui Zhao, Shaoqiang Xu, Shihao Liang, Shipeng Yan, Shu Zhong, Shuaishuai Cao, Shuangzhi Wu, Shufan Liu, Shuhan Chang, Songhua Cai, Tenglong Ao, Tianhao Yang, Tingting Zhang, Wanjun Zhong, Wei Jia, Wei Weng, Weihao Yu, Wenhao Huang, Wenjia Zhu, Wenli Yang, Wenzhi Wang, Xiang Long, XiangRui Yin, Xiao Li, Xiaolei Zhu, Xiaoying Jia, Xijin Zhang, Xin Liu, Xinchen Zhang, Xinyu Yang, Xiongcai Luo, Xiuli Chen, Xuantong Zhong, Xuefeng Xiao, Xujing Li, Yan Wu, Yawei Wen, Yifan Du, Yihao Zhang, Yining Ye, Yonghui Wu, Yu Liu, Yu Yue, Yufeng Zhou, Yufeng Yuan, Yuhang Xu, Yuhong Yang, Yun Zhang, Yunhao Fang, Yuntao Li, Yurui Ren, Yuwen Xiong, Zehua Hong, Zehua Wang, Zewei Sun, Zeyu Wang, Zhao Cai, Zhaoyue Zha, Zhecheng An, Zhehui Zhao, Zhengzhuo Xu, Zhipeng Chen, Zhiyong Wu, Zhuofan Zheng, Zihao Wang, Zilong Huang, Ziyu Zhu, Zuquan Song

Abstract

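AI-generated summary
Seed1.5-VL is a vision-language foundation model that combines a vision encoder with a large MoE LLM. It achieves state-of-the-art performance across a wide range of benchmarks and excels at multimodal reasoning tasks such as visual puzzles.
We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL consists of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM with 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide range of public VLM benchmarks and internal evaluation suites, achieving state-of-the-art performance on 38 of 60 public benchmarks. Moreover, on agent-centric tasks such as GUI control and gameplay, Seed1.5-VL outperforms leading multimodal systems, including OpenAI CUA and Claude 3.7. Beyond image and video understanding, it also shows strong reasoning abilities, which make it particularly effective on multimodal reasoning challenges such as visual puzzles. We believe these capabilities will enable broader applications across diverse tasks. In this report, we give a comprehensive account of our experience building Seed1.5-VL across model design, data construction, and the various stages of training, and we hope it will inspire further research. Seed1.5-VL is now accessible at https://www.volcengine.com/ (Volcano Engine model ID: doubao-1-5-thinking-vision-pro-250428).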
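The abstract reports the LLM's size in *active* parameters (20B) rather than total parameters. As a generic illustration only, and not Seed1.5-VL's actual router (its routing details are not given here), the sketch below shows a toy top-k MoE layer: a router scores every expert, but only the k highest-scoring experts run for a given token, so only their weights count as active. The function names, the sizes (d = 4, 8 experts) and k = 2 are all hypothetical.

```python
import math
import random

def softmax(xs):
    # Numerically stable softmax over a short list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(mat, vec):
    # Dense matrix-vector product on plain Python lists.
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]

def moe_layer(token, expert_mats, router, k=2):
    """Hypothetical toy top-k MoE layer for a single token.

    Returns the gated output, the indices of the chosen experts, and the
    number of expert parameters actually used for this token.
    """
    logits = matvec(router, token)  # one routing score per expert
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in chosen])  # renormalize over chosen experts only
    out = [0.0] * len(token)
    for g, i in zip(gates, chosen):
        y = matvec(expert_mats[i], token)  # only the selected experts are evaluated
        out = [o + g * v for o, v in zip(out, y)]
    used = sum(len(expert_mats[i]) * len(expert_mats[i][0]) for i in chosen)
    return out, chosen, used

random.seed(0)
d, num_experts = 4, 8
experts = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(num_experts)]
router = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_experts)]
token = [random.gauss(0, 1) for _ in range(d)]

out, chosen, used = moe_layer(token, experts, router, k=2)
total = num_experts * d * d
print(chosen, used, total)  # used is 2 * 16 = 32 of 128 total expert parameters
```

In this toy layer the model holds 128 expert parameters but touches only 32 per token. That gap between stored and per-token parameters is what an "active parameter" count captures.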

Comments

Tianheng Cheng
Paper author
Paper submitter

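Seed1.5-VL is a powerful and efficient vision-language foundation model designed for advanced general-purpose multimodal understanding and reasoning. It achieves state-of-the-art performance on 38 of 60 public benchmarks.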

GitHub: https://github.com/ByteDance-Seed/Seed1.5-VL
API: https://www.volcengine.com/product/doubao

Zhang

Will this model be open-sourced in the future?

YJ

An audio overview for learning on the go: https://youtu.be/h-l7jqKs-Xg