Spark: Synergistic Policy And Reward Co-Evolving Framework

📖 Paper | 🤗 Models | 🤗 Datasets | 🤗 Daily Paper

🌈 Introduction: We propose SPARK, a unified framework that integrates the policy and the reward model into a single model and trains them jointly and synchronously. SPARK automatically derives reward and reflection data from verifiable rewards, enabling self-learning and self-evolution.
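
To make the idea of deriving reward and reflection data from a verifiable reward more concrete, here is a minimal sketch. It is not the SPARK implementation: the `Rollout` class, the exact-match reward, the helper names, and the prompt formats are all hypothetical, chosen only to illustrate how sampled answers scored against a known ground truth could be recycled into training examples.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical container for one sampled response.
    question: str
    answer: str  # final answer extracted from the response


def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Rule-based reward (assumed): 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == ground_truth.strip().lower() else 0.0


def build_reward_examples(rollouts, ground_truth):
    """Turn scored rollouts into pointwise judgment examples (illustrative format)."""
    examples = []
    for r in rollouts:
        is_correct = verifiable_reward(r.answer, ground_truth) == 1.0
        examples.append({
            "prompt": f"Question: {r.question}\nAnswer: {r.answer}\nIs this answer correct?",
            "target": "correct" if is_correct else "incorrect",
        })
    return examples


def build_reflection_examples(rollouts, ground_truth):
    """Pair each wrong answer with a correct one as a revision target (illustrative format)."""
    correct = [r for r in rollouts if verifiable_reward(r.answer, ground_truth) == 1.0]
    wrong = [r for r in rollouts if verifiable_reward(r.answer, ground_truth) == 0.0]
    if not correct:
        return []  # no verified answer available to reflect toward
    return [
        {
            "prompt": f"Question: {w.question}\nPrevious answer: {w.answer}\nReflect and revise.",
            "target": correct[0].answer,
        }
        for w in wrong
    ]


rollouts = [Rollout("2 + 3 = ?", "5"), Rollout("2 + 3 = ?", "6")]
reward_data = build_reward_examples(rollouts, ground_truth="5")
reflection_data = build_reflection_examples(rollouts, ground_truth="5")
```

In this toy setup the same model that produced the rollouts could then be trained on `reward_data` and `reflection_data`, so no separate reward model or human preference labels are needed; the actual data formats and objectives used by SPARK are described in the paper and training code.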

🤗 Models: We release the checkpoints at internlm/Spark-VL-7B.

🤗 Datasets: Training data is available at internlm/Spark-Data.

💻 Training Code: The training code and implementation details can be found at InternLM/Spark.


📸 Upload an image and enter a prompt, or 🖼️ choose an input (image + prompt) from the example gallery.


Examples

Math Reasoning Examples

🎯 Reward Model Examples


If you find this project useful, please cite:

@article{liu2025spark,
  title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
  author={Liu, Ziyu and Zang, Yuhang and Ding, Shengyuan and Cao, Yuhang and Dong, Xiaoyi and Duan, Haodong and Lin, Dahua and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2509.22624},
  year={2025}
}