Spark: Synergistic Policy And Reward Co-Evolving Framework
Paper | Models | Datasets | Daily Paper
Introduction: We propose SPARK, a unified framework that integrates the policy and the reward model into a single model and trains them jointly and synchronously. SPARK automatically derives reward and reflection data from verifiable rewards, enabling self-learning and self-evolution.
Models: We release the checkpoints at internlm/Spark-VL-7B.
Datasets: The training data is available at internlm/Spark-Data.
Training Code: The training code and implementation details can be found at InternLM/Spark.
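To make the idea in the introduction concrete, the following is a minimal sketch of how reward (judgment) and reflection samples might be derived from a verifiable reward. It is an illustration under stated assumptions, not the implementation in InternLM/Spark: the `Rollout` class, the exact-match `verifiable_reward` rule, the `derive_samples` helper, and the prompt templates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled response (hypothetical container, not from the SPARK codebase)."""
    question: str
    response: str  # full text generated by the policy
    answer: str    # final answer extracted from the response


def verifiable_reward(answer: str, ground_truth: str) -> float:
    """Assumed rule-based reward: 1.0 on an exact (case-insensitive) match, else 0.0."""
    return 1.0 if answer.strip().lower() == ground_truth.strip().lower() else 0.0


def derive_samples(rollouts, ground_truth):
    """Turn scored rollouts into judgment and reflection samples (illustrative formats)."""
    scored = [(r, verifiable_reward(r.answer, ground_truth)) for r in rollouts]

    # Judgment data: ask the same model whether a response is correct,
    # with the label supplied by the verifiable reward.
    judge = [
        {
            "prompt": f"Question: {r.question}\nResponse: {r.response}\n"
                      "Is the response correct?",
            "target": "yes" if score == 1.0 else "no",
        }
        for r, score in scored
    ]

    # Reflection data: pair each wrong response with a correct one, if any exists.
    correct = [r for r, score in scored if score == 1.0]
    reflect = []
    if correct:
        for r, score in scored:
            if score == 0.0:
                reflect.append(
                    {
                        "prompt": f"Question: {r.question}\n"
                                  f"Previous response: {r.response}\n"
                                  "Reflect on the mistake and answer again.",
                        "target": correct[0].response,
                    }
                )
    return judge, reflect


if __name__ == "__main__":
    samples = [
        Rollout("What is 2 + 3?", "2 + 3 = 5, so the answer is 5.", "5"),
        Rollout("What is 2 + 3?", "2 + 3 = 6, so the answer is 6.", "6"),
    ]
    judge_data, reflect_data = derive_samples(samples, ground_truth="5")
    print(len(judge_data), len(reflect_data))
```

In this sketch no extra annotation is needed: the same rule that scores a rollout also labels the judgment samples and selects targets for the reflection samples, which is the sense in which such data can be derived "automatically".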
Upload an image and enter a prompt, or choose an input (image + prompt) from the example gallery.

Examples
Math Reasoning Examples
Reward Model Examples
If you find this project useful, please kindly cite:
@article{liu2025spark,
title={SPARK: Synergistic Policy And Reward Co-Evolving Framework},
author={Liu, Ziyu and Zang, Yuhang and Ding, Shengyuan and Cao, Yuhang and Dong, Xiaoyi and Duan, Haodong and Lin, Dahua and Wang, Jiaqi},
journal={arXiv preprint arXiv:2509.22624},
year={2025}
}