Shaobo Wang (王少博)
Ph.D. Student, SAI, SJTU | Short Bio
Ditan Park, Beijing, 2026
Under old cypresses,
beside red-tiled eaves.
I am currently a Ph.D. student at Shanghai Jiao Tong University, under the supervision of Prof. Linfeng Zhang. Previously, I was a master’s student at ReThinkLab, SJTU, supervised by Prof. Junchi Yan.
I study data. My work centers on data for foundation models, especially at the pre-training and mid-training stages. Currently, I focus on coding and general agent capabilities at Moonshot Kimi. Previously, I worked on pre-training and mid-training at the Qwen Team. My research interests include:
- Data selection and mixture: Maximizing data utility, including both quality and diversity (Qwen3/3.5, OPUS, Data Whisperer).
- Data distillation and synthesis: Creating unseen data to break the data wall (Qwen3.6/3.7, NCFM, InfoUtil, Socratic-Zero).
- Data evaluation: Building faithful evaluation for modeling real-world capabilities (EssenceBench, OPUS).
I have been fortunate to work with outstanding researchers, particularly Dr. Dayiheng Liu and Xingzhang Ren at the Qwen Team. During my internship at Qwen, I also collaborated with Huiqiang Jiang, Fei Huang, and Junyang Lin. Academically, I also collaborate closely with Dr. Conghui He at Shanghai AI Laboratory and Prof. Xuming Hu at HKUST(GZ). I am currently on the job market and seeking research scientist positions.
News
| Year | News |
|---|---|
| 2026 | My paper OPUS was accepted as an oral presentation at ICML 2026. |
| 2025 | I was selected as a Tencent Hunyuan Scholar in 2025, one of 23 recipients. |
| 2025 | My paper NCFM received full scores at CVPR 2025 and was selected as a Highlight paper. |