Shaobo Wang (王少博)

Ph.D. Student, SAI, SJTU | Short Bio


Ditan Park, Beijing, 2026
Under old cypresses,
beside red-tiled eaves.

I am currently a Ph.D. student at Shanghai Jiao Tong University, under the supervision of Prof. Linfeng Zhang. Previously, I was a master’s student in ReThinkLab at SJTU, supervised by Prof. Junchi Yan.

I study data. My work centers on data for foundation models, especially at the pre-training and mid-training stages. Currently, I focus on coding and general agent capabilities at Moonshot Kimi. Previously, I worked on pre-training and mid-training at the Qwen Team. My research interests include:

I have been fortunate to work with outstanding researchers, particularly Dr. Dayiheng Liu and Xingzhang Ren at the Qwen Team. During my internship at Qwen, I also collaborated with Huiqiang Jiang, Fei Huang, and Junyang Lin. On the academic side, I collaborate closely with Dr. Conghui He at Shanghai AI Laboratory and Prof. Xuming Hu at HKUST(GZ). I am currently on the job market and seeking research scientist positions.

News

2026 My paper OPUS was accepted as an oral presentation at ICML 2026.
2025 I was selected as a Tencent Hunyuan Scholar, one of 23 recipients.
2025 My paper NCFM received full scores at CVPR 2025 and was selected as a Highlight paper.

Selected Publications

* denotes equal contribution.
  1. OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
    Shaobo Wang, Xuan Ouyang, Tianyi Xu, Yuzheng Hu, Jialin Liu, Guo Chen, Tianyu Zhang, Junhao Zheng, and 4 more authors
    In International Conference on Machine Learning, 2026
  2. Grounding and Enhancing Informativeness and Utility in Dataset Distillation
    Shaobo Wang, Yantai Yang, Guo Chen, Peiru Li, Kaixin Li, Yufa Zhou, Zhaorun Chen, and Linfeng Zhang
    In The Fourteenth International Conference on Learning Representations, 2026
  3. Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
    Shaobo Wang*, Cong Wang*, Wenjie Fu*, Yue Min, Mingquan Feng, Isabel Guan, Xuming Hu, Conghui He, and 6 more authors
    In The Fourteenth International Conference on Learning Representations, 2026
  4. Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
    Shaobo Wang, Xiangqi Jin, Ziming Wang, Jize Wang, Jiajun Zhang, Kaixin Li, Zichen Wen, Zhong Li, and 3 more authors
    In Annual Meeting of the Association for Computational Linguistics, 2025
  5. Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
    Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, and Linfeng Zhang
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
  6. Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
    Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, and Linfeng Zhang
    In International Conference on Learning Representations, 2025