Shaobo Wang (王少博)

Ph.D. Candidate, SAI, SJTU

Mail: gszfwsb@gmail.com

Tel: (+86) 15000937315

City: Shanghai, 200240

I am a second-year Ph.D. candidate in the School of Artificial Intelligence, Shanghai Jiao Tong University (SJTU), where I am fortunate to be advised by Prof. Linfeng Zhang. I am also a research intern at the Alibaba Qwen Team, supervised by Dr. Dayiheng Liu and Xingzhang Ren, and I collaborate closely with Dr. Fei Huang, Huiqiang Jiang, Kexin Yang, Yubo Ma, and Beichen Zhang.

Previously, I was a master’s student in ReThinkLab at SJTU, where I was grateful to be mentored by Prof. Junchi Yan. I also collaborated closely with Prof. Xuming Hu at the Hong Kong University of Science and Technology (Guangzhou) and Dr. Conghui He at Shanghai AI Laboratory, and I previously worked with Prof. Zhuoran Yang at Yale University.

Research. My research bridges empirical and theoretical perspectives on data-centric AI. I focus primarily on data selection, synthesis, and sampling strategies for Large Language Model pre-training. Previously, my work centered on Explainable AI.

Short bio. I was born in Hefei, China. Beyond academia, I have been devoted to the piano for 15 years and once had the privilege of performing alongside the world-renowned pianist Lang Lang. My musical inspirations come from the Romantic era, particularly the works of Frédéric Chopin and Franz Liszt, and I am also captivated by the soulful rhythms of R&B, Jazz, and Neo-Soul. During my teenage years, I won several chess championships in Anhui Province, China, under the guidance of Chess Grandmaster Chongsheng Zeng and Chess Master Yongjin Zhou.

News

  • [July 2025] I am honored to be selected for the Tencent PhD Research Incentive Program (one of 23 recipients in China).
  • [March 2025] Our paper, “Dataset Distillation with Neural Characteristic Function: A Minmax Perspective,” received full scores (5/5/5) from all three reviewers at CVPR 2025.

Selected Publications

* denotes equal contribution.
  1. OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
    Shaobo Wang, Xuan Ouyang, Tianyi Xu, Yuzheng Hu, Jialin Liu, Guo Chen, Tianyu Zhang, Junhao Zheng, and 3 more authors
    arXiv preprint arXiv:2602.05400, 2026
  2. Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
    Shaobo Wang, Cong Wang, Wenjie Fu, Yue Min, Mingquan Feng, Isabel Guan, Xuming Hu, Conghui He, and 6 more authors
    2026
  3. Grounding and Enhancing Informativeness and Utility in Dataset Distillation
    Shaobo Wang, Yantai Yang, Guo Chen, Peiru Li, Kaixin Li, Yufa Zhou, and Zhaorun Chen Zhang
    The Fourteenth International Conference on Learning Representations, 2026
  4. Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
    Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, and Linfeng Zhang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
  5. Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
    Shaobo Wang, Xiangqi Jin, Ziming Wang, Jize Wang, Jiajun Zhang, Kaixin Li, Zichen Wen, Zhong Li, and 3 more authors
    Annual Meeting of the Association for Computational Linguistics, 2025
  6. Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
    Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, and Linfeng Zhang
    International Conference on Learning Representations, 2025
  7. Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
    Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Fei Huang, Huiqiang Jiang, and 3 more authors
    2025
  8. Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution
    Shaobo Wang, Zhengbo Jiao, Zifan Zhang, Yilang Peng, Xu Ze, Boyu Yang, Wei Wang, Hu Wei, and 1 more author
    2025