Shaobo Wang (王少博)

Ph.D. Candidate, SAI, SJTU


Mail: gszfwsb@gmail.com

Tel: (+86) 15000937315

City: Shanghai, 200240

I am a second-year Ph.D. candidate in the School of Artificial Intelligence at Shanghai Jiao Tong University (SJTU), where I am fortunate to be advised by Prof. Linfeng Zhang. I am also currently a research intern on the Alibaba Qwen Team, supervised by Dr. Dayiheng Liu, Xingzhang Ren, and Kexin Yang. There, I also collaborate closely with Dr. Fei Huang and Huiqiang Jiang.

Previously, I was a master’s student in ReThinkLab at SJTU, where I was grateful to be mentored by Prof. Junchi Yan. I also collaborated closely with Prof. Xuming Hu at the Hong Kong University of Science and Technology (Guangzhou) and Dr. Conghui He at Shanghai AI Laboratory, and I previously worked with Prof. Zhuoran Yang at Yale University.

Research. My research bridges empirical and theoretical perspectives on data-centric AI, with a primary focus on data selection, synthesis, and sampling strategies for large language model (LLM) pre-training. Previously, my work centered on explainable AI.

Curriculum Vitae | CV (Chinese)

Short bio. I was born in Hefei, China. Beyond academia, I have been devoted to the piano for 15 years and once had the privilege of performing alongside the world-renowned pianist Lang Lang. My musical inspirations come from the Romantic era, particularly the works of Frédéric Chopin and Franz Liszt, and I am also captivated by the soulful rhythms of R&B, jazz, and neo-soul. As a teenager, I won several chess championships in Anhui Province, China, under the guidance of Chess Grandmaster Chongsheng Zeng and Chess Master Yongjin Zhou.

News

  • [July 2025] I am honored to be selected for the Tencent PhD Research Incentive Program (one of 23 recipients in China).
  • [March 2025] Our paper, “Dataset Distillation with Neural Characteristic Function: A Minmax Perspective,” received full scores (5/5/5) from all three reviewers at CVPR 2025.

Selected Publications

* denotes equal contribution.
  1. Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
    Shaobo Wang, Cong Wang, Wenjie Fu, Yue Min, Mingquan Feng, Isabel Guan, Xuming Hu, Conghui He, and 6 more authors
    2026
  2. Grounding and Enhancing Informativeness and Utility in Dataset Distillation
    Shaobo Wang, Yantai Yang, Guo Chen, Peiru Li, Kaixin Li, Yufa Zhou, and Zhaorun Chen Zhang
    The Fourteenth International Conference on Learning Representations, 2026
  3. Bridging Visual Dynamics and Narrative Reasoning: Multimodal Large Language Models for Short Drama Quality Assessment
    Qingyang Liu, Jiangtong Li, Zelin Peng, Shaobo Wang, Zhaohe Liao, Shuochen Chang, Bingjie Gao, Haonan Zhao, and 3 more authors
    The ACM Web Conference 2026, Industry Track, 2026
  4. UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective
    Furui Xu*, Shaobo Wang*, Jiajun Zhang, Chenghao Sun, Haixiang Tang, and Linfeng Zhang
    Annual AAAI Conference on Artificial Intelligence, 2026
  5. ImagebindDC: Compressing Multimodal Data with Imagebind-based Condensation
    Yue Min*, Shaobo Wang*, Jiaze Li, Tianle Niu, Junxin Fan, Yongliang Miao, Lijin Yang, and Linfeng Zhang
    Annual AAAI Conference on Artificial Intelligence, 2026
  6. Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
    Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, and Linfeng Zhang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
  7. Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
    Shaobo Wang, Xiangqi Jin, Ziming Wang, Jize Wang, Jiajun Zhang, Kaixin Li, Zichen Wen, Zhong Li, and 3 more authors
    Annual Meeting of the Association for Computational Linguistics, 2025
  8. Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
    Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, and Linfeng Zhang
    International Conference on Learning Representations, 2025
  9. Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
    Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Fei Huang, Huiqiang Jiang, and 3 more authors
    2025
  10. Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution
    Shaobo Wang, Zhengbo Jiao, Zifan Zhang, Yilang Peng, Xu Ze, Boyu Yang, Wei Wang, Hu Wei, and 1 more author
    2025
  11. VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
    Shaobo Wang, Tianle Niu, Runkang Yang, Deshan Liu, Xu He, Zichen Wen, Conghui He, Xuming Hu, and 1 more author
    2025
  12. CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
    Shaobo Wang, Yongliang Miao, Yuancheng Liu, Qianli Ma, Ning Liao, and Linfeng Zhang
    2025
  13. Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
    Zichen Wen, Shaobo Wang, Yufa Zhou, Junyuan Zhang, Qintong Zhang, Yifeng Gao, Zhaorun Chen, Bin Wang, and 3 more authors
    2025
  14. Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-V2)
    Qifeng Li, Xiaosong Jia, Shaobo Wang, and Junchi Yan
    European Conference on Computer Vision, 2024
  15. Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
    Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, and Linfeng Zhang
    2025
  16. Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching
    Zhixin Zheng, Xinyu Wang, Chang Zou, Shaobo Wang, and Linfeng Zhang
    ACM Multimedia, 2025
  17. SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching
    Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Fei Ren, Shaobo Wang, Kaixin Li, and Linfeng Zhang
    ACM Multimedia, 2025
  18. Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
    Siyu Chen, Heejune Sheen, Tianhao Wang, and Zhuoran Yang
    Advances in Neural Information Processing Systems, 2024
  19. DRUPI: Dataset Reduction Using Privileged Information
    Shaobo Wang, Yantai Yang, Shuaiyu Zhang, Chenghao Sun, Weiya Li, Xuming Hu, and Linfeng Zhang
    The Future of Machine Learning Data Practices and Repositories at ICLR, 2024
  20. Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation
    Shaobo Wang, Yantai Yang, Qilong Wang, Kaixin Li, Linfeng Zhang, and Junchi Yan
    Synthetic Data for Computer Vision Workshop at CVPR, 2025
  21. Visualizing the Emergence of Intermediate Visual Patterns in DNNs
    Mingjie Li, Shaobo Wang, and Quanshi Zhang
    Advances in Neural Information Processing Systems, 2021