Publications

Publications by category in reverse chronological order. * denotes equal contribution.

2025

  1. Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
    Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, and Linfeng Zhang
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
  2. Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
    Shaobo Wang, Xiangqi Jin, Ziming Wang, Jize Wang, Jiajun Zhang, Kaixin Li, Zichen Wen, Zhong Li, and 3 more authors
    Annual Meeting of the Association for Computational Linguistics, 2025
  3. Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
    Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, and Linfeng Zhang
    International Conference on Learning Representations, 2025
  4. DD-Ranking: Rethinking the Evaluation of Dataset Distillation
    Zekai Li, Xinhao Zhong, Samir Khaki, Zhiyuan Liang, Yuhao Zhou, Mingjia Shi, Ziqiao Wang, Xuanlei Zhao, and 3 more authors
    2025
  5. Shifting AI Efficiency From Model-Centric to Data-Centric Compression
    Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, Yubo Wang, Xiangqi Jin, Chang Zou, and 8 more authors
    2025
  6. dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
    Zhiyuan Liu, Yicun Yang, Yaojie Zhang, Junjie Chen, Chang Zou, Qingyan Wei, Shaobo Wang, and Linfeng Zhang
    2025
  7. Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
    Yufa Zhou*, Shaobo Wang*, Xingyu Dong*, Xiangqi Jin, Yifang Chen, Yue Min, Xingzhang Ren, Kexin Yang, and 2 more authors
    2025
  8. Rethink Dataset Pruning From a Generalization Perspective
    Furui Xu*, Shaobo Wang*, Chenghao Sun, Jiajun Zhang, and Linfeng Zhang
    The Future of Machine Learning Data Practices and Repositories at ICLR 2025, 2025
  9. Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
    Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, and Linfeng Zhang
    2025
  10. Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation
    Shaobo Wang, Yantai Yang, Qilong Wang, Kaixin Li, Linfeng Zhang, and Junchi Yan
    Synthetic Data for Computer Vision Workshop at CVPR, 2025

2024

  1. Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-V2)
    Qifeng Li, Xiaosong Jia, Shaobo Wang, and Junchi Yan
    European Conference on Computer Vision, 2024
  2. Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
    Siyu Chen, Heejune Sheen, Tianhao Wang, and Zhuoran Yang
    Advances in Neural Information Processing Systems, 2024
  3. DRUPI: Dataset Reduction Using Privileged Information
    Shaobo Wang, Yantai Yang, Shuaiyu Zhang, Chenghao Sun, Weiya Li, Xuming Hu, and Linfeng Zhang
    The Future of Machine Learning Data Practices and Repositories at ICLR, 2024

2023

  1. Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
    Shaobo Wang, Xiangdong Zhang, and Junchi Yan
    arXiv preprint arXiv:2311.15993, 2023

2022

  1. Trap of Feature Diversity in the Learning of MLPs
    Dongrui Liu*, Shaobo Wang*, Jie Ren, Kangrui Wang, Sheng Yin, Huiqi Deng, and Quanshi Zhang
    arXiv preprint arXiv:2112.00980, 2022

2021

  1. Visualizing the Emergence of Intermediate Visual Patterns in DNNs
    Mingjie Li, Shaobo Wang, and Quanshi Zhang
    Advances in Neural Information Processing Systems, 2021