Publications

Publications by category in reverse chronological order. * denotes equal contribution.

2026

  1. OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
    Shaobo Wang, Xuan Ouyang, Tianyi Xu, Yuzheng Hu, Jialin Liu, Guo Chen, Tianyu Zhang, Junhao Zheng, and 4 more authors
    In International Conference on Machine Learning, 2026
  2. Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis
    Zhengbo Jiao, Shaobo Wang, Zifan Zhang, Xuan Ren, Wei Wang, Bing Zhao, Hu Wei, and Linfeng Zhang
    In International Conference on Machine Learning, 2026
  3. Breaking Dual Bottlenecks: Evolving Unified Multimodal Models into Self-Adaptive Interleaved Visual Reasoners
    Qingyang Liu, Bingjie Gao, Canmiao Fu, Zhipeng Huang, Chen Li, Feng Wang, Shuochen Chang, Shaobo Wang, and 4 more authors
    In International Conference on Machine Learning, 2026
  4. dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching
    Zhiyuan Liu, Yicun Yang, Yaojie Zhang, Junjie Chen, Chang Zou, Qingyan Wei, Shaobo Wang, Yichen Zhu, and 1 more author
    In International Conference on Machine Learning, 2026
  5. CircuitSeer: Mining High-Quality Data by Probing Mathematical Reasoning Circuits in LLMs
    Shaobo Wang*, Yongliang Miao*, Yuancheng Liu, Qianli Ma, Ning Liao, and Linfeng Zhang
    In The 64th Annual Meeting of the Association for Computational Linguistics, 2026
  6. Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling
    Yujie Chen, Tailai Chen, Yifeng Gao, Zoe Wanying He, Yijue Xu, Shaobo Wang, and Linfeng Zhang
    In The 64th Annual Meeting of the Association for Computational Linguistics, 2026
  7. MelTrim: Coarse-to-Fine Data Pruning for Speech Classification
    Shaobo Wang, Tianle Niu, Xuan Ouyang, Xintong Li, Zhengkun Ge, Yue Min, Xiaoqian Liu, Hankun Wang, and 9 more authors
    In Findings of the Association for Computational Linguistics: ACL 2026, 2026
  8. Bridging Visual Dynamics and Narrative Reasoning: Multimodal Large Language Models for Short Drama Quality Assessment
    Qingyang Liu, Jiangtong Li, Zelin Peng, Shaobo Wang, Zhaohe Liao, Shuochen Chang, Bingjie Gao, Haonan Zhao, and 3 more authors
    In The ACM Web Conference 2026 Industry Track, 2026
  9. Socratic-Geo: Synthetic Data Generation and Cross-Modal Geometric Reasoning via Multi-Agent Interaction
    Zhengbo Jiao*, Shaobo Wang*, Zifan Zhang*, Wei Wang, Bing Zhao, Hu Wei, and Linfeng Zhang
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2026
  10. Flash-Unified: A Training-Free and Task-Aware Acceleration Framework for Native Unified Models
    Junlong Ke, Zichen Wen, Boxue Yang, Yantai Yang, Xuyang Liu, Chenfei Liao, Zhaorun Chen, Shaobo Wang, and 1 more author
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Findings, 2026
  11. Grounding and Enhancing Informativeness and Utility in Dataset Distillation
    Shaobo Wang, Yantai Yang, Guo Chen, Peiru Li, Kaixin Li, Yufa Zhou, Zhaorun Chen, and Linfeng Zhang
    In The Fourteenth International Conference on Learning Representations, 2026
  12. Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
    Shaobo Wang*, Cong Wang*, Wenjie Fu*, Yue Min, Mingquan Feng, Isabel Guan, Xuming Hu, Conghui He, and 6 more authors
    In The Fourteenth International Conference on Learning Representations, 2026
  13. Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
    Shaobo Wang, Jiaming Wang, Jiajun Zhang, Cong Wang, Yue Min, Zichen Wen, Xingzhang Ren, Fei Huang, and 4 more authors
    ICLR 2026 Workshop on Data Problems for Foundation Models, 2026
  14. IndustryCode: A Benchmark for Industry Code Generation
    Puyu Zeng, Zhaoxi Wang, Zhixu Duan, Liang Feng, Shaobo Wang, Cunxiang Wang, Jinghang Wang, Bing Zhao, and 2 more authors
    arXiv preprint arXiv:2604.02729, 2026
  15. Do Phone-Use Agents Respect Your Privacy?
    Zhengyang Tang, Ke Ji, Xidong Wang, Zihan Ye, Xinyuan Wang, Yiduo Guo, Ziniu Li, Chenxin Li, and 14 more authors
    arXiv, 2026
  16. Bridging Visual Representation and Reinforcement Learning from Verifiable Rewards in Large Vision-Language Models
    Yuhang Han, Yuyang Wu, Zhengbo Jiao, Yiyu Wang, Xuyang Liu, Shaobo Wang, Hanlin Xu, Xuming Hu, and 1 more author
    arXiv preprint arXiv:2603.27375, 2026
  17. Towards Principled Dataset Distillation: A Spectral Distribution Perspective
    Ruixi Wu*, Shaobo Wang*, Jiahuan Chen, Zhiyuan Liu, Yicun Yang, Zhaorun Chen, Zekai Li, Kaixin Li, and 4 more authors
    arXiv preprint arXiv:2603.01698, 2026
  18. Credit Where It is Due: Cross-Modality Connectivity Drives Precise Reinforcement Learning for MLLM Reasoning
    Zhengbo Jiao*, Shaobo Wang*, Zifan Zhang, Wei Wang, Bing Zhao, Hu Wei, and Linfeng Zhang
    arXiv preprint arXiv:2602.11455, 2026
  19. UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective
    Furui Xu*, Shaobo Wang*, Jiajun Zhang, Chenghao Sun, Haixiang Tang, and Linfeng Zhang
    In Annual AAAI Conference on Artificial Intelligence, 2026
  20. ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation
    Yue Min*, Shaobo Wang*, Jiaze Li, Tianle Niu, Junxin Fan, Yongliang Miao, Lijin Yang, and Linfeng Zhang
    In Annual AAAI Conference on Artificial Intelligence, 2026

2025

  1. Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
    Zichen Wen, Shaobo Wang, Yufa Zhou, Junyuan Zhang, Qintong Zhang, Yifeng Gao, Zhaorun Chen, Bin Wang, and 3 more authors
    In Advances in Neural Information Processing Systems, 2025
  2. VideoCompressa: Data-Efficient Video Understanding via Joint Temporal Compression and Spatial Reconstruction
    Shaobo Wang*, Tianle Niu*, Runkang Yang, Deshan Liu, Xu He, Zichen Wen, Conghui He, Xuming Hu, and 1 more author
    arXiv preprint arXiv:2511.18831, 2025
  3. Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More
    Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, and Linfeng Zhang
    In Conference on Empirical Methods in Natural Language Processing, 2025
  4. Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way
    Yicun Yang, Cong Wang, Shaobo Wang, Zichen Wen, Biqing Qi, Hanlin Xu, and Linfeng Zhang
    arXiv preprint arXiv:2510.24605, 2025
  5. SpeCa: Accelerating Diffusion Transformers with Speculative Feature Caching
    Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu, Fei Ren, Shaobo Wang, Kaixin Li, and Linfeng Zhang
    In ACM Multimedia, 2025
  6. Compute Only 16 Tokens in One Timestep: Accelerating Diffusion Transformers with Cluster-Driven Feature Caching
    Zhixin Zheng, Xinyu Wang, Chang Zou, Shaobo Wang, and Linfeng Zhang
    In ACM Multimedia, 2025
  7. Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution
    Shaobo Wang*, Zhengbo Jiao*, Zifan Zhang, Yilang Peng, Xu Ze, Boyu Yang, Wei Wang, Hu Wei, and 1 more author
    arXiv preprint arXiv:2509.24726, 2025
  8. Data Whisperer: Efficient Data Selection for Task-Specific LLM Fine-Tuning via Few-Shot In-Context Learning
    Shaobo Wang, Xiangqi Jin, Ziming Wang, Jize Wang, Jiajun Zhang, Kaixin Li, Zichen Wen, Zhong Li, and 3 more authors
    In Annual Meeting of the Association for Computational Linguistics, 2025
  9. Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
    Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, and Linfeng Zhang
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025
  10. Not All Samples Should Be Utilized Equally: Towards Understanding and Improving Dataset Distillation
    Shaobo Wang, Yantai Yang, Qilong Wang, Kaixin Li, Linfeng Zhang, and Junchi Yan
    Synthetic Data for Computer Vision Workshop at CVPR, 2025
  11. Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs
    Yufa Zhou*, Shaobo Wang*, Xingyu Dong*, Xiangqi Jin, Yifang Chen, Yue Min, Kexin Yang, Xingzhang Ren, and 2 more authors
    arXiv preprint arXiv:2506.00577, 2025
  12. Shifting AI Efficiency From Model-Centric to Data-Centric Compression
    Xuyang Liu*, Zichen Wen*, Shaobo Wang*, Junjie Chen, Zhishan Tao, Yubo Wang, Tailai Chen, Xiangqi Jin, and 9 more authors
    arXiv preprint arXiv:2505.19147, 2025
  13. KO: Kinetics-inspired Neural Optimizer with PDE Simulation Approaches
    Mingquan Feng, Yixin Huang, Yifan Fu, Shaobo Wang, and Junchi Yan
    arXiv preprint arXiv:2505.14777, 2025
  14. DD-Ranking: Rethinking the Evaluation of Dataset Distillation
    Zekai Li, Xinhao Zhong, Samir Khaki, Zhiyuan Liang, Yuhao Zhou, Mingjia Shi, Ziqiao Wang, Xuanlei Zhao, and 3 more authors
    arXiv preprint arXiv:2505.13300, 2025
  15. Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Transformers
    Shaobo Wang, Hongxuan Tang, Mingyang Wang, Hongrui Zhang, Xuyang Liu, Weiya Li, Xuming Hu, and Linfeng Zhang
    In International Conference on Learning Representations, 2025
  16. DRUPI: Dataset Reduction Using Privileged Information
    Shaobo Wang, Youxin Jiang, Tianle Niu, Yantai Yang, Ruiji Zhang, Shuhao Hu, Shuaiyu Zhang, Chenghao Sun, and 4 more authors
    The Future of Machine Learning Data Practices and Repositories Workshop at ICLR 2025, 2025

2024

  1. Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving (in CARLA-v2)
    Qifeng Li, Xiaosong Jia, Shaobo Wang, and Junchi Yan
    In European Conference on Computer Vision, 2024

2023

  1. Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework
    Shaobo Wang, Xiangdong Zhang, Dongrui Liu, and Junchi Yan
    arXiv preprint arXiv:2311.15993, 2023

2022

  1. Trap of Feature Diversity in the Learning of MLPs
    Dongrui Liu*, Shaobo Wang*, Jie Ren, Kangrui Wang, Sheng Yin, Huiqi Deng, and Quanshi Zhang
    arXiv preprint arXiv:2112.00980, 2022

2021

  1. Visualizing the Emergence of Intermediate Visual Patterns in DNNs
    Mingjie Li, Shaobo Wang, and Quanshi Zhang
    In Advances in Neural Information Processing Systems, 2021