Junhao Hu

Published Conference/Journal Papers:

[FAST 2026] CacheSlide: Unlocking Cross Position-Aware KV Cache Reuse for Accelerating LLM Serving (CCF-A)
Yang Liu, Yunfei Gu, Liqiang Zhang, Chentao Wu, Guangtao Xue, Jie Li, Minyi Guo, Junhao Hu, Jie Meng
In: Proceedings of the 24th USENIX Conference on File and Storage Technologies, 2026
[paper] [code]
[ACL 2025] RaaS: Reasoning-Aware Attention Sparsity for Efficient Long-Decoding Inference (CCF-A)
Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan
In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
[paper] [code]
[ATC 2025] DeepServe: Serverless Large Language Model Serving at Scale (CCF-A)
Junhao Hu, Jiang Xu, Zhixia Liu, Yulong He, Yuetao Chen, Hao Xu, Jiang Liu, Jie Meng, Baoquan Zhang, Shining Wan, Gengyuan Dan, Zhiyu Dong, Zhihao Ren, Changhong Liu, Tao Xie, Dayun Lin, Qin Zhang, Yue Yu, Hao Feng, Xusheng Chen, Yizhou Shan
In: Proceedings of the 2025 USENIX Annual Technical Conference, 2025
[paper] [code]
[ICML 2025] EPIC: Efficient Position-Independent Caching for Serving Large Language Models (CCF-A)
Junhao Hu, Wenrui Huang, Weidong Wang, Haoyi Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie
In: Proceedings of the 42nd International Conference on Machine Learning, 2025
[paper] [code]
[TSE 2025] Directional Diffusion-Style Code Editing Pre-training (CCF-A)
Qingyuan Liang, Zeyu Sun, Qihao Zhu, Junhao Hu, Yifan Zhao, Yizhou Chen, Mingxuan Zhu, Guoqing Wang, Lu Zhang
In: IEEE Transactions on Software Engineering, 2025
[paper] [code]
[SCIS 2025] Cupcleaner: A Data Cleaning Approach for Comment Updating (CCF-A)
Qingyuan Liang, Zeyu Sun, Qihao Zhu, Junhao Hu, Yifan Zhao, Lu Zhang
In: Science China Information Sciences, 2025
[paper] [code]
[ASE 2023] Predicting Compilation Resources for Adaptive Build in an Industrial Setting (CCF-A)
Junhao Hu, Chaozheng Wang, Hailiang Huang, Huang Luo, Yu Jin, Yuetang Deng, Tao Xie
In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, pages 1808-1813, 2023
[paper] [code]
[ESEC/FSE 2023] How Practitioners Expect Code Completion? (CCF-A)
Chaozheng Wang, Junhao Hu, Cuiyun Gao, Yu Jin, Tao Xie, Hailiang Huang, Zhenyu Lei, Yuetang Deng
In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1294-1306, 2023
[paper] [code]

Technical Reports:

[preprint] MiMo-V2-Flash Technical Report
Junhao Hu (Core contributor in author list), LLM-Core @ Xiaomi
In: GitHub
[paper] [code]
[preprint] MiMo-Audio: Audio Language Models are Few-Shot Learners
Junhao Hu (Contributor in author list), LLM-Core @ Xiaomi
In: GitHub
[paper] [code]
[preprint] xDeepServe: Model-as-a-Service on Huawei CloudMatrix384
Junhao Hu (Core contributor in author list), xDeepServe Team @ Huawei
In: arXiv preprint arXiv:2508.02520
[paper] [code]

Unpublished High-Impact Papers:

[preprint] Lil: Less is Less When Applying Post-Training Sparse-Attention Algorithms in Long-Decode Stage
Junhao Hu, Fangze Li, Mingtao Xu, Feifan Meng, Shiju Zhao, Tiancheng Hu, Ting Peng, Anmin Liu, Wenrui Huang, Chenxu Liu, Ziyue Hua, Tao Xie
In: arXiv preprint arXiv:2601.03043
[paper] [code]
[preprint] MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan
In: arXiv preprint arXiv:2406.17565
[paper] [code]
[preprint] MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving
Shiju Zhao, Junhao Hu, Rongxiao Huang, Jiaqi Zheng, Guihai Chen
In: arXiv preprint arXiv:2502.01960
[paper] [code]
[preprint] LLMigrate: Transforming “Lazy” Large Language Models into Efficient Source Code Migrators
Yuchen Liu, Junhao Hu, Yingdi Shan, Ge Li, Yanzhen Zou, Yihong Dong, Tao Xie
In: arXiv preprint arXiv:2503.23791
[paper] [code]