[ACL 2025] RaaS: Reasoning-Aware Attention Sparsity for Efficient Long-Decoding Inference (CCF-A)
Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan
In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, 2025
[paper] [code]
[ATC 2025] DeepServe: Serverless Large Language Model Serving at Scale (CCF-A)
Junhao Hu, Jiang Xu, Zhixia Liu, Yulong He, Yuetao Chen, Hao Xu, Jiang Liu, Jie Meng, Baoquan Zhang, Shining Wan, Gengyuan Dan, Zhiyu Dong, Zhihao Ren, Changhong Liu, Tao Xie, Dayun Lin, Qin Zhang, Yue Yu, Hao Feng, Xusheng Chen, Yizhou Shan
In: Proceedings of the 2025 USENIX Annual Technical Conference, 2025
[paper] [code]
[ICML 2025] EPIC: Efficient Position-Independent Caching for Serving Large Language Models (CCF-A)
Junhao Hu, Wenrui Huang, Weidong Wang, Haoyi Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie
In: Proceedings of the 42nd International Conference on Machine Learning, 2025
[paper] [code]
[TSE 2025] Directional Diffusion-Style Code Editing Pre-training (CCF-A)
Qingyuan Liang, Zeyu Sun, Qihao Zhu, Junhao Hu, Yifan Zhao, Yizhou Chen, Mingxuan Zhu, Guoqing Wang, Lu Zhang
In: IEEE Transactions on Software Engineering, 2025
[paper] [code]
[SCIS 2025] Cupcleaner: A Data Cleaning Approach for Comment Updating (CCF-A)
Qingyuan Liang, Zeyu Sun, Qihao Zhu, Junhao Hu, Yifan Zhao, Lu Zhang
In: Science China Information Sciences, 2025
[paper] [code]
[ASE 2023] Predicting Compilation Resources for Adaptive Build in an Industrial Setting (CCF-A)
Junhao Hu, Chaozheng Wang, Hailiang Huang, Huang Luo, Yu Jin, Yuetang Deng, Tao Xie
In: Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, pages 1808-1813, 2023
[paper] [code]
[ESEC/FSE 2023] How Practitioners Expect Code Completion? (CCF-A)
Chaozheng Wang, Junhao Hu, Cuiyun Gao, Yu Jin, Tao Xie, Hailiang Huang, Zhenyu Lei, Yuetang Deng
In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1294-1306, 2023
[paper] [code]
[preprint] xDeepServe: Model-as-a-Service on Huawei CloudMatrix384
xDeepServe Team @ Huawei
In: arXiv preprint arXiv:2508.02520
[paper] [code]
[preprint] MiMo-Audio: Audio Language Models are Few-Shot Learners
LLM-Core @ Xiaomi
In: GitHub
[paper] [code]
[preprint] MPIC: Position-Independent Multimodal Context Caching System for Efficient MLLM Serving
Shiju Zhao, Junhao Hu, Rongxiao Huang, Jiaqi Zheng, Guihai Chen
In: arXiv preprint arXiv:2502.01960
[paper] [code]
[preprint] LLMigrate: Transforming “Lazy” Large Language Models into Efficient Source Code Migrators
Yuchen Liu, Junhao Hu, Yingdi Shan, Ge Li, Yanzhen Zou, Yihong Dong, Tao Xie
In: arXiv preprint arXiv:2503.23791
[paper] [code]
[preprint] MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool
Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan
In: arXiv preprint arXiv:2406.17565
[paper] [code]