All Publications
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
arXiv preprint, April 2026 (arXiv 2026)
PASTA: A Modular Program Analysis Tool Framework for Accelerators
The 23rd ACM/IEEE International Symposium on Code Generation and Optimization, January 31–February 4, 2026, Sydney, Australia (CGO '26)
Forest: Access-aware GPU UVM Management
The 52nd Annual International Symposium on Computer Architecture, June 21–25, 2025, Tokyo, Japan (ISCA '25)
Understanding Oversubscribed Memory Management for Deep Learning Training
The 5th Workshop on Machine Learning and Systems, March 30–April 3, 2025, Rotterdam, Netherlands (EuroMLSys '25)
DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications
The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 25–29, 2023, Vancouver, BC, Canada (ASPLOS '23)
A Comprehensive Memory Management Framework for CPU-FPGA Heterogeneous SoCs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022 (TCAD '22)
Poster: Squeezing GPU Memory Usage in PyTorch
December 2022, New Orleans, LA, USA (PyTorch Conference '22)