Mao
Graduate Student at UC Merced
Researching ML/LLM systems, heterogeneous computer architectures, high-performance computing, and program analysis and optimization.
About Me
I am currently a graduate student at UC Merced, where I am fortunate to work with Prof. Hyeran Jeon. Prior to this, I received my M.S. in Software Engineering and my B.E. in Computer Science from Shandong University, where I had the privilege of working with Prof. Lei Ju.
My research interests include ML/LLM systems, heterogeneous computer architectures and systems, high-performance and parallel computing (CUDA), as well as static and dynamic program analysis and optimization.
Research Areas
Experience
ByteDance - Research Intern
05/2023 - 11/2023 | Seattle, WA / San Jose, CA, USA
Analyzing and optimizing GPU memory wastage and fragmentation in large language model (LLM) training.
Uber - Part-time Software Engineer
11/2022 - 02/2023 | Sunnyvale, CA, USA
Detecting and fixing data races in Uber's Golang code base.
PNNL - Research Intern
06/2022 - 08/2022 | Richland, WA, USA
Detecting floating-point overflow in GPU-accelerated applications.
Research Highlights
GPU Memory Management
Developing novel techniques for device memory and unified virtual memory management in GPU systems.
GPU Performance Analysis
Creating tools and frameworks for analyzing and optimizing GPU-accelerated applications, with a focus on memory efficiency.
Deep Learning Systems
Tackling memory management challenges in deep learning, especially for large language model (LLM) training and inference.
Publications
Forest: Access-aware GPU UVM Management
The 52nd Annual International Symposium on Computer Architecture, June 21–25, 2025, Tokyo, Japan
Understanding Oversubscribed Memory Management for Deep Learning Training
The 5th Workshop on Machine Learning and Systems, March 30–April 3, 2025, Rotterdam, Netherlands
DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications
The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 25–29, 2023, Vancouver, BC, Canada
Poster: Squeezing GPU Memory Usage in PyTorch
PyTorch Conference, December 2022, New Orleans, LA, USA
A Comprehensive Memory Management Framework for CPU-FPGA Heterogeneous SoCs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022)
Talks & Presentations
ISCA '25 Presentation
Present "Forest: Access-aware GPU UVM Management"
EuroMLSys '25 Presentation
Present "Understanding Oversubscribed Memory Management for Deep Learning Training"
ASPLOS '23 Presentation
Presented "DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications"
PyTorch Conference '22
Presented poster "Squeezing GPU Memory Usage in PyTorch"