Mao Lin (林茂)
ML Systems · AI Accelerators · Performance Analysis · Infrastructure
Researching ML/LLM systems, heterogeneous computer architectures, high-performance computing, and program analysis and optimization.
About Me
I am currently a graduate student at UC Merced, working with Prof. Hyeran Jeon. Prior to this, I received my M.S. in Software Engineering and B.E. in Computer Science from Shandong University, where I worked with Prof. Lei Ju.
My research interests include ML/LLM systems, heterogeneous computer architectures and systems, high-performance and parallel computing (CUDA), as well as static and dynamic program analysis and optimization.
Experience
Samsung - Research Intern
05/2026 - 08/2026 · San Jose, CA, USA
Working on hardware/software co-design for MoE models on Samsung's AI accelerator.
ByteDance - Research Intern
05/2023 - 11/2023 · Seattle, WA / San Jose, CA, USA
Optimized PyTorch memory management for distributed LLM training, reducing memory usage by 10% to 30% on models including GPT-2 and Whisper.
Uber - Software Engineer Intern
11/2022 - 02/2023 · Sunnyvale, CA, USA
Analyzed production Go services and fixed more than 50 data race issues.
PNNL - Research Intern
06/2022 - 08/2022 · Richland, WA, USA
Built GPU profiling and floating-point analysis tooling that found critical overflow issues in DOE applications.
Selected Publications
GPU Memory Profiling
PASTA: A Modular Program Analysis Tool Framework for Accelerators
The 24th ACM/IEEE International Symposium on Code Generation and Optimization (CGO '26)
Understanding Oversubscribed Memory Management for Deep Learning Training
The 5th Workshop on Machine Learning and Systems (EuroMLSys '25)
DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications
The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '23)
GPU Memory Optimization
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
(arXiv 2026)
Forest: Access-aware GPU UVM Management
The 52nd Annual International Symposium on Computer Architecture (ISCA '25)
Poster: Squeezing GPU Memory Usage in PyTorch
(PyTorch Conference '22)