Mao Lin (林茂)

ML Systems · AI Accelerators · Performance Analysis · Infrastructure

Researching ML/LLM systems, heterogeneous computer architectures, high-performance computing, and program analysis and optimization.

Location: Merced, CA, USA
Institution: UC Merced
Email: [email protected]

About Me

I am currently a graduate student at UC Merced, working with Prof. Hyeran Jeon. Prior to this, I received my M.S. in Software Engineering and B.E. in Computer Science from Shandong University, where I worked with Prof. Lei Ju.

My research interests include ML/LLM systems, heterogeneous computer architectures and systems, high-performance and parallel computing (CUDA), as well as static and dynamic program analysis and optimization.

Research Areas

GPU Architectures and Systems · LLM Training/Inference Systems · Machine Learning Systems · Static and Dynamic Performance Analysis

Experience

Samsung - Research Intern

05/2026 - 08/2026

San Jose, CA, USA

Working on hardware/software co-design for MoE models on Samsung's AI accelerator.

ByteDance - Research Intern

05/2023 - 11/2023

Seattle/San Jose, WA/CA, USA

Optimized PyTorch memory management for distributed LLM training, reducing memory usage by 10% to 30% on models including GPT-2 and Whisper.

Uber - Software Engineer Intern

11/2022 - 02/2023

Sunnyvale, CA, USA

Analyzed production Go services and fixed more than 50 data races.

PNNL - Research Intern

06/2022 - 08/2022

Richland, WA, USA

Built GPU profiling and floating-point analysis tooling that found critical overflow issues in DOE applications.

Open Source Software

AccelProf

A profiling and analysis framework for various accelerator applications.

DrGPUM

Tooling for guiding memory optimization in GPU-accelerated applications.

Selected Publications

GPU Memory Profiling

PASTA: A Modular Program Analysis Tool Framework for Accelerators

Mao Lin, Hyeran Jeon, and Keren Zhou

The 24th ACM/IEEE International Symposium on Code Generation and Optimization (CGO '26)

Understanding Oversubscribed Memory Management for Deep Learning Training

Mao Lin and Hyeran Jeon

The 5th Workshop on Machine Learning and Systems (EuroMLSys '25)

DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications

Mao Lin, Keren Zhou, and Pengfei Su

The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '23)

GPU Memory Optimization

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

Mao Lin, Xi Wang, Guilherme Cox, Dong Li, and Hyeran Jeon

(arXiv 2026)

Forest: Access-aware GPU UVM Management

Mao Lin, Yuan Feng, Guilherme Cox, and Hyeran Jeon

The 52nd Annual International Symposium on Computer Architecture (ISCA '25)

Poster: Squeezing GPU Memory Usage in PyTorch

Mao Lin, Keren Zhou, and Pengfei Su

(PyTorch Conference '22)

Technical Skills

Programming Languages

C/C++ · Python · CUDA · Go · Java · Shell

Platforms & Systems

Linux/Windows/macOS · CPU-GPU HMPSoCs · CPU-FPGA HMPSoCs

Frameworks & Libraries

vLLM · PyTorch · TensorFlow

Development Tools

Nsight Systems · Nsight Compute · Linux perf · GDB · Git · CMake · Xilinx Vivado Suite
