Mao Lin (林茂)

Graduate Student at UC Merced

Researching ML/LLM systems, heterogeneous computer architectures, high-performance computing, and program analysis and optimization.

Education

University of California, Merced
Ph.D. in Electrical Engineering and Computer Science · Merced, CA, USA
Shandong University
Master of Software Engineering · Jinan, China
Shandong University
Bachelor of Computer Science · Jinan, China

Research Areas

GPU Architectures and Systems · LLM Training/Inference Systems · Machine Learning Systems · Static and Dynamic Performance Analysis

Experience

Samsung — Research Intern
San Jose, CA, USA

Working on hardware/software co-design for MoE models on Samsung's AI accelerator.

ByteDance — Research Intern
Seattle, WA and San Jose, CA, USA

Optimized PyTorch memory management for distributed LLM training, reducing memory usage by 10% to 30% on models including GPT-2 and Whisper.

Uber — Software Engineer Intern
Sunnyvale, CA, USA

Analyzed production Go services and fixed more than 50 data race issues.

PNNL — Research Intern
Richland, WA, USA

Built GPU profiling and floating-point analysis tooling that found critical overflow issues in DOE applications.

Open Source Software

AccelProf

A profiling and analysis framework for various accelerator applications.

DrGPUM

Tooling for guiding memory optimization in GPU-accelerated applications.

Publications

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

Mao Lin, Xi Wang, Guilherme Cox, Dong Li, and Hyeran Jeon

arXiv preprint, April 2026 (arXiv 2026)

PASTA: A Modular Program Analysis Tool Framework for Accelerators

Mao Lin, Hyeran Jeon, and Keren Zhou

The 23rd ACM/IEEE International Symposium on Code Generation and Optimization, Jan 31–Feb 4, 2026, Sydney, Australia (CGO '26)

Forest: Access-aware GPU UVM Management

Mao Lin, Yuan Feng, Guilherme Cox, and Hyeran Jeon

The 52nd Annual International Symposium on Computer Architecture, Jun 21–25, 2025, Tokyo, Japan (ISCA '25)

Understanding Oversubscribed Memory Management for Deep Learning Training

Mao Lin and Hyeran Jeon

The 5th Workshop on Machine Learning and Systems, Mar 30–Apr 3, 2025, Rotterdam, Netherlands (EuroMLSys '25)

DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications

Mao Lin, Keren Zhou, and Pengfei Su

The 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Mar 25–29, 2023, Vancouver, BC, Canada (ASPLOS '23)

A Comprehensive Memory Management Framework for CPU-FPGA Heterogenous SoCs

Zelin Du, Qianling Zhang, Mao Lin, Shiqing Li, Xin Li, and Lei Ju

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022 (TCAD '22)

Poster: Squeezing GPU Memory Usage in PyTorch

Mao Lin, Keren Zhou, and Pengfei Su

PyTorch Conference '22, Dec 2022, New Orleans, LA, USA (PyTorch Conference '22)

Talks & Presentations

Forest: Access-aware GPU UVM Management
ISCA '25 · Tokyo, Japan
Understanding Oversubscribed Memory Management for Deep Learning Training
EuroMLSys '25 · Rotterdam, Netherlands
DrGPUM: Guiding Memory Optimization for GPU-accelerated Applications
ASPLOS '23 · Remote
Squeezing GPU Memory Usage in PyTorch
Poster · PyTorch Conference '22 · New Orleans, LA, USA

Professional Services

Artifact Evaluation Committee

PPoPP '23 · ASPLOS '24 · ISCA '25 · SOSP '25 · IISWC '25 · EuroSys '26 · ISPASS '26 · MLSys '26 · ISCA '26

Reviewer

GPGPU '25 · ICCAD '26

Teaching Experience

Computer Architecture and Design (EECS 240) 2025 Fall
Guest Lecturer
Computer Architecture (CSE 140) 2024 Spring, 2025 Spring
Teaching Assistant
Intro to Programming Laboratory Skills/Techniques (CSE 022) 2023 Spring
Teaching Assistant
Data Structures (CSE 030) 2022 Spring
Teaching Assistant
Advanced Programming (CSE 024) 2021 Fall, 2024 Fall
Teaching Assistant
Intro to Object-Oriented Programming (CSE 165) 2021 Fall
Teaching Assistant

Technical Skills

Programming Languages

C/C++ · Python · CUDA · Go · Java · Shell

Platforms & Systems

Linux/Windows/macOS · CPU-GPU HMPSoCs · CPU-FPGA HMPSoCs

Frameworks & Libraries

vLLM · PyTorch · TensorFlow

Development Tools

Nsight Systems · Nsight Compute · Linux perf · GDB · Git · CMake · Xilinx Vivado Suite