Memory subsystem optimization for CPU-FPGA HMPSoCs
08/2019 - 09/2020
Contention for shared memory resources between CPU and FPGA kernels running in parallel on commercial off-the-shelf (COTS) CPU-FPGA heterogeneous multiprocessor systems-on-chip (HMPSoCs) degrades overall system performance. We show that contention occurs at both the shared last-level cache (LLC) and the main memory, so a cross-layer memory management scheme is needed to achieve the best overall system performance. The proposed framework combines data placement with cache partitioning to mitigate contention at both levels. Experimental results on a COTS Xilinx ZYNQ7020 HMPSoC show that the proposed framework improves overall system performance by 33.55% compared with state-of-the-art FPGA-only data allocation strategies.
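The description above does not say which partitioning mechanism the framework uses. As an illustration only, one common software approach to LLC partitioning is page coloring, sketched below. The function names, the cache geometry in the usage note, and the CPU/FPGA split are hypothetical and are not taken from the project.

```python
def num_colors(cache_bytes, ways, page_bytes=4096):
    """Number of page colors: how many distinct pages fit in one cache way."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)


def page_color(phys_addr, cache_bytes, ways, page_bytes=4096):
    """Color of the physical page that contains phys_addr."""
    page_number = phys_addr // page_bytes
    return page_number % num_colors(cache_bytes, ways, page_bytes)


def partition_colors(n_colors, cpu_share):
    """Split the colors into a CPU group and an FPGA group (assumed policy).

    cpu_share is the fraction of colors reserved for CPU buffers; at least
    one color is left for each side when n_colors >= 2.
    """
    n_cpu = min(n_colors - 1, max(1, round(n_colors * cpu_share)))
    colors = list(range(n_colors))
    return colors[:n_cpu], colors[n_cpu:]
```

With an assumed 512 KiB, 8-way cache and 4 KiB pages, `num_colors(512 * 1024, 8)` gives 16 colors. An allocator that hands CPU buffers only pages from the first group and FPGA buffers only pages from the second keeps the two sides from evicting each other's cache lines.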
UAV object detection system
08/2018 - 09/2019
Real-time detection of tiny objects on an NVIDIA JETSON TX2 is used to autonomously control drone flight. The adopted YOLOv2 neural network is modified for better tiny-object detection by cutting layers and fine-tuning layers. To increase inference speed, the system uses multi-threading, pipelining, and conversion of single-precision (fp32) operations to half precision (fp16). The system runs on the Robot Operating System (ROS), which facilitates communication between its modules. It achieves over 90% accuracy and 90% recall and reaches 30 Hz on 1080 video input. The project was carried out in cooperation with a commercial UAV manufacturer, and the detection system has been integrated into actual aircraft. (As the project leader, I traveled to Changsha, China, to work with the manufacturer's project participants on integrating the system into the aircraft.)
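The multi-threading and pipelining mentioned above can be illustrated with a minimal sketch in which each processing stage runs in its own thread and passes frames through bounded queues. This is not the project's code: the stage functions, `run_pipeline`, and the queue depth are hypothetical placeholders for steps such as pre-processing, inference, and post-processing.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down


def _stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            break
        outbox.put(fn(item))


def run_pipeline(frames, stages, depth=4):
    """Run frames through the stage functions, one thread per stage.

    Bounded queues (size `depth`, an assumed value) keep a fast stage from
    running far ahead of a slow one. Output order matches input order
    because every stage is a single FIFO worker.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_stage, args=(fn, queues[i], queues[i + 1]), daemon=True)
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_STOP)

    feeder = threading.Thread(target=feed, daemon=True)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)

    feeder.join()
    for worker in workers:
        worker.join()
    return results
```

While one frame is in the last stage, the next frame can already be in the stage before it, so throughput is bounded by the slowest stage rather than by the sum of all stages.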