Experience

Microsoft

AB Test Software Engineer Sep 2025 - Present
Technologies: C#, A/B Testing, Big Data, Azure
  • Work on the ExP team.
  • Design, implement, and maintain Microsoft’s A/B testing platform, which powers large-scale experimentation and analysis; support Bing, Office, Xbox, Microsoft AI, and other teams in experiment design, data collection, and result analysis to improve product performance and user experience.
Distributed Software Engineer Apr 2022 - Aug 2025
Technologies: C++, C#, Distributed Systems, HPC, Azure
  • Worked on the Azure Batch team.
  • Drove the design and implementation of Azure Batch’s task scheduler (Batch Scheduler), which schedules 100M+ tasks daily; maintained and optimized system performance and reliability to keep it running efficiently at large scale.

Huawei

Virtualization Software Engineer Jul 2018 - Apr 2022
Technologies: C, Linux, KVM, QEMU, Libvirt
  • Worked on the Virtualization team of 2012 Laboratory.
  • Responsible for feature development, troubleshooting, and performance tuning of a hypervisor based on KVM and QEMU.

Projects

Batch Scheduler vNext

Microsoft - 2022-2025
  • Batch Scheduler vNext is the next generation of the Batch Scheduler component inside Azure Batch, built to raise the system’s stability, performance, and maintainability while meeting growing customer demand. The project rewrites Batch Scheduler, optimizes the scheduling algorithms, and introduces fresh capabilities for large-scale task orchestration.
  • Designed and implemented the Policy Engine (the scheduling algorithm framework) to allow multiple policies to be configured and extended flexibly.
  • Created CI/CD pipelines that preserve integration quality and delivery consistency for Batch Scheduler vNext.
  • Built benchmark suites and integrated them into the CI process to continuously monitor and tune scheduler performance.

Memory leak fixes for Batch Scheduler

Microsoft - 2025
  • Batch Scheduler is a critical Azure Batch component that schedules massive compute workloads; memory leaks from legacy code and recently added features forced the service to restart every few hours, harming reliability.
  • Ported a usable version of LeakSanitizer (LSAN) to Windows, used LSAN to pinpoint leaks, and fixed several of them, significantly improving system stability and performance.

QEMU hot-replace

Huawei - 2021
  • QEMU hot-replace is the ability to replace the QEMU binary without stopping customer virtual machines, enabling online patches.
  • Implemented the overall framework based on fork+exec, used pipes for inter-process communication, and handled device state serialization/deserialization and lifecycle transitions while the two processes performed the hot switch.

Support enabling/disabling CPU features for AArch64

Huawei - 2020
  • The feature adds support for enabling/disabling CPU features for guests so that cloud providers can expose a uniform set of features to a group of guests on systems with different ARM CPUs.
  • Developed the code that allows userspace to modify the values of ID registers in KVM; also responsible for configuring the ID registers and writing them to KVM.

A-Tune

Huawei - 2019-2020
  • A-Tune is an OS tuning engine powered by AI. A-Tune uses AI technologies to enable the OS to understand services, simplify IT system tuning, and maximize application performance.
  • Tuned the performance of Hadoop, Spark, SPECjbb2015, and Dubbo, improving them by 30%, 30%, 40%, and 10%, respectively. Also refactored the data collection and parsing module to improve scalability and configurability.

Support DBM of stage-2 for AArch64

Huawei - 2018-2019
  • Dirty page logging is important for pre-copy migration. AArch64 currently supports only the software approach, which marks all pages write-protected; this is inefficient because guest writes to memory trap to KVM.
  • Developed hardware dirty page logging via the DBM bit of the stage-2 page table. Compared with software dirty page logging, hardware dirty page logging reduces the number of traps during migration by 30% to 80%.