Payam Parvizi

I am a PhD-trained Research Associate with 4+ years of experience developing and validating reinforcement learning methods for continuous control under real-world-motivated constraints. I am driven by a deep curiosity about how learning-based controllers can achieve the smooth, stable, and reliable behavior needed for safe deployment in physical systems.

I specialize in action/policy regularization and policy smoothing to reduce high-frequency oscillations and improve reliability, validated through the development of custom Gymnasium-compatible environments and simulation-to-real transfer. My work includes deployment on quadcopter platforms and applications in satellite-to-ground optical communications, conducted at the University of Ottawa under the supervision of Dr. Davide Spinello and Dr. Colin Bellinger, and in collaboration with Dr. Ross Cheriton at the National Research Council of Canada (NRC).

CV Google Scholar GitHub LinkedIn Email

Adaptive Policy Regularization for Smooth Control in Reinforcement Learning

This work introduces State-Adaptive Proportional Policy Smoothing (SAPPS), a lightweight policy regularization method that suppresses high-frequency action oscillations in continuous-control reinforcement learning while preserving responsiveness to rapid state changes.

The method is validated across MuJoCo benchmarks, an adaptive optics system, and sim-to-real quadcopter experiments.

PhD research conducted at the University of Ottawa in collaboration with the National Research Council of Canada (NRC).

Sim-to-Real Reinforcement Learning for Quadrotor Control

This work develops a sim-to-real reinforcement learning framework for quadrotor control using the Proximal Policy Optimization (PPO). A training is designed for both simulation and real-world deployment on the Crazyflie 2.1 nano-quadrotor, enabling stable hovering control.

[Code]

Wavefront Sensorless Adaptive Optics (WSL-AO) with Reinforcement Learning

Reinforcement Learning for Wavefront Sensorless Adaptive Optics

This work develops a simulated wavefront sensorless adaptive optics (AO) reinforcement learning (RL) environment for training and evaluating RL algorithms. It is the first AO–RL environment implemented using the OpenAI Gymnasium framework, enabling reproducible benchmarking and analysis. The work also demonstrates, for the first time, the potential of reinforcement learning for wavefront sensorless AO in satellite communication downlinks.

PhD research conducted in collaboration with the National Research Council of Canada (NRC).

Multi-Robot Area Coverage Control with Repulsive Density

Multi-Robot Area Coverage Control with Repulsive Risk Density

This work formulates an area coverage control problem for multi-agent systems operating under partial environmental information. Mobile targets are modeled as risk sources, and the risk density is incorporated directly into the agent dynamics as a repulsive term, enabling online motion planning in service robotics applications.

PhD Research in Multi-Robot Area Coverage, supervised by Dr. Davide Spinello at University of Ottawa.

Robotic Deburring via Learning from Demonstration

This work presents a learning-from-demonstration approach for robotic deburring of workpieces with unknown shapes. Task-relevant motions of a human expert were recorded using a 6-DOF haptic device and encoded using augmented Dynamic Movement Primitives (DMPs) to generate adaptable deburring trajectories.

M.Sc. research, part of the “Development of a High-Precision Hybrid Robotic Deburring System” project funded by TÜBİTAK.

Adaptive Policy Regularization for Smooth Control in Reinforcement Learning

Payam Parvizi, Abhishek Naik, Colin Bellinger, Ross Cheriton, Davide Spinello
Authorea, 2026. Submitted to IEEE Transactions on Automation Science and Engineering
Paper • Code

PhD Thesis: Adaptive Policy Smoothing in Reinforcement Learning

Adaptive Policy Smoothing in Reinforcement Learning: Applications to Wavefront Sensorless Adaptive Optics and Robotics

Payam Parvizi
Ph.D. Thesis, University of Ottawa, 2025
Ph.D. Thesis

Action-Regularized Reinforcement Learning for Adaptive Optics in Optical Satellite Communication

Payam Parvizi, Colin Bellinger, Ross Cheriton, Abhishek Naik, Davide Spinello
Optica Open, 2025. In final production at Journal of the Optical Society of America B (Optica)
Paper • Code

RL Environment for Wavefront Sensorless Adaptive Optics

Reinforcement Learning Environment for Wavefront Sensorless Adaptive Optics in Single-Mode Fiber-Coupled Optical Satellite Communications Downlinks

Payam Parvizi, Runnan Zou, Colin Bellinger, Ross Cheriton, Davide Spinello
Photonics, vol. 10, no. 12, 2023 (MDPI)
Paper • Code

Wavefront Sensorless Adaptive Optics with Reinforcement Learning

Poster Presentation — Photonics North (Ottawa, Canada)

Seminar Talk — MCG Seminars, Department of Mechanical Engineering, University of Ottawa

2023

Poster Presentation — Technology for the Digital Transformation of Society, University of Ottawa

Seminar Talk — AI for Design Seminar, National Research Council of Canada (NRC)

Poster Presentation — Photonics North (Montreal, Canada)

Work Experience

Research Associate, University of Ottawa (Nov. 2025 – Present)
Research Assistant, University of Ottawa & NRC (Apr. 2022 – Sept. 2025)
Teaching Assistant, University of Ottawa (Jan. 2019 – Apr. 2022)
Project Assistant, Middle East Technical University & TÜBİTAK (Nov. 2015 – Nov. 2017)
R&D Intern, Mercedes-Benz Türk Inc. (Jun. 2013 - Aug. 2013)
Maintenance Engineering Intern, Turkish Airlines Technic (Jan. 2012 - Mar. 2012)

Education

Ph.D. in Mechanical Engineering, University of Ottawa (2025)
M.Sc. in Mechanical Engineering, Middle East Technical University (2018)
B.Sc. in Aerospace Engineering, Middle East Technical University (2014)

Teaching Experience

MCG 3307: Control Systems (Winter 2019, 2020, 2021 & 2022)
MCG 3306: System Dynamics (Fall 2020 & 2021)
MCG 3305: Biomedical System Dynamics (Fall 2019)

Certifications

MITx MicroMasters — Statistics and Data Science (2019 - 2020)
- Machine Learning with Python
- Fundamentals of Statistics
- Probability

Payam Parvizi

Adaptive Policy Regularization for Smooth Control in Reinforcement Learning

Sim-to-Real Reinforcement Learning for Quadrotor Control

Reinforcement Learning for Wavefront Sensorless Adaptive Optics

Multi-Robot Area Coverage Control with Repulsive Risk Density

Robotic Deburring via Learning from Demonstration

Adaptive Policy Regularization for Smooth Control in Reinforcement Learning

Adaptive Policy Smoothing in Reinforcement Learning: Applications to Wavefront Sensorless Adaptive Optics and Robotics

Action-Regularized Reinforcement Learning for Adaptive Optics in Optical Satellite Communication

Reinforcement Learning Environment for Wavefront Sensorless Adaptive Optics in Single-Mode Fiber-Coupled Optical Satellite Communications Downlinks

Wavefront Sensorless Adaptive Optics for Free-Space Satellite-to-Ground Communication Using Reinforcement Learning

Demonstration of Robotic Deburring Process Motor Skills from Motion Primitives of Human Skill Model

Dynamic Movement Primitives and Force Feedback: Teleoperation in Precision Grinding Process

Parametrization of Robotic Deburring Process with Motor Skills from Motion Primitives of Human Skill Model

Work Experience

Education

Teaching Experience

Certifications