Xiaoyi (Jeremy) Cai
email

| Google Scholar | LinkedIn | CV |

I am a Ph.D. student at MIT advised by Prof. Jonathan How. Previously, I received B.S. and M.S. degrees in Electrical and Computer Engineering from Georgia Tech, where I was advised by Prof. Magnus Egerstedt (now at UC Irvine).

My research lies at the intersection of Machine Learning and Robotics, with the goal of enabling reliable autonomous navigation in the real world. To this end, my work aims at modeling uncertainty and planning navigation behaviors that mitigate the risk of failures and reduce uncertainty. My work has been applied to off-road navigation and coordination of heterogeneous robot teams.

I spent a summer at the AI Institute working with Prof. Bernadette Bucher, Stephen Phillips, and Jiuguang Wang, where we enabled a Spot robot to navigate in the wild with a learned traversability model based on elevation and semantics.


  Publications
  Preprint

PIETRA: Physics-Informed Evidential Learning for Traversing Out-of-Distribution Terrain
Xiaoyi Cai, James Queeney, Tong Xu, Aniket Datar, Chenhui Pan, Max Miller, Ashton Flather, Philip R. Osteen, Nicholas Roy, Xuesu Xiao, Jonathan P. How
arXiv

paper | abstract | video

Self-supervised learning is a powerful approach for developing traversability models for off-road navigation, but these models often struggle with inputs unseen during training. Existing methods utilize techniques like evidential deep learning to quantify model uncertainty, helping to identify and avoid out-of-distribution terrain. However, always avoiding out-of-distribution terrain can be overly conservative, e.g., when novel terrain can be effectively analyzed using a physics-based model. To overcome this challenge, we introduce Physics-Informed Evidential Traversability (PIETRA), a self-supervised learning framework that integrates physics priors directly into the mathematical formulation of evidential neural networks and introduces physics knowledge implicitly through an uncertainty-aware, physics-informed training loss. Our evidential network seamlessly transitions between learned and physics-based predictions for out-of-distribution inputs. Additionally, the physics-informed loss regularizes the learned model, ensuring better alignment with the physics model. Extensive simulations and hardware experiments demonstrate that PIETRA improves both learning accuracy and navigation performance in environments with significant distribution shifts.

CGD: Constraint-Guided Diffusion Policies for UAV Trajectory Planning
Kota Kondo, Andrea Tagliabue, Xiaoyi Cai, Claudius Tewari, Olivia Garcia, Marcos Espitia-Alvarez, Jonathan P. How
arXiv

paper | abstract

Traditional optimization-based planners, while effective, suffer from high computational costs, resulting in slow trajectory generation. A successful strategy to reduce computation time involves using Imitation Learning (IL) to develop fast neural network (NN) policies from those planners, which are treated as expert demonstrators. Although the resulting NN policies are effective at quickly generating trajectories similar to those from the expert, (1) their output does not explicitly account for dynamic feasibility, and (2) the policies do not accommodate changes in the constraints different from those used during training. To overcome these limitations, we propose Constraint-Guided Diffusion (CGD), a novel IL-based approach to trajectory planning. CGD leverages a hybrid learning/online optimization scheme that combines diffusion policies with a surrogate efficient optimization problem, enabling the generation of collision-free, dynamically feasible trajectories. The key ideas of CGD include dividing the original challenging optimization problem solved by the expert into two more manageable sub-problems: (a) efficiently finding collision-free paths, and (b) determining a dynamically-feasible time-parametrization for those paths to obtain a trajectory. Compared to conventional neural network architectures, we demonstrate through numerical evaluations significant improvements in performance and dynamic feasibility under scenarios with new constraints never encountered during training.

  Journal

EVORA: Deep Evidential Traversability Learning for Risk-Aware Off-Road Autonomy
Xiaoyi Cai, Siddharth Ancha, Lakshay Sharma, Philip R. Osteen, Bernadette Bucher, Stephen Phillips, Jiuguang Wang, Michael Everett, Nicholas Roy, Jonathan P. How
T-RO 2024

webpage | paper | abstract | video

Traversing terrain with good traction is crucial for achieving fast off-road navigation. Instead of manually designing costs based on terrain features, existing methods learn terrain properties directly from data via self-supervision to automatically penalize trajectories moving through undesirable terrain, but challenges remain in properly quantifying and mitigating the risk due to uncertainty in the learned models. To this end, we present evidential off-road autonomy (EVORA), a unified framework to learn an uncertainty-aware traction model and plan risk-aware trajectories. For uncertainty quantification, we efficiently model both aleatoric and epistemic uncertainty by learning discrete traction distributions and probability densities of the traction predictor’s latent features. Leveraging evidential deep learning, we parameterize Dirichlet distributions with the network outputs and propose a novel uncertainty-aware squared Earth Mover’s Distance loss with a closed-form expression that improves learning accuracy and navigation performance. For risk-aware navigation, the proposed planner simulates state trajectories with the worst-case expected traction to handle aleatoric uncertainty and penalizes trajectories moving through terrain with high epistemic uncertainty. Our approach is extensively validated in simulation and on wheeled and quadruped robots, showing improved navigation performance compared to methods that assume no slip, assume the expected traction, or optimize for the worst-case expected cost.
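
As a rough illustration of two ingredients named in the abstract above, the sketch below computes the mean of a Dirichlet distribution over discrete traction bins and the plain squared Earth Mover's Distance between two discrete distributions on ordered bins. It is not the paper's uncertainty-aware loss. The function names, the unnormalized form of the distance, and the example values are assumptions made for illustration.

```python
import numpy as np


def dirichlet_mean(concentration):
    """Expected categorical distribution under Dirichlet(concentration)."""
    concentration = np.asarray(concentration, dtype=float)
    return concentration / concentration.sum()


def squared_emd(p, q):
    """Squared EMD between two discrete distributions on ordered bins.

    Computed as the sum of squared differences between the two CDFs
    (unnormalized; some definitions divide by the number of bins).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum((np.cumsum(p) - np.cumsum(q)) ** 2))


# Hypothetical example: three traction bins, target mass in the middle bin.
predicted = dirichlet_mean([2.0, 1.0, 1.0])
target = [0.0, 1.0, 0.0]
loss = squared_emd(predicted, target)
```

Unlike a bin-wise loss, the CDF-based distance grows with how far the probability mass sits from the target bin, which suits ordered quantities such as traction.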

Energy-Aware, Collision-Free Information Gathering for Heterogeneous Robot Teams
Xiaoyi Cai, Brent Schlotfeldt, Kasra Khosoussi, Nikolay Atanasov, George J. Pappas, Jonathan P. How
T-RO 2023

paper | abstract | video

This paper considers the problem of safely coordinating a team of sensor-equipped robots to reduce uncertainty about a dynamical process, where the objective trades off information gain and energy cost. Optimizing this trade-off is desirable, but leads to a non-monotone objective function in the set of robot trajectories. Therefore, common multi-robot planners based on coordinate descent lose their performance guarantees. Furthermore, methods that handle non-monotonicity lose their performance guarantees when subject to inter-robot collision avoidance constraints. As it is desirable to retain both the performance guarantee and safety guarantee, this work proposes a hierarchical approach with a distributed planner that uses local search with a worst-case performance guarantee and a decentralized controller based on control barrier functions that ensures safety and encourages timely arrival at sensing locations. Via extensive simulations, hardware-in-the-loop tests and hardware experiments, we demonstrate that the proposed approach achieves a better trade-off between sensing and energy cost than coordinate-descent-based algorithms.

The Robotarium: Automation of a Remotely Accessible, Multi-Robot Testbed
Sean Wilson, Paul Glotfelter, Siddharth Mayya, Gennaro Notomista, Yousef Emam, Xiaoyi Cai, Magnus Egerstedt
RA-L 2021

paper | abstract

The cost, in terms of both time and money, of instantiating a physical testbed can be prohibitive. To help resolve this issue, the Robotarium offers a free, remotely accessible robotics lab to users around the world. Since allowing the general public to use it, hundreds of users have submitted thousands of experiments. The current and accelerating experiment submission rate poses an operational challenge that cannot be handled through manual or human-supervised execution without devoting a full-time operator to the platform. A solution to this problem is to enable the Robotarium to operate autonomously: improving the robustness and reliability of the system while reducing required human intervention to diagnose and recover from failures. In this pursuit, the hardware, software, and algorithms deployed on the Robotarium have undergone numerous developments, including a new differential-drive robot, the use of modern virtualization techniques for the software infrastructure, and the inclusion of robust constraint-satisfaction methods for long-term safe operation. Over the past year of autonomous operation, these advances have resulted in 0.76% of the 3402 submitted remote experiments failing and requiring human intervention to recover from. This paper details these development efforts and best practices that have been learned automating a remote-access testbed to keep up with the experimental demand of a large, active, and growing userbase.

A Sequential Composition Framework for Coordinating Multirobot Behaviors
Pietro Pierpaoli, Anqi Li, Mohit Srinivasan, Xiaoyi Cai, Samuel Coogan, Magnus Egerstedt
T-RO 2020

paper | abstract

A number of coordinated behaviors are proposed for achieving specific tasks for multirobot systems. However, since most applications require more than one such behavior, one needs to be able to compose together sequences of behaviors while respecting local information flow constraints. Specifically, when the interagent communication depends on interrobot distances, these constraints translate into particular configurations that must be reached in finite time in order for the system to be able to transition between the behaviors. To this end, we develop a distributed framework based on finite-time convergence control barrier functions that enables a team of robots to adjust its configuration in order to meet the communication requirements for the different tasks. In order to demonstrate the significance of the proposed framework, we implemented a full-scale scenario where a team of eight planar robots explore an urban environment in order to localize and rescue a subject.

A Distributed Pipeline for Scalable, Deconflicted Formation Flying
Parker C. Lusk, Xiaoyi Cai, Samir Wadhwania, Aleix Paris, Kaveh Fathian, Jonathan P. How
RA-L 2020

paper | abstract | code | video

Reliance on external localization infrastructure and centralized coordination are main limiting factors for formation flying of vehicles in large numbers and in unprepared environments. While solutions using onboard localization address the dependency on external infrastructure, the associated coordination strategies typically lack collision avoidance and scalability. To address these shortcomings, we present a unified pipeline with onboard localization and a distributed, collision-free formation control strategy that scales to a large number of vehicles. Since distributed collision avoidance strategies are known to result in gridlock, we also present a distributed task assignment solution to deconflict vehicles. We experimentally validate our pipeline in simulation and hardware. The results show that our approach for solving the optimization problem associated with formation control gives solutions within seconds in cases where general purpose solvers fail due to high complexity. In addition, our lightweight assignment strategy leads to successful and quicker formation convergence in 96-100% of all trials, whereas indefinite gridlocks occur without it for 33-50% of trials. By enabling large-scale, deconflicted coordination, this pipeline should help pave the way for anytime, anywhere deployment of aerial swarms.

Coordination of Robot Teams Over Long Distances: From Georgia Tech to Tokyo Tech and Back---An 11,000-km Multirobot Experiment
Riku Funada, Xiaoyi Cai, Gennaro Notomista, Made W.S. Atman, Junya Yamauchi, Masayuki Fujita, Magnus Egerstedt
CSM 2020

paper | abstract | video

Coordinated control of multirobot systems across long distances is challenging because robots experience greater communication delays as the spaces between them grow. The presence of delays, if not properly addressed, may lead to oscillatory behaviors and even instabilities. Passivity-based methods are often utilized to alleviate the adverse effects of time delays, thanks to the several useful properties that passivity has when addressing large systems with many subcomponents, which are possibly difficult to explicitly model (for example, human–swarm interaction, which involves complex human dynamics). This tutorial article introduces two passivity-based methods for networked control systems with delays and validates them through a joint experiment between the Georgia Institute of Technology and Tokyo Institute of Technology on the coordination of multirobot teams across more than 11,000 km, with a maximum one-way communication delay of approximately 300 ms. In this experiment, robots belonging to subteams are in close proximity to each other, while the subteams themselves are separated by significant distances. Therefore, the communication delays between subteams are not negligible, thus causing performance degradation. Together, the subteams must achieve a series of increasingly complex tasks, such as meeting at one point and assembling certain formations while interacting with human operators.

  Conference

Probabilistic Traversability Model for Risk-Aware Motion Planning in Off-Road Environments
Xiaoyi Cai, Michael Everett, Lakshay Sharma, Philip R. Osteen, Jonathan P. How
IROS 2023

paper | abstract | code | video

A key challenge in off-road navigation is that even visually similar terrains or ones from the same semantic class may have substantially different traction properties. Existing work typically assumes no wheel slip or uses the expected traction for motion planning, where the predicted trajectories provide a poor indication of the actual performance if the terrain traction has high uncertainty. In contrast, this work proposes to analyze terrain traversability with the empirical distribution of traction parameters in unicycle dynamics, which can be learned by a neural network in a self-supervised fashion. The probabilistic traction model leads to two risk-aware cost formulations that account for the worst-case expected cost and traction. To help the learned model generalize to unseen environments, terrains with features that lead to unreliable predictions are detected via a density estimator fit to the trained network's latent space and avoided via auxiliary penalties during planning. Simulation results demonstrate that the proposed approach outperforms existing work that assumes no slip or uses the expected traction in both navigation success rate and completion time. Furthermore, avoiding terrains with low density-based confidence scores achieves up to 30% improvement in success rate when the learned traction model is used in a novel environment.

RAMP: A Risk-Aware Mapping and Planning Pipeline for Fast Off-Road Ground Robot Navigation
Lakshay Sharma, Michael Everett, Donggun Lee, Xiaoyi Cai, Philip Osteen, Jonathan P. How
ICRA 2023

paper | abstract | video

A key challenge in fast ground robot navigation in 3D terrain is balancing robot speed and safety. Recent work has shown that 2.5D maps (2D representations with additional 3D information) are ideal for real-time safe and fast planning. However, the prevalent approach of generating 2D occupancy grids through raytracing makes the generated map unsafe to plan in, due to inaccurate representation of unknown space. Additionally, existing planners such as MPPI do not consider speeds in known free and unknown space separately, leading to slower overall plans. The RAMP pipeline proposed here solves these issues using new mapping and planning methods. This work first presents ground point inflation with persistent spatial memory as a way to generate accurate occupancy grid maps from classified pointclouds. Then we present an MPPI-based planner with embedded variability in horizon, to maximize speed in known free space while retaining cautionary penetration into unknown space. Finally, we integrate this mapping and planning pipeline with risk constraints arising from 3D terrain, and verify that it enables fast and safe navigation using simulations and hardware demonstrations.

Risk-Aware Off-Road Navigation via a Learned Speed Distribution Map
Xiaoyi Cai, Michael Everett, Jonathan Fink, Jonathan P. How
IROS 2022

paper | abstract

Motion planning in off-road environments requires reasoning about both the geometry and semantics of the scene (e.g., a robot may be able to drive through soft bushes but not a fallen log). In many recent works, the world is classified into a finite number of semantic categories that often are not sufficient to capture the ability (i.e., the speed) with which a robot can traverse off-road terrain. Instead, this work proposes a new representation of traversability based exclusively on robot speed that can be learned from data, offers interpretability and intuitive tuning, and can be easily integrated with a variety of planning paradigms in the form of a costmap. Specifically, given a dataset of experienced trajectories, the proposed algorithm learns to predict a distribution of speeds the robot could achieve, conditioned on the environment semantics and commanded speed. The learned speed distribution map is converted into costmaps with a risk-aware cost term based on conditional value at risk (CVaR). Numerical simulations demonstrate that the proposed risk-aware planning algorithm leads to faster average time-to-goals compared to a method that only considers expected behavior, and the planner can be tuned for slightly slower, but less variable behavior. Furthermore, the approach is integrated into a full autonomy stack and demonstrated in a high-fidelity Unity environment and is shown to provide a 30% improvement in the success rate of navigation.
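
The conditional value at risk (CVaR) mentioned in the abstract above can be illustrated with a minimal empirical estimate. This sketch assumes a set of sampled costs where higher is worse and simply averages the worst (1 - alpha) fraction of them. The function name, the sample values, and the tail-rounding rule are assumptions for illustration, not the paper's formulation.

```python
import numpy as np


def empirical_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs.

    Assumes higher cost is worse and 0 <= alpha < 1. At least one
    sample is always kept in the tail.
    """
    costs = np.sort(np.asarray(costs, dtype=float))
    n_tail = max(1, int(np.ceil((1.0 - alpha) * costs.size)))
    return float(costs[-n_tail:].mean())


# Hypothetical traversal-time samples (seconds) for one map cell.
times = [1.0, 2.0, 3.0, 10.0]
expected_cost = empirical_cvar(times, alpha=0.0)   # plain mean
risk_aware_cost = empirical_cvar(times, alpha=0.5)  # mean of worst half
```

With alpha = 0 the estimate reduces to the expected cost, and raising alpha puts more weight on the rare slow outcomes, which is the tuning knob a risk-aware costmap would expose.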

Non-Monotone Energy-Aware Information Gathering for Heterogeneous Robot Teams
Xiaoyi Cai*, Brent Schlotfeldt*, Kasra Khosoussi, Nikolay Atanasov, George J. Pappas, Jonathan P. How
ICRA 2021 (*equal contributions)

paper | abstract | video

This paper considers the problem of planning trajectories for a team of sensor-equipped robots to reduce uncertainty about a dynamical process. Optimizing the trade-off between information gain and energy cost (e.g., control effort, distance travelled) is desirable but leads to a non-monotone objective function in the set of robot trajectories. Therefore, common multi-robot planning algorithms based on techniques such as coordinate descent lose their performance guarantees. Methods based on local search provide performance guarantees for optimizing a non-monotone submodular function, but require access to all robots' trajectories, making them unsuitable for distributed execution. This work proposes a distributed planning approach based on local search and shows how lazy/greedy methods can be adopted to reduce the computation and communication of the approach. We demonstrate the efficacy of the proposed method by coordinating robot teams composed of both ground and aerial vehicles with different sensing/control profiles and evaluate the algorithm's performance in two target tracking scenarios. Compared to the naive distributed execution of local search, our approach saves up to 60% communication and 80-92% computation on average when coordinating up to 10 robots, while outperforming the coordinate descent based algorithm in achieving a desirable trade-off between sensing and energy cost.

LION: Lidar-Inertial Observability-Aware Navigator for Vision-Denied Environments
Andrea Tagliabue*, Jesus Tordesillas*, Xiaoyi Cai*, Angel Santamaria-Navarro, Jonathan P. How, Luca Carlone, Ali-akbar Agha-mohammadi
ISER 2020 (*equal contributions)

paper | abstract | video

State estimation for robots navigating in GPS-denied and perceptually-degraded environments, such as underground tunnels, mines and planetary sub-surface voids, remains challenging in robotics. Towards this goal, we present LION (Lidar-Inertial Observability-Aware Navigator), which is part of the state estimation framework developed by the team CoSTAR for the DARPA Subterranean Challenge, where the team achieved second and first places in the Tunnel and Urban circuits in August 2019 and February 2020, respectively. LION provides high-rate odometry estimates by fusing high-frequency inertial data from an IMU and low-rate relative pose estimates from a lidar via a fixed-lag sliding window smoother. LION does not require knowledge of relative positioning between lidar and IMU, as the extrinsic calibration is estimated online. In addition, LION is able to self-assess its performance using an observability metric that evaluates whether the pose estimate is geometrically ill-constrained. Odometry and confidence estimates are used by HeRO, a supervisory algorithm that provides robust estimates by switching between different odometry sources. In this paper we benchmark the performance of LION in perceptually-degraded subterranean environments, demonstrating its high technology readiness level for deployment in the field.

Passivity-Based Decentralized Control of Multi-Robot Systems With Delays Using Control Barrier Functions
Gennaro Notomista, Xiaoyi Cai, Junya Yamauchi, Magnus Egerstedt
MRS 2019

paper | abstract | video

In this paper, we present a solution to the problem of coordinating multiple robots across a communication channel that experiences delays. The proposed approach leverages control barrier functions in order to ensure that the multi-robot system remains dissipative. This is achieved by encoding the dissipativity-preserving condition as a set invariance constraint. This constraint is then included in an optimization problem, whose objective is that of modifying, in a minimally invasive fashion, the nominal input to the robots. The formulated optimization problem is decentralized in the sense that, in order to be solved, it does not require the individual robots to have access to global information. Moreover, thanks to its convexity, each robot can solve it using fast and efficient algorithms. The effectiveness of the proposed control framework is demonstrated through the implementation of a formation control algorithm in the presence of delays on a team of mobile robots.
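
The idea of modifying a nominal input "in a minimally invasive fashion" can be sketched for the simplest case of a single affine constraint a·u >= b, where the quadratic program min ||u - u_nom||^2 has a closed-form solution. The sketch below assumes that simplified setting. The paper's actual dissipativity-preserving constraint is not reproduced here, and the function name and example values are hypothetical.

```python
import numpy as np


def minimally_invasive_input(u_nom, a, b):
    """Solve min ||u - u_nom||^2 subject to a @ u >= b in closed form.

    Assumes `a` is a nonzero vector. If the nominal input already
    satisfies the constraint it is returned unchanged; otherwise it is
    projected onto the constraint boundary a @ u = b.
    """
    u_nom = np.asarray(u_nom, dtype=float)
    a = np.asarray(a, dtype=float)
    violation = b - a @ u_nom
    if violation <= 0.0:
        return u_nom
    return u_nom + (violation / (a @ a)) * a


# Hypothetical example: the constraint forbids a positive second component.
u_safe = minimally_invasive_input(u_nom=[2.0, 1.0], a=[0.0, -1.0], b=0.0)
```

Because the problem is convex and each robot only needs its own `u_nom`, `a`, and `b`, a computation of this shape can run locally on every robot at each control step.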

