Cross‑Platform Surgical Skill Transfer Using Multi‑Modal Learning
Foundation models can significantly improve robotic surgical systems by enabling real‑time perception, adaptive control, and autonomous decision‑making.
About the Project
This project explores a novel framework for robotic surgical systems that enables the transfer of surgical skills across different robotic platforms. Leveraging multi‑modal learning, our approach integrates endoscopic imagery with synchronized joint angle data so that skills can transfer even to platforms that provide limited sensor information.
By combining self‑supervised learning, domain adaptation, and foundation models, the framework paves the way for more generalizable and autonomous surgical robotics.
Multi-platform Data Collection
We collected a comprehensive dataset of synchronized endoscopic videos and joint angle data from multiple robotic surgical platforms: HeroSurg and da Vinci S. This dataset serves as the foundation for training our multi‑modal learning models, enabling them to learn cross‑platform representations of surgical skills.
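The project page does not describe how the video and joint angle streams were synchronized. As a minimal sketch of one common approach, the Python snippet below pairs each video frame with the joint sample nearest in time and discards pairs whose timestamps are too far apart. The function name `align_joints_to_frames`, the `max_gap` tolerance, and the toy timestamps are all illustrative assumptions, not part of the project's pipeline.

```python
from bisect import bisect_left


def align_joints_to_frames(frame_times, joint_times, joint_angles, max_gap=0.02):
    """Pair each frame timestamp with the nearest joint-angle sample.

    frame_times:  frame timestamps in seconds.
    joint_times:  joint sample timestamps in seconds, sorted ascending.
    joint_angles: one joint-angle vector per entry in joint_times.
    max_gap:      hypothetical tolerance; frames whose nearest joint sample
                  is further away than this get None.
    """
    aligned = []
    for t in frame_times:
        i = bisect_left(joint_times, t)
        # The nearest sample is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(joint_times)]
        if not candidates:
            aligned.append(None)
            continue
        best = min(candidates, key=lambda j: abs(joint_times[j] - t))
        if abs(joint_times[best] - t) <= max_gap:
            aligned.append(joint_angles[best])
        else:
            aligned.append(None)
    return aligned


# Toy data, made up for illustration only.
frame_times = [0.0, 0.04, 0.5]
joint_times = [0.0, 0.01, 0.02, 0.035, 0.05]
joint_angles = [[0.0], [0.1], [0.2], [0.3], [0.5]]
pairs = align_joints_to_frames(frame_times, joint_times, joint_angles)
```

Nearest-timestamp matching keeps the example short; interpolating between neighbouring joint samples would be a reasonable alternative when the kinematic stream is sampled much faster than the video.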
Trial Overview
Key Innovations
- Multi‑Modal Representation: Fusion of visual and kinematic data for robust skill modeling.
- Cross‑Platform Transfer: Domain adaptation techniques enable generalization across robotic systems.
- Self‑Supervised Joint Estimation: Learning joint motion from video frames when direct data is unavailable.
- Real & Simulated Validation: Evaluated on both real surgical data and physics‑based simulation environments.
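The list above names domain adaptation as the mechanism for cross‑platform transfer but does not say which technique is used. As a deliberately simple illustration of the underlying idea, the sketch below standardizes kinematic features separately for each platform so that both share zero mean and unit variance per feature. This is a basic form of statistical alignment chosen for brevity, not the project's method; the function name, platform keys, and numbers are hypothetical.

```python
from statistics import mean, pstdev


def standardize_per_platform(samples):
    """Z-score every feature within each platform.

    samples: dict mapping a platform name to a list of feature vectors
             (equal-length lists of floats).
    Returns a dict with the same structure and standardized values.
    """
    out = {}
    for platform, rows in samples.items():
        cols = list(zip(*rows))  # transpose rows into per-feature columns
        mus = [mean(c) for c in cols]
        # Fall back to 1.0 for constant features to avoid dividing by zero.
        sigmas = [pstdev(c) or 1.0 for c in cols]
        out[platform] = [
            [(x - m) / s for x, m, s in zip(row, mus, sigmas)]
            for row in rows
        ]
    return out


# Toy joint-angle features with different ranges per platform (made up).
samples = {
    "platform_a": [[0.0, 10.0], [2.0, 10.0]],
    "platform_b": [[100.0, 5.0], [300.0, 7.0]],
}
aligned = standardize_per_platform(samples)
```

After standardization, the first feature takes the same values on both toy platforms even though the raw ranges differ by two orders of magnitude, which is the effect a learned adaptation method would aim for on much richer features.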
This framework has strong implications for robotic surgery training, autonomous skill transfer, and cross‑device compatibility. It reduces reliance on platform‑specific data and accelerates the adoption of AI‑driven robotic surgical assistants. The methodology also extends to teleoperation, rehabilitation, and industrial manipulation.
"Our goal is to enable robotic systems to learn once and adapt everywhere, ensuring skill transfer between different surgical platforms seamlessly."