Most existing robotic surgery systems adopt a human-in-the-loop paradigm, often with
the surgeon directly teleoperating the robotic system. Adding intelligence to these
robots would enable higher-level control, such as supervised autonomy or even full
autonomy. However, artificial intelligence (AI) methods require large amounts of training
data, which are currently lacking for surgical robotics.
This work proposes SurgSync, a multi-modal data
collection framework with offline and online synchronization to support training
and real-time inference, respectively. The framework is implemented on a da Vinci
Research Kit (dVRK) and introduces (1) dual-mode (online/offline-matching)
synchronized recorders, (2) a modern stereo endoscope to achieve image quality on
par with clinical systems, and (3) additional sensors such as a side-view camera
and a novel capacitive contact sensor to provide ground truth contact data.
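
As an illustration of what offline matching of recorded streams can look like, the following is a minimal sketch that pairs each image timestamp with the nearest kinematic sample. The nearest-neighbor rule, the function name `match_nearest`, and the `max_gap` tolerance are assumptions made for this example and are not taken from the SurgSync implementation.

```python
import numpy as np

def match_nearest(ref_ts, other_ts, max_gap):
    """Pair each reference timestamp with the nearest sample in other_ts.

    Hypothetical helper: returns, for every entry of ref_ts, the index of the
    closest entry in other_ts (assumed sorted ascending), or -1 when the
    closest entry is more than max_gap seconds away.
    """
    ref_ts = np.asarray(ref_ts, dtype=float)
    other_ts = np.asarray(other_ts, dtype=float)
    if other_ts.size == 0:
        return np.full(ref_ts.shape, -1, dtype=int)

    # Index of the first sample at or after each reference time, clipped so
    # that reference times beyond the last sample still map to a valid index.
    right = np.clip(np.searchsorted(other_ts, ref_ts), 0, other_ts.size - 1)
    left = np.clip(right - 1, 0, other_ts.size - 1)

    # Keep whichever neighbor (before or after) is closer in time.
    pick_left = np.abs(ref_ts - other_ts[left]) <= np.abs(other_ts[right] - ref_ts)
    idx = np.where(pick_left, left, right)

    # Reject matches whose time gap exceeds the tolerance.
    gap = np.abs(other_ts[idx] - ref_ts)
    return np.where(gap <= max_gap, idx, -1)

# Example: image frames at roughly 30 Hz matched against kinematic samples.
image_ts = [0.000, 0.033, 0.066, 0.200]
kin_ts = [0.001, 0.010, 0.031, 0.050, 0.070, 0.500]
matches = match_nearest(image_ts, kin_ts, max_gap=0.005)
```

An entry of -1 marks a frame with no kinematic sample within the tolerance, which a recorder could drop or flag instead of pairing it with stale data.
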
The framework also incorporates a post-processing toolbox for tasks such as depth
estimation, optical flow computation, and a practical kinematic reprojection method based on
Gaussian heatmaps. User studies with participants of varying skill levels are performed
on ex-vivo tissue to provide clinically realistic data, and a network for surgical
skill assessment is employed to demonstrate the use of the collected data.
Through the user study experiments, we obtained a dataset of
214 validated instances across multiple canonical training tasks.
All software and data will be made available to the research community.
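
One plausible reading of kinematic reprojection with Gaussian heatmaps is sketched below: a tool-tip position, already expressed in the camera frame, is projected through a pinhole model and rendered as a 2D Gaussian centered on the resulting pixel. The pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), the `sigma` value, the image size, and both function names are assumptions chosen for illustration and do not describe the toolbox's actual interface.

```python
import numpy as np

def project_point(p_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates (pinhole model)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return fx * x / z + cx, fy * y / z + cy

def gaussian_heatmap(u, v, height, width, sigma):
    """Render an unnormalized 2D Gaussian centered at pixel (u, v)."""
    cols = np.arange(width)[None, :]   # pixel x coordinates, shape (1, width)
    rows = np.arange(height)[:, None]  # pixel y coordinates, shape (height, 1)
    sq_dist = (cols - u) ** 2 + (rows - v) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

# Hypothetical tool-tip position (meters, camera frame) and intrinsics (pixels).
tip_cam = (0.01, -0.02, 0.10)
u, v = project_point(tip_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
heatmap = gaussian_heatmap(u, v, height=480, width=640, sigma=5.0)
```

Rendering a heatmap instead of a single pixel label gives downstream networks a smooth target that tolerates small kinematic and calibration errors; the width `sigma` would need to be tuned to the expected reprojection error.
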