User Study & Data Collection
We conduct a user study with 13 participants (3 female, 10 male) spanning three skill levels: Novice (N), Experienced (E), and Professional (P). Tasks are performed on phantoms and ex-vivo tissues to improve clinical realism.
The resulting dataset comprises 214 validated instances across four canonical surgical training tasks. Each instance provides time-aligned:
- Stereo endoscope images (1080p, 60 Hz)
- Side-view camera images (1080p, 30 Hz)
- ECM & PSM kinematic data (6D Cartesian + gripper)
- Tool-tissue contact ground truth (binary)
- Timestamps across all modalities
All modalities are frame-aligned by our synchronized recorders, enabling direct multi-modal model training without additional interpolation. The recording frequencies are reported in the Results section.
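Because every modality carries timestamps, a user who wants to pair streams recorded at different rates (for example, the 60 Hz endoscope frames with the 30 Hz side-view frames) can do so with a nearest-timestamp lookup. The following is a minimal sketch, not the dataset's own loader: the function names `nearest_index` and `pair_streams` are hypothetical, and it assumes each stream exposes a sorted list of timestamps in a common clock.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Return the index of the entry in `timestamps` closest to `t`.

    Assumes `timestamps` is non-empty and sorted in ascending order.
    Ties are resolved in favour of the earlier sample.
    """
    if not timestamps:
        raise ValueError("timestamps must be non-empty")
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i - 1 if (t - before) <= (after - t) else i


def pair_streams(reference_ts: Sequence[float],
                 other_ts: Sequence[float]) -> List[int]:
    """For each reference timestamp, return the index of the closest
    sample in the other stream (hypothetical helper for illustration)."""
    return [nearest_index(other_ts, t) for t in reference_ts]


if __name__ == "__main__":
    # Illustrative integer ticks only: the reference stream runs at
    # twice the rate of the other stream, as with 60 Hz vs 30 Hz video.
    endoscope_ticks = [0, 1, 2, 3, 4, 5]
    side_view_ticks = [0, 2, 4]
    print(pair_streams(endoscope_ticks, side_view_ticks))
```

With the faster stream as the reference, each slower-stream sample is reused for consecutive reference frames, so no values are interpolated; the same lookup applies to the kinematic and contact streams under the same assumptions.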