API Reference¶
Architecture overview¶
Raiden is structured as a pipeline of loosely coupled modules. Data flows from hardware through recording and conversion into a flat, policy-ready dataset.
```
Hardware (ZED / RealSense cameras, YAM arms, SpaceMouse)
        │
        ▼
CameraConfig ──► Camera (ZedCamera / RealSenseCamera)
RobotController ──► i2rt MotorChainRobot / IK via PyRoki + J-Parse
        │
        ▼
Recorder ──► data/raw/<task>/<episode>/
        │       metadata.json
        │       robot_data.npz
        │       cameras/<name>.svo2 or .bag
        ▼
Converter ──► data/processed/<task>/<episode>/
        │       rgb/<camera>/<frame>.jpg
        │       depth/<camera>/<frame>.npz (uint16 mm)
        │       lowdim/<camera>/<frame>.npz (intrinsics, extrinsics, action)
        ▼
Visualizer (Rerun)
```
Modules¶
CameraConfig¶
Loads ~/.config/raiden/camera.json and maps semantic camera names (e.g.
left_wrist) to hardware serial numbers, camera types (zed / realsense),
and roles (scene / left_wrist / right_wrist). Both ZED and RealSense
cameras can be freely mixed in a single session.
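A minimal sketch of loading such a config. The field names (`serial`, `type`, `role`) and the example values are illustrative assumptions, not the actual schema:

```python
import json
from pathlib import Path

# Hypothetical camera.json contents -- keys and values are illustrative,
# not the real Raiden schema.
example = {
    "left_wrist": {"serial": "12345678", "type": "zed", "role": "left_wrist"},
    "scene": {"serial": "823112061234", "type": "realsense", "role": "scene"},
}

def load_camera_config(path=Path.home() / ".config/raiden/camera.json"):
    """Map semantic camera names to serial/type/role entries."""
    with open(path) as f:
        return json.load(f)

# Resolve a semantic name to its hardware identity:
entry = example["left_wrist"]
assert entry["type"] in ("zed", "realsense")
```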
Camera¶
Thin wrappers around the ZED SDK (ZedCamera) and the Intel RealSense SDK
(RealSenseCamera). Both expose a uniform interface: open(), grab() →
(rgb, depth), intrinsics, close(). Depth is returned as uint16
millimetres. ZED cameras additionally support SVO2 recording and playback.
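The uniform interface can be sketched as an abstract base class; the method names follow the text above, while the `DummyCamera` internals and intrinsics values are placeholders:

```python
import numpy as np
from abc import ABC, abstractmethod

class Camera(ABC):
    """Sketch of the shared camera interface described above."""
    @abstractmethod
    def open(self): ...
    @abstractmethod
    def grab(self):
        """Return (rgb, depth): rgb uint8 HxWx3, depth uint16 millimetres."""
    @property
    @abstractmethod
    def intrinsics(self): ...
    @abstractmethod
    def close(self): ...

class DummyCamera(Camera):
    """Stand-in implementation for illustration only."""
    def open(self): pass
    def grab(self):
        rgb = np.zeros((480, 640, 3), dtype=np.uint8)
        depth = np.full((480, 640), 1500, dtype=np.uint16)  # 1.5 m in mm
        return rgb, depth
    @property
    def intrinsics(self):
        # 3x3 pinhole matrix: fx, fy on the diagonal, cx, cy in the last column
        return np.array([[600.0, 0, 320.0], [0, 600.0, 240.0], [0, 0, 1.0]])
    def close(self): pass
```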
RobotController¶
Manages YAM follower and leader arms over CAN via i2rt. In SpaceMouse mode it runs a real-time IK loop using PyRoki and J-Parse to convert end-effector velocity commands into joint targets, with manipulability-aware damping near singularities. Handles bimanual and single-arm configurations, e-stop integration, and optional foot-pedal control.
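The manipulability-aware damping can be illustrated with a generic damped least-squares velocity-IK step. This is a textbook sketch, not the actual PyRoki/J-Parse solver; the thresholds are arbitrary:

```python
import numpy as np

def damped_ik_step(J, v_ee, w_min=0.02, lam_max=0.1):
    """One velocity-IK step: joint velocities from an end-effector twist.

    Damping ramps up as the Yoshikawa manipulability w = sqrt(det(J J^T))
    falls below w_min, keeping joint speeds bounded near singularities.
    (Generic damped least-squares sketch, not the J-Parse method itself.)
    """
    w = np.sqrt(max(np.linalg.det(J @ J.T), 0.0))
    lam = 0.0 if w >= w_min else lam_max * (1.0 - w / w_min) ** 2
    JT = J.T
    # dq = J^T (J J^T + lam I)^-1 v
    return JT @ np.linalg.solve(J @ JT + lam * np.eye(J.shape[0]), v_ee)
```

Far from singularities the damping vanishes and the step reduces to the exact pseudoinverse solution; near them the solve stays well conditioned at the cost of some tracking error.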
Recorder¶
Orchestrates a full recording session: opens cameras once and keeps them alive
across episodes, spawns per-episode threads for teleoperation, camera capture
(30 Hz), and robot joint logging (~100 Hz). Writes SVO2/bag files and
robot_data.npz to data/raw/.
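The per-episode threading can be sketched as one logging thread per stream, each sampling at its own rate; the rates match the text, everything else (buffer format, sample functions) is illustrative:

```python
import threading
import time
import numpy as np

def log_stream(buffer, sample_fn, hz, stop):
    """Append (timestamp, sample) tuples at roughly `hz` until stopped."""
    period = 1.0 / hz
    while not stop.is_set():
        buffer.append((time.monotonic(), sample_fn()))
        time.sleep(period)

joints, frames = [], []
stop = threading.Event()
threads = [
    threading.Thread(target=log_stream, args=(joints, lambda: np.zeros(14), 100, stop)),
    threading.Thread(target=log_stream, args=(frames, lambda: "frame", 30, stop)),
]
for t in threads:
    t.start()
time.sleep(0.2)  # stand-in for one short episode
stop.set()
for t in threads:
    t.join()
# The joint log fills ~3x faster than the camera log (100 Hz vs 30 Hz).
```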
Converter¶
Offline post-processing step (rd convert). Reads raw SVO2/bag recordings,
synchronizes multi-camera streams by timestamp, extracts JPEG frames and depth
maps, interpolates joint poses onto the camera timeline, and writes the
per-frame lowdim.npz files that bundle intrinsics, per-frame extrinsics,
the action vector, and the language instruction. Supports three depth backends:
RealSense IR, ZED SDK NEURAL_LIGHT, and Fast Foundation Stereo (with optional
TensorRT acceleration).
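The timeline alignment step can be sketched as linear interpolation of the ~100 Hz joint log onto the 30 Hz camera timestamps; function and argument names are assumptions:

```python
import numpy as np

def interp_joints(cam_ts, robot_ts, robot_q):
    """Interpolate a joint log onto camera timestamps.

    robot_ts: (N,) monotonically increasing log timestamps
    robot_q:  (N, D) joint values at those timestamps
    Returns an (M, D) array of joints at the M camera timestamps.
    """
    out = np.empty((len(cam_ts), robot_q.shape[1]))
    for j in range(robot_q.shape[1]):
        out[:, j] = np.interp(cam_ts, robot_ts, robot_q[:, j])
    return out
```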
Visualizer¶
Loads a converted recording and streams it into a Rerun viewer: RGB and depth images on a timeline, 3-D point clouds, robot joint overlays, and the action trajectory.
Calibration¶
Hand-eye calibration (rd calibrate) for wrist cameras and static extrinsic
estimation for scene cameras. Writes ~/.config/raiden/calibration_results.json with
T_cam2ee per wrist camera and T_base2cam for scene cameras, plus a
bimanual_transform mapping the right-arm base into the left-arm base frame.
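How these outputs compose can be sketched with homogeneous 4×4 transforms, reading `A2B` as "maps points in frame A into frame B". The numeric values are placeholders, and `T_ee2base` stands in for the arm's forward kinematics:

```python
import numpy as np

def compose(*Ts):
    """Chain homogeneous transforms left to right."""
    out = np.eye(4)
    for T in Ts:
        out = out @ T
    return out

T_cam2ee = np.eye(4)
T_cam2ee[:3, 3] = [0.0, 0.0, 0.05]   # placeholder wrist-cam offset
T_ee2base = np.eye(4)
T_ee2base[:3, 3] = [0.4, 0.0, 0.3]   # placeholder FK pose of the EE

# A point seen by the wrist camera, expressed in the arm's base frame:
T_cam2base = compose(T_ee2base, T_cam2ee)
p_cam = np.array([0.0, 0.0, 0.5, 1.0])  # 0.5 m in front of the camera
p_base = T_cam2base @ p_cam
```

The `bimanual_transform` plays the same role one level up: left-multiplying a right-arm-base point by it expresses that point in the left-arm base (world) frame.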
FFSDepthPredictor / FFSTrtDepthPredictor¶
Depth estimation backends wrapping
Fast Foundation Stereo.
FFSDepthPredictor runs inference in PyTorch; FFSTrtDepthPredictor uses
compiled TensorRT FP16 engines for faster throughput. Both are used
transparently by the Converter — TRT engines are preferred when present.
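The "TRT preferred when present" selection might look like the following; the cache directory, file extension, and function name are all assumptions for illustration:

```python
from pathlib import Path

def pick_depth_backend(engine_dir="~/.cache/raiden/trt"):
    """Prefer a compiled TensorRT engine when one exists on disk,
    otherwise fall back to the PyTorch predictor. (Illustrative only.)"""
    engines = list(Path(engine_dir).expanduser().glob("*.engine"))
    return "FFSTrtDepthPredictor" if engines else "FFSDepthPredictor"
```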
Data conventions¶
| Item | Convention |
|---|---|
| World frame | Left-arm base frame |
| Extrinsics | T_world_cam (4×4 float64, row-major in npz) |
| Depth | uint16, millimetres |
| Joint layout | [r_joint×6, r_grip×1, l_joint×6, l_grip×1] |
| Action | pos(3) + rot_mat_flat(9) + gripper(1) per arm |
| Right wrist camera | Mounted upside-down; images rotated 180° at capture time |
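The layout rows of the table can be made concrete with small packing helpers; the function names are illustrative, while the shapes and dtypes follow the conventions above:

```python
import numpy as np

def pack_joints(r_joint, r_grip, l_joint, l_grip):
    """[r_joint x 6, r_grip x 1, l_joint x 6, l_grip x 1] -> (14,)"""
    return np.concatenate([r_joint, [r_grip], l_joint, [l_grip]])

def pack_action(pos, rot_mat, gripper):
    """pos(3) + rot_mat_flat(9) + gripper(1) -> (13,) per arm."""
    return np.concatenate([pos, rot_mat.reshape(9), [gripper]])

def depth_m_to_u16(depth_m):
    """Float metres -> uint16 millimetres, matching the depth convention."""
    return np.clip(depth_m * 1000.0, 0, 65535).astype(np.uint16)
```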