API Reference

Architecture overview

Raiden is structured as a pipeline of loosely coupled modules. Data flows from hardware through recording and conversion into a flat, policy-ready dataset.

Hardware (ZED / RealSense cameras, YAM arms, SpaceMouse)
    │
    ▼
CameraConfig ──► Camera (ZedCamera / RealSenseCamera)
RobotController ──► i2rt MotorChainRobot / IK via PyRoki + J-Parse
    │
    ▼
Recorder  ──► data/raw/<task>/<episode>/
    │              metadata.json
    │              robot_data.npz
    │              cameras/<name>.svo2 or .bag
    ▼
Converter ──► data/processed/<task>/<episode>/
    │              rgb/<camera>/<frame>.jpg
    │              depth/<camera>/<frame>.npz   (uint16 mm)
    │              lowdim/<camera>/<frame>.npz  (intrinsics, extrinsics, action, language instruction)
    ▼
Visualizer  (Rerun)
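The helper below assembles the three per-frame paths for one camera, following the processed layout above. It is a hypothetical sketch, not part of Raiden's API. The frame identifier is passed through as a string because its exact naming (for example, zero padding) is not specified here.

```python
from pathlib import Path


def processed_frame_paths(root: Path, task: str, episode: str,
                          camera: str, frame: str) -> dict[str, Path]:
    """Hypothetical helper: per-frame file paths under data/processed/.

    `frame` is used verbatim because the frame naming scheme is not
    documented; the directory structure mirrors the overview diagram.
    """
    episode_dir = root / "data" / "processed" / task / episode
    return {
        "rgb": episode_dir / "rgb" / camera / f"{frame}.jpg",
        "depth": episode_dir / "depth" / camera / f"{frame}.npz",    # uint16 mm
        "lowdim": episode_dir / "lowdim" / camera / f"{frame}.npz",  # per-frame metadata
    }
```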

Modules

`CameraConfig`

Loads ~/.config/raiden/camera.json and maps semantic camera names (e.g. left_wrist) to hardware serial numbers, camera types (zed / realsense), and roles (scene / left_wrist / right_wrist). Both ZED and RealSense cameras can be freely mixed in a single session.
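The exact schema of `camera.json` is not documented here. The sketch below assumes a hypothetical layout that maps each semantic name to `serial`, `type`, and `role` fields. It shows how such a file could be checked against the camera types and roles listed above. `CameraEntry`, `parse_camera_config`, and `load_camera_config` are illustrative names.

```python
import json
from dataclasses import dataclass
from pathlib import Path

VALID_TYPES = {"zed", "realsense"}
VALID_ROLES = {"scene", "left_wrist", "right_wrist"}


@dataclass(frozen=True)
class CameraEntry:
    name: str
    serial: str
    type: str
    role: str


def parse_camera_config(raw: dict) -> dict[str, CameraEntry]:
    """Validate a mapping of the assumed (hypothetical) form
    {name: {"serial": ..., "type": ..., "role": ...}}."""
    cameras = {}
    for name, spec in raw.items():
        # Serial numbers are normalised to strings so numeric JSON values also work.
        entry = CameraEntry(name=name, serial=str(spec["serial"]),
                            type=spec["type"], role=spec["role"])
        if entry.type not in VALID_TYPES:
            raise ValueError(f"{name}: unknown camera type {entry.type!r}")
        if entry.role not in VALID_ROLES:
            raise ValueError(f"{name}: unknown role {entry.role!r}")
        cameras[name] = entry
    return cameras


def load_camera_config(path: Path = Path.home() / ".config" / "raiden" / "camera.json"):
    with open(path) as f:
        return parse_camera_config(json.load(f))
```

Mixing `zed` and `realsense` entries in one mapping is allowed, matching the statement that both camera types can share a session.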

`Camera`

Thin wrappers around the ZED SDK (ZedCamera) and the Intel RealSense SDK (RealSenseCamera). Both expose a uniform interface: open(), grab() → (rgb, depth), intrinsics, close(). Depth is returned as uint16 millimetres. ZED cameras additionally support SVO2 recording and playback.
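The uniform interface can be expressed as a structural type. In this sketch the method names come from the text above. The exact signatures, the representation of `intrinsics` (assumed to be a 3×3 pinhole matrix), and the `FakeCamera` stand-in are assumptions for illustration. The helper also converts the uint16 millimetre depth into float metres.

```python
from typing import Protocol

import numpy as np


class CameraLike(Protocol):
    """Structural sketch of the open/grab/intrinsics/close interface (signatures assumed)."""
    intrinsics: np.ndarray

    def open(self) -> None: ...
    def grab(self) -> tuple[np.ndarray, np.ndarray]: ...
    def close(self) -> None: ...


def depth_mm_to_m(depth_mm: np.ndarray) -> np.ndarray:
    """Convert uint16 millimetre depth to float32 metres (zeros stay zero)."""
    if depth_mm.dtype != np.uint16:
        raise TypeError("expected uint16 depth in millimetres")
    return depth_mm.astype(np.float32) / 1000.0


class FakeCamera:
    """Hypothetical hardware-free stand-in that satisfies CameraLike."""

    def __init__(self, h: int = 4, w: int = 6):
        self.intrinsics = np.array([[500.0, 0.0, w / 2], [0.0, 500.0, h / 2], [0.0, 0.0, 1.0]])
        self._shape = (h, w)
        self._is_open = False

    def open(self) -> None:
        self._is_open = True

    def grab(self) -> tuple[np.ndarray, np.ndarray]:
        if not self._is_open:
            raise RuntimeError("camera not open")
        h, w = self._shape
        rgb = np.zeros((h, w, 3), dtype=np.uint8)
        depth = np.full((h, w), 1500, dtype=np.uint16)  # 1.5 m everywhere
        return rgb, depth

    def close(self) -> None:
        self._is_open = False


def capture_one(cam: CameraLike):
    """Open a camera, grab one frame, and always close it again."""
    cam.open()
    try:
        rgb, depth_mm = cam.grab()
    finally:
        cam.close()
    return rgb, depth_mm_to_m(depth_mm)
```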

`RobotController`

Manages YAM follower and leader arms over CAN via i2rt. In SpaceMouse mode it runs a real-time IK loop using PyRoki and J-Parse to convert end-effector velocity commands into joint targets, with manipulability-aware damping near singularities. Handles bimanual and single-arm configurations, e-stop integration, and optional foot-pedal control.
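Manipulability-aware damping can be illustrated with classic damped least squares (DLS), where the damping factor grows as the Yoshikawa manipulability measure w = sqrt(det(J Jᵀ)) falls toward zero. This is a swapped-in textbook technique, not Raiden's PyRoki/J-Parse solver. The thresholds `w0` and `lambda_max` and the 2-link planar arm are illustrative assumptions.

```python
import numpy as np


def planar_2link_jacobian(q, l1=0.3, l2=0.25):
    """Positional Jacobian of a toy 2-link planar arm (illustration only)."""
    q1, q2 = q
    s1, c1 = np.sin(q1), np.cos(q1)
    s12, c12 = np.sin(q1 + q2), np.cos(q1 + q2)
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def dls_joint_velocity(J, v, w0=0.02, lambda_max=0.05):
    """Damped least-squares IK: qdot = J^T (J J^T + lambda^2 I)^-1 v.

    lambda^2 is zero while manipulability w >= w0 and rises smoothly to
    lambda_max^2 as w -> 0 (i.e. at a singularity).
    """
    JJt = J @ J.T
    w = np.sqrt(max(np.linalg.det(JJt), 0.0))  # clamp tiny negative round-off
    lam2 = lambda_max**2 * (1.0 - w / w0) ** 2 if w < w0 else 0.0
    return J.T @ np.linalg.solve(JJt + lam2 * np.eye(J.shape[0]), v)
```

Away from singularities the damping is zero and the solve reduces to the exact pseudo-inverse. Near them, damping λ bounds the joint-speed norm by ‖v‖ / (2λ), trading tracking accuracy for bounded joint speeds.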

`Recorder`

Orchestrates a full recording session: opens cameras once and keeps them alive across episodes, spawns per-episode threads for teleoperation, camera capture (30 Hz), and robot joint logging (~100 Hz). Writes SVO2/bag files and robot_data.npz to data/raw/.
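The per-episode threading model can be sketched with standard-library threads that share one stop event. The rates match the text (30 Hz camera capture, about 100 Hz joint logging), but the callbacks here only record timestamps. `run_at_rate` and `record_episode` are hypothetical names, not Raiden's Recorder API.

```python
import threading
import time


def run_at_rate(hz, step, stop):
    """Call step() at roughly `hz` until `stop` is set.

    Deadlines are absolute (next_t += period) so per-step jitter does not
    accumulate; if a step overruns, the schedule resyncs instead of bursting.
    """
    period = 1.0 / hz
    next_t = time.monotonic()
    while not stop.is_set():
        step()
        next_t += period
        delay = next_t - time.monotonic()
        if delay > 0:
            stop.wait(delay)  # returns early as soon as stop is set
        else:
            next_t = time.monotonic()


def record_episode(duration_s):
    """Run a 30 Hz 'camera' thread and a 100 Hz 'joint' thread for duration_s."""
    stop = threading.Event()
    frames, joints = [], []
    threads = [
        threading.Thread(target=run_at_rate,
                         args=(30, lambda: frames.append(time.monotonic()), stop)),
        threading.Thread(target=run_at_rate,
                         args=(100, lambda: joints.append(time.monotonic()), stop)),
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return frames, joints
```

Using `Event.wait` for the sleep lets every thread exit promptly when the episode ends, rather than finishing a full period first.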

`Converter`

Offline post-processing step (rd convert). Reads raw SVO2/bag recordings, synchronizes multi-camera streams by timestamp, extracts JPEG frames and depth maps, interpolates joint poses onto the camera timeline, and writes the per-frame lowdim `.npz` files that bundle intrinsics, per-frame extrinsics, the action vector, and the language instruction. Supports three depth backends: RealSense IR, ZED SDK NEURAL_LIGHT, and Fast Foundation Stereo (with optional TensorRT acceleration).
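Two of these steps, matching frames across cameras and resampling joints onto the camera timeline, can be sketched with NumPy. Nearest-timestamp matching with a tolerance and linear per-joint interpolation are assumed strategies; the text does not specify Raiden's exact synchronization rule. Both functions expect sorted timestamps in seconds.

```python
import numpy as np


def match_nearest(ref_ts, other_ts, max_dt):
    """Index of the nearest other_ts entry for each ref_ts entry, or -1 if
    the gap exceeds max_dt. other_ts must be sorted with >= 2 entries."""
    ref_ts = np.asarray(ref_ts, dtype=float)
    other_ts = np.asarray(other_ts, dtype=float)
    idx = np.searchsorted(other_ts, ref_ts)          # first index with other >= ref
    idx = np.clip(idx, 1, len(other_ts) - 1)         # keep a left neighbour available
    left, right = other_ts[idx - 1], other_ts[idx]
    idx = np.where(ref_ts - left <= right - ref_ts, idx - 1, idx)
    dt = np.abs(other_ts[idx] - ref_ts)
    return np.where(dt <= max_dt, idx, -1)


def interp_joints(cam_ts, joint_ts, joints):
    """Linearly interpolate each joint column (N x D) onto camera timestamps.
    Outside the joint time range, np.interp holds the end values."""
    joints = np.asarray(joints, dtype=float)
    return np.stack(
        [np.interp(cam_ts, joint_ts, joints[:, j]) for j in range(joints.shape[1])],
        axis=1,
    )
```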

`Visualizer`

Loads a converted recording and streams it into a Rerun viewer: RGB and depth images on a timeline, 3-D point clouds, robot joint overlays, and the action trajectory.
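To render point clouds, each depth image must be back-projected through the camera intrinsics and moved into the world frame with `T_world_cam`. A minimal pinhole sketch, assuming a 3×3 intrinsic matrix `K`, uint16 millimetre depth, and zero depth marking missing pixels:

```python
import numpy as np


def depth_to_world_points(depth_mm, K, T_world_cam):
    """Back-project uint16 millimetre depth into world-frame points (N x 3).

    Pinhole model: x = (u - cx) z / fx, y = (v - cy) z / fy. T_world_cam maps
    camera-frame points into the world frame. Zero-depth pixels are dropped.
    """
    h, w = depth_mm.shape
    v, u = np.indices((h, w))                     # row (v) and column (u) grids
    z = depth_mm.astype(np.float64) / 1000.0      # millimetres -> metres
    valid = z > 0
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u[valid] - cx) * z[valid] / fx
    y = (v[valid] - cy) * z[valid] / fy
    pts_cam = np.stack([x, y, z[valid], np.ones_like(x)], axis=0)  # 4 x N homogeneous
    return (T_world_cam @ pts_cam)[:3].T
```

The resulting N×3 array is the kind of input Rerun's `rr.Points3D` archetype accepts.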

`Calibration`

Hand-eye calibration (rd calibrate) for wrist cameras and static extrinsic estimation for scene cameras. Writes ~/.config/raiden/calibration_results.json with T_cam2ee per wrist camera and T_base2cam for scene cameras, plus a bimanual_transform mapping the right-arm base into the left-arm base frame.
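Chaining the calibration outputs gives a wrist camera's pose in the world (left-arm base) frame. In this sketch, names read T_<target>_<source>, matching `T_world_cam` in the data conventions. Three points are assumptions to verify against `calibration_results.json`: `T_cam2ee` is T_ee_cam (camera-frame points into the end-effector frame, as in OpenCV's `calibrateHandEye` output), `bimanual_transform` is T_lbase_rbase, and the right arm's end-effector pose comes from forward kinematics.

```python
import numpy as np


def make_T(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def invert_T(T):
    """Invert a rigid transform using R^T instead of a general matrix inverse."""
    R, t = T[:3, :3], T[:3, 3]
    return make_T(R.T, -R.T @ t)


def right_wrist_T_world_cam(T_lbase_rbase, T_rbase_ee, T_ee_cam):
    """World (left-arm base) pose of the right wrist camera.

    Assumed mapping: bimanual_transform -> T_lbase_rbase, T_cam2ee -> T_ee_cam;
    T_rbase_ee is the right end-effector pose from forward kinematics.
    """
    return T_lbase_rbase @ T_rbase_ee @ T_ee_cam
```

If `T_base2cam` maps base-frame points into the camera frame, a scene camera's `T_world_cam` would be its inverse (`invert_T`); that reading of the name is also an assumption.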

`FFSDepthPredictor` / `FFSTrtDepthPredictor`

Depth estimation backends wrapping Fast Foundation Stereo. FFSDepthPredictor runs inference in PyTorch; FFSTrtDepthPredictor uses compiled TensorRT FP16 engines for faster throughput. Both are used transparently by the Converter — TRT engines are preferred when present.
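Rectified-stereo networks such as Fast Foundation Stereo predict a disparity map, which must then be converted into the uint16 millimetre depth used in the converted dataset. The standard conversion for a rectified pair is depth = f_x · B / d, with focal length f_x in pixels, baseline B in metres, and disparity d in pixels. The sketch below assumes that output format. The `max_depth_m` cutoff is an illustrative choice, not a documented Raiden setting.

```python
import numpy as np


def disparity_to_depth_mm(disparity_px, fx_px, baseline_m, max_depth_m=10.0):
    """Convert disparity (pixels) to uint16 depth in millimetres.

    Non-positive disparities and depths beyond max_depth_m become 0.
    max_depth_m must stay below 65.535 m so values fit in uint16.
    """
    if max_depth_m > 65.535:
        raise ValueError("max_depth_m exceeds the uint16 millimetre range")
    d = np.asarray(disparity_px, dtype=np.float64)
    depth_m = np.zeros_like(d)
    valid = d > 0
    depth_m[valid] = fx_px * baseline_m / d[valid]   # depth = f * B / d
    depth_m[depth_m > max_depth_m] = 0.0
    return np.round(depth_m * 1000.0).astype(np.uint16)
```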

Data conventions

| Item | Convention |
|---|---|
| World frame | Left-arm base frame |
| Extrinsics | `T_world_cam` (4×4 float64, row-major in npz) |
| Depth | `uint16`, millimetres |
| Joint layout | `[r_joint×6, r_grip×1, l_joint×6, l_grip×1]` |
| Action | `pos(3) + rot_mat_flat(9) + gripper(1)` per arm |
| Right wrist camera | Mounted upside-down; images rotated 180° at capture time |
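The vector layouts in the table can be unpacked with plain slicing. In this sketch the 14-D joint order follows the table, while two details are assumptions: the per-arm 9-element rotation is a row-major flattening of the 3×3 matrix, and a bimanual action concatenates the right arm before the left, mirroring the joint layout. The helper names are hypothetical.

```python
import numpy as np

ARMS = ("right", "left")  # joint-vector order from the table


def split_joints(q):
    """Split a 14-D joint vector into {arm: (joints[6], gripper)}."""
    q = np.asarray(q, dtype=float)
    if q.shape != (14,):
        raise ValueError(f"expected shape (14,), got {q.shape}")
    return {arm: (q[7 * i: 7 * i + 6], q[7 * i + 6]) for i, arm in enumerate(ARMS)}


def pack_arm_action(pos, R, gripper):
    """13-D per-arm action: pos(3) + row-major flattened R (9, assumed) + gripper(1)."""
    return np.concatenate([np.asarray(pos, float), np.asarray(R, float).reshape(9), [float(gripper)]])


def unpack_arm_action(a):
    a = np.asarray(a, dtype=float)
    return a[:3], a[3:12].reshape(3, 3), a[12]


def pack_bimanual_action(right, left):
    """Concatenate (pos, R, gripper) tuples; right-then-left order is an assumption."""
    return np.concatenate([pack_arm_action(*right), pack_arm_action(*left)])
```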
