FlowDiT
RGB, point, and text tokens to multi-frame 3D flow
World Model
A lightweight flow world model that predicts future multi-frame 3D flows and guides real-time robotic manipulation.
01. Interactive task examples show flow-conditioned planning for manipulation scenes.
02. RGB, points, and language are fused to predict future 3D flow and guide action.
03. Simulation videos, real-robot videos, and benchmark tables are reproduced below.
Planning in 3D Space
RoboFlow4D treats world modelling as a closed loop between observation, prediction, and execution. Given a visual sequence and language instruction, the model predicts future multi-frame 3D flow that describes how task-relevant geometry should move.
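To make "multi-frame 3D flow" concrete, here is a minimal sketch of one possible data layout. It assumes the flow is stored as a `(T, N, 3)` array of per-point displacements measured from the current frame; that convention, the helper name `apply_flow`, and the toy numbers are illustrative assumptions, not details taken from the project.

```python
import numpy as np

def apply_flow(points, flow):
    """Return future point positions for every predicted frame.

    points: (N, 3) current 3D positions of task-relevant points.
    flow:   (T, N, 3) displacement of each point at each future frame,
            measured relative to the current frame (an assumed convention).
    """
    points = np.asarray(points, dtype=float)
    flow = np.asarray(flow, dtype=float)
    if flow.ndim != 3 or flow.shape[1:] != points.shape:
        raise ValueError("flow must have shape (T, N, 3) matching points (N, 3)")
    # Broadcasting adds the same starting positions to every frame.
    return points[None, :, :] + flow

# Toy example: two points on an object that slides 2 cm per frame along +x.
points = np.array([[0.10, 0.00, 0.05],
                   [0.12, 0.00, 0.05]])
steps = np.arange(1, 4).reshape(3, 1, 1)      # frames 1..3
flow = steps * np.array([0.02, 0.0, 0.0])     # shape (3, 1, 3)
flow = np.broadcast_to(flow, (3, 2, 3))       # shape (T=3, N=2, 3)
future = apply_flow(points, flow)             # shape (3, 2, 3)
```

Storing displacements rather than absolute positions keeps the prediction independent of where the object currently sits in the workspace, which is one reason a flow representation can be convenient for planning.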
The original project page includes interactive 3D demos for household manipulation tasks such as Moka Pot, Drawer, Book to Caddy, and Push Cube. This page preserves that task structure and links back to the live project resources.
Pipeline
FlowDiT
The model encodes observations and task instructions, predicts future 3D flow, and feeds that flow into a policy for action generation.
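The data flow described above can be sketched at the level of tensor shapes. Everything here is a stand-in: the token counts, the width `D`, the mean-pooled context, and the random linear head are hypothetical placeholders for the learned encoders and backbone, and none of it reflects the actual FlowDiT architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32            # shared token width (hypothetical)
T, N = 4, 16      # predicted frames and tracked points (hypothetical)

# Stand-ins for encoder outputs; a real system would compute these from data.
rgb_tokens = rng.normal(size=(64, D))    # image patch tokens
point_tokens = rng.normal(size=(N, D))   # one token per 3D query point
text_tokens = rng.normal(size=(8, D))    # instruction tokens

# Fuse the modalities by concatenating along the sequence axis.
sequence = np.concatenate([rgb_tokens, point_tokens, text_tokens], axis=0)

# Placeholder for the learned backbone: mean-pooled context added to each
# point token, followed by a random linear head. This is NOT FlowDiT.
context = sequence.mean(axis=0, keepdims=True)      # (1, D)
head = rng.normal(scale=0.01, size=(D, T * 3))      # (D, T*3)
per_point = (point_tokens + context) @ head         # (N, T*3)

# Rearrange to the multi-frame 3D flow layout (T, N, 3).
flow = per_point.reshape(N, T, 3).transpose(1, 0, 2)
```

The only point of the sketch is the interface: three token streams go in, and a `(T, N, 3)` flow tensor comes out for the downstream policy.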
Closed loop
RoboFlow4D acts as a predictive planner, while the action policy executes conditioned on both robot state and explicit flow.
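A toy version of this closed loop is sketched below. The planner and policy are hand-written stubs, not learned models: `predict_flow` moves a single point toward a goal at a capped speed, and `policy` follows the first predicted frame. All function names, the 5 cm step cap, and the assumption that the object is already grasped are hypothetical and chosen only to show the observe, predict, execute cycle.

```python
import numpy as np

def predict_flow(obj_pos, goal_pos, horizon=4, max_step=0.05):
    """Hypothetical planner: flow for one point, shape (horizon, 3).

    Each frame holds the displacement from the current position, moving
    toward the goal by at most max_step metres per frame.
    """
    delta = goal_pos - obj_pos
    dist = np.linalg.norm(delta)
    if dist == 0.0:
        return np.zeros((horizon, 3))
    direction = delta / dist
    travelled = np.minimum(max_step * np.arange(1, horizon + 1), dist)
    return travelled[:, None] * direction

def policy(robot_state, flow):
    """Hypothetical policy conditioned on robot state and explicit flow.

    It follows the first predicted frame, corrected by the offset
    between the gripper and the object it holds.
    """
    gripper_pos, obj_pos = robot_state
    return flow[0] + (obj_pos - gripper_pos)

def run_closed_loop(start, goal, max_iters=50, tol=1e-3):
    obj = np.array(start, dtype=float)
    gripper = obj.copy()              # toy assumption: object is grasped
    goal = np.array(goal, dtype=float)
    for step in range(max_iters):
        if np.linalg.norm(goal - obj) < tol:
            return obj, step
        flow = predict_flow(obj, goal)            # predict
        action = policy((gripper, obj), flow)     # decide
        gripper = gripper + action                # execute
        obj = gripper.copy()                      # observe the new state
    return obj, max_iters

final_pos, steps_taken = run_closed_loop([0.0, 0.0, 0.0], [0.2, 0.1, 0.0])
```

Because the flow is re-predicted from the latest observation on every iteration, errors in execution are corrected at the next step instead of accumulating over a long open-loop plan.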
Average success-rate gains reported over base policies on LIBERO and ManiSkill3.
Reported planning speedup compared with modular flow-planning pipelines.
Goal-oriented planning latency aimed at real-time robot deployment.
Simulation Videos
LIBERO Object
LIBERO Spatial
LIBERO Goal
LIBERO Long
Real-World Videos
Real robot
Pick up the brown cup and insert it into the black cup.
Real robot
Place an object into the target workspace with flow-guided control.
Real robot
Open the top drawer, place the red cube inside, and close it.
Real robot
Pick up the red cube and place it on the blue cube.
Quantitative Results
Success rates (%) on the four LIBERO task suites.
| Method | Spatial | Object | Goal | Long | Average |
|---|---|---|---|---|---|
| Octo | 78.9 | 85.7 | 84.6 | 51.1 | 75.1 |
| SpatialVLA | 88.2 | 89.9 | 78.6 | 55.5 | 78.1 |
| 4D-VLA | 88.9 | 95.2 | 90.9 | 79.1 | 88.6 |
| DP | 81.6 | 91.5 | 78.4 | 64.0 | 78.9 |
| DP + RoboFlow4D | 89.8 | 93.2 | 85.2 | 72.0 | 85.1 |
| DiT | 84.2 | 96.3 | 85.4 | 68.8 | 83.7 |
| DiT + RoboFlow4D | 90.2 | 97.0 | 88.4 | 75.2 | 87.7 |
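The improvement over each base policy can be read off the table with a few lines of arithmetic. The sketch below copies the four LIBERO rows for DP and DiT, with and without RoboFlow4D, and computes the per-suite and average gains in percentage points; the helper names are illustrative.

```python
# LIBERO success rates (%) copied from the table: Spatial, Object, Goal, Long.
results = {
    "DP": [81.6, 91.5, 78.4, 64.0],
    "DP + RoboFlow4D": [89.8, 93.2, 85.2, 72.0],
    "DiT": [84.2, 96.3, 85.4, 68.8],
    "DiT + RoboFlow4D": [90.2, 97.0, 88.4, 75.2],
}

def mean(values):
    return sum(values) / len(values)

def gain(base, guided):
    """Per-suite and average improvement in percentage points."""
    per_suite = [round(g - b, 1) for b, g in zip(results[base], results[guided])]
    average = round(mean(results[guided]) - mean(results[base]), 1)
    return per_suite, average

dp_gain = gain("DP", "DP + RoboFlow4D")
dit_gain = gain("DiT", "DiT + RoboFlow4D")
```

With the table's values, the average gain is 6.2 points for DP and 4.0 points for DiT. The largest per-suite gains are on Spatial for DP (8.2 points) and on Long for DiT (6.4 points).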
Real-world DP gain
RoboFlow4D raises DP's average real-robot success rate and reduces its average completion time on the reported tasks.
Real-world DiT gain
The same flow guidance improves DiT across pick-and-place, stack, assemble, and drawer scenarios.
Deployment
The system is designed to make predictive 3D motion practical inside a robot control loop.