Video
Cortex in action
Cortex coordinates long-horizon manipulation through subtask planning, compact memory, grounded execution, and online progress verification.
Method
A dual-system interface that both planners and executors can trust.
Cortex treats the subtask-memory pair as the contract between System-2 planning and System-1 execution. The planner is constrained to executable skills, and the executor receives local, physically grounded commands instead of a brittle global task description.
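The subtask-memory contract can be pictured as a small typed record passed from System-2 to System-1. The sketch below assumes a hypothetical `PlannerOutput` type and `advance` helper; the field names are illustrative and are not taken from the Cortex code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerOutput:
    """Hypothetical System-2 -> System-1 contract: one subtask plus compact memory."""

    subtask: str                    # local, physically grounded command for the executor
    memory: tuple[str, ...] = ()    # compact record of subtasks already completed

    def executor_prompt(self) -> str:
        # The executor only sees the current subtask, never the global task description.
        return self.subtask


def advance(prev: PlannerOutput, next_subtask: str) -> PlannerOutput:
    """Move the finished subtask into memory and issue the next command."""
    return PlannerOutput(subtask=next_subtask, memory=prev.memory + (prev.subtask,))
```

For example, `advance(PlannerOutput("pick up the red beaker"), "pour the beaker into the flask")` yields a record whose memory holds the pick step while the executor prompt contains only the pour command.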
Executable Skill Space
Free-form instructions are standardized into 32 canonical primitives, reducing kinematic hallucinations and making planner outputs routable by the VLA harness.
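One way to make planner outputs routable is to rewrite the leading verb of each instruction onto a fixed vocabulary and reject anything that does not map. This is a minimal sketch under stated assumptions: the primitive set and the synonym table below are hypothetical stand-ins, since the actual 32 primitives are not listed here.

```python
# Hypothetical subset of a canonical skill vocabulary (not the actual 32 Cortex primitives).
CANONICAL_PRIMITIVES = {"pick", "place", "pour", "open", "close", "push", "wipe"}

# Hypothetical synonym table mapping free-form verbs onto canonical primitives.
SYNONYMS = {
    "grab": "pick",
    "grasp": "pick",
    "put": "place",
    "set": "place",
    "tip": "pour",
    "shut": "close",
    "scrub": "wipe",
}


def canonicalize(instruction: str) -> str:
    """Rewrite the leading verb of a free-form instruction as a canonical primitive.

    Raises ValueError when the verb cannot be routed, so an unexecutable planner
    output is rejected instead of being forwarded to the executor.
    """
    words = instruction.strip().lower().split()
    if not words:
        raise ValueError("empty instruction")
    verb = SYNONYMS.get(words[0], words[0])
    if verb not in CANONICAL_PRIMITIVES:
        raise ValueError(f"unroutable skill: {words[0]!r}")
    return " ".join([verb, *words[1:]])
```

With this table, `canonicalize("Grab the red cup")` returns `"pick the red cup"`, while an instruction such as `"fly to the shelf"` raises `ValueError`.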
Tractable Metadata
Subtasks include object attributes, spatial relations, counts, and reachability priors so the high-level plan matches what the robot can physically execute.
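The metadata fields named above fit naturally into a per-subtask record. The sketch assumes a hypothetical `SubtaskMetadata` schema and models the reachability prior as the set of arms that can reach the target; both the schema and the `executable` filter are illustrative, not the Cortex data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubtaskMetadata:
    """Hypothetical schema for per-subtask metadata."""

    primitive: str                      # canonical skill, e.g. "pick"
    target: str                         # object the skill acts on
    attributes: tuple[str, ...] = ()    # object attributes, e.g. ("red",)
    relation: Optional[str] = None      # spatial relation, e.g. "from the tray"
    count: int = 1                      # number of instances to act on
    reachable_by: tuple[str, ...] = ("left", "right")  # reachability prior per arm

    def render(self) -> str:
        """Format the metadata as a grounded command string for the executor."""
        parts = [self.primitive]
        if self.count > 1:
            parts.append(str(self.count))
        parts.extend(self.attributes)
        parts.append(self.target)
        if self.relation:
            parts.append(self.relation)
        return " ".join(parts)


def executable(plan: list[SubtaskMetadata], arm: str) -> list[SubtaskMetadata]:
    """Keep only subtasks whose reachability prior allows the given arm."""
    return [s for s in plan if arm in s.reachable_by]
```

For instance, `SubtaskMetadata("pick", "cups", ("red",), "from the tray", 2).render()` produces `"pick 2 red cups from the tray"`.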
Event-balanced Training
Training balances ongoing execution frames and boundary transition frames, teaching the planner when to hold a command and when to update memory.
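Boundary frames are rare compared with ongoing frames, so uniform sampling would mostly teach the planner to hold. The sketch below marks frames near each subtask change as boundary frames and draws half of every batch from them; the window size and the 50/50 ratio are assumptions for illustration, since the exact balancing scheme is not specified here.

```python
import random


def boundary_mask(subtask_ids: list[int], window: int = 1) -> list[bool]:
    """Mark the `window` frames on each side of a subtask change as boundary frames."""
    n = len(subtask_ids)
    changes = [t for t in range(1, n) if subtask_ids[t] != subtask_ids[t - 1]]
    mask = [False] * n
    for t in changes:
        # Frames t - window .. t + window - 1 straddle the transition at frame t.
        for k in range(max(0, t - window), min(n, t + window)):
            mask[k] = True
    return mask


def balanced_batch(mask: list[bool], batch_size: int, rng: random.Random) -> list[int]:
    """Sample frame indices, half from boundary frames and half from ongoing frames."""
    boundary = [i for i, b in enumerate(mask) if b]
    ongoing = [i for i, b in enumerate(mask) if not b]
    if not boundary or not ongoing:
        raise ValueError("need both boundary and ongoing frames")
    half = batch_size // 2
    batch = rng.choices(boundary, k=half) + rng.choices(ongoing, k=batch_size - half)
    rng.shuffle(batch)
    return batch
```

Sampling with replacement lets a trajectory with only a handful of transitions still fill half of each batch.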
Asynchronous Loop
System-2 runs at a slower reasoning rate while System-1 executes continuously, with harness logic for command mapping, holding, and timeout recovery.
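The harness behavior can be illustrated with a deterministic, tick-based stand-in for the asynchronous loop: the planner is queried only every few ticks, the executor acts on every tick, a hypothetical `HOLD` token keeps the current command, and a command that runs too long is replaced by a recovery command. The `Harness` class, the `HOLD` token, and the recovery string are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HOLD = "HOLD"  # hypothetical planner token meaning "keep the current command"


@dataclass
class Harness:
    """Tick-based stand-in for the asynchronous System-2 / System-1 loop."""

    planner: Callable[[int], str]           # hypothetical: maps tick -> subtask or HOLD
    plan_period: int = 5                    # System-2 runs once every plan_period ticks
    timeout: int = 20                       # ticks allowed per command before recovery
    recovery: str = "retreat to home pose"  # hypothetical fallback command

    def run(self, ticks: int) -> list[str]:
        command: Optional[str] = None
        age = 0
        executed: list[str] = []
        for t in range(ticks):
            if t % self.plan_period == 0:
                proposal = self.planner(t)
                if proposal != HOLD and proposal != command:
                    command, age = proposal, 0   # map a new subtask onto System-1
            if command is not None and age >= self.timeout:
                command, age = self.recovery, 0  # timeout recovery
            if command is not None:
                executed.append(command)         # System-1 acts on every tick
                age += 1
        return executed
```

With a planner that issues `"pick cup"` at tick 0 and `HOLD` afterwards, `Harness(planner, plan_period=5, timeout=8).run(12)` executes the pick command for 8 ticks and then switches to the recovery command for the remaining 4.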
Data
Scalable metadata construction for long-horizon manipulation.
Cortex builds a standardized interface from public real-world data, public simulation data, self-collected robot demonstrations, and procedural generation. The pipeline annotates subtask sequences, aligns boundaries, and injects execution priors.
Subtask Annotation
Visualized temporal alignment across heterogeneous trajectories.
Each clip overlays the active subtask above the current frame and uses color-coded timeline segments to expose subtask boundaries over the trajectory.
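Color-coded timelines of this kind can be derived from per-frame subtask labels by collapsing consecutive runs into segments; every segment start after frame 0 is a subtask boundary. The palette below is a hypothetical choice, and the sketch is not the tooling used to produce the clips.

```python
from itertools import groupby

PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]  # hypothetical colors


def to_segments(labels: list[str]) -> list[tuple[str, int, int]]:
    """Collapse per-frame subtask labels into (label, start, end_exclusive) segments."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((label, start, start + length))
        start += length
    return segments


def color_segments(segments: list[tuple[str, int, int]]) -> list[tuple[str, int, int, str]]:
    """Assign one palette color per distinct subtask, in order of first appearance."""
    colors: dict[str, str] = {}
    for label, _, _ in segments:
        colors.setdefault(label, PALETTE[len(colors) % len(PALETTE)])
    return [(label, s, e, colors[label]) for label, s, e in segments]
```

For labels `["pick"] * 3 + ["pour"] * 2 + ["place"]`, `to_segments` returns `[("pick", 0, 3), ("pour", 3, 5), ("place", 5, 6)]`, so the boundaries fall at frames 3 and 5.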
Results
State-of-the-art long-horizon autonomy across planning and control.
Cortex improves both open-loop System-2 planning quality and closed-loop task execution, with the strongest gains on tasks that require memory, subtask transitions, and physical grounding.
System-2 Planning
Structured subtasks keep long-horizon reasoning grounded.
Average total score across spatial, long-horizon, and counting evaluations.
Closed-loop Simulation
Progress verification improves full-task success.
Cortex maintains high success rates as the task horizon grows, especially on the long-horizon splits.
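Online progress verification can be read as a gate between subtasks: the plan advances only once the current subtask is confirmed complete, with a bounded number of retries. In this minimal sketch `execute` and `verify` are hypothetical hooks standing in for the executor and the verifier, and the retry policy is an assumption rather than the Cortex procedure.

```python
from typing import Callable


def run_with_verification(
    subtasks: list[str],
    execute: Callable[[str], None],
    verify: Callable[[str], bool],
    max_retries: int = 2,
) -> list[str]:
    """Advance through subtasks only after each one is verified as complete.

    Returns the verified subtasks; stops at the first subtask that still fails
    after `max_retries` retries instead of advancing on an unverified step.
    """
    done: list[str] = []
    for subtask in subtasks:
        for _ in range(1 + max_retries):
            execute(subtask)
            if verify(subtask):
                done.append(subtask)
                break
        else:
            break  # every attempt failed verification: stop the episode
    return done
```

Without the gate, a failed pour would be silently followed by the place step; with it, the pour is retried until the verifier confirms it or the retry budget is exhausted.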
Memory-heavy Manipulation
Cortex retains object order, counts, and prior states in its task memory.
RMBench success rates over seven manipulation tasks, each evaluated with 100 rollouts.
Real World
Zero-shot deployment on complex physical workflows.
Cortex transfers to an ARX ACONE dual-arm setup and enables long-horizon chemistry and washing workflows by combining a generalist VLM planner with a short-horizon subtask-conditioned VLA executor.
Real-world results (averaged over 20 trials)
| Method | Chemical success rate | Chemical avg. score (/14) | Washing success rate | Washing avg. score (/14) |
|---|---|---|---|---|
| π0.5 | 0% | 2.5 | 0% | 3.7 |
| πmem | 0% | 4.1 | 0% | 6.5 |
| Cortex | 65% | 11.0 | 55% | 10.5 |
| Human + πmemsub | 75% | 12.2 | 70% | 11.6 |
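The two metrics per task can be computed from per-trial records. This sketch assumes that the /14 score counts completed subtasks and that a trial succeeds only when all 14 are completed; neither definition is stated explicitly above, and the toy input below is made up rather than taken from the reported trials.

```python
from statistics import mean

NUM_SUBTASKS = 14  # each real-world workflow is scored out of 14


def summarize(completed: list[int], total: int = NUM_SUBTASKS) -> tuple[float, float]:
    """Return (success rate in %, mean completed subtasks) over a set of trials.

    Assumes a trial counts as a success only when all `total` subtasks are completed.
    """
    if not completed:
        raise ValueError("no trials")
    if any(not 0 <= c <= total for c in completed):
        raise ValueError("completed count out of range")
    sr = 100.0 * sum(c == total for c in completed) / len(completed)
    return sr, mean(completed)
```

On the toy input `[14, 14, 10, 6]`, the function reports a 50% success rate and an average score of 11, which shows how a method can earn substantial partial credit on the score column while its success rate stays low.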
Citation
BibTeX
@misc{peng2026cortex,
title={Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation},
author={Jiaqi Peng and Xiqian Yu and Delin Feng and Yuqiang Yang and Wenzhe Cai and Jing Xiong and Ganlin Yang and Jinliang Zheng and Jiafei Cao and Xueyuan Wei and Jiangmiao Pang and Yuan Shen and Tai Wang},
year={2026},
url={https://steinate.github.io/cortex.github.io}
}