Cortex: A Bidirectionally Aligned Embodied Agent Framework

Video

Cortex in action

Cortex coordinates long-horizon manipulation through subtask planning, compact memory, grounded execution, and online progress verification.

Closed-loop long-horizon manipulation. Cortex keeps task progress explicit, routes executable subtasks, and verifies transitions before moving forward.

Rollout highlights

From language goals to verified robot progress.

The rollout emphasizes the interface between a reasoning planner and a subtask-conditioned robot executor, especially where memory and physical progress must stay synchronized.

Executable subtasks: Plans are expressed as grounded, robot-actionable skills.

Compact memory: The system preserves prior states across multi-stage workflows.

Online verification: Transitions are held until visual evidence supports completion.

Method

A dual-system interface that both planners and executors can trust.

Cortex treats the subtask-memory pair as the contract between System-2 planning and System-1 execution. The planner is constrained to executable skills, and the executor receives local, physically grounded commands instead of a brittle global task description.

Cortex architecture with instruction, observation, memory, VLM orchestration, and VLA execution. — **Bidirectionally aligned subtask interface.** The VLM updates memory and streams subtasks; the VLA consumes the active subtask as a grounded local objective.
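To make the subtask-memory contract concrete, here is a minimal sketch of what such an interface could look like. It is an illustration only: the class names, the `SKILLS` subset, and the string command format are assumptions, since the page states that there are 32 canonical primitives but does not list them or specify a schema.

```python
from dataclasses import dataclass, field

# Hypothetical subset of the canonical skill vocabulary (the page mentions
# 32 primitives but does not enumerate them).
SKILLS = frozenset({"pick", "place", "pour", "stir", "open", "close"})


@dataclass(frozen=True)
class Subtask:
    skill: str   # must be one of the canonical primitives
    target: str  # object the skill acts on
    # Optional grounding hints, e.g. attributes, spatial relations, counts.
    metadata: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject planner outputs the executor could not route.
        if self.skill not in SKILLS:
            raise ValueError(f"non-executable skill: {self.skill!r}")


@dataclass
class Memory:
    # Compact record of subtasks already verified as complete.
    completed: list = field(default_factory=list)

    def commit(self, subtask: Subtask) -> None:
        self.completed.append(f"{subtask.skill} {subtask.target}")


@dataclass
class PlannerOutput:
    # The pair the planner emits: updated memory plus the active subtask.
    memory: Memory
    active: Subtask

    def to_executor_command(self) -> str:
        # The executor only sees the local subtask, not the global task.
        return f"{self.active.skill} {self.active.target}"
```

In this sketch the constraint runs in both directions: the planner cannot emit a skill outside the vocabulary, and the executor never receives more than the active subtask.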

1. Executable Skill Space

Free-form instructions are standardized into 32 canonical primitives, reducing kinematic hallucinations and making planner outputs routable by the VLA harness.

2. Tractable Metadata

Subtasks include object attributes, spatial relations, counts, and reachability priors so the high-level plan matches what the robot can physically execute.

3. Event-balanced Training

Training balances ongoing execution frames and boundary transition frames, teaching the planner when to hold a command and when to update memory.

4. Asynchronous Loop

System-2 runs at a slower reasoning rate while System-1 executes continuously, with harness logic for command mapping, holding, and timeout recovery.
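The asynchronous loop with holding and timeout recovery can be sketched as a small tick-based simulation. This is not the Cortex harness itself: the `Harness` class, the `plan_every` and `timeout_ticks` parameters, and the boolean `verified` signal are all hypothetical stand-ins for the planner rate, the recovery threshold, and the visual completion check.

```python
from dataclasses import dataclass


@dataclass
class Harness:
    plan: list               # ordered subtask strings (stand-in for planner output)
    plan_every: int = 5      # System-2 runs once per `plan_every` executor ticks
    timeout_ticks: int = 20  # ticks allowed on one subtask before recovery
    index: int = 0
    ticks_on_subtask: int = 0
    recoveries: int = 0

    @property
    def active(self):
        return self.plan[self.index] if self.index < len(self.plan) else None

    def step(self, tick, verified):
        """Advance one executor tick and return the command System-1 receives."""
        if self.active is None:
            return None
        command = self.active  # the held command is re-sent on every tick
        self.ticks_on_subtask += 1
        if tick % self.plan_every == 0:  # slower System-2 update
            if verified:
                # Switch only once completion has been verified.
                self.index += 1
                self.ticks_on_subtask = 0
            elif self.ticks_on_subtask >= self.timeout_ticks:
                # Timeout recovery: count it and re-issue the same subtask.
                self.recoveries += 1
                self.ticks_on_subtask = 0
        return command


def run(harness, verified_signal):
    # Ticks are numbered from 1; one verification reading per tick.
    return [harness.step(t, v) for t, v in enumerate(verified_signal, start=1)]
```

With `plan_every=2`, a completion that becomes visible on tick 3 is only acted on at tick 4, so the executor keeps receiving the old command in between. That delay is the price of running the planner at a lower rate than the executor.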

Data

Scalable metadata construction for long-horizon manipulation.

Cortex builds a standardized interface from public real-world data, public simulation data, self-collected robot demonstrations, and procedural generation. The pipeline annotates subtask sequences, aligns boundaries, and injects execution priors.

Cortex dataset construction with public real-world data, simulation data, self-collected data, procedural data, annotation pipeline, and interface properties. — **Metadata and interface standardization.** The dataset pipeline aligns raw trajectories to executable subtasks and adds grounding signals for both executability and tractability.

Subtask Annotation

Visualized temporal alignment across heterogeneous trajectories.

Each clip overlays the active subtask above the current frame and uses color-coded timeline segments to expose subtask boundaries over the trajectory.

Agibot trajectory. Subtask labels and colored progress bars reveal action boundaries in a 4:3 rollout.

Behavior trajectory. A square-view rollout demonstrates consistent subtask annotation under different video geometry.

Galaxea trajectory. Wide-format visualization shows subtask switching for a household interaction sequence.

Automatic annotation pipeline for subtask segmentation and temporal alignment. — Automatic subtask annotation and temporal alignment over long-horizon demonstrations.

Event-balanced sampling improves average total score while using fewer training samples. — Event-balanced sampling improves planning score while reducing sample count near redundant intra-task frames.
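One way to picture event-balanced sampling is as a frame selector that keeps every frame near a subtask boundary and thins out the frames in between. The page does not give the actual sampling rule, so the function below, including its `window` and `ratio` parameters, is an assumed illustration rather than the method used for Cortex.

```python
def event_balanced_indices(labels, window=1, ratio=1.0):
    """Select frame indices from per-frame subtask labels.

    Keeps every frame within `window` of a label change (boundary frames)
    and an evenly spaced subset of the remaining (ongoing) frames, about
    `ratio` ongoing frames per boundary frame.
    """
    n = len(labels)
    # Indices where the active subtask differs from the previous frame.
    changes = [i for i in range(1, n) if labels[i] != labels[i - 1]]
    boundary = set()
    for c in changes:
        boundary.update(range(max(0, c - window), min(n, c + window)))
    ongoing = [i for i in range(n) if i not in boundary]
    keep = min(len(ongoing), round(ratio * len(boundary)))
    if keep:
        step = len(ongoing) / keep
        ongoing = [ongoing[int(k * step)] for k in range(keep)]
    else:
        ongoing = []
    return sorted(boundary | set(ongoing))


# 20 frames with one transition at frame 10: 4 frames are kept instead of 20.
print(event_balanced_indices(["a"] * 10 + ["b"] * 10))  # [0, 9, 10, 11]
```

Note that under this particular rule a clip with no transitions contributes no frames at all, so a real pipeline would likely need a floor on ongoing frames.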

Results

State-of-the-art long-horizon autonomy across planning and control.

Cortex improves both open-loop System-2 planning quality and closed-loop task execution, with the strongest gains on tasks that require memory, subtask transitions, and physical grounding.

Step-level avg.: 8.32 (out of 10)

Episode-level avg.: 7.81 (closed-loop planning)

LIBERO-Long: 95.5% (zero-shot success)

RoboTwin 2.0: 86.8% (overall success)

System-2 Planning

Structured subtasks keep long-horizon reasoning grounded.

Average total score across spatial, long-horizon, and counting evaluations.

Method	Step	Episode
Cortex (subtask interface)	8.32	7.81
GPT-5 (foundation VLM)	6.27	7.23
Gemini 3.1 Pro (foundation VLM)	6.92	6.86
Qwen3-VL-8B (foundation VLM)	6.74	6.29

Counting 8.74, Long-horizon 8.16, Spatial 8.05

Closed-loop Simulation

Progress verification improves full-task success.

Cortex preserves high success as task horizon grows, especially on long-horizon splits.

LIBERO-Long zero-shot success rate (%)

Method	Success rate
Cortex	95.5
OpenVLA-OFT	94.5
MemoryVLA	93.4
π_0.5	92.4
Gemini 3.1 Pro	91.0

RoboTwin 2.0 success rate (%): short / long / overall

Method	Short	Long	Overall
Cortex	86.0	88.0	86.8
π_0.5	82.6	83.0	82.7
X-VLA	77.1	66.3	72.8
π₀	61.5	72.6	65.9

Memory-heavy Manipulation

Cortex keeps object order, counts, and prior state across task memory.

RMBench success rates over seven manipulation tasks, each evaluated with 100 rollouts.

7-task average success rate (%)

Method	Success rate
Cortex	61.9
Mem-0	41.7
π_0.5	12.6
X-VLA	12.1
ACT	7.6
DP	6.0

Task	Success rate	Comparison
Observe and Pick Up	14%	+10 vs Mem-0
Rearrange Blocks	100%	+11 vs Mem-0
Put Back Block	100%	+10 vs Mem-0
Swap Blocks	99%	+32 vs Mem-0
Swap T	63%	+49 vs Mem-0
Battery Try	37%	+9 vs Mem-0
Press Button	20%	only non-zero
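As a quick consistency check, the per-task figures above can be averaged and compared with the reported 7-task averages. The sketch assumes that each "+N vs Mem-0" is a percentage-point difference and that Mem-0 scores 0% on Press Button, which is how "only non-zero" is read here; neither reading is stated explicitly on the page.

```python
# task: (success rate in %, margin over Mem-0 in percentage points)
tasks = {
    "Observe and Pick Up": (14, 10),
    "Rearrange Blocks": (100, 11),
    "Put Back Block": (100, 10),
    "Swap Blocks": (99, 32),
    "Swap T": (63, 49),
    "Battery Try": (37, 9),
    "Press Button": (20, 20),  # assumed: Mem-0 at 0%, so the margin is the full 20
}

# Mean of the listed per-task success rates.
cortex_avg = round(sum(rate for rate, _ in tasks.values()) / len(tasks), 1)
# Mem-0 per-task rates are recovered by subtracting each margin.
mem0_avg = round(sum(rate - gap for rate, gap in tasks.values()) / len(tasks), 1)

print(cortex_avg, mem0_avg)  # 61.9 41.7
```

Both values match the reported averages for Cortex (61.9) and Mem-0 (41.7), which supports reading the per-task success rates as Cortex's own.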

Real World

Zero-shot deployment on complex physical workflows.

Cortex transfers to an ARX ACONE dual-arm setup and enables long-horizon chemistry and washing workflows by combining a generalist VLM planner with a short-horizon subtask-conditioned VLA executor.

Chemical liquid stirring real-world rollout with fourteen subtasks. — **Chemical liquid stirring.** Cortex preserves procedure order over fourteen stages and switches only after visual evidence supports completion.

Beaker washing real-world rollout with subtask prediction and execution process. — Beaker washing requires remembering bottle state, cap state, beaker location, and washing progress.

Local execution failure where Cortex keeps retrying the stopper grasp until success. — When a stopper grasp fails, Cortex holds the current subtask and memory until the grasp is verified.

20-trial Real-world Average

Method	Chemical Task	Washing Task
π_0.5	0% SR, 2.5 / 14	0% SR, 3.7 / 14
π_mem	0% SR, 4.1 / 14	0% SR, 6.5 / 14
Cortex	65% SR, 11.0 / 14	55% SR, 10.5 / 14
Human + π_mem^sub	75% SR, 12.2 / 14	70% SR, 11.6 / 14

Citation

BibTeX

@misc{peng2026cortex,
  title={Cortex: A Bidirectionally Aligned Embodied Agent Framework for Long-horizon Manipulation},
  author={Jiaqi Peng and Xiqian Yu and Delin Feng and Yuqiang Yang and Wenzhe Cai and Jing Xiong and Ganlin Yang and Jinliang Zheng and Jiafei Cao and Xueyuan Wei and Jiangmiao Pang and Yuan Shen and Tai Wang},
  year={2026},
  url={https://steinate.github.io/cortex.github.io}
}