RoboPlayground
User: Sort colored cubes into distinct piles by color

RoboPlayground: Generating ColorBlockPileSorting... Task compiled. 9 ColoredCubes (3 red, 3 green, 3 blue) spawn in a mixed pile. Success requires compact same-color groups.
Goal state snapshot (three color-sorted piles of cubes): validated ✔

User: Now add numbered and lettered blocks and sort by type

RoboPlayground: Modifying task... Adding SemanticCube assets... Task updated. 3 ColoredCubes + 3 numbered + 3 lettered SemanticCubes. Three type-sorted piles with semantic readability checks.
Goal state snapshot (color, number, and letter piles): validated ✔

generated_task.py
import numpy as np

# RoboEvalEnv, ColoredCube, and Table come from the RoboPlayground framework;
# their import paths are not shown in this excerpt.


class ColorBlockPileSorting(RoboEvalEnv):
    """Sort colored cubes into distinct piles by color."""

    # Upper bound on the spread of a same-color group (see _success).
    _GROUP_MAX_DIST = 0.12
    # Defined by the generated task but not referenced in this excerpt.
    _MIN_SEPARATION = 0.18

    def _initialize_env(self):
        self.table = self._preset.get_props(Table)[0]
        # Three cubes per color, nine cubes in total.
        self.red_cubes = [ColoredCube(self._mojo) for _ in range(3)]
        self.green_cubes = [ColoredCube(self._mojo) for _ in range(3)]
        self.blue_cubes = [ColoredCube(self._mojo) for _ in range(3)]
        self.cubes = self.red_cubes + self.green_cubes + self.blue_cubes

        for c in self.red_cubes:
            c.set_color("red")
        for c in self.green_cubes:
            c.set_color("green")
        for c in self.blue_cubes:
            c.set_color("blue")

    def _on_reset(self):
        # Seeded generator: the same seed reproduces the same mixed pile.
        rng = np.random.default_rng(self.seed)
        center = np.array([0.55, 0.00])
        for cube in self.cubes:
            # Gaussian scatter around the pile center at a fixed height.
            xy = center + rng.normal(scale=0.035, size=(2,))
            cube.set_pose(position=np.array([*xy, 0.97]))

    def _success(self):
        # _max_pairwise and _centroid are helper methods not shown here.
        red_spread = self._max_pairwise(self.red_cubes)
        green_spread = self._max_pairwise(self.green_cubes)
        blue_spread = self._max_pairwise(self.blue_cubes)
        # Every color group must be compact.
        compact = all(
            s <= self._GROUP_MAX_DIST
            for s in [red_spread, green_spread, blue_spread]
        )
        r_c = self._centroid(self.red_cubes)
        g_c = self._centroid(self.green_cubes)
        b_c = self._centroid(self.blue_cubes)
        # Pile centroids must run red, green, blue along increasing x.
        ordered = r_c[0] < g_c[0] < b_c[0]
        return compact and ordered
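
The excerpt calls `_max_pairwise` and `_centroid` without showing them, and it defines `_MIN_SEPARATION` without using it. The following is a minimal sketch of what such helpers might compute. It assumes each cube is reduced to an (x, y) position and uses only the Python standard library to stay dependency-free. The function names and the `piles_sorted` check are illustrative assumptions, not the framework's code. In particular, the pairwise centroid-separation test is a swapped-in alternative to the excerpt's left-to-right ordering check.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]

GROUP_MAX_DIST = 0.12  # same value as _GROUP_MAX_DIST in the excerpt
MIN_SEPARATION = 0.18  # same value as _MIN_SEPARATION in the excerpt


def max_pairwise(points: Sequence[Point]) -> float:
    """Largest Euclidean distance between any two points (0.0 for fewer than two)."""
    return max((dist(a, b) for a, b in combinations(points, 2)), default=0.0)


def centroid(points: Sequence[Point]) -> Point:
    """Mean x and mean y of the points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def piles_sorted(groups: Dict[str, List[Point]]) -> bool:
    """True when every group is compact and all group centroids are far apart.

    Hypothetical check: it replaces the excerpt's x-ordering test with a
    separation test based on MIN_SEPARATION.
    """
    compact = all(max_pairwise(pts) <= GROUP_MAX_DIST for pts in groups.values())
    centers = [centroid(pts) for pts in groups.values()]
    separated = all(
        dist(a, b) >= MIN_SEPARATION for a, b in combinations(centers, 2)
    )
    return compact and separated
```

Under this sketch, three tight clusters spaced 0.25 apart pass, while nine cubes left in one mixed pile fail because their centroids nearly coincide.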

Our Team

¹ University of Washington · ² Allen Institute for AI · * Equal contribution · † Equal advising

Abstract

Evaluation of robotic manipulation systems has largely relied on fixed benchmarks authored by a small number of experts, where task instances, constraints, and success criteria are predefined and difficult to extend. This paradigm limits who can shape evaluation and obscures how policies respond to user-authored variations in task intent, constraints, and notions of success.

We present RoboPlayground, a framework that enables users to author executable manipulation tasks using natural language within a structured physical domain. Natural language instructions are compiled into reproducible task specifications with explicit asset definitions, initialization distributions, and success predicates. Each instruction defines a structured family of related tasks, enabling controlled semantic and behavioral variation while preserving executability and comparability. A user study shows that the language-driven interface is easier to use and imposes lower cognitive workload than programming-based and code-assist baselines. Evaluating learned policies on language-defined task families reveals generalization failures not apparent under fixed benchmark evaluations. Finally, we show that task diversity scales with contributor diversity rather than task count alone, enabling evaluation spaces to grow continuously through crowd-authored contributions.

System Overview

Language-Driven Task Authoring

Users express task intent, constraints, and success criteria in natural language. Each instruction is compiled into an executable task specification with explicit asset definitions, initialization distributions, and success predicates — enabling reproducible evaluation without writing a single line of code.
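
To make the three parts of a compiled specification concrete, here is a minimal sketch of a container holding asset definitions, an initialization distribution, and a success predicate. `TaskSpec`, its fields, and the toy two-cube task are hypothetical names chosen for illustration. They are not RoboPlayground's actual schema, and the sampling values are only borrowed from the demo's Gaussian scatter.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Position = Tuple[float, float]
State = Dict[str, Position]  # asset name -> (x, y) position


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical container for the three parts named in the text."""

    instruction: str
    assets: List[str]
    sample_initial_state: Callable[[random.Random], State]
    success: Callable[[State], bool]

    def reset(self, seed: int) -> State:
        """Draw an initial state; the same seed always yields the same state."""
        return self.sample_initial_state(random.Random(seed))


ASSETS = ["red_cube", "blue_cube"]


def sample(rng: random.Random) -> State:
    # Gaussian scatter around an assumed pile center.
    return {name: (rng.gauss(0.55, 0.035), rng.gauss(0.0, 0.035)) for name in ASSETS}


def red_left_of_blue(state: State) -> bool:
    # "Left" is taken to mean a smaller x coordinate (an assumption).
    return state["red_cube"][0] < state["blue_cube"][0]


spec = TaskSpec(
    instruction="Place the red cube to the left of the blue cube",
    assets=ASSETS,
    sample_initial_state=sample,
    success=red_left_of_blue,
)
```

Keeping the sampler and the predicate as plain functions of a seed and a state is one way to get the reproducibility the text describes: `spec.reset(0)` returns the same layout on every call, and `spec.success` can be evaluated on any recorded state.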

Generated Tasks
Rainbow Cube Line: compositional · bimanual
Block Stacking: hierarchical · bimanual
Purple Circle + Teal: spatial · hierarchical

RainbowCubeLineArrangementTask

by Carter Ung · Feb 6, 2026

Arrange seven distinct colored cubes (red, orange, yellow, green, blue, indigo, violet) into a single straight line on the table in rainbow order. All cubes should be aligned and evenly spaced along the line.

Additional: Place a white cube on top of every cube whose color starts with a vowel (orange, indigo).
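
A description like this has to be turned into a checkable success predicate. The sketch below shows one way that could look, assuming each cube is reduced to an (x, y) position and that the set of cubes carrying a white cube is known. The function names and both tolerances are hypothetical, and the page's actual generated code is not shown here. The vowel rule is read strictly: exactly the orange and indigo cubes must be topped.

```python
from math import hypot
from typing import Dict, Set, Tuple

RAINBOW = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
Point = Tuple[float, float]


def needs_white_cube(color: str) -> bool:
    """True for color names that start with a vowel (orange, indigo)."""
    return color[0] in "aeiou"


def rainbow_line_ok(
    xy: Dict[str, Point],
    topped: Set[str],
    line_tol: float = 0.01,  # assumed max offset from the line
    gap_tol: float = 0.01,   # assumed max deviation from the mean gap
) -> bool:
    pts = [xy[c] for c in RAINBOW]
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    length = hypot(x1 - x0, y1 - y0)
    if length == 0.0:
        return False  # red and violet coincide, so no line is defined
    # Unit vector from the red cube to the violet cube.
    ux, uy = (x1 - x0) / length, (y1 - y0) / length
    # Position along the line and perpendicular offset from it.
    along = [(x - x0) * ux + (y - y0) * uy for x, y in pts]
    across = [abs((x - x0) * uy - (y - y0) * ux) for x, y in pts]
    straight = max(across) <= line_tol
    gaps = [b - a for a, b in zip(along, along[1:])]
    ordered = all(g > 0 for g in gaps)  # rainbow order from red to violet
    mean_gap = length / (len(pts) - 1)
    even = all(abs(g - mean_gap) <= gap_tol for g in gaps)
    vowels_topped = topped == {c for c in RAINBOW if needs_white_cube(c)}
    return straight and ordered and even and vowels_topped
```

Because the line direction is taken from the red cube to the violet cube, the check accepts the row in any orientation on the table as long as the colors appear in rainbow order along it.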

Table · ColoredCube · both · compositional · bimanual

Preview

Initial State
Goal State

Results

We evaluate RoboPlayground along three axes: usability, diagnostic value for policy generalization, and scalability of task creation.

Try It Out!

RoboPlayground lets anyone author executable manipulation tasks using natural language. No simulation expertise required — just describe what you want, and the system compiles it into a reproducible, validated task specification. Everything is open-source.

@misc{wang2026roboplaygrounddemocratizingroboticevaluation,
      title={RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains},
      author={Yi Ru Wang and Carter Ung and Evan Gubarev and Christopher Tan and Siddhartha Srinivasa and Dieter Fox},
      year={2026},
      eprint={2604.05226},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.05226},
}

For questions, please contact Carter Ung or Yi Ru Wang.