Stop hand-tuning RL environments

Describe your task. Sofy generates the environment, trains your agent, and detects reward hacking, convergence failures, and action collapse, then fixes them automatically.

Generate. Train. Diagnose. Improve.

The full loop — from text prompt to improved agent — in 4 steps.

Describe Your Problem

Plain language in, Gymnasium environment out. Locomotion, manipulation, inventory, trading — any task where an agent needs to learn from interaction.
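To make "Gymnasium environment out" concrete, here is a minimal hand-written sketch of an inventory task. It is not Sofy output. The class name, costs, capacity, and demand model are all assumptions chosen for illustration. To stay dependency-free it only mirrors the Gymnasium `reset()` and `step()` return conventions; a real environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random


class InventoryEnv:
    """Toy inventory task (hypothetical example, not generated by Sofy).

    Mirrors the Gymnasium conventions: reset() returns (observation, info)
    and step() returns (observation, reward, terminated, truncated, info).
    """

    MAX_STOCK = 20   # warehouse capacity (assumed)
    MAX_ORDER = 4    # actions are order sizes 0..MAX_ORDER (assumed)
    HORIZON = 30     # episode length in days (assumed)

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.stock = 0
        self.day = 0

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self._rng = random.Random(seed)
        self.stock = self.MAX_STOCK // 2
        self.day = 0
        return self.stock, {}

    def step(self, action):
        if not 0 <= action <= self.MAX_ORDER:
            raise ValueError(f"invalid action: {action}")
        # Receive the order, capped at warehouse capacity.
        self.stock = min(self.stock + action, self.MAX_STOCK)
        # Random daily demand; sell what is available.
        demand = self._rng.randint(0, 4)
        sold = min(demand, self.stock)
        self.stock -= sold
        # Revenue per unit sold, minus ordering and holding costs.
        reward = 2.0 * sold - 1.0 * action - 0.1 * self.stock
        self.day += 1
        terminated = False                     # this task has no terminal state
        truncated = self.day >= self.HORIZON   # time limit reached
        return self.stock, reward, terminated, truncated, {"demand": demand}


def run_episode(env, policy):
    """Roll out one episode and return the total reward."""
    obs, _ = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, _ = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

Calling `run_episode(InventoryEnv(seed=0), lambda stock: 2)` evaluates a fixed policy that always orders two units, which gives a baseline to compare a trained agent against.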

Watch It Train

Training runs locally on your machine. Real-time metrics stream to the dashboard as your agent learns.

Reward Plateau (high): Stuck at local optimum after 8k steps

Sparse Signal (medium): Only 3% of episodes get positive reward

Action Collapse (low): Agent uses only 2 of 5 available actions

See What's Wrong

After training, Sofy analyzes metrics to find reward plateaus, sparse signals, action collapse, and other failure patterns.
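The page does not describe how Sofy's detectors work internally, so the following is only a rough sketch of how the three named patterns could be measured from logged training data. The function names, window size, and thresholds are assumptions, not Sofy's API.

```python
from collections import Counter


def positive_episode_rate(episode_returns):
    """Sparse signal: fraction of episodes whose total reward is positive."""
    if not episode_returns:
        return 0.0
    return sum(r > 0 for r in episode_returns) / len(episode_returns)


def used_actions(actions, n_actions, min_share=0.01):
    """Action collapse: discrete actions taken in at least `min_share` of steps.

    The 1% cutoff is an assumed threshold.
    """
    counts = Counter(actions)
    total = len(actions)
    return [a for a in range(n_actions) if total and counts[a] / total >= min_share]


def has_plateaued(mean_rewards, window=5, tolerance=0.01):
    """Reward plateau: True if the mean of the last `window` evaluations
    improved by less than `tolerance` (relative) over the `window`
    evaluations before them. Window and tolerance are assumed values.
    """
    if len(mean_rewards) < 2 * window:
        return False
    previous = sum(mean_rewards[-2 * window:-window]) / window
    recent = sum(mean_rewards[-window:]) / window
    return recent - previous < tolerance * max(abs(previous), 1e-8)
```

With these definitions, a log in which 3 of 100 episodes earn positive reward gives a rate of 0.03, and an agent that only ever picks actions 0 and 1 out of 5 yields `[0, 1]`. A flat reward curve alone cannot distinguish a local optimum from a solved task, so a plateau check like this would need to be paired with a target or baseline.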

Auto-Improve

The system generates environment variants targeting each detected failure. Train again, compare results, iterate automatically.
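The loop described above might be sketched as follows. This is a hypothetical outline, not Sofy's implementation: `train`, `diagnose`, and `fixes` are caller-supplied stand-ins, and the environment is represented as a plain config dictionary for illustration.

```python
def auto_improve(config, train, diagnose, fixes, max_rounds=3):
    """Train, diagnose, and patch an environment config until it is healthy.

    train(config)     -> metrics dict containing a "reward" entry
    diagnose(metrics) -> list of failure names (empty list means healthy)
    fixes             -> dict mapping a failure name to a function that
                         returns a modified copy of the config

    Returns the history as a list of (config, metrics, failures) tuples,
    one per accepted version.
    """
    metrics = train(config)
    history = [(config, metrics, diagnose(metrics))]
    for _ in range(max_rounds):
        current, _, failures = history[-1]
        if not failures:
            break  # healthy: nothing left to fix
        # One variant per detected failure that has a known fix.
        variants = [fixes[name](current) for name in failures if name in fixes]
        if not variants:
            break  # no fix available for the remaining failures
        # Retrain every variant and keep the one with the highest reward.
        results = [(variant, train(variant)) for variant in variants]
        config, metrics = max(results, key=lambda r: r[1]["reward"])
        history.append((config, metrics, diagnose(metrics)))
    return history
```

Selecting variants by reward alone is the simplest choice, but it can favor a variant that is easier to exploit, so a fuller version would also compare the diagnosed failures of each variant before accepting it.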

For Developers

Three lines to generate. Three more to train and diagnose.

main.py
from sofy import Sofy

sofy = Sofy()
env = sofy.generate("Quadruped robot — walk forward, minimize energy")
result = sofy.run(env, algorithm="PPO", total_timesteps=500_000)

# v1: reward=-12.3 | action_collapse detected
# v2: reward=45.7  | instability detected
# v3: reward=89.2  | healthy
print(result.best_environment)

Python SDK · Runs locally · SB3 + Gymnasium

The Core Loop

Anyone can generate. Knowing what to change is the hard part.

Where We're Going

The diagnosis engine is the constant. The simulation layer grows.

Now

RL Environment SDK

Text-to-environment generation with automated failure diagnosis and iterative improvement. 4 failure detectors, component-based shaping, parallel variant training.

Next

Physics-Backed Environments

MuJoCo and PyBullet integration for robotics, manipulation, and locomotion tasks. Contact dynamics, material properties, and sensor models.

Vision

Simulation Infrastructure

Multi-physics orchestration, 3D asset generation, and adaptive training curricula. Full synthetic training infrastructure for embodied AI.

Early Access

Be one of the first teams to use Sofy. Free during early access.

Early Access: Free

$0 while in early access

Full platform access for teams shaping the future of RL training.


  • Unlimited environment generations
  • Full auto-improvement loop
  • All 4 failure detectors
  • Parallel variant training
  • Gymnasium-compatible output
  • Direct access to the founding team

Frequently Asked Questions

Common questions about Sofy.

See how Sofy can solve your problem

Describe a problem. Get a trained agent.

Book a Demo