Stop hand-tuning RL environments
Describe your task. Sofy generates the environment and trains your agent. It then detects reward hacking, convergence failures, and action collapse, and fixes them automatically.
Generate. Train. Diagnose. Improve.
The full loop — from text prompt to improved agent — in 4 steps.
Describe Your Problem
Plain language in, Gymnasium environment out. Locomotion, manipulation, inventory, trading — any task where an agent needs to learn from interaction.
Watch It Train
Training runs locally on your machine. Real-time metrics stream to the dashboard as your agent learns.
Stuck at local optimum after 8k steps
Only 3% of episodes get positive reward
Agent uses only 2 of 5 available actions
See What's Wrong
After training, Sofy analyzes the training metrics to find reward plateaus, sparse reward signals, action collapse, and other failure patterns.
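The three example findings shown earlier (a plateau, a sparse reward signal, action collapse) can each be approximated with a simple heuristic over logged episode returns and actions. This is a minimal sketch under stated assumptions: the `Finding` type, function names, and thresholds are hypothetical illustrations, not Sofy's detectors, and the plateau check works per episode rather than per environment step.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Finding:
    detector: str
    message: str


def detect_plateau(returns, window=10, tol=0.01):
    """Hypothetical check: flag a plateau when the mean of the last `window`
    returns improves on the previous `window` by less than `tol` (relative),
    which also flags a decline."""
    if len(returns) < 2 * window:
        return None
    prev = sum(returns[-2 * window:-window]) / window
    last = sum(returns[-window:]) / window
    scale = max(abs(prev), 1e-8)  # avoid dividing by zero
    if (last - prev) / scale < tol:
        return Finding("plateau", f"mean return moved {prev:.2f} -> {last:.2f} over {window} episodes")
    return None


def detect_sparse_reward(returns, min_fraction=0.05):
    """Hypothetical check: too few episodes end with a positive return."""
    fraction = sum(1 for r in returns if r > 0) / len(returns) if returns else 0.0
    if fraction < min_fraction:
        return Finding("sparse_reward", f"only {fraction:.0%} of episodes got positive reward")
    return None


def detect_action_collapse(actions, n_actions, min_used_fraction=0.5):
    """Hypothetical check: the policy uses only a small subset of discrete actions."""
    used = len(Counter(actions))
    if used / n_actions < min_used_fraction:
        return Finding("action_collapse", f"agent uses only {used} of {n_actions} available actions")
    return None


def diagnose(returns, actions, n_actions):
    """Run every check and keep the ones that fired."""
    checks = [
        detect_plateau(returns),
        detect_sparse_reward(returns),
        detect_action_collapse(actions, n_actions),
    ]
    return [f for f in checks if f is not None]


findings = diagnose([0.0] * 97 + [1.0] * 3, [0, 1] * 50, n_actions=5)
for f in findings:
    print(f"{f.detector}: {f.message}")
```

Fixed thresholds keep the sketch readable; a production detector would more likely compare against baselines or use statistical tests.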
Auto-Improve
Sofy generates an environment variant targeting each detected failure, retrains on the variants, compares the results, and iterates automatically.
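The generate-variants, retrain, compare, iterate loop can be sketched as a greedy search. This sketch assumes a hypothetical `train_fn(cfg)` that returns `(score, findings)` and a hypothetical `FIXES` table mapping each detector name to an environment-config tweak; the config keys (`progress_bonus`, `repeat_action_penalty`, `start_state_noise`) are illustrative, not Sofy's actual variant strategies.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping from a detector name to an environment-config tweak.
# Each fix returns a new config dict and leaves its input unchanged.
FIXES = {
    "sparse_reward": lambda cfg: {**cfg, "progress_bonus": cfg.get("progress_bonus", 0.0) + 0.1},
    "action_collapse": lambda cfg: {**cfg, "repeat_action_penalty": cfg.get("repeat_action_penalty", 0.0) + 0.05},
    "plateau": lambda cfg: {**cfg, "start_state_noise": cfg.get("start_state_noise", 0.0) + 0.1},
}


def improve(config, train_fn, max_iters=3):
    """Greedy improvement loop.

    `train_fn(cfg)` must return `(score, findings)`, where `findings` is a
    list of detector names such as "sparse_reward".
    """
    best_cfg = config
    best_score, findings = train_fn(config)
    for _ in range(max_iters):
        # One variant per detected failure that has a known fix.
        variants = [FIXES[name](best_cfg) for name in findings if name in FIXES]
        if not variants:
            break  # healthy, or nothing we know how to fix
        # Train all variants concurrently and collect their (score, findings).
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(train_fn, variants))
        i = max(range(len(variants)), key=lambda k: results[k][0])
        score, new_findings = results[i]
        if score <= best_score:
            break  # no variant beat the current best
        best_cfg, best_score, findings = variants[i], score, new_findings
    return best_cfg, best_score
```

Threads keep the sketch simple; for CPU-bound training, a `ProcessPoolExecutor` with a module-level `train_fn` would sidestep the GIL while leaving the loop unchanged.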
For Developers
Three lines to generate. Three more to train and diagnose.
from sofy import Sofy
sofy = Sofy()
env = sofy.generate("Quadruped robot — walk forward, minimize energy")
result = sofy.run(env, algorithm="PPO", total_timesteps=500_000)
# Example progression across environment versions:
# v1: reward=-12.3 | action_collapse detected
# v2: reward=45.7 | instability detected
# v3: reward=89.2 | healthy
print(result.best_environment)
The Core Loop
Anyone can generate. Knowing what to change is the hard part.
Generate
Describe any RL task: locomotion, manipulation, inventory, trading, or scheduling. Sofy generates a complete Gymnasium environment, including its observation space, action space, and reward function.
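For readers new to Gymnasium, the sketch below shows the shape of such an environment for a toy inventory task. It mirrors the Gymnasium conventions (`reset(seed=None, options=None)` returns `(obs, info)`; `step(action)` returns `(obs, reward, terminated, truncated, info)`) but deliberately does not import the library so it runs standalone. The class name, dynamics, and reward coefficients are hypothetical; a real generated environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space` with `gymnasium.spaces` (for example `Discrete`).

```python
import random


class InventoryEnv:
    """Hypothetical toy inventory task with a Gymnasium-style interface.

    Observation: current stock level, an int in [0, max_stock].
    Action: units to order this step, an int in [0, max_order].
    Reward: revenue from units sold minus holding and ordering costs.
    """

    def __init__(self, max_stock=20, max_order=5, max_steps=50):
        self.max_stock = max_stock
        self.max_order = max_order
        self.max_steps = max_steps
        self.n_actions = max_order + 1  # discrete actions 0..max_order
        self._rng = random.Random()

    def reset(self, seed=None, options=None):
        if seed is not None:
            self._rng.seed(seed)
        self.stock = self.max_stock // 2
        self.t = 0
        return self.stock, {}

    def step(self, action):
        if not 0 <= action <= self.max_order:
            raise ValueError(f"action must be in [0, {self.max_order}]")
        self.stock = min(self.stock + action, self.max_stock)  # receive order
        demand = self._rng.randint(0, 4)  # random customer demand
        sold = min(demand, self.stock)
        self.stock -= sold
        reward = 2.0 * sold - 0.1 * self.stock - 0.5 * action
        self.t += 1
        terminated = False  # no terminal state in this toy task
        truncated = self.t >= self.max_steps  # episode time limit
        return self.stock, reward, terminated, truncated, {"demand": demand}


env = InventoryEnv()
obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(2)
    done = terminated or truncated
```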
Where We're Going
The diagnosis engine is the constant. The simulation layer grows.
RL Environment SDK
Text-to-environment generation with automated failure diagnosis and iterative improvement. Includes 4 failure detectors, component-based reward shaping, and parallel variant training.
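One plausible reading of component-based shaping, assumed here since the page does not define it, is a reward built from named, separately weighted terms, so a diagnosis step can inspect and re-weight one term without rewriting the rest. The `ComponentReward` class and the component names below are hypothetical, using the quadruped task from the developer example ("walk forward, minimize energy").

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ComponentReward:
    """Hypothetical reward = sum of weighted, named components."""
    components: Dict[str, Callable[[dict], float]]
    weights: Dict[str, float] = field(default_factory=dict)  # missing weight -> 1.0

    def __call__(self, state):
        # Keep per-term values so each component can be logged and diagnosed.
        terms = {name: self.weights.get(name, 1.0) * fn(state)
                 for name, fn in self.components.items()}
        return sum(terms.values()), terms

    def reweighted(self, name, factor):
        """Return a copy with one component's weight scaled by `factor`."""
        new_weights = dict(self.weights)
        new_weights[name] = new_weights.get(name, 1.0) * factor
        return ComponentReward(self.components, new_weights)


reward = ComponentReward(
    components={
        "forward_progress": lambda s: s["forward_velocity"],
        "energy_cost": lambda s: -s["energy"],
    },
    weights={"energy_cost": 0.5},
)
total, terms = reward({"forward_velocity": 1.0, "energy": 2.0})
```

Here `total` is `0.0` because the energy term (`0.5 * -2.0`) cancels the forward-progress term; halving the energy weight with `reward.reweighted("energy_cost", 0.5)` raises it to `0.5` for the same state.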
Physics-Backed Environments
MuJoCo and PyBullet integration for robotics, manipulation, and locomotion tasks. Contact dynamics, material properties, and sensor models.
Simulation Infrastructure
Multi-physics orchestration, 3D asset generation, and adaptive training curricula. Full synthetic training infrastructure for embodied AI.
Early Access
Be one of the first teams to use Sofy. Free during early access.
Early Access
Free
Full platform access for teams shaping the future of RL training.
- Unlimited environment generations
- Full auto-improvement loop
- All 4 failure detectors
- Parallel variant training
- Gymnasium-compatible output
- Direct access to the founding team
Frequently Asked Questions
Common questions about Sofy.