About Sofy
From plain text to trained RL agent. We generate Gymnasium environments, diagnose training failures, and auto-improve — so you can focus on the problem, not the simulation.
What We Do
Sofy generates Gymnasium-compatible RL environments from natural language. Describe any decision problem — inventory management, stock trading, resource allocation, scheduling — and the system produces a complete environment with observation spaces, action spaces, and reward functions.
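To make the shape of a generated environment concrete, here is a minimal sketch of the inventory-management case. Everything in it is illustrative — the class name, stock limits, and reward weights are invented for this example, and a real generated environment would subclass `gymnasium.Env` and declare explicit `observation_space` and `action_space` objects rather than plain constants:

```python
import random

class InventoryEnv:
    """Illustrative inventory-management environment.

    Mirrors the Gymnasium reset()/step() contract; constants and
    reward weights are placeholder values, not generated output.
    """

    MAX_STOCK = 100          # observation: current stock level, 0..MAX_STOCK
    MAX_REORDER = 10         # action: units to reorder this step, 0..MAX_REORDER

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.stock = 0

    def reset(self):
        self.stock = self.MAX_STOCK // 2
        return self.stock, {}  # (observation, info)

    def step(self, action):
        # Restock, then serve random demand.
        self.stock = min(self.stock + action, self.MAX_STOCK)
        demand = self.rng.randint(0, self.MAX_REORDER)
        sold = min(demand, self.stock)
        self.stock -= sold
        # Reward: revenue per unit sold minus a holding cost on leftovers.
        reward = 1.0 * sold - 0.1 * self.stock
        terminated, truncated = False, False
        return self.stock, reward, terminated, truncated, {}
```

The point is the contract, not the logic: once an environment exposes `reset()` and `step()` in this shape, any standard RL agent can train against it.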
But generation is just the start. Sofy watches your agent train, identifies failure patterns (reward plateaus, sparse signals, action collapse), and generates targeted environment variants to close the gap. The environment evolves with the agent. That adaptive loop is the product.
How It Works
Building RL environments by hand is slow. Engineering teams spend weeks crafting reward functions, tuning observation spaces, and debugging why agents fail to learn. When training stalls, figuring out whether the problem is the environment or the algorithm takes even longer.
Sofy is the orchestration layer. It coordinates language models and domain analysis to produce complete environments. Training runs locally on your machine via the Python SDK, and works with any RL framework — Stable Baselines3, RLlib, CleanRL, or your custom training loop.
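Because generated environments follow the standard Gymnasium `reset()`/`step()` contract, any training loop can drive them. The sketch below shows the smallest possible version of that idea — a hand-rolled rollout loop with a toy stand-in environment and a fixed policy, all hypothetical names, purely to illustrate where a generated environment plugs in:

```python
def rollout(env, policy, n_steps=100):
    """Drive any Gymnasium-style env for n_steps and sum rewards.
    The same env object would plug into Stable Baselines3, RLlib,
    or CleanRL in place of this hand-rolled loop."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            obs, info = env.reset()
    return total_reward

class GuessEnv:
    """Toy stand-in environment: reward 1.0 for guessing the target."""
    def reset(self):
        self.target = 7
        return 0, {}
    def step(self, action):
        hit = action == self.target
        return action, 1.0 if hit else 0.0, hit, False, {}
```

Swapping `GuessEnv` for a generated environment changes nothing about the loop — that interface compatibility is what "works with any RL framework" means in practice.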
After training, the system analyzes your metrics, diagnoses failure patterns in plain English, and generates environment variants targeting each issue. Train again, compare results, iterate. The cycle repeats until your agent hits its performance target.
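To give a feel for what this kind of diagnosis looks like, here is a toy version of two of the detectors named above — reward plateaus and sparse signals. The heuristics, thresholds, and function name are all invented for illustration; they are not Sofy's actual detectors:

```python
def diagnose(episode_rewards, window=10, plateau_eps=0.01, sparse_frac=0.9):
    """Toy failure-pattern detectors over a list of episode rewards.
    Illustrative heuristics only, with placeholder thresholds."""
    issues = []
    # Plateau: the last two windows of episodes have nearly equal means.
    if len(episode_rewards) >= 2 * window:
        early = sum(episode_rewards[-2 * window:-window]) / window
        late = sum(episode_rewards[-window:]) / window
        if abs(late - early) <= plateau_eps * max(abs(early), 1.0):
            issues.append("reward plateau: recent mean reward barely moved")
    # Sparse signal: almost every episode returned zero reward.
    zero_frac = sum(r == 0.0 for r in episode_rewards) / max(len(episode_rewards), 1)
    if zero_frac >= sparse_frac:
        issues.append("sparse signal: most episodes returned zero reward")
    return issues
```

A plateau diagnosis might drive a variant with denser shaping rewards; a sparse-signal diagnosis might drive one with intermediate milestones — each issue maps to a targeted change in the next environment version.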
Where We're Going
Today, Sofy handles decision problems with discrete and continuous action spaces. Next, we are bringing physics-backed environments into the loop — MuJoCo and PyBullet integration for robotics, manipulation, and locomotion tasks.
The long-term vision is full simulation infrastructure: multi-physics orchestration, 3D asset generation, and adaptive training curricula for embodied AI. The diagnosis engine — the core of what makes Sofy different — works across all environment types.
Why We Built This
Anyone can generate an environment. Knowing what to change when training fails is the hard part. Teams waste weeks debugging reward functions and observation spaces by trial and error.
Sofy automates that diagnosis. We are not building a physics engine or a training framework. We are building the layer that turns a failed training run into a better environment — automatically.