About Sofy
From plain text to trained RL agent. We generate Gymnasium environments, diagnose training failures, and auto-improve — so you can focus on the problem, not the simulation.
What We Do
Sofy generates Gymnasium-compatible RL environments from natural language. Describe any decision problem — inventory management, stock trading, resource allocation, scheduling — and the system produces a complete environment with observation spaces, action spaces, and reward functions.
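To make the shape of a generated environment concrete, here is a minimal sketch of the inventory-management case. Everything in it is illustrative — the class name, stock limits, and reward weights are invented for this example, and a real generated environment would subclass `gymnasium.Env` and declare explicit `observation_space` and `action_space` objects rather than plain constants:

```python
import random

class InventoryEnv:
    """Illustrative inventory-management environment.

    Mirrors the Gymnasium reset()/step() contract; constants and
    reward weights are placeholder values, not generated output.
    """

    MAX_STOCK = 100          # observation: current stock level, 0..MAX_STOCK
    MAX_REORDER = 10         # action: units to reorder this step, 0..MAX_REORDER

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.stock = 0

    def reset(self):
        self.stock = self.MAX_STOCK // 2
        return self.stock, {}  # (observation, info)

    def step(self, action):
        # Restock, then serve random demand.
        self.stock = min(self.stock + action, self.MAX_STOCK)
        demand = self.rng.randint(0, self.MAX_REORDER)
        sold = min(demand, self.stock)
        self.stock -= sold
        # Reward: revenue per unit sold minus a holding cost on leftovers.
        reward = 1.0 * sold - 0.1 * self.stock
        terminated, truncated = False, False
        return self.stock, reward, terminated, truncated, {}
```

The point is the contract, not the logic: once an environment exposes `reset()` and `step()` in this shape, any standard RL agent can train against it.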
But generation is just the start. Sofy watches your agent train, identifies failure patterns (reward plateaus, sparse signals, action collapse), and generates targeted environment variants to close the gap. The environment evolves with the agent. That adaptive loop is the product.
How It Works
Building RL environments by hand is slow. Engineering teams spend weeks crafting reward functions, tuning observation spaces, and debugging why agents fail to learn. When training stalls, figuring out whether the problem is the environment or the algorithm takes even longer.
Sofy is the orchestration layer. It coordinates language models and domain analysis to produce complete environments. Training runs locally on your machine via the Python SDK, and works with any RL framework — Stable Baselines3, RLlib, CleanRL, or your custom training loop.
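Because generated environments follow the standard Gymnasium `reset()`/`step()` contract, any training loop can drive them. The sketch below shows the smallest possible version of that idea — a hand-rolled rollout loop with a toy stand-in environment and a fixed policy, all hypothetical names, purely to illustrate where a generated environment plugs in:

```python
def rollout(env, policy, n_steps=100):
    """Drive any Gymnasium-style env for n_steps and sum rewards.
    The same env object would plug into Stable Baselines3, RLlib,
    or CleanRL in place of this hand-rolled loop."""
    obs, info = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(obs)
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        if terminated or truncated:
            obs, info = env.reset()
    return total_reward

class GuessEnv:
    """Toy stand-in environment: reward 1.0 for guessing the target."""
    def reset(self):
        self.target = 7
        return 0, {}
    def step(self, action):
        hit = action == self.target
        return action, 1.0 if hit else 0.0, hit, False, {}
```

Swapping `GuessEnv` for a generated environment changes nothing about the loop — that interface compatibility is what "works with any RL framework" means in practice.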
After training, the system analyzes your metrics, diagnoses failure patterns in plain English, and generates environment variants targeting each issue. Train again, compare results, iterate. The cycle repeats until your agent hits its performance target.
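To give a feel for what this kind of diagnosis looks like, here is a toy version of two of the detectors named above — reward plateaus and sparse signals. The heuristics, thresholds, and function name are all invented for illustration; they are not Sofy's actual detectors:

```python
def diagnose(episode_rewards, window=10, plateau_eps=0.01, sparse_frac=0.9):
    """Toy failure-pattern detectors over a list of episode rewards.
    Illustrative heuristics only, with placeholder thresholds."""
    issues = []
    # Plateau: the last two windows of episodes have nearly equal means.
    if len(episode_rewards) >= 2 * window:
        early = sum(episode_rewards[-2 * window:-window]) / window
        late = sum(episode_rewards[-window:]) / window
        if abs(late - early) <= plateau_eps * max(abs(early), 1.0):
            issues.append("reward plateau: recent mean reward barely moved")
    # Sparse signal: almost every episode returned zero reward.
    zero_frac = sum(r == 0.0 for r in episode_rewards) / max(len(episode_rewards), 1)
    if zero_frac >= sparse_frac:
        issues.append("sparse signal: most episodes returned zero reward")
    return issues
```

A plateau diagnosis might drive a variant with denser shaping rewards; a sparse-signal diagnosis might drive one with intermediate milestones — each issue maps to a targeted change in the next environment version.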
Where We're Going
Today, Sofy handles decision problems with discrete and continuous action spaces. Next, we are bringing physics-backed environments into the loop — MuJoCo and PyBullet integration for robotics, manipulation, and locomotion tasks.
The long-term vision is full simulation infrastructure: multi-physics orchestration, 3D asset generation, and adaptive training curricula for embodied AI. The diagnosis engine — the core of what makes Sofy different — works across all environment types.
Why We Built This
Anyone can generate an environment. Knowing what to change when training fails is the hard part. Teams waste weeks debugging reward functions and observation spaces by trial and error.
Sofy automates that diagnosis. We are not building a physics engine or a training framework. We are building the layer that turns a failed training run into a better environment — automatically.