Question 1

How is this different from asking ChatGPT to write a Gymnasium environment?

Accepted Answer

ChatGPT can generate environment code. So can Sofy. The difference is what happens after. Sofy trains your agent on the generated environment, detects when training fails (reward hacking, action collapse, convergence failure), and automatically generates improved environments targeting each failure. Generation is only the first of four steps: generate, train, diagnose, improve.

Question 2

What kind of environments can Sofy generate?

Accepted Answer

Sofy generates Gymnasium-compatible environments for any RL task you can describe in text — locomotion, manipulation, inventory management, trading, scheduling, resource allocation, and more. You describe the task in plain language, and the system produces a complete environment with observation spaces, action spaces, and reward functions.

Question 3

How does the auto-improvement loop work?

Accepted Answer

After your agent trains, Sofy analyzes the training metrics with four statistical detectors: reward hacking, convergence failure, action collapse, and instability. For each issue it detects, it generates environment variants that target that issue, for example with reshaped rewards, adjusted observation spaces, or modified terminal conditions. The best-performing variant becomes the new baseline, and the loop repeats until the diagnosis comes back healthy.
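Sofy's detectors are not specified here, so the following is only a rough sketch of how two of the named failure modes could be flagged from training logs. The function names, the entropy threshold, and the "first quarter versus last quarter" comparison are assumptions made for illustration, not Sofy's implementation.

```python
import math
from collections import Counter
from statistics import mean


def action_entropy_ratio(actions, n_actions):
    """Shannon entropy of the action histogram, normalized to [0, 1].

    Assumes a discrete action space with n_actions >= 2.
    """
    counts = Counter(actions)
    total = len(actions)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_actions)


def detect_action_collapse(actions, n_actions, threshold=0.2):
    """Flag a policy that uses almost none of its action space."""
    return action_entropy_ratio(actions, n_actions) < threshold


def detect_convergence_failure(episode_returns, min_gain=0.05):
    """Flag a run whose late returns barely improve on its early returns."""
    quarter = max(1, len(episode_returns) // 4)
    early = mean(episode_returns[:quarter])
    late = mean(episode_returns[-quarter:])
    scale = max(abs(early), 1e-8)  # avoid dividing by zero
    return (late - early) / scale < min_gain
```

A detector of this kind only reports that something is wrong. Choosing which environment change to try next (a reshaped reward, a different observation space) is a separate step that this sketch does not cover.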

Question 4

What training frameworks are supported?

Accepted Answer

Sofy outputs standard Gymnasium environments that work with any RL framework: Stable Baselines3, RLlib, CleanRL, TorchRL, or your custom training loop. Install with pip, import, and train. No vendor lock-in.

Question 5

Does training run on your servers?

Accepted Answer

No. Training runs entirely on your machine via the SDK. The server handles environment generation (LLM orchestration) and failure analysis. Your model weights, training data, and infrastructure stay with you.

Question 6

What about physics simulation and robotics?

Accepted Answer

MuJoCo integration for robotics and locomotion tasks is in active development. The current SDK handles any Gymnasium-compatible environment, and the diagnosis engine — the core of what makes Sofy different — works across all environment types.

Stop hand-tuning RL environments

Generate. Train. Diagnose. Improve.

Describe Your Problem

Watch It Train

See What's Wrong

Auto-Improve

For Developers

The Core Loop

Generate

Diagnose

Regenerate

Where We're Going

RL Environment SDK

Physics-Backed Environments

Simulation Infrastructure

Early Access

Frequently Asked Questions

Describe a problem. Get a trained agent.