.@willccbb (Research Lead, Prime Intellect) on how RL environments really work: “An environment is essentially an eval. You’ve got input tasks, a harness, and at the end it scores how your model or agent performs. That’s the setup we use for both evals and RL training.” He adds that the future isn’t just about “getting 100,000 GPUs in one giant cluster.”
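To make the quote concrete, here is a minimal sketch of the three pieces it names: input tasks, a harness, and a score at the end. Every name here (`Task`, `exact_match`, `run_environment`, `toy_policy`) is hypothetical and chosen for illustration; this is not Prime Intellect's API, and a real harness would call a model or agent rather than a stub.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    """One input task: a prompt plus the reference answer used for scoring."""
    prompt: str
    answer: str


def exact_match(completion: str, task: Task) -> float:
    """Hypothetical scoring rule: 1.0 for an exact match, else 0.0."""
    return 1.0 if completion.strip() == task.answer else 0.0


def run_environment(
    tasks: List[Task],
    policy: Callable[[str], str],
    score: Callable[[str, Task], float] = exact_match,
) -> List[float]:
    """The harness: run the policy on every task and score each output."""
    return [score(policy(task.prompt), task) for task in tasks]


def toy_policy(prompt: str) -> str:
    """Stand-in for a model or agent (assumption): always answers '4'."""
    return "4"


tasks = [Task("2+2=", "4"), Task("3*3=", "9")]
rewards = run_environment(tasks, toy_policy)

# Eval: report the aggregate score.
mean_score = sum(rewards) / len(rewards)
# RL training: the same per-task scores would serve as rewards for a policy update.
```

In this sketch the same `run_environment` call serves both uses: averaging the scores gives an eval number, while feeding the per-task scores to an optimizer as rewards would make it a training loop.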