Software Engineer, RL Training Infra
Compensation
$295k–$445k base / year (USD)
About the Team
The Post-Training Frontiers team creates the frontier agents OpenAI ships to the world. We run the reinforcement learning training for the agentic models behind Codex, ChatGPT, and the API (from o1 to 5.5).
Our role consists of (1) shepherding all integrations that should go into the final RL run and deciding which ones make it in, (2) babysitting and scaling the final run, and (3) building the research and infra for horizontal integrations, such as improving function calling, factuality, multi-agent capabilities, memory, and calibrated thinking.
About the Role
This role focuses on keeping our frontier RL training runs fast, reliable, and unblocked. You will work across engineering and infrastructure problems as they emerge, from scaling and orchestration issues to inference bottlenecks, numerical problems, and hardware failures, as well as supporting large horizontal integrations in the big run, like multi-agent capabilities or memory. This is a role for a strong generalist who quickly learns anything needed for the task, has high attention to detail, debugs deeply, and is motivated by fixing the highest-impact problem in front of the team.
In this role, you will:
- Keep large-scale RL training runs moving by jumping into the most urgent engineering and infrastructure problems.
- Debug issues across training systems, inference, orchestration, scaling, and distributed infrastructure.
- Solve hard technical problems at the boundary between research and engineering: scaling experiments, improving training reliability, debugging distributed systems, reducing latency and cost, and making new capabilities robust under real workloads.
- Improve reliability and efficiency for RL training runs.
- Help researchers who are developing infra-heavy integrations, such as multi-agent capabilities or memory.
- Turn recurring operational issues into better tools, systems, processes, or abstractions.
- Work closely with research, infrastructure, and partn