Mirrai Careers

Senior AI Researcher - Reinforcement Learning (f/m/d)

Aleph Alpha

Heidelberg | Remote | Full-time | Posted 3 months ago
OUR MISSION
Aleph Alpha is one of the few companies in Europe with end-to-end in-house model development, including pre- and post-training. We're building models that have general-purpose capabilities but also excel at addressing the specific needs of our customers. We're growing our post-training team in Heidelberg (or hybrid in Germany) and are looking for an AI Researcher who combines a deep theoretical understanding of reinforcement learning methods with the drive to advance the state of the art and improve model capabilities in large-scale training.

TEAM CULTURE
At Aleph Alpha, we foster a culture built on ownership, autonomy, and empowerment. Teams and individual contributors are trusted to take responsibility for their work and drive meaningful impact. We maintain a flat organizational structure with efficient, supportive management that enables quick decision-making, open communication, and a strong sense of shared purpose.

ABOUT THE ROLE
As a Senior AI Researcher for reinforcement learning, you will shape and improve our underlying RL methodology, maintain a high-quality training codebase, and conduct large-scale experiments to hill-climb our performance benchmarks. This role is for you if you have both a strong theoretical background in RL and the engineering drive to bring these methods into production and refine them as part of the reinforcement learning team. Day to day, you will run large-scale reinforcement learning experiments, derive hypotheses from the results, and iterate on both the implementation and the methodology based on your observations. Together with a collaborative team, you will have a direct impact on the models we ship to our customers.

This role is with Aleph Alpha Research GmbH.

YOUR RESPONSIBILITIES
* Hill-climb in large-scale training: Conduct large-scale LLM training runs, analyze evaluation scores in depth, propose hypotheses for improvement, and implement them directly to maximize performance on our benchmarks.
* Theoretical innovation: Stay at the bleeding edge of RL research. You will identify, implement, and iterate on novel approaches to multi-turn reinforcement learning.
* Scale our training infrastructure: Identify bottlenecks in our training setup and optimize our RL training loops for large-scale training.
* Cross-functional collaboration: Partner with our other post-training teams to turn raw feedback into actionable training signals, ensuring that our RL iterations lead to measurable improvements in downstream performance.

YOUR PROFILE
Basic Qualifications
* A deep understanding of reinforcement learning theory and how it relates to modern RL methods.
* Experience with multi-node LLM training (ideally using RL). You understand how to scale multi-node RL training runs and can reason about and implement distributed algorithms.
* Familiarity with statistical methods for evaluation and experiment design.
* Ability to reason about what an evaluation or environment measures and whether it matters: not just running benchmarks, but understanding them.
* Strong Python skills and comfort with ML tooling (especially torch.distributed).
* Willingness to relocate to Heidelberg or to travel regularly (potentially weekly).

Preferred Qualifications
* PhD in reinforcement learning or equivalent research experience.
* A track record of RL contributions at top-tier venues (NeurIPS, ICML, ICLR, etc.).
* Experience evaluating LLMs and crafting environments for training.

COMPENSATION AND BENEFITS
* Become part of an AI revolution!
* 30 days of paid vacation
* Access to a variety of fitness & wellness offerings via Wellhub
* Mental health support through nilo.health
* Substantially subsidized company pension plan for your future security
* Subsidized Germany-wide transportation ticket
* Budget for additional technical equipment
* Flexible working hours and a hybrid working model for better work-life balance
* Virtual Stock Option Plan
* JobRad® Bike Lease

Similar jobs

  • Senior AI Researcher- Pre-training (f/m/d)

    alephalpha

    Remote
  • Senior AI Researcher - Pre-training Data (m/f/d)

    Aleph Alpha

    Remote
  • Senior AI Software Engineer – Model Training (f/m/d)

    alephalpha

    Remote
  • Senior AI Software Engineer – Model Training (f/m/d)

    Aleph Alpha

    Remote
  • Senior Reinforcement Learning Engineer

    anybotics

    Zurich, Switzerland
  • Researcher, Synthetic RL

    OpenAI

    San Francisco, $295k–$445k

Company

MIRRAI CHAT LTD (Company No. 16403306)

71-75 Shelton Street, Covent Garden

London, WC2H 9JQ, UNITED KINGDOM

© 2026 Mirrai Careers. All rights reserved.