Machine Learning Engineer - Reinforcement Learning

Paris

About the AI Studio

The AI Studio's mission is to find the fastest possible path to an autonomous supply chain. We're developing AI agents, learning systems, training models, and more to overcome the biggest challenges remaining in the global supply chain.

In short, we are having a lot of fun.

Your Mission In This Role

We're looking for an ambitious ML Engineer focused on LLMs, agents, and reinforcement learning to help build the training, evaluation, and tooling systems behind robust AI decision-making products.

You'll work across LLM fine-tuning, agent environments, reward modeling, evaluations, data pipelines, and AI workflow tooling. The role is hands-on: designing experiments, shipping production code, improving model behaviour, and building the infrastructure that lets us learn quickly from both automated and human feedback.

You'll help shape how we use LLMs inside agentic systems, how we evaluate model and agent performance, and how we turn feedback into better training data and better behaviour.

This role requires hands-on experience training LLMs with RL, including designing and iterating on rewards, reviewing LLM traces, identifying reward hacking or shortcut behaviour, and understanding when the reward signal, environment, or training process needs to change.

Responsibilities:

Design and implement LLM-powered agent environments for supply chain decision-making
Fine-tune, adapt, and evaluate LLMs for domain-specific reasoning and decision support
Design, test, and iterate on reward functions that capture the behaviours we want from LLM agents
Review LLM traces and rollouts to understand model reasoning, failure modes, reward hacking, and shortcut behaviour
Identify when an LLM is exploiting the reward function, escaping the intended RL process, or optimizing for proxy metrics instead of the real objective
Improve reward models, environment design, prompts, tools, and feedback loops based on observed model behaviour
Build evaluation frameworks to measure model quality, agent performance, robustness, and failure modes
Create data pipelines for training, fine-tuning, preference data, synthetic data generation, and human feedback collection
Develop tooling that improves how the team builds, tests, debugs, and deploys AI-assisted workflows
Experiment with RL, RLHF, RLAIF, reward shaping, policy optimization, and agent training techniques
Document what works, what fails, and why, so we can compound our learnings over time
Stay close to the frontier of LLMs, agents, evaluations, and applied AI engineering

We want to talk if you:

Have trained or fine-tuned LLMs
Are excited about AI-assisted tools and getting the most out of them
Build & customize your own AI workflows
Have experience working with AI agents and RL environments in production
Are proficient in Python and PyTorch
Can balance research exploration with shipping working code
Have hands-on experience with RL techniques (reward shaping, policy optimization, RLHF)
Thrive in fast-moving environments where priorities shift
Care about craft in your work
Are curious about why things work, not just that they work

Bonus points if:

You have experience with human-in-the-loop ML systems
You've built evaluation frameworks for open-ended tasks
You're familiar with supply chain, logistics, or operations domains
You have a side project that shows you can't stop tinkering

Our Values

If you want to know the heart of a company, take a look at its values. Ours unite us. They drive our success and the success of our customers. Does your heart beat like ours? Find out here: Core Values

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or protected veteran status.

Published on 2026-06-27
