Senior ML Infrastructure / ML DevOps Engineer
- Operate and scale GPU-heavy clusters used daily by the R&D team for large-scale training and low-latency inference.
- Design, build, and automate the ML platform rather than just run pre-defined playbooks.
- Work across multiple major cloud providers, solving interesting problems in networking, scheduling, and cost/performance optimization at scale.
- Design, operate, and scale GPU and CPU clusters for ML training and inference (Slurm, Kubernetes, autoscaling, queueing, quota management).
- Automate infrastructure provisioning and configuration using infrastructure as code (Terraform, CloudFormation, cluster tooling) and configuration management.
- Build and maintain robust ML pipelines (data ingestion, training, evaluation, deployment) with strong guarantees around reproducibility, traceability, and rollback.
- Implement and evolve ML-centric CI/CD: testing, packaging, deployment of models and services.
- Own monitoring, logging, and alerting across training and serving: GPU/CPU utilization, latency, throughput, failures, and data/model drift (Grafana, Prometheus, Loki, CloudWatch).
- Work with terabyte-scale datasets and the associated storage, networking, and performance challenges.
- Partner closely with ML engineers and researchers to productionize their work, translating experimental setups into robust, scalable systems.
- Participate in on-call rotation for critical ML infrastructure and lead incident response and post-mortems when things break.
You are
- Former or current Linux / systems / network administrator who is comfortable living in the shell and debugging at OS and network layers (systemd, filesystems, iptables/security groups, DNS, TLS, routing).
- 5+ years of experience in DevOps/SRE/Platform/Infrastructure roles running production systems, ideally with high-performance or ML workloads.
- Deep familiarity with Linux as a daily driver, including shell scripting and configuration of clusters and services.
- Strong experience with workload management, containerization, and orchestration (Slurm, Docker, Kubernetes) in production environments.
- Solid understanding of CI/CD tools and workflows (GitHub Actions, GitLab CI, Jenkins, etc.), including building pipelines from scratch.
- Hands-on cloud infrastructure experience (AWS, GCP, Azure), especially around GPU instances, VPC/networking, storage, and managed ML services (e.g., SageMaker HyperPod, Vertex AI).
- Proficiency with infrastructure as code (Terraform, CloudFormation, or similar) and a bias toward automation over manual operations.
- Experience with monitoring and logging stacks (Grafana, Prometheus, Loki, CloudWatch, or equivalents).
- Familiarity with ML pipeline and experiment orchestration tools (MLflow, Kubeflow, Airflow, Metaflow, etc.) and with model/version management.
- Solid programming skills in Python, plus the ability to read and debug code that uses common ML libraries (PyTorch, TensorFlow) even if you are not a full-time model developer.
- Strong ownership mindset, comfort with ambiguity, and enthusiasm for scaling and hardening critical infrastructure for an ML-heavy environment.
- Willingness to learn.
Why You Should Apply
- Intellectually stimulating work environment. Be a pioneer: you get to work with real-time data processing and AI.
- Work in one of the hottest AI startups, with exciting career prospects. Team members are distributed across the world.
- Real responsibility and the ability to make a significant contribution to the company's success.
- Inclusive workplace culture.
- Contract type: Permanent employment contract.
- Preferred start date: Immediate.
- Compensation: Based on profile and location.
- Location: Remote. You can work from, or meet other team members in, one of our offices: Palo Alto, CA; Paris, France; or Wroclaw, Poland. Candidates based anywhere in the EU, the United States, or Canada will be considered.