AI Research Engineer - Datadog AI Research (DAIR)
- Observability Foundation Models - Building state-of-the-art models for advanced forecasting, anomaly detection, and multi-modal telemetry analysis (logs, metrics, traces, etc.). These models will also provide the foundation for our agents (described below) to natively analyze telemetry data.
- Site Reliability Engineering (SRE) Autonomous Agents - Creating AI agents to automatically detect, diagnose, and resolve incidents in production environments, pushing the boundaries of multi-step planning, reasoning, and domain-specific knowledge.
- Production Code Repair Agents - Developing agents and models that leverage code, logs, runtime data, and other signals to identify, fix, and even preempt performance issues and security vulnerabilities in production code.
- Build and operate datasets, training and evaluation pipelines, benchmarks, and internal tooling
- Implement models, run experiments at scale, and profile for reliability, performance, and cost
- Orchestrate distributed training and distributed RL with Ray, including scheduling, scaling, and failure recovery
- Make the research stack observable, reproducible, and easier to use
- Establish rigorous automated benchmarks and regression tests for forecasting, anomaly detection, multi-modal analysis, agents, and code repair tasks
- Collaborate with Research Scientists, Product, and Engineering to integrate advanced AI capabilities into Datadog's product ecosystem and to harden prototypes into reliable services
- Contribute high-quality code, documentation, and open-source artifacts that enable the community and internal teams to reproduce, extend, and evaluate results
- You have strong software engineering skills with experience in domains such as observability, SRE, or security
- You have depth in distributed computing and ML systems for training and inference at scale; experience with Ray, Slurm, or similar frameworks is a plus
- You are proficient in Python, familiar with a systems language (e.g., Rust, C++, or Go), and you are comfortable with modern cloud and data infrastructure
- You have practical experience implementing and operating ML training and inference systems (e.g., PyTorch or JAX), including containerization, orchestration, and GPU acceleration
- You are familiar with efficient training, fine-tuning, and inference techniques for large foundation models
- You can explain design and performance trade-offs clearly to both technical and non-technical audiences
- You have a strong interest in open science and open-source contributions, including establishing rigorous benchmarks and sharing artifacts with the community
- You have a demonstrated ability to bridge cutting-edge research prototypes and real-world product applications, ideally with large foundation models, generative AI agents, or domain-specific LLM deployments
- You are passionate about pushing the boundaries of AI while maintaining a strong focus on customer impact, scalability, and responsible deployment of new technologies
- You have hands-on experience with GPU programming and optimization, including CUDA
- You have experience writing production data pipelines and applications
- You have experience supporting or contributing to research publications
- Competitive global benefits
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP)
- Opportunity to collaborate closely with colleagues across the Datadog offices in New York City and Paris
- Opportunity to attend and present at conferences and meetups
- Intra-departmental mentor and buddy program for in-house networking
- An inclusive company culture and the ability to join our Community Guilds (Datadog employee resource groups)