Original posting

At JetBrains, code is our passion. Ever since we started back in 2000, we have been striving to make the strongest, most effective developer tools on earth. By automating routine checks and corrections, our tools speed up production, freeing developers to grow, discover, and create.

We're looking for a Research Engineer who will own the training stack and model architecture for our Mellum LLM family. Your job is easier said than done: make training faster, cheaper, and more stable at scale. You'll profile, design, and implement changes across the training pipeline, from model architecture down to custom GPU kernels where needed.

As part of our team, you will:

  • Be responsible for improving end-to-end performance for multi-node LLM pre-training and post-training pipelines.
  • Profile hotspots (Nsight Systems/Compute, NVTX) and fix them using compute/comm overlap, kernel fusion, scheduling, etc.
  • Design and evaluate architecture choices (depth/width, attention variants including GQA/MQA/MLA/Flash-style, RoPE scaling/NTK, and MoE routing and load-balancing).
  • Implement custom ops (Triton and/or CUDA C++), integrate via PyTorch extensions, and upstream when possible.
  • Push memory/perf levers: FSDP/ZeRO, activation checkpointing, FP8/TE, tensor/pipeline/sequence/expert parallelism, NCCL tuning.
  • Harden large runs by building elastic and fault-tolerant training setups, ensuring robust checkpointing, strengthening reproducibility, and improving resilience to preemption.
  • Keep the data path fast with streaming, sharded data loaders and tokenizer pipelines, and improve overall throughput and cache efficiency.
  • Define the right metrics, build dashboards, and deliver steady improvements.
  • Run both pre-training and post-training (including SFT, RLHF, and GRPO-style methods) efficiently across sizable clusters.

We'll be happy to bring you on board if you have:

  • Strong PyTorch and PyTorch Distributed experience, having run multi-node jobs with tens to hundreds of GPUs.
  • Hands-on experience with Megatron-LM/Megatron-Core/NeMo or DeepSpeed, or serious FSDP/ZeRO expertise.
  • Real profiling expertise (Nsight Systems/Compute, nvprof) and experience with NVTX-instrumented workflows.
  • GPU programming skills with Triton and/or CUDA, and the ability to write, test, and debug kernels.
  • A solid understanding of NCCL collectives, as well as topology and fabric effects (IB/RoCE), and how they show up in traces.

Our ideal candidate would have experience with:

  • FlashAttention-2 and 3, CUTLASS and CuTe, TransformerEngine and FP8, Inductor, AOTAutograd, and torch.compile.
  • MoE at scale (expert parallel, router losses, capacity management) and long-context tricks (ALiBi/YaRN/NTK scaling).
  • Kubernetes or SLURM at scale, placement and affinity tuning, as well as AWS, GCP, and Azure GPU fleets.
  • Web-scale data plumbing (streaming datasets, Parquet and TFRecord, tokenizer perf), eval harnesses, and benchmarking.
  • Safety and post-training methods, such as DPO, ORPO, GRPO, and reward models.
  • Inference ecosystems such as vLLM, including paged KV caching.

We are an equal opportunity employer

We know great ideas can come from anyone, anywhere. That's why we do our best to create an open and inclusive workplace - one that welcomes everyone regardless of their background, identity, religion, age, accessibility needs, or orientation.

We process the data provided in your job application in accordance with the Recruitment Privacy Policy.

Application managed by JetBrains