Member of Technical Staff - ML Performance
Modal Labs
About Us
At Modal, we build foundational technology, including an optimized container runtime, a GPU-aware scheduler, and a distributed file system.
We're a small team based out of New York, Stockholm, and San Francisco, and have raised over $23M. Our team includes creators of popular open-source projects (e.g., Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
The Role
We are looking for strong engineers with experience making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal's container runtime to push language and diffusion models toward higher throughput and lower latency, we'd love to hear from you!
Details
Work in person in our NYC, San Francisco, or Stockholm office
Full medical, dental, and vision insurance
Competitive salary and equity
Requirements
5+ years of experience writing high-quality production code.
Experience working with PyTorch, Hugging Face libraries, and modern inference engines (e.g., vLLM or TensorRT).
Familiarity with NVIDIA GPU architecture and CUDA.
Familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
Experience with ML performance engineering (tell us about a time you pushed GPU utilization higher!).