Member of Technical Staff - ML Performance
Modal Labs
About Us
At Modal, we build foundational technology, including an optimized container runtime, a GPU-aware scheduler, and a distributed file system.
We're a small team based out of New York, Stockholm, and San Francisco, and have raised over $23M. Our team includes creators of popular open-source projects (e.g., Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.
The Role
We are looking for strong engineers with experience making ML systems performant at scale. If you are interested in contributing to open-source projects and Modal's container runtime to push language and diffusion models toward higher throughput and lower latency, we'd love to hear from you!
Details
Work in person in our NYC, San Francisco, or Stockholm office
Full medical, dental, and vision insurance
Competitive salary and equity
Requirements
5+ years of experience writing high-quality production code.
Experience working with PyTorch, Hugging Face libraries, and modern inference engines (e.g., vLLM or TensorRT).
Familiarity with NVIDIA GPU architecture and CUDA.
Familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
Experience with ML performance engineering (tell us about a time you pushed GPU utilization higher!).