Site Reliability Engineer
Clarifai
This job is no longer accepting applications
About the Company
Clarifai is a leading, full-lifecycle deep learning AI platform for computer vision, natural language processing, audio recognition, and large language models (LLMs). We help organizations transform unstructured images, video, text, and audio data into structured data significantly faster and more accurately than humans could on their own. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been a market leader in AI since winning the top five places in image classification at the 2013 ImageNet Challenge. Clarifai continues to grow, with remote employees based throughout the United States, Canada, Brazil, India, and Estonia.
We have raised $100M in funding to date, with $60M coming from our most recent Series C, and are backed by industry leaders like Menlo Ventures, Union Square Ventures, Lux Capital, New Enterprise Associates, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm and Osage.
Your Impact
Clarifai’s platform is a Kubernetes-native distributed system that requires the orchestration of many components. Efficiently serving and training large neural networks presents unique design and infrastructure challenges. You will be critical to solving these challenges both in the cloud and in on-premises environments. You will also be responsible for our broader cloud infrastructure, development tools, and development environments.
The Opportunity
You will be joining the growing SRE team at Clarifai. Our work includes:
- Use Terraform and other IaC tools to manage our environments in AWS, GCP, and our private cloud
- Build Kubernetes resources and tooling for deployments to the cloud and on-premises
- Build and support environments for artificial intelligence research engineers, scientists, and developers
- Build and support our CI/CD pipeline in collaboration with the rest of the engineering team
Requirements
- 5+ years of infrastructure/site reliability/DevOps experience
- BS/BA in Computer Science or a related field
- Knowledge of Kubernetes and Docker/containerd
- Experience using one of the major cloud providers (AWS, GCP, Azure) for a project
- Basic knowledge of Terraform or other IaC tools
- Basic SRE knowledge of Linux and IP networking
- Willingness to learn and grow in a startup environment
- Ability to overlap with US East Coast (NYC) working hours until 12 p.m. Eastern Time
Great to Have
- Knowledge of basic microservice architecture principles
- Knowledge of basic security engineering principles
- Experience with relational databases, message queues, and key-value stores
- Experience writing Python, Go, or any other popular programming language
- Familiarity with any RPC framework
- Familiarity with Helm and Tilt
- Familiarity with GitHub Actions/Argo CD