Senior Technical Operations Engineer
Zoox
IT, Operations
Foster City, CA, USA
Posted on Jun 28, 2025
The IT Platform team at Zoox is expanding to include IT Technical Operations for our commercial service, with a focus on real-time command center support, monitoring services, and Site Reliability Engineering (SRE) principles. As a Senior Technical Operations Engineer, you will join a new Technical Operations Engineering team and integrate with existing operations teams to ensure the stability and success of live robot missions. This role is an opportunity to shape the future of Zoox's real-time operations, drive strategic initiatives, and implement innovative solutions that enhance reliability and performance.
In this role, you will:
- Real-Time Command Center Operations & Strategy: Develop and support a 24/7 real-time technical team, serving as the primary point of contact for IT’s mission-critical operations in our Fusion (Operations) Centers. Execute the strategic vision for IT TechOps, ensuring alignment with business objectives, while implementing scalable processes for real-time troubleshooting and operational support to maintain seamless live operations. Initial scope will be focused heavily on WAN, POP, and IoT support. Integration with existing operational teams will be key.
- Real-Time Monitoring & Incident Response: Oversee and optimize operational observability for the IT dependencies of Zoox’s active robot fleet. Prioritizing proactive issue detection and rapid incident response, implement and manage real-time monitoring solutions that ingest multiple data sources, work with stakeholders on observability best practices, enhance automation, and improve operational efficiency.
- Technical Strategy: Deliver strong technical support for deployed WAN, POP, and IoT solutions, later expanding support into other areas such as AWS EKS environments. Ensure system reliability and performance. Leverage expertise in networking, cloud computing, and troubleshooting to maintain operational efficiency and resolve issues effectively. Triage and troubleshoot issues across desktop, datacenter, network, provider, cloud, and software.
- Collaboration: Coordinate with multiple technical and support teams to build a seamless real-time IT operations framework, enhancing capabilities through cross-team alignment. Proactively discover, document, and optimize IT technical operations processes, ensuring workflows are well-defined, refined, and automated for efficiency and scalability. Communicate clearly and lead escalation calls to resolution.
- Site Reliability & Continuous Improvement: Champion SRE principles to enhance system reliability, scalability, and performance by implementing self-healing mechanisms, automated incident response, and continuous improvement initiatives that adapt to evolving business and technical needs. Maintain closed-loop communication with the engineering and SRE teams that own the broader solutions (e.g., WAN, internal apps).
Qualifications:
- 10+ years of IT experience, including 5+ years in real-time operations
- Expert-level experience in building and managing real-time IT operations and processes
- Proven track record of success in critical real-time operations (e.g., life-safety, financial, transportation)
- Demonstrable knowledge of networking concepts and troubleshooting, including the OSI model, TCP/IP, and network security; IoT and cellular experience is highly desirable
- Ability to lead observability initiatives, including building real-time dashboards
- Full-stack knowledge: understanding of production IT environments and end-to-end service delivery
- Exposure to operating production environments in AWS; EKS experience is also beneficial