Senior Technical Operations Engineer (Tier 3)
Kela Technologies
- Engineering
- Senior
- Full-time
Description
We’re looking for a Senior Technical Operations Engineer (Tier 3) to serve as Kela’s last line of defense before R&D, handling our most complex and high-impact production issues.
In this role, you’ll take full ownership of deep technical investigations, act as the primary escalation point for unresolved incidents, and help drive operational excellence across our production environments.
You’ll play a key part in shaping Operations processes, automating operational tasks, and analyzing patterns across incidents to prevent recurrence. This is a highly visible, hands-on position that requires both technical depth and excellent communication skills.
What You’ll Be Doing:
- Lead deep-dive root cause investigations across infrastructure, application, and network layers.
- Take ownership of critical, time-sensitive customer issues, driving resolution under pressure.
- Collaborate directly with R&D on cases requiring code-level fixes or changes.
- Lead live incidents and SEVs: manage real-time troubleshooting, coordinate with internal teams, and provide timely customer updates.
- Facilitate and document technical post-incident retrospectives, contributing to RCA reports and driving follow-up action items.
- Identify and correlate recurring issues across incidents to uncover systemic weaknesses and chronic problems.
- Drive cross-functional initiatives aimed at eliminating root causes and improving production stability.
- Develop and execute preventive maintenance tasks: system health checks, resource monitoring, configuration audits, and more.
- Work closely with the Observability team to improve alert quality, proactive issue detection, and incident response readiness.
- Implement and maintain operational runbooks, diagnostic scripts, and internal tools to streamline support workflows.
- Build automation solutions to reduce manual tasks and accelerate troubleshooting.
- Contribute to self-service tools for Tier 1 and Tier 2 teams to reduce escalation volume.
- Act as the technical point of contact for customers during escalations, delivering clear, timely, and empathetic communication.
Requirements
Must Have:
- 5+ years of deep, hands-on experience in Tier 3 Technical Support, Production Operations, or Systems Engineering roles.
- Proven track record of resolving complex, customer-facing production issues in mission-critical environments.
- Strong expertise with Linux, networking fundamentals, and distributed system troubleshooting.
- Extensive experience leading live incidents and war rooms, with a focus on real-time impact mitigation.
- Solid working knowledge of log analysis, metrics monitoring, and observability tools such as Grafana, ELK, and Datadog.
- Proficiency in scripting languages (Python, Bash, or equivalent) for automation and tooling.
- Excellent communication skills: able to translate complex technical issues for both technical teams and non-technical stakeholders.
- Strong sense of ownership, urgency, and customer empathy, especially under pressure.