Senior Infrastructure & TechOps Engineer

Kela Technologies


Other Engineering

Tel Aviv-Yafo, Israel

Posted on Apr 29, 2026


  • Delivery
  • Tel Aviv (TLV)
  • Full-time

Description

As a Senior Production Infrastructure Engineer at Kela, you'll own the operational backbone that keeps our mission-critical production systems running across every production site we deploy. You'll sit inside the TechOps group on the Infrastructure team, working alongside the Professional Services and Support teams to make sure that what we build in-house actually performs - reliably, observably, and at scale - in the field.

This role is about more than keeping the lights on. You'll design the monitoring, incident response, change management, and automation that let the rest of the organization move fast without breaking production. You'll build the systems that detect problems before customers do, the workflows that resolve them when they happen anyway, and the infrastructure that lets us push firmware, configuration, and software changes to field devices safely and at scale.

What You'll Do

  • Own monitoring and observability across the production fleet. Operate the Prometheus, Grafana, and centralized logging stack so problems surface early and the right signal reaches the right person.
  • Build the incident response system - workflows, runbooks, and tooling that let TechOps resolve issues fast across infrastructure, network, and application layers - and run the post-incident process that turns each event into a permanent fix.
  • Drive change management for production: firmware updates to field devices, software releases, configuration changes. Track every rollout, measure outcomes, and make the process safer and faster over time.
  • Automate aggressively. Infrastructure provisioning, auto-healing, deployment verification, tests, health checks. If it's manual today and shouldn't be, you fix it.
  • Define and track reliability maturity. Choose the signals that matter at our stage - SLOs, error budgets, MTTR, change failure rate, deployment frequency - instrument them, and use them to focus the team's investments.
  • Build deployment pipelines that bridge our hardware and software stack, working closely with the firmware and platform teams.
  • Integrate systems and data flows across the platform so we get the full value of what we already collect.
  • Close the gaps between how systems are designed, how they actually behave, and what the team needs to operate them.
  • Participate in the TechOps on-call rotation.

Requirements

What We're Looking For

  • 4+ years of hands-on experience in production infrastructure, Ops, SRE, or DevOps, supporting distributed systems where downtime has real consequences.
  • Experience with Linux - confident using the command line and diagnosing issues in production environments.
  • Hands-on experience working with Kubernetes.
  • Hands-on experience with Prometheus, Grafana, and centralized logging.
  • Infrastructure-as-code with Terraform or Ansible.
  • Solid networking - VPN, routing, firewalls - and an understanding of the bits and bytes of complex network architectures.
  • Proven experience designing and running incident response processes in production.
  • Ability to learn new technologies fast and work across unfamiliar layers of the stack.

Who You Are

  • Technically curious. You open the box, read the source, go find the next thing to learn.
  • Creative. You find the angle nobody else saw and the solution that's both simple and right.
  • A team player. You make the people around you better and you share what you know.
  • In flow with the work. You hold complexity, switch contexts, and keep moving without losing the thread.
  • Quick to learn from mistakes - yours and other people's. Every incident is material for the next iteration.