Site Reliability Engineer - Remote
Pager is revolutionizing the traditional healthcare journey and aims to serve as your "doctor in the family." As a virtual care collaboration platform, we provide a convenient, connected care experience that covers your health from all angles. Our goal is to simplify the healthcare decision-making process by enhancing accessibility, reducing costs, and making care straightforward and understandable.
Our platform merges high-tech AI automation with high-touch concierge services, creating an integrated, full-service experience. This includes triage, telemedicine, e-prescriptions, appointment scheduling, after-care follow-up, care advocacy, and customer service. Our omnichannel communications platform connects the healthcare ecosystem, bringing together a comprehensive care team of nurses, doctors, pharmacists, coordinators, advocates, and more in one place. We proudly serve over 23 million people across the United States and Latin America, partnering with leading payers, providers, and employers.
We are seeking a highly skilled and experienced Site Reliability Engineer (SRE) to join our dynamic and innovative team. As an SRE, you will play a crucial role in ensuring the high availability, performance, and reliability of our large-scale applications running on multiple cloud platforms, including AWS, GCP, and Azure. Your expertise in infrastructure as code (Terraform) and your experience with application performance monitoring (APM) tools will be instrumental in optimizing our systems and delivering exceptional user experiences.
Responsibilities:
- Collaborate with cross-functional teams to design, implement, and maintain scalable and fault-tolerant cloud infrastructure on AWS, GCP, and Azure.
- Utilize infrastructure as code (Terraform) to automate the provisioning and configuration of cloud resources, ensuring consistency and efficiency across environments.
- Monitor application performance, system health, and latency using APM tools such as New Relic or Datadog to proactively identify and resolve issues.
- Create and maintain detailed runbooks to standardize incident response procedures and facilitate faster resolution.
- Conduct post-mortems for incidents, analyze root causes, and implement preventive measures to enhance system resilience.
- Conduct periodic load testing, performance analysis, and capacity planning to optimize application performance and resource utilization.
- Implement and manage automated CI/CD pipelines to enable seamless and efficient deployments.
- Participate in incident response and on-call rotations to maintain the highest level of system availability and reliability.
- Develop and enforce best practices for security, monitoring, and disaster recovery to safeguard data and minimize downtime.
- Collaborate with development teams to identify performance bottlenecks, troubleshoot issues, and provide technical guidance.
- Automate repetitive tasks and streamline operational processes using scripting languages (Python, Bash) and configuration management tools.
- Drive continuous improvement initiatives to enhance system performance, scalability, and stability.
- Stay up to date with industry trends, emerging technologies, and best practices related to Site Reliability Engineering.
Qualifications:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Minimum of 5 years of hands-on experience as a Site Reliability Engineer or in a similar role, managing cloud infrastructure and supporting large-scale applications.
- Strong expertise in working with cloud providers such as AWS, GCP, and Azure.
- Proficient in infrastructure as code (IaC) tools, with a focus on Terraform for managing cloud resources.
- Demonstrated experience with APM tools such as New Relic and Datadog for monitoring and optimizing application performance.
- Familiarity with continuous integration and continuous deployment (CI/CD) practices and tools.
- Solid understanding of high availability, fault tolerance, and disaster recovery strategies for mission-critical applications.
- Proven experience in automating tasks, configuration management, and scripting using Python, Bash, or similar technologies.
- Strong problem-solving skills and the ability to think critically under pressure.
- Excellent communication and teamwork skills to collaborate effectively with diverse teams.
- Relevant certifications in AWS, GCP, Azure, or related technologies are a plus.
What Sets You Apart:
- APM expertise: Proven ability to utilize APM tools to monitor and optimize application performance.
- Runbook proficiency: Skilled in creating comprehensive runbooks for efficient incident resolution.
- Application growth: Ability to proactively identify and implement optimizations that support scalable application growth.
- Collaboration: Strong team player, facilitating cross-functional efforts and fostering seamless communication.
In summary, your combination of APM knowledge, runbook expertise, application growth skills, and collaborative nature makes you an exceptional Site Reliability Engineer, well positioned to contribute significantly to the success of our company's mission-critical systems.
Offers are contingent upon the successful completion of a background check. This may include, but is not limited to, substance testing; education, employment, and reference checks; verification of state and federal licensure and certifications; criminal history; and Office of the Inspector General (OIG) and General Services Administration (GSA) exclusion checks.
For Colorado, Nevada, and New York-based employment: In accordance with pay transparency laws, the pay range for this position is $142,000 to $177,000. The compensation package may include stock options, as well as medical, dental, vision, and financial benefits, generous PTO, professional development stipends, and wellness benefits. Final compensation for this role will be determined by factors such as a candidate's relevant work experience, skills, certifications, and geographic location. The listed range applies only to Colorado, Nevada, and New York.
At Pager, we value diversity and always treat all employees and job applicants based on merit, qualifications, competence, and talent. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.