Senior Site Reliability Engineer (SRE)

Leapwork

This job is no longer accepting applications

Software Engineering

Gurugram, Haryana, India

Posted on Monday, January 15, 2024

Leapwork is a sophisticated piece of software, used every day by thousands of enterprise users worldwide and across all industries. It is a hybrid application with Windows, Mac, and Web components, and it integrates with other applications and operating systems at a low level.

We are seeking a seasoned and forward-thinking Senior Site Reliability Engineer (SRE) with a specialized focus on Microsoft Azure Cloud. As a Senior SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our Azure-based systems and applications. You will work closely with cross-functional teams to design, implement, and maintain resilient and scalable infrastructure, driving the optimization of our Azure cloud environment.

Responsibilities:

Collaborate with DevOps, Engineering, and Product teams to design, build, and deploy highly available and scalable solutions on the Azure Cloud platform.
Lead efforts to automate deployment, monitoring, and management processes using Infrastructure as Code (IaC) and configuration management tools.
Implement and maintain monitoring, alerting, and incident response processes to proactively identify and address performance issues and outages.
Develop and refine alerting systems to proactively identify and diagnose potential issues before they impact users.
Conduct thorough root cause analysis of incidents and develop strategies to prevent recurrence, continually enhancing the reliability of Azure services.
Design and implement disaster recovery and business continuity plans, ensuring data integrity and system availability in case of failures.
Optimize Azure resources for cost-effectiveness, monitoring and managing resource consumption, and providing recommendations for efficient resource utilization.
Collaborate with software engineers to influence architecture decisions, ensuring applications are designed for high availability, scalability, and operability.
Develop and maintain documentation related to system architecture, processes, and procedures to facilitate knowledge sharing and onboarding of team members.
Stay current with Azure cloud advancements, trends, and best practices, integrating relevant technologies to enhance SRE processes.
Mentor and provide technical guidance to junior SREs, fostering a culture of continuous learning and growth within the team.
Participate in the on-call rotation for critical platform issues.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related technical field. Master's degree is a plus.
Proven experience (5+ years) working as an SRE with a specific focus on Microsoft Azure Cloud services.
Deep understanding of Azure services, including Azure Kubernetes Service (AKS), Azure App Service, Azure Functions, Azure Monitor, and Azure Resource Manager.
Proficiency in scripting and programming languages (e.g., PowerShell, Python) for automation, infrastructure management, and tool development.
Hands-on experience with containerization and orchestration technologies, such as Docker and Kubernetes, in an Azure context.
Strong incident management skills, with a data-driven and analytical approach to diagnosing complex issues.
Familiarity with Infrastructure as Code (IaC) tools (e.g., Terraform, ARM templates) and configuration management tools (e.g., Ansible, Chef, Puppet).
Excellent problem-solving skills, attention to detail, and a proactive attitude towards addressing operational challenges.
Effective communication and collaboration skills, with the ability to work across teams and influence technical decisions.
Experience with CI/CD pipelines and version control systems (e.g., Git).
Relevant Azure certifications (e.g., Microsoft Certified: Azure Solutions Architect Expert, Microsoft Certified: Azure DevOps Engineer Expert) are highly advantageous.
In-depth knowledge of monitoring and alerting tools like Grafana, Prometheus, Loki, and Tempo.
Ability to analyze monitoring data to identify trends and root causes of incidents, leading to continuous improvement of system health.
A strong understanding of DevOps principles and automation practices.

Why Leapwork?

We are on an exciting journey of global growth, and this is your chance to get on board. By joining our team, you’ll become part of a fast-paced international environment where you can grow, challenge yourself, and do what inspires you. Our motto is to work hard, but have fun while doing it, and we believe collaboration, social activities and celebration are key to success!

On top of having the greatest colleagues, we’ll provide you with top-class tools and lunch in our spacious office, located in the heart of Copenhagen, to help keep your performance high.

Our Leapwork principles

Our five key principles capture the essence of what it means to be a part of our world-class team! They are integral to how we approach our work and one another, and they serve as a roadmap to our continued growth, development, achievements, and success.

Customer first: We listen to our customers, understand their pain points and focus on what matters to them.
Lead from the front: Leading means guiding others towards the solutions to our challenges.
Get it done: We make commitments, follow through and deliver work we’re proud of.
Build excellence: We do our best work every day, holding ourselves and others to the highest standards.
Respectfully different: We treat each other with respect, always. We’re different, not indifferent.
