Lead Site Reliability Engineer
Overview
Project description
Luxoft is partnering with a next-generation digital bank, built from the ground up to deliver seamless, secure, and scalable financial services.
Our platform is cloud-native, API-first, and focused on reliability, speed, and security.
We are growing fast and looking for top-tier Site Reliability / Ops Engineers to join our core team and help run and scale our infrastructure.
As a Site Reliability Engineer, you will be responsible for maintaining and scaling our core infrastructure, ensuring our banking services remain available, secure, and performant.
You will work closely with development, product, and security teams to automate operations, manage cloud infrastructure, and uphold high availability standards.
Responsibilities
- Ownership
- Lead the design, operation, and continuous improvement of cloud infrastructure, Kubernetes platforms, and reliability practices across production environments.
- Direct and develop a team of 3-5 engineers, combining mentoring with clear delivery ownership, coaching, and performance leadership.
- Establish and drive standards for observability, deployment safety, incident management, self-service platform capabilities, and reusable golden-path engineering practices.
- Build automation across infrastructure provisioning, CI/CD workflows, and operational processes to improve consistency, resilience, and delivery efficiency.
- Collaboration
- Partner with engineering, product, platform, and security teams to improve reliability, scalability, and secure-by-default operations.
- Align stakeholders on platform standards, operational readiness, and adoption of engineering practices, using strong documentation and influence rather than relying only on formal authority.
- Provide clarity and direction in complex environments by balancing delivery needs, team development, and cross-functional priorities.
- Solutioning
- Solve complex reliability and infrastructure problems by balancing availability, security, performance, cost, and delivery speed.
- Guide technical decisions across AWS, multi-cluster Kubernetes, blue-green deployments, service mesh, and distributed production systems.
- Define and operationalize SLOs, SLIs, error budgets, monitoring, alerting, and post-incident improvement practices.
- Support resilient production systems through strong debugging, fault-tolerant design, and practical security and compliance controls.
Skills
- Must have
- 12+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Cloud Infrastructure, or related production engineering roles.
- 2+ years operating at Staff Engineer, Lead Engineer, or equivalent senior technical level.
- 2+ years supporting production-grade microservices environments at scale.
- Strong hands-on expertise with AWS, Kubernetes, multi-cluster operations, Terraform, Helm, kubectl, CI/CD, and tools such as Jenkins.
- Strong experience with observability and incident management tooling such as Prometheus, Grafana, and OpenSearch.
- Experience building self-service platform capabilities, reusable platform standards, and scalable operational practices.
- Strong understanding of Zero Trust architecture, OAuth2, ZTNA, IAM, secrets management, certificates, and access controls.
- Experience working in regulated or high-control environments with standards such as PCI DSS, ISO 27001, and MAS TRM.
- Experience supporting distributed systems and data platforms, including microservices reliability, PostgreSQL, Kafka, Cassandra, and fault-tolerant architectures.
- Strong leadership, decision-making, stakeholder influence, and technical documentation skills.
- Success KPIs
- Production platforms meet agreed reliability, availability, and recovery targets.
- Deployment and operational workflows become more automated, more repeatable, and lower risk.
- Platform standards and self-service practices are adopted across teams.
- Recurring incidents and operational toil are reduced through better engineering design and automation.
- Team capability, ownership, and execution quality improve through effective people leadership.
- The role delivers visible business and organizational impact, not only technical delivery.