Lead Site Reliability Engineer (SRE)
About This Role
Role summary
Lead reliability, scalability, and operability of distributed systems by defining SRE strategy, building platform capabilities, and driving culture and processes that reduce toil and improve uptime.
Key responsibilities
- Lead design, implementation, and operation of highly available, scalable production systems across cloud and on-prem environments.
- Define and own SLOs/SLIs, error budgets, monitoring, and alerting strategies; drive SLI/SLO adoption across teams.
- Lead incident response, post-incident reviews, root-cause analysis, and remediation; implement preventative measures.
- Build and maintain observability stacks (metrics, logs, tracing) and dashboards (Prometheus, Grafana, ELK/EFK, OpenTelemetry).
- Architect and operate CI/CD and deployment platforms (Argo CD, Spinnaker, GitHub Actions, GitLab CI) enabling safe, automated rollouts (canary, blue/green, feature flags).
- Design, implement, and maintain self-service platform tooling for developers (Kubernetes/EKS/GKE/AKS, service meshes, operators).
- Drive Infrastructure as Code practices (Terraform, Pulumi, CloudFormation) and manage infrastructure lifecycle, drift detection, and compliance.
- Automate operational runbooks, remediation, capacity planning, and routine maintenance to minimize manual toil.
- Own reliability-related security practices: secrets management, IAM, network policies, vulnerability scanning, and secure configurations.
- Mentor and grow SRE and platform engineers; lead hiring, performance reviews, and career development.
- Partner with engineering, product, and security teams to influence design decisions for fault tolerance and operability.
- Manage on-call rotations and escalation policies, and ensure adequate coverage; coordinate across teams during major incidents.
- Drive cost optimization, observability of cloud spend, and capacity forecasting.
Required qualifications
- 7+ years in site reliability, platform, or DevOps engineering roles with progressive leadership responsibility.
- Proven experience operating production distributed systems at scale on at least one major cloud provider (AWS, GCP, or Azure).
- Deep expertise with Kubernetes and container ecosystems; experience running large clusters and multi-cluster environments.
- Strong IaC experience (Terraform required; CloudFormation/Pulumi a plus).
- Extensive experience with observability tooling (Prometheus, Grafana, ELK/EFK, OpenTelemetry) and incident management platforms (PagerDuty, Opsgenie).
- Solid software engineering skills (Python, Go, or similar) for automation, tooling, and reliability engineering.
- Demonstrated experience setting and enforcing SLOs/SLIs and reducing MTTR through engineering practices.
- Experience with CI/CD systems and deployment strategies (Argo CD, Spinnaker, Flux, GitOps).
- Strong systems, networking, and security fundamentals.
- Excellent leadership, communication, and stakeholder management skills; proven ability to influence across orgs.
- Experience mentoring engineers and leading cross-functional initiatives.
Job Types: Full-time, Permanent
Pay: QAR23.71 - QAR86.45 per hour
Expected hours: 40 per week
Work Location: In person