
Site Reliability Engineer

Socium - Teams Done Differently, UAE · 6 days ago · mid-senior level · contract

Skills

engineering, design, project management

About This Role

General Description

We are seeking a highly experienced Senior Site Reliability Engineer (SRE) to support and optimize mission-critical cloud and on-premises platforms across Azure and air-gapped Kubernetes environments.

This role is responsible for ensuring the reliability, scalability, availability, and security of modern application platforms running across Azure Kubernetes Service (AKS) and self-managed Rancher RKE2 clusters.

The ideal candidate will have strong expertise in Kubernetes operations, GitOps-driven deployments, infrastructure automation, monitoring and observability, and incident management within highly secure and complex enterprise environments.

The successful candidate will work closely with engineering, security, and operations teams to support high-availability systems and continuously improve operational resilience across connected and disconnected environments.

Key Responsibilities

  • Ensure reliability, availability, and performance of services running across Azure AKS and air-gapped Kubernetes (Rancher RKE2) environments while meeting strict SLAs and operational requirements.
  • Maintain scalable, resilient, and secure Kubernetes platforms including ingress controllers, storage layers, and stateful workloads.
  • Automate deployments and operational processes using Python, Go, Bash, Terraform, Bicep, and Ansible.
  • Implement and manage GitOps workflows using ArgoCD and Kustomize across cloud and on-premises environments.
  • Operate and optimize CI/CD pipelines using Azure DevOps and GitHub Actions.
  • Manage container supply chains for connected and disconnected environments, including private registry mirroring and image scanning.
  • Monitor infrastructure and application performance using Azure Monitor, Prometheus, Grafana, and OpenTelemetry.
  • Proactively identify, troubleshoot, and resolve platform and application issues to minimize service disruption.
  • Lead incident response activities, root cause analysis, and post-incident reviews while driving permanent corrective actions.
  • Develop and enforce operational best practices related to reliability, security, compliance, and platform governance.
  • Collaborate with development, platform, infrastructure, and security teams to improve system architecture and operational maturity.
  • Participate in on-call rotations supporting critical production systems.
  • Utilize ITSM processes and tools for incident, problem, and change management.
  • Support Agile, Scrum, and ITIL-aligned operational practices and assist with audit and compliance requirements.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Minimum 10 years of experience in Site Reliability Engineering, DevOps, or Platform Engineering roles.
  • Strong expertise in Microsoft Azure cloud environments, networking, and security.
  • Hands-on experience with:
      • Azure Kubernetes Service (AKS)
      • Rancher RKE2 or equivalent air-gapped Kubernetes platforms
      • Docker and Kubernetes ecosystem technologies
  • Strong scripting and automation experience using:
      • Python
      • Go
      • Bash
  • Strong Infrastructure as Code (IaC) experience using:
      • Terraform
      • Bicep
      • Ansible
  • Experience with GitOps methodologies and tools including ArgoCD and Kustomize.
  • Strong CI/CD pipeline experience using Azure DevOps and/or GitHub Actions.
  • Experience with monitoring and observability tools including Azure Monitor, Prometheus, Grafana, and OpenTelemetry.
  • Proven experience managing production incidents, troubleshooting distributed systems, and performing root cause analysis.
  • Strong understanding of high-availability systems, operational resilience, and enterprise security practices.
  • Excellent communication, stakeholder management, and collaboration skills.

Preferred Skills

  • Experience supporting air-gapped or highly regulated enterprise environments.
  • Knowledge of container security, image scanning, and private registry management.
  • Familiarity with enterprise compliance and audit processes.
  • Experience working in Agile, Scrum, and ITIL-based operational environments.
  • Exposure to large-scale enterprise modernization or cloud transformation programs.
