Principal Site Reliability Engineer

Core42 · Abu Dhabi Emirate, UAE · 4 days ago
Mid-Senior · Full-time

Skills

engineering, design, project management

About This Role

About Us

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally.

Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs.

With a mission to redefine digital transformation, we combine sovereign capabilities with scalable, high-performance compute infrastructure, positioning ourselves at the forefront of AI innovation in the Middle East and beyond.

The Opportunity

We are seeking a Principal Site Reliability Engineer to architect and lead the evolution of our globally distributed infrastructure supporting AI and private cloud workloads.

This is a high-impact technical leadership role focused on building scalable, resilient, and self-healing platforms through advanced automation and AIOps.

You will act as a technical authority, partnering with engineering, product, and leadership teams to drive autonomous service delivery, improve reliability, and enable large-scale AI innovation.

Platform Architecture & Strategy

  • Define and lead the long-term roadmap for infrastructure, CI/CD, and Kubernetes platforms
  • Design scalable, distributed systems aligned with AI/ML and HPC workloads
  • Establish standards for infrastructure-as-code and platform engineering

Automation & AIOps

  • Design and implement AI-driven automation and self-healing systems
  • Develop autonomous workflows for incident remediation and capacity optimisation
  • Evolve observability into predictive AIOps capabilities

Kubernetes & Infrastructure Engineering

  • Architect high-performance Kubernetes environments for multi-tenancy and GPU-intensive workloads
  • Optimise infrastructure for performance, scalability, and cost efficiency
  • Support advanced scheduling and orchestration frameworks for AI workloads

Observability & Reliability

  • Build and enhance observability platforms integrating metrics, logs, and tracing
  • Define SLOs/SLIs aligned with business outcomes
  • Lead root cause analysis (RCA) and promote reliability best practices including error budgets

Leadership & Technical Excellence

  • Act as the escalation point for complex system issues
  • Mentor and develop SRE and DevOps teams, driving a culture of excellence
  • Lead architectural reviews and contribute to internal Centers of Excellence

Cross-Functional Collaboration

  • Partner with product and engineering teams to balance innovation with reliability
  • Translate technical challenges into business impact for senior stakeholders
  • Influence infrastructure and platform strategy across the organisation

Required

  • 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Systems Architecture
  • Proven experience designing and operating large-scale distributed systems
  • Deep expertise in Kubernetes environments (EKS, GKE, or bare metal), including GPU workloads
  • Strong programming skills in Python, Go, or Rust
  • Extensive experience with Terraform, Helm, and infrastructure-as-code practices
  • Strong understanding of observability systems (metrics, logging, tracing)

Preferred

  • Experience with AI/ML infrastructure, including model serving and data pipelines
  • Familiarity with scheduling frameworks (e.g., Ray, Kueue, Volcano)
  • Experience building automation or AI-driven operational tools
  • Certifications such as CKA, AWS/Azure Solutions Architect
  • Experience influencing technical strategy across large organisations

What we’re looking for

A highly experienced and forward-thinking engineer with deep technical expertise and a passion for building resilient, scalable systems.

You are a strong problem solver, an influential leader, and a strategic thinker who can drive innovation while maintaining operational excellence.

What working at Core42 offers

With a diverse team of 1,100+ employees from 68 nationalities, we foster an inclusive, innovative and collaborative environment.

At Core42, we foster a culture grounded in trust, accountability and high performance.

We are united by our values: Grit, where we overcome challenges with resilience and determination; Passion, which drives us to pursue excellence in everything we do; and Impact, as we aim to inspire progress and create meaningful change.

Our team members thrive in an environment where each person’s contributions propel us forward, and together, we commit to achieving extraordinary results.

  • Competitive Salary: We offer an attractive salary package based on your skills and experience
  • Yearly Bonus: In recognition of your contributions, you will receive a performance-based annual bonus
  • Exclusive Discount Cards: Access special benefits with Esaad and Fazaa cards, offering discounts across a wide range of services
  • Premium Family Insurance: We provide comprehensive health coverage, including dental, vision and life insurance, ensuring the well-being of you and your family
  • Learning & Development: We offer access to top-tier learning platforms to help you grow in your career. Learn at your own pace with unlimited access to premium courses
