
Lead Engineer - HPC Operations

Core42 · Abu Dhabi, UAE · Yesterday · Mid-Senior · Full-time

Skills

engineering · design · project management

About This Role

Overview

Core42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally.

Leveraging advanced resources and partnerships, Core42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs.

With a mission to redefine digital transformation, Core42 combines sovereign capabilities with scalable, high-performance compute infrastructure, positioning itself at the forefront of AI innovation in the Middle East and beyond.

The opportunity

We are seeking a highly skilled Lead Engineer – HPC Operations to oversee the daily operations and support of high-performance computing clusters designed to power large-scale AI and ML workloads.

This role ensures stable, secure, and high-performing infrastructure leveraging technologies such as Slurm, Kubernetes, and modern MLOps platforms.

The ideal candidate will bring deep technical expertise in HPC and a strong operational mindset to drive continuous improvement and automation across globally distributed environments.

Responsibilities will extend to collaborating with multidisciplinary teams, leading complex projects, implementing cutting-edge technologies, and providing mentorship to operations engineers.

Your key responsibilities

  • Oversee the daily operational management of HPC infrastructure, including compute, storage, networking, and scheduler components (e.g., Slurm, Kubernetes).
  • Drive efforts to optimize the efficiency and performance of HPC systems, ensuring maximum resource utilization and minimizing downtime.
  • Serve as the primary technical escalation point for L2 support teams, ensuring rapid and effective resolution of incidents and service requests.
  • Continuously monitor system health, performance, and resource utilization using advanced monitoring tools (e.g., Prometheus, Grafana, DCGM).
  • Manage user environments for AI/ML workloads, including container orchestration (e.g., Docker, Kubernetes) and workflow tools (e.g., MLflow, Kubeflow).
  • Define and enforce job scheduling policies, priorities, and partitions within Slurm and/or Kubernetes environments to ensure resource fairness, efficiency, and workload optimization.
  • Lead root cause analysis (RCA) of operational issues, contributing to post-mortem documentation and driving continuous improvement initiatives.
  • Provide mentorship and technical guidance to junior engineers, fostering skills development and knowledge sharing across teams.
  • Participate in on-call rotation as necessary.
  • Ensure adherence to security and operational policies, assisting in audits and maintaining documentation for change and incident management processes.

What we’re looking for

(a) Required skills / qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
  • Minimum of 8 years of experience in HPC operations, systems engineering, or DevOps roles, with at least 2 years in a leadership or ownership capacity.
  • Advanced expertise in configuring, optimizing, and maintaining complex HPC environments, including hardware, software, and storage systems.
  • Hands-on experience managing Slurm clusters and/or Kubernetes-based environments for AI/ML workloads.
  • In-depth knowledge of GPU resource management, workload schedulers, and performance tuning for AI/ML workloads.
  • Proficiency with monitoring and observability frameworks such as Prometheus, Grafana, and DCGM.
  • Strong scripting and automation skills, including Python, Bash, Ansible, and Terraform.
  • Solid understanding of Linux (RHEL/CentOS/Ubuntu), networking technologies (RDMA, InfiniBand, RoCE), and storage solutions (NFS, Lustre, Ceph).


