AI Infrastructure Engineer
About This Role
Overview
The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure to support advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps.
This role will focus on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across SDD’s Sovereign Cloud and hybrid/multi-cloud environments.
The engineer will enable enterprise-grade AI adoption for 200+ government entities.
GPU & AI Platform Architecture
Key Responsibilities:
- Design and implement GPU-based compute clusters.
- Define reference architectures for LLM hosting, vector databases, MLOps, and high-performance storage and networking.
Deliverables:
- Fully operational GPU-based AI infrastructure.
KPIs:
- GPU cluster uptime and performance utilization.
- Reduction in cost per training/inference workload.
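The posting names uptime, utilization, and cost per workload as success metrics but does not define them. The sketch below shows one common way to compute them from period telemetry. The `ClusterPeriod` fields and the formulas are illustrative assumptions, not the employer's definitions.

```python
from dataclasses import dataclass


@dataclass
class ClusterPeriod:
    """Hypothetical telemetry for one reporting window (field names are illustrative)."""
    period_hours: float        # length of the reporting window in hours
    downtime_hours: float      # hours the cluster was unavailable
    gpu_count: int             # number of GPUs in the cluster
    busy_gpu_hours: float      # GPU-hours spent running workloads
    total_cost: float          # total infrastructure cost for the window
    workloads_completed: int   # training/inference jobs finished in the window


def uptime_pct(p: ClusterPeriod) -> float:
    """Availability as a percentage of the reporting window."""
    return 100.0 * (p.period_hours - p.downtime_hours) / p.period_hours


def utilization_pct(p: ClusterPeriod) -> float:
    """Share of all GPU-hours in the window that were used by workloads."""
    available_gpu_hours = p.gpu_count * p.period_hours
    return 100.0 * p.busy_gpu_hours / available_gpu_hours


def cost_per_workload(p: ClusterPeriod) -> float:
    """Average infrastructure cost per completed workload."""
    return p.total_cost / p.workloads_completed


def cost_reduction_pct(before: ClusterPeriod, after: ClusterPeriod) -> float:
    """Relative drop in cost per workload between two windows."""
    b, a = cost_per_workload(before), cost_per_workload(after)
    return 100.0 * (b - a) / b


if __name__ == "__main__":
    q1 = ClusterPeriod(720, 7.2, 64, 27648, 200000, 400)
    q2 = ClusterPeriod(720, 0.72, 64, 36864, 200000, 500)
    print(f"uptime={uptime_pct(q1):.1f}% utilization={utilization_pct(q1):.1f}%")
    print(f"cost reduction={cost_reduction_pct(q1, q2):.1f}%")
```

Measuring utilization against all GPU-hours (not only the hours the cluster was up) is a deliberate choice here: it makes downtime show up in both metrics. A team could reasonably define it the other way.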
GPU Cluster Operations
Key Responsibilities:
- Install, configure, and optimize core components: CUDA, cuDNN, NCCL, NVIDIA drivers, and GPU Operators.
- Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).
Deliverables:
- High-availability architecture for all AI workloads.
- Complete documentation and runbooks.
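GPU partitioning and scheduling on Kubernetes typically surface to workloads as extended resources that a Pod requests in its limits. The sketch below builds a minimal Pod manifest as a plain dictionary. It assumes the NVIDIA device plugin (deployed by the GPU Operator) advertises `nvidia.com/gpu` for whole GPUs and, under the "mixed" MIG strategy, per-profile names such as `nvidia.com/mig-1g.10gb`. The exact profile names depend on the GPU model and MIG configuration. The Pod name and image are hypothetical.

```python
import json


def gpu_pod_manifest(name: str, image: str,
                     resource: str = "nvidia.com/gpu", count: int = 1) -> dict:
    """Build a minimal Pod manifest that requests GPUs via an extended resource.

    For extended resources, Kubernetes uses the limit as the request,
    so only `limits` is set here.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "worker",
                    "image": image,
                    "resources": {"limits": {resource: count}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Whole GPU vs. a single MIG slice (profile name assumed; check the node's allocatable resources).
    full = gpu_pod_manifest("train-job", "registry.example.com/train:latest")
    slice_ = gpu_pod_manifest("infer-job", "registry.example.com/infer:latest",
                              resource="nvidia.com/mig-1g.10gb")
    print(json.dumps(slice_, indent=2))
```

The JSON output can be applied with `kubectl apply -f`, since kubectl accepts JSON as well as YAML manifests.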
OpenShift AI (RHODS) Management
Key Responsibilities:
- Deploy, configure, and maintain the Red Hat OpenShift AI (RHODS) platform for multi-tenant use.
- Manage the NVIDIA GPU Operator integration for efficient GPU scheduling, and support data scientists with notebooks, training, and inference endpoints.
Deliverables:
- Production-ready OpenShift AI (RHODS) platform.
KPIs:
- AI project onboarding speed.
LLM & Model Serving
Key Responsibilities:
- Build and manage infrastructure for hosting and serving open-source LLMs (Llama, Falcon, Mistral), and support RAG pipelines, LoRA adapters, and vector databases (Milvus, pgvector).
Deliverables:
- Multi-model LLM serving environment for government entities.
KPIs:
- MLOps pipeline success rate and deployment frequency.
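Pipeline success rate and deployment frequency are named as metrics without a stated definition. The sketch below assumes one simple reading: the share of pipeline runs that finish with a "succeeded" status, and the number of production deployments per week within a reporting window. The status strings and function names are hypothetical. In practice they would come from whatever pipeline tool is in use, such as Kubeflow or MLflow.

```python
from datetime import date, timedelta


def success_rate_pct(run_statuses: list[str]) -> float:
    """Percentage of pipeline runs whose final status is 'succeeded' (assumed label)."""
    if not run_statuses:
        raise ValueError("no pipeline runs in the window")
    succeeded = sum(1 for s in run_statuses if s == "succeeded")
    return 100.0 * succeeded / len(run_statuses)


def deployments_per_week(deploy_dates: list[date], start: date, end: date) -> float:
    """Average deployments per week over the inclusive window [start, end]."""
    days = (end - start).days + 1
    in_window = [d for d in deploy_dates if start <= d <= end]
    return len(in_window) / (days / 7)


if __name__ == "__main__":
    runs = ["succeeded"] * 9 + ["failed"]
    start = date(2025, 1, 1)
    deploys = [start + timedelta(days=i) for i in (0, 2, 4, 7, 9, 11)]
    print(success_rate_pct(runs), deployments_per_week(deploys, start, date(2025, 1, 14)))
```

Both functions take raw run and deployment records rather than pre-aggregated counts. That makes it easy to recompute the metrics for any window.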
MLOps & Automation
Key Responsibilities:
- Implement IaC (Terraform, Ansible) and GitOps for automated lifecycle management of the AI platform (node onboarding, scaling, model rollout/rollback).
- Build robust MLOps pipelines for data preparation, training, evaluation, and monitoring (using tools such as MLflow/Kubeflow).
Deliverables:
- Infrastructure automation via Terraform and Ansible.
KPIs:
- Automation coverage for AI infrastructure.
& Experience
- Experience: 7–12 years in Cloud Infrastructure, DevOps, ML Infrastructure, or Platform Engineering.
- Deep hands-on expertise in:
  - GPU systems (NVIDIA A100/H100), Linux, containers, and Kubernetes.
  - OpenShift AI (RHODS) or equivalent Kubernetes GPU orchestration.
  - LLM hosting (Llama, Mistral, Falcon, etc.) and supporting vector databases/RAG systems.
- Strong experience in: TensorFlow, PyTorch, Hugging Face, distributed training (DDP, DeepSpeed), and MLOps stacks (MLflow, Kubeflow).
Essential Skills & Competencies
- Technical: Deep understanding of GPU compute, HPC architectures, and ML performance profiling. Strong skills in IaC (Terraform/Ansible), CI/CD, and OpenShift/Kubernetes operators.
- Soft Skills: Strong troubleshooting, optimization, and performance engineering mindset. Excellent cross-functional collaboration and documentation skills.
Preferred Certifications
- NVIDIA Deep Learning / AI Infrastructure Certification
- Red Hat OpenShift AI specialization
- Kubernetes CKA/CKAD
- Azure AI or Oracle Cloud AI certifications
- Terraform & Ansible certifications