AI Infrastructure Engineer

Dautom, UAE
Mid-Senior

Full Job Posting

Overview

The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure to support advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps.

This role will focus on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across SDD’s Sovereign Cloud and hybrid/multi-cloud environments.

The engineer will enable enterprise-grade AI adoption for 200+ government entities.

GPU & AI Platform Architecture

Design and implement GPU-based compute clusters.

Define reference architectures for LLM hosting, Vector Databases, MLOps, and high-performance storage/networking.

Fully operational GPU-based AI infrastructure.

GPU Cluster Uptime and Performance Utilization.

Reduction in Cost per Training/Inference Workload.

GPU Cluster Operations

Install, configure, and optimize core components: CUDA, cuDNN, NCCL, NVIDIA Drivers, and GPU Operators.

Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).

High-availability architecture for all AI workloads.

Complete documentation and runbooks.

OpenShift AI (RHODS) Management

Deploy, configure, and maintain the Red Hat OpenShift AI (RHODS) platform for multi-tenant use.

Manage the integration of the NVIDIA GPU Operator for efficient GPU scheduling, and support data scientists with notebooks, training, and inference endpoints.

Production-ready OpenShift AI (RHODS) platform.

AI Project Onboarding Speed.

LLM & Model Serving

Build and manage infrastructure for hosting and serving open-source LLMs (Llama, Falcon, Mistral) and for supporting RAG pipelines, LoRA adapters, and vector databases (Milvus, pgvector).

Multi-model LLM serving environment for entities.

MLOps Pipeline Success Rate and Deployment Frequency.

MLOps & Automation

Implement IaC (Terraform, Ansible) and GitOps for the automated lifecycle management of the AI platform (node onboarding, scaling, model rollout/rollback).

Build robust MLOps pipelines for data prep, training, evaluation, and monitoring (using tools like MLflow/Kubeflow).

Infrastructure automation via Terraform & Ansible.

Automation Coverage for AI Infrastructure.

Required Qualifications & Experience

  • Experience: 7–12 years in Cloud Infrastructure, DevOps, ML Infrastructure, or Platform Engineering.

  • Deep Hands-On Expertise:

  • GPU Systems (NVIDIA A100/H100), Linux, Containers, and Kubernetes.
  • OpenShift AI (RHODS) or equivalent Kubernetes GPU orchestration.
  • LLM Hosting (Llama, Mistral, Falcon, etc.) and supporting Vector Databases/RAG systems.
  • Strong Experience In: TensorFlow, PyTorch, Hugging Face, distributed training (DDP, DeepSpeed), and MLOps stacks (MLflow, Kubeflow).

Essential Skills & Competencies

  • Technical: Deep understanding of GPU compute, HPC architectures, and ML performance profiling. Strong skills in IaC (Terraform/Ansible), CI/CD, and OpenShift/Kubernetes operators.
  • Soft Skills: Strong troubleshooting, optimization, and performance engineering mindset. Excellent cross-functional collaboration and documentation skills.

Preferred Certifications

  • NVIDIA Deep Learning / AI Infrastructure Certification
  • Red Hat OpenShift AI specialization
  • Kubernetes CKA/CKAD
  • Azure AI or Oracle Cloud AI certifications
  • Terraform & Ansible certifications
