Senior AI Infrastructure & Platform Engineer - Riyadh, KSA

DeepSource Technologies

Riyadh, KSA

full-time

Senior

5 months ago

Cloud Computing, Infrastructure as Code (IaC), CI/CD, Kubernetes, Docker, Ansible

Full Job Posting

Role Overview

We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh.

In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.

Key Responsibilities

Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
Manage and operate GPU orchestration tools and platforms such as:
+ Nvidia Base Command Manager (critical)
+ Nvidia AI Enterprise Suite
+ Nvidia GPU and Network Operators
+ Nvidia NIMs and Blueprints
Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
+ Slurm (critical)
+ Vanilla Kubernetes
Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.

Requirements

Required Skills & Experience

Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
Hands-on experience with:
+ Nvidia Base Command Manager
+ Nvidia GPU/Network Operators, NIMs, Blueprints
Strong experience with Slurm and/or Kubernetes orchestration.
Solid Linux system administration skills — preferably on Ubuntu or similar distributions.
Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
Excellent troubleshooting and performance-tuning skills.
Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
Strong understanding of networking, security, resource allocation, and cluster management best practices.

Preferred Qualifications

Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.

More jobs at DeepSource Technologies

Infrastructure Subject Matter Expert for BCDR & DR Automation - KSA

Jeddah, KSA

contract

Role Overview: The Infrastructure SME plays a critical role in ensuring that the underlying IT infrastructure fully supports Business Continuity and Disaster Recovery (BCDR) objectives, with a strong focus on DR Automati

3 days ago

Business Continuity & Disaster Recovery Lead - KSA

Saudi Arabia, KSA

Executive

automation lifecycle (scope, validation, execution, testing ). · Collaborate with automation teams to ... opportunities to increase automation, reduce manual effort, and minimize downtime...

3 days ago

Senior Cybersecurity Engineer - VM - Saudi Nationals - Jeddah, KSA

Saudi Arabia, KSA

Senior

is seeking a Senior Cybersecurity Engineer specializing in Vulnerability Management (VM ... utilizing industry-leading tools. Review security configurations across DLP, XDR, PAM ...

3 days ago

Infrastructure Subject Matter Expert for BCDR & DR Automation - KSA

Riyadh, KSA

contract

3 days ago

Business Continuity & Disaster Recovery Lead - KSA

Jeddah, KSA

contract

The BCDR & DR Automation Lead is responsible for defining, governing, and executing the organization’s Business Continuity and Disaster Recovery strategy, with a strong focus on automation. The role ensures resilience of

4 days ago

Senior DBA - Saudi National - Riyadh, KSA

Riyadh, KSA

Senior, full-time

The Senior DBA Specialist is responsible for managing and maintaining enterprise databases to ensure high availability, performance, security, and reliability of business systems and applications. Key Responsibilities M

4 days ago

Senior DBA - Saudi National - Riyadh, KSA

Jeddah, KSA

Senior, full-time

4 days ago

Application SME (BCDR & DR Automation) - KSA

Jeddah, KSA

contract

Role Overview: The Application SME is responsible for ensuring application and database readiness for Business Continuity and Disaster Recovery (BCDR), with a strong focus on automation. The role drives application-level

4 days ago
