
Senior Infrastructure / HPC

VaporVM
Riyadh, KSA
Senior
2 days ago
HpcInfrastructure

Full Job Posting

Senior HPC / Infrastructure Engineer
========================================

---------------------

We are seeking a highly experienced **Senior HPC / Infrastructure Engineer** with proven expertise in designing, deploying, and operating enterprise-scale High-Performance Computing (HPC) and AI infrastructure environments.

This role is ideal for a hands-on technical leader who has built and managed production-grade HPC platforms, GPU clusters, Kubernetes ecosystems, and AI infrastructure from the ground up.

The successful candidate will play a critical role in architecting, optimizing, and maintaining mission-critical compute environments that support advanced AI/ML, data science, and high-performance workloads.

---------------------------

  • **RHCE – Red Hat Certified Engineer (Active)**
  • **CKA – Certified Kubernetes Administrator (Active)**

  • NVIDIA AI Blueprints

  • CUDA, GPU Drivers, and Performance Optimization

  • CI/CD Pipeline Design & Implementation

  • Infrastructure Automation

------------------------

  • Design, deploy, and operate large-scale HPC and AI infrastructure environments from bare metal through workload orchestration.
  • Architect and manage NVIDIA GPU platforms, including Base Command Manager (BCM), NVIDIA AI Enterprise, GPU Operator, and AI service enablement.
  • Configure, optimize, and maintain Slurm scheduling environments for high-throughput and GPU-intensive workloads.
  • Design and operate highly available Kubernetes clusters supporting AI/ML, analytics, and containerized workloads.
  • Enable and support NVIDIA NIM services and AI Blueprint deployments for enterprise AI initiatives.
  • Administer and optimize RHEL and Ubuntu environments, ensuring stability, security, and performance.
  • Develop and maintain infrastructure automation frameworks and CI/CD pipelines for platform and application deployment.
  • Optimize performance across compute, GPU, storage, networking, and cluster resources.
  • Implement monitoring, observability, alerting, capacity planning, and operational best practices.
  • Enforce security, patch management, access controls, and compliance standards across the infrastructure stack.
  • Lead troubleshooting, root cause analysis, and resolution of complex infrastructure and platform issues.

---------------------

  • 10+ years of hands-on experience in HPC, Linux infrastructure, and enterprise platform engineering.
  • Proven track record of building and operating production-scale HPC, GPU, or AI infrastructure environments.
  • Deep expertise in Kubernetes, Slurm, Linux administration, and NVIDIA AI technologies.
  • Strong understanding of distributed systems, workload scheduling, cluster management, and performance optimization.
  • Experience supporting AI/ML, data science, and high-performance computing workloads at scale.
  • Strong analytical, troubleshooting, and problem-solving skills.
  • Ability to work across infrastructure, platform, automation, and AI enablement domains.
  • Demonstrated ownership mindset with a history of delivering reliable, scalable, and high-performing solutions.
