AI/ML Support Automation Analyst
Position Summary
The AI/ML Support Automation Analyst will be a key member of the KSL AI Support Team, focusing on MLOps infrastructure, container orchestration, and workflow automation at supercomputing scale. Working under the AI/ML Support Team Lead, this role is responsible for developing and maintaining secure, OCI-compliant container images, robust CI/CD pipelines, and cloud-native MLOps workflows that enable researchers to efficiently deploy and manage AI/ML workloads. The Analyst will bridge the gap between cutting-edge Kubernetes-based infrastructure and the diverse needs of the research community, contributing to governance, technical enablement, and community development initiatives.
1. MLOps and Container Development
- Provide timely and useful user support via telephone, walk-in, email, and ticketing system submissions for all types of inquiries.
- Maintain high customer service standards in dealing with and responding to user issues and questions.
- Develop and maintain secure, OCI-compliant, and HPC-ready AI/ML and data science software container images
- Design and implement robust MLOps workflows and pipelines at supercomputing scale
- Develop and maintain CI/CD pipelines for reproducible infrastructure and workflow deployment
- Design and deploy APIs for AI/ML services and inference endpoints
- Implement and manage Kubernetes-based orchestration, including CNI, CSI, and service mesh configurations and optimization
- Deploy and maintain container registries (Harbor) and model registries (MLflow, Kubeflow Model
2. Governance and Compliance Support
- Assist in computational readiness reviews for AI research projects
- Assist in AI model and artifact control reviews to ensure compliance with institutional standards
- Provide consultation to users on efficient resource usage for AI/ML and MLOps workflows
- Ensure container images and workflows comply with security policies and best practices
- Support the implementation of usage monitoring and reporting systems
3. Performance and Benchmarking
- Perform performance debugging and tuning of MLOps and cloud-native workflows
- Develop and maintain AI/ML and MLOps workload benchmarks for procuring new systems
- Create and maintain regression testing workloads for existing clusters
- Deploy and maintain observability and resource monitoring stacks using Prometheus, Grafana, NVIDIA DCGM, and Grafana Loki
- Contribute to technology evaluation and benchmarking exercises for future infrastructure investments
4. Training and Documentation
- Create comprehensive training content for users on MLOps platforms, Kubernetes, and containerization
- Develop and maintain high-quality user documentation for automation tools and workflows
- Support the delivery of workshops on CI/CD, container orchestration, and MLOps best practices
- Contribute to knowledge transfer initiatives within the KAUST research community
- Provide one-on-one consultation to researchers on efficient use of automation infrastructure
Competencies
• Experience
- Demonstrated experience developing robust and complex MLOps pipelines
- Hands-on experience with API design and deployment
- Experience developing robust and portable CI/CD pipelines for reproducible infrastructure and workflow deployment
- Experience supporting researchers or working in academic/research computing settings preferred
• Technical Skills - Essential
- Kubernetes: Strong expertise in Kubernetes, Container Network Interface (CNI), Container Storage Interface (CSI), and Service Mesh
- MLOps: Experience developing and maintaining MLOps pipelines and workflows
- CI/CD: Proficiency in building CI/CD pipelines for infrastructure and application deployment
- Containerization: Experience building secure, OCI-compliant container images
- API Development: Experience in API design, development, and deployment
- Programming: Proficiency in Python; experience with Go and Bash scripting
- Linux: Strong Linux/Unix systems administration skills
• Technical Skills - Desired
- Experience with ArgoCD, Airflow, Dask, and Spark for workflow orchestration
- Experience with Kubeflow, KServe, and Seldon for ML serving and pipelines
- Experience deploying and maintaining observability stacks (Prometheus, Grafana, NVIDIA DCGM, Grafana Loki)
- Knowledge of Model Context Protocol (MCP) and agentic frameworks
- Experience deploying inference services at scale
- Experience deploying and maintaining container registries (Harbor) and model registries (MLflow, Kubeflow Model Registry, Artifact Hub)
- Experience with GitOps practices and Infrastructure as Code (Terraform, Ansible)
- Experience with HPC schedulers (SLURM) and HPC-cloud integration
• Soft Skills
- Strong problem-solving and analytical abilities
- Excellent written and verbal communication skills in English
- Customer service mindset with patience for supporting diverse skill levels
- Ability to work independently and as part of a collaborative team
- Strong documentation and knowledge-sharing practices
- Cultural sensitivity for working in an international environment
Preferred Qualifications
- Experience in national laboratories or major research computing facilities
- Experience with GPU scheduling and resource management in Kubernetes
- Background in DevOps or Site Reliability Engineering (SRE)
- Contributions to open-source cloud-native or MLOps projects
- Publications or presentations on MLOps, Kubernetes, or automation topics
- Knowledge of Saudi Arabia's Vision 2030 and national AI initiatives
- Additional certifications: AWS/Azure/GCP, Terraform, NVIDIA DLI
Qualifications
- Bachelor's or Master's degree in Computer Science, Data Science, Computational Science, Artificial Intelligence, or a related field
- Certifications such as CKA (Certified Kubernetes Administrator), CKAD (Certified Kubernetes Application Developer), CKS (Certified Kubernetes Security Specialist), or CNPE (Certified Cloud Native Platform Engineer) are highly valued
Experience
- Minimum of 2 years of relevant experience