AI/ML/DevOps Engineer
About This Role
AI/ML/DevOps Engineer — Abu Dhabi
Own the MLOps platform that powers enterprise AI at scale.
A leading Abu Dhabi-based holding group is hiring an AI/ML/DevOps Engineer to architect, operate, and continuously improve the end-to-end MLOps and LLMOps platform for a flagship enterprise AI programme. You'll be the technical authority who reviews, governs, and signs off on CI/CD, data and model pipelines, infrastructure, deployment, security, and observability, ensuring secure, scalable, and compliant delivery across environments. The role reports to the AI Product Manager within the AI Excellence Centre.
What you'll own:
- Own the end-to-end MLOps/LLMOps reference architecture: ingestion → validation → feature and embedding pipelines → training and fine-tuning → evaluation → registry → deployment → monitoring — including RAG and agentic workflows.
- Architect, review, and approve CI/CD for ML and LLM systems: code, data, prompt, and model artifact versioning; build and release pipelines (Azure DevOps / GitHub Actions); automated unit, integration, and contract testing; and promotion/rollback (blue-green / canary) across dev, test, and production.
- Define and govern AI platform foundations on Azure: IaC (Bicep/Terraform), AML workspaces, AKS GPU node pools and scheduling, private networking (VNet integration / Private Link), identity (Managed Identities / PIM), secrets (Key Vault), and encryption and data residency controls.
- Review and approve production deployment patterns for model and LLM serving (AKS / KServe / AML online endpoints), including containerization, inference optimization (batching, quantization where applicable), API management, autoscaling, resiliency, and RAG runtime components (vector store, retriever, re-ranker, cache).
- Own observability and reliability for AI services: OpenTelemetry tracing, prompt and inference logs (with PII controls), latency/throughput/cost metrics, SLOs/SLIs, model performance monitoring, data and model drift detection, and LLM evaluations (quality, hallucination checks, toxicity and safety guardrails) with incident playbooks.
- Establish and enforce MLOps/LLMOps governance: dataset lineage, data quality validation (schema and tests), feature store and model registry standards, artifact provenance (SBOM/SLSA), vulnerability scanning, approval gates for model and prompt releases, and compliance-aligned documentation for model risk (intended use, limitations, evaluation results).
- Enable delivery squads — including the primary delivery partner — with "golden path" templates (AML pipelines, RAG blueprints, evaluation harnesses), reusable IaC modules, and coding standards; run deep technical design and architecture reviews and sign off production readiness (capacity, security, observability, DR) for all AI releases.
- Support the Run & Operate model by enabling issue triage and minor enhancement workflows (ticket intake → fix → controlled release), ensuring changes follow the same release governance and quality gates.
- Own the Operational Acceptance Gate: no production release without runbooks, monitoring dashboards, incident playbooks, access model, and DR test evidence.
Scope clarity: you provide platform standards, review, and sign-off — you do not replace the delivery partner's engineering, but you enforce the "golden path" and production readiness bar.
What you bring:
- 8–10 years across DevOps, SRE, and/or ML Engineering with production systems on Azure.
- Hands-on experience with Azure ML, AKS, Azure DevOps or GitHub Actions, IaC, and containerization.
- Bachelor's in Computer Science, Engineering, or equivalent experience.
Core skills required:
- Python, YAML, Docker, Helm, KQL; awareness of GitOps tooling (Argo CD / Flux).
- Security in CI/CD: SAST/DAST, supply-chain security (Sigstore), secrets management (Key Vault).
- Performance testing (k6 / JMeter), contract testing, and E2E testing.
- Cost optimization and capacity planning for GPU and CPU workloads.
- Strong grasp of model serving, inference optimization, and observability tooling.
Required certification:
- Microsoft Certified: DevOps Engineer Expert (AZ-400)
Preferred certifications:
- Microsoft Certified: Azure Administrator Associate (AZ-104) or Azure Solutions Architect Expert (AZ-305)
- CKA or CKAD (Kubernetes)
Location: Abu Dhabi, UAE
Employment Type: Permanent, Full-time
Experience: 8–10 years
Salary Range: AED 25,000–33,000 per month