Site Reliability Engineer

Salt
Abu Dhabi Emirate, UAE
contract
Mid-Senior
Today
engineering, design, project management, maintenance, quality control, technical

Full Job Posting

Site Reliability Engineer (SRE) – AI Platforms

📍 Abu Dhabi

⏳ 6-month initial contract - with extensions

Help Build AI Platforms That Never Sleep

We're partnering with one of the UAE's leading financial institutions to hire two Site Reliability Engineers (SREs) to support the delivery of a next-generation Agentic AI platform powering the future of digital banking.

This is far more than a traditional SRE role.

You'll sit at the intersection of Site Reliability Engineering, Test Automation, Cloud Infrastructure and AI Quality Assurance, ensuring enterprise AI services are resilient, secure, scalable and production-ready.

You'll build the automated frameworks that validate AI outputs while maintaining the reliability and performance of cloud-native applications running across Azure and AWS.

As a Site Reliability Engineer, you'll take ownership of production readiness across enterprise AI applications and cloud infrastructure.

You'll develop automated testing frameworks, define reliability standards, implement observability, and ensure AI-powered services consistently meet the performance, security and compliance standards expected within a highly regulated banking environment.

Working alongside AI Engineers, DevOps Engineers, Software Developers and Cloud Architects, you'll play a critical role in ensuring the platform performs reliably at scale.

Site Reliability Engineering

  • Design and implement resilient, highly available cloud platforms.
  • Build automated recovery and self-healing capabilities.
  • Conduct chaos engineering exercises to validate system resilience.
  • Perform load and stress testing to ensure applications can support banking-scale workloads.

Automated Testing & Quality Engineering

  • Develop automated regression testing frameworks using Python.
  • Integrate automated testing into CI/CD pipelines.
  • Validate both application functionality and infrastructure deployments.
  • Ensure every release meets enterprise quality standards before production.

AI Quality & Validation

  • Build automated frameworks to evaluate AI model responses.
  • Validate prompt behaviour and AI-generated outputs.
  • Measure AI performance using established evaluation techniques and custom benchmarks.
  • Help improve the reliability and consistency of enterprise AI services.

Observability & Monitoring

  • Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
  • Develop dashboards and proactive monitoring solutions.
  • Configure intelligent alerting to identify issues before they impact customers.
  • Analyse platform health, reliability and performance trends.

Security & Compliance

  • Automate security testing and compliance validation.
  • Ensure AI services comply with banking security, privacy and data residency requirements.
  • Support continuous platform governance and operational excellence.

Cloud Optimisation

  • Identify inefficient infrastructure and unnecessary cloud spend.
  • Improve platform performance while optimising operational costs.
  • Support FinOps initiatives across AI workloads.

Essential Skills & Experience

  • Strong commercial experience in Site Reliability Engineering, Platform Engineering or Test Automation
  • Expert Python development skills
  • Experience with automation frameworks including PyTest, Selenium or Robot Framework

  • Hands-on experience with Azure and/or AWS cloud platforms
  • Experience building automated CI/CD testing pipelines
  • Strong understanding of cloud networking, scalability and reliability
  • Experience with infrastructure validation and Infrastructure-as-Code concepts
  • Knowledge of monitoring and observability tools including Azure Monitor, AWS CloudWatch, Grafana and Prometheus
  • Experience with performance testing tools such as K6 or JMeter

AI Experience

We're looking for engineers who have worked on AI, Generative AI, Conversational AI or Agentic AI projects and understand how to evaluate AI system performance.

Experience with any of the following would be highly beneficial:

  • Prompt Engineering
  • LLM evaluation
  • Retrieval-Augmented Generation (RAG)
  • AI benchmarking
  • ROUGE/BLEU scoring
  • AI quality evaluation frameworks such as DeepEval, Ragas or LangSmith

Nice to Have

  • Bash scripting
  • GitHub Actions
  • Terraform (validation and governance)
  • Serverless cloud architectures
  • FinOps or cloud cost optimisation
  • Experience working within highly regulated industries such as banking or financial services

Why Join?

This is a chance to work on one of the most ambitious enterprise AI programmes in the Middle East.
