
Site Reliability Engineer - Observability

Mirai Arabian International Company Limited · Riyadh, KSA · 2 weeks ago
Senior · Part-time

Skills

engineering, design, project management, maintenance, quality control, technical
Via NaukriGulf

About This Role

We are hiring an SRE focused on observability, automation, and runtime reliability for AI platforms and internal agentic systems. This is not a generic SOC role. It is an engineering role for someone who builds telemetry, automates findings-to-fix loops, improves production readiness, and keeps AI systems measurable, resilient, and controllable in production.

Tech stack

  • Python for automation and workflow integration
  • Observability tooling: metrics, logs, traces, OpenTelemetry, Datadog or adjacent stacks
  • AWS logging, telemetry, IAM-aware diagnostics, and infrastructure scripting
  • CI/CD integration for runtime checks, rollback drills, and policy validation
  • Nice to have: Wiz, CrowdStrike, Orca, GuardDuty, WAF / RASP-style controls, MCP / agent telemetry

Responsibilities

  • Design and operate the telemetry and observability layer for AI platforms, including audit trails, tool-call logs, correlation IDs, traces, and runtime visibility across service boundaries.
  • Build automated findings-to-fix loops for AI and cloud platforms, integrating signals from tooling such as Wiz, Astrix, or future AI security products into pragmatic remediation workflows.
  • Implement reliability and hardening controls for internal AI systems, including alerting, health checks, rollback drills, kill-switch validation, rate limiting, and drift detection.
  • Codify detections, policies, and operational checks as code where they reduce toil, prevent regressions, and improve platform control.
  • Review platform and AI-application changes from a reliability and application-hardening perspective, especially around secrets, telemetry, external calls, risky MCP usage, and production readiness.
  • Own AI-platform-specific operational readiness and partner with central IT / EAS / SOC teams for escalations, handoffs, and shared incident workflows when needed.
  • Continuously improve production readiness through automation, post-incident learning, and repeatable playbooks for AI runtime issues.
