Site Reliability Engineer - Observability

Mirai Arabian International Company Limited · Riyadh, KSA · 2 weeks ago · Senior

Senior · Part-time

Skills

engineering, design, project management, maintenance, quality control, technical


Via NaukriGulf

About This Role

We are hiring an SRE focused on observability, automation, and runtime reliability for AI platforms and internal agentic systems. This is not a generic SOC role. It is an engineering role for someone who builds telemetry, automates findings-to-fix loops, improves production readiness, and keeps AI systems measurable, resilient, and controllable in production.

Tech stack

Python for automation and workflow integration
Observability tooling: metrics, logs, traces, OpenTelemetry, Datadog or adjacent stacks
AWS logging, telemetry, IAM-aware diagnostics, and infrastructure scripting
CI/CD integration for runtime checks, rollback drills, and policy validation
Nice to have: Wiz, CrowdStrike, Orca, GuardDuty, WAF / RASP-style controls, MCP / agent telemetry

Responsibilities

Design and operate the telemetry and observability layer for AI platforms, including audit trails, tool-call logs, correlation IDs, traces, and runtime visibility across service boundaries.
Build automated findings-to-fix loops for AI and cloud platforms, integrating signals from tooling such as Wiz, Astrix, or future AI security products into pragmatic remediation workflows.
Implement reliability and hardening controls for internal AI systems, including alerting, health checks, rollback drills, kill-switch validation, rate limiting, and drift detection.
Codify detections, policies, and operational checks as code where they reduce toil, prevent regressions, and improve platform control.
Review platform and AI-application changes from a reliability and application-hardening perspective, especially around secrets, telemetry, external calls, risky MCP usage, and production readiness.
Own AI-platform-specific operational readiness and partner with central IT / EAS / SOC teams for escalations, handoffs, and shared incident workflows when needed.
Continuously improve production readiness through automation, post-incident learning, and repeatable playbooks for AI runtime issues.

