
Site Reliability Engineering Officer

Takamol Holding · Riyadh, KSA · 1 week ago · Entry · Full-time
Elasticsearch · Git · Jira · Scala · Terraform
Via LinkedIn

About This Role

Job Description

  • Provide support for application incidents across digital platforms, working closely with Platform Engineering, Application Development, and customer support teams to ensure timely resolution according to established SLAs and escalation procedures.
  • Operate and monitor the Elastic Observability stack — including Elasticsearch cluster health, Kibana, Fleet Server, APM Server, and Elastic Agent — deployed and managed via ECK on OKE.
  • Assist with day-to-day Elasticsearch operations such as index lifecycle management (ILM), snapshot lifecycle management (SLM), data tier housekeeping (hot, warm, cold, frozen), and capacity monitoring.
  • Troubleshoot telemetry ingestion issues across logs, metrics, traces, and synthetic monitors, ensuring consistent data collection from all platforms.
  • Maintain and update Kibana dashboards, alerting rules, and saved objects under the guidance of the SRE Manager.
  • Perform root cause analysis and participate in blameless post-incident reviews to improve system reliability and reduce recurrence.
  • Collaborate with Platform Engineering to automate repetitive tasks, improve deployment pipelines, and enhance observability coverage using Terraform, Helm charts, and scripting.
  • Develop and maintain support documentation, runbooks, and knowledge base articles aligned to standardized incident response procedures.
  • Manage and prioritize incidents and requests via the ticketing system (Jira/ServiceNow), ensuring all incidents, requests, and resolutions are documented in the service management system.
  • Participate in an on-call rotation and help reduce operational toil through automation and tooling.
  • Monitor and report on key performance metrics related to incident management, including mean time to detect (MTTD) and mean time to resolve (MTTR).
  • Collaborate with cross-functional teams and vendor partners to improve overall system reliability, observability maturity, and security posture.

Job Requirements

  • Bachelor’s degree in Computer Science, IT, Engineering, or related field (or equivalent experience).
  • 1–3 years of experience in IT operations, system administration, application support, DevOps, or SRE.
  • Familiarity with observability tools such as the Elastic Stack (Elasticsearch, Kibana, etc.), including basic querying and dashboard usage.
  • Knowledge of Linux systems and scripting (Bash, Python, or Go).
  • Understanding of monitoring, logging, and alerting concepts.
  • Experience with ITSM tools (ServiceNow, Jira, Zendesk) and ITIL practices.
  • Strong grasp of incident, problem, and change management.
  • Basic experience with cloud-native environments and containers, such as Docker and Kubernetes.
  • Strong critical thinking, troubleshooting, and communication skills.
