Site Reliability Engineering Officer
About This Role
Job Description
- Provide support for application incidents across digital platforms, working closely with Platform Engineering, Application Development, and customer support teams to ensure timely resolution according to established SLAs and escalation procedures.
- Operate and monitor the Elastic Observability stack — including Elasticsearch cluster health, Kibana, Fleet Server, APM Server, and Elastic Agent — deployed and managed via ECK on OKE.
- Assist with day-to-day Elasticsearch operations such as index lifecycle management (ILM), snapshot lifecycle management (SLM), data tier housekeeping (hot, warm, cold, frozen), and capacity monitoring.
- Troubleshoot telemetry ingestion issues across logs, metrics, traces, and synthetic monitors, ensuring consistent data collection from all platforms.
- Maintain and update Kibana dashboards, alerting rules, and saved objects under the guidance of the SRE Manager.
- Perform root cause analysis and participate in blameless post-incident reviews to improve system reliability and reduce recurrence.
- Collaborate with Platform Engineering to automate repetitive tasks, improve deployment pipelines, and enhance observability coverage using Terraform, Helm charts, and scripting.
- Develop and maintain support documentation, runbooks, and knowledge base articles aligned to standardized incident response procedures.
- Manage and prioritize incidents and requests via the ticketing system (Jira/ServiceNow), ensuring all incidents, requests, and resolutions are documented in the service management system.
- Participate in an on-call rotation and help reduce operational toil through automation and tooling.
- Monitor and report on key performance metrics related to incident management, including mean time to detect (MTTD) and mean time to resolve (MTTR).
- Collaborate with cross-functional teams and vendor partners to improve overall system reliability, observability maturity, and security posture.
Job Requirements
- Bachelor’s degree in Computer Science, IT, Engineering, or related field (or equivalent experience).
- 1–3 years of experience in IT operations, system administration, application support, DevOps, or SRE.
- Familiarity with observability tools such as the Elastic Stack (Elasticsearch, Kibana, etc.), including basic querying and dashboard usage.
- Knowledge of Linux systems and scripting (Bash, Python, or Go).
- Understanding of monitoring, logging, and alerting concepts.
- Experience with ITSM tools (ServiceNow, Jira, Zendesk) and ITIL practices.
- Strong grasp of incident, problem, and change management.
- Basic experience with cloud-native environments and container technologies such as Docker and Kubernetes.
- Strong critical thinking, troubleshooting, and communication skills.