Site Reliability Engineer
About This Role
Overview
Khazna was founded in 2012 and has grown rapidly to become the leading, trusted wholesale Data Center provider in the Middle East and North Africa region.
Through our Data Centers, we provide industry-benchmark levels of power supply and cooling services to serve the growing need for data center operations in the UAE and the wider region.
We are seeking a Site Reliability Engineer to support the reliability engineering program across multiple data centers in our fleet.
Reporting to the Reliability Manager, you will be responsible for monitoring system performance, driving preventive and predictive maintenance initiatives, leading root cause analysis efforts, and collaborating with cross-functional teams to minimize downtime and enhance infrastructure resilience.
Key Accountabilities
- Monitor real-time and historical performance metrics for critical power, cooling, and IT systems.
- Analyse system data to identify trends, failure modes, and reliability risks.
- Execute Root Cause Analyses (RCA) and Failure Mode & Effects Analyses (FMEA), then drive corrective and preventive actions.
- Develop and maintain condition-based and predictive maintenance routines, leveraging IoT, data analytics, and machine learning tools.
- Support preventive maintenance programs: schedule, document, and validate maintenance activities.
- Assist in asset lifecycle planning, including upgrades, decommissioning, and end-of-life strategies.
- Contribute to capacity runway assessments to forecast infrastructure needs.
- Implement and enforce availability management plans, risk assessments, and mitigation strategies.
- Ensure data collection and reporting processes for reliability KPIs (e.g., MTBF, MTTR, availability) are standardized and accurate.
- Prepare reliability reports and dashboards; present findings and recommendations to site leadership.
- Respond to and lead failure-response efforts during site incidents, ensuring rapid recovery and root-cause follow-through.
- Maintain compliance with industry standards and regulations (Uptime Institute, ISO, ASHRAE).
- Collaborate with Operations, Engineering, Facilities, and Vendors to integrate reliability best practices into day-to-day workflows.
- Propose continuous-improvement initiatives and pilot emerging reliability technologies.
- The job holder may be required to undertake additional duties that may reasonably be expected and that form part of the function of the job.
Minimum Qualifications
- Bachelor’s degree in Mechanical, Electrical, Reliability, or a related Engineering discipline.
Minimum Experience
- 3+ years of experience in reliability engineering, maintenance engineering, or a data center operations environment.
- Hands-on experience with RCA, FMEA, and predictive maintenance methodologies.
- Proficiency with monitoring platforms, data-analytics tools, and scripting (e.g., Python, R).
- Familiarity with IoT sensors, machine-learning frameworks, and condition-based monitoring systems.
- Knowledge of industry reliability standards and regulations (ISO, ASHRAE, Uptime Institute).
Job-Specific Skills (Generic / Technical)
- Strong analytical and problem-solving skills, with acute attention to detail.
- Effective communicator, able to present technical findings to diverse audiences.
- Project coordination skills and the ability to manage multiple reliability initiatives.
- Collaborative mindset, comfortable working in cross-functional teams.
- Self-starter with a continuous-improvement attitude and commitment to resilience.