
Site Reliability Engineer - L2 Support

Open Innovation AI · Abu Dhabi Emirate, UAE · 1 week ago · Mid-Senior · Full-time

Skills

Kubernetes, Linux, Scala, VAT

About This Role

Company Overview

Open Innovation AI is a global technology company that specializes in developing advanced solutions for managing AI workloads. Its flagship product, the Open Innovation Cluster Manager (OICM), orchestrates complex AI tasks efficiently across diverse infrastructures. The platform is hardware-agnostic, optimized for a range of GPU and accelerator hardware, and enables seamless integration and scaling of enterprise AI applications. Open Innovation AI focuses on optimizing and simplifying AI workload management and on making AI technologies accessible to organizations of all sizes. Its solutions help companies reduce operational costs, accelerate time to value, and maximize return on investment, ensuring that their AI strategies contribute directly to better business outcomes.

Role Overview:

The Site Reliability Engineer – L2 is responsible for supporting and maintaining Open Innovation AI products and deployments across customer environments, including secure and isolated on-premises infrastructure. This role requires strong troubleshooting skills across the hardware, Linux OS, Kubernetes, middleware, and application layers.

The engineer is expected to diagnose and resolve technical incidents, applying deep product knowledge and strong analytical skills to restore service availability. The role requires a solid understanding of operational processes such as Incident, Change, and Problem Management, along with a thorough grasp of the product architecture and how customers use it in production environments.

Role Responsibilities:

  • Provide L2 technical support for OICM deployments running in secure and isolated customer environments.
  • Diagnose and resolve incidents across hardware, Linux OS, Kubernetes clusters, containerized services, middleware, and platform components.
  • Perform detailed analysis of logs, system behavior, and application output to identify root causes and restore service functionality.
  • Review, validate, and execute approved changes following Change Management procedures, including system updates, configuration adjustments, and component upgrades.
  • Maintain a strong understanding of the architecture of OICM and other Open Innovation AI products, including their services, dependencies, and typical customer usage patterns.
  • Collaborate with L1 and Service Desk teams by providing technical guidance, clarifying issue details, and ensuring accurate ticket triage.
  • Escalate complex, code-level, or product-defect issues to L3 with complete diagnostic information and structured analysis.
  • Conduct on-site platform health assessments, validating Kubernetes cluster status, service integrity, system resources, and overall environment readiness.
  • Work closely with the Systems Engineering team to analyze and resolve performance issues across the compute, storage, networking, and Kubernetes layers, and ensure that identified optimizations are reflected in the product and in operational practices.
  • Update and maintain technical documentation including SOPs, runbooks, troubleshooting steps, and known-issue guides.
  • Participate in post-incident reviews, contributing technical insights and recommending improvements to prevent recurrence.
  • Ensure all activities adhere to established Incident, Change, and Problem Management processes.
  • Support on-call rotations and provide timely assistance during high-priority or critical incidents.

Required Experience & Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
  • 4–7 years of experience in L2 technical support, SRE, DevOps, Infrastructure Operations, or Platform Engineering roles within on-prem or secure environments.
  • Strong proficiency in Linux system administration, including troubleshooting, log analysis, service management, and performance tuning.
  • Hands-on experience with Kubernetes, container runtimes, and distributed systems deployed in on-prem environments.
  • Solid understanding of compute, storage, networking, and virtualization layers relevant to enterprise installations.
  • Practical experience with middleware and data-layer components such as Kafka, Redis, PostgreSQL, or similar technologies used in distributed on-prem environments.
  • Strong understanding of ITIL-aligned processes and experience operating within structured operational frameworks.
  • Ability to diagnose complex issues across multiple layers of the stack.
  • Experience working in secure, restricted, or isolated environments is an advantage.
  • Excellent analytical skills, communication abilities, and a methodical approach to troubleshooting.
  • Ability to produce clear technical documentation, including SOPs, runbooks, and investigation reports.
  • Certifications such as RHCSA/RHCE or CKA/CKAD/CKS.
