
Site Reliability Engineer - L2 Support

Open Innovation AI · Abu Dhabi Emirate, UAE · 3 days ago · Mid-Senior · Full-time
Kubernetes · Linux · Scala · VAT
Via LinkedIn

About This Role

Company Overview

Open Innovation AI is a global technology company that specializes in developing advanced solutions for managing AI workloads. Its flagship product, the Open Innovation Cluster Manager (OICM), orchestrates complex AI tasks efficiently across diverse infrastructures. The platform is hardware-agnostic, optimized for a range of GPUs and accelerator hardware, and facilitates seamless integration and scalability for enterprise AI applications. Open Innovation AI focuses on optimizing and simplifying AI workload management and making AI technologies accessible to organizations of all sizes. With its solutions, companies can reduce operational costs, accelerate time to value, and maximize their return on investment, ensuring that their AI strategies contribute directly to enhanced business outcomes.

Role Overview:

The Site Reliability Engineer – L2 is responsible for supporting and maintaining Open Innovation AI products and deployments across customer environments, including secure and isolated on-premises infrastructures. This role requires strong troubleshooting skills across hardware, Linux OS, Kubernetes, middleware, and application layers.

The engineer is expected to diagnose and resolve technical incidents, applying deep product knowledge and strong analytical skills to restore service availability. The role requires a solid understanding of operational processes such as Incident, Change, and Problem Management, along with a thorough grasp of the product architecture and how customers use it in production environments.

Role Responsibilities:

  • Provide L2 technical support for OICM deployments running in secure and isolated customer environments.
  • Diagnose and resolve incidents across hardware, Linux OS, Kubernetes clusters, containerized services, middleware, and platform components.
  • Perform detailed analysis of logs, system behavior, and application output to identify root causes and restore service functionality.
  • Review, validate, and execute approved changes following Change Management procedures, including system updates, configuration adjustments, and component upgrades.
  • Maintain a strong understanding of the architecture of OICM and other OI products, including their services, dependencies, and typical customer usage patterns.
  • Collaborate with L1 and Service Desk teams by providing technical guidance, clarifying issue details, and ensuring accurate ticket triage.
  • Escalate complex code-level or product-defect issues to L3 with complete diagnostic information and structured analysis.
  • Conduct on-site platform health assessments, validating Kubernetes cluster status, service integrity, system resources, and overall environment readiness.
  • Work closely with the Systems Engineering team to analyze and resolve performance issues across compute, storage, networking, and Kubernetes layers, and ensure that identified optimizations are reflected in the product and operational practices.
  • Update and maintain technical documentation including SOPs, runbooks, troubleshooting steps, and known-issue guides.
  • Participate in post-incident reviews, contributing technical insights and recommending improvements to prevent recurrence.
  • Ensure all activities adhere to established Incident, Change, and Problem Management processes.
  • Support on-call rotations and provide timely assistance during high-priority or critical incidents.

Required Experience & Qualifications

  • Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
  • 4–7 years of experience in L2 technical support, SRE, DevOps, Infrastructure Operations, or Platform Engineering roles within on-prem or secure environments.
  • Strong proficiency in Linux system administration, including troubleshooting, log analysis, service management, and performance tuning.
  • Hands-on experience with Kubernetes, container runtimes, and distributed systems deployed in on-prem environments.
  • Solid understanding of compute, storage, networking, and virtualization layers relevant to enterprise installations.
  • Practical experience with middleware and data-layer components such as Kafka, Redis, PostgreSQL, or similar technologies used in distributed on-prem environments.
  • Strong understanding of ITIL-aligned processes and experience operating within structured operational frameworks.
  • Ability to diagnose complex issues across multiple layers of the stack.
  • Experience working in secure, restricted, or isolated environments is an advantage.
  • Excellent analytical skills, communication abilities, and a methodical approach to troubleshooting.
  • Ability to produce clear technical documentation, including SOPs, runbooks, and investigation reports.
  • Certifications such as RHCSA/RHCE or CKA/CKAD/CKS.
