SRE - Appzone + Dotnet

DICETEK LLC, UAE
Yesterday

Full Job Posting

Overview

We’re looking for a talented **Site Reliability Engineer (SRE)** to keep our systems running smoothly, reliably, and at scale. Through smart **automation**, deep **observability**, and a calm head in a crisis, you’ll help us balance **speed**, **compliance**, and **stability**, working alongside **DevOps**, **Cloud**, **Quality Engineering**, and **Product** teams to drive continuous improvements in **performance**, **security**, and **resilience**.

You’ll play a key role in enhancing reliability, accelerating delivery, and ensuring seamless digital experiences for the bank’s customers.

This role reports directly to our **Lead SRE / Tribe Executive Manager**.

What You Will Be Doing

  • Define and implement **SLIs / SLOs** and **error budgets** for business-critical digital banking services.
  • Build actionable **observability** (metrics, logs, traces, dashboards, and alerts) using **Dynatrace**, **Prometheus**, **Grafana**, and **ELK**, while reducing alert fatigue.
  • Leverage **AI-driven insights** and **anomaly detection** (**Dynatrace Davis AI** or equivalent **AIOps** platform) to proactively predict and resolve reliability issues before impact.
  • Lead **incident management** — from on-call triage and root-cause analysis to blameless postmortems with actionable follow-ups.
  • Improve deployment safety with robust **rollout / rollback strategies**, **canary** and **blue-green deployments**, and **production readiness reviews**.
  • Support and optimize **microservices-based architectures**, ensuring **service reliability**, **scalability**, and **inter-service resilience**.
  • Conduct **capacity planning**, **performance tuning**, and **resilience testing**, optimizing for both reliability and cost efficiency.
  • **Automate operational toil** — from runbooks and remediation scripts to proactive health checks and self-healing workflows.
  • Collaborate with **DevOps** to embed **reliability gates** and validations into **CI / CD pipelines** (**GitHub Actions**, **Jenkins**, **GitLab CI / CD** or **Azure DevOps**).
  • Own and evolve the **observability** and **AIOps stack**, driving intelligent automation and predictive alerting capabilities.
  • Maintain high-quality **documentation**, **playbooks**, and **operational standards** across environments.
  • Ensure **operational compliance** and **security alignment** with internal controls and regulatory standards.
  • Analyze **system performance**, **availability**, and **cost data** to continually optimize operations.
  • Provide **reliability support** and escalation guidance for critical production systems during major incidents.

Experience and Qualifications

  • **5+ years of experience** in **SRE** or **DevOps** roles, building and managing large-scale, high-availability systems across **banking**, **fintech**, **e-commerce**, or other data-intensive digital ecosystems.
  • Bachelor’s degree in **Computer Science** or equivalent technical experience.
  • Strong experience with **Linux environments** and **performance troubleshooting**.
  • Proven expertise in **Terraform** and **Infrastructure as Code (IaC)** methodologies.
  • Proficiency with **Kubernetes** and **container orchestration** in **microservices** environments.
  • Hands-on experience with **AWS** (preferred); exposure to **Azure** or **GCP** is an advantage.
  • Deep knowledge of **Dynatrace (AIOps, Davis AI)**, **Prometheus**, **Grafana**, and the **ELK stack**.
  • Experience implementing **AI / ML-driven reliability** or **automation solutions** (**AIOps**, anomaly detection, predictive alerting).
  • Practical understanding of **CI / CD pipelines** (**GitHub Actions**, **Jenkins**, **GitLab CI / CD** or **Azure DevOps**).
  • Experience with **Kafka** and **RabbitMQ** messaging, **Redis**, and **Aurora** / **RDS** databases.
  • Strong scripting or programming skills in **Python**, **Bash**, or **Go**.

The Ideal Candidate

  • **Organized**, structured, and meticulous in approach.
  • Experienced in **cross-functional collaboration** and working with distributed teams.
  • Strong analytical mindset with **excellent troubleshooting skills** for complex production systems.
  • **Calm and composed communicator** under pressure, capable of leading during high-impact incidents.
  • **Proactive problem-solver** who anticipates issues and drives preventive improvements.
  • Passionate about **AI-driven automation**, **observability**, and **reliability engineering**.
  • Continuously learning, keeping up to date with **cloud-native**, **microservices**, and **SRE best practices**.
  • A **collaborative** and **adaptable team player** who thrives in a fast-paced, regulated environment and is passionate about building reliable, scalable systems that empower digital banking innovation.
