Site Reliability Engineer

Insight Global
Toronto, UAE
contract
Mid-Senior
4 days ago
engineering, design, project management, maintenance, quality control, technical

Full Job Posting

Required Skills & Experience

  • Strong hands-on experience with observability and monitoring tools, including Dynatrace (with deep expertise in Real User Monitoring – RUM) to track system performance and user behaviour across distributed environments
  • Experience working with Elasticsearch for log aggregation, indexing, and querying large-scale system and application data
  • Proven ability to build and maintain dashboards and visualizations using Grafana to monitor system health, performance metrics, and operational KPIs
  • Hands-on experience with containerized and cloud-native platforms, particularly OpenShift, including deployment, scaling, and management of microservices-based applications
  • Experience implementing and supporting OpenTelemetry for distributed tracing, metrics collection, and observability across complex systems
  • Solid experience working with MongoDB, including performance tuning, data modelling, and managing high-throughput data environments
  • Strong background in Site Reliability Engineering principles, including high availability, incident response, monitoring, and system resilience

Nice to Have Skills & Experience

  • Experience working with PostgreSQL for relational data management and analytics workloads
  • Exposure to CI/CD pipelines, ideally using GitHub Actions for automated testing, build, and deployment workflows
  • Familiarity with HashiCorp Vault for secrets management, security, and compliance in distributed environments

Job Description

Insight Global is seeking a Site Reliability Engineer (SRE) to support a high-impact, next-generation platform focused on real-time transcription and summarization powered by Large Language Models (LLMs).

These solutions operate in distributed branch environments and must reliably process and analyze live communication streams (e.g., phone, Webex), introducing unique challenges around latency, scalability, and resilience.

This role is centred on ensuring the reliability, performance, and scalability of complex, real-time data pipelines and distributed systems.

You will be responsible for designing and maintaining high-availability infrastructure that supports continuous ingestion, processing, and summarization of live communication data.

Day to day, you will monitor system health using tools such as Dynatrace, Grafana, and Elasticsearch, proactively identifying performance bottlenecks and reliability risks.

You will play a key role in implementing end-to-end observability using OpenTelemetry, enabling deep visibility into system behaviour across microservices and environments.

You will also contribute to incident management and response, driving root cause analysis, implementing preventative measures, and enhancing system resilience.

Working closely with engineering and platform teams, you will help optimize infrastructure within OpenShift, improve automation, and ensure scalable, fault-tolerant deployments.

This is a highly collaborative, hands-on role requiring a strong understanding of SRE principles, cloud-native architecture, and real-time data processing systems.

The ideal candidate is proactive, detail-oriented, and thrives in fast-paced, high-availability environments where reliability is critical to business success.

This is a unique opportunity to work on cutting-edge AI-driven platforms and play a key role in ensuring their performance and stability at scale.

*We may use artificial intelligence tools to assist with the screening, assessment, or selection of potential applicants for this position.*
