Site Reliability Engineer / Cloud Infrastructure Engineer
About The Role
We are looking for a strong Site Reliability Engineer to own and continuously improve the reliability, scalability, security, and operational excellence of our production systems.
You will be responsible for the infrastructure and reliability layer behind our commerce platform, including AWS, Kubernetes, networking, databases, search infrastructure, observability, CI/CD, incident response, and production operations.
This is a hands-on role with high ownership, ideal for someone who can operate independently, solve complex infrastructure problems, and help build a stable foundation for a fast-growing product.
Our stack includes AWS, EKS/Kubernetes, PostgreSQL, Redis/Valkey, Elasticsearch/OpenSearch, RabbitMQ, API Gateway, WAF, load balancers, Docker, Go/.NET Core microservices, Terraform, web applications, and multiple third-party integrations.
Responsibilities
- Own the reliability, availability, performance, and security of our production infrastructure.
- Manage and improve our AWS cloud environment, including EKS, networking, load balancers, API Gateway, WAF, RDS/PostgreSQL, caching, and managed services.
- Operate and optimize Kubernetes workloads, deployments, scaling, resource usage, pod health, service discovery, ingress, and environment configuration.
- Maintain and improve database reliability, including PostgreSQL performance, backups, monitoring, replication awareness, connection management, and incident handling.
- Support and optimize Elasticsearch/OpenSearch usage for catalog/search workloads.
- Build and improve observability across the platform, including logs, metrics, dashboards, alerts, tracing, and actionable production monitoring.
- Improve incident response processes: detection, triage, mitigation, postmortems, and prevention of repeated issues.
- Strengthen CI/CD pipelines and release processes to make deployments safer, faster, and more reliable.
- Work closely with backend, mobile, product, and operations teams to support new features and ensure production readiness.
- Review architecture and infrastructure decisions with reliability, cost, security, and scalability in mind.
- Help secure the platform through AWS security best practices, WAF rules, IAM hygiene, network controls, secrets management, and vulnerability awareness.
- Monitor and optimize cloud costs without compromising reliability.
- Document operational procedures, runbooks, infrastructure decisions, and recovery processes.
Requirements
- Strong hands-on experience with AWS production environments.
- Solid experience with Kubernetes, Docker, deployments, services, ingress, scaling, and troubleshooting.
- Strong understanding of networking fundamentals: DNS, TLS, load balancing, routing, security groups, firewalls, private/public networking, and HTTP traffic flow.
- Experience operating PostgreSQL in production, including performance troubleshooting, backups, monitoring, and connection-related issues.
- Experience with Elasticsearch or OpenSearch in production environments.
- Good understanding of observability: metrics, logs, alerts, dashboards, tracing, SLIs/SLOs, and incident detection.
- Experience with CI/CD pipelines and modern deployment workflows.
- Ability to troubleshoot complex production issues across application, infrastructure, database, and network layers.
- Strong sense of ownership, clear communication, and the ability to operate calmly during incidents.
- Comfortable working in a fast-paced startup environment where priorities can move quickly.
What Success Looks Like
Within your first few months, you will help us make the platform more stable, observable, secure, and predictable.
You will improve our production visibility, reduce recurring incidents, strengthen infrastructure ownership, and create clear operational standards for deployments, alerts, incident handling, and recovery.
We are looking for someone who does not just “manage servers,” but actively improves the engineering foundation of the company.
Location
Candidates based in Doha are preferred, but remote candidates will be considered if they are strong, reliable, and able to work with high ownership and clear communication.
More from this employer
More jobs at Nology Store | نولوجي
Junior Accountant
Doha, QAT
Location: Qatar only Salary: Around QAR 6,000 per month We are looking for a Junior Accountant based in Qatar with accounting knowledge and audit experience. Requirements Must currently be living in Qatar Must have aud
Senior Accountant
Doha, QAT
Location: Qatar only Salary: QAR 10,000 – 12,000, depending on experience We are looking for a Senior Accountant based in Qatar with strong accounting and audit experience. Requirements Must be living in Qatar Must hav
Senior Mobile Engineer (Flutter)
Doha, QAT
About Nology Nology is a high-performance quick commerce platform based in Qatar, serving thousands of daily users. We are building a modern, scalable system with a strong focus on speed, reliability, and product quality