
Site Reliability Engineer – Datacentre AI Engineering - Riyadh, KSA

Qualcomm · Riyadh, KSA · 2 months ago

Skills

Machine Learning, Scala, VAT

About This Role

-------------

Job Area: Engineering Group, Engineering Group > Software Test Engineering

About Us

Qualcomm is enabling a world where everyone and everything can be intelligently connected.

You interact with products and technologies made possible by Qualcomm every day, including 5G-enabled smartphones that double as pro-level cameras and gaming devices, smarter vehicles and cities, and the technology behind the smart, connected factories that manufactured your latest purchase.

Qualcomm 5G and AI innovations are the power behind the connected intelligent edge.

You’ll find our technologies behind and inside the innovations that deliver significant value across multiple industries and to billions of people every day.

About the Role

We are recruiting for a Site Reliability Engineer – Datacentre AI Engineering at Qualcomm Technologies, Inc., located in Riyadh, Saudi Arabia.

This role centres on designing, maintaining, and scaling large-scale AI inference systems in a datacentre environment.

You will support critical AI use cases, ensuring that Qualcomm’s infrastructure is robust, reliable, and scalable for advanced machine learning workloads.

AI Infrastructure

  • Design and maintain large-scale AI inference systems supporting critical AI use cases.
  • Ensure reliability, operability, and scalability of the Qualcomm data-centre cluster.
  • Build software tools and ecosystems around AI software stacks.

AI & ML Engineering

  • Analyse software requirements and consult with architecture and hardware engineers.
  • Apply hands-on experience in building agentic AI solutions, LLM orchestration, and agentic AI libraries.
  • Collaborate with model, systems, and software teams to improve model performance on AI100 deployments.
  • Identify features that optimize workloads for multi-SoC and multi-card systems.

Site Reliability Engineering (SRE)

  • Implement SRE fundamentals: incident management, monitoring, and performance optimization.
  • Apply hands-on experience with MLOps tools and practices, ensuring seamless integration of ML models into production.
  • Establish operational maturity frameworks and sustainable incident response protocols.

Observability & Tooling

  • Build tools and frameworks to improve observability and define reliability metrics.
  • Monitor system health using Prometheus, Grafana, CloudWatch, and custom telemetry.
  • Create and maintain documentation and knowledge base articles.

Automation & CI/CD

  • Design automation tools to reduce manual processes and operational overhead.
  • Ensure CI/CD reliability for agent deployment cycles.
  • Apply Infrastructure as Code practices using tools like Terraform CDK.

Required Skillset

AI & Deep Learning

  • Experience with LLMs, NLP, Vision, Audio, and Recommendation systems.
  • Proficiency with LLM inference concepts: token streaming, batching, KV cache.
  • Proficiency in PyTorch, TensorFlow, JAX, and Ray.
  • Familiarity with GPU/TPU compute, ML frameworks, checkpointing, and distributed inference.

AI Agent Operations

  • Experience supporting GenAI or agentic AI applications in production.
  • Familiarity with LLM orchestration, prompt reliability, and RAG systems.
  • Exposure to LangChain, AutoGen, and similar agent orchestration frameworks.

Programming & Software Design

  • Strong programming skills in Python, with experience in PyTorch.
  • Scripting (Python, Bash), configuration management (Ansible/Terraform), and orchestration.

Systems & Infrastructure

  • Strong Linux fundamentals: shell, systemd, containers, networking (TLS, DNS, HTTP/2, gRPC).
  • Expertise in Slurm (configuration, scheduling, plugins/extensions) or equivalent.
  • Good knowledge of networking (RDMA, InfiniBand, RoCE, high-throughput, low-latency networks).
  • Experience operating and scaling distributed systems with high availability.

Observability & Monitoring

  • Hands-on experience with Prometheus, Grafana, ELK, Loki, Datadog, SIP, Homer.
  • Exposure to hardware health monitoring and system reliability.

DevOps & SRE Practices

  • Deep understanding of SDLC, release management, and system reliability.
  • Familiarity with CI/CD pipelines (Jenkins, GitLab) and Infrastructure as Code (Terraform CDK).

Qualifications & Experience

  • Bachelor's/Master's degree in Engineering, Machine Learning/AI, Information Systems, Computer Science, or a related field.
  • 4-5 years of software engineering or related work experience.

What's on Offer

Apart from working with great people, we offer the below:

  • Salary including housing and transport allowance
  • Stock (RSUs) and performance-related bonus
  • 16 weeks' fully paid maternity leave
  • 6 weeks' fully paid paternity leave
  • Employee stock purchase scheme
  • Child education allowance
  • Relocation and immigration support (if needed)
  • Life and medical insurance
  • Live+Well reimbursement for health and recreational membership fees

Minimum Qualifications

  • Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience.

OR

  • Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience.

OR

  • PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience.
  • 2+ years of work experience with Programming Language such as C, C++, Java, Python, etc.

*References to a particular number of years' experience are for indicative purposes only. Applications from candidates with equivalent experience will be considered, provided that the candidate can demonstrate an ability to fulfill the principal duties of the role and possesses the required competencies.

Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disability-accomodations@qualcomm.com or call Qualcomm's toll-free number found here. Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process. Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities. We will not respond here to requests for updates on applications or resume inquiries.)

Qualcomm expects its employees to abide by all applicable policies and procedures, including but not limited to security and other requirements regarding protection of Company confidential information and other confidential and/or proprietary information, to the extent those requirements are permissible under applicable law.

**To all Staffing and Recruiting Agencies**: Our Careers Site is only for individuals seeking a job at Qualcomm. Staffing and recruiting agencies and individuals being represented by an agency are not authorized to use this site or to submit profiles, applications or resumes, and any such submissions will be considered unsolicited. Qualcomm does not accept unsolicited resumes or applications from agencies. Please do not forward resumes to our jobs alias, Qualcomm employees or any other company location. Qualcomm is not responsible for any fees related to unsolicited resumes/applications.

If you would like more information about this role, please contact Qualcomm Careers.
