Platform Engineer

CloudJune Technologies · Abu Dhabi, UAE · 2 weeks ago

Mid-Senior · Full-time

Skills

Cloud Computing, Infrastructure as Code (IaC), CI/CD

About This Role

Role Overview

The AI Factory’s Engineers build the systems and products that deliver AI at government scale.

The Staff Platform Engineer makes that work possible — and makes it faster, safer, and more reliable.

This is a staff-level individual contributor role.

You will own the platform infrastructure that the AI Factory’s engineering teams build on: model serving, vector infrastructure, data pipelines, developer tooling, observability, and the deployment platform.

Your job is to ensure that every team has the right foundation to move quickly, that production systems are observable and operable, and that AI capabilities can scale across multiple government entities without being rebuilt from scratch each time.

The ideal candidate combines deep platform and infrastructure engineering experience with a genuine understanding of what AI systems demand at the infrastructure layer — the latency, throughput, and reliability constraints of LLM inference, the data characteristics of retrieval pipelines, and the operational complexity of systems that depend on model behaviour. You are a force multiplier for the engineers around you, not just a reliable executor.

Core Responsibilities

Own and evolve the AI Factory’s core platform infrastructure: model serving, vector database infrastructure, embedding pipelines, and the compute layer that AI workloads run on.
Design, build, and operate GPU-based inference infrastructure — including serving systems such as vLLM, TGI, or TensorRT-LLM — ensuring low-latency, high-throughput, and cost-efficient model serving at production scale.
Build and maintain the data infrastructure that supports AI workloads: document ingestion pipelines, vector stores (pgvector, Pinecone, Qdrant), object storage, and the ETL processes that keep them current and reliable.
Design and operate the deployment platform — containerised services, CI/CD pipelines, infrastructure-as-code — so that Engineers can ship to production quickly and safely.
Own observability across the AI stack: define and implement telemetry, logging, tracing, and alerting for AI services, with particular attention to LLM-specific metrics such as latency distributions, token throughput, error rates, and model behaviour drift.
Build shared internal tooling and platform abstractions that raise the productivity of engineers — reusable deployment patterns, model-serving clients, pipeline SDKs, and developer environment tooling.
Set and enforce platform standards for reliability, security, and scalability across the AI Factory’s engineering teams.
Partner with Engineers to understand their infrastructure requirements and translate them into durable platform capabilities, not one-off solutions.
Lead technical decisions on infrastructure architecture, evaluate new platform technologies, and ensure the team is building on the right foundations for the next two to three years.

Basic Qualifications

10+ years of experience in platform, infrastructure, or backend engineering, with a demonstrable track record of building and operating production systems at scale.
Deep experience with cloud infrastructure — particularly Azure, with AWS or GCP also valued — including compute, networking, storage, managed services, and cost optimisation.
Strong expertise in containerisation and orchestration: Docker, Kubernetes, and the operational patterns that make containerised AI workloads reliable in production.
Proven experience building and operating CI/CD pipelines and infrastructure-as-code — treating infrastructure as software, with the same standards for testing, review, and observability.
Experience designing and operating data pipelines at production scale: ingestion, transformation, storage, and the reliability patterns that keep them running under real load.
Strong proficiency in at least one backend systems language — Python, Java, or Go — with the ability to read and reason about code across the stack.
Experience with relational databases at production scale, particularly PostgreSQL — schema design, query optimisation, connection pooling, and operational management.
Hands-on experience with observability tooling — distributed tracing, structured logging, metrics, and alerting — and the ability to instrument systems so that failures are diagnosed quickly.
Strong written and verbal communication — you can explain infrastructure decisions, failure modes, and trade-offs clearly to both engineering peers and non-technical stakeholders.

Preferred Qualifications

Direct experience operating GPU infrastructure and LLM inference serving systems such as vLLM, TGI (Text Generation Inference), TensorRT-LLM, or equivalent — including performance tuning, batching strategies, and cost management.
Experience building or operating vector database infrastructure — pgvector, Pinecone, Qdrant, Weaviate, or similar — at production scale, including indexing strategies, query performance, and operational maintenance.
Familiarity with the infrastructure demands specific to RAG systems: document ingestion pipelines, embedding generation at scale, retrieval latency, and the operational complexity of keeping vector indices current.
Experience building internal developer platforms or platform-as-a-product, with an understanding of what gets platform tooling adopted rather than ignored.
Familiarity with AI/ML workload patterns: batch inference, streaming inference, model versioning, and the infrastructure trade-offs involved in serving large models reliably.
Experience operating real-time or streaming data infrastructure relevant to conversational AI or voice pipelines.
Security and compliance experience in a regulated or government-adjacent environment.

How We Work

**Ownership.**
Takes full responsibility for the platform from design through operation — not to the boundary of a ticket.
When engineers are blocked by infrastructure, this role unblocks them.
**Force multiplier.**
Measures success by the productivity and reliability it enables in others, not just the systems it directly builds.
The best platform work is invisible to those who depend on it.
**Evidence-driven.**
Grounds every infrastructure decision in evidence: load profiles, failure data, cost analysis, and real operational experience.
Avoids over-engineering and premature abstraction.
**Bias for operability.**
Builds systems that are easy to understand, debug, and operate — not just systems that work when conditions are ideal.
Runbooks, alerts, and dashboards are first-class deliverables.
**Closes the loop.**
Tracks whether platform changes improved the outcomes that matter — latency, reliability, developer velocity — and iterates until they do.
**Clear thinker.**
Writes and communicates clearly.
Can explain infrastructure decisions, failure modes, and operational trade-offs to engineers who do not share their infrastructure background.

Technical Depth Expectations

Candidates will be expected to demonstrate genuine depth in at least three of the following areas.
Conceptual familiarity is not sufficient.
LLM inference infrastructure — GPU serving systems, batching strategies, latency and throughput optimisation, cost management, and model versioning in production.
Vector infrastructure and retrieval pipelines — vector database operations, indexing strategies, embedding pipeline architecture, and retrieval performance at scale.
Cloud infrastructure at scale — compute, networking, storage, managed services, and cost optimisation on Azure, AWS, or GCP, with a strong understanding of the trade-offs between managed and self-operated infrastructure.
Platform and developer tooling — CI/CD, infrastructure-as-code, containerisation, and the design principles that make internal platforms actually useful to the engineers who depend on them.
Observability for AI systems — distributed tracing, structured logging, metrics design, alerting, and the specific telemetry challenges of non-deterministic AI workloads.
Data pipeline engineering — ingestion, transformation, reliability patterns, and the operational complexity of keeping AI data infrastructure current and trustworthy.
