Platform Engineer
About This Role
Role Overview
The AI Factory’s Engineers build the systems and products that deliver AI at government scale.
The Staff Platform Engineer makes that work possible — and makes it faster, safer, and more reliable.
This is a staff-level individual contributor role.
You will own the platform infrastructure that the AI Factory’s engineering teams build on: model serving, vector infrastructure, data pipelines, developer tooling, observability, and the deployment platform.
Your job is to ensure that every team has the right foundation to move quickly, that production systems are observable and operable, and that AI capabilities can scale across multiple government entities without being rebuilt from scratch each time.
The ideal candidate combines deep platform and infrastructure engineering experience with a genuine understanding of what AI systems demand at the infrastructure layer — the latency, throughput, and reliability constraints of LLM inference, the data characteristics of retrieval pipelines, and the operational complexity of systems that depend on model behaviour. You are a force multiplier for the engineers around you, not just a reliable executor.
Core Responsibilities
- Own and evolve the AI Factory’s core platform infrastructure: model serving, vector database infrastructure, embedding pipelines, and the compute layer that AI workloads run on.
- Design, build, and operate GPU-based inference infrastructure — including serving systems such as vLLM, TGI, or TensorRT-LLM — ensuring low-latency, high-throughput, and cost-efficient model serving at production scale.
- Build and maintain the data infrastructure that supports AI workloads: document ingestion pipelines, vector stores (pgvector, Pinecone, Qdrant), object storage, and the ETL processes that keep them current and reliable.
- Design and operate the deployment platform — containerised services, CI/CD pipelines, infrastructure-as-code — so that Engineers can ship to production quickly and safely.
- Own observability across the AI stack: define and implement telemetry, logging, tracing, and alerting for AI services, with particular attention to LLM-specific metrics such as latency distributions, token throughput, error rates, and model behaviour drift.
- Build shared internal tooling and platform abstractions that raise the productivity of engineers — reusable deployment patterns, model-serving clients, pipeline SDKs, and developer environment tooling.
- Set and enforce platform standards for reliability, security, and scalability across the AI Factory’s engineering teams.
- Partner with Engineers to understand their infrastructure requirements and translate them into durable platform capabilities, not one-off solutions.
- Lead technical decisions on infrastructure architecture, evaluate new platform technologies, and ensure the team is building on the right foundations for the next two to three years.
Basic Qualifications
- 10+ years of experience in platform, infrastructure, or backend engineering, with a demonstrable track record of building and operating production systems at scale.
- Deep experience with cloud infrastructure — particularly Azure, with AWS or GCP also valued — including compute, networking, storage, managed services, and cost optimisation.
- Strong expertise in containerisation and orchestration: Docker, Kubernetes, and the operational patterns that make containerised AI workloads reliable in production.
- Proven experience building and operating CI/CD pipelines and infrastructure-as-code — treating infrastructure as software, with the same standards for testing, review, and observability.
- Experience designing and operating data pipelines at production scale: ingestion, transformation, storage, and the reliability patterns that keep them running under real load.
- Strong proficiency in at least one backend systems language — Python, Java, or Go — with the ability to read and reason about code across the stack.
- Experience with relational databases at production scale, particularly PostgreSQL — schema design, query optimisation, connection pooling, and operational management.
- Hands-on experience with observability tooling — distributed tracing, structured logging, metrics, and alerting — and the ability to instrument systems so that failures are diagnosed quickly.
- Strong written and verbal communication — you can explain infrastructure decisions, failure modes, and trade-offs clearly to both engineering peers and non-technical stakeholders.
Preferred Qualifications
- Direct experience operating GPU infrastructure and LLM inference serving systems such as vLLM, TGI (Text Generation Inference), TensorRT-LLM, or equivalent — including performance tuning, batching strategies, and cost management.
- Experience building or operating vector database infrastructure — pgvector, Pinecone, Qdrant, Weaviate, or similar — at production scale, including indexing strategies, query performance, and operational maintenance.
- Familiarity with the infrastructure demands specific to RAG systems: document ingestion pipelines, embedding generation at scale, retrieval latency, and the operational complexity of keeping vector indices current.
- Experience building internal developer platforms or platform-as-a-product — with an understanding of what makes platform tooling actually adopted versus ignored.
- Familiarity with AI/ML workload patterns: batch inference, streaming inference, model versioning, and the infrastructure trade-offs involved in serving large models reliably.
- Experience operating real-time or streaming data infrastructure relevant to conversational AI or voice pipelines.
- Security and compliance experience in a regulated or government-adjacent environment.
How We Work
- **Ownership.** Takes full responsibility for the platform from design through operation, not just up to the boundary of a ticket. When engineers are blocked by infrastructure, this role unblocks them.
- **Force multiplier.** Measures success by the productivity and reliability it enables in others, not just the systems it directly builds. The best platform work is invisible to those who depend on it.
- **Evidence-driven.** Grounds every infrastructure decision in evidence: load profiles, failure data, cost analysis, and real operational experience. Avoids over-engineering and premature abstraction.
- **Bias for operability.** Builds systems that are easy to understand, debug, and operate, not just systems that work when conditions are ideal. Runbooks, alerts, and dashboards are first-class deliverables.
- **Closes the loop.** Tracks whether platform changes improved the outcomes that matter (latency, reliability, developer velocity) and iterates until they do.
- **Clear thinker.** Writes and communicates clearly. Can explain infrastructure decisions, failure modes, and operational trade-offs to engineers who do not share their infrastructure background.
Technical Depth Expectations
Candidates will be expected to demonstrate genuine depth in at least three of the following areas. Conceptual familiarity is not sufficient.
- LLM inference infrastructure — GPU serving systems, batching strategies, latency and throughput optimisation, cost management, and model versioning in production.
- Vector infrastructure and retrieval pipelines — vector database operations, indexing strategies, embedding pipeline architecture, and retrieval performance at scale.
- Cloud infrastructure at scale — compute, networking, storage, managed services, and cost optimisation on Azure, AWS, or GCP, with a strong understanding of the trade-offs between managed and self-operated infrastructure.
- Platform and developer tooling — CI/CD, infrastructure-as-code, containerisation, and the design principles that make internal platforms actually useful to the engineers who depend on them.
- Observability for AI systems — distributed tracing, structured logging, metrics design, alerting, and the specific telemetry challenges of non-deterministic AI workloads.
- Data pipeline engineering — ingestion, transformation, reliability patterns, and the operational complexity of keeping AI data infrastructure current and trustworthy.