AI Platform Lead
About the Role
Lead AI platform operations, incident management, and support for Azure API Management, LLM services, and voice AI. Manage observability, release changes, security compliance, and automation.
Full Job Posting
Company – TCS (MEA)
Location – Dubai
Job type – Full time
About Us
Tata Consultancy Services (TCS) is an IT services, consulting and business solutions organization that has been partnering with many of the world’s largest businesses in their transformation journeys for over 50 years.
TCS offers a consulting-led, cognitive powered, integrated portfolio of business, technology and engineering services and solutions.
This is delivered through its unique Location Independent Agile™ delivery model, recognized as a benchmark of excellence in software development.
A part of the Tata group, India's largest multinational business group, TCS has more than 616,171 of the world’s best-trained consultants, representing 157 nationalities, in 53 countries.
For more information, visit www.tcs.com and follow TCS news at @TCS_News.
API, AI Gateway and MCP Support
Operate and troubleshoot the Azure API Management layer that serves as both the enterprise API gateway and the MCP Gateway for Gernas.
Diagnose issues with API policies, authentication, authorisation, routing, rate limiting, quotas, caching, and backend connectivity.
Provide deep support for MCP servers, MCP tools, and the MCP Gateway pattern, including tool discovery, schema validation, protocol-level failures, and the coordination between MCP clients hosted in agents and the underlying tool endpoints.
Ensure that the gateway remains a secure, observable, and policy-compliant control plane for all AI traffic.
Observability and Performance Management
Use Comet Opik as the primary observability surface for agent and LLM execution, working with traces, prompts, agent execution paths, latency breakdowns, token usage, errors, and model quality indicators.
Build and maintain operational dashboards, alert rules, and correlation views that combine Opik telemetry with Azure Monitor, Application Insights, Log Analytics, and CloudWatch data.
Lead performance optimisation initiatives where trace evidence shows hotspots in prompts, tools, retrieval steps, or model selection, and ensure that observability coverage keeps pace with platform evolution.
Voice AI Support
Support the production operation of ElevenLabs-based voice AI capabilities, including speech generation, voice-agent connectivity, real-time audio session handling, and API consumption patterns.
Investigate latency, audio quality, dropped sessions, and integration failures across the voice channel and its dependent platforms, and coordinate with ElevenLabs and integration partners on upstream issues.
Release and Change Management
Validate releases prior to and immediately following deployment, exercising production verification scripts, smoke tests, and rollback procedures.
Maintain release readiness through clear configuration management, environment parity checks, and pre-deployment risk reviews.
Operate within FAB's change-management framework, ensuring that all production changes, including patches, upgrades, configuration adjustments, and model or prompt rotations, pass through the appropriate change controls and post-implementation review.
Security, Risk and Compliance
Uphold the security and compliance posture of Gernas and dependent AI products.
Manage identity and access controls, secrets, certificates, and managed identities across the platform; coordinate vulnerability remediation and patching cycles; and maintain audit evidence for internal audit, supervisory reviews, and external assurance.
Operate responsible-AI controls, including content filtering, PII and PCI detection, data egress controls, and model-access governance, and ensure that secure integration patterns are followed across every API, MCP tool, and external dependency.
Service Improvement and Automation
Drive continuous reduction of manual support effort through automation of routine operational tasks, self-healing patterns, monitoring enhancements, and proactive remediation.
Maintain a current and high-quality library of support playbooks, runbooks, knowledge articles, and standard operating procedures.
Identify and lead service-improvement initiatives that lift platform reliability metrics, reduce incident volume, and shorten mean time to resolution.
Stakeholder and Vendor Coordination
Operate as a credible technical counterpart to business units, engineering teams, the Cloud Platform team, Cybersecurity, Architecture, AI Governance, and Service Management.
Lead vendor engagement with Microsoft, AWS, Core42, ElevenLabs, and other technology partners on incidents, capacity reviews, roadmap items, and product issues, ensuring that vendor accountability is exercised and that escalations are progressed effectively.
On-Call and Operational Readiness
Participate in a 24×7 support model, including a structured on-call rotation, major incident leadership, disaster-recovery exercises, business-continuity testing, and production-readiness assessments for new agents, models, and integrations entering the platform.
Treat operational readiness as a release gate rather than an afterthought, and ensure that nothing reaches production without explicit operational sign-off.
Technical Skills: minimum 8–10 years of working experience mandatory
LLM Operations
Kubernetes and Containers
Observability
DevOps and Automation
Voice AI
Security and Governance
Education
Bachelor's degree in Computer Science, Artificial Intelligence, Information Technology, Engineering, or a closely related discipline.
A relevant master's degree in AI, Machine Learning, or Cloud Computing will be considered an advantage.
Professional Experience
Approximately 8–10 years of overall IT experience, with significant time spent in cloud application support, platform engineering, DevOps, Site Reliability Engineering, production operations, or senior technical support roles.
Of this, a minimum of 3–5 years of relevant experience supporting AI, machine learning, Generative AI, conversational AI, cloud-native platforms, or other data-intensive enterprise systems is required.
Banking and Regulated Environment
Experience operating within banking, financial services, government, telecommunications, or another highly regulated enterprise environment is strongly preferred.
Familiarity with regulatory expectations around data protection, model governance, audit evidence, and operational resilience is highly valued.
Technical Troubleshooting
Strong analytical and diagnostic skills, with demonstrated ability to troubleshoot complex distributed systems across applications, APIs, AI models, agents, cloud services, Kubernetes workloads, networking, identity, and third-party integrations.
Capable of reasoning from symptom to root cause across stack boundaries without losing fidelity.
Operational Skills
Proven track record of producing high-quality runbooks, support procedures, monitoring standards, operational dashboards, knowledge articles, root-cause analysis reports, and service-improvement plans.
Comfort operating with formal SLAs, OLAs, and change-management discipline.
Communication and Leadership
Strong written and verbal communication skills, with the ability to lead incidents under pressure, coordinate vendors and cross-functional stakeholders, mentor junior engineers in the AI operations function, and translate complex technical issues into clear narratives for both technical and senior business audiences.
Thank you for your interest in applying for this position with TCS.
We will review your application and get back to you if you are being considered for this opportunity.
Privacy Note
https://www.tcs.com/connect-with-tcs/privacy-policy