Senior QA Automation Engineer

cander
Abu Dhabi, UAE
Full-time
Mid-Senior
Today
Python, SQL, Cloud Platforms, CI/CD, DevOps, Jenkins

Full Job Posting

Overview

Headquartered in Abu Dhabi, the company specializes in developing defense-grade artificial intelligence solutions that meet high standards of reliability, with large language models and predictive systems tested rigorously.

The company focuses on industries requiring mission-critical AI, integrating automated validation frameworks that meet strict evaluation criteria while supporting seamless ERP integrations and real-time data integrity.

Their work emphasizes Agile development combined with formal Systems Engineering to deliver secure and traceable AI validation methods for high-stakes applications.

Job Summary

We are seeking a Senior QA Automation Engineer to lead the validation and verification strategies for our AI transformation initiative.

In this critical role, you will define and enforce rigorous standards for non-deterministic AI systems, ensuring Large Language Models (LLMs) and predictive engines meet the stringent reliability requirements of the defense sector.

Your expertise will bridge Agile development methodologies with formal Systems Engineering practices, creating automated testing frameworks that validate both software functionality and AI behaviors against established Ground Truth datasets.

Working within a structured Stage Gate delivery model, you will ensure AI agents successfully pass Test Readiness Reviews (TRR) and Functional Configuration Audits (FCA) before deployment.

This role demands a deep understanding of AI evaluation techniques, including hallucination detection, consistency validation, and factual accuracy assessment, while also addressing integration challenges with enterprise systems like SAP S/4HANA and Ariba.

Your work will directly contribute to the trustworthiness of AI-driven decision-making in mission-critical applications, setting new benchmarks for how defense organizations validate intelligent systems.

As the final line of defense, you will pioneer methodologies for testing Generative AI in safety-critical environments, ensuring AI agents are capable of negotiating contracts or designing system components with the highest levels of reliability and compliance.

Key Responsibilities

  • Lead the validation and verification of AI systems by architecting automated testing frameworks to evaluate non-deterministic outputs, including hallucination detection, consistency checks, and factual accuracy against gold standard datasets for Large Language Models (LLMs) and predictive engines.
  • Design and implement automated evaluation metrics such as RAGAS, faithfulness, and answer relevance to validate Retrieval-Augmented Generation (RAG) pipelines, ensuring accurate citation of internal technical documentation and regulatory texts.
  • Develop prompt regression suites to monitor prompt drift and maintain the quality of AI-generated engineering documents, mitigating degradation risks from underlying model or system instruction changes.
  • Construct robust integration tests to validate data consistency between AI agents and critical enterprise systems (e.g., SAP S/4HANA, Ariba), ensuring no corruption of Bill of Materials (BOM) or financial data in supply chain applications.
  • Create performance benchmarking tests to validate latency and throughput of forecasting models and risk scoring engines, aligning with real-time requirements for supply chain dashboards.
  • Automate API validation for secure gateways, verifying Role-Based Access Control (RBAC) and Personally Identifiable Information (PII) redaction logic before data ingestion into AI models.
  • Ensure alignment of automated test cases with System Requirements and User Needs, generating digital evidence for Verification and Validation (V&V) reports as per Systems Engineering Handbook standards.
  • Prepare Test Readiness packages for Stage Gate reviews, providing quantitative evidence to demonstrate system stability and readiness for progression from Minimum Viable Product (MVP) to Production deployment.
  • Manage the defect lifecycle between Requirements Quality Assistants and development teams, tracing defects in AI logic back to specific model versions or datasets for resolution.
  • Bridge Agile development and formal Systems Engineering processes, ensuring AI agents meet rigorous Test Readiness Reviews (TRR) and Functional Configuration Audits (FCA) for defense-sector deployment.

Qualifications And Experience

  • 5+ years of experience in QA Automation, with at least 2 years focused on testing complex data-driven applications, machine learning models, or AI agents.
  • Experience in Defense, Aerospace, or highly regulated industries is a strong plus. Understanding of IV&V (Integration, Verification, and Validation) processes is highly desirable.
  • Expert proficiency in Python for building custom test harnesses (Pytest) and standard automation libraries (Selenium/Playwright for UI, Requests for API).
  • Expert proficiency in crafting Performance Test Plans and Implementations (e.g., Locust, JMeter, K6).
  • Experience utilizing frameworks for evaluating Large Language Models (e.g., DeepEval, TruLens, or custom Python evaluators) and understanding of 'Ground Truth' dataset creation and management.
  • Proficiency with SQL and data validation tools (e.g., Great Expectations) to verify data quality within Data Lakehouses and Vector Databases.
  • Strong experience integrating automated tests into GitLab CI/CD pipelines, enforcing 'Quality Gates' that prevent non-compliant code or models from merging.
  • Familiarity with requirements management tools (e.g., Jira, Linear, Jama, Polarion) and linking automated test results to specific requirement IDs.
  • Strong hands-on experience managing Test Reports and Artifacts (e.g., TestRail, Allure).
  • Strong knowledge of maintaining code-based frameworks under version control (e.g., Git, GitLab).
  • Strong hands-on knowledge of modern Quality Engineering best practices for fast-paced development environments (e.g., Shift Left Approaches, Test Pyramid, Mono-repo architecture for automation projects).
  • Ability to define pass/fail criteria for probabilistic systems and communicate 'Confidence Levels' to engineering leadership.
  • Proven ability to collaborate with Data Scientists to understand model limitations and with Systems Engineers to understand formal acceptance criteria.

Required Technical Skills

  • Expert proficiency in Python for building custom test harnesses using Pytest and standard automation libraries (Selenium/Playwright for UI testing, Requests for API testing).
  • Expert proficiency in crafting Performance Test Plans and implementations using tools such as Locust, JMeter, and K6.
  • Experience utilizing frameworks for evaluating Large Language Models (LLMs), including DeepEval, TruLens, or custom Python evaluators, with a strong understanding of Ground Truth dataset creation and management.
  • Proficiency with SQL and data validation tools (e.g., Great Expectations) to verify data quality within Data Lakehouses and Vector Databases.
  • Strong experience integrating automated tests into GitLab CI/CD pipelines, enforcing Quality Gates to prevent non-compliant code or models from merging.
  • Familiarity with requirements management tools (e.g., Jira, Linear, Jama, Polarion) and the ability to link automated test results to specific requirement IDs.
  • Hands-on experience managing test reports and artifacts using tools such as TestRail and Allure.
  • Strong knowledge of maintaining code-based frameworks using Git and GitLab for version control.
  • Strong hands-on knowledge of modern Quality Engineering best practices for fast-paced development environments, including Shift Left Approaches, Test Pyramid principles, and Mono-repo architecture for automation projects.
  • Architecting automated frameworks to evaluate non-deterministic AI outputs for hallucination, consistency, and factual accuracy against Gold Standard datasets.
  • Implementing automated metrics (e.g., RAGAS, faithfulness, answer relevance) to verify Retrieval-Augmented Generation (RAG) pipelines for accurate citation of internal technical documentation and regulatory texts.
  • Designing regression suites to monitor prompt drift and ensure changes to underlying models or system instructions do not degrade the quality of AI-generated engineering documents.
  • Building robust integration tests to validate data consistency between AI agents and critical enterprise systems (e.g., SAP S/4HANA, Ariba), ensuring no corruption of Bill of Materials (BOM) or financial data.
  • Designing performance tests to validate latency and throughput of forecasting models and risk scoring engines for real-time supply chain dashboard requirements.
  • Automating the testing of secure API gateways to verify Role-Based Access Control (RBAC) and Personally Identifiable Information (PII) redaction logic before data reaches AI models.

Company And Project Focus

Join a dynamic organization where innovation and quality assurance intersect to drive excellence in software delivery.

In this role, you will contribute to a high-impact project focused on automating quality assurance processes, ensuring robust, scalable, and reliable test frameworks.

The project emphasizes continuous improvement, collaboration across teams, and the adoption of cutting-edge automation tools to enhance software testing efficiency and accuracy.

Your work will directly support the development of high-quality products while fostering a culture of precision and technical excellence.

  • Location: Abu Dhabi, United Arab Emirates
  • Company: Sister Company of the Client
  • Project Focus: Integration, Verification, and Validation (IV&V) for the AI platform and Intelligent Supply Chain systems

Why This Role Matters

This role represents the critical final line of defense in ensuring AI integrity and reliability.

As the gatekeeper of AI trustworthiness, you will determine whether an AI agent is authorized to perform high-stakes tasks such as negotiating contracts or designing critical system components.

Your work will pioneer groundbreaking methodologies for testing Generative AI within safety-critical environments, establishing industry-leading standards for how defense organizations validate intelligent systems.

By setting these benchmarks, you will shape the future of AI validation, ensuring that intelligent systems meet the highest safety and performance criteria in mission-critical applications.
