Cloud Systems Architect (Remote)

Hire Feed
Abu Dhabi, UAE
contract
6 days ago
Cloud Architecture (AWS, Azure, GCP)
Enterprise Architecture
Solution Design
Infrastructure as Code (IaC)

Full Job Posting

Overview

  • **Role**: Cloud Systems Architect (Remote)
  • **Location**: Remote (Work from Anywhere)
  • **Payout**: Competitive

Role Overview

We are hiring on behalf of a client that is seeking a Site Reliability Engineer (SRE) to work on a contract basis.

As a Site Reliability Engineer, you will apply your expertise to help train next-generation AI systems, shaping how models learn, reason, and perform through high-quality, real-world input.

This role offers a unique opportunity to contribute to the development of frontier AI models, leveraging your domain knowledge to drive innovation in the AI industry.

Key Responsibilities

  • Design, implement, and maintain scalable infrastructure using Linux, Kubernetes, and Prometheus, ensuring seamless deployments and high system availability.
  • Monitor system health, analyze performance metrics, and proactively address bottlenecks and potential failures.
  • Automate operational processes to minimize manual intervention and increase system reliability.
  • Collaborate closely with development and operations teams, and create comprehensive documentation and clear runbooks for operational excellence.
  • Respond swiftly to incidents, conduct root cause analysis, and drive continuous improvements in incident response procedures to minimize downtime.

Required Skills & Qualifications

  • Proven experience designing, implementing, and maintaining scalable infrastructure using Linux, Kubernetes, and Prometheus, with a strong understanding of system health monitoring and performance metrics analysis.
  • Strong understanding of automation tools and technologies, with experience in automating operational processes to minimize manual intervention and increase system reliability.
  • Excellent problem-solving skills, with the ability to analyze complex system issues, identify root causes, and develop effective solutions.
  • Strong communication and collaboration skills, with the ability to work closely with development and operations teams to deliver seamless deployments and high system availability.
  • Experience writing comprehensive documentation and clear, concise runbooks for operational excellence, with strong attention to detail.

More About the Opportunity

This role offers a unique opportunity to work with a global leader in the AI industry, leveraging your domain knowledge to drive innovation and shape the development of next-generation AI systems.

You will collaborate with top experts around the world and contribute to the creation of cutting-edge AI models.

Equal Opportunity Employer

We hire based on skills and expertise.

All qualified candidates are welcome regardless of background, experience, or prior employment history.

Applications are reviewed solely on demonstrated technical ability and qualifications.

Apply Now!
