AI Reliability Engineer - DevOps

Department: Data Science Analytics & Machine Learning
149+ Applicants
Posted: 5 months ago
3-5 years
Bengaluru / Bangalore, Karnataka
work from office

Job Description

We're looking for an Agentic AI Reliability Engineer who thrives at the intersection of DevOps, support engineering, and AI infrastructure. This role is critical to maintaining the stability, uptime, and seamless performance of our no-code Agentic AI platform, which powers autonomous digital workers across industries.
You'll be responsible for building core support and monitoring infrastructure, handling incident response, collaborating with engineering on escalations, and applying site reliability best practices to intelligent, multi-agent systems.
Responsibilities
  • Build and manage support tooling and observability pipelines for AI agent operations.
  • Troubleshoot, investigate, and resolve incidents across multi-agent AI workflows.
  • Collaborate with engineering on complex technical escalations and fixes.
  • Monitor system health using tools like Prometheus, CloudWatch, and custom LLM telemetry.
  • Ensure CI/CD reliability for agent deployment cycles.
  • Maintain clear, proactive communication with customers and internal teams.
  • Create and maintain high-quality documentation and knowledge base articles.
  • Continuously improve incident response playbooks and automation.
Requirements
  • BE/BTech/BS or MS in Computer Science or related field.
  • 3+ years of experience in SRE, DevOps, or technical support engineering.
  • Deep understanding of SDLC, release management, and system reliability.
  • Familiarity with support systems and ticketing workflows.
  • Proficiency in AWS/Azure, CI/CD pipelines (Jenkins), Ansible, and infrastructure-as-code.
  • Experience with observability tools like Datadog, Prometheus, SIP, Homer, and CloudWatch.
  • Strong written and verbal communication skills.
  • Experience supporting GenAI or agentic AI applications in production.
  • Familiarity with LLM orchestration, prompt reliability, or RAG systems.
  • Passion for automation and building resilient AI-powered platform infrastructure.
  • Exposure to managing infrastructure for applications built on LangChain, AutoGen, or similar agent orchestration frameworks.
This job was posted by Jayanth Babu from Avaamo.

Skills

AI

Important dates & deadlines

Application Deadline

01 Jan 26, 05:43 PM IST
