Director Backend Engineering - AI Infrastructure

Department: Data Science Analytics & Machine Learning
149+ Applicants
Posted: 18 hours ago
15-17 years
Bengaluru / Bangalore, Karnataka
work from office


Job Description

Company Introduction

We exist to wow our customers. We know we're doing the right thing when we hear our customers say, "How did I ever live without Coupang?" Born out of an obsession to make shopping, eating, and living easier than ever, we are collectively disrupting the multi-billion-dollar commerce industry from the ground up and establishing an unparalleled reputation for being a leading and reliable force in South Korean commerce.

We are proud to have the best of both worlds: a startup culture with the resources of a large global public company. This fuels us to continue our growth and launch new services at the pace we have maintained since our inception. We are all entrepreneurs surrounded by opportunities to drive new initiatives and innovations. At our core, we are bold and ambitious people who like to get our hands dirty and make a hands-on impact. At Coupang, you will see yourself, your colleagues, your team, and the company grow every day.

Role Overview

We are seeking a visionary Director of Backend Engineering to lead the teams responsible for the software brain that manages our global AI Physical Infrastructure. You will oversee the development of the SDN orchestrators, automated fleet management systems, and the high-performance storage backends that power our AI training and inference clusters.

Your mission is to abstract the complexity of specialized hardware (NVIDIA/HPC) into a seamless, automated, and hyper-reliable cloud platform.

Key Responsibilities

1. Strategic Leadership & Fleet Orchestration

  • Software-Defined Infrastructure: Lead the design and delivery of an SDN Orchestrator to automate complex GPU networking (InfiniBand/RoCE/NVLink) and core DC routing.
  • Fleet Health Automation: Oversee the development of backend services for GPU Health & Fault Detection, automating the lifecycle from burn-in and diagnostics to global RMA workflows.
  • Capacity & Traffic Engineering: Drive the backend logic for global traffic routing, load balancing (NGINX/Kong), and IPAM to ensure zero-bottleneck training environments.

2. Data & Storage Systems

  • HPC Data Pipelines: Collaborate with storage engineers to build backend interfaces for Parallel File Systems (Lustre, Weka, VAST, etc.), ensuring high-throughput data delivery to compute nodes.
  • Storage Durability: Direct the backend strategy for AI Object Storage, focusing on high durability and low-latency retrieval for massive datasets.

3. Engineering Excellence

  • Scalable Architecture: Act as the final technical authority for AI Infra Architecture, ensuring systems are resilient, multi-region, and capable of sub-millisecond coordination.
  • DevOps & IaC Culture: Champion a Hardware-as-Code mindset, utilizing Python, Ansible, and Terraform to eliminate manual intervention in DC operations.

4. Team Development

  • Lead a multi-disciplinary AI Infra Engineering organization, including Backend Developers, SDN Engineers, and Infra Ops teams.
  • Establish 24/7 L1/L2/L3 operational standards to maintain > 99.99% availability of the AI fleet.

Required Qualifications

  • Experience: 15+ years in Backend Engineering, with at least 5 years in a leadership role managing complex infrastructure (Cloud, FinTech, or HPC).
  • Deep Infrastructure Knowledge: Proven experience with Linux internals, hardware-software interfaces (drivers/firmware), and distributed systems.
  • Networking Mastery: Solid understanding of L2/L3 networking and, ideally, specialized fabrics such as InfiniBand or RoCE.
  • The Stack: Professional proficiency in Python, Go, or C++, and deep experience with Terraform, Kubernetes, and Ansible.
  • Large-Scale Data: Experience managing high-performance storage backends (GPFS, Lustre, or equivalent parallel systems).
  • Hardware Savvy: You don't just write code; you understand power envelopes, liquid cooling constraints, and GPU server architecture (NVIDIA/HPE/Dell).

Preferred Skills

  • Experience building custom SDN controllers or orchestration layers from scratch.
  • Direct experience with NVIDIA or GPUDirect technologies.
  • Previous success in a hyperscale environment (AWS, Azure, GCP, Meta, AI clouds, etc.).

Our Hybrid work model

Coupang's hybrid work model is designed to enable a culture of collaboration that acts as a catalyst to enrich the experience of employees. Employees are required to work at least 3 days per week in the office, with the flexibility to work from home 2 days a week, depending on the role requirements. Some businesses may require more time in the office due to the nature of the work.

Details to consider

Those eligible for employment protection (recipients of veterans benefits, the disabled, etc.) may receive preferential treatment for employment in accordance with applicable laws.

Privacy Notice

Your personal information will be collected and managed by Coupang as stated in the Application Privacy Notice located below.

https://privacy.coupang.com/en/land/jobs/

Equal Opportunities for All

Coupang is an equal opportunity employer and is committed to equal opportunity regardless of color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity, or veteran status. Our unprecedented success would not be possible without the valuable inputs of our globally diverse team.

Skills

Python, AI


About Company

Coupang is a South Korean e-commerce company. It is the largest e-commerce company in South Korea by revenue.

Important Dates & Deadlines

Application Deadline

18 Aug 26, 03:08 PM IST
