Job Description
Our Cosmos infrastructure team sits at the heart of this mission. We build the systems that make it possible to train Cosmos, NVIDIA's world foundation model for physical AI. Cosmos enables large-scale AI models for robots, autonomous agents, and AI systems to understand, plan, and act in complex environments. Our team develops the Cosmos platform infrastructure that powers model training, data pipelines, simulation, and deployment at scale, enabling research and production to move faster and more efficiently than ever before. This role is a unique opportunity to work on infrastructure that directly enables physical AI at scale: optimizing massive data pipelines, designing training workflows that support foundation models, scaling distributed compute systems, and building the backbone for simulation-driven experimentation.
What You'll Be Doing:
- Design, build, and operate scalable infrastructure for training Cosmos and supporting large-scale data pipelines
- Develop high-throughput systems for data processing, retrieval, and workflow orchestration
- Collaborate across research, optimization, and platform teams to accelerate experiments and deployments
- Improve system reliability, performance, and observability across distributed compute environments
- Contribute to long-term infrastructure strategy for training, data management, and large-scale compute efficiency
What We Need to See:
- A Master's degree in Computer Science, Computer Engineering, or a related STEM field, or equivalent experience
- 6 years of relevant work experience, with a strong engineering background in distributed systems, ML infrastructure, or large-scale compute/data platforms
- Proficiency in Python and at least one systems language (e.g., C++/Go/Rust)
- Experience with orchestration systems, scheduling, and scalable storage or data pipelines
- Ability to work across teams, drive technical clarity, and deliver robust solutions in complex environments
- Comfort bridging research workflows and production-grade systems
Ways to Stand Out from the Crowd:
- Experience building or optimizing infrastructure for large-scale model training
- Hands-on work with distributed compute environments or high-performance systems
- Familiarity with synthetic data, simulation pipelines, or large multimodal datasets
- Contributions to open-source infrastructure or large-scale internal tooling
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until December 27, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
Skills
C, Python, Distributed Systems, High-performance Systems, Software Engineer
Important dates & deadlines
Application Deadline
27 Feb 26, 12:07 PM IST

