Data/ML Engineer

Department: Data Science, Analytics & Machine Learning
149+ Applicants
Posted: 3 weeks ago
0-1 years
Gurugram, Haryana
Work from office

Job Description

About the Role

We are looking for a Data Engineer / Machine Learning Engineer to build and scale the data and intelligence layer powering our multi-tenant SaaS platform. You will own data pipelines that ingest, transform, and serve billions of events across analytics, billing, and product surfaces — and increasingly power ML-driven features such as scoring, recommendations, and intelligent automation.

This is a hands-on role with deep ownership: you will design schemas, build pipelines end-to-end, optimize query performance on ClickHouse, and ship ML workflows into production. You will work closely with backend, product, and founding engineering leadership.

Core Responsibilities

  • Design, build, and maintain scalable batch and streaming data pipelines using Apache Airflow and Python.
  • Model and optimize analytical workloads on ClickHouse — including partition strategy, sort keys, materialized views, and ReplacingMergeTree / AggregatingMergeTree patterns.
  • Build and maintain ETL/ELT workflows for ingestion from operational stores (MySQL, PostgreSQL, MongoDB) into the analytics warehouse.
  • Develop, deploy, and monitor machine learning models — from feature engineering to training, evaluation, and production serving.
  • Define and enforce data contracts, schema evolution, and data quality checks across services.
  • Partner with backend teams to instrument event tracking and ensure data correctness across multi-tenant boundaries.
  • Optimize query performance and cost; investigate and resolve slow queries, full partition scans, and skew issues.
  • Contribute to the MLOps stack: model versioning, experiment tracking, monitoring, and retraining pipelines.
  • Write clean, tested, well-documented code. Participate in code reviews and design discussions.

Mandatory Skills

  • Python Programming — strong proficiency, including pandas, NumPy, and production-grade code (typing, packaging, testing).
  • Data Pipelines — solid experience designing batch and/or streaming pipelines, with awareness of idempotency, backfills, and failure recovery.
  • Apache Airflow — authoring DAGs, custom operators, sensors, and managing dependencies in production.
  • ClickHouse — hands-on experience with table engines (MergeTree family), partitioning, sort keys, and materialized views.
  • SQL — advanced proficiency: window functions, CTEs, query plans, and performance tuning on large datasets.
  • Relational and NoSQL databases — working knowledge of PostgreSQL and MongoDB (schemas, indexing, CDC patterns).
  • Distributed data processing — practical experience with PySpark, Dask, or equivalent for large-scale transforms.
  • Message brokers & streaming — hands-on experience with RabbitMQ and Apache Kafka; understanding of producers/consumers, partitioning, consumer groups, delivery guarantees, and dead-letter handling.
  • Machine Learning fundamentals — supervised/unsupervised techniques, model evaluation, and at least one framework (scikit-learn, PyTorch, or TensorFlow).
  • Version control with Git and collaborative workflows (PRs, code reviews).

Preferred Skills

  • Experience in agile development environments.
  • Familiarity with DevOps tools and CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins).
  • Knowledge of containerization tools like Docker and orchestration platforms like Kubernetes.
  • Exposure to cloud platforms like AWS or GCP (BigQuery, Cloud Composer, GKE, Pub/Sub, Dataflow) is a plus.
  • Familiarity with CDC tools (Debezium) and stream processing frameworks (Kafka Streams, Flink).
  • Exposure to MLOps tooling — MLflow, Weights & Biases, SageMaker, Vertex AI, or equivalent.
  • Experience with LLMs and Generative AI — embeddings, RAG, vector databases (pgvector, Pinecone, Weaviate), and prompt orchestration frameworks.
  • Familiarity with observability tools — Grafana, Prometheus, or Datadog — for data pipelines.
  • Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.

Skills

Python, ETL, Data Processing, Machine Learning, MySQL, Data Engineer, Analytics, AI, ML, SQL, ML Engineer

About Company

RoundCircle is a global technology and consulting firm that helps organizations accelerate growth through digital transformation and product engineering. We specialize in healthcare, fintech, supply chain, and manufacturing. Our expertise lies in solving complex business problems using cutting-edge technology. We bring together the right combination of people, process, and technology to deliver innovative solutions. Our team is passionate about excellence and committed to delivering exceptional value for our clients. We provide flexible solutions tailored to the unique needs of each organization.

Important dates & deadlines

Application Deadline: 15 Jul 26, 06:21 PM IST
