Job Description
About the Role
We are looking for a Data Engineer / Machine Learning Engineer to build and scale the data and intelligence layer powering our multi-tenant SaaS platform. You will own data pipelines that ingest, transform, and serve billions of events across analytics, billing, and product surfaces — and increasingly power ML-driven features such as scoring, recommendations, and intelligent automation.
This is a hands-on role with deep ownership: you will design schemas, build pipelines end-to-end, optimize query performance on ClickHouse, and ship ML workflows into production. You will work closely with backend, product, and founding engineering leadership.
Core Responsibilities
- Design, build, and maintain scalable batch and streaming data pipelines using Apache Airflow and Python.
- Model and optimize analytical workloads on ClickHouse — including partition strategy, sort keys, materialized views, and ReplacingMergeTree / AggregatingMergeTree patterns.
- Build and maintain ETL/ELT workflows for ingestion from operational stores (MySQL, PostgreSQL, MongoDB) into the analytics warehouse.
- Develop, deploy, and monitor machine learning models — from feature engineering to training, evaluation, and production serving.
- Define and enforce data contracts, schema evolution, and data quality checks across services.
- Partner with backend teams to instrument event tracking and ensure data correctness across multi-tenant boundaries.
- Optimize query performance and cost; investigate and resolve slow queries, full partition scans, and skew issues.
- Contribute to the MLOps stack: model versioning, experiment tracking, monitoring, and retraining pipelines.
- Write clean, tested, well-documented code. Participate in code reviews and design discussions.
Mandatory Skills
- Python Programming — strong proficiency, including pandas, NumPy, and production-grade code (typing, packaging, testing).
- Data Pipelines — solid experience designing batch and/or streaming pipelines, with awareness of idempotency, backfills, and failure recovery.
- Apache Airflow — authoring DAGs, custom operators, sensors, and managing dependencies in production.
- ClickHouse — hands-on experience with table engines (MergeTree family), partitioning, sort keys, and materialized views.
- SQL — advanced proficiency: window functions, CTEs, query plans, and performance tuning on large datasets.
- Relational and NoSQL databases — working knowledge of PostgreSQL and MongoDB (schemas, indexing, CDC patterns).
- Distributed data processing — practical experience with PySpark, Dask, or equivalent for large-scale transforms.
- Message brokers & streaming — hands-on experience with RabbitMQ and Apache Kafka; understanding of producers/consumers, partitioning, consumer groups, delivery guarantees, and dead-letter handling.
- Machine Learning fundamentals — supervised/unsupervised techniques, model evaluation, and at least one framework (scikit-learn, PyTorch, or TensorFlow).
- Version control with Git and collaborative workflows (PRs, code reviews).
Preferred Skills
- Experience in agile development environments.
- Familiarity with DevOps tools and CI/CD pipelines (e.g., GitHub Actions, GitLab CI, Jenkins).
- Knowledge of containerization tools like Docker and orchestration platforms like Kubernetes.
- Exposure to cloud platforms like AWS or GCP (BigQuery, Cloud Composer, GKE, Pub/Sub, Dataflow) is a plus.
- Familiarity with CDC tools (Debezium) and stream processing frameworks (Kafka Streams, Flink).
- Exposure to MLOps tooling — MLflow, Weights & Biases, SageMaker, Vertex AI, or equivalent.
- Experience with LLMs and Generative AI — embeddings, RAG, vector databases (pgvector, Pinecone, Weaviate), and prompt orchestration frameworks.
- Familiarity with observability tools — Grafana, Prometheus, or Datadog — for data pipelines.
- Bachelor's or Master's degree in Computer Science, Engineering, Mathematics, or a related field.
Skills
Python, ETL, Data Processing, Machine Learning, MySQL, Data Engineer, Analytics, AI, ML, SQL, ML Engineer
Important dates & deadlines
Application Deadline
15 Jul 26, 06:21 PM IST

