Job Description
Senior Data Scientist: AI Training Data (2-4 Months Contract)
Company: BespokeLabs (VC-backed; founded by IIT & Ivy League alumni)
Location: Remote
Role Type: Contract (2-4 Months)
Time Commitment: 40 hrs/week (Full-time availability required)
Compensation: Hyper-competitive hourly rate (matching top-tier Senior Data Scientist bands)
Experience: 6+ years
About BespokeLabs
BespokeLabs is a premier, VC-backed AI research lab with an exceptionally talent-dense team of IIT and Ivy League alumni. We don't just build tooling around AI; we build the massive-scale data systems and reasoning architectures that directly power next-generation models. Our research shapes the frontier of AI: we've published breakthroughs like GEPA, driven foundational datasets like OpenThoughts, and shipped state-of-the-art models including Bespoke-MiniCheck and Bespoke-MiniChart. More on our website: bespokelabs.ai :)
Role Overview
We are looking for a high-impact Senior Data Scientist for an intensive 2-4 month sprint. You will leverage your deep expertise in production-grade machine learning and applied statistics to develop the algorithms and logic that curate and evaluate datasets for advanced AI model training.
This is not a traditional model-building or research role. We need a seasoned practitioner who has already owned the end-to-end DS lifecycle at scale. You will use your intuition for feature engineering, statistical validity, and large-scale data processing to programmatically generate, shape, and validate AI training data.
What You Will Do (The Contract)
- Algorithm Design: Design and implement custom statistical models and programmatic logic (e.g., anomaly detection, active learning, similarity scoring) to evaluate data quality, complexity, and redundancy at scale.
- Hands-on At-Scale Coding: Write scalable PySpark and Python (NumPy/Pandas) code to apply these algorithms across massive datasets, translating experimental logic into reliable, large-scale workflows.
- Metric Formulation: Develop custom quantitative metrics and heuristic benchmarks to rigorously assess the fidelity and suitability of data subsets for specific AI training objectives.
- Validation & Iteration: Run high-speed validation cycles, analyzing the output of data-curation algorithms to diagnose skew, bias, or noise, and iteratively refining the logic.
- High-Level Curation: Apply Senior-level domain expertise in predictive modeling and feature engineering to ensure the final training inputs meet the strict standards required for state-of-the-art ML systems.
What You Bring to the Table (Your Past Experience)
To be successful in this contract, you must have a track record of:
- The End-to-End DS Lifecycle: Framing problems, modeling, validation, production, and iteration.
- Production Ownership: Building and deploying ML and statistical models on large-scale datasets.
- Large-Scale Data Processing: Working with Apache Spark to develop scalable feature pipelines and offline training workflows.
- Experimentation: Designing and analyzing rigorous experiments (A/B tests, causal inference).
- Impact: Translating complex model outputs into clear product and business decisions.
Required Qualifications (Non-Negotiable)
- Experience: 6+ years as a Data Scientist or Applied Scientist.
- Production Background: Proven ownership of models running in production environments.
- Applied Statistics: Strong background in applied statistics and experimentation frameworks.
Core Technical Skills
- Languages: Python (NumPy, Pandas, Scikit-learn, PyTorch / TensorFlow) and strong SQL.
- Big Data: Apache Spark (PySpark or Spark SQL) for large-scale data processing.
- Methodologies: Feature engineering, model evaluation, statistical modeling, and hypothesis testing.
Strong Signals (Highly Valued)
- Scale: Models trained on TB-scale datasets.
- Domain Specificity: Experience in high-complexity domains such as: Recommendations, Pricing, Fraud / risk, Search / ranking, or Growth & experimentation.
- Collaboration: Experience deploying models alongside data engineering pipelines.
Out of Scope (Who Should Not Apply)
- BI / reporting-only roles
- SQL-only analysts
- Research-only ML roles with no production ownership
- Early-career profiles
Skills
Big Data, Python, Data Processing, Machine Learning, Predictive Modeling, Statistical Modeling, Applied Scientist, Data Scientist, AI, ML, SQL
Application Deadline: 28 Mar 26, 03:40 PM IST