Job Description – Data Scientist / Applied ML Engineer
Location: August Kranti Marg, Siri Fort Institutional Area, near Siri Fort Auditorium, New Delhi - 110049
Experience: 4–6 years
Engagement: C2H (contract-to-hire), 3 months, with the possibility of an extension based on business requirements.
Budget: 75K to 80K per month
Role Overview
We are looking for a highly skilled Data Scientist / Applied ML Engineer to build and optimize DigiCatalog’s probabilistic AI stack for catalogue intelligence, multilingual understanding, and product attribute extraction.
The role focuses on selecting, orchestrating, and evaluating open-source AI models across OCR, computer vision, multimodal AI, image enhancement, and Indic language processing. The ideal candidate should have strong expertise in applied CV/NLP systems, model evaluation, pipeline optimization, and practical ML engineering with a strong focus on cost, latency, and quality trade-offs.
This role emphasizes building efficient, purpose-built AI pipelines rather than relying on large general-purpose models.
Key Responsibilities
Design and orchestrate AI/ML pipelines using open-source models across:
OCR
Computer Vision
Multimodal AI
Indic language processing
Image enhancement and correction
Select and optimize models based on:
Cost
Latency
Accuracy
Deployment constraints
Edge-to-cloud trade-offs
Build probabilistic AI workflows using small specialized models instead of defaulting to large LLMs.
Work with technologies and frameworks such as:
DocTR
PaddleOCR
TrOCR-class models
Grounding DINO
OWL-ViT
CLIP
2–3B parameter VLMs
Real-ESRGAN
SDXL-Turbo class models
IndicTrans2
NLLB
IndicWhisper
Build and maintain:
Golden datasets
Automated evaluation pipelines
Regression testing frameworks
Model benchmarking systems
Define and monitor evaluation metrics including:
Attribute F1 score
OCR CER/WER
Hallucination rate
Per-language performance slices
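For illustration, two of the metrics above could be computed roughly as follows. This is a minimal sketch, not DigiCatalog's actual evaluation code: the function names are hypothetical, CER/WER are taken as edit distance divided by reference length, and attribute F1 is assumed to be an exact-match F1 over (attribute, value) pairs.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def attribute_f1(gold, predicted):
    """F1 over exact-match (attribute, value) pairs (an assumed definition)."""
    gold_pairs = set(gold.items())
    pred_pairs = set(predicted.items())
    true_positives = len(gold_pairs & pred_pairs)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_pairs)
    recall = true_positives / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)
```

Per-language slices would then amount to grouping golden-dataset examples by language and averaging these scores within each group.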
Perform:
Error analysis
Threshold tuning
Calibration
Ensembling
Post-processing optimization
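As one example of the tasks above, threshold tuning can be sketched as a sweep over candidate confidence cut-offs on a labelled validation set. The helper name and the choice of F1 as the objective are assumptions made for illustration; a real pipeline might optimize a cost-weighted metric instead.

```python
def best_threshold(scores, labels):
    """Return (threshold, f1) maximizing F1 when predicting positive for score >= threshold.

    scores: model confidences; labels: 1 for a true positive item, 0 otherwise.
    Candidate thresholds are the distinct scores seen in the validation data.
    """
    best_t, best_f1 = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        if tp == 0:
            f1 = 0.0
        else:
            precision = tp / (tp + fp)
            recall = tp / (tp + fn)
            f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:  # strict comparison keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

The selected threshold should be validated on held-out data, since a cut-off tuned on a small validation set can overfit.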
Run regression validation on every model replacement or pipeline change.
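A regression check of this kind could look like the sketch below, which compares a candidate's metrics on the golden dataset against the current baseline. The metric names, the direction table, and the 0.01 tolerance are illustrative assumptions rather than values taken from this role's actual tooling.

```python
# Assumed metric directions: True means higher is better.
HIGHER_IS_BETTER = {
    "attribute_f1": True,
    "ocr_cer": False,
    "ocr_wer": False,
    "hallucination_rate": False,
}


def find_regressions(baseline, candidate, tolerance=0.01):
    """Return {metric: (baseline_value, candidate_value)} for metrics that got worse.

    A metric counts as regressed when it moves in the wrong direction
    by more than `tolerance` (an assumed absolute margin).
    """
    regressions = {}
    for name, base_value in baseline.items():
        new_value = candidate[name]
        if HIGHER_IS_BETTER[name]:
            improvement = new_value - base_value
        else:
            improvement = base_value - new_value
        if improvement < -tolerance:
            regressions[name] = (base_value, new_value)
    return regressions
```

A deployment gate could block the model replacement or pipeline change whenever the returned dictionary is non-empty.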
Clean, curate, and prepare datasets for training and evaluation workflows.
Fine-tune models only when justified through measurable cost and quality improvements.
Continuously evaluate emerging research and rapidly productionize relevant open-source innovations.
Candidate profile: 4–6 years of experience, immediate C2H joiners.
Required Skills & Qualifications
4–6 years of experience in:
Applied Machine Learning
Computer Vision
NLP
Multimodal AI systems
Strong hands-on expertise in:
PyTorch
Python
ML pipeline orchestration
Model evaluation systems
Experience with:
OCR systems
Vision-language models
Image processing pipelines
ConnectGuru is a specialized IT staffing partner, connecting top talent in enterprise applications, cloud, and digital transformation. With a presence in Mumbai and Indore, we focus on speed, integrity, and quality placements.