Core Responsibilities
• Collaborate directly with technical leads to design evaluation experiments for model performance assessment
• Set up controlled testing environments for LLM-as-judge scenarios
• Analyze experimental outputs and translate findings into clear, actionable insights
• Develop feature engineering approaches to understand dataset characteristics and quality
Required Technical Skills
• Python, statistical analysis, hypothesis testing
• Advanced data manipulation, feature selection, dimensionality reduction
• A/B testing, cross-validation, statistical significance testing
• Understanding of language model evaluation challenges and LLM-as-judge methodologies
Preferred
• Experience with PyTorch/TensorFlow for model analysis
• SQL for data extraction and analysis
• Visualization tools (matplotlib, seaborn, plotly) for results presentation
• Background in NLP evaluation or model interpretability
Interested parties, please send your full resume with your current and expected salary to shirley.cho@manpowergrc.hk
Type: Contract
Category: I.T & T - Engineering
Reference ID: 508-24112025-SC
Date Posted: 24/11/2025