Distinguished, Data Scientist - Quality & LLM Judg...
- Walmart
- Sunnyvale, California
- Full Time
About the Role
Walmart's Next Gen Commerce team is shaping the future of conversational shopping by building intelligent agents that not only respond, but reason, recommend, and proactively assist customers. As a Distinguished Data Scientist for Quality & LLM Judging Systems in Conversational Commerce, you will serve as the key IC partner to the Director of Data Science for this space. You will lead the technical vision and model development for cutting-edge evaluation methodologies to measure and improve the quality of AI-powered conversations and tool outputs.
You'll help define how we evaluate our agents and their dependent tools using a combination of human-labeled benchmarks, LLM-as-a-judge systems, and scalable automated pipelines. You'll design prompts, validate agreement with human judgment, and develop LLM distillation strategies to replicate high-quality judgment cost-effectively.
This is a high-impact, hands-on technical role requiring deep expertise in LLM prompting, evaluation frameworks, and structured experimentation. You will work closely with modeling, product, and platform teams to ensure that measurement drives improvement, and that the agents' behaviors align with quality, safety, and relevance at every step.
Responsibilities
- Design evaluation pipelines for conversational agents and their tool outputs using LLM-as-a-judge, human annotation, and hybrid methods
- Develop high-quality prompts for structured evaluation tasks and iterate based on inter-rater reliability with human judges
- Develop novel techniques to assess non-textual or subjective outputs (such as recommendations, summaries, and agent-driven actions) where standard metrics fall short
- Guide the modeling team to distill or fine-tune smaller LLMs to act as scalable evaluation proxies
- Work with engineering partners to integrate evaluation hooks into model training, validation, and production workflows
- Conduct in-depth failure mode analysis and define actionable quality signals that inform model and production iteration
- Uphold statistical rigor in metric design, validation, and experimental analysis to ensure reliable and interpretable results
- Foster a culture of principled measurement and trustworthy AI throughout the organization
Minimum Qualifications
- 7+ years of experience in data science or machine learning, preferably in evaluation, NLP, or conversational AI
- Hands-on experience with large language models, including prompt engineering, response grading, and structured generation tasks
- Familiarity with both human annotation workflows and automated evaluation strategies using LLMs
- Deep understanding of metric design, evaluation reliability, and statistical validity
- Strong software engineering fundamentals and ability to own end-to-end pipelines
- Excellent communication skills and the ability to influence without authority across functions
Preferred Qualifications
- Graduate degree (M.S./Ph.D.) in Computer Science, Machine Learning, NLP, or a related field
- Experience with conversational AI, summarization, retrieval-augmented generation, or recommendation evaluation
- Knowledge of model distillation, LoRA, instruction tuning, or parameter-efficient adaptation techniques
- Familiarity with evaluating open-ended outputs where ground truth is subjective or contextual
- Publications, patents, or open-source contributions in LLM evaluation or applied AI
Why Join Us?
This is a rare opportunity to shape the science behind how intelligent agents are judged, literally. Your work will directly define what quality means in conversational commerce and enable AI systems that are not only functional but truly helpful, engaging, and aligned with human expectations.
At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
For information about PTO, see .
Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to a specific plan or program terms.
For information about benefits and eligibility, see One.Walmart.
Sunnyvale, California US-11656: The annual salary range for this position is $169,000.00-$338,000.00
Bentonville, Arkansas US-10735: The annual salary range for this position is $130,000.00-$260,000.00
Additional compensation includes annual or quarterly performance bonuses.
Additional compensation for certain positions may also include:
- Stock
Minimum Qualifications
Outlined below are the required minimum qualifications for this position. If none are listed, there are no minimum qualifications.
- Option 1: Bachelor's degree in Statistics, Economics, Analytics, Mathematics, Computer Science, Information Technology or related field and 6 years' experience in an analytics related field.
- Option 2: Master's degree in Statistics, Economics, Analytics, Mathematics, Computer Science, Information Technology or related field and 4 years' experience in an analytics related field.
- Option 3: 8 years' experience in an analytics or related field.
Preferred Qualifications
Outlined below are the optional preferred qualifications for this position. If none are listed, there are no preferred qualifications.
- Data science, machine learning, optimization models
- PhD in Machine Learning, Computer Science, Information Technology, Operations Research, Statistics, Applied Mathematics, Econometrics
- Publications or active peer reviewer in related journals or conferences
- Successful completion of one or more assessments in Python, Spark, Scala, or R
- Using open source frameworks (for example, scikit learn, tensorflow, torch)
- We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart's accessibility standards and guidelines for supporting an inclusive culture.
Primary Location
1375 Crossman Ave, Sunnyvale, CA 94089-1114, United States of America