Abigail Haddad - ML Engineer & Data Scientist

About Me

I'm an ML Engineer and Data Scientist working on modular tools for unstructured text data pipelines - focused on asking consistent questions of document sets, then evaluating and displaying the results.

My background is in data science and Python. I earned a Public Policy PhD from RAND, worked as a data scientist for the Department of the Army, and until recently was part of the DHS AI Corps.

My areas of interest include:

Methods for evaluating and classifying text data, including via LLMs
Government data workforce and talent management issues
Code-first data science and processes for better transparency and replication
DC school data

Get in touch if you're interested in getting involved with or sponsoring Data Community DC, or in having me speak at your event.

Selected Talks

Automating Tests for your RAG Chatbot or Other Generative Tool - New York R Conference

Video

A Framework for Testing LLMs for Outputting Hazardous Information - Generative AI DC

Video

Automated Evaluations for Your RAG Chatbot or Other Generative Tool - AI In Production

Video

What Job Is This, Anyway?: Using LLMs to Classify USAJobs Data Scientist Listings - RGOV Conference 2023

Video | Slides and Notes | Code

GitHub: How To Tell Your Professional Story - posit::conf(2024)

Video

Selected Writing

What To Ask Your Engineers: Evaluating AI for Less-Technical Stakeholders

Guide for non-technical stakeholders on how to evaluate AI solutions and what questions to ask.

Your Coders Use LLMs: How Should Your Organization Adapt?

Strategies for organizations to effectively adapt to the widespread use of LLMs in coding workflows.

Automating LLM Evaluation: A Guide for RAG Chatbots and Other Very Specific Generative Tools

Comprehensive approach to evaluating and testing LLM-based tools and chatbots.

When Models Give Different Answers To The Same Question: Incorporating Non-Deterministic Tools Into Deterministic Processes

Exploring strategies for handling non-deterministic AI outputs in production systems.

Can Mistral Large Do That? Using an LLM Grader to Quickly Evaluate New Models

Techniques for rapidly assessing new LLM capabilities using other models as graders.

Mapping Testing to Threats: For Pre-release Testing, the Threat Model Should Inform the Methods

Approach to testing AI systems based on their specific threat models.

GitHub: How To Tell Your Professional Story

Strategies for effectively using GitHub to showcase your skills and experience.

Improving Your Junior Data Scientist Resume

Practical advice for early-career data scientists on optimizing their resumes.

See all articles on my Substack →