Episode 60

Beyond Checklists: Evaluating Conversational AI

In this episode of the Data Science Salon Podcast, we sit down with Carlos Aguilar, Head of Product at Hex and founder of Hashboard, to discuss a topic critical for every data team: how to properly evaluate AI analytics tools.

Carlos shares why traditional checklist-based evaluations fall short for conversational AI and generative analytics tools, and how focusing on context, workflow, and real-user testing can dramatically improve the chances of success. Drawing on his experience leading the Data Insights team at Flatiron Health, he offers practical guidance for both end-users and data teams.

Key Highlights:

  • End-User vs Data Team Evaluation: Why both perspectives are crucial for measuring AI effectiveness.
  • Context Management: How setting up reference questions helps ensure accurate, relevant answers.
  • Workflow & Observability: Why monitoring and iterating on AI outputs is essential for real-world success.
  • Lessons from the Field: Examples of tools that look good in demos but fail in production, and how to avoid those pitfalls.

🎧 Tune in to Episode 60 to learn how to evaluate AI analytics tools the right way and ensure your data team deploys solutions that actually work in practice.


Be sure to mark your calendars for the 9th annual DSS ATX on Feb 18, where we will focus on GenAI and Intelligent Agents in the Enterprise. Join us to hear from experts on how AI is shaping the future of the enterprise. https://www.datascience.salon/austin/