DISCOVER THE LATEST AND SHARPEN YOUR SKILLS IN AI, DATA SCIENCE, MACHINE LEARNING, AND RELATED TOPICS.

Past webinars

Regulatory-Grade Multimodal Medical Data De-Identification and Tokenization

Recorded on: July 16, 2025 @ 2pm ET

Healthcare and life science organizations are increasingly working with large-scale, multimodal datasets that include structured records, clinical notes, diagnostic images, and PDF documents.

Sharing this data for research and AI development requires rigorous de-identification to ensure patient privacy — without compromising the ability to extract insights across time and modalities.

In this webinar, experts from John Snow Labs and Databricks will demonstrate an end-to-end solution for automating the de-identification and tokenization of medical data with regulatory-grade accuracy. You’ll learn how to:

  • Automatically de-identify structured data, unstructured text, DICOM & JPEG images, whole-slide pathology images (SVS), and PDFs using John Snow Labs’ industry-leading software and AI models
  • Apply patient tokenization to enable linking of de-identified data across modalities and time points
  • Use Databricks to process and scale these capabilities across large, real-world datasets
  • Support HIPAA, GDPR, and other regulatory requirements for privacy-preserving research
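To make the patient tokenization idea above concrete, here is a minimal sketch of one common approach: a keyed hash (HMAC-SHA256) that maps the same patient identifier to the same token wherever it appears. This is an illustration only, not the method John Snow Labs or Databricks use; the key, the `patient_token` helper, and the sample records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would live in a key vault, never in source code.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def patient_token(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Map a patient identifier to a stable token that cannot be reversed without the key."""
    # Normalize so trivial formatting differences do not break linkage.
    normalized = patient_id.strip().upper()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability only; a shorter token raises collision risk.
    return digest[:16]


# Two records from different modalities that refer to the same fictional patient.
clinical_note = {"patient_id": "mrn-001234", "text": "Follow-up visit"}
image_header = {"patient_id": " MRN-001234 ", "modality": "CT"}

note_token = patient_token(clinical_note["patient_id"])
image_token = patient_token(image_header["patient_id"])
other_token = patient_token("MRN-009999")

print(note_token == image_token)   # same patient, same token
print(note_token == other_token)   # different patient, different token
```

Because the token depends on a secret key, the same patient can be linked across notes, images, and time points without exposing the original identifier. Whether this satisfies a given regulatory standard depends on key management and the rest of the pipeline.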

This session is ideal for data scientists, clinical researchers, compliance teams, and healthcare IT leaders working with multimodal patient data who want to enable longitudinal, privacy-compliant research at scale.

 

Srikanth Kumar Rana

Solutions Architect
Databricks

 

Youssef MELLAH, Ph.D.

Senior Data Scientist & Machine Learning Engineer
John Snow Labs

Open-Source Multimodal Data Ingestion and Enrichment at Scale with Spark NLP 6

Recorded on: Wednesday, May 28 @ 2pm ET

This webinar introduces the recently released Spark NLP 6.0, an Apache 2.0 licensed open-source Python library that enables you to analyze large amounts of multi-modal data for batch LLM inference or to prepare data for RAG & LLM solutions – privately, efficiently, and at no cost. The library can operate on a single machine or container, or scale natively on any Spark hardware without code changes. Spark NLP recently crossed 150M downloads, and this new release adds support for 3 major new use cases:

  • Support for ingesting and pre-processing PDF, Excel, PowerPoint, text, and image files. Prepare, analyze, and ingest all file formats into an LLM / RAG solution using one unified pipeline.
  • Visual language models! Multiple VLMs of different sizes & features are natively available as steps in processing pipelines, enabling you to extract facts and answers from images and visual PDF files.
  • Extract structure, semantics, and metadata from unstructured and visual data in all file formats – using batch inference at scale.

Join to learn how to apply these new capabilities by walking through Python notebooks showcasing end-to-end scenarios.

 

Maziyar Panahi

Principal AI Engineer & Team Lead
at John Snow Labs

EvalOps Efficiency: Empowering AI with Cost-Effective LLM Judges

Recorded on: Thursday, March 13 @ 2pm ET

Join us to explore the dual themes of agent evaluations and EvalOps in this comprehensive technical session. This webinar will focus on how to effectively integrate cost-effective LLM judges into AI systems, enhancing the precision and efficiency of agent decisions and overall LLM evaluations. You’ll learn about the strategic application of EvalOps and agent-specific evaluations, balancing accuracy, speed, and cost, backed by practical case studies.

Key Takeaways:
– Gain insights into both specific agent evaluations and general LLM evaluations through EvalOps.
– Discover strategies to optimize both agent performance and overall system evaluations.
– Examine case studies demonstrating successful implementations of agent evaluations and EvalOps.
– Learn how to maintain budget control while maximizing performance and accuracy.

This webinar is designed for ML engineers and data scientists looking to deepen their expertise in sophisticated AI model management.

 

Ari Heljakka

Co-Founder & CEO, Root Signals

Matching Patients with Clinical Guidelines

Recorded on: Wednesday, January 22 @ 2pm ET

Healthcare systems, payers, and medical societies invest massive effort to maintain evidence-based clinical guidelines for a variety of conditions. However, when patients are in the hospital, clinicians often just don’t have the time to research or read these guidelines, leading to major gaps in how consistently they are applied. Recent advances in Medical AI can shortcut this problem by automatically reading the full history of a given patient, finding the most recent and relevant guideline for their clinical history, and presenting it in context. This session will walk through the architecture of an end-to-end solution that does this using a state-of-the-art healthcare-specific LLM that can be deployed locally within an organization’s security perimeter to ensure privacy, compliance, and the ability to read organization-specific guideline documents. We’ll also show how to handle clinical guideline formats that are challenging for general-purpose LLMs, such as flowcharts, decision trees, and visual decision tables.

 

 

Veysel Kocaman

Chief Technology Officer, John Snow Labs

Top-10 Misconceptions About LLM Judges in Production

Recorded on: Wednesday, December 4 @ 2pm ET

Implementing LLM “judges” in production settings can be a game-changer for evaluating AI behaviors, but it’s often more challenging than it appears. Many teams struggle with common pitfalls, ranging from high error rates and cost unpredictability to issues with latency and long-term maintenance. This webinar will break down the top 10 misconceptions around LLM judges, equipping you with the insights to avoid these challenges and build more reliable, production-ready evaluation systems.

Join us to learn:
– Key Misconceptions: Understand where teams often go wrong in deploying LLM judges.
– EvalOps Principles: Discover best practices and tools to operationalize evaluations effectively.
– Reliability in Production: Learn how to make evaluation outcomes systematic, scalable, and dependable.
– Use Cases & Value: Explore real-world applications where LLM judges add significant value and the tangible results they deliver.

Who Should Attend:
This webinar is ideal for machine learning engineers, data scientists, AI practitioners, and technical leaders seeking to enhance their approach to model evaluation in production environments.

 

Ari Heljakka

Founder & CEO, Root Signals

 

Oguz Gencoglu

Co-Founder & Head of AI, Root Signals

Building Patient Journeys from Multi-Modal Healthcare Data Using Medical Language Models

Recorded on: Thursday, October 31 @ 2pm ET

Data integration has been an enormous challenge in healthcare for decades. Issues of data standardization, data quality, legacy formats, unstructured data, and semantic inconsistencies have made it hard to answer basic questions about how a hospital operates or what should be done next for a patient. Recent advances in Healthcare AI combine to transform this age-old problem – enabling you to automatically ingest large volumes of raw, multi-format, multi-modal, untrusted medical data into coherent longitudinal patient stories in an industry-standard format.

This webinar presents an integrated solution in action that uses John Snow Labs’ state-of-the-art Medical Language Models, healthcare-specific data preparation pipelines, and Text-to-OMOP question answering models running on Databricks’ secure, scalable, and compute-optimized AI platform. The solution takes in multi-modal data – structured (tabular), semi-structured (FHIR resources), and unstructured (free-text) – and generates an OMOP/OHDSI standard data model that:

– Builds a unified view of each patient over time.
– Builds this unified patient view from multi-modal source data.
– Reasons at the patient level.

We’ll then show how the resulting patient data model can be used for either “AI” (building patient cohorts with natural language queries) or “BI” (dashboards for patient risk scoring and quality measures), all from the same source of truth, with full explainability and traceability.

 

Kayla Grieme

Senior Solutions Architect – Academic Medical Centers, Databricks

 

Kate Weber

Senior Data Scientist, John Snow Labs

Fireside Chat – Efficiency Unlocked: The Power of Synthetic Data in Modern Enterprises

Recorded on: Wednesday, October 2 @ 2pm ET

Synthetic data has emerged as a powerful tool for organizations looking to enhance their operational efficiency and reduce costs. By generating artificial data that mimics real-world data, companies can overcome limitations related to data scarcity, privacy concerns, and high costs associated with data collection and storage. This conversation will delve into the practical benefits and real-world applications of synthetic data, providing insights on how it can drive significant efficiency gains and cost savings.

 

Varun Nakra

Vice President, Deutsche Bank

 

Raghavan Muthuregunathan

Senior Engineering Manager, Search AI, LinkedIn

 

Alex Watson

Co-Founder & CPO, Gretel.ai

Polars and Time Series: What It Can Do, What It Can’t (Yet!), and What You Can Do About It

Recorded on: Tuesday, September 10 @ 12:00pm ET

Time series analysis is ubiquitous in applied data science because of the value it delivers. In order to do effective time series analysis, you need to know your tools well. Polars has excellent built-in time series support, and it’s also possible to extend it where necessary.

In this talk, we will cover:
– Basic built-in time series operations with Polars (e.g. “what’s the average number of sales per month?”).
– Advanced, custom time series operations, and how you can implement them as Polars plugins (e.g. business day arithmetic).
– What Polars can’t (yet!) do, and what we can do about it.
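For readers unfamiliar with the term, the business day arithmetic mentioned above can be illustrated in plain Python. This sketch is not a Polars plugin and not the speaker's implementation; the `add_business_days` helper is hypothetical, treats Saturday and Sunday as the only non-working days, and ignores public holidays.

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int) -> date:
    """Return the date n business days after start, skipping Saturdays and Sundays."""
    if n < 0:
        raise ValueError("n must be non-negative")
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current


friday = date(2024, 9, 6)
print(add_business_days(friday, 1))  # the following Monday
print(add_business_days(friday, 5))  # the following Friday
```

A day-by-day loop like this is easy to read but slow on millions of rows, which is the kind of gap a compiled plugin is meant to close.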

Audience level:
Basic interest and knowledge of Python and data will be assumed, but no prior Polars experience is required.

This event is hosted by OpenTeams (https://www.openteams.com), the sponsor of REPOS, a platform that promotes sustainable funding for open source projects. To learn more, visit https://repos.openteams.com.

 

Marco Gorelli

Sr. Software Engineer, Quansight Labs

 

Inessa Pawson

OSPO Manager, OpenTeams

Why You Should Consider Being a Trainer

Recorded on: Thursday, August 22 @ 12:00pm ET

Have you ever considered pursuing a career in training? For our guest speaker, Reuven Lerner, this path was not obvious at first. With a degree in Computer Science, the expectation was a career focused on software engineering and architecture. However, after many years of teaching Python and pandas full-time, he has found both greater financial rewards and a deeper sense of fulfillment and excitement in his work. In this talk, he will share valuable insights into the world of corporate training, his journey into this field, and whether it might be a suitable path for you. The session will conclude with an open Q&A, providing an opportunity for attendees to ask questions and learn more.

Reuven Lerner

Python Trainer, Lerner Python

Open Source Qualified Leads: A Marketing Framework for Commercial Open Source

Recorded on: Wednesday, July 31 @ 12:00pm ET

Commercial open source businesses face a unique challenge compared to their proprietary software counterparts: many potential customers already have access to a free version of their product. Determining which open source users are most likely to be interested in paid support, premium features, or other business offerings can be daunting. In this talk, we present a novel framework that commercial open source businesses can utilize to score leads based on their open source usage. This framework helps them establish a thriving business around their open source project.

Avi Press

CEO & Founder, Scarf

Making Multimodal Generative AI Work

Recorded on: Wednesday, June 17 @ 2:00pm ET

The future of generative AI is multimodal, and yet projects involving LLMs and large vision models in production are already struggling as teams try to scale up (and out) awe-inspiring Twitter demos. We’ll discuss what teams need to be doing to support the development and integration of multimodal models into product features.

Mikiko Bazeley

AI & MLOps Engineer, Labelbox

Fast, Cheap, Scalable: Open-Source LLM Inference with Spark NLP

Recorded on: Wednesday, June 26 @ 2pm ET

Learn how the open-source Spark NLP library provides optimized and scalable LLM inference for high-volume text and image processing pipelines. This session dives into optimized LLM inference without the overhead of commercial APIs or extensive hardware setups. We will show live code examples and benchmarks comparing Spark NLP’s performance and cost-effectiveness against both commercial APIs and other open-source solutions.

Key Takeaways:

  • Learn how to efficiently process millions of LLM interactions daily, circumventing the costs associated with traditional LLM deployments.
  • Discover advanced methods for embedding LLM inference within existing data processing pipelines, enhancing throughput and reducing latency.
  • Review benchmarks that compare Spark NLP’s speed and cost metrics relative to commercial and open-source alternatives.

Danilo Burbano

Software and Machine Learning Engineer at John Snow Labs

Danilo Burbano is a Software and Machine Learning Engineer at John Snow Labs. He holds an MSc in Computer Science and has 13 years of commercial experience. He has previously developed several software solutions over distributed system environments like microservices and big data pipelines across different industries and countries. Danilo has contributed to Spark NLP for the last 6 years. 

Turning Open Source Into a Sustainable Business

Recorded on: Wednesday, June 12 @ 2:00pm ET

Turning open-source projects into sustainable businesses is a crucial challenge for those in the tech industry. This AMA session explores practical strategies for building profitable business models within the open-source landscape. Travis and Peter will talk about various revenue streams such as support services, premium features, dual licensing, and partnerships. Participants will learn how to effectively communicate the value of open-source solutions, engage with communities for support and feedback, and balance open-source ethos with commercial objectives.

Join us and bring your questions for our speakers, Peter Wang from Anaconda and Travis Oliphant from OpenTeams, as they provide insights on transforming open-source initiatives into lasting ventures.

 

Travis Oliphant

CEO, OpenTeams and Quansight

 

Peter Wang

Chief AI and Innovation Officer & Co-founder, Anaconda

New State-of-the-art Accuracy for the 3 Primary Uses of Healthcare Language Models

Recorded on: Tuesday, May 29, 2024 @ 2pm ET

This talk presents new levels of accuracy that have very recently been achieved, on public and independently reproducible benchmarks, on the three most common use cases for language models in healthcare:

  • Understanding clinical documents: Such as information extraction from clinical notes and reports; detecting entities, relationships, and medical codes; de-identification; and summarization.
  • Reasoning about patients: Fusing information across multiple modalities (tabular data, free text, imaging, omics) to create a longitudinal view of each patient, including making reasonable inferences and explaining them.
  • Answering medical questions: Answering medical licensing exam questions, biomedical research questions, and similar medical knowledge questions – accurately, without hallucinations, and while citing relevant sources.

David Talby

CTO at John Snow Labs

David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and Game Changers Awards in 2023.

Supporting Open Source in the Post-xz utils Backdoor World

Recorded on: Thursday, May 22 @ 2pm ET

On March 29, 2024, the open-source community faced a significant challenge when a malicious backdoor was discovered in xz Utils, a popular data compression utility. This incident, triggered by the original maintainer’s burnout and subsequent manipulation by an attacker, underscores the vulnerabilities that arise from insufficient community support and the pressures faced by open-source maintainers.

Armin Nehzat

Chief Product & Marketing Officer
at Thanks.dev

Guiding organisations to accelerate innovation and digital transformation. Over 15 years of experience in managing projects and product development life cycles. Strong knowledge of sales, marketing, and manufacturing tech stacks. Strong knowledge of DevOps in IoT, eCommerce, and web and mobile application design.

The 2024 Generative AI in Healthcare Survey

Recorded on: Tuesday, April 30, 2024 @ 2pm ET

This webinar presents key findings from the 2024 Generative AI in Healthcare Survey, conducted in February & March of 2024 by Gradient Flow to assess the key use cases, priorities, and concerns of professionals and technology leaders in this space. Topics covered:

  • Current levels of adoption and budget allocation
  • Types of language models being used
  • Use cases for LLMs
  • Priorities for evaluating LLMs and roadblocks
  • LLM model enhancement strategies
  • LLM testing for Responsible AI requirements

Ben Lorica

The Founder of Gradient Flow

Ben Lorica is the founder of Gradient Flow. He is a highly respected data scientist, having served in leading roles at O’Reilly Media (Chief Data Scientist, Program Chair of the Strata Data Conference, O’Reilly Artificial Intelligence Conference, and TensorFlow World), at Databricks, and as an advisor to startups. He serves as co-chair for several leading industry conferences: the AI Conference, the NLP Summit, the Data+AI Summit, Ray Summit, and K1st World. He is the host of the Data Exchange podcast and edits the Gradient Flow newsletter.

David Talby

CTO at John Snow Labs

David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and Game Changers Awards in 2023.

The Journey to Healthy Leadership Habits

Recorded on: Thursday, April 18 @ 2pm ET

This webinar dives into the essence of effective leadership and the challenges inherent in translating leadership theory into practice. Acknowledging that most leadership behaviors are deeply ingrained and habitual, the session will explore the complexities of habit formation in a professional context. Drawing on new insights from cognitive sciences, the webinar will dissect why forming new leadership habits is a challenging endeavor and will outline strategies for successful habit creation. Tobi will illustrate this journey through three foundational leadership habits: Being Present, Sharing Context, and Setting Boundaries. Each habit will be discussed in detail, providing you with a roadmap for connecting with colleagues, sharing knowledge and perspectives, and establishing clear principles and boundaries. The session aims to equip leaders with practical advice and actionable strategies to foster healthy leadership habits that can significantly impact their effectiveness and the well-being of their teams.

Tobi Tonner

Growth Leadership Coach at Workband

Tobi Tonner holds an ICF Level 1 Diploma in Coaching, which is regarded as the gold standard. He has also trained in Situational Leadership II, NeuroLeadership, and the Entrepreneurial Operating System. He is passionate about coaching because he believes it is a powerful way to unlock the potential of people and organisations.

Monetizing Open Source: Making a Living from Your OS Product

Recorded on: Thursday, April 10 @ 2pm ET / 11am PT

This webinar is designed to address one of the most pressing questions for creators and sales professionals in the open-source space: how to generate sustainable revenue from open-source products. It will explore various monetization strategies, discuss the challenges and opportunities inherent in open-source business models, and provide actionable insights for effectively selling and scaling open-source solutions.

 

Wes McKinney

Principal Architect, Posit

 

Peter Wang

Chief AI and Innovation Officer & Co-Founder, Anaconda

 

Travis Oliphant

CEO, OpenTeams

 

Carol Willing

President, Willing Consulting

 

Andy Terrel

Principal Technical Product Manager, NVIDIA

John Snow Labs’ Native Integrations with LangChain and HayStack

Recorded on: Wednesday, February 28 @ 2pm ET

Learn to enhance Retrieval Augmented Generation (RAG) pipelines in this webinar on John Snow Labs’ integrations with LangChain and HayStack. This session highlights the ability to retain your existing pipeline structure while upgrading its accuracy and scalability. Accuracy is improved thanks to customizable embedding collection and document splitting. Using Spark NLP’s optimized pipelines greatly improves scalability and runtime speed and, as a result, reduces cost.

Learn how these native integrations enable an easy transition to more effective methods, enhancing document ingestion from diverse sources without overhauling existing systems. Whether your goal is to enhance data privacy, optimize NLP & LLM accuracy, or scale your RAG applications to millions of documents, this webinar will equip you with the knowledge and tools to fully leverage John Snow Labs’ software to get it done. Join us to unlock the potential of your applications with the latest innovations in Generative AI, without departing from the familiar toolset of your current pipeline.
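As a rough picture of what the document splitting step in a RAG pipeline does, the sketch below cuts text into fixed-size chunks that overlap, so that a sentence falling on a chunk boundary still appears whole in at least one chunk. It is a generic illustration under simple assumptions (character-based sizes, a hypothetical `split_text` helper) and does not reflect the John Snow Labs, LangChain, or HayStack APIs.

```python
def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size characters, each sharing overlap characters with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Production splitters usually cut on sentence or token boundaries rather than raw characters. Chunk size and overlap are the kind of settings the abstract refers to as customizable.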

Muhammet Santas

Sr. Data Scientist at John Snow Labs

Muhammet Santas holds a Master’s Degree in Artificial Intelligence and currently serves as a Senior Data Scientist at John Snow Labs, where he is an integral part of the Healthcare NLP Team. With a robust background in AI, Muhammet contributes his expertise to advancing NLP technologies within the healthcare sector.

Next-Gen Table Extraction from Visual Documents: Leveraging Multimodal AI

Recorded on: Tuesday, January 30 @ 2pm ET

Explore the latest advancements in multimodal AI for extracting tabular data from visual documents. This session will delve into novel methods implemented in John Snow Labs’ Visual NLP library, which has significantly improved the accuracy of information extraction and question answering from tables in PDFs and image files.

The webinar will cover a range of practical applications, demonstrating how this technology is adept at handling complex documents such as financial disclosures, clinical trial results, insurance rates, lab scores, and academic research. The focus will be zero-shot models, where the AI model directly interprets and responds to queries from source images, eliminating the need for specialized training or tuning.

We’ll also cover Visual NLP capabilities that have been specifically designed to enhance table extraction quality, especially in challenging cases like multi-line cells or borderless tables. We’ll discuss the technical underpinnings of this feature, including the integration of computer vision and optical character recognition for detecting tables and individual cells within them. We’ll touch upon how that extends to support for tables with custom borders, dark & noisy backgrounds, uncommon table layouts, multilingual text, and international number & currency formats.

Alberto Andreotti

Senior Data Scientist at John Snow Labs

Alberto Andreotti is a data scientist at John Snow Labs, specializing in Machine Learning, Natural Language Processing, and Distributed Computing. With a background in Computer Engineering, he has expertise in developing software for both Embedded Systems and Distributed Applications. Alberto is skilled in Java and C++ programming, particularly for mobile platforms. His focus includes Machine Learning, High-Performance Computing (HPC), and Distributed Systems, making him a pivotal member of the John Snow Labs team.

Operationalizing Large Language Models

Recorded on: Wednesday, January 31 @ 2pm ET

In this webinar, we’ll be exploring the intricate process of effectively managing and scaling multiple LLM applications. Our focus will extend beyond the basic use of notebook-based methods, diving into more robust and advanced strategies. From the initial development stages right through to deployment, this webinar is tailored to provide a comprehensive understanding of the complexities and nuances involved in operationalizing LLMs at scale.

Key Topics We’ll Cover:

  1. Advanced Management Techniques: Step beyond traditional approaches with insights into state-of-the-art strategies for managing and scaling LLM applications.
  2. Operationalization Challenges: Uncover the critical aspects often missed in the journey from development to deployment, emphasizing the importance of ongoing maintenance and monitoring for sustained success.
  3. Scalability in Practice: Equip yourself with practical solutions to effectively scale LLM applications, ensuring they meet the evolving demands and technological trends in today’s fast-paced environment.

Amanda Milberg

Senior Sales Engineer – Partnerships at Dataiku 

Amanda Milberg (She/Her) is a seasoned Data Scientist and Senior Sales Engineer at Dataiku, specializing in NLP and AI/ML business solutions. With a Bachelor’s degree in Computer Science and Mathematics from Colgate University, she combines deep technical expertise in Java, Python, C, Neo4j, SQL, HTML/CSS, Dash, and JavaScript with a strong acumen for fostering partnerships. As a WiDS Ambassador, Amanda is also dedicated to promoting women’s involvement in data science, making her a respected leader and innovator in the tech community.
