Discover the latest and sharpen your skills in AI, Data Science, Machine Learning, and related topics.

Building a Modern Data Platform

Watch live | April 28 @ 2:00 PM ET

Join us as we kick off a new webinar series featuring some of the most impactful tools for creating a modern data platform: a collection of tools and capabilities that, when brought together, allow organizations to become fundamentally data-driven.

The webinar series will cover best practices in building modern data platforms through live case studies, with leading players in the modern data stack such as Snowflake, Databricks, Looker, Sisense & Atlan.

To kick off the series, Snowflake & Atlan will partner to showcase how users can build a modern data platform using Snowflake's cloud-native data warehouse and Atlan's modern data catalog & governance product.

Join us on April 28 at 2 PM ET to learn more about:

  • How to get started using Atlan + Snowflake
  • Examples of current customers that are using Atlan + Snowflake
  • Specific enhancements users can expect from using both platforms

Presented by

Jeremy Maranitch

Senior Alliance Manager at Snowflake

Prukalpa Sankar

Co-Founder at Atlan

Democratizing Satellite Imagery Analysis with Dask

Watch live | April 29 @ 11.00AM ET

Gabe Joseph, the developer of stackstac and an open-source engineer at Coiled with a passion for pictures from space, joins us to discuss Dask’s role in making satellite imagery approachable for anyone to use.

Fifty years ago, to access satellite images back on Earth, you had to drop a film canister from space and catch it with an airplane. Thanks to cloud hosting and standardized formats, Earth observation data is getting easier to access, but analyzing large amounts of it is still hard. Dask is a logical solution to this problem. Unlike proprietary platforms such as Google Earth Engine, Dask integrates with the Python data science ecosystem, and new tools like stackstac are making it possible to connect these enormous datasets of global change to that ecosystem. Moreover, with a platform like Coiled, it's easy to scale your code out to the cloud and run it next to the data, letting you process planet-sized datasets.

After attending, you’ll know:

– The landscape of geospatial tools in Python, and how they do and don’t fit together
– The limits on Earth-scale data in Dask, and what we’re doing to expand them
– [Sneak peek!] Making an interactive map that renders on-the-fly with Dask, stackstac, and Coiled (aka your own mini Earth Engine)
– The one simple trick to remove all clouds from your satellite imagery that doctors DON’T want you to know!! (lol nope, sorry this doesn’t exist.)

You’ll also learn:

– What STACs (SpatioTemporal Asset Catalogs) and COGs (Cloud-Optimized GeoTIFFs) are, why stacking your COGs is hard, and how Dask + stackstac make it easy
– How to search Earth observation data in STAC catalogs
– How to stack lots of GeoTIFFs into a timeseries using stackstac
– Compositing, clipping, and basic analysis of raster timeseries in xarray

Join us on 29th April at 11 am US Eastern time to learn about what could come next for Earth observation in Python!

Presented by

Gabe Joseph

OSS Engineer at Coiled

Finance trends in 2021 – fundraising from an operator’s perspective

Watch live | May 12th @ 4.00PM ET

While each company and each fundraise is unique, fundraising insights are commonly shared among CEOs and VCs, but less so among operators. So much work happens behind the scenes, and finance operators have experiences they can share to improve the fundraising process for all of us.

Join a group of finance and accounting professionals as we share experiences and best practices and discuss the latest trends we're seeing in the fundraising process. Potential topics include: What do you prepare in your data room? How much do audited financials matter? Do you involve other teams in preparing materials?

The session will be led by Rose Punkunus and a few leads who have recently helped their companies raise significant rounds. Sign-up is required and space is limited.

Moderated by

Rose Punkunus

Founder and CEO at Sudozi

Building the Best AI Infrastructure Stack to Accelerate Your Data Science

Watch live | May 19th @ 11am ET

In recent years, enterprises have invested heavily in AI and deep learning initiatives. The three stages of most AI implementations (data ingestion and preparation, model building and training, and inference in production) require data science, IT, and Ops teams to work in new and sometimes confounding ways. Adding to the complexity, many organizations are choosing to build AI infrastructure from the ground up using containers and Kubernetes.

Join Data Science Salon as we host NVIDIA, Run:AI, and the AI Centre for Value-Based Healthcare at King's College London, who will share best practices for building an optimized AI infrastructure stack using NVIDIA GPUs, Kubernetes, and Run:AI.

In the webinar, we will walk through an example of a well-architected AI Infrastructure stack and discuss how Kubernetes and advanced GPU scheduling can be used to maximize efficiency and speed up data science initiatives. We’ll cover how one organization solved the orchestration challenges that enterprises face when they begin to scale.

Presented by

Prof. M. Jorge Cardoso - Associate Professor & Senior Lecturer, King’s College London & CTO, AI Centre for Value-Based Healthcare

The London Medical Imaging & Artificial Intelligence Centre for Value-Based Healthcare is a consortium of academic, healthcare, and industry partners, led by King's College London. It uses medical images and electronic healthcare data held by the UK National Health Service to train sophisticated computer vision and natural language processing deep learning algorithms.

Michael Balint, Sr. Product Manager, NVIDIA

NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others.

Omri Geller - CEO and co-founder of Run:AI

Run:AI provides a cloud-native compute resource management platform for the AI era. With Run:AI, data scientists get access to all of the pooled compute power they need to accelerate AI experimentation, whether on-premises or in the cloud. The company's Kubernetes-based platform gives IT and MLOps teams real-time visibility and control over GPU scheduling and dynamic provisioning, with gains of more than 2X in utilization of existing infrastructure.

Past webinars

Using & Expanding the NLP Models Hub

Recorded on: March 10 @ 2 PM ET | 11 AM PT

The NLP Models Hub, which powers the Spark NLP and NLU libraries, takes a different approach than the hubs of other libraries like TensorFlow, PyTorch, and Hugging Face. While it also provides an easy-to-use interface to find, understand, and reuse pre-trained models, it focuses on providing production-grade, state-of-the-art models for each NLP task rather than a comprehensive archive.

This implies a higher quality bar for accepting community contributions to the NLP Models Hub – in terms of automated testing, level of documentation, and transparency of accuracy metrics and training datasets. This webinar shows how you can make the most of it, whether you’re looking to easily reuse models or contribute new ones.

Presented by

Dia Trambitas

Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then shifted her focus to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has extensive experience working with different annotation tools and leading document classification and NER projects in verticals such as Finance, Investment, Banking, and Healthcare.

Simplifying MLOps with Model Registry

Recorded on: February 25 @ 2 PM ET | 11 AM PT

Effectively deploying and scaling ML models across the development pipeline requires a mix of machine learning, software engineering, and operational skills that is rarely found in a single person or even a single team. Additionally, organizations with hundreds of models today face a unique challenge arising from the heterogeneity of ML workflows and the siloed nature of these teams.

In this talk, we will discuss the whys and hows of streamlining model release and model management using a model registry:

  • Ensure model reproducibility & portability across local, dev, and prod environments
  • Build transparency by creating a central source of truth for models across their lifecycle
  • Establish best practices around managing model releases & workflows
  • Enforce compliance and governance for models across risk categories

Presented by

Meeta Dash

VP Product at Verta

State-of-the-art Natural Language Processing for 200+ Languages with 1 Line of Code

Recorded on: February 18 @ 2 PM ET | 11 AM PT

Learn to harness the power of 1,000+ production-grade & scalable NLP models for 200+ languages – all available with just 1 line of Python code by leveraging the open-source NLU library, which is powered by the widely popular Spark NLP.

John Snow Labs has delivered over 80 releases of Spark NLP to date, making it the most widely used NLP library in the enterprise and providing the AI community with state-of-the-art accuracy and scale for a variety of common NLP tasks. The most recent releases include pre-trained models for over 200 languages, including languages such as Chinese, Japanese, and Korean, which do not use spaces between words and therefore require word segmentation, and right-to-left languages such as Arabic, Farsi, Urdu, and Hebrew. All software and models are free and open source under an Apache 2.0 license.

This webinar will show you how to leverage the multilingual capabilities of Spark NLP & NLU, including automated language detection for up to 375 languages and the ability to perform translation, named entity recognition, stopword removal, lemmatization, and more across a variety of language families. We will write Python code live and solve these problems in just 30 minutes. The notebooks will then be made freely available online.

Presented by

Christian Kasim Loan

Data Scientist and Spark/Scala ML engineer

Credit Risk – Why Model Fairness is Needed

Recorded on: October 28th @ 2PM ET

In this talk, Jen Burningham, Jorge Castañón, and Rakshith Dasenahalli will discuss the applications of Machine Learning and AI Fairness techniques in credit risk models for banking institutions. They will cover the typical use cases and the approaches the Data Science Elite (DSE) team used to address the challenges of, and need for, mitigating model bias. This session will introduce the Credit Risk accelerator that was developed based on the experience of the DSE team.

Presented by

Jen Burningham

Jen Burningham is a manager, Senior Data Scientist, and Machine Learning Engineer in the IBM Data Science and AI Elite Team. She has a strong background in healthcare, tech, and startups. Jen is passionate about using AI to improve businesses and lives. She received her Bachelor's Degree in Applied Mathematics with a Concentration in Statistics from Yale University and earned her MBA from the Yale School of Management.

Rakshith Dasenahalli

Rakshith is a Data Scientist with the IBM Data Science Elite (DSE) Team, where he has 3 years of experience. He has worked with enterprises across different industries, helping them understand and use their data to improve their businesses. He has experience working in Machine Learning, Data Visualization, and AI fairness customer engagements. He holds an M.S. in Electrical Engineering from the University of Southern California.

Jorge Castañón

Jorge Castañón hails from Mexico City and received his Ph.D. in Computational and Applied Mathematics from Rice University. He has a genuine passion for data science and machine learning business applications. For 15+ years he has been developing data science and AI models as well as algorithms to solve numerical optimization and regularized inverse problems. At IBM, Jorge is the Lead Data Scientist of the Machine Learning Hub North America, a team that empowers organizations to create business value through data science and AI. In 2020, Jorge was certified as a Level 3 Thought Leader and Distinguished Data Scientist by The Open Group.

Euphoria Convenience

Recorded on: September 3rd @ 2pm EDT

Euphoria Convenience is big data and predictive analytics perfectly attuned: it means satisfying goods and services arriving even before consumers sense that they want them. Already we see slivers of it on Netflix when the next, curated video begins even before the previous one ends. We experience it on Amazon when suggestions occasionally fulfil an urge we didn’t know we had.

The key to perfected entertainment and service is our personal information, and the underlying human dilemma is about how much of ourselves we should expose to get what we want. In this webinar, real-world cases and philosophical thought experiments will help us investigate by exploring questions including: What is privacy? Why is Kim Kardashian the most private person in the world? Is it more satisfying to want and then get, or to get before wanting? Answers are elusive, but the goal is to identify and clarify the ethical dilemmas arising when personal information is gathered and leveraged in the media, entertainment, marketing, and retail sectors. This webinar is tailored for professionals in data governance, but open to all those interested in the regulatory and human dilemmas surrounding privacy and AI-powered convenience.

Presented by

James Brusseau

James Brusseau (PhD, Philosophy) is author of books, articles, and digital media in the history of philosophy and ethics. He has taught in Europe, Mexico, and currently at Pace University near his home in New York City. As Director of AI Ethics Site, a research institute currently incubating at Pace University, he explores the human experience of artificial intelligence.

Breaking into AI: ML in the Real World

Recorded on: August 20th @ 2pm EDT

Do you want to build a career in AI? Join this session, where two IBM data scientists from the Data Science and AI Elite Team will share their journey to AI, the challenges they faced, interview tips, and useful resources, along with how they drew on their earlier experiences to apply ML to real-world client problems.

Presented by

Safura Suleymanova - Data Scientist at the IBM Data Science & AI Elite Team

Safura Suleymanova is a Data Scientist at the IBM Data Science and AI Elite Team, where she develops machine learning powered solutions for customers across various industries, including finance, telecom, and supply chain, among others. She is passionate about using data to streamline routine processes and ultimately improve business decisions. She started leveraging data to improve existing operational processes at her previous workplaces, which led her to the field of Data Science. Safura has a bachelor's degree in Mathematics, with a specialization in Risk Management and Statistics, from the University of Waterloo.

Aishwarya Srinivasan - Data Scientist, Data Science Elite Team, IBM Data & AI

Aishwarya joined the DSE Team in June 2018 as a Data Scientist. She is a researcher in machine learning and reinforcement learning, and she strives for innovation. In her engagements, she primarily focuses on bringing machine learning research into practice to create business value.

With her interest in applying machine learning to finance, she has presented her work to leading financial institutions such as Morgan Stanley, Credit Suisse, and the World Bank. She is focused on expanding her horizons in the machine learning research community; in 2018 she won a patent award for developing a reinforcement learning model for machine trading. She holds a graduate degree in Data Science from Columbia University.

She is an ambassador for the Women in Data Science (WiDS) community, which originated at Stanford University. She has more than 50k followers on LinkedIn and actively participates in and organizes events and conferences to inspire budding data scientists. Besides being a data junkie, she is a fitness fanatic who practices martial arts (Krav Maga) and yoga. To share her knowledge of data science and inspire budding data scientists, she writes machine learning blog posts on her website.

Dos and Don’ts of Hosting Engaging Virtual Events (and the tech stack to do it)

Recorded on: July 23 @ 2PM ET

Martech, the set of tools marketing teams use to power their digital marketing, has become a key component in almost any organization. But only recently did martech's long-lost cousin gain prominence: the event tech stack. With in-person events at a halt, event planners have had to pivot quickly and create engaging virtual experiences.

Join us for a live webinar on Thursday, July 23 at 2 PM ET to learn how to:

  • Engage your community and create a first-class virtual experience while maintaining the human component
  • Evaluate your event tech stack, including your event platform, breakout sessions, and networking applications
  • Integrate your entire tech stack to create a cohesive and engaging experience
  • Use your tech stack to make your event more visible to your community

Julia Mohler

Director, Marketing & Operations at Formulatedby

Lyssa Myska Allen

Social Media & Community Manager at Formulatedby

Using Python with a Massively Parallel Database to Predict COVID-19 Numbers

Recorded on: July 7th @ 11am PDT

Python is a powerful programming language and a good choice for many types of analytics, and it is rapidly becoming the language of choice for scientists and researchers. By combining a massively parallel processing (MPP) database like Vertica with Python, you can overcome many of the scale and analytics challenges that can limit Python users. Join this session to see a live demo of Vertica's built-in Python integration, used to measure and predict key COVID-19 figures by using historical per-state case and fatality counts to predict the next day's numbers.

Presented by

Badr Ouali, Data Scientist at Vertica

Statistical Paradoxes & Logical Fallacies: Don’t Believe the Lies your Data Tells

Recorded on: June 30th @ 11am PDT

I hate to admit it, but your data is lying to you, and more often than you think. Having clean data with high volume, velocity, and variety doesn't necessarily protect you from reaching faulty conclusions about research questions of interest. Despite what you may have learned in Statistics 101, a significant p-value isn't always groundbreaking. All data can be coerced and bribed to tell any story; thus, as data practitioners, it's our duty to be aware of the pitfalls that abound and to navigate around common traps responsibly.

By showcasing enterprise use cases and sample projects in Dataiku’s Data Science Studio, Christopher Peter Makris will address the following questions:

– Is more data always better? How can the inclusion or exclusion of data obfuscate a previously held conclusion?

– Is an observed event truly a trend? How can previously noted behaviors be a marker for the complete opposite behavior in the future?

– Is an association worth my time/money/effort? When can seemingly conclusive results lead us not to act on a valid association?

Presented by

Christopher Peter Makris, Lead Data Scientist at Dataiku

Christopher Peter Makris (CPM) is a Lead Data Scientist at Dataiku. With a background in Logic, Discrete Mathematics, & Statistics, he brings experience from both industry and academia. CPM previously served as Director of Data Science at the NYC Data Science Academy and Executive Director of the Master of Statistical Practice program at Carnegie Mellon University; at both institutions, CPM took pride in professionally developing students, aiding in the redesign of technical curricula, and delivering over 3,000 hours of lectures to graduate audiences. In industry, CPM most recently helped thwart cybercrime as the Director of Security Data Science at Empower Retirement. When not crunching numbers on his computer, CPM is crunching his abs in the studio as he's an avid dancer, choreographer, and CrossFitter (so, while he considers himself a data scientist, he can only count to 5-6-7-8…)

Choosing the Right Deep Learning Framework: A deep learning approach

Recorded on: May 20th @ 2pm ET

Deep learning frameworks allow for the easy construction of neural networks and have made Python the go-to language for artificial intelligence development. This webinar will show how deep learning can be used to help align developers and data scientists to a particular framework. It will also demonstrate how IBM is extending the value of deep learning frameworks via open source projects, and how new NLP advancements such as transformers are changing how and why to use a particular framework.

Presented by

Nick Acosta, IBM

Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and uses it to understand the machine learning developers in various communities, startups, and enterprises in order to help them succeed on IBM's data science platform and in open source. He has a strong interest in open source software and all things Kardashian.

Everything You Need to Know to Start Data Science with Python

Recorded on: May 14th @ 11am PT

Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.

Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.

During the webinar, we’ll discuss:

  • Popular open-source tools for DS/ML
  • Use cases
  • Helpful resources to get started with data science with Python

Presented by

Stan Seibert - Sr. Director, Community Innovation at Anaconda

Stan leads the Community Innovation team at Anaconda, where his work focuses on high performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, University of Pennsylvania, and the Sudbury Neutrino Observatory.

Explainable Data Science Workflows


Learn explainable workflows using open source software, semi-automated pipelines, and Python! From data ingestion and cleaning to piping and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn best practices for codifying explainable data science and relaying it to stakeholders, management, and end users in a reproducible and portable fashion.

Presented by

Austin Eovito - Data Scientist, IBM

Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. Having recently completed graduate studies at Florida State University, Austin is focused on balancing the bleeding-edge research produced by academia with the tools used in applied data science. His master's thesis was on white-collar crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM's Data Science Masterclass on AI Explainability and currently resides in San Francisco with his fiance, dog, and two cats.
