Past webinars

Designing Custom ML Pipelines with AWS SageMaker

Recorded on: November 23, 2022 @ 2pm ET

Amazon SageMaker is an efficient way to train, test and deploy models across the machine learning lifecycle. Because that lifecycle is iterative, SageMaker lets data scientists manage the entire machine learning pipeline with autoscaling provisions, advanced data security, data and model monitoring, high performance and low-cost ML development. Compared to a local or on-demand setup, using SageMaker can save up to 67% of total resource utilization.

However, deploying custom machine learning models on a cloud platform such as AWS can be daunting. First, there is not much documentation available for individual use cases, and second, each model is different.

This tutorial will walk you through a framework for deploying any custom model in SageMaker using Docker. It will cover how to deploy the model as a REST API and how to test it through Postman to showcase the model to customers and stakeholders.

What you’ll learn in this webinar:

  1. How to create custom ML models in AWS SageMaker
  2. How to deploy the model as a REST API
  3. All you need to know about AWS services such as SageMaker, ECR (Elastic Container Registry), Lambda and API Gateway, and non-AWS services such as Docker
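
As a rough illustration of the Docker-based deployment described above: a custom SageMaker inference container is expected to answer GET /ping health checks and POST /invocations prediction requests on port 8080. The sketch below shows that contract using only the Python standard library. The `predict` function, the `route` helper and the `{"features": [...]}` payload shape are hypothetical placeholders, not material from the webinar.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real model: returns the sum of the inputs."""
    return {"prediction": sum(features)}


def route(method, path, body=b""):
    """Map a request to (status_code, response_bytes) following the container contract."""
    if method == "GET" and path == "/ping":
        return 200, b""  # health check: an empty 200 response means "healthy"
    if method == "POST" and path == "/invocations":
        try:
            payload = json.loads(body)
            result = predict(payload["features"])
        except (ValueError, KeyError, TypeError):
            message = 'expected JSON like {"features": [1, 2]}'
            return 400, json.dumps({"error": message}).encode()
        return 200, json.dumps(result).encode()
    return 404, b""


class Handler(BaseHTTPRequestHandler):
    def _respond(self, body=b""):
        status, out = route(self.command, self.path, body)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

    def do_GET(self):
        self._respond()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self._respond(self.rfile.read(length))


def serve(port=8080):
    """Start the server; SageMaker sends traffic to port 8080 inside the container."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Calling `serve()` from the container's entrypoint would start the server. The API Gateway and Lambda pieces that the webinar lists would sit in front of the resulting endpoint and are not shown here.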

Presented by

Swagata Ashwani - Senior Data Scientist at Boomi

Swagata Ashwani is a Senior Data Scientist at Boomi where she focuses on deploying large scale models in production.

She has created various custom models in the integration space in SageMaker, and realized that although this should be a common problem, it is something that is not straightforward. While SageMaker supports bringing your own custom model to life, the process involves knowledge of a lot of concepts.

Swagata loves to talk about the possibilities of AI in the future and is keen on making an impact in the world. She believes in collaboration and sharing all the knowledge and journey with like minded professionals.

Outside of work, she is an avid reader and dancer, and at all other times she can be found sipping a hot cup of masala chai.

Data Mesh: From Concept to Reality

Recorded on: November 16, 2022 @ 2PM EST

Join this webinar and learn how BairesDev is building data mesh from the ground up with Databricks and Monte Carlo!

The data mesh is one of the buzziest topics in the modern data organization – and for good reason. By treating data as a product and embedding data teams across your different domains, data mesh architectures can help organizations more directly tie data to immediate business value.

After realizing their existing data infrastructure could not support the company’s growth, data volume, and complex pipelines and workflows, BairesDev built their data mesh architecture from the ground up in 2021.

But it’s not easy. Data mesh implementation can involve uncharted territories and extending new technologies in ways that others have not yet navigated. Learn how BairesDev, the leading nearshore technology solutions company, is implementing data mesh in their organization of over 5,000 engineers across 36 countries to help accelerate digital transformation in Fortune 500 companies and leading brands.

If you’re a data leader who’s considering or in the middle of implementing a data mesh architecture, you won’t want to miss this webinar to learn about:

  • How data mesh can support an organization under rapid growth with complex data workflows
  • How the BairesDev team evaluated infrastructure solutions, including Databricks and Monte Carlo, for their data mesh architecture
  • The challenges and learnings the BairesDev team experienced during their implementation

Matheus Espanhol

Data Engineering Manager at BairesDev

Matheus Espanhol is Head of Data Engineering at BairesDev. He has over 15 years of experience with a background in data lakes, data warehouses and mission-critical databases. At BairesDev, his current challenge is providing a big data platform for the enterprise, enabling large-scale data processing across different products to support the company’s growth.

Jon So

Head of Product Marketing at Monte Carlo

Jon leads product and partner marketing at Monte Carlo. Prior to Monte Carlo, Jon led product marketing, pricing strategy, and growth teams at Twilio Segment, Nauto, and Oracle | Opower, and advised Fortune 500 companies while working in Deloitte Consulting’s Strategy & Operations practice. He resides in San Francisco, CA, and graduated from the University of North Carolina at Chapel Hill.

Jason Pohl

Principal Solution Architect, Data Management Lead at Databricks

Jason is a Principal Solution Architect with Databricks and leads the Data Management Specialist group. He helps customers be successful in all aspects of data management. Jason’s career spans architecting legacy data warehouse solutions to modern data lakehouses in the cloud.

Zero-Shot Learning of Healthcare NLP Models

Recorded on: November 2, 2022 @ 2:00 PM ET

Zero-Shot Learning (ZSL) is a new paradigm that has gained massive popularity recently due to its potential to reduce data annotation and its high generalisability. In the pursuit of bringing the latest product-ready ML research to our community, we have implemented ZSL for two major tasks in Spark NLP for Healthcare: Named Entity Recognition (NER) and Relation Extraction (RE).

In this session, we will explore the ZSL models available in the Spark NLP for Healthcare library, how to use them with automatic prompt generation based on Q&A models, and finally, how they perform on real data and help reduce data annotation requirements.

Presented by

Hasham Ul Haq - Machine Learning Engineer at John Snow Labs

Hasham Ul Haq is a Data Scientist at John Snow Labs, and an AI scholar and researcher at PI School of AI. During his career, he has worked on numerous projects across various sectors, including healthcare. At John Snow Labs, his primary focus is to build scalable and pragmatic systems for NLP that are both production-ready and give SOTA performance. In particular, he has been working on Span detection, Natural Language Inference, disambiguation, Named Entity Recognition, and a lot more! Hasham also has an active research profile with publications in NeurIPS, AAAI, and multiple scholarship grants and affiliations.

Prior to John Snow Labs, he was leading search engine and knowledge base development at one of Europe’s largest telecom providers. He has also been mentoring startups in computer vision by providing trainings and designing ML architectures.

How to set up a Scalable Data Quality Solution

Recorded on: September 27 @ 2PM EST

As companies are continuing to explore new data-driven initiatives, integrating more business applications into their data lake or lakehouse, and implementing new data-focused tooling to power business decisions, the imperative to create reliable data infrastructure becomes paramount. In this demo, we’ll explore Monte Carlo’s approach to identifying where data incidents occur, whether across your entire data stack or down to the field level, so that you can mitigate data downtime to increase trust in your team’s work and provide quality data to your business.

Presented by

Ethan Post - Sales Engineer at Monte Carlo

Ethan Post is a Sales Engineer at Monte Carlo, a data reliability company, where he helps data teams achieve more trustworthy data with end-to-end data observability. Ethan began his career as a BI implementation consultant, building end-to-end data solutions for some of the most recognizable brands in the world. Most recently, he led the PreSales Center of Enablement for an analytics startup where he witnessed firsthand the true cost of poor data quality and observability. He received his MS in Information Systems from Indiana University. In his spare time, he loves cooking, woodworking, and spending time with his wife and two young children.

Brandon Chen - Product Marketing at Monte Carlo

Brandon is currently part of the Product Marketing team at Monte Carlo. Prior to Monte Carlo, Brandon had experience working in both the Sales Engineering and Product Marketing teams at Fivetran. He currently resides in Oakland, CA, and is an avid fisherman in his off-time.

Data Masking, Redaction & Obfuscation: How to Anonymize Data at Scale

Recorded on: September 13 @ 2PM EST

Join this webinar to learn about different ways to anonymize sensitive data when working with data at scale.

Ben Herzberg, Chief Scientist at Satori, will present different types of data anonymization, such as data masking, redaction, obfuscation and more.

In addition, you will hear some of the challenges in applying anonymization when data is accessed by many users in modern environments, and how to overcome them.
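
To make the three anonymization terms above more concrete, here is a minimal Python sketch applied to email addresses. The definitions used (masking hides part of a value, redaction replaces it entirely, obfuscation turns it into an opaque token) are one common reading and may differ from the presenter's; the function names, the email pattern and the salt handling are illustrative assumptions, not Satori's implementation.

```python
import hashlib
import re

# Simplified email pattern for illustration; real-world address matching is looser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_email(email):
    """Masking: keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return local[:1] + "*" * (len(local) - 1) + "@" + domain


def redact_emails(text):
    """Redaction: replace every email address in free text with a fixed placeholder."""
    return EMAIL_RE.sub("[REDACTED]", text)


def pseudonymize(value, salt):
    """Obfuscation via salted hashing: the same input and salt give the same opaque token."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]
```

For example, `mask_email("alice@example.com")` gives `a****@example.com`, while `redact_emails` removes the address from the text altogether. The hashed token keeps values joinable across tables without exposing them, provided the salt is kept secret.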

Presented by

Ben Herzberg - Chief Scientist at Satori

Ben Herzberg is an experienced tech leader and book author with a background in endpoint security, analytics, and application & data security.

Ben has held roles such as CTO of Cynet and Director of Threat Research at Imperva.

Currently, Ben is the Chief Scientist for Satori, the DataSecOps platform.

How to Eliminate Data Downtime & Start Trusting your Data

Recorded on: June 29 @ 2PM EST

Broken data is costly, time-consuming, and nowadays, an all-too-common reality for even the most advanced data teams. In this talk, we will introduce this problem, called “data downtime” (periods of time when data is partial, erroneous, missing or otherwise inaccurate), and discuss how to eliminate it in your data ecosystem with end-to-end data observability. Drawing parallels to application observability in software engineering, data observability is a critical component of the modern DataOps workflow and the key to ensuring data trust at scale. We will share why data observability matters when it comes to building a better data quality strategy and highlight tactics you can use to address it today.

Presented by

Jon So - Head of Product Marketing at Monte Carlo

Jon So leads product and partner marketing at Monte Carlo. Prior to Monte Carlo, Jon led product marketing, pricing strategy, and growth teams at Twilio Segment, Nauto, and Oracle | Opower, and advised Fortune 500 companies while working in Deloitte Consulting’s Strategy & Operations practice. He resides in San Francisco, CA, and graduated from the University of North Carolina at Chapel Hill.

Katie Noonan - Support Engineer at Monte Carlo

Katie Noonan leads the technical support department at Monte Carlo. Prior to Monte Carlo, Katie specialized in technical enablement for enterprise accounts, and product process and operations at Twilio Segment. She currently resides in San Francisco, CA, and graduated from the University of Oregon.

Zero Shot Learning for Semantic Relation Extraction from Unstructured Text

Recorded on: June 22 @ 2PM ET

Relation Extraction is one of the most important tasks in healthcare NLP applications, but finding competent people to label the data and then labeling it in order to train the models is an expensive process. With the Zero-Shot Learning method, which has recently been adopted in the field of NLP, it has now become possible to train Relation Extraction models without the need for data labeling. In this presentation, we will explain how to use the Zero-Shot Learning method for Relation Extraction in unstructured texts.

Presented by

Muhammet Santas - Data Scientist at John Snow Labs

Muhammet Santas has a Master’s Degree St. in Artificial Intelligence and works as a Data Scientist at John Snow Labs as part of the Healthcare NLP Team.

Automating ML Pipelines with Production-First Data

Recorded on: Wednesday, June 1 @ 12PM ET

As teams look to build and deploy models into production, they need tools that can adequately scale with them. In particular, the tools they need must allow them to quickly monitor, segment, retrain, and experiment on the data.

Join us as the teams at Pachyderm and Superwise discuss:

  • What MLOps is and why data is critical for it
  • How to architect a scalable and automated platform
  • Why your team should adopt a Production-First data approach

Presented by

Oryan Omer - Lead Software Engineer at Superwise

Oryan Omer is on the lead software engineering team at Superwise with 7 years of experience developing ML products. Oryan has also led an elite unit of engineers for the Israel Defense Forces. In recent years, his focus has been on MLOps solutions to simplify the ML lifecycle. In his spare time, he surfs on every board there is (wave surfing, snowboarding, carving, etc.).

Harpreet Sahota - Developer Relations at Pachyderm

Harpreet is part of the Developer Relations team at Pachyderm, where he works across marketing, content, evangelism, and product teams. He’s also the host of The Artists of Data Science podcast, a father, an avid reader, a perpetual learner, and a craft beer snob.

Using graph-based features to get a quantum leap in fraud detection rates

Recorded on: May 5th @ 2PM EST

Sophisticated financial criminals are constantly changing their tactics.

AI-based detection systems, dependent on large training sets of historical data, may be good at catching low-value frauds but may struggle to pick up on the newest, highest-risk approaches.

In that case, you need a system that gives your seasoned fraud professionals an easy and quick way to pull together a 360-degree view of the people, accounts, transactions and communications of suspicious actors, so you can react before they take too much.

This talk will explore how graph databases can help improve your AI-based detection systems and give you tools for intelligent investigation, transforming your ability to respond to fraud and sending a strong message to organized criminals to leave your bank alone.
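
As a small illustration of what "graph-based features" can mean in this setting, the sketch below links accounts to the devices and phone numbers they use, then derives two simple features per node: its degree and the size of the connected group it belongs to. It is plain Python, not TigerGraph's API, and the entity names and the choice of features are hypothetical.

```python
from collections import defaultdict, deque


def graph_features(edges):
    """Return {node: {"degree": int, "component_size": int}} for an undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    features = {}
    seen = set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        for node in component:
            features[node] = {
                "degree": len(neighbors[node]),
                "component_size": len(component),
            }
    return features
```

An account that looks ordinary on its own but sits in a large component of shared devices and phone numbers is the kind of signal such features could feed into a detection model or an investigator's view.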

Presented by

Harry Powell

Harry Powell is Head of Industry Solutions at TigerGraph. He joined TigerGraph from Jaguar Land Rover, where he spent four years running the Data and Analytics business and where the team contributed $800 million in profit over four years. Prior to that, he was the Director of Advanced Analytics at Barclays, which released a number of world-class data science innovations to production, including the first Apache Spark application in the European financial services industry. Harry has been working with graph data on and off for 10 years. His team at Barclays built a number of graph applications, and at JLR, he was an early adopter of TigerGraph, using a graph database to solve supply chain, manufacturing and purchasing challenges at the height of the Covid shutdown and the semiconductor shortage.

Supreet Kaur

Supreet Kaur is working as an Assistant Vice President at Morgan Stanley where she is the product owner of various AI-driven products. Prior to this, she was working at ZS where she developed various technology-driven solutions to automate workflows and drive business decisions for top pharmaceutical clients.

She is the founder of DataBuzz, a volunteer-driven organization that engages tech and AI enthusiasts through mentorship and educates them about the latest tech and AI trends through blogs and a fireside chat series with leaders from the tech industry. She is also the US-Analytics Lead for Women in AI and a mentor at Rutgers University, Columbia University and Weavr. She is also an ardent writer on Medium and has written various blogs to inspire others who want to pivot into the technology industry.

Beyond the Data Stack: Creating a Modern Data Experience

Recorded on: April 19th @ 2PM EST

A lot of ink has been spilled on the modern data stack. Everyone talks about its architecture, its tools, and how to organize teams to support it. As the data community, it’s easy to be excited about this impressive new technology.

Succeeding with the modern data stack, however, goes far beyond a collection of architectural diagrams; it’s an experience. It’s rushing to answer your CEO’s urgent question just before the meeting; it’s spending more time trying to verify that a dashboard is accurate than talking about what the dashboard actually says; it’s piecing together this month’s board deck from disconnected BI reports, SQL queries, and Excel files.

An experiential roadmap provides a solution to this problem. It helps companies move from building the modern data stack to creating the modern data experience, which is key to using data effortlessly.

In this webinar, Benn Stancil, Co-Founder and Chief Analytics Officer at Mode, will explain why the modern data stack isn’t sufficient and how companies can overcome the challenges of creating sustainable user experiences. He’ll lay out a vision for the modern data experience, along with some actionable, implementable recommendations.

In this webinar you’ll learn:

  • How to establish a data culture that facilitates better automation, better understanding between cross-functional teams, and more effective workflows.
  • How to merge Business Intelligence (BI) and data science.
  • Why it’s important to make status and trust explicit and be transparent about where the data is coming from.
  • Why sharing the same business logic is crucial when creating a more seamless experience for people who are consuming the data.

Presented by

Benn Stancil

Benn Stancil is an accomplished data analyst with deep expertise in collaborative Business Intelligence and Interactive Data Science. Stancil is Co-founder, President, and Chief Analytics Officer of Mode, an award-winning SaaS company that combines the best elements of Business Intelligence (ABI), Data Science (DS) and Machine Learning (ML) to empower data teams to answer impactful questions and collaborate on analysis across a range of business functions. Under Stancil’s leadership, the Mode platform has evolved to enable data teams to explore, visualize, analyze and share data in a powerful end-to-end workflow. Prior to founding Mode, Stancil served in senior Analytics positions at Microsoft and Yammer, and worked as a researcher for the International Economics Program at the Carnegie Endowment for International Peace. Stancil also served as an Undergraduate Research Fellow at Wake Forest University, where he received his B.S. in Mathematics and Economics. He believes in fostering a shared sense of humility and gratitude

Model Serving Event with AI Infrastructure Alliance

Recorded on: April 7 @ 11am-1pm ET

Join us on Thursday, April 7, 2022, 11am-1pm ET for the second edition of the new event series by AI Infrastructure Alliance, focusing on model serving.

Learn how to serve models in simple and complex environments, see the difference between batch and online, sunset old models, use shadow and canary deployments, monitor the models for speed and latency and much, much more.

This month’s edition will feature four lightning talks and one 1-hour presentation including DEMOS and CODE, which means fewer slides and more hands-on work.

Clayton Davis

Head of Data Science at Modzy

Saumil Dave

Head of ML Engineering at Modzy

Brennan Smith

Sales Engineer at Iguazio

Juha Kiili

Product Lead at Valohai

Ed Shee

Head of Developer Relations at Seldon

Leveraging Software Engineering Secrets in Your Data Science Workflows

Recorded on: March 29th @ 2PM EST

Python is used daily by millions of engineers, scientists, analysts, and other people trained in fields outside of computer science and software engineering. If you are one of those people, you probably appreciate that Python helps you get your job done. Still, you may wonder what the “real” software engineers might know that would help you do an even better job.

In this talk, we discuss the career journey of Matt Kramer, Sr. Software Engineer at Anaconda, from science and engineering into Python software development, and how the tools and practices of software engineering would have helped his original workflows as a domain expert. He’ll show you how version control, packaging, and automated testing aren’t just for software engineers; they can help everyone tame their workflows. He’ll also share some Python tools that let you easily share your results with other stakeholders, without spending all your time writing plotting code and without having to become (or hire!) a front-end engineer.

Presented by

Matt Kramer

Matthew Kramer is a Senior Software Engineer at Anaconda. Previously he spent eight years working in the Energy sector for Chevron as an engineer and leading a team in Chevron’s Surface Data and Insights Platform. Prior to joining the industry, he received his BSE, MSE, and Ph.D. in Naval Architecture & Marine Engineering from the University of Michigan. During the Ph.D. program, he studied the fluid-structure interaction of highly-flexible membrane structures interfacing with the ocean surface (hovercraft skirts). This work involved several disciplines related to computing, including Computational Fluid Dynamics and Finite Element Methods. During this time (c. 2011), he began using Python for data processing and automation of engineering analysis workflows.

So, You Migrated to Snowflake. Now What?

Recorded on: March 24th @ 1PM EST

Your data team migrated to Snowflake. Congratulations! What happens next?

Snowflake has revolutionized what’s possible for data analytics and engineering, but it takes much more than lightning-fast queries and flexible compute to drive results. At the end of the day, if you can’t operationalize and trust your data, your best laid plans for innovation and access are all for naught.

Join Monte Carlo and Snowflake to learn how best-in-class data teams are making the most of their Snowflake adoption with data quality, governance, and reliability in mind. We’ll demonstrate the critical features and functionalities in the Snowflake Data Cloud and Monte Carlo’s end-to-end data observability platform that will help you maximize impact with Snowflake by giving your company confidence in the reliability of your data and the products it powers.

Topics include:

  • Snowflake Data Quality metrics
  • Automated end-to-end lineage from Snowflake to the BI layer
  • Data discovery
  • Data auditing
  • PII classification
  • How to set SLAs, SLIs, and SLOs for Snowflake data pipelines
  • Operational analytics for your data platform
  • And much more

Presented by

Keith Smith

Keith Smith works as a Senior Partner Sales Engineer at Snowflake, helping Technology Partners integrate their solutions so our customers can easily get data into Snowflake and insights out of Snowflake by creating a single copy of data for their cloud data analytics strategy. Prior to this role, he worked as a Senior Solution Architect within Snowflake Professional Services. After leading several Quick Start client engagements, he worked with several large clients focusing on SQL optimization for various financial reporting workloads moving to Snowflake, where he gained a new appreciation for query profile diagrams, “most expensive nodes”, micro-partitions, and pruning statistics. Keith holds degrees from Brigham Young University (BS in Business Management, MIS emphasis) and Purdue University (MBA).

Glen Willis

Experienced Solutions Architect with a demonstrated history of success working in SaaS, specifically the Analytics space. Skilled in Python, JQL, Analytics, and bridging the gap between customer feedback and Product. Master’s degree in Product Development Engineering from the University of Southern California.

Bridging the ML Gap: How to Get Your Team from Prepare to Deploy

Recorded on: March 16th @ 2PM EST

As companies invest more heavily in AI/ML projects, teams will need to move rapidly from experimentation into production. But even with new advances in technology and processes, many teams struggle to get their projects into the later stages of the Machine Learning Lifecycle.

Join us as representatives from the AI Infrastructure Alliance (AIIA), Pachyderm, Superb AI, and WhyLabs discuss the big trends in MLOps, useful strategies for accelerating model development, and best practices all teams can incorporate.

Presented by

Lee Baker

Lee is the General Secretary at the AI Infrastructure Alliance. Based out of the UK, he is responsible for the proceedings and growth of the AI Infrastructure Alliance (AIIA) as well as working at Pachyderm to deploy programs and initiatives that enable customer-facing teams to execute the core aspects of their jobs more effectively. When not shuttling his 3 children around, he can most often be found cycling, running and swimming around England’s South Coast.

Jimmy Whitaker

Jimmy Whitaker is the Data Science Evangelist at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn’t at work, he’s either playing music or trying to learn something new, because “You suddenly understand something you’ve understood all your life, but in a new way.”

Danny Leybzon

Danny D. Leybzon has worn many hats, all of them related to data. He studied computational statistics at UCLA, before becoming first an analyst and then a product manager at a big data platform named Qubole. He went on to be the primary field engineer for data science and machine learning at Imply, before taking on his current role as MLOps Architect at WhyLabs. He has worked to evangelize machine learning best practices, talking on subjects such as distributed deep learning, productionizing machine learning models, automated machine learning, and lately has been talking about AI observability and data logging. When Danny’s not researching, practicing, or talking about data science, he’s usually doing one of his numerous outside hobbies: rock climbing, backcountry backpacking, skiing, etc.

James Le

James Le currently runs Data Relations at Superb AI, a Series A ML data management startup. As part of his role, James executes content and partnership initiatives – while working cross-functionally with growth, product, customer success, sales, marketing, and community functions to drive Go-To-Market strategy.

Before joining Superb AI, he completed his Computer Science Master’s degree at RIT, where his research thesis lies at the intersection of deep learning and recommendation systems. Outside of work, he is highly active in the broader data and ML community – writing data-centric blog posts, hosting a data-focused podcast, and organizing in-person community events.

Deeper Clinical Document Understanding Using Relation Extraction

Recorded on: March 16th @ 2PM EST

Recognizing entities is a fundamental step towards understanding a piece of text – but entities alone only tell half the story. The other half comes from explaining the relationships between entities. Spark NLP for Healthcare includes state-of-the-art (SOTA) deep learning models that address this issue by semantically relating entities in unstructured data.

John Snow Labs has developed multiple models utilizing BERT architectures with custom feature generation to achieve peer-reviewed SOTA accuracy on multiple benchmark datasets. This session will shed light on the background and motivation behind relation extraction, techniques, real-world use cases, and practical code implementation.

Presented by

Hasham Ul Haq

Hasham Ul Haq is a Data Scientist at John Snow Labs, and an AI scholar and researcher at PI School of AI. During his career, he has worked on numerous projects across various sectors, including healthcare. At John Snow Labs, his primary focus is to build scalable and pragmatic systems for NLP that are both production-ready and give SOTA performance. In particular, he has been working on Natural Language Inference, disambiguation, Named Entity Recognition, and a lot more! Hasham also has an active research profile with publications in NeurIPS, AAAI, and multiple scholarship grants and affiliations.

Prior to John Snow Labs, he was leading search engine and knowledge base development at one of Europe’s largest telecom providers. He has also been mentoring startups in computer vision by providing trainings and designing ML architectures.

Defining Reliability: SLAs for Data Teams

Recorded on: February 23rd @ 2PM EST

Reliable and trustworthy data is critical for any data-driven company, but it can be a big challenge when there are multiple teams each producing, transforming, and consuming data from one another simultaneously. Data teams are often caught in the middle, struggling to keep everyone happy. SLAs can remedy this challenge, serving as a powerful tool for aligning on the definition of quality and who’s responsible for addressing issues when they arise.

In this webinar, Kyle Kirwan, CEO of Bigeye and Uber’s first metadata product manager, will discuss the power of data SLAs, and how data teams can use them to align with their stakeholders and increase reliability.

In this presentation, find out:

  • The challenges posed when data quality hasn’t been quantified
  • How to build SLAs for data including creating SLIs and SLOs
  • Using SLAs as a prioritization tool
  • Measuring success and using SLAs to communicate reliability
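
To ground the SLI and SLO terminology in the list above, here is a minimal sketch of one possible data freshness indicator. It assumes the usual reliability-engineering reading, in which an SLI is a measured ratio and an SLO is the target that ratio must meet; the function names, the freshness definition and the thresholds are illustrative assumptions, not Bigeye's product behavior.

```python
from datetime import datetime, timedelta


def freshness_sli(update_times, expected_interval):
    """SLI: fraction of consecutive updates that arrived within the expected interval."""
    ordered = sorted(update_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two update timestamps")
    on_time = sum(gap <= expected_interval for gap in gaps)
    return on_time / len(gaps)


def meets_slo(sli, slo_target):
    """SLO check: is the measured indicator at or above the agreed objective?"""
    return sli >= slo_target
```

With a table expected to refresh hourly, updates at hours 0, 1, 2, 5 and 6 produce four gaps, three of them on time, so the indicator is 0.75 and an objective of 0.9 would be missed. An SLA would then describe what the data team and its stakeholders agree to do when that happens.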

Presented by

Kyle Kirwan

Bio Coming Soon.

DataSecOps – What it is and how it keeps your data secure and compliant

Recorded on: February 22nd @ 2PM EST

DataSecOps is an evolution in the way organizations treat security as part of their data operations. It is an understanding that security should be a continuous part of the data operations processes and not an afterthought.

As companies grow and data users expand, the need to secure sensitive personal information, including key areas such as health and financial data, becomes increasingly critical. To ensure this data is as safe as possible, security needs to be embedded into core data operations. Without this component, companies use ad-hoc processes and tools, leaving sensitive user data open to breaches from insiders, whether intentional or not.

The session will cover the best practices in securing sensitive personal, health and financial data with DataSecOps from hyper-growth startups including Wealthsimple, Canada’s largest fintech; ActiveCampaign, a B2B SaaS unicorn; and Innovaccer, the healthcare technology disruptor.

Join Satori and discover how they created the first DataSecOps platform, which streamlines data access by automating access controls, security and compliance for the modern data infrastructure; and learn:


  • The key elements and benefits of DataSecOps
  • Why you should implement DataSecOps in your organization
  • Key industries that can benefit from a DataSecOps program
  • How a robust DataSecOps platform can ensure your data is secure and compliant
  • DataSecOps predictions for 2022 and beyond

Presented by

Ben Herzberg, Chief Scientist at Satori and author of "Snowflake Security: Securing Your Snowflake Data Cloud" (Apress, 2021)

Rule-Based and Pattern Matching for Entity Recognition in Spark NLP  

Recorded on: February 16th @ 2PM EST

Finding patterns and matching strategies are well-known NLP procedures to extract information from text.

The Spark NLP library has two annotators that can use these techniques to extract relevant information or recognize entities of interest in large-scale environments when dealing with lots of documents from medical records, web pages, or data gathered from social media.

In this talk, we will see how to retrieve the information we are looking for by using the following annotators:

  • Entity Ruler, an annotator available in open-source Spark NLP.
  • Contextual Parser, an annotator available only in Spark NLP for Healthcare.

In addition, we will enumerate use cases where we can apply these annotators.

After this webinar, you will know when to use a rule approach to extract information from your data and the best way to set the available parameters in these annotators.
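
For readers new to the rule-based approach, the general idea can be sketched in plain Python with regular expressions. This is deliberately not the Spark NLP Entity Ruler or Contextual Parser API; the `RULES` table, the labels and the `find_entities` helper are hypothetical and only show how a label-to-pattern mapping turns text into entity spans.

```python
import re

# Hypothetical rules: entity label -> regular expression.
RULES = {
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
    "DOSAGE": r"\b\d+(?:\.\d+)?\s?(?:mg|ml)\b",
}


def find_entities(text, rules=RULES):
    """Return (label, matched_text, start, end) tuples sorted by position in the text."""
    found = []
    for label, pattern in rules.items():
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((label, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda item: item[2])
```

Rules like these work well when the entities follow a predictable surface form, such as dates, codes or dosages, and need no training data. They tend to break down when the same meaning can be written in many different ways.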

Presented by

Danilo Burbano

Danilo Burbano is a Software and Machine Learning Engineer at John Snow Labs. He holds an MSc in Computer Science and has 12 years of commercial experience.
He has previously developed several software solutions over distributed system environments like microservices and big data pipelines across different industries and countries. Danilo has contributed to Spark NLP for the last 4 years. He is now working to maintain and evolve the Spark NLP library by continuously adding state-of-the-art NLP tools to allow the community to implement and deploy cutting-edge large-scale projects for AI and NLP.

How One AI & NLP Company Achieved 5x Customer Growth

Recorded on: February 2nd @ 2PM EST

Data scientists want to learn, grow, and get things done. To better serve them, it’s time to stop highlighting how awesome your products and services are and start providing experiences that actually help your audience.

There are effective ways to attract and engage data scientists: community building and experiential marketing. Join this webinar to learn how to build a global community and memorable experiences that resonate with your audience, help your industry grow faster, and increase your bottom-line revenue – based on a real-world case study.

David Talby (CTO at John Snow Labs), Anna Anisin (CEO and Founder at Formulatedby) and Esther Rietmann (Director of Content and Programming at Formulatedby) will talk about the journey of building the largest online NLP community and event series, leading to 5x customer growth in one year.


  • The 3 pillars of successful community building in practice
  • Building an audience with open-source software and content marketing
  • Creating memorable experiences and long-term relationships
  • What to expect when executing a community building strategy (with real numbers!)

Join us on Wednesday, February 2, 2022 from 2:00-2:45 pm ET for this interactive webinar and learn how successful community building initiatives can be a real game changer in your organization’s marketing strategy. Includes live Q&A with the speakers and access to on-demand recording.

Presented by

Anna Anisin

Anna Anisin is an entrepreneur with 15+ years of experience in data-driven digital marketing. She was named a Tech Industry Insider by CNN, raised $10 million in Series A funding, and oversaw two acquisitions in her career. She specializes in enterprise B2B and experiential marketing and currently runs a boutique experiential marketing firm. Anna is also an advocate for women in tech and a contributor on data science and marketing at Forbes.

Presented by

David Talby

David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise.

He has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK.

David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was selected by CV as the Most Influential CTO of the Year 2019 in Artificial Intelligence.

Presented by

Esther Rietmann

Esther Rietmann has worked in B2B marketing for AI and data science companies for 5+ years, gaining experience in building communities, creating outstanding content and engaging audiences through memorable brand experiences. As the Director of Content and Programming at Formulatedby and Data Science Salon, she is passionate about bringing data scientists together and, especially, empowering women in data. In addition to curating content, she also manages the event speakers and community for the Data Science Salon and is key to collaborating with Formulatedby’s partners to build strong, ongoing relationships with their customers.

What Can You Expect for ML in 2022

Recorded on: January 19th @ 2:00 p.m. ET

As 2021 comes to a close, the challenges of productionizing and scaling teams’ machine learning projects remain. With all the big advancements in AI/ML, what are some actionable learnings for your team?

In this webinar, join us as we talk about:

  • Noteworthy highlights in the AI/ML space from 2021 
  • Upcoming trends in ML/AI for 2022 
  • Market movement (AIIA member valuations, acquisition and consolidation)
  • The shift to ‘Data Monitoring’ from ‘Model monitoring’ OR the evolution of ‘DataOps’ – reference Andrew Ng
  • Bedfellows for 2022: How hardware will meet ISV e.g. Nvidia/HPE and the AIIA ecosystem.
  • Winners & Losers – Best of breed vs Production ML on ‘so-called’ E2E solutions. The new data providers and the role of analytics.

Presented by

Lee Baker, General Secretary at the AI Infrastructure Alliance

Lee is the General Secretary at the AI Infrastructure Alliance. Based out of the UK, he is responsible for the proceedings and growth of the AI Infrastructure Alliance (AIIA) as well as working at Pachyderm to deploy programs and initiatives that enable customer-facing teams to execute the core aspects of their jobs more effectively. When not shuttling his 3 children around, he can most often be found cycling, running and swimming around England’s South Coast.

Automating Clinical Trial Master File Migration & Information Extraction  

Recorded on: January 12th @ 2:00 p.m. ET

Pharmaceutical companies that conduct clinical trials and want to get new treatments to market as quickly as possible handle a high volume of documents. Millions of documents can be created as part of one trial and are stored in a document management system. When these documents must be migrated to a new system – for example, when a pharma company acquires the rights to a drug or trial – they must often be read manually in order to classify them and extract metadata that is legally required and must be accurate. Traditionally, this migration is a long, complex, and labor-intensive process.

We present a solution based on a natural language processing (NLP) system that provides:

  • Speed – 80% reduction of manual labor and migration timeline, proven in major real-world projects
  • State of the art accuracy – based on Spark NLP for Healthcare, integrated in a human-in-the-loop solution
  • End-to-end, secure and compliant solution – Air-gap deployment, GxP and GAMP 5 validated

We will share lessons learned from an end-to-end migration of the trial master file at Novartis.

Presented by

Jiri Dobes - Head of Solutions at John Snow Labs

Jiri Dobes is the Head of Solutions at John Snow Labs. He has been leading the development of machine learning solutions in healthcare and other domains for the past five years. Jiri is a PMP certified project manager. His previous experience includes delivering large projects in the power generation sector and consulting for the Boston Consulting Group and large pharma. Jiri holds a Ph.D. in mathematical modeling.

Enterprise-Scale Data Labelling & Automated Model Training with the Free Annotation Lab

Recorded on: Wednesday, December 15th @ 2:00 p.m. ET

Extracting data from unstructured documents is a very common requirement in many verticals from finance and insurance to pharma and healthcare. Recent research has proven that Deep Learning offers impressive results in this task when models are trained on significant data.

However, getting high-quality data involves a lot of manual effort. An annotation project is defined, with annotation guidelines and examples; documents are imported; tasks are distributed among domain expert annotators; a manager tracks team statuses and performance; inter-annotator agreement is reached; and the resulting annotations are exported in some standard formats. At enterprise scale, the management complexity grows drastically due to the volume of projects, tasks, and users.

John Snow Labs’ Annotation Lab is a free annotation tool that has already been deployed and used by large-scale enterprises for more than two years. This webinar presents how you can exploit the tool’s capabilities to easily manage any complex annotation project. It also illustrates how models can be trained automatically without writing a line of code, and how any Spark NLP model can be used to preannotate documents so that projects are bootstrapped and your domain experts never have to start annotating documents from scratch but just correct and improve the models.

Presented by

Nabin Khadka, Lead Developer - Annotation Lab at John Snow Labs

Nabin Khadka leads the team building the Annotation Lab at John Snow Labs. He has 7 years of experience as a software engineer, covering a broad range of technologies from web & mobile apps to distributed systems and large-scale machine learning.

When AI Goes Wrong and How to Fix it Fast

Recorded on: Thursday, November 18 @ 11:00 a.m. ET

With algorithms making more and more decisions in our lives, from who gets a job, to who gets hired and fired, and even who goes to jail, it’s more critical than ever to get our intelligent systems talking to us so people can step in when things go wrong.

In the coming decade, organizations will face incredible pressure from regulators and the general public and that means every team needs a plan in place to find and fix mistakes fast or risk PR nightmares and financial disasters.

Join us to learn how to build an AI Red Team to deal with everything from edge cases to outright AI breakdowns, while getting you ready to embrace the latest breakthroughs in explainable AI tomorrow.

Presented by

Dan Jeffries, Chief Technology Evangelist at Pachyderm

Dan Jeffries is Chief Technology Evangelist at Pachyderm. He’s also an author, engineer, futurist, pro blogger and he’s given talks all over the world on AI and cryptographic platforms. He’s spent more than two decades in IT as a consultant and at open source pioneer Red Hat.

Creating Clinical Knowledge Graph By Spark NLP and Neo4j

Recorded on: November 17, 2021 @ 2pm ET

A knowledge graph represents a collection of connected entities and their relations. A knowledge graph that is fueled by machine learning utilizes natural language processing to construct a comprehensive and semantic view of the entities. A complete knowledge graph allows question answering and search systems to retrieve answers to given queries. In this study, we built a knowledge graph using Spark NLP models and Neo4j. The marriage of Spark NLP and Neo4j is very promising for creating clinical knowledge graphs that support deeper analysis, Q&A tasks, and new insights.
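As a rough illustration of "connected entities and their relations", the sketch below stores a few (subject, relation, object) triples in plain Python and answers one simple query. It uses neither Spark NLP nor Neo4j; the triples, relation names, and helper functions are hypothetical stand-ins for what an NLP pipeline would extract and a graph database would store.

```python
from collections import defaultdict

# Hypothetical triples, standing in for relations extracted from clinical text.
triples = [
    ("metformin", "TREATS", "type 2 diabetes"),
    ("metformin", "HAS_SIDE_EFFECT", "nausea"),
    ("insulin", "TREATS", "type 2 diabetes"),
]


def build_graph(triples):
    """Index outgoing edges by subject: subject -> [(relation, object), ...]."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def subjects_with(graph, relation, obj):
    """Return all subjects that have the given relation to the given object."""
    return sorted(s for s, edges in graph.items() if (relation, obj) in edges)


graph = build_graph(triples)
treatments = subjects_with(graph, "TREATS", "type 2 diabetes")
print(treatments)
```

A graph database generalizes this idea with persistent storage, indexes, and a query language, so that multi-hop questions do not require hand-written traversal code.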

Presented by

Ali Emre Varol, Data Scientist at John Snow Labs

Ali Emre Varol is a data scientist working on Spark NLP for Healthcare at John Snow Labs with a decade of industry experience. He has previously worked as a software engineer to develop ERP solutions and led teams and projects building machine learning solutions in a variety of industries. He is also pursuing his Ph.D. in Industrial Engineering at Middle East Technical University and holds an MS degree in Industrial Engineering.

Reliable Data Pipelines 101

Recorded on: November 16, 2021 @ 2pm ET

Whether it’s for an executive dashboard or an ML model, reliable data is critical for the applications that make data-driven companies tick. But when it comes to creating reliable data pipelines, where do you start and what tools and processes do you need in place?

Egor Gryaznov is the co-founder and CTO of Bigeye and was one of the first data engineers at Uber. Egor will draw from his experience supporting thousands of internal users and mission-critical workloads at Uber to provide an actionable guide to data pipeline reliability. 

In this presentation, you will learn:

  • How to approach building data pipelines for a data application
  • What tools you will encounter in development and what you need to know about each
  • How to create SLAs to better align with stakeholders
  • How to manage the data that your application creates
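One common way to make a data SLA checkable is to express it as a freshness threshold. The sketch below is a minimal, hypothetical example in plain Python; the function name, the one-hour threshold, and the timestamps are assumptions for illustration and are not taken from the presentation or from any Bigeye product.

```python
from datetime import datetime, timedelta, timezone


def meets_freshness_sla(last_loaded_at, max_age, now=None):
    """Return True if the most recent load is no older than the agreed maximum age."""
    now = now or datetime.now(timezone.utc)
    return now - last_loaded_at <= max_age


# Hypothetical SLA: the table must have been refreshed within the last hour.
sla = timedelta(hours=1)
check_time = datetime(2021, 11, 16, 14, 0, tzinfo=timezone.utc)

fresh = meets_freshness_sla(
    datetime(2021, 11, 16, 13, 30, tzinfo=timezone.utc), sla, check_time
)
stale = meets_freshness_sla(
    datetime(2021, 11, 16, 11, 0, tzinfo=timezone.utc), sla, check_time
)
print(fresh, stale)
```

Stating the SLA as a number that both the data team and stakeholders agree on makes it possible to alert on violations automatically.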

Presented by

Egor Gryaznov, CTO of Bigeye

Egor Gryaznov is co-founder and CTO of Bigeye. Prior to co-founding Bigeye, Egor was one of the early engineers at Uber where he scaled the company’s first data warehouse, supporting thousands of internal users and mission-critical workloads. Egor wrote ELT frameworks used by Uber analysts to generate complete data pipelines for KPI reporting from configuration. He developed custom tooling to automate backfills, replication, and critical data warehousing tasks.

What is NLP? Basics for Beginners

Recorded on: November 9, 2021 @ 2:00 pm ET

If you are interested in learning the basic concepts of NLP, this session is for you. In this short webinar, we will start with what natural language processing is, then cover why and how to turn natural language into data that a computer can process, common NLP tasks, and enterprise applications.

Things you’ll learn:

  • What is natural language
  • Not every language is natural
  • Natural language processing
  • Converting text to numbers
  • Basic tasks
  • Combining tasks for applications
  • Enterprise applications
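As a small illustration of "converting text to numbers", the sketch below builds bag-of-words count vectors in plain Python. It is one simple technique among many and is not necessarily the one covered in the session; the tokenizer and the sample sentences are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct word to a column index, in alphabetical order."""
    words = sorted({word for doc in documents for word in tokenize(doc)})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts.get(word, 0) for word in vocabulary]


docs = ["The cat sat", "The cat saw the dog"]
vocab = build_vocabulary(docs)
vectors = [vectorize(doc, vocab) for doc in docs]
print(list(vocab))
print(vectors)
```

Each document becomes a fixed-length list of counts, which is the kind of numeric input that downstream models can work with.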

Presented by

Kinga Parrott, AI Strategist at IBM

Kinga Parrott is an advocate for “Human Centered AI” and an AI Strategist at IBM. She is also a mindfulness enthusiast and loves hiking.

AI Inference Workloads: Solving Challenges Beyond Training Models

Recorded on: Tuesday, September 14 @ 2:00 p.m. ET

In recent years, enterprise AI initiatives have made great strides in solving the challenges of training massive, distributed computational models. Data wrangling, experimentation management, GPU resource allocation…these challenges have spawned an ever-growing market of new tools and considerable investment. But despite all this, the ability to productize AI is still stunted in most enterprises.

Our survey of more than 200 data scientists and MLOps/IT experts found that a majority of AI/ML models still aren’t making it to production. AI/ML teams are now under pressure to optimize and manage AI inference workloads in production and deliver a return on investment.

In this webinar, we will walk through the distinct characteristics of each stage in the ML lifecycle and their computational requirements. We’ll discuss solutions to improve throughput and reduce latency, and finally, we’ll show how one organization built an efficient inference platform on top of Kubernetes to support their scaling AI initiatives.


“Rapid AI development is what this is all about for us. What Run:AI helps us do is to move from a company doing pure research, to a company with results in production.”

Siddharth Sharma, Sr. Research Engineer, Wayve

Presented by

Dr. Ronen Dar

As Run:AI CTO, Dr. Ronen Dar manages research and product roadmap for Run:AI, a start-up he co-founded in 2018. Previously, Ronen worked at Bell Labs, Anobit Technologies (acquired by Apple), and in the algorithms research group in Intel’s mobile wireless division. He is the co-author of many patents in the fields of communication, coding and signal processing. Ronen received his B.Sc., M.Sc. and Ph.D. degrees from Tel Aviv University.

Zero to Advanced Analytics and Machine Learning in 90 Minutes with TigerGraph Cloud

Recorded on: Tuesday, August 31 @ 3:00 p.m. ET

New to Graph Databases or to Advanced Analytics? Don’t know how to use graph algorithms for Machine Learning? No worries!

Join this workshop, and we will teach you the concepts of TigerGraph Cloud, graph databases, advanced graph analytics, and machine learning algorithms – all in under 90 minutes!

This hands-on workshop will cover:

  • Graph Fundamentals
  • Graph Use-Cases
  • Introduction to TigerGraph Cloud
  • Integrating Python with TigerGraph Cloud
  • Feature Generation for Supervised Machine Learning. Bonus algorithms will be shared!

Presented by

Jeffrey Tyzzer

Jeff has over 25 years’ experience in database and data management and has worked at companies such as Apple, Oracle, and Accenture in both solution consulting and technology delivery. At TigerGraph, Jeff helps customers understand TigerGraph’s graph-native distributed database and how they can successfully apply it to their most pressing use cases.

Jeff has a B.S. in Information Systems and an M.S. in Library and Information Science.

Vladimir Slesarev

Vlad has worked with some of TigerGraph’s largest financial services customers strategizing ML and graph analytics approaches. He is intimately familiar with the complexities and existing limitations of deep data exploration, having worked on complex government data initiatives with the State Department and customer data integrations and development for Deloitte LLP.

Vlad has a BS in Computer and Information Science from the University of Maryland College Park.

Building your Data Science portfolio using GitHub

Recorded on: Tuesday, August 24 @ 3:00 p.m. ET

Join DSS and IBM to learn how the key to a successful data science practice is storytelling.

Do you want to step into the data science profession? Showcasing your work through a portfolio is as important as building the required skills. After all, the key to a successful data science practice is storytelling.

In this session, you will learn:

  • What is GitHub?
  • What is a repository and how do I create one?
  • Where to develop your project?
  • Ideas for projects
  • Share your projects on social media (LinkedIn, Medium,
  • And more!

Join us for this valuable webinar and learn more about Florida Hacks, organized by the University of Florida in partnership with IBM. We’re calling on the country’s brightest minds to join a virtual innovation challenge aimed at addressing the Sunshine State’s most pressing conservation issues.

Innovators of all experience levels are invited to participate in the hackathon, where students will be asked to help combat the significant observable effects occurring in the environment. Ideation: Aug 25-Sep 10, Hack: Sep 13-Oct 22. Click here for details and registration.

Presented by

Saishruthi Swaminathan

Saishruthi Swaminathan is an advocate for Trustworthy AI and an Advisory Data Scientist at IBM. She has a Master’s in Electrical Engineering specializing in Data Science and a Bachelor’s degree in Electronics and Instrumentation. Her passion is to dive deep into the ocean of data, extract insights, and use AI for social good. Previously, she worked as a Software Developer. She is on a mission to spread the knowledge and experience she acquired in her learning process. She also leads an education for rural children initiative and speaks at meetups focussing on Trustworthy AI and Women’s Empowerment.

Accurate Table Extraction from Documents & Images with Spark OCR

Recorded on: August 11th @ 2:00 p.m. ET

Extracting data formatted as a table (tabular data) is a common task — whether you’re analyzing financial statements, academic research papers, or clinical trial documentation. Table-based information varies heavily in appearance, fonts, borders, and layouts. This makes the data extraction task challenging even when the text is searchable – but more so when the table is only available as an image.

This webinar presents how Spark OCR automatically extracts tabular data from images. This end-to-end solution includes computer vision models for table detection and table structure recognition, as well as OCR models for extracting text & numbers from each cell. The implemented approach provides state-of-the-art accuracy for the ICDAR 2013 and TableBank benchmark datasets.

Presented by

Mykola Melnyk

Mykola Melnyk is a senior Scala, Python, and Spark software engineer with 15 years of industry experience. He has led teams and projects building machine learning and big data solutions in a variety of industries – and is currently the lead developer of the Spark OCR library at John Snow Labs.

Crossing the AI Valley of Death: Deploying and Monitoring Models in Production at Scale

Recorded on: Thursday, June 24 @ 3:00 p.m. ET

It’s happened again. You built another AI model that will never see the light of day because it won’t make it past the AI “valley of death” – the crossover of model development to model deployment across your enterprise. The handoff between data science and engineering teams is fraught with friction, outstanding questions around governance and accountability, and who is responsible for different parts of the pipeline and process. Even worse? The patchwork approach when building an AI pipeline leaves many organizations open to risks because of a lack of a holistic approach to security and monitoring.

Join us to learn about approaches and solutions for configuring a ModelOps pipeline that’s right for your organization. You’ll discover why it’s never too early to plan for operationalization of models, regardless of whether your organization has 1, 10, 100, or 1,000 models in production.

The discussion will also reveal the merits of an open container specification that allows you to easily package and deploy models in production from everywhere. Finally, new approaches for monitoring model drift and explainability will be revealed that will help manage expectations with business leaders all through a centralized AI software platform called Modzy®.

Presented by

Clayton Davis - Head of Data Science at Modzy

Clayton Davis is Head of Data Science at Modzy where he oversees model development, operational data science capability development, and AI research. Prior to his role at Modzy, Mr. Davis spent over 15 years leading data science work for commercial and government organizations. His experience has spanned the data science spectrum, from analytic macro creation to cloud based deep learning research and petabyte scale big data processing on Hadoop clusters. He has a passion for solving complex puzzles and holds a graduate degree in Physics.

Monitoring your Production NLP Models

Recorded on: Wednesday, June 23 @ 1:00 p.m. ET

Today, NLP (Natural Language Processing) algorithms power a wide range of intelligent applications, from smart devices, customer service chatbots, and document processing to search and targeting. It’s hard to develop a state-of-the-art NLP application, and it’s even harder to monitor and guarantee quality and consistency in production.

With the models making key product and business decisions, it’s imperative that we have access to specialized production monitoring tools and techniques designed with the complexity and unique approaches of NLP algorithms in mind. For example, knowing whether your production model is making inaccurate predictions requires ground truth, which is very complex and time-consuming to obtain as you consider languages, geographies, context, emotions, and other NLP nuances. On top of that, ground truth for NLP is ambiguous and not always black and white.

In this talk, we will discuss why monitoring your NLP models is a fundamentally complex problem and the key considerations of a model monitoring system. Finally, we will dig into a specific NLP use case and demonstrate how we can leverage the new Verta Model Monitoring capability to easily monitor any NLP model performance, identify model/data drifts and errors, segment model inputs and outputs by cohorts, and perform root cause analysis.

Presented by

Meeta Dash - VP Product at Verta

As VP Product at Verta, Meeta Dash is building MLOps tools to help data science teams track, deploy, operate and monitor models and bring order to Enterprise AI/ML chaos. Prior to Verta, Meeta held several product leadership roles at Appen, Figure Eight, Cisco Systems, Tokbox/Telefonica and Computer Associates, building ML data platforms, Voice & Conversation AI products and Analytics/Operational Monitoring Tools. Meeta has an MBA Degree from UC Davis and an engineering degree from National Institute of Technology, India.

Speed Optimization & Benchmarks in Spark NLP 3: Making the Most of Modern Hardware

Recorded on: Wednesday, June 16 @ 2:00 p.m. ET

Spark NLP is the most widely used NLP library in the enterprise, thanks to implementing production-grade, trainable, and scalable versions of state-of-the-art deep learning & transfer learning NLP research. It is also Open Source with a permissive Apache 2.0 license that officially supports Python, Java, and Scala languages backed by a highly active community and JSL members.

Spark NLP library implements core NLP algorithms including lemmatization, part of speech tagging, dependency parsing, named entity recognition, spell checking, multi-class and multi-label text classification, sentiment analysis, emotion detection, unsupervised keyword extraction, and state-of-the-art Transformers such as BERT, ELECTRA, ELMO, ALBERT, XLNet, and Universal Sentence Encoder.

The latest release, Spark NLP 3.0, comes with more than 1,100 pretrained models, pipelines, and Transformers in 190+ different languages. It also delivers massive speedups on both CPU & GPU devices while extending support for the latest computing platforms such as new Databricks runtimes and EMR versions.

The talk will focus on how to scale Apache Spark / PySpark applications in YARN clusters, use GPU in Databricks new Apache Spark 3.x runtimes, and manage large-scale datasets in resource-demanding NLP applications efficiently. We will share benchmarks, tips & tricks, and lessons learned when scaling Spark NLP.

Presented by

Maziyar Panahi - Spark NLP Lead at John Snow Labs

Maziyar Panahi is a Senior Data Scientist and Spark NLP Lead at John Snow Labs with over a decade of experience in public research. He is a senior Big Data engineer and a Cloud architect with extensive experience in computer networks and software engineering. He has been developing software and planning networks for the last 15 years. In the past, he also worked as a network engineer in high-level places after he completed his Microsoft and Cisco training (MCSE, MCSA, and CCNA).

He has been designing and implementing large-scale databases and real-time Web services in public and private Clouds such as AWS, Azure, and OpenStack for the past decade. He is one of the early adopters and main maintainers of the Spark NLP library. He is currently employed by The French National Centre for Scientific Research (CNRS) as a Big Data engineer and System/Network Administrator working at the Institute of Complex Systems of Paris (ISCPIF).

Introduction to TigerGraph Cloud – How to Run TigerGraph on AWS, Azure and GCP

Recorded on: Tuesday, June 15 @ 2pm ET

TigerGraph Cloud, a cloud-agnostic database-as-a-service offering, enables users to leverage graph analytics on AWS, Azure, and GCP. TigerGraph Cloud not only fully manages clusters, but also provides a use case library via starter kits, which greatly reduces customers’ time to business value and shortens the learning curve for specific graph database use cases such as fraud detection, recommendation, customer 360, AI, and in-database machine learning. Come learn from our TigerGraph Cloud Product Team, ask them questions, and watch an end-to-end overview of TigerGraph Cloud with a short demonstration that showcases starter kits and other unique functionality.

Presented by

Benyue (Emma) Liu

Emma has a mission to empower developers, data scientists and enterprise data analysts with easier access to graph analytics and features. She is a senior product manager at TigerGraph leading efforts in Kafka integration, cloud, enterprise features, and ecosystem connectors. Emma has a background in data management, cloud computing and complex system designs. Prior to TigerGraph, she worked at Oracle and MarkLogic. Emma holds a Bachelor of Science degree from Harvey Mudd College and a Master of Science degree from MIT.

Duc Le

Engineering Manager at TigerGraph
Master in Management Information Systems from Carnegie Mellon University
Lead Developer for TigerGraph Cloud, GraphStudio and Admin Portal
Areas of specialty: Full-stack Development, Cloud, Containers and Connectors.

Leveraging Open Source in the Enterprise

Recorded on: Tuesday, May 25 @ 6:00 p.m. ET

Travis Oliphant, CEO of OpenTeams and founder of Anaconda, joins Coiled to discuss the adoption of Python in the enterprise, how Open Source is such a productive modern development for businesses, and roadmaps for how the Enterprise can leverage Open Source to deliver business value.

Python is the de facto language for machine learning and AI, and is used across many verticals, including healthcare, finance, retail, and space exploration! What are the key features of Python and its community that made this happen? What gaps remain to even more robust adoption and usage in the Enterprise?

Python was created in 1991 as a better scripting language. In the late 1990s, many scientists and engineers started to use it to write scientific code that “fit their brains”. That community swelled organically to tens of millions of users over the next 20 years despite the lack of consistent and targeted funding. What are the aspects of both the language and the community behind Python that contributed to this? What lessons can be learned about making the most of the many open-source projects, frameworks, and languages that are available today? Despite Python’s success, there are still gaps in features and systems that can be a barrier to Enterprise adoption – what are some of these challenges, and what solutions are being proposed?

After attending, you’ll know:

– How Python and PyData (NumPy, pandas, scikit-learn, and many more!) got started;
– What are key reasons AI and ML practitioners use Python;
– What challenges had to be overcome to get Python adopted in the Enterprise;
– What challenges remain for Python adoption in organizations;
– What companies and service providers are helping the enterprise get more value faster out of their data.

Join us Tuesday, May 25 at 6 pm US Eastern time by signing up here and dive into the wonderful world of unlocking business value with Open Source Software!

Presented by

Travis Oliphant

CEO of OpenTeams & founder of Anaconda

Building the Best AI Infrastructure Stack to Accelerate Your Data Science

Recorded on: May 19th @ 11am ET

In recent years, enterprises have invested heavily in AI and Deep Learning initiatives. The three stages of most AI implementations (data ingestion and preparation, model building and training, and inference in production) require Data Science, IT and Ops teams to work in new and sometimes confounding ways. Adding complexity, many organizations are choosing to build AI infrastructure from the ground up using containers and Kubernetes.

Join Data Science Salon as we host NVIDIA, Run:AI and The AI Centre for Value-Based Healthcare (at King’s College London), who will share best practices for building an optimized AI infrastructure stack using NVIDIA GPUs, Kubernetes and Run:AI.

In the webinar, we will walk through an example of a well-architected AI Infrastructure stack and discuss how Kubernetes and advanced GPU scheduling can be used to maximize efficiency and speed up data science initiatives. We’ll cover how one organization solved the orchestration challenges that enterprises face when they begin to scale.


Prof. M. Jorge Cardoso - Associate Professor & Senior Lecturer, King’s College London & CTO, AI Centre for Value-Based Healthcare

The London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare is a consortium of academic, healthcare and industry partners, led by King’s College London. It uses medical images and electronic healthcare data held by the UK National Health Service to train sophisticated computer vision and natural-language processing deep learning algorithms.

Michael Balint, Sr. Product Manager, NVIDIA

NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others.

Omri Geller - CEO and co-founder of Run:AI

Run:AI provides a cloud-native computer resource management platform for the AI era. With Run:AI, data scientists get access to all of the pooled compute power they need to accelerate AI experimentation – whether on-premises or cloud. The company’s Kubernetes-based platform provides IT and MLOps with real-time visibility and control over scheduling and dynamic provisioning of GPUs – and gains of more than 2X in utilization of existing infrastructure.

Credit Risk – Why Model Fairness is Needed

Recorded on: October 28th @ 2PM ET

In this talk, Jen Burningham, Jorge Castañón and Rakshith Dasenahalli will discuss the applications of Machine Learning and AI Fairness techniques in credit risk models for banking institutions. They will cover the typical use cases and the approaches the Data Science Elite (DSE) team used to address the challenges of, and the need for, mitigating model bias. This session will introduce the Credit Risk accelerator that was developed based on the experience of the DSE team.

Presented by

Jen Burningham

Jen Burningham is a manager, Senior Data Scientist, and Machine Learning Engineer in the IBM Data Science and AI Elite Team. She has a strong background in healthcare, tech, and startups. Jen is passionate about using AI to improve businesses and lives. She received her Bachelor’s Degree in Applied Mathematics with a Concentration in Statistics from Yale University and earned her MBA from Yale School of Management.

Rakshith Dasenahalli

Rakshith is a Data Scientist with the IBM Data Science Elite (DSE) Team and has 3 years of experience on the team. He has worked with various enterprises across different industries, helping them understand and use their data to improve their businesses. He has experience working in Machine Learning, Data Visualization, and AI fairness customer engagements. He holds an M.S. in Electrical Engineering from the University of Southern California.

Jorge Castañón

Jorge Castañón hails from Mexico City and received his Ph.D. in Computational and Applied Mathematics from Rice University. He has a genuine passion for data science and machine learning business applications. For 15+ years he has been developing data science and AI models as well as algorithms to solve numerical optimization and regularized inverse problems. At IBM, Jorge is the Lead Data Scientist of the Machine Learning Hub North America, a team that empowers organizations to create business value through data science and AI. In 2020, Jorge was certified as a Level 3 Thought Leader and Distinguished Data Scientist by The Open Group.

Euphoria Convenience

Recorded on: September 3rd @ 2pm EDT

Euphoria Convenience is big data and predictive analytics perfectly attuned: it means satisfying goods and services arriving even before consumers sense that they want them. Already we see slivers of it on Netflix when the next, curated video begins even before the previous one ends. We experience it on Amazon when suggestions occasionally fulfil an urge we didn’t know we had.

The key to perfected entertainment and service is our personal information, and the underlying human dilemma is about how much of ourselves we should expose to get what we want. In this webinar, real-world cases and philosophical thought experiments will help us explore questions including: What is privacy? Why is Kim Kardashian the most private person in the world? Is it more satisfying to want and then get, or to get before wanting? Answers are elusive, but the goal is to identify and clarify the ethical dilemmas arising when personal information is gathered and leveraged in the media, entertainment, marketing and retail sectors. This webinar is tailored for professionals in data governance, but open to all those interested in the regulatory and human dilemmas inhabiting privacy and AI-powered convenience.

Presented by

James Brusseau

James Brusseau (PhD, Philosophy) is author of books, articles, and digital media in the history of philosophy and ethics. He has taught in Europe, Mexico, and currently at Pace University near his home in New York City. As Director of AI Ethics Site, a research institute currently incubating at Pace University, he explores the human experience of artificial intelligence.

Breaking into AI: ML in the Real World

WATCH LIVE: August 20th @ 2pm EDT

Do you want to build a career in the field of AI? Join this session, where two IBM data scientists from the Data Science and AI Elite Team will share their journey to AI, the challenges they faced, interview tips and useful resources, along with how they draw on their earlier experiences when applying ML to real-world client problems.

Presented by

Safura Suleymanova - Data Scientist at the IBM Data Science & AI Elite Team

Safura Suleymanova is a Data Scientist at the IBM Data Science and AI Elite Team and develops Machine Learning powered solutions for customers across various industries, including finance, telecom, and supply chain, among others. She feels passionate about how data can be used to streamline more routine processes, ultimately improving business decisions. To improve existing operational processes at her previous workplaces, she started leveraging the data, which led her to the field of Data Science. Safura has a bachelor’s degree in Mathematics, with a specialization in Risk Management and Statistics, from the University of Waterloo.

Aishwarya Srinivasan - Data Scientist, Data Science Elite Team, IBM Data & AI

Aishwarya joined the DSE Team in June 2018 as a Data Scientist. She is a researcher in machine learning and reinforcement learning, and she strives for innovation. During her engagements, she primarily focuses on bringing in machine learning research to create business value.

With her interest in integrating machine learning in finance, she has presented her work to leading financial firms like Morgan Stanley, Credit Suisse and World Bank. She is very focused on expanding her horizons in the machine learning research community, including her recent Patent Award, won in 2018 for developing a Reinforcement Learning model for Machine Trading. She is a post-graduate in Data Science from Columbia University.

She is an ambassador for the Women in Data Science community, originating from Stanford University. She has a 50k+ follower base on LinkedIn and actively participates in and organizes events and conferences to inspire budding data scientists. Besides being a data junkie, she is a fitness fanatic who is into martial arts (Krav Maga) and yoga. To spread her knowledge in the space of data science, and to inspire budding Data Scientists, she actively writes blogs related to machine learning on her website.

Using Python with a Massively Parallel Database to Predict COVID-19 Numbers

Recorded on: July 7th @ 11am PDT

Python is a powerful programming language that is a good choice for many types of analytics. It is rapidly becoming the language of choice for scientists and researchers of many types. Now by combining a massively parallel (MPP) database like Vertica with Python, you can overcome many scale and analytics challenges that can limit Python users. Join this session to see a live demo of Vertica’s built-in integration with Python to measure and predict key numbers of the COVID-19 pandemic by using historical data on the number of cases and fatalities per state to predict the results for the following day.

Presented by

Badr Ouali, Data Scientist at Vertica

Statistical Paradoxes & Logical Fallacies: Don’t Believe the Lies your Data Tells

Recorded on: June 30th @ 11am PDT

I hate to admit it, but your data is lying to you — and more often than you think. Having clean data with high volume, velocity, and variety doesn’t necessarily protect one from the possibility of reaching faulty conclusions to research questions of interest. Despite what you may have learned in Statistics 101, a significant p-value isn’t always groundbreaking. All data can be coerced and bribed to tell any story; thus, as data practitioners, it’s our duty to be cognizant of the possible pitfalls that abound and how to navigate around common traps — responsibly.

By showcasing enterprise use cases and sample projects in Dataiku’s Data Science Studio, Christopher Peter Makris will address the following questions:

– Is more data always better? How can the inclusion or exclusion of data obfuscate a previously held conclusion?

– Is an observed event truly a trend? How can previously noted behaviors be a marker for the complete opposite behavior in the future?

– Is an association worth my time/money/effort? When do conclusive conclusions lead us not to act on a valid association?

Presented by

Christopher Peter Makris, Lead Data Scientist at Dataiku

Christopher Peter Makris (CPM) is a Lead Data Scientist at Dataiku. With a background in Logic, Discrete Mathematics, & Statistics, he brings experience from both industry and academia. CPM previously stood as Director of Data Science at the NYC Data Science Academy and Executive Director of the Master’s of Statistical Practice Program at Carnegie Mellon University; at both institutions, CPM took pride in professionally developing students, aiding in the redesign of technical curricula, and delivering over 3,000 hours of lectures to graduate audiences. In industry, CPM most recently helped thwart cybercrime as the Director of Security Data Science at Empower Retirement. When not crunching numbers on his computer, CPM is crunching his abs in the studio as he’s an avid dancer, choreographer, and CrossFitter (so, while he considers himself a data scientist, he can only count to 5-6-7-8…)

Choosing the Right Deep Learning Framework: A deep learning approach

Recorded on: May 20th @ 2pm EST

Deep Learning Frameworks allow for the easy construction of neural networks and have made Python the go-to language for Artificial Intelligence development. This webinar will show how deep learning can be used to help align developers and data scientists to a particular framework as well as demonstrate some examples of how IBM is extending the value of deep learning frameworks via open source projects, and how new advancements in NLP like transformers are changing how and why to use a particular framework.

Presented by

Nick Acosta, IBM

Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and utilizes it to understand machine learning developers of various communities, startups, and enterprises in order to help them succeed on IBM’s data science platform and in open source. He has a strong interest in open source software and all things Kardashian.

Everything You Need to Know to Start Data Science with Python

Recorded on: May 14th @ 11am PST

Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.

Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.

During the webinar, we’ll discuss:

  • Popular open-source tools for DS/ML
  • Use cases
  • Helpful resources to get started with data science with Python

Presented by

Stan Seibert - Sr. Director, Community Innovation at Anaconda

Stan leads the Community Innovation team at Anaconda, where his work focuses on high performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, University of Pennsylvania, and the Sudbury Neutrino Observatory.

Explainable Data Science Workflows


Learn explainable workflows using open source software, semi-automated pipelines, and Python! From data ingestion and cleaning to piping and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn the best practices for codifying and relaying explainable data science to stakeholders, management, and the end user in a reproducible and portable fashion.

Presented by

Austin Eovito - Data Scientist, IBM

Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. As a recent graduate student of Florida State University, Austin is focused on balancing the bleeding-edge research produced by academia with the tools used in applied data science. His Master’s thesis was on White Collar Crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM’s Data Science Masterclass on AI Explainability and currently resides in San Francisco with his fiancé, dog, and two cats.
