Best Practices & Tools for Accurate Document Annotation and Data Abstraction

WATCH LIVE: May 27th @ 2pm EST

Are you working on machine learning tasks such as sentiment analysis, named entity recognition, text classification, image classification or audio segmentation? If so, you need training data adapted for your particular domain and task. This webinar will explain the best practices and strategies for getting the training data you need. We will go over the setup of the annotation team, the workflows that need to be in place for guaranteeing high accuracy and labeler agreement, and the tools that will help you increase productivity and eliminate errors.
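Labeler agreement is usually tracked with a chance-corrected statistic. The abstract does not name a metric, so the sketch below assumes Cohen's kappa for two annotators who labeled the same items. The function name and the sample labels are illustrative only.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical sentiment labels from two annotators on six documents.
annotator_1 = ["POS", "NEG", "POS", "NEU", "POS", "NEG"]
annotator_2 = ["POS", "NEG", "NEU", "NEU", "POS", "POS"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A kappa near 1 indicates strong agreement, while a value near 0 means the annotators agree no more often than chance would predict, which typically signals that the guidelines need tightening.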

Presented by

Dia Trambitas - Computer scientist with a rich background in Natural Language Processing

Dia Trambitas is a computer scientist with a rich background in Natural Language Processing. She has a Ph.D. in Semantic Web from the University of Grenoble, France, where she worked on ways of describing spatial and temporal data using OWL ontologies and reasoning based on semantic annotations. She then shifted her focus to text processing and data extraction from unstructured documents, a subject she has been working on for the last 10 years. She has extensive experience working with different annotation tools and leading document classification and NER projects in verticals such as Finance, Investment, Banking, and Healthcare.

Global Leader in Retail AI Upgrades from Netezza

WATCH LIVE: May 28th @ 11am PT

Join us to hear about how Symphony Retail uses the Yellowbrick platform for 3-5x greater performance and faster time to insights. Symphony Retail is a global leader in AI-enabled decision platforms, solutions and insights for driving profitable revenue growth for retailers and CPG manufacturers.

In this webinar you will learn how:

  • Symphony Retail was able to consolidate their disparate database systems
  • Symphony Retail saved more than 10 hours per week on their data processes
  • Symphony Retail migrated away from their Netezza systems quickly and easily

Don’t miss this opportunity to see for yourself how Symphony Retail took advantage of Yellowbrick’s innovations to streamline and accelerate their data warehouse environment.

Nigel Pratt

Senior Vice President at Symphony Retail AI

Hema Ganapathy

Sr. Director, Product Marketing at Yellowbrick

Rapidly Onboard and Prep Your Data for Machine Learning

WATCH LIVE: Thursday, May 28 @ 1pm ET / 10am PT

Building machine learning models is a creative process that requires deep thinking and expertise. But there’s a lot of drudgery required to find and prep the data to train your ML model, not to mention getting that model into production or re-training it later. Whether you’re using Spark MLlib or any other machine learning framework, building your ML model is just one step in the longer process of getting machine learning into production.

Join Infoworks and Databricks at this live webinar for a discussion and demo on how to:

  • Find and prep the data to train your ML model in hours instead of days
  • Easily integrate the data with ML environments such as Databricks Notebooks
  • Get an ML model into production more quickly
  • Re-train the model with the push of a button

Dave Finnegan - Sr. Solution Architect, Infoworks

Dave brings over 20 years of experience to his current role of senior solution architect at Infoworks, where he specializes in solutions for enterprise data operations for machine learning and advanced analytics. Prior to Infoworks, Dave worked at Cask and MongoDB, working with clients to unlock insight from their data.

Prasad Kona - Partner Solutions Architect, Databricks

With over 15 years of industry experience, Prasad Kona is a trusted advisor and thought leader who has built and executed complex data analytics and AI strategy for customers and partners. With his deep knowledge around data engineering, data science, and machine learning, Prasad helps companies attain quantifiable business value from their technology investments.

COVID-19 Recovery Planning using Advanced Analytics

WATCH LIVE: June 10th @ 11am PST

Join us on June 10 at 11 AM PST for a virtual event to learn how leading companies leverage data science and advanced analytics to overcome the COVID-19 crisis.

Key takeaways:

  • How advanced analytics helps to stabilize operations
  • How to adjust forecasting models using international and macroeconomic data
  • How to optimize post-lockdown operations using machine learning

11:00-11:30 am PST – “Demand and Revenue Forecasting for Lockdown and Post-Lockdown: Case Studies and Lessons Learned” by Ilya Katsov, Head of Data Science, Grid Dynamics

11:30-12:00 pm PST – Ambrish Verma, Product, Credit Karma

Maximizing Text Recognition Accuracy with Image Transformers in Spark OCR

WATCH LIVE: June 24th @ 2pm EST

Spark OCR is an optical character recognition library that scales natively on any Spark cluster, enables processing documents privately without uploading them to a cloud service, and, most importantly, provides state-of-the-art accuracy for a variety of common use cases. A primary method of maximizing accuracy is using a set of pre-built image pre-processing transformers for noise reduction, skew correction, object removal, automated scaling, erosion, binarization, and dilation. These transformers can be combined into OCR pipelines that effectively resolve common ‘document noise’ issues that reduce OCR accuracy.

This webinar describes real-world OCR use cases, common accuracy issues they bring, and how to use image transformers in Spark OCR in order to resolve them at scale. Example Python code will be shared using executable notebooks that will be made publicly available.
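To make three of the pre-processing steps named above concrete (binarization, erosion, and dilation), here is a small library-free sketch on a nested-list image. It does not use or reproduce the Spark OCR API. The function names, the threshold of 128, and the 3x3 window are all assumptions chosen for illustration.

```python
def binarize(gray, threshold=128):
    """Map grayscale pixels (0-255) to 1 (dark ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def _morph(img, combine):
    """Apply `combine` (all or any) over each pixel's 3x3 neighborhood.

    Neighborhoods are clipped at the image border.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = 1 if combine(window) else 0
    return out

def erode(img):
    """Keep a pixel only if every pixel in its neighborhood is ink."""
    return _morph(img, all)

def dilate(img):
    """Set a pixel if any pixel in its neighborhood is ink."""
    return _morph(img, any)

def open_image(img):
    """Erosion followed by dilation: drops specks smaller than the window."""
    return dilate(erode(img))

# Hypothetical 6x6 scan: a 3x3 block of ink plus one isolated noise pixel.
gray = [[255] * 6 for _ in range(6)]
for y in range(2, 5):
    for x in range(1, 4):
        gray[y][x] = 20
gray[0][5] = 30

binary = binarize(gray)
cleaned = open_image(binary)
```

After the opening step, the isolated pixel is gone while the 3x3 block survives intact, which is the kind of 'document noise' cleanup that chaining such operations is meant to achieve.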

Presented by

Mykola Melnyk - Senior Scala, Python, and Spark Software Engineer

Mykola Melnyk is a senior Scala, Python, and Spark software engineer with 15 years of industry experience. He has led teams and projects building machine learning and big data solutions in a variety of industries – and is currently the lead developer of the Spark OCR library at John Snow Labs.

Scalable Machine Learning Using Python and a Distributed Analytical Database

WATCH LIVE: July 7th @ 11am PDT

Python is a leading programming language for machine learning today due to its flexibility, portability, and libraries. Another major benefit it provides to data scientists is its ability to work well with other analytics tools and frameworks. However, Python has issues around scalability that can make getting machine learning models into production a challenge. Many machine learning projects stall when trying to make the leap to high-scale production.

Financial institutions have huge amounts of structured data which usually resides in distributed data stores. Instead of using Python to extract sample data from those distributed data stores for building machine learning models, Vertica offers the capability to execute Python computations inside the database where the full dataset resides. This both simplifies model training and boosts accuracy by removing the need to downsample. It also greatly speeds model deployment into full-scale production. You can get proven models deployed in minutes, not months.

In this session, we will demonstrate a credit card fraud detection example of how Python can be combined with a distributed analytical database, Vertica, to parallelize and simplify your machine learning model training and deployment.

Presented by

Badr Ouali, Data Scientist at Vertica

Past webinars

Choosing the Right Deep Learning Framework: A deep learning approach

RECORDED ON: May 20th @ 2pm EST

Deep Learning Frameworks allow for the easy construction of neural networks and have made Python the go-to language for Artificial Intelligence development. This webinar will show how deep learning can be used to help align developers and data scientists to a particular framework as well as demonstrate some examples of how IBM is extending the value of deep learning frameworks via open source projects, and how new advancements in NLP like transformers are changing how and why to use a particular framework.

Presented by

Nick Acosta, IBM

Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and utilizes it to understand machine learning developers of various communities, startups, and enterprises in order to help them succeed on IBM’s data science platform and in open source. He has a strong interest in open source software and all things Kardashian.

Everything You Need to Know to Start Data Science with Python

RECORDED ON: May 14th @ 11am PST

Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.

Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.

During the webinar, we’ll discuss:

  • Popular open-source tools for DS/ML
  • Use cases
  • Helpful resources to get started with data science in Python

Presented by

Stan Seibert - Sr. Director, Community Innovation at Anaconda

Stan leads the Community Innovation team at Anaconda, where his work focuses on high performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, University of Pennsylvania, and the Sudbury Neutrino Observatory.

Explainable Data Science Workflows


Learn explainable workflows using open source software, semi-automated pipelines, and Python! From data ingestion, cleaning, piping, and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn the best practices for codifying and relaying explainable data science to stakeholders, management, and the end user in a reproducible and portable fashion.

Presented by

Austin Eovito - Data Scientist, IBM

Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. As a recent graduate student of Florida State University, Austin is focused on balancing the bleeding-edge research produced by academia with the tools used in applied data science. His Master's thesis was on White Collar Crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM’s Data Science Masterclass on AI Explainability and currently resides in San Francisco with his fiance, dog, and two cats.