Using Python with a Massively Parallel Database to Predict COVID-19 Numbers

WATCH LIVE: July 7th @ 11am PDT

Python is a powerful programming language and a good choice for many types of analytics. It is rapidly becoming the language of choice for scientists and researchers in many fields. By combining a massively parallel processing (MPP) database like Vertica with Python, you can overcome many of the scale and analytics challenges that limit Python users. Join this session for a live demo of Vertica’s built-in integration with Python, which uses historical per-state data on COVID-19 cases and fatalities to predict the following day’s numbers.
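The next-day prediction task described above can be illustrated with a deliberately simple baseline. This is not Vertica’s integration or the presenter’s model. It is a plain-Python sketch that assumes a single lag feature (each day’s count predicts the next day’s) fitted by ordinary least squares for one state at a time. The function names and sample series are hypothetical.

```python
def fit_next_day(series):
    """Fit tomorrow = intercept + slope * today by ordinary least squares.

    `series` is a hypothetical list of daily counts for one state, oldest first.
    """
    if len(series) < 3:
        raise ValueError("need at least three days of history")
    xs = series[:-1]  # each day's count (feature)
    ys = series[1:]   # the following day's count (target)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("history is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_next_day(series):
    """Apply the fitted line to the most recent day to forecast the next one."""
    intercept, slope = fit_next_day(series)
    return intercept + slope * series[-1]


if __name__ == "__main__":
    # Hypothetical cumulative case counts per state (not real data).
    history = {
        "State A": [100, 110, 120, 130, 140],
        "State B": [50, 55, 61, 68, 76],
    }
    for state, counts in history.items():
        print(state, round(predict_next_day(counts)))
```

A realistic model would likely use more inputs, such as fatalities and several lagged days. The appeal of an MPP database is running fits like this across every state inside the database instead of in a single Python process.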

Presented by

Badr Ouali, Data Scientist at Vertica

MLOps Virtual Salon

LIVE EVENT: July 15th @ 12:30pm – 2:30pm

The MLOps Virtual Salon will focus on the impact of stable and reliable Machine Learning Operations. This two-hour virtual event showcases best practices and real-world case studies for managing the full ML lifecycle. Hear from three experts who will give technical demonstrations and share success stories. Connect and engage with others through live Q&A and discussions.

From Data Scientist to Machine Learning Engineer

WATCH LIVE: July 16th @ 1pm PDT

Are you interested in taking your Data Science career to the next level by becoming a Machine Learning Engineer?

Join Data Science Salon and Springboard for a live webinar on Thursday, July 16th, where we’ll cover how to transition from data science to machine learning engineering.

At the end of the webinar, we’ll share resources to help you get started on your transition from Data Scientist to Machine Learning Engineer.


1. What is Machine Learning Engineering?

2. From Data Science to Machine Learning

3. Springboard’s Machine Learning Career Track

• Curriculum Depth
• Learning Methodology
• Program Structure & Job Guarantee
• Q&A

Sébastien Arnaud - Lead Subject Matter Expert at Springboard

Sébastien has over 17 years of experience working in Data Science, Software Engineering, and Machine Learning Engineering. He reached Master level in 2014 for his competitive entries, in particular his real-time solution to the “Job Salary Prediction” competition, which used Lucene similarities and Genetic Programming and ranked him in the top 150 machine learning professionals in March 2013. Sébastien is responsible for the structure of the Machine Learning Engineering Career Track, the philosophy of the program, and the engineering units.

Lucy Chen - Technical Instructional Designer at Springboard

Lucy Chen led the design of Springboard’s Machine Learning Engineering program. At Springboard, she is responsible for creating effective learning experiences grounded in a scientific mindset. Previously, she led projects at LinkedIn Learning, NovoEd, and Minerva Schools.

Hardening a Cleanroom AI Platform to allow model training & inference on Protected Health Information

WATCH LIVE: July 22nd @ 11am PDT

Artificial intelligence projects in high-compliance industries, like healthcare and life science, often require processing Protected Health Information (PHI). This happens when the nature of a project does not allow full de-identification in advance – for example, when dealing with rare diseases, genetic sequencing data, identity theft, or training de-identification models – or when training can use anonymized data but inference must run on data that contains PHI.

In such scenarios, the alternative is to create an “AI cleanroom” – an isolated, hardened, air-gapped environment where the work happens. Such a software platform should enable data scientists to log into the cleanroom and do all the development work inside it – from initial data exploration & experimentation to model deployment & operations – while no data, computation, or generated assets ever leave the cleanroom.

First, this webinar presents the architecture of such a Cleanroom AI Platform, which has been actively used by Fortune 500 companies for the past three years. Second, it surveys the hundreds of DevOps & SecOps features required to realize such a platform – from multi-factor authentication and point-to-point encryption to vulnerability scanning and network isolation. Third, it explains how a Kubernetes-based architecture enables “Cleanroom AI” without giving up the main benefits of cloud computing: elasticity, scalability, turnkey deployment, and a fully managed environment.

Presented by

Ali Naqvi - Lead Product Manager of the AI Platform at John Snow Labs

Dos and Don’ts of Hosting Engaging Virtual Events (and the tech stack to do it)

WATCH LIVE: July 23 @ 2PM ET

Martech, the set of tools marketing teams use to power their digital marketing, has become a key component in almost any organization. But only recently has martech’s long-lost cousin gained prominence: the event tech stack. With in-person events at a halt, event planners have had to pivot quickly and create engaging virtual experiences.

Join us for a live webinar on Thursday, July 23 at 2 PM ET to learn how to:

  • Engage your community and create a first-class virtual experience while maintaining the human component.
  • Evaluate your event tech stack, including your event platform, breakout sessions, and networking applications.
  • Integrate your entire tech stack to create a cohesive and engaging experience.
  • Use your tech stack to make your event more visible to your community.

Julia Mohler

Director, Marketing & Operations at Formulatedby

Lyssa Myska Allen

Social Media & Community Manager at Formulatedby

Accurate de-identification, obfuscation, and editing of scanned medical documents and images

WATCH LIVE: August 19th @ 11am PDT

One kind of noisy data that healthcare data scientists deal with is scanned documents and images: from PDF attachments of lab results, referrals, or genetic testing to DICOM files with medical imaging. These files are challenging to de-identify because protected health information (PHI) can appear anywhere in free text – so it cannot be removed with rules or regular expressions – or be “burned” into images so that it’s not even available as digital text to begin with.

This webinar presents a software system that tackles these challenges, with lessons learned from applying it in real-world production systems. The workflow uses:

  • Spark OCR to extract both digital and scanned text from PDF and DICOM files
  • Spark NLP for Healthcare to recognize sensitive data in the extracted free text
  • The de-identification module to delete, replace, or obfuscate PHI
  • Spark OCR to generate a new PDF or DICOM file with the de-identified data
  • A local secure environment to run the whole workflow, with no need to share data with any third party or a public cloud API

Presented by

Dr. Alina Petukhova - Data Scientist at John Snow Labs

Euphoria Convenience

WATCH LIVE: September 3rd @ 2pm EDT

Euphoria Convenience is big data and predictive analytics perfectly attuned: it means satisfying goods and services arriving even before consumers sense that they want them. Already we see slivers of it on Netflix when the next, curated video begins even before the previous one ends. We experience it on Amazon when suggestions occasionally fulfil an urge we didn’t know we had.

The key to perfected entertainment and service is our personal information, and the underlying human dilemma is how much of ourselves we should expose to get what we want. In this webinar, real-world cases and philosophical thought experiments will guide our investigation of questions including: What is privacy? Why is Kim Kardashian the most private person in the world? Is it more satisfying to want and then get, or to get before wanting? Answers are elusive, but the goal is to identify and clarify the ethical dilemmas arising when personal information is gathered and leveraged in the media, entertainment, marketing, and retail sectors. This webinar is tailored for professionals in data governance, but open to all those interested in the regulatory and human dilemmas surrounding privacy and AI-powered convenience.

Presented by

James Brusseau

James Brusseau (PhD, Philosophy) is author of books, articles, and digital media in the history of philosophy and ethics. He has taught in Europe, Mexico, and currently at Pace University near his home in New York City. As Director of AI Ethics Site, a research institute currently incubating at Pace University, he explores the human experience of artificial intelligence.

Answering natural language questions in medicine

WATCH LIVE: September 16th @ 11am PDT

The ability to directly answer medical questions asked in natural language either about a single patient (“what drugs has this patient been prescribed?”) or a cohort of patients (“list stage 4 lung cancer patients with no history of smoking”) has been a longstanding healthcare industry goal, given its broad applicability across use cases like order validation, pre-authorization, cohort selection, clinical quality reporting, and real-world evidence.

While “natural language BI” systems do exist, they generally fail on clinical questions since they lack healthcare-specific models and reference knowledge. Such systems fail to infer, for example, that a patient with a T1N2M0 tumor has stage 3 lung cancer, that a patient taking Zoloft for years has a history of depression, or that an A1C lab result of 7.5 indicates diabetes.

This webinar presents a software solution, based on state-of-the-art deep learning and transfer learning research, for translating natural language questions into SQL statements. The case study is a system that answers clinical questions by training domain-specific models and learning from clinical reference knowledge. This is a production-grade, trainable, and scalable capability of Spark NLP Enterprise & Healthcare. Live Python notebooks will be shared to show how you can use it in your own projects.

Presented by

Prabod Rathnayaka - Graduate Research Assistant and PhD Student at La Trobe University

Past webinars

Statistical Paradoxes & Logical Fallacies: Don’t Believe the Lies your Data Tells

Recorded on: June 30th @ 11am PDT

I hate to admit it, but your data is lying to you — and more often than you think. Having clean data with high volume, velocity, and variety doesn’t necessarily protect one from the possibility of reaching faulty conclusions to research questions of interest. Despite what you may have learned in Statistics 101, a significant p-value isn’t always groundbreaking. All data can be coerced and bribed to tell any story; thus, as data practitioners, it’s our duty to be cognizant of the possible pitfalls that abound and how to navigate around common traps — responsibly.

By showcasing enterprise use cases and sample projects in Dataiku’s Data Science Studio, Christopher Peter Makris will address the following questions:

– Is more data always better? How can the inclusion or exclusion of data obfuscate a previously held conclusion?

– Is an observed event truly a trend? How can previously noted behaviors be a marker for the complete opposite behavior in the future?

– Is an association worth my time/money/effort? When do conclusive findings lead us not to act on a valid association?
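The first question above is often illustrated with Simpson’s paradox. A comparison that holds within every subgroup can reverse once the subgroups are pooled. The counts below are hypothetical and chosen only to show the effect. They are not drawn from the webinar or from Dataiku’s Data Science Studio.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for two options across two segments.
DATA = {
    "A": {"segment_1": (18, 20), "segment_2": (30, 80)},
    "B": {"segment_1": (70, 80), "segment_2": (7, 20)},
}


def segment_rates(counts):
    """Success rate within each segment, as exact fractions."""
    return {seg: Fraction(s, n) for seg, (s, n) in counts.items()}


def pooled_rate(counts):
    """Success rate after pooling all segments together."""
    successes = sum(s for s, _ in counts.values())
    trials = sum(n for _, n in counts.values())
    return Fraction(successes, trials)


if __name__ == "__main__":
    for option, counts in DATA.items():
        per_segment = {seg: float(r) for seg, r in segment_rates(counts).items()}
        print(option, per_segment, "pooled:", float(pooled_rate(counts)))
```

Option A wins in both segments (0.9 vs. 0.875 and 0.375 vs. 0.35), yet B wins after pooling (0.77 vs. 0.48). The reversal happens because most of A’s trials fall in the harder segment_2, so excluding the segment variable changes the conclusion.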

Presented by

Christopher Peter Makris, Lead Data Scientist at Dataiku

Christopher Peter Makris (CPM) is a Lead Data Scientist at Dataiku. With a background in Logic, Discrete Mathematics, & Statistics, he brings experience from both industry and academia. CPM previously served as Director of Data Science at the NYC Data Science Academy and Executive Director of the Master of Statistical Practice program at Carnegie Mellon University; at both institutions, CPM took pride in professionally developing students, aiding in the redesign of technical curricula, and delivering over 3,000 hours of lectures to graduate audiences. In industry, CPM most recently helped thwart cybercrime as the Director of Security Data Science at Empower Retirement. When not crunching numbers on his computer, CPM is crunching his abs in the studio as he’s an avid dancer, choreographer, and CrossFitter (so, while he considers himself a data scientist, he can only count to 5-6-7-8…)

Choosing the Right Deep Learning Framework: A deep learning approach

Recorded on: May 20th @ 2pm EST

Deep Learning Frameworks allow for the easy construction of neural networks and have made Python the go-to language for Artificial Intelligence development. This webinar will show how deep learning can be used to help align developers and data scientists to a particular framework as well as demonstrate some examples of how IBM is extending the value of deep learning frameworks via open source projects, and how new advancements in NLP like transformers are changing how and why to use a particular framework.

Presented by

Nick Acosta, IBM

Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and uses it to understand machine learning developers in various communities, startups, and enterprises in order to help them succeed on IBM’s data science platform and in open source. He has a strong interest in open source software and all things Kardashian.

Everything You Need to Know to Start Data Science with Python

Recorded on: May 14th @ 11am PST

Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.

Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.

During the webinar, we’ll discuss:

  • Popular open-source tools for DS/ML
  • Use cases
  • Helpful resources to get started with data science with Python

Presented by

Stan Seibert - Sr. Director, Community Innovation at Anaconda

Stan leads the Community Innovation team at Anaconda, where his work focuses on high-performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, the University of Pennsylvania, and the Sudbury Neutrino Observatory.

Explainable Data Science Workflows


Learn to build explainable workflows using open source software, semi-automated pipelines, and Python! From data ingestion and cleaning to piping and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn best practices for codifying and relaying explainable data science to stakeholders, management, and the end user in a reproducible and portable fashion.

Presented by

Austin Eovito - Data Scientist, IBM

Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. A recent graduate of Florida State University, Austin focuses on balancing the bleeding-edge research produced by academia with the tools used in applied data science. His master’s thesis was on white-collar crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM’s Data Science Masterclass on AI Explainability and currently resides in San Francisco with his fiance, dog, and two cats.