Accurate de-identification, obfuscation, and editing of scanned medical documents and images

WATCH LIVE: August 19th @ 11am PDT

One kind of noisy data that healthcare data scientists deal with is scanned documents and images: from PDF attachments of lab results, referrals, or genetic testing to DICOM files with medical imaging. These files are challenging to de-identify, because personal health information (PHI) can appear anywhere in free text – so it cannot be removed with rules or regular expressions – or be “burned” into images, so that it’s not even available as digital text to begin with.

This webinar presents a software system that tackles these challenges, with lessons learned from applying it in real-world production systems. The workflow uses:

  • Spark OCR to extract both digital and scanned text from PDF and DICOM files
  • Spark NLP for Healthcare to recognize sensitive data in the extracted free text
  • The de-identification module to delete, replace, or obfuscate PHI
  • Spark OCR to generate a new PDF or DICOM file with the de-identified data
  • A local, secure environment in which the whole workflow runs, with no need to share data with any third party or a public cloud API
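
The delete, replace, and obfuscate options in the third step can be sketched in plain Python. This is a minimal illustration and does not use the Spark OCR or Spark NLP for Healthcare APIs: the PhiSpan type, the deidentify function, the SURROGATES table, and the sample note are all hypothetical, and the PHI spans are located by hand here, where a trained recognizer would supply them in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhiSpan:
    """A detected piece of PHI (hypothetical type for this sketch)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type, e.g. "NAME" or "DATE"


# Hypothetical surrogate values used when obfuscating.
SURROGATES = {"NAME": "Jane Doe", "DATE": "2000-01-01"}


def deidentify(text, spans, mode="replace"):
    """Return text with every span deleted, replaced by a tag, or obfuscated."""
    if mode not in {"delete", "replace", "obfuscate"}:
        raise ValueError(f"unknown mode: {mode}")
    out = text
    # Work right to left so earlier offsets stay valid after each edit.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        if mode == "delete":
            filler = ""
        elif mode == "replace":
            filler = f"<{span.label}>"
        else:  # obfuscate: substitute a realistic-looking surrogate
            filler = SURROGATES.get(span.label, f"<{span.label}>")
        out = out[:span.start] + filler + out[span.end:]
    return out


def find_span(text, value, label):
    """Locate a known value in the text (stand-in for a real recognizer)."""
    start = text.index(value)
    return PhiSpan(start, start + len(value), label)


# Invented sample note, for illustration only.
note = "Patient John Smith was seen on 2020-08-19."
spans = [
    find_span(note, "John Smith", "NAME"),
    find_span(note, "2020-08-19", "DATE"),
]

print(deidentify(note, spans, "replace"))    # Patient <NAME> was seen on <DATE>.
print(deidentify(note, spans, "obfuscate"))  # Patient Jane Doe was seen on 2000-01-01.
```

Deleting leaves gaps in the text, replacing keeps the entity type visible, and obfuscating keeps the document readable; which trade-off is appropriate depends on the downstream use.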

Presented by

Dr. Alina Petukhova - Data Scientist at John Snow Labs

Breaking into AI: ML in the Real World

WATCH LIVE: August 20th @ 2pm EDT

Do you want to build a career in the field of AI? Join this session, where two IBM data scientists from the Data Science and AI Elite Team will share their journey into AI, the challenges they faced, interview tips, and useful resources, along with their experience applying ML to real-world client problems.

Presented by

Safura Suleymanova - Data Scientist at the IBM Data Science & AI Elite Team

Safura Suleymanova is a Data Scientist at the IBM Data Science and AI Elite Team, where she develops machine-learning-powered solutions for customers across industries including finance, telecom, and supply chain. She is passionate about how data can be used to streamline routine processes and ultimately improve business decisions. She began using data to improve operational processes at her previous workplaces, which led her to the field of data science. Safura has a bachelor’s degree in Mathematics, with a specialization in Risk Management and Statistics, from the University of Waterloo.

Aishwarya Srinivasan - Data Scientist, Data Science Elite Team, IBM Data & AI

Aishwarya joined the DSE Team in June 2018 as a Data Scientist. She is a researcher in machine learning and reinforcement learning, and she strives for innovation. During her engagements, she primarily focuses on applying machine learning research to create business value.

With her interest in integrating machine learning in finance, she has presented her work to leading financial firms such as Morgan Stanley, Credit Suisse, and the World Bank. She is focused on expanding her horizons in the machine learning research community; in 2018 she won a Patent Award for developing a reinforcement learning model for machine trading. She holds a postgraduate degree in Data Science from Columbia University.

She is an ambassador for the Women in Data Science community, which originated at Stanford University. She has more than 50k followers on LinkedIn and actively participates in and organizes events and conferences to inspire budding data scientists. Besides being a data junkie, she is a fitness fanatic who is into martial arts (Krav Maga) and yoga. To share her knowledge of data science, she writes blog posts on machine learning on her website, https://aishgrt.wixsite.com/thepeculiarunicorn

Euphoria Convenience

WATCH LIVE: September 3rd @ 2pm EDT

Euphoria Convenience is big data and predictive analytics perfectly attuned: it means satisfying goods and services arriving even before consumers sense that they want them. Already we see slivers of it on Netflix when the next, curated video begins even before the previous one ends. We experience it on Amazon when suggestions occasionally fulfil an urge we didn’t know we had.

The key to perfected entertainment and service is our personal information, and the underlying human dilemma is how much of ourselves we should expose to get what we want. In this webinar, real-world cases and philosophical thought experiments will help us explore questions including: What is privacy? Why is Kim Kardashian the most private person in the world? Is it more satisfying to want and then get, or to get before wanting? Answers are elusive, but the goal is to identify and clarify the ethical dilemmas that arise when personal information is gathered and leveraged in the media, entertainment, marketing, and retail sectors. This webinar is tailored for professionals in data governance, but open to all those interested in the regulatory and human dilemmas inhabiting privacy and AI-powered convenience.

Presented by

James Brusseau

James Brusseau (PhD, Philosophy) is author of books, articles, and digital media in the history of philosophy and ethics. He has taught in Europe, Mexico, and currently at Pace University near his home in New York City. As Director of AI Ethics Site, a research institute currently incubating at Pace University, he explores the human experience of artificial intelligence.

Answering natural language questions in medicine

WATCH LIVE: September 16th @ 11am PDT

The ability to directly answer medical questions asked in natural language either about a single patient (“what drugs has this patient been prescribed?”) or a cohort of patients (“list stage 4 lung cancer patients with no history of smoking”) has been a longstanding healthcare industry goal, given its broad applicability across use cases like order validation, pre-authorization, cohort selection, clinical quality reporting, and real-world evidence.

While “natural language BI” systems do exist, they generally fail on clinical questions since they lack healthcare-specific models and reference knowledge. Such systems fail to infer, for example, that a patient with a T1N2M0 tumor has stage 3 lung cancer, that a patient taking Zoloft for years has a history of depression, or that an A1C lab result of 7.5 indicates diabetes.

This webinar presents a software solution, based on state-of-the-art deep learning and transfer learning research, for translating natural language questions into SQL statements. The case study is a system that answers clinical questions by training domain-specific models and learning from clinical reference knowledge. This is a production-grade, trainable, and scalable capability of Spark NLP Enterprise & Healthcare. Live Python notebooks will be shared to explain how you can use it in your own projects.
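
To make the target of such a translation concrete, here is a minimal sketch of what a question-and-SQL pair might look like. It is not output from Spark NLP: the prescriptions table, its columns, the toy rows, and the answer function are all hypothetical, and the SQL is written by hand to show the kind of statement a text-to-SQL model would be expected to produce for the single-patient question quoted earlier.

```python
import sqlite3

# Hypothetical toy schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prescriptions (patient_id INTEGER, drug TEXT);
    INSERT INTO prescriptions VALUES
        (1, 'sertraline'),
        (1, 'metformin'),
        (2, 'lisinopril');
    """
)

# The natural language question and a hand-written SQL equivalent.
QUESTION = "What drugs has this patient been prescribed?"
SQL = "SELECT DISTINCT drug FROM prescriptions WHERE patient_id = ? ORDER BY drug"


def answer(connection, patient_id):
    """Run the translated query for one patient and return the drug names."""
    return [row[0] for row in connection.execute(SQL, (patient_id,))]


drugs = answer(conn, 1)
print(QUESTION, "->", drugs)  # ['metformin', 'sertraline']
```

The patient identifier is passed as a bound parameter instead of being spliced into the SQL string, which matters when the query text is generated automatically.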

Presented by

Prabod Rathnayaka - Graduate Research Assistant and PhD Student at La Trobe University

Past webinars

Dos and Don’ts of Hosting Engaging Virtual Events (and the tech stack to do it)

Recorded on: July 23 @ 2PM ET

Martech—the tools marketing teams are using to empower their digital marketing—has become a key component in almost any organization. But only recently did martech’s long lost cousin gain notoriety: the event tech stack. While in-person events have come to a halt, event planners are being required to quickly pivot and create an engaging virtual experience.

Join us for a live webinar on Thursday, July 23 at 2 PM ET to learn how to:

  • Engage your community and create a first-class virtual experience while maintaining the human component.
  • Evaluate your event tech stack, including your event platform, breakout sessions, and networking applications.
  • Integrate your entire tech stack to create a cohesive and engaging experience.
  • Use your tech stack to make your event more visible to your community.

Julia Mohler

Director, Marketing & Operations at Formulatedby

Lyssa Myska Allen

Social Media & Community Manager at Formulatedby

Using Python with a Massively Parallel Database to Predict COVID-19 Numbers

RECORDED ON: July 7th @ 11am PDT

Python is a powerful programming language that is a good choice for many types of analytics, and it is rapidly becoming the language of choice for scientists and researchers in many fields. By combining a massively parallel processing (MPP) database like Vertica with Python, you can overcome many of the scale and analytics challenges that can limit Python users. Join this session to see a live demo of Vertica’s built-in integration with Python, which uses historical data on the number of cases and fatalities per state to predict the numbers for the following day.
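
As a rough illustration of next-day prediction from per-state history, the sketch below fits a straight-line trend to each state's recent counts and extrapolates one day ahead. It is not the method shown in the demo and does not use Vertica: the fit_line and predict_next functions are hypothetical helpers, ordinary least squares is swapped in as the simplest possible model, and the counts are synthetic.

```python
def fit_line(ys):
    """Ordinary least-squares line through (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_next(ys):
    """Extrapolate the fitted line to the day after the last observation."""
    slope, intercept = fit_line(ys)
    return slope * len(ys) + intercept


# Synthetic daily case counts, for illustration only (not real data).
history = {
    "State A": [100, 110, 120, 130],
    "State B": [50, 50, 50, 50],
}

forecast = {state: predict_next(counts) for state, counts in history.items()}
print(forecast)
```

A linear trend is only a baseline; epidemic counts often grow non-linearly, so a real forecast would need a richer model and far more data than this.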

Presented by

Badr Ouali, Data Scientist at Vertica

Statistical Paradoxes & Logical Fallacies: Don’t Believe the Lies your Data Tells

Recorded on: June 30th @ 11am PDT

I hate to admit it, but your data is lying to you — and more often than you think. Having clean data with high volume, velocity, and variety doesn’t necessarily protect one from the possibility of reaching faulty conclusions to research questions of interest. Despite what you may have learned in Statistics 101, a significant p-value isn’t always groundbreaking. All data can be coerced and bribed to tell any story; thus, as data practitioners, it’s our duty to be cognizant of the possible pitfalls that abound and how to navigate around common traps — responsibly.

By showcasing enterprise use cases and sample projects in Dataiku’s Data Science Studio, Christopher Peter Makris will address the following questions:

– Is more data always better? How can the inclusion or exclusion of data obfuscate a previously held conclusion?

– Is an observed event truly a trend? How can previously noted behaviors be a marker for the complete opposite behavior in the future?

– Is an association worth my time/money/effort? When do conclusive conclusions lead us not to act on a valid association?
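
The first question above, on how including or excluding data can overturn a conclusion, can be illustrated with Simpson's paradox. Whether the talk covers this particular paradox is an assumption. The sketch uses the counts commonly quoted in textbook presentations of a kidney-stone treatment comparison, given as (successes, patients) for each treatment and stone size.

```python
# (successes, patients) per treatment and stone size, as commonly quoted
# in textbook presentations of Simpson's paradox.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, patients):
    """Success rate as a fraction."""
    return successes / patients


def overall_rate(groups):
    """Pool all subgroups of one treatment and compute a single rate."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(p for _, p in groups.values())
    return successes / patients


for size in ("small", "large"):
    a = rate(*counts["A"][size])
    b = rate(*counts["B"][size])
    print(f"{size} stones: A={a:.3f}  B={b:.3f}")

print(f"overall: A={overall_rate(counts['A']):.3f}  B={overall_rate(counts['B']):.3f}")
```

Treatment A has the higher success rate within each stone size, yet treatment B looks better once the groups are pooled, because A was used far more often on the harder large-stone cases.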

Presented by

Christopher Peter Makris, Lead Data Scientist at Dataiku

Christopher Peter Makris (CPM) is a Lead Data Scientist at Dataiku. With a background in Logic, Discrete Mathematics, & Statistics, he brings experience from both industry and academia. CPM previously stood as Director of Data Science at the NYC Data Science Academy and Executive Director of the Master’s of Statistical Practice Program at Carnegie Mellon University; at both institutions, CPM took pride in professionally developing students, aiding in the redesign of technical curricula, and delivering over 3,000 hours of lectures to graduate audiences. In industry, CPM most recently helped thwart cybercrime as the Director of Security Data Science at Empower Retirement. When not crunching numbers on his computer, CPM is crunching his abs in the studio as he’s an avid dancer, choreographer, and CrossFitter (so, while he considers himself a data scientist, he can only count to 5-6-7-8…)

Choosing the Right Deep Learning Framework: A deep learning approach

Recorded on: May 20th @ 2pm EST

Deep Learning Frameworks allow for the easy construction of neural networks and have made Python the go-to language for Artificial Intelligence development. This webinar will show how deep learning can be used to help align developers and data scientists to a particular framework as well as demonstrate some examples of how IBM is extending the value of deep learning frameworks via open source projects, and how new advancements in NLP like transformers are changing how and why to use a particular framework.

Presented by

Nick Acosta, IBM

Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and utilizes it to understand machine learning developers of various communities, startups, and enterprises in order to help them succeed on IBM’s data science platform and in open source. He has a strong interest in open source software and all things Kardashian.

Everything You Need to Know to Start Data Science with Python

RECORDED ON: May 14th @ 11am PST

Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.

Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.

During the webinar, we’ll discuss:

  • Popular open-source tools for DS/ML
  • Use cases
  • Helpful resources to get started with data science with Python

Presented by

Stan Seibert - Sr. Director, Community Innovation at Anaconda

Stan leads the Community Innovation team at Anaconda, where his work focuses on high performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, University of Pennsylvania, and the Sudbury Neutrino Observatory.

Explainable Data Science Workflows

RECORDED ON: WED APR 15 @ 11.00AM PST

Learn explainable workflows using open source software, semi-automated pipelines, and Python! From data ingestion, cleaning, piping, and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn the best practices for codifying and relaying explainable data science to stakeholders, management, and the end user in a reproducible and portable fashion.

Presented by

Austin Eovito - Data Scientist, IBM

Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. A recent graduate of Florida State University, Austin focuses on balancing the bleeding-edge research produced by academia with the tools used in applied data science. His Master’s thesis was on White Collar Crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM’s Data Science Masterclass on AI Explainability and currently resides in San Francisco with his fiance, dog, and two cats.