Past webinars
Building Patient Journeys from Multi-Modal Healthcare Data Using Medical Language Models
Recorded on: Thursday, October 31 @ 2pm ET
Data integration has been an enormous challenge in healthcare for decades. Issues of data standardization, data quality, legacy formats, unstructured data, and semantic inconsistencies have made it hard to answer basic questions about how a hospital operates or what should be done next for a patient. Recent advances in Healthcare AI are combining to transform this age-old problem – enabling you to automatically ingest large volumes of raw, multi-format, multi-modal, untrusted medical data and turn it into coherent longitudinal patient stories in an industry-standard format.
This webinar presents an integrated solution in action that uses John Snow Labs’ state-of-the-art Medical Language Models, healthcare-specific data preparation pipelines, and Text-to-OMOP question answering models running on Databricks’ secure, scalable, and compute-optimized AI platform. The solution takes in multi-modal data – structured (tabular), semi-structured (FHIR resources), and unstructured (free-text) – and generates an OMOP/OHDSI standard data model that:
– Builds a unified view of each patient over time.
– Builds this unified patient view from multi-modal source data.
– Reasons at the patient level.
We’ll then show how the resulting patient data model can be used for either “AI” (building patient cohorts with natural language queries) or “BI” (dashboards for patient risk scoring and quality measures), all from the same source of truth, with full explainability and traceability.
Kayla Grieme
Senior Solutions Architect – Academic Medical Centers, Databricks
Kate Weber
Senior Data Scientist, John Snow Labs
Fireside Chat – Efficiency Unlocked: The Power of Synthetic Data in Modern Enterprises
Recorded on: Wednesday, October 2 @ 2pm ET
Synthetic data has emerged as a powerful tool for organizations looking to enhance their operational efficiency and reduce costs. By generating artificial data that mimics real-world data, companies can overcome limitations related to data scarcity, privacy concerns, and high costs associated with data collection and storage. This conversation will delve into the practical benefits and real-world applications of synthetic data, providing insights on how it can drive significant efficiency gains and cost savings.
Varun Nakra
Vice President, Deutsche Bank
Raghavan Muthuregunathan
Senior Engineering Manager, Search AI, LinkedIn
Alex Watson
Co-Founder & CPO, Gretel.ai
Polars and Time Series: What It Can Do, What It Can’t (Yet!), and What You Can Do About It
Recorded on: Tuesday, September 10 @ 12:00pm ET
Time series analysis is ubiquitous in applied data science because of the value it delivers. In order to do effective time series analysis, you need to know your tools well. Polars has excellent built-in time series support, and it’s also possible to extend it where necessary.
In this talk, we will cover:
– Basic built-in time series operations with Polars (e.g. “what’s the average number of sales per month?”).
– Advanced, custom time series operations, and how you can implement them as Polars plugins (e.g. business day arithmetic).
– What Polars can’t (yet!) do, and what we can do about it
Audience level:
Basic interest and knowledge of Python and data will be assumed, but no prior Polars experience is required.
—
This event is hosted by OpenTeams (https://www.openteams.com), the sponsor of REPOS, a platform that promotes sustainable funding for open source projects. To learn more, visit https://repos.openteams.com.
Marco Gorelli
Sr. Software Engineer, Quansight Labs
Inessa Pawson
OSPO Manager, OpenTeams
Why You Should Consider Being a Trainer
Recorded on: Thursday, August 22 @ 12:00pm ET
Have you ever considered pursuing a career in training? For our guest speaker, Reuven Lerner, this path was not obvious at first. With a degree in Computer Science, the expectation was a career focused on software engineering and architecture. However, after many years of teaching Python and pandas full-time, he has found both greater financial rewards and a deeper sense of fulfillment and excitement in his work. In this talk, he will share valuable insights into the world of corporate training, his journey into this field, and whether it might be a suitable path for you. The session will conclude with an open Q&A, providing an opportunity for attendees to ask questions and learn more.
Reuven Lerner
Python Trainer, Lerner Python
Open Source Qualified Leads: A Marketing Framework for Commercial Open Source
Recorded on: Wednesday, July 31 @ 12:00pm ET
Avi Press
CEO & Founder, Scarf
Making Multimodal Generative AI Work
Recorded on: Wednesday, June 17 @ 2:00pm ET
The future of generative AI is multimodal, and yet projects involving LLMs and Large Vision Models in production are already struggling as teams try to scale up (and out) awe-inspiring Twitter demos. We’ll discuss what teams need to do to support the development and integration of multimodal models into product features.
Mikiko Bazeley
AI & MLOps Engineer, Labelbox
Fast, Cheap, Scalable: Open-Source LLM Inference with Spark NLP
Recorded on: Wednesday, June 26 @ 2pm ET
Learn how the open-source Spark NLP library provides optimized and scalable LLM inference for high-volume text and image processing pipelines. This session dives into optimized LLM inference without the overhead of commercial APIs or extensive hardware setups. We will show live code examples and benchmarks comparing Spark NLP’s performance and cost-effectiveness against both commercial APIs and other open-source solutions.
Key Takeaways:
- Learn how to efficiently process millions of LLM interactions daily, circumventing the costs associated with traditional LLM deployments.
- Discover advanced methods for embedding LLM inference within existing data processing pipelines, enhancing throughput and reducing latency.
- Review benchmarks that compare Spark NLP’s speed and cost metrics relative to commercial and open-source alternatives.
Danilo Burbano
Software and Machine Learning Engineer at John Snow Labs
Danilo Burbano is a Software and Machine Learning Engineer at John Snow Labs. He holds an MSc in Computer Science and has 13 years of commercial experience. He has previously developed several software solutions over distributed system environments like microservices and big data pipelines across different industries and countries. Danilo has contributed to Spark NLP for the last 6 years.
Turning Open Source Into a Sustainable Business
Recorded on: Wednesday, June 12 @ 2:00pm ET
Turning open-source projects into sustainable businesses is a crucial challenge for those in the tech industry. This AMA session explores practical strategies for building profitable business models within the open-source landscape. Travis and Peter will talk about various revenue streams such as support services, premium features, dual licensing, and partnerships. Participants will learn how to effectively communicate the value of open-source solutions, engage with communities for support and feedback, and balance open-source ethos with commercial objectives.
Join us and bring your questions for our speakers, Peter Wang from Anaconda and Travis Oliphant from OpenTeams, as they provide insights on transforming open-source initiatives into lasting ventures.
Travis Oliphant
CEO, OpenTeams and Quansight
Peter Wang
Chief AI and Innovation Officer & Co-founder, Anaconda
New State-of-the-art Accuracy for the 3 Primary Uses of Healthcare Language Models
Recorded on: Tuesday, May 29, 2024 @ 2pm ET
This talk presents new levels of accuracy that have very recently been achieved, on public and independently reproducible benchmarks, on the three most common use cases for language models in healthcare:
- Understanding clinical documents: Such as information extraction from clinical notes and reports; detecting entities, relationships, and medical codes; de-identification; and summarization.
- Reasoning about patients: Fusing information across multiple modalities (tabular data, free text, imaging, omics) to create a longitudinal view of each patient, including making reasonable inferences and explaining them.
- Answering medical questions: Answering medical licensing exam questions, biomedical research questions, and similar medical knowledge questions – accurately, without hallucinations, and while citing relevant sources.
David Talby
CTO at John Snow Labs
David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and Game Changers Awards in 2023.
Supporting Open Source in the Post-xz utils Backdoor World
Recorded on: Thursday, May 22 @ 2pm ET
On April 5th, 2024, the open-source community faced a significant challenge when a malicious backdoor was discovered in xz Utils, a popular data compression utility. This incident, triggered by the original maintainer’s burnout and subsequent manipulation by an attacker, underscores the vulnerabilities that arise from insufficient community support and the pressures faced by open-source maintainers.
Armin Nehzat
Chief Product & Marketing Officer at Thanks.dev
Armin guides organisations to accelerate innovation and digital transformation. He has over 15 years of experience managing projects and product development life cycles, with strong knowledge of the sales, marketing, and manufacturing tech stack, and of DevOps in IoT, eCommerce, and web and mobile application design.
The 2024 Generative AI in Healthcare Survey
Recorded on: Tuesday, April 30, 2024 @ 2pm ET
This webinar presents key findings from the 2024 Generative AI in Healthcare Survey, conducted in February & March of 2024 by Gradient Flow to assess the key use cases, priorities, and concerns of professionals and technology leaders in this space. Topics covered:
- Current levels of adoption and budget allocation
- Types of language models being used
- Use cases for LLMs
- Priorities for evaluating LLMs and roadblocks
- LLM model enhancement strategies
- LLM testing for Responsible AI requirements
Ben Lorica
Founder, Gradient Flow
Ben Lorica is founder at Gradient Flow. He is a highly respected data scientist, having served leading roles at O’Reilly Media (Chief Data Scientist, Program Chair of the Strata Data Conference, O’Reilly Artificial Intelligence Conference, and TensorFlow World), at Databricks, and as an advisor to startups. He serves as co-chair for several leading industry conferences: the AI Conference, the NLP Summit, the Data+AI Summit, Ray Summit, and K1st World. He is the host of the Data Exchange podcast and edits the Gradient Flow newsletter.
David Talby
CTO at John Snow Labs
David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David has extensive experience building and running web-scale software platforms and teams – in startups, for Microsoft’s Bing in the US and Europe, and to scale Amazon’s financial systems in Seattle and the UK. David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was named USA CTO of the Year by the Global 100 Awards in 2022 and Game Changers Awards in 2023.
The Journey to Healthy Leadership Habits
Recorded on: Thursday, April 18 @ 2pm ET
This webinar dives into the essence of effective leadership and the challenges inherent in translating leadership theory into practice. Acknowledging that most leadership behaviors are deeply ingrained and habitual, the session will explore the complexities of habit formation in a professional context. Drawing on new insights from cognitive sciences, the webinar will dissect why forming new leadership habits is a challenging endeavor and will outline strategies for successful habit creation. Tobi will illustrate this journey through three foundational leadership habits: Being Present, Sharing Context, and Setting Boundaries. Each habit will be discussed in detail, providing you with a roadmap for connecting with colleagues, sharing knowledge and perspectives, and establishing clear principles and boundaries. The session aims to equip leaders with practical advice and actionable strategies to foster healthy leadership habits that can significantly impact their effectiveness and the well-being of their teams.
Tobi Tonner
Growth Leadership Coach at Workband
Tobi Tonner holds an ICF Level 1 Diploma in Coaching, regarded as the gold standard, and has also trained in Situational Leadership II, NeuroLeadership, and the Entrepreneurial Operating System. He is passionate about coaching because he believes it is a powerful way to unlock the potential of people and organisations.
Monetizing Open Source: Making a Living from Your OS Product
Recorded on: Thursday, April 10 @ 2pm ET / 11am PT
This webinar is designed to address one of the most pressing questions for creators and sales professionals in the open-source space: how to generate sustainable revenue from open-source products. It will explore various monetization strategies, discuss the challenges and opportunities inherent in open-source business models, and provide actionable insights for effectively selling and scaling open-source solutions.
Wes McKinney
Principal Architect, Posit
Peter Wang
Chief AI and Innovation Officer & Co-Founder, Anaconda
Travis Oliphant
CEO, OpenTeams
Carol Willing
President, Willing Consulting
Andy Terrel
Principal Technical Product Manager, NVIDIA
John Snow Labs’ Native Integrations with LangChain and Haystack
Recorded on: Wednesday, February 28 @ 2pm ET
Learn to enhance Retrieval Augmented Generation (RAG) pipelines in this webinar on John Snow Labs’ integrations with LangChain and Haystack. This session highlights the ability to retain your existing pipeline structure while upgrading its accuracy and scalability. Accuracy is improved thanks to customizable embedding collection and document splitting, while using Spark NLP’s optimized pipelines greatly improves scalability, runtime speed, and, as a result, cost.
Learn how these native integrations enable an easy transition to more effective methods, enhancing document ingestion from diverse sources without overhauling existing systems. Whether your goal is to enhance data privacy, optimize NLP & LLM accuracy, or scale your RAG applications to millions of documents, this webinar will equip you with the knowledge and tools to fully leverage John Snow Labs’ software to get it done. Join us to unlock the potential of your applications with the latest innovations in Generative AI, without departing from the familiar toolset of your current pipeline.
Muhammet Santas
Sr. Data Scientist at John Snow Labs
Muhammet Santas holds a Master’s Degree in Artificial Intelligence and currently serves as a Senior Data Scientist at John Snow Labs, where he is an integral part of the Healthcare NLP Team. With a robust background in AI, Muhammet contributes his expertise to advancing NLP technologies within the healthcare sector.
Next-Gen Table Extraction from Visual Documents: Leveraging Multimodal AI
Recorded on: Tuesday, January 30 @ 2pm ET
Explore the latest advancements in multimodal AI for extracting tabular data from visual documents. This session will delve into novel methods implemented in John Snow Labs’ Visual NLP library, which has significantly improved the accuracy of information extraction and question answering from tables in PDFs and image files.
The webinar will cover a range of practical applications, demonstrating how this technology is adept at handling complex documents such as financial disclosures, clinical trial results, insurance rates, lab scores, and academic research. The focus will be zero-shot models, where the AI model directly interprets and responds to queries from source images, eliminating the need for specialized training or tuning.
We’ll also cover Visual NLP capabilities that have been specifically designed to enhance table extraction quality, especially in challenging cases like multi-line cells or borderless tables. We’ll discuss the technical underpinnings of this feature, including the integration of computer vision and optical character recognition for detecting tables and individual cells within them. We’ll touch upon how that extends to support for tables with custom borders, dark & noisy backgrounds, uncommon table layouts, multilingual text, and international number & currency formats.
Alberto Andreotti
Senior Data Scientist at John Snow Labs
Alberto Andreotti is a data scientist at John Snow Labs, specializing in Machine Learning, Natural Language Processing, and Distributed Computing. With a background in Computer Engineering, he has expertise in developing software for both Embedded Systems and Distributed Applications. Alberto is skilled in Java and C++ programming, particularly for mobile platforms. His focus includes Machine Learning, High-Performance Computing (HPC), and Distributed Systems, making him a pivotal member of the John Snow Labs team.
Operationalizing Large Language Models
Recorded on: Wednesday, January 31 @ 2pm ET
In this webinar, we’ll be exploring the intricate process of effectively managing and scaling multiple LLM applications. Our focus will extend beyond the basic use of notebook-based methods, diving into more robust and advanced strategies. From the initial development stages right through to deployment, this webinar is tailored to provide a comprehensive understanding of the complexities and nuances involved in operationalizing LLMs at scale.
Key Topics We’ll Cover:
- Advanced Management Techniques: Step beyond traditional approaches with insights into state-of-the-art strategies for managing and scaling LLM applications.
- Operationalization Challenges: Uncover the critical aspects often missed in the journey from development to deployment, emphasizing the importance of ongoing maintenance and monitoring for sustained success.
- Scalability in Practice: Equip yourself with practical solutions to effectively scale LLM applications, ensuring they meet the evolving demands and technological trends in today’s fast-paced environment.
Amanda Milberg
Senior Sales Engineer – Partnerships at Dataiku
Amanda Milberg (She/Her) is a seasoned Data Scientist and Senior Sales Engineer at Dataiku, specializing in NLP and AI/ML business solutions. With a Bachelor’s degree in Computer Science and Mathematics from Colgate University, she combines deep technical expertise in Java, Python, C, Neo4j, SQL, HTML/CSS, Dash, and JavaScript with a strong acumen for fostering partnerships. As a WiDS Ambassador, Amanda is also dedicated to promoting women’s involvement in data science, making her a respected leader and innovator in the tech community.
Building a RAG LLM Clinical Chatbot with John Snow Labs in Databricks
Recorded on: Tuesday, December 12 @ 2pm ET
In the era of rapidly evolving Large Language Models (LLMs) and chatbot systems, we highlight the advantages of using LLM systems based on RAG (Retrieval Augmented Generation). These systems excel when accurate answers are preferred over creative ones, such as when answering questions about medical patients or clinical guidelines. The RAG LLMs have the advantage of reducing hallucinations, by explaining the source of each fact. They also enable near-real-time data updates without re-tuning the LLM.
This session walks through the construction of a RAG (Retrieval Augmented Generation) Large Language Model (LLM) clinical chatbot system, leveraging John Snow Labs’ healthcare-specific LLM and NLP models within the Databricks platform.
Coupled with a user-friendly graphical interface, this setup allows users to engage in productive conversations with the system, enhancing the efficiency and effectiveness of healthcare workflows. Acknowledging the need for data privacy, security, and compliance, this system runs fully within customers’ cloud infrastructure – with zero data sharing and no calls to external API’s.
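To illustrate the grounding-with-citations idea described above, here is a rough sketch of how retrieved passages can be injected into a prompt with source IDs (hypothetical function names and prompt format, not John Snow Labs’ actual API):

```python
def build_grounded_prompt(question, retrieved):
    """retrieved: list of (source_id, passage) pairs from a document index."""
    # Number every passage with its source ID so the model can cite it
    context = "\n".join(f"[{sid}] {passage}" for sid, passage in retrieved)
    return (
        "Answer the question using ONLY the sources below, "
        "citing source IDs like [note-123] for every claim.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_grounded_prompt(
    "What is the patient's most recent HbA1c?",
    [("note-123", "HbA1c measured at 7.2% on 2023-11-02."),
     ("note-456", "Patient started metformin in 2021.")],
)
```

Because every fact in the context carries a source ID, the generated answer can point back to the exact note it came from, which is what makes RAG answers traceable.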
Amir Kermany
Sr. Industry Solutions Director, HLS at Databricks
Amir is the Technical Industry Lead for Healthcare & Life Sciences at Databricks, where he focuses on developing advanced analytics solution accelerators to help healthcare and life sciences organizations in their data and AI journey.
Veysel Kocaman
Head of Data Science at John Snow Labs
Unifying Minds: Unleashing the Synergy Between LLMs and Knowledge Graphs
Recorded on: Tuesday, August 29 @ 2pm ET
This presentation delves into the seamless integration of Knowledge Graphs (KGs) and Large Language Models (LLMs) to unlock their collective potential in revolutionizing natural language understanding. By synergizing the structured knowledge of KGs and the text comprehension capabilities of language models, we unlock a powerful combination that bridges the gap between domain-specific, potentially sensitive data, and the vast general knowledge embedded within LLMs. Furthermore, we explore how language models can enrich KGs through efficient knowledge extraction and refinement techniques. This integration opens up a realm of opportunities across diverse domains, from dynamic question-answering systems to context-aware and intelligent chatbots, paving the way for truly transformative applications. Don’t miss this insightful exploration into the world of unified minds!
Jörg Schad
CTO at ArangoDB
Jörg Schad, PhD, is the CTO at ArangoDB. In a previous life, he worked on Machine Learning Infrastructure in health care, worked on distributed systems at Mesosphere, implemented distributed and in-memory databases, and conducted research in the Hadoop and Cloud area. He’s a frequent speaker at meetups, international conferences, and lecture halls. Jörg is fluent in three different languages and is passionate about science & technology, education, and the environment.
Lessons from the Trenches: Implementing and Optimizing LLMs
Recorded on: Wednesday, August 16th @ 2pm ET
In today’s era of technological innovation, Large Language Models (LLMs) stand at the forefront of artificial intelligence applications. This Webinar presents a unique opportunity to explore the real-world experiences and expert insights on the development, deployment, and optimization of these groundbreaking systems.
The session is structured as a fireside chat, moderated by: Vikram Chatterji, Co-founder and CEO at Galileo
Joining Vikram for the conversation are:
Valliappan Narayanan, Associate Director of Engineering at AT&T Inc: A technology management expert, Valliappan will shed light on the challenges of scaling and deploying LLMs within large corporations, based on his hands-on work at AT&T.
Nishan Subedi, Head of Marketplace DSML, Director of Engineering at DoorDash: Providing insights into the data science and machine learning landscape within the fast-paced marketplace ecosystem, Nishan will discuss the practical application of LLMs at DoorDash.
Together, they will explore key areas such as:
1. Design Principles: How to tailor LLMs for specific applications and requirements.
2. Optimization Techniques: Methods and best practices for enhancing model performance.
3. Scalability Challenges: Insights into overcoming hurdles in large-scale deployment.
4. Real-World Deployment: Firsthand experiences in implementing LLMs across various industries.
5. Ethical Considerations and Bias Mitigation: Strategies for responsible AI development.
6. Performance Monitoring and Maintenance: The ongoing work of ensuring optimal model functionality.
Join this candid and insightful conversation and get a rare glimpse into the minds of industry leaders who have been on the front lines of LLM implementation and optimization.
Nishan Subedi
Head of Marketplace DSML, Director of Engineering at DoorDash
Nishan is an Engineering Director at DoorDash, where he leads the Core Marketplace Machine Learning organization. He has been at the forefront of LLM adoption at DoorDash and has been an active voice online on the potential of leveraging LLMs for e-commerce.
Jayeeta Putatunda
Senior Data Scientist at Fitch Ratings
Jayeeta is a Senior Data Scientist with several years of industry experience in Natural Language Processing (NLP), Statistical Modeling, Product Analytics and implementing ML solutions for specialized use cases in B2C as well as B2B domains. Currently, Jayeeta works at Fitch Ratings, a global leader in financial information services. She is an avid NLP researcher and gets to explore a lot of state-of-the-art open-source models to build impactful products and firmly believes that data, of all forms, is the best storyteller.
Vikram Chatterji
Co-founder and CEO at Galileo
Vikram is the co-founder and CEO at Galileo, an ML data quality intelligence platform for data science teams building computer vision, NLP, and generative AI models.
Prior to founding Galileo, Vikram led product management at Google AI.
Valliappan Narayanan
Associate Director of Engineering at AT&T Inc
Valli is currently engaged in the development of an AI assistant utilizing Generative AI for the largest customer service application worldwide. With considerable experience in building and launching direct-to-consumer products, Valli has worked with renowned companies such as Walmart, AT&T, and Best Buy. Noteworthy projects include the creation of the Walmart Grocery e-commerce platform and the Best Buy e-commerce platform. Specializing in areas such as no-code development, AI/ML, and cloud infrastructure, Valli brings a diverse skill set to the table.
Debugging LLMs: Best Practices for Better Prompts and Data Quality
Recorded on: Thursday, June 22nd @ 2pm ET
Large language models (LLMs) have raised concerns about incorrectly generated outputs and their potential to spread disinformation and yield poor outcomes in critical domains like healthcare and financial services.
In this panel conversation, participants from BuzzFeed and Galileo will discuss best practices for data science teams to effectively debug, manage and control results from LLMs that were not anticipated.
They will explore the importance of implementing better controls around fairness and bias, and share their insights on best practices to ensure the safe and ethical usage of LLMs at scale.
Furthermore, the panel will delve into the pressing need for faster and more robust LLM fine-tuning and prompt evaluations.
Join this discussion to gain valuable insights and practical recommendations on how to improve the quality of your prompts, data integrity, and fairness while building your LLM powered apps with your data.
Archi Mitra, Senior Director of Machine Learning at BuzzFeed
Archi is the Head of ML at BuzzFeed and leads cross-functional ML teams driving Gen AI and personalization efforts across all BuzzFeed brands. He combines deep technical experience in search, recommender systems, computer vision, and MLOps with the ability to build consensus and rally large, disparate groups of people toward delivering compelling internet-scale user products. Prior to BuzzFeed, he led Search & Recs and Computer Vision teams at Wayfair and researched ethical AI systems at the Northeastern University CCIS lab. When not musing about neural network architectures, he likes to wonder about cosmology, astronomy, and space.
Atindriyo Sanyal, Co-Founder at Galileo
Atindriyo is a Co-Founder and CTO at Galileo. Prior to that, he has spent 10+ years building large scale ML platforms at Uber and Apple. Formerly, he was a Staff Software Engineer and Tech Lead on Uber’s Michelangelo ML platform and a co-architect of Michelangelo’s Feature Store.
His work scaled Uber’s Feature Store to serve 20,000+ ML features across all of Uber Machine Learning, and he led ML data quality efforts for Uber. The solutions and tooling his team built improved the production performance of over 10,000 models powering Uber’s ML.
Later on, his work with the Stanford AI Lab conceptualized Embedding Stores – a Feature Platform for managing and serving time sensitive entity embeddings to downstream ML models.
The Rising Demand for Vector Databases in the Age of ChatGPT
Recorded on: May 16th @ 2pm ET
ChatGPT, the artificial intelligence chatbot from OpenAI, has taken the world by storm and has transformed the way we perform everyday tasks. With its advanced language processing capabilities, it has become a go-to solution for people looking for quick answers, whether it’s about the weather, the news, or how to cook a recipe.
New open source plugins have made it possible for developers to easily build their own ChatGPT knowledge base solutions. At the heart of these solutions is a data model that converts unstructured data from text, videos, images and more into vector embeddings.
Vector databases are designed to efficiently store and retrieve these high-dimensional embeddings, making them an ideal solution for chat applications that require fast and accurate vector search results. Join Frank Liu, ML architect at Zilliz, for a session on why vector databases are critical to the success of these LLM-based chat solutions and how developers can build powerful chat solutions that deliver quick answers.
In this session, you’ll learn:
- What is a vector database
- Why is it important to store your embeddings in a purpose-built database
- Why dumping your embeddings into Postgres is a bad idea
- How we built a chat knowledge base for open-source projects using Zilliz, prompts-as-code, and ChatGPT
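The core operation a vector database accelerates can be sketched as brute-force nearest-neighbor search over embeddings; purpose-built systems make this fast at scale with approximate indexes (e.g. HNSW, IVF). A toy illustration with NumPy (synthetic vectors, not a real embedding model):

```python
import numpy as np

rng = np.random.default_rng(0)

# 1,000 stored "document" embeddings, unit-normalized so that the
# dot product equals cosine similarity
embeddings = rng.normal(size=(1000, 128))
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

# A query vector that is a slightly perturbed copy of document 42
query = embeddings[42] + 0.01 * rng.normal(size=128)
query /= np.linalg.norm(query)

scores = embeddings @ query               # cosine similarity to every document
top_k = np.argsort(scores)[::-1][:5]      # indices of the 5 closest documents
```

A vector database performs this same similarity ranking, but over billions of vectors with index structures that avoid scoring every stored embedding.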
Presented by
Frank Liu - Director of Operations & ML Architect at Zilliz
Frank Liu is the Director of Operations & ML Architect at Zilliz, where he serves as a maintainer for the Towhee open-source project. Prior to Zilliz, Frank co-founded Orion Innovations, an ML-powered indoor positioning startup based in Shanghai and worked as an ML engineer at Yahoo in San Francisco. In his free time, Frank enjoys playing chess, swimming, and powerlifting. Frank holds MS and BS degrees in Electrical Engineering from Stanford University.
Building Computer Vision Models 10x Faster with High-quality Data
Recorded on: Tuesday, May 9th @ 2pm ET
Data is the most critical input for producing high-performing computer vision models, yet data scientists must manually rummage through data for days to identify the poor-quality data that pulls model performance down.
This leads to biased models, lost opportunities for quick model performance improvements and a lot of wasted time for data scientists.
In this webinar, we will learn from leaders in the data science community (from Pinterest and Elbit Systems of America) about the criticality of fixing the data, best practices to do so, and how platforms like Galileo are pivotal in the workflow.
Anirudh Koul - Head of Machine Learning Data Sciences at Pinterest
Anirudh Koul is a noted AI expert, UN/TEDx speaker, author of O’Reilly’s Practical Deep Learning book and a former scientist at Microsoft Research, where he founded Seeing AI, considered the most used technology among the blind community after the iPhone.
He works at Pinterest helping incubate emerging technologies. With features shipped to a billion users, he brings over 17 years of production-oriented applied research experience on petabyte-scale datasets. He also serves as an ML Lead for Frontier Development Labs & SpaceML – NASA’s AI Accelerator – and coaches a podium-winning team in the Roborace autonomous driving championship at 175 mph.
Alexandra Levinson - Principal Algorithms / AI Engineer at Elbit Systems of America
Alexandra is a Principal Algorithms Engineer specializing in Tracking Systems at Elbit Systems of America, based in Miami, FL. Previously, Alexandra worked as the VP of Analytics at an ag-tech startup, and as a Signal Processing Engineer on the Advanced Algorithms team at Raytheon Missile Systems. Alexandra graduated with honors from the University of Florida with a degree in Mechanical Engineering and holds an algorithm design patent in the field of Image Restoration.
Vikram Chatterji - Co-founder and CEO at Galileo
Vikram is the co-founder and CEO at Galileo, an ML data quality intelligence platform for data science teams building computer vision, NLP, and generative AI models.
Prior to founding Galileo, Vikram led product management at Google AI.
Unifying Data and Models for Cross-domain Personalized Fashion Recommendations
Recorded on: January 31, 2023 @ 2pm ET
Across industries, people have come to expect highly personalized and dynamic experiences – but for those providing them, managing extensive sets of data across various channels to ensure personalized touchpoints can be an incredibly complex task.
In this webinar, we will explore how Stitch Fix evolved its large suite of recommender models into a novel model architecture that unifies data from client interactions to deliver a holistic and real-time understanding of their style preferences. Stitch Fix’s Client Time Series Model (CTSM) is a scalable and flexible sequence-based recommender system that models client preferences over time, based on event data from various sources, to provide multi-domain, channel-specific recommendation outputs.
Data Science Manager, Kevin Zielnicki, will share how the model has enabled Stitch Fix to continuously provide personalized fashion at scale, like no other apparel retailer.
Presented by
Kevin Zielnicki - Data Science Manager at Stitch Fix
Kevin Zielnicki is Principal Data Scientist, Styling Recommendations Lead at Stitch Fix, where he has worked for over six years. Kevin works on the Product Algorithms team and is interested in the impact recommendation algorithms can have when used as an amplifier for human creativity. Prior to Stitch Fix, Kevin spent time at Silicon Valley Data Science and Seldn in various data science and engineering roles. Kevin holds a PhD in Physics from the University of Illinois Urbana-Champaign, where he focused his studies on optical quantum information processing, and a Bachelor of Science in Physics from Harvey Mudd College.
Recorded on: November 23, 2022 @ 2pm ET
Using Amazon SageMaker to train, test, and deploy models is an efficient way of handling the machine learning lifecycle. Because machine learning is an iterative process, SageMaker lets data scientists manage the entire pipeline with autoscaling provisions, advanced data security, data monitoring, model monitoring, high performance, and low-cost ML development. Compared to a local or on-demand setup, using SageMaker can save up to 67% of total resource utilization.
However, deploying custom machine learning models in a cloud environment such as AWS can be daunting: first, there is little documentation available for individual use cases; second, every model is different.
This tutorial will walk you through a framework for deploying any custom model in SageMaker using Docker. It will cover how to deploy the model as a REST API and how to test it through Postman to showcase the model to customers and stakeholders.
What you’ll learn in this webinar:
- How to create custom ML models in AWS SageMaker
- How to deploy the model as a REST API
- All you need to know about AWS services such as SageMaker, ECR (Elastic Container Registry), Lambda, and API Gateway, as well as non-AWS tools such as Docker
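To make the deployment pattern concrete, here is a minimal sketch (not code from the webinar) of the serving contract a custom SageMaker container must satisfy: the hosting service probes GET /ping for health and sends inference requests to POST /invocations on port 8080. The Flask handler and echo "model" below are illustrative placeholders.

```python
# Minimal inference server for a SageMaker bring-your-own-container model.
# SageMaker's hosting contract expects two endpoints inside the container:
# GET /ping (health check) and POST /invocations (inference). The
# prediction logic here is a placeholder; a real container would load
# trained artifacts from /opt/ml/model.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # A 200 response tells SageMaker the container is healthy.
    return "", 200

@app.route("/invocations", methods=["POST"])
def invocations():
    payload = request.get_json(force=True)
    # Placeholder "model": echo the number of input features received.
    return jsonify({"n_features": len(payload.get("features", []))})

# Inside the container, SageMaker routes traffic to port 8080:
# app.run(host="0.0.0.0", port=8080)
```

Once this server is packaged in a Docker image and pushed to ECR, the same /invocations route is what a Lambda-plus-API-Gateway front end (or Postman, for demos) ultimately calls.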
Presented by
Swagata Ashwani - Senior Data Scientist at Boomi
Swagata Ashwani is a Senior Data Scientist at Boomi where she focuses on deploying large scale models in production.
She has created various custom models in the integration space in SageMaker, and found that although this should be a common problem, the process is far from straightforward: while SageMaker supports bringing your own custom model to life, doing so requires knowledge of a lot of concepts.
Swagata loves to talk about the possibilities of AI and is keen on making an impact in the world. She believes in collaboration and in sharing her knowledge and journey with like-minded professionals.
Outside of work, she is an avid reader and dancer, and can be found sipping a hot cup of masala chai at all other times.
Recorded on: November 2, 2022 @ 2:00 PM ET
Zero-Shot Learning (ZSL) is a new paradigm that has gained massive popularity recently due to its potential to reduce data annotation and its high generalizability. In the pursuit of bringing the latest product-ready ML research to our community, we have implemented ZSL for two major tasks in Spark NLP for Healthcare: Named Entity Recognition (NER) and Relation Extraction (RE).
In this session, we will explore the ZSL models available as part of the Spark NLP for Healthcare library, how to use them with automatic prompt generation via Q&A models, and finally, how they perform on real data and help reduce data annotation requirements.
Presented by
Hasham Ul Haq - Machine Learning Engineer at John Snow Labs
Hasham Ul Haq is a Data Scientist at John Snow Labs, and an AI scholar and researcher at PI School of AI. During his career, he has worked on numerous projects across various sectors, including healthcare. At John Snow Labs, his primary focus is to build scalable and pragmatic systems for NLP that are both production-ready and deliver SOTA performance. In particular, he has been working on span detection, Natural Language Inference, disambiguation, Named Entity Recognition, and a lot more! Hasham also has an active research profile with publications in NeurIPS and AAAI, and multiple scholarship grants and affiliations.
Prior to John Snow Labs, he was leading search engine and knowledge base development at one of Europe’s largest telecom providers. He has also mentored computer vision startups by providing training and designing ML architectures.
Recorded on: September 27 @ 2PM EST
As companies are continuing to explore new data-driven initiatives, integrating more business applications into their data lake or lakehouse, and implementing new data-focused tooling to power business decisions, the imperative to create reliable data infrastructure becomes paramount. In this demo, we’ll explore Monte Carlo’s approach to identifying where data incidents occur, whether across your entire data stack or down to the field level, so that you can mitigate data downtime to increase trust in your team’s work and provide quality data to your business.
Presented by
Ethan Post - Sales Engineer at Monte Carlo
Ethan Post is a Sales Engineer at Monte Carlo, a data reliability company, where he helps data teams achieve more trustworthy data with end-to-end data observability. Ethan began his career as a BI implementation consultant, building end-to-end data solutions for some of the most recognizable brands in the world. Most recently, he led the PreSales Center of Enablement for an analytics startup, where he witnessed firsthand the true cost of poor data quality and observability. He received his MS in Information Systems from Indiana University. In his spare time, he loves cooking, woodworking, and spending time with his wife and two young children.
Brandon Chen - Product Marketing at Monte Carlo
Brandon is currently part of the Product Marketing team at Monte Carlo. Prior to Monte Carlo, Brandon had experience working in both the Sales Engineering and Product Marketing teams at Fivetran. He currently resides in Oakland, CA, and is an avid fisherman in his off-time.
Data Masking, Redaction & Obfuscation: How to Anonymize Data at Scale
Recorded on: September 13 @ 2PM EST
Join this webinar to learn about different ways to anonymize sensitive data when working with data at scale.
Ben Herzberg, Chief Scientist at Satori, will present different types of data anonymization, such as data masking, redaction, obfuscation and more.
In addition, you will hear some of the challenges in applying anonymization when data is accessed by many users in modern environments, and how to overcome them.
Presented by
Ben Herzberg - Chief Scientist at Satori
Ben Herzberg is an experienced tech leader and book author with a background in endpoint security, analytics, and application & data security.
Ben has held roles such as CTO of Cynet and Director of Threat Research at Imperva.
Currently, Ben is the Chief Scientist for Satori, the DataSecOps platform.
Recorded on: June 29 @ 2PM EST
Broken data is costly, time-consuming, and nowadays, an all-too-common reality for even the most advanced data teams. In this talk, we will introduce this problem, called “data downtime” — periods of time when data is partial, erroneous, missing or otherwise inaccurate — and discuss how to eliminate it in your data ecosystem with end-to-end data observability. Drawing corollaries to application observability in software engineering, data observability is a critical component of the modern DataOps workflow and the key to ensuring data trust at scale. We will share why data observability matters when it comes to building a better data quality strategy and highlight tactics you can use to address it today.
Presented by
Jon So - Head of Product Marketing at Monte Carlo
Jon So leads product and partner marketing at Monte Carlo. Prior to Monte Carlo, Jon led product marketing, pricing strategy, and growth teams at Twilio Segment, Nauto, and Oracle | Opower, and advised Fortune 500 companies while working in Deloitte Consulting’s Strategy & Operations practice. He resides in San Francisco, CA, and graduated from the University of North Carolina at Chapel Hill.
Katie Noonan - Support Engineer at Monte Carlo
Katie Noonan leads the technical support department at Monte Carlo. Prior to Monte Carlo, Katie specialized in technical enablement for enterprise accounts, and product process and operations at Twilio Segment. She currently resides in San Francisco, CA, and graduated from the University of Oregon.
Recorded on: June 22 @ 2PM ET
Relation Extraction is one of the most important NLP tasks in healthcare applications, but training these models has traditionally required an expensive process: finding competent people who can label the data, and then labeling it. Using the Zero-Shot Learning method, recently applied in the field of NLP, it has now become possible to train Relation Extraction models without the need for data labeling. In this presentation, we will explain how to use the Zero-Shot Learning method for Relation Extraction in unstructured text.
Presented by
Muhammet Santas - Data Scientist at John Snow Labs
Muhammet Santas holds a Master’s degree in Artificial Intelligence and works as a Data Scientist at John Snow Labs as part of the Healthcare NLP team.
As teams look to build and deploy models into production, they need tools that can adequately scale with them. In particular, the tools they need must allow them to quickly monitor, segment, retrain, and experiment on the data.
Join us as the team at Pachyderm and Superwise discuss:
- What MLOps is and why data is critical to it
- How to architect a scalable and automated platform
- Why your team should adopt a Production-First data approach
Presented by
Oryan Omer - Lead Software Engineer at Superwise
Oryan Omer leads a software engineering team at Superwise and has 7 years of experience developing ML products. Oryan has also led an elite unit of engineers in the Israel Defense Forces. In recent years, his focus has been on MLOps solutions that simplify the ML lifecycle. In his spare time, he surfs on every board there is (wave surfing, snowboarding, carving, etc.).
Harpreet Sahotar - Developer Relations at Pachyderm
Harpreet is part of the Developer Relations team at Pachyderm, where he works across the marketing, content, evangelism, and product teams. He’s also the host of The Artists of Data Science podcast, a father, an avid reader, a perpetual learner, and a craft beer snob.
Sophisticated financial criminals are constantly changing their tactics.
AI-based detection systems, which depend on large training sets of historical data, may be good at catching low-value fraud but may struggle to pick up on the newest, highest-risk approaches.
To stay ahead, you need a system that gives your seasoned fraud professionals a quick and easy way to pull together a 360-degree view of the people, accounts, transactions, and communications of suspicious actors, so you can react before they take too much.
This talk will explore how graph databases can help improve your AI-based detection systems and give you tools for intelligent investigation, transforming your ability to respond to fraud and sending a strong message to organized criminals to leave your bank alone.
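As a toy illustration of the "360-degree view" idea (not TigerGraph syntax or code from the talk), the sketch below represents accounts, phones, and transfers as a graph and collects every entity within two hops of a suspicious account, which is the kind of neighborhood query graph databases make fast at scale. The node names and edges are invented for the example.

```python
# Toy fraud-investigation graph: nodes are accounts and phone numbers,
# edges are shared attributes or transfers. A breadth-first search from
# a suspicious account gathers its k-hop neighborhood -- a crude
# "360-degree view" of connected actors.
from collections import deque

edges = {
    "acct_A": ["acct_B", "phone_1"],   # acct_A transferred to B, uses phone_1
    "acct_B": ["acct_C"],
    "phone_1": ["acct_D"],             # phone_1 is shared with acct_D
    "acct_C": [],
    "acct_D": [],
}

def neighborhood(start, max_hops=2):
    """Return all nodes reachable from `start` within `max_hops` edges."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nbr in edges.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, hops + 1))
    return seen

print(sorted(neighborhood("acct_A")))
# → ['acct_A', 'acct_B', 'acct_C', 'acct_D', 'phone_1']
```

A graph database runs this same traversal declaratively over billions of edges, which is what makes interactive fraud investigation practical.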
Presented by
Harry Powell
Harry Powell is Head of Industry Solutions at TigerGraph. He joined TigerGraph from Jaguar Land Rover, where he spent four years running the Data and Analytics business and his team contributed $800 million in profit over four years. Prior to that, he was the Director of Advanced Analytics at Barclays, where his team released a number of world-class data science innovations to production, including the first Apache Spark application in the European financial services industry. Harry has been working with graph data on and off for 10 years. His team at Barclays built a number of graph applications, and at JLR, he was an early adopter of TigerGraph, using a graph database to solve supply chain, manufacturing, and purchasing challenges at the height of the Covid shutdown and the semiconductor shortage.
Supreet Kaur
Supreet Kaur is an Assistant Vice President at Morgan Stanley, where she is the product owner of various AI-driven products. Prior to this, she worked at ZS, where she developed various technology-driven solutions to automate workflows and drive business decisions for top pharmaceutical clients.
She is the founder of DataBuzz, a volunteer-driven organization that engages tech and AI enthusiasts through mentorship and educates them about the latest tech and AI trends through a fireside chat series with leaders from the tech industry, blogs, and more. She is also the US Analytics Lead for Women in AI and a mentor at Rutgers University, Columbia University, and Weavr. She is an ardent writer on Medium and has written various blogs to inspire others who want to pivot into the technology industry.
A lot of ink has been spilled on the modern data stack. Everyone talks about its architecture, its tools, and how to organize teams to support it. As a data community, it’s easy to be excited about this impressive new technology.
Succeeding with the modern data stack, however, goes far beyond a collection of architectural diagrams; it’s an experience. It’s rushing to answer your CEO’s urgent question just before the meeting; it’s spending more time trying to verify that a dashboard is accurate than talking about what the dashboard actually says; it’s piecing together this month’s board deck from disconnected BI reports, SQL queries, and Excel files.
An experiential roadmap provides a solution to this problem. It helps companies move from building the modern data stack to creating the modern data experience, which is key to using data effortlessly.
In this webinar, Benn Stancil, Co-Founder and Chief Analytics Officer at Mode, will explain why the modern data stack isn’t sufficient and how companies can overcome the challenges of creating sustainable user experiences. He’ll lay out a vision for the modern data experience, along with some actionable, implementable recommendations.
In this webinar you’ll learn:
- How to establish a data culture that facilitates better automation, better understanding between cross-functional teams, and more effective workflows.
- How to merge Business Intelligence (BI) and data science.
- Why it’s important to make status and trust explicit and be transparent about where the data is coming from.
- Why sharing the same business logic is crucial when creating a more seamless experience for people who are consuming the data.
Presented by
Benn Stancil
Benn Stancil is an accomplished data analyst with deep expertise in collaborative Business Intelligence and Interactive Data Science. Stancil is Co-founder, President, and Chief Analytics Officer of Mode, an award-winning SaaS company that combines the best elements of Business Intelligence (BI), Data Science (DS), and Machine Learning (ML) to empower data teams to answer impactful questions and collaborate on analysis across a range of business functions. Under Stancil’s leadership, the Mode platform has evolved to enable data teams to explore, visualize, analyze, and share data in a powerful end-to-end workflow. Prior to founding Mode, Stancil served in senior Analytics positions at Microsoft and Yammer, and worked as a researcher for the International Economics Program at the Carnegie Endowment for International Peace. Stancil also served as an Undergraduate Research Fellow at Wake Forest University, where he received his B.S. in Mathematics and Economics. He believes in fostering a shared sense of humility and gratitude.
Recorded on : April 7 @ 11am-1pm ET
Join us on Thursday, April 7, 2022, 11am-1pm ET for the second edition of the new event series by AI Infrastructure Alliance, focusing on model serving.
Learn how to serve models in simple and complex environments, see the difference between batch and online, sunset old models, use shadow and canary deployments, monitor the models for speed and latency and much, much more.
This month’s edition will feature four lightning talks and one one-hour presentation including DEMOS and CODE, which means fewer slides and more hands-on content.
Clayton Davis
Saumil Dave
Brennan Smith
Juha Kiili
Ed Shee
Python is used daily by millions of engineers, scientists, analysts, and other people trained in fields outside of computer science and software engineering. If you are one of those people, you probably appreciate that Python helps you get your job done. Still, you may wonder what the “real” software engineers might know that would help you do an even better job.
In this talk, we discuss the career journey of Matt Kramer, Sr. Software Engineer at Anaconda, from science and engineering into Python software development; and how the tools and practices of software engineering would have helped his original workflows as a domain expert. He’ll show you how version control, packaging, and automated testing aren’t just for software engineers; they can help everyone tame their workflows. He’ll also share some Python tools from HoloViz.org that let you easily share your results with other stakeholders, without spending all your time writing plotting code and without having to become (or hire!) a front-end engineer.
Presented by
Matt Kramer
Matthew Kramer is a Senior Software Engineer at Anaconda. Previously he spent eight years working in the Energy sector for Chevron as an engineer and leading a team in Chevron’s Surface Data and Insights Platform. Prior to joining the industry, he received his BSE, MSE, and Ph.D. in Naval Architecture & Marine Engineering from the University of Michigan. During the Ph.D. program, he studied the fluid-structure interaction of highly-flexible membrane structures interfacing with the ocean surface (hovercraft skirts). This work involved several disciplines related to computing, including Computational Fluid Dynamics and Finite Element Methods. During this time (c. 2011), he began using Python for data processing and automation of engineering analysis workflows.
Your data team migrated to Snowflake. Congratulations! What happens next?
Snowflake has revolutionized what’s possible for data analytics and engineering, but it takes much more than lightning-fast queries and flexible compute to drive results. At the end of the day, if you can’t operationalize and trust your data, your best-laid plans for innovation and access are all for naught.
Join Monte Carlo and Snowflake to learn how best-in-class data teams are making the most of their Snowflake adoption with data quality, governance, and reliability in mind. We’ll demonstrate the critical features and functionalities in the Snowflake Data Cloud and Monte Carlo’s end-to-end data observability platform that will help you maximize impact with Snowflake by giving your company confidence in the reliability of your data and the products it powers.
Topics include:
- Snowflake Data Quality metrics
- Automated end-to-end lineage from Snowflake to the BI layer
- Data discovery
- Data auditing
- PII classification
- How to set SLAs, SLIs, and SLOs for Snowflake data pipelines
- Operational analytics for your data platform
- And much more
Presented by
Keith Smith
Keith Smith works as a Senior Partner Sales Engineer at Snowflake, helping Technology Partners integrate their solutions so customers can easily get data into Snowflake and insights out of Snowflake by creating a single copy of data for their cloud data analytics strategy. Prior to this role, he worked as a Senior Solution Architect within Snowflake Professional Services. After leading several Quick Start client engagements, he worked with several large clients focusing on SQL optimization for various financial reporting workloads moving to Snowflake, where he gained a new appreciation for query profile diagrams, “most expensive nodes”, micro-partitions, and pruning statistics. Keith holds degrees from Brigham Young University (BS in Business Management, MIS emphasis) and Purdue University (MBA).
Presented by
Glen Willis
Glen Willis is an experienced Solutions Architect with a demonstrated history of success in SaaS, specifically the analytics space. He is skilled in Python, JQL, and analytics, and at bridging the gap between customer feedback and product. He holds a Master’s degree in Product Development Engineering from the University of Southern California.
As companies invest more heavily in AI/ML projects, teams will need to move rapidly from experimentation into production. But even with new advances in technology and processes, many teams struggle to get their projects into the later stages of the Machine Learning Lifecycle.
Join us as representatives from the AI Infrastructure Alliance (AIIA), Pachyderm, Superb AI, and WhyLabs discuss the big trends in MLOps, useful strategies for accelerating model development, and best practices all teams can incorporate.
Presented by
Lee Baker
Lee is the General Secretary at the AI Infrastructure Alliance. Based out of the UK, he is responsible for the proceedings and growth of the AI Infrastructure Alliance (AIIA) as well as working at Pachyderm to deploy programs and initiatives that enable customer-facing teams to execute the core aspects of their jobs more effectively. When not shuttling his 3 children around, he can most often be found cycling, running and swimming around England’s South Coast.
Jimmy Whitaker
Jimmy Whitaker is the Data Science Evangelist at Pachyderm. He focuses on creating a great data science experience and sharing best practices for how to use Pachyderm. When he isn’t at work, he’s either playing music or trying to learn something new, because “You suddenly understand something you’ve understood all your life, but in a new way.”
Danny Leybzon
Danny D. Leybzon has worn many hats, all of them related to data. He studied computational statistics at UCLA, before becoming first an analyst and then a product manager at a big data platform named Qubole. He went on to be the primary field engineer for data science and machine learning at Imply, before taking on his current role as MLOps Architect at WhyLabs. He has worked to evangelize machine learning best practices, talking on subjects such as distributed deep learning, productionizing machine learning models, automated machine learning, and lately has been talking about AI observability and data logging. When Danny’s not researching, practicing, or talking about data science, he’s usually doing one of his numerous outside hobbies: rock climbing, backcountry backpacking, skiing, etc.
James Le
James Le currently runs Data Relations at Superb AI, a Series A ML data management startup. As part of his role, James executes content and partnership initiatives – while working cross-functionally with growth, product, customer success, sales, marketing, and community functions to drive Go-To-Market strategy.
Before joining Superb AI, he completed his Computer Science Master’s degree at RIT, where his research thesis lies at the intersection of deep learning and recommendation systems. Outside of work, he is highly active in the broader data and ML community – writing data-centric blog posts, hosting a data-focused podcast, and organizing in-person community events.
Recognizing entities is a fundamental step towards understanding a piece of text – but entities alone only tell half the story. The other half comes from explaining the relationships between entities. Spark NLP for Healthcare includes state-of-the-art (SOTA) deep learning models that address this issue by semantically relating entities in unstructured data.
John Snow Labs has developed multiple models utilizing BERT architectures with custom feature generation to achieve peer-reviewed SOTA accuracy on multiple benchmark datasets. This session will shed light on the background and motivation behind relation extraction, techniques, real-world use cases, and practical code implementation.
Presented by
Hasham Ul Haq
Hasham Ul Haq is a Data Scientist at John Snow Labs, and an AI scholar and researcher at PI School of AI. During his career, he has worked on numerous projects across various sectors, including healthcare. At John Snow Labs, his primary focus is to build scalable and pragmatic systems for NLP that are both production-ready and deliver SOTA performance. In particular, he has been working on Natural Language Inference, disambiguation, Named Entity Recognition, and a lot more! Hasham also has an active research profile with publications in NeurIPS and AAAI, and multiple scholarship grants and affiliations.
Prior to John Snow Labs, he was leading search engine and knowledge base development at one of Europe’s largest telecom providers. He has also mentored computer vision startups by providing training and designing ML architectures.
Reliable and trustworthy data is critical for any data-driven company, but it can be a big challenge when there are multiple teams each producing, transforming, and consuming data from one another simultaneously. Data teams are often caught in the middle, struggling to keep everyone happy. SLAs can remedy this challenge, serving as a powerful tool for aligning on the definition of quality and who’s responsible for addressing issues when they arise.
In this webinar, Kyle Kirwan, CEO of Bigeye and Uber’s first metadata product manager, will discuss the power of data SLAs, and how data teams can use them to align with their stakeholders and increase reliability.
In this presentation, find out:
- The challenges posed when data quality hasn’t been quantified
- How to build SLAs for data including creating SLIs and SLOs
- Using SLAs as a prioritization tool
- Measuring success and using SLAs to communicate reliability
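To make the SLI/SLO distinction concrete, here is an illustrative sketch (not from the webinar): an SLI is a measured indicator, such as the fraction of pipeline runs delivering data on time, while an SLO is the target agreed with stakeholders that the indicator is checked against. The threshold and target values below are invented for the example.

```python
# Illustrative data-freshness SLI: the fraction of recent pipeline runs
# whose delivery delay met an agreed threshold. The SLA is the overall
# agreement; the SLO is the numeric target the SLI must satisfy.
def freshness_sli(delivery_delays_minutes, threshold_minutes=60):
    """Fraction of runs whose delivery delay was within the threshold."""
    on_time = sum(1 for d in delivery_delays_minutes if d <= threshold_minutes)
    return on_time / len(delivery_delays_minutes)

delays = [12, 45, 70, 30, 55, 90, 20, 40, 35, 50]  # last 10 runs, minutes late
sli = freshness_sli(delays)   # 0.8 -> 8 of 10 runs delivered on time
slo_met = sli >= 0.9          # hypothetical SLO: 90% of runs on time
print(sli, slo_met)
```

Reporting this number over time is what lets a data team communicate reliability to stakeholders in the same language used in the agreement.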
Presented by
Kyle Kirwan
Bio Coming Soon.
DataSecOps is an evolution in the way organizations treat security as part of their data operations. It is an understanding that security should be a continuous part of the data operations processes and not an afterthought.
As companies grow and data users expand, the need to secure sensitive personal information, including key areas such as health and financial data, becomes increasingly critical. To ensure this data is as safe as possible, security needs to be embedded into core data operations. Without this component, companies use ad-hoc processes and tools, leaving sensitive user data open to breaches from insiders, whether intentional or not.
The session will cover best practices in securing sensitive personal, health, and financial data with DataSecOps, drawn from hyper-growth startups including Wealthsimple (Canada’s largest fintech), ActiveCampaign (a B2B SaaS unicorn), and Innovaccer (a healthcare technology disruptor).
Join Satori and discover how they created the first DataSecOps platform, which streamlines data access by automating access controls, security and compliance for the modern data infrastructure; and learn:
- The key elements and benefits of DataSecOps
- Why you should implement DataSecOps in your organization
- Key industries that can benefit from a DataSecOps program
- How a robust DataSecOps platform can ensure your data is secure and compliant
- DataSecOps predictions for 2022 and beyond
Presented by
Ben Herzberg, Chief Scientist at Satori and author of "Snowflake Security: Securing Your Snowflake Data Cloud" (Apress, 2021)
Finding patterns and matching strategies are well-known NLP procedures to extract information from text.
Spark NLP library has two annotators that can use these techniques to extract relevant information or recognize entities of interest in large-scale environments when dealing with lots of documents from medical records, web pages, or data gathered from social media.
In this talk, we will see how to retrieve the information we are looking for by using the following annotators:
- Entity Ruler, an annotator available in open-source Spark NLP.
- Contextual Parser, an annotator available only in Spark NLP for Healthcare.
In addition, we will enumerate use cases where we can apply these annotators.
After this webinar, you will know when to use a rule approach to extract information from your data and the best way to set the available parameters in these annotators.
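To illustrate the rule-based idea in miniature (this is a toy sketch, not the Spark NLP annotator API), the code below pairs a dictionary of patterns with regex matching, in the spirit of Entity Ruler (matching known patterns) and Contextual Parser (patterns interpreted with their surrounding text). The entity labels and patterns are invented for the example.

```python
# Toy rule-based entity extraction: a dictionary of labeled regex
# patterns applied to clinical-style text. Real Spark NLP annotators
# scale this idea across distributed document collections.
import re

PATTERNS = {
    "DRUG": r"\b(aspirin|metformin|ibuprofen)\b",   # hypothetical drug list
    "DOSAGE": r"\b\d+\s?mg\b",                      # e.g. "500 mg"
}

def extract_entities(text):
    """Return (label, matched_text) pairs for every rule that fires."""
    results = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            results.append((label, m.group()))
    return results

print(extract_entities("Patient was given 500 mg of metformin daily."))
# → [('DRUG', 'metformin'), ('DOSAGE', '500 mg')]
```

Rule approaches like this shine when the target entities follow predictable patterns (codes, dosages, identifiers) and no labeled training data is available.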
Presented by
Danilo Burbano
Danilo Burbano is a Software and Machine Learning Engineer at John Snow Labs. He holds an MSc in Computer Science and has 12 years of commercial experience.
He has previously developed several software solutions over distributed system environments like microservices and big data pipelines across different industries and countries. Danilo has contributed to Spark NLP for the last 4 years. He is now working to maintain and evolve the Spark NLP library by continuously adding state-of-the-art NLP tools to allow the community to implement and deploy cutting-edge large-scale projects for AI and NLP.
Data scientists want to learn, grow, and get things done. To better serve them, it’s time to stop highlighting how awesome your products and services are and start providing experiences that actually help your audience.
There are effective ways to attract and engage data scientists: community building and experiential marketing. Join this webinar to learn how to build a global community and memorable experiences that resonate with your audience, help your industry grow faster, and increase your bottom-line revenue – based on a real-world case study.
David Talby (CTO at John Snow Labs), Anna Anisin (CEO and Founder at Formulatedby) and Esther Rietmann (Director of Content and Programming at Formulatedby) will talk about the journey of building the largest online NLP community and event series, leading to 5x customer growth in one year.
Takeaways:
- The 3 pillars of successful community building in practice
- Building an audience with open-source software and content marketing
- Creating memorable experiences and long-term relationships
- What to expect when executing a community building strategy (with real numbers!)
Join us on Wednesday, February 2, 2022 from 2:00-2:45 pm ET for this interactive webinar and learn how successful community building initiatives can be a real game changer in your organization’s marketing strategy. Includes live Q&A with the speakers and access to on-demand recording.
Presented by
Anna Anisin
Anna Anisin is an entrepreneur with 15+ years of experience in data-driven digital marketing and was named a Tech Industry Insider by CNN. Anna raised $10 million in Series A funding and oversaw two acquisitions in her career. She specializes in enterprise B2B and experiential marketing. Currently she runs a boutique experiential marketing firm, Formulated.by. Anna is also an advocate for women in tech and a contributor on data science and marketing at Forbes.
Presented by
David Talby
David Talby is the Chief Technology Officer at John Snow Labs, helping companies apply artificial intelligence to solve real-world problems in healthcare and life science. David is the creator of Spark NLP – the world’s most widely used natural language processing library in the enterprise.
He has extensive experience building and running web-scale software platforms and teams – at startups, for Microsoft’s Bing in the US and Europe, and scaling Amazon’s financial systems in Seattle and the UK.
David holds a Ph.D. in Computer Science and Master’s degrees in both Computer Science and Business Administration. He was selected by CV as the Most Influential CTO of the Year 2019 in Artificial Intelligence.
Presented by
Esther Rietmann
Esther Rietmann has worked in B2B marketing for AI and data science companies for 5+ years, gaining experience in building communities, creating outstanding content, and engaging audiences through memorable brand experiences. As the Director of Content and Programming at Formulated.by and Data Science Salon, she is passionate about bringing data scientists together and, especially, empowering women in data. In addition to curating content, she manages the event speakers and community for the Data Science Salon, and plays a key role in collaborating with Formulated.by’s partners to build strong, ongoing relationships with their customers.
As 2021 comes to a close, the challenges of productionizing and scaling teams’ machine learning projects remain largely unsolved. With all the big advancements in AI/ML, what are some actionable learnings for your team?
In this webinar, join us as we talk about:
- Noteworthy highlights in the AI/ML space from 2021
- Upcoming trends in ML/AI for 2022
- Market movement (AIIA member valuations, acquisition and consolidation)
- The shift from ‘model monitoring’ to ‘data monitoring’, or the evolution of ‘DataOps’ (per Andrew Ng)
- Bedfellows for 2022: how hardware vendors will partner with ISVs (e.g., Nvidia/HPE) and the AIIA ecosystem
- Winners & losers: best-of-breed vs. production ML on so-called E2E solutions; the new data providers and the role of analytics
Presented by
Lee Baker, General Secretary at the AI Infrastructure Alliance
Lee is the General Secretary at the AI Infrastructure Alliance. Based out of the UK, he is responsible for the proceedings and growth of the AI Infrastructure Alliance (AIIA) as well as working at Pachyderm to deploy programs and initiatives that enable customer-facing teams to execute the core aspects of their jobs more effectively. When not shuttling his 3 children around, he can most often be found cycling, running and swimming around England’s South Coast.
Pharmaceutical companies that conduct clinical trials and aim to bring new treatments to market as quickly as possible handle enormous volumes of documents. Millions of documents can be created as part of one trial and are stored in a document management system. When these documents must be migrated to a new system – for example, when a pharma company acquires the rights to a drug or trial – they must often all be read manually in order to classify them and extract metadata that is legally required and must be accurate. Traditionally, this migration is a long, complex, and labor-intensive process.
We present a natural language processing (NLP)-based solution that provides:
- Speed – 80% reduction of manual labor and migration timeline, proven in major real-world projects
- State of the art accuracy – based on Spark NLP for Healthcare, integrated in a human-in-the-loop solution
- End-to-end, secure, and compliant solution – air-gapped deployment, GxP and GAMP 5 validated
We will share lessons learned from an end-to-end migration process of the trial master file in Novartis.
Presented by
Jiri Dobes - Head of Solutions at John Snow Labs
Jiri Dobes is the Head of Solutions at John Snow Labs. He has been leading the development of machine learning solutions in healthcare and other domains for the past five years. Jiri is a PMP-certified project manager. His previous experience includes delivering large projects in the power generation sector and consulting for the Boston Consulting Group and large pharma. Jiri holds a Ph.D. in mathematical modeling.
Extracting data from unstructured documents is a very common requirement in many verticals, from finance and insurance to pharma and healthcare. Recent research has shown that deep learning offers impressive results on this task when models are trained on significant amounts of data.
However, getting high-quality data involves a lot of manual effort. An annotation project is defined with annotation guidelines and examples; documents are imported; tasks are distributed among domain-expert annotators; a manager tracks team status and performance; inter-annotator agreement is reached; and the resulting annotations are exported in standard formats. At enterprise scale, the management complexity grows drastically due to the volume of projects, tasks, and users.
John Snow Labs’ Annotation Lab is a free annotation tool that has already been deployed and used by large-scale enterprises for more than two years. This webinar presents how you can use the tool’s capabilities to easily manage any complex annotation project. It also illustrates how models can be trained automatically without writing a line of code, and how any Spark NLP model can be used to pre-annotate documents – so projects are bootstrapped and your domain experts never start annotating from scratch but simply correct and improve the models.
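For readers curious about the inter-annotator agreement step, a common metric is Cohen's kappa. The sketch below is a minimal, library-free illustration; the entity labels are hypothetical examples, not Annotation Lab output:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    assert len(labels_a) == len(labels_b), "annotators must label the same items"
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance overlap given each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(labels_a) | set(labels_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Two annotators labeling four text spans (hypothetical clinical labels):
kappa = cohens_kappa(["DRUG", "DOSE", "DRUG", "O"],
                     ["DRUG", "DOSE", "O", "O"])
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no better than chance, a signal that the guidelines need refinement.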
Presented by
Nabin Khadka, Lead Developer - Annotation Lab at John Snow Labs
Nabin Khadka leads the team building the Annotation Lab at John Snow Labs. He has 7 years of experience as a software engineer, covering a broad range of technologies from web & mobile apps to distributed systems and large-scale machine learning.
With algorithms making more and more decisions in our lives, from who gets hired and fired to who goes to jail, it’s more critical than ever to get our intelligent systems talking to us so people can step in when things go wrong.
In the coming decade, organizations will face incredible pressure from regulators and the general public and that means every team needs a plan in place to find and fix mistakes fast or risk PR nightmares and financial disasters.
Join us to learn how to build an AI Red Team to deal with everything from edge cases to outright AI breakdowns, while getting you ready to embrace the latest breakthroughs in explainable AI tomorrow.
Presented by
Dan Jeffries, Chief Technology Evangelist at Pachyderm
Dan Jeffries is Chief Technology Evangelist at Pachyderm. He’s also an author, engineer, futurist, pro blogger and he’s given talks all over the world on AI and cryptographic platforms. He’s spent more than two decades in IT as a consultant and at open source pioneer Red Hat.
A knowledge graph represents a collection of connected entities and their relations. A knowledge graph fueled by machine learning uses natural language processing to construct a comprehensive, semantic view of those entities. A complete knowledge graph allows question answering and search systems to retrieve answers to given queries. In this study, we built a knowledge graph using Spark NLP models and Neo4j. The marriage of Spark NLP and Neo4j is very promising for building clinical knowledge graphs that support deeper analysis, Q&A tasks, and insight extraction.
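To make the entity-and-relation structure concrete, here is a toy in-memory stand-in for a triple store; the webinar itself uses Neo4j, and the clinical triples below are hypothetical illustrations, not extracted results:

```python
class KnowledgeGraph:
    """Toy triple store: entities connected by named relations."""

    def __init__(self):
        self.triples = set()  # each triple is (subject, relation, object)

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))

    def query(self, subject=None, relation=None, obj=None):
        """Return all triples matching the given pattern (None = wildcard)."""
        return sorted(t for t in self.triples
                      if subject in (None, t[0])
                      and relation in (None, t[1])
                      and obj in (None, t[2]))

kg = KnowledgeGraph()
kg.add("aspirin", "TREATS", "headache")      # hypothetical extracted relations
kg.add("aspirin", "IS_A", "NSAID")
kg.add("ibuprofen", "TREATS", "headache")

# A query like "what treats headache?" becomes a pattern match:
treatments = kg.query(relation="TREATS", obj="headache")
```

In the real system, the subjects and objects come from NLP entity extraction and the pattern matching is done with Cypher queries against Neo4j, but the data shape is the same.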
Presented by
Ali Emre Varol, Data Scientist at John Snow Labs
Ali Emre Varol is a data scientist working on Spark NLP for Healthcare at John Snow Labs with a decade of industry experience. He has previously worked as a software engineer to develop ERP solutions and led teams and projects building machine learning solutions in a variety of industries. He is also pursuing his Ph.D. in Industrial Engineering at Middle East Technical University and holds an MS degree in Industrial Engineering.
Whether it’s for an executive dashboard or an ML model, reliable data is critical for the applications that make data-driven companies tick. But when it comes to creating reliable data pipelines, where do you start and what tools and processes do you need in place?
Egor Gryaznov is the co-founder and CTO of Bigeye and was one of the first data engineers at Uber. Egor will draw from his experience supporting thousands of internal users and mission-critical workloads at Uber to provide an actionable guide to data pipeline reliability.
In this presentation, you will learn:
- How to approach building data pipelines for a data application
- What tools you will encounter in development and what you need to know about each
- How to create SLAs to better align with stakeholders
- How to manage the data that your application creates
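As one concrete illustration of the SLA point above, a data-freshness check can be as simple as comparing the last load time against an agreed lag. This is a generic sketch, not a Bigeye feature, and the 2-hour threshold is a hypothetical choice:

```python
from datetime import datetime, timedelta, timezone

def meets_freshness_sla(last_loaded_at, max_lag, now=None):
    """True if the most recent load of a table is within the agreed lag."""
    now = now or datetime.now(timezone.utc)
    return (now - last_loaded_at) <= max_lag

# A table loaded 30 minutes ago, checked against a hypothetical 2-hour SLA:
now = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
ok = meets_freshness_sla(now - timedelta(minutes=30), timedelta(hours=2), now=now)
```

The value of writing the SLA down as code is that it turns a vague stakeholder expectation ("the dashboard should be up to date") into a check that can alert when it fails.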
Presented by
Egor Gryaznov, CTO of Bigeye
Egor Gryaznov is co-founder and CTO of Bigeye. Prior to co-founding Bigeye, Egor was one of the early engineers at Uber where he scaled the company’s first data warehouse, supporting thousands of internal users and mission-critical workloads. Egor wrote ELT frameworks used by Uber analysts to generate complete data pipelines for KPI reporting from configuration. He developed custom tooling to automate backfills, replication, and critical data warehousing tasks.
If you are interested in learning the basic concepts of NLP, this session is for you. In this short webinar we will cover what natural language processing is, why and how text is turned into data a computer can process, common NLP tasks, and enterprise applications.
Things you’ll learn:
- What is natural language
- Not every language is natural
- Natural language processing
- Converting text to numbers
- Basic tasks
- Combining tasks for applications
- Enterprise applications
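As a taste of the "converting text to numbers" item above, here is a minimal bag-of-words sketch; the two sentences are made-up examples:

```python
def bag_of_words(docs):
    """Turn each document into a vector of word counts over a shared vocabulary."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        counts = [0] * len(vocab)
        for word in doc.lower().split():
            counts[index[word]] += 1
        vectors.append(counts)
    return vocab, vectors

vocab, vectors = bag_of_words(["NLP is fun", "NLP is useful"])
# vocab   -> ['fun', 'is', 'nlp', 'useful']
# vectors -> [[1, 1, 1, 0], [0, 1, 1, 1]]
```

Once text is a vector of numbers, the basic tasks listed above (classification, sentiment, and so on) become standard machine learning problems.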
Presented by
Kinga Parrott, AI Strategist at IBM
Kinga Parrott is an advocate for “Human Centered AI” and an AI Strategist at IBM. She is also a mindfulness enthusiast and loves hiking.
In recent years, enterprise AI initiatives have made great strides in solving the challenges of training massive, distributed computational models. Data wrangling, experimentation management, GPU resource allocation…these challenges have spawned an ever-growing market of new tools and considerable investment. But despite all this, the ability to productize AI is still stunted in most enterprises.
Our survey of more than 200 data scientists and MLOps/IT experts found that a majority of AI/ML models still aren’t making it to production. AI/ML teams are now under pressure to optimize and manage AI inference workloads in production and deliver a return on investment.
In this webinar, we will walk through the distinct characteristics of each stage in the ML lifecycle and their computational requirements. We’ll discuss solutions to improve throughput and reduce latency, and finally, we’ll show how one organization built an efficient inference platform on top of Kubernetes to support their scaling AI initiatives.
“Rapid AI development is what this is all about for us. What Run:AI helps us do is to move from a company doing pure research, to a company with results in production.”
Siddharth Sharma, Sr. Research Engineer, Wayve
Presented by
Dr. Ronen Dar
As Run:AI CTO, Dr. Ronen Dar manages research and product roadmap for Run:AI, a start-up he co-founded in 2018. Previously, Ronen worked at Bell Labs, Anobit Technologies (acquired by Apple), and in the algorithms research group in Intel’s mobile wireless division. He is the co-author of many patents in the fields of communication, coding and signal processing. Ronen received his B.Sc., M.Sc. and Ph.D. degrees from Tel Aviv University.
New to Graph Databases or to Advanced Analytics? Don’t know how to use graph algorithms for Machine Learning? No worries!
Join this workshop, and we will teach you the concepts of TigerGraph Cloud, graph databases, advanced graph analytics, and machine learning algorithms – all in under 90 minutes!
This hands-on workshop will cover:
- Graph Fundamentals
- Graph Use-Cases
- Introduction to TigerGraph Cloud
- Integrating Python with TigerGraph Cloud
- Feature Generation for Supervised Machine Learning. Bonus algorithms will be shared!
Presented by
Jeffrey Tyzzer
Jeff has over 25 years’ experience in database and data management and has worked at companies such as Apple, Oracle, and Accenture in both solution consulting and technology delivery. At TigerGraph, Jeff helps customers understand TigerGraph’s graph-native distributed database and how they can successfully apply it to their most pressing use cases.
Jeff has a B.S. in Information Systems and an M.S. in Library and Information Science.
Vladimir Slesarev
Vlad has worked with some of TigerGraph’s largest financial services customers strategizing ML and graph analytics approaches. He is intimately familiar with the complexities and existing limitations of deep data exploration, having worked on complex government data initiatives with the State Department and customer data integrations and development for Deloitte LLP.
Vlad has a BS in Computer and Information Science from the University of Maryland College Park.
Join DSS and IBM to learn how the key to a successful data science practice is storytelling.
Do you want to step into the data science profession? Showcasing your work through a portfolio is as important as building the required skills.
In this session, you will learn:
What is GitHub?
What is a repository and how do I create one?
Where to develop your project?
Ideas for projects
README tips
Share your projects on social media (LinkedIn, Medium, Dev.to)
And more!
Join us for this valuable webinar and learn more about Florida Hacks, organized by the University of Florida in partnership with IBM. We’re calling on the country’s brightest minds to join a virtual innovation challenge aimed at addressing the Sunshine State’s most pressing conservation issues.
Innovators of all experience levels are invited to participate in the hackathon, where students will be asked to help combat significant, observable environmental effects. Ideation: Aug 25-Sep 10, Hack: Sep 13-Oct 22. Click here for details and registration.
Presented by
Saishruthi Swaminathan
Saishruthi Swaminathan is an advocate for Trustworthy AI and an Advisory Data Scientist at IBM. She has a Master’s in Electrical Engineering specializing in Data Science and a Bachelor’s degree in Electronics and Instrumentation. Her passion is to dive deep into the ocean of data, extract insights, and use AI for social good. Previously, she worked as a Software Developer. She is on a mission to spread the knowledge and experience she acquired in her learning process. She also leads an education initiative for rural children and speaks at meetups focusing on Trustworthy AI and Women’s Empowerment.
Extracting data formatted as a table (tabular data) is a common task — whether you’re analyzing financial statements, academic research papers, or clinical trial documentation. Table-based information varies heavily in appearance, fonts, borders, and layouts. This makes the data extraction task challenging even when the text is searchable – but more so when the table is only available as an image.
This webinar presents how Spark OCR automatically extracts tabular data from images. This end-to-end solution includes computer vision models for table detection and table structure recognition, as well as OCR models for extracting text & numbers from each cell. The implemented approach provides state-of-the-art accuracy for the ICDAR 2013 and TableBank benchmark datasets.
Presented by
Mykola Melnyk
Mykola Melnyk is a senior Scala, Python, and Spark software engineer with 15 years of industry experience. He has led teams and projects building machine learning and big data solutions in a variety of industries – and is currently the lead developer of the Spark OCR library at John Snow Labs.
It’s happened again. You built another AI model that will never see the light of day because it won’t make it past the AI “valley of death” – the crossover of model development to model deployment across your enterprise. The handoff between data science and engineering teams is fraught with friction, outstanding questions around governance and accountability, and who is responsible for different parts of the pipeline and process. Even worse? The patchwork approach when building an AI pipeline leaves many organizations open to risks because of a lack of a holistic approach to security and monitoring.
Join us to learn about approaches and solutions for configuring a ModelOps pipeline that’s right for your organization. You’ll discover why it’s never too early to plan for operationalization of models, regardless of whether your organization has 1, 10, 100, or 1,000 models in production.
The discussion will also reveal the merits of an open container specification that allows you to easily package and deploy models in production from anywhere. Finally, new approaches for monitoring model drift and explainability will be revealed, helping you manage expectations with business leaders – all through a centralized AI software platform called Modzy®.
Presented by
Clayton Davis - Head of Data Science at Modzy
Clayton Davis is Head of Data Science at Modzy, where he oversees model development, operational data science capability development, and AI research. Prior to his role at Modzy, Mr. Davis spent over 15 years leading data science work for commercial and government organizations. His experience spans the data science spectrum, from analytic macro creation to cloud-based deep learning research and petabyte-scale big data processing on Hadoop clusters. He has a passion for solving complex puzzles and holds a graduate degree in Physics.
Today, NLP (Natural Language Processing) algorithms power a wide range of intelligent applications, from smart devices and customer service chatbots to document processing, search, and targeting. It’s hard to develop a state-of-the-art NLP application, and it’s even harder to monitor and guarantee quality and consistency in production.
With models making key product and business decisions, it’s imperative that we have access to specialized production monitoring tools and techniques designed with the complexity and unique approaches of NLP algorithms in mind. For example, knowing whether your production model is making inaccurate predictions requires ground truth, which is very complex and time-consuming to obtain once you consider languages, geographies, context, emotions, and other NLP nuances. On top of that, ground truth for NLP is ambiguous and not always black and white.
In this talk, we will discuss why monitoring your NLP models is a fundamentally complex problem and review key considerations for a model monitoring system. Finally, we will dig into a specific NLP use case and demonstrate how to leverage the new Verta Model Monitoring capability to easily monitor any NLP model’s performance, identify model/data drift and errors, segment model inputs and outputs by cohorts, and perform root cause analysis.
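As background on the drift-detection idea, one widely used statistic is the Population Stability Index (PSI), which measures how far a live input distribution has shifted from its training-time baseline. The sketch below is a generic illustration, not Verta's implementation; the 10-bin layout is a common convention:

```python
import math

def psi(baseline, live, bins=10):
    """Population Stability Index between a baseline sample and live traffic."""
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        # Floor at a tiny value so log() never sees an empty bin.
        return [max(c / len(sample), 1e-6) for c in counts]

    b, l = bin_fractions(baseline), bin_fractions(live)
    return sum((li - bi) * math.log(li / bi) for bi, li in zip(b, l))

baseline = [x / 10 for x in range(100)]     # e.g. a feature's training distribution
shifted = [x / 10 + 5 for x in range(100)]  # the same feature after drift
```

A common rule of thumb (again, not a Verta setting) is that PSI above roughly 0.2 warrants investigation; identical distributions score 0.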
Presented by
Meeta Dash - VP Product at Verta
As VP Product at Verta, Meeta Dash is building MLOps tools to help data science teams track, deploy, operate, and monitor models and bring order to Enterprise AI/ML chaos. Prior to Verta, Meeta held several product leadership roles at Appen, Figure Eight, Cisco Systems, Tokbox/Telefonica, and Computer Associates, building ML data platforms, voice and conversational AI products, and analytics/operational monitoring tools. Meeta has an MBA degree from UC Davis and an engineering degree from the National Institute of Technology, India.
Spark NLP is the most widely used NLP library in the enterprise, thanks to implementing production-grade, trainable, and scalable versions of state-of-the-art deep learning and transfer learning NLP research. It is also open source with a permissive Apache 2.0 license, officially supports Python, Java, and Scala, and is backed by a highly active community alongside the John Snow Labs team.
Spark NLP library implements core NLP algorithms including lemmatization, part of speech tagging, dependency parsing, named entity recognition, spell checking, multi-class and multi-label text classification, sentiment analysis, emotion detection, unsupervised keyword extraction, and state-of-the-art Transformers such as BERT, ELECTRA, ELMO, ALBERT, XLNet, and Universal Sentence Encoder.
The latest release, Spark NLP 3.0, comes with over 1,100 pretrained models, pipelines, and Transformers in 190+ different languages. It also delivers massive speed-ups on both CPU and GPU devices while extending support for the latest computing platforms such as new Databricks runtimes and EMR versions.
The talk will focus on how to scale Apache Spark / PySpark applications in YARN clusters, use GPU in Databricks new Apache Spark 3.x runtimes, and manage large-scale datasets in resource-demanding NLP applications efficiently. We will share benchmarks, tips & tricks, and lessons learned when scaling Spark NLP.
Presented by
Maziyar Panahi - Spark NLP Lead at John Snow Labs
Maziyar Panahi is a Senior Data Scientist and the Spark NLP Lead at John Snow Labs, with over a decade of experience in public research. He is a senior big data engineer and cloud architect with extensive experience in computer networks and software engineering, and has been developing software and planning networks for the last 15 years. Earlier in his career, he worked as a network engineer after completing his Microsoft and Cisco training (MCSE, MCSA, and CCNA).
He has been designing and implementing large-scale databases and real-time Web services in public and private Clouds such as AWS, Azure, and OpenStack for the past decade. He is one of the early adopters and main maintainers of the Spark NLP library. He is currently employed by The French National Centre for Scientific Research (CNRS) as a Big Data engineer and System/Network Administrator working at the Institute of Complex Systems of Paris (ISCPIF).
TigerGraph Cloud, a cloud-agnostic database-as-a-service offering, enables users to leverage graph analytics on AWS, Azure, and GCP. TigerGraph Cloud not only fully manages clusters, but also provides a use case library via starter kits, which greatly reduces customers’ time to business value and decreases the learning curve to specific graph database use cases such as fraud detection, recommendation, customer 360, AI, and in-database machine learning. Come learn and ask questions from our TigerGraph Cloud Product Team and watch an end-to-end overview of TigerGraph Cloud with a short demonstration that showcases starter kits and other unique functionality.
Presented by
Benyue (Emma) Liu
Emma is on a mission to empower developers, data scientists, and enterprise data analysts with easier access to graph analytics and features. She is a senior product manager at TigerGraph, leading efforts in Kafka integration, cloud, enterprise features, and ecosystem connectors. Emma has a background in data management, cloud computing, and complex system design. Prior to TigerGraph, she worked at Oracle and MarkLogic. Emma holds a Bachelor of Science degree from Harvey Mudd College and a Master of Science degree from MIT.
Duc Le
Engineering Manager at TigerGraph
Master in Management Information Systems from Carnegie Mellon University
Lead Developer for TigerGraph Cloud, GraphStudio and Admin Portal
Areas of specialty: Full-stack Development, Cloud, Containers and Connectors.
Travis Oliphant, CEO of OpenTeams and founder of Anaconda, joins Coiled to discuss the adoption of Python in the enterprise, how Open Source is such a productive modern development for businesses, and roadmaps for how the Enterprise can leverage Open Source to deliver business value.
Python is the de facto language for machine learning and AI, and is used across many verticals, including healthcare, finance, retail, and space exploration! What are the key features of Python and its community that made this happen? What gaps remain to even more robust adoption and usage in the Enterprise?
Python was created in 1991 as a better scripting language. In the late 1990s, many scientists and engineers started to use it to write scientific code that “fit their brains”. That community swelled organically to tens of millions of users over the next 20 years despite the lack of consistent and targeted funding. What are the aspects of both the language and the community behind Python that contributed to this? What lessons can be learned about making the most of the many open-source projects, frameworks, and languages that are available today? Despite Python’s success, there are still gaps in features and systems that can be a barrier to Enterprise adoption – what are some of these challenges, and what solutions are being proposed?
After attending, you’ll know:
– How Python and PyData (NumPy, pandas, scikit-learn, and many more!) got started;
– What are key reasons AI and ML practitioners use Python;
– What challenges had to be overcome to get Python adopted in the Enterprise;
– What challenges remain for Python adoption in organizations;
– What companies and service providers are helping the enterprise get more value faster out of their data.
Join us Tuesday, May 25 at 6 pm US Eastern time by signing up here and dive into the wonderful world of unlocking business value with Open Source Software!
Presented by
Travis Oliphant
CEO of OpenTeams & founder of Anaconda
Recorded on: May 19th @ 11am ET
In recent years, enterprises have invested heavily in AI and Deep Learning initiatives. The three stages of most AI implementations – data ingestion and preparation, model building and training, and inference in production – require Data Science, IT, and Ops teams to work in new and sometimes confounding ways. Adding complexity, many organizations are choosing to build AI infrastructure from the ground up using containers and Kubernetes.
Join Data Science Salon as we host NVIDIA, RUN:AI and The AI Center for Value-Based Healthcare (at King’s College London) who will share best practices for building an optimized AI infrastructure stack using NVIDIA GPUs, Kubernetes and Run:AI.
In the webinar, we will walk through an example of a well-architected AI Infrastructure stack and discuss how Kubernetes and advanced GPU scheduling can be used to maximize efficiency and speed up data science initiatives. We’ll cover how one organization solved the orchestration challenges that enterprises face when they begin to scale.
Speakers
Prof. M. Jorge Cardoso - Associate Professor & Senior Lecturer, King’s College London & CTO, AI Centre for Value-Based Healthcare
The London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare is a consortium of academic, healthcare and industry partners, led by King’s College London. It uses medical images and electronic healthcare data held by the UK National Health Service to train sophisticated computer vision and natural-language processing deep learning algorithms.
Michael Balint, Sr. Product Manager, NVIDIA
NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others.
Omri Geller - CEO and co-founder of Run:AI
Run:AI provides a cloud-native computer resource management platform for the AI era. With Run:AI, data scientists get access to all of the pooled compute power they need to accelerate AI experimentation – whether on-premises or cloud. The company’s Kubernetes-based platform provides IT and MLOps with real-time visibility and control over scheduling and dynamic provisioning of GPUs – and gains of more than 2X in utilization of existing infrastructure.
In this talk, Jen Burningham, Jorge Castañón, and Rakshith Dasenahalli will discuss applications of machine learning and AI fairness techniques in credit risk models for banking institutions. They will cover the typical use cases and the approaches the Data Science Elite (DSE) team used to address these challenges and the necessity of mitigating model bias. The session will introduce the Credit Risk accelerator, developed from the DSE team’s experience.
Presented by
Jen Burningham
Jen Burningham is a manager, Senior Data Scientist, and Machine Learning Engineer on the IBM Data Science and AI Elite Team. She has a strong background in healthcare, tech, and startups. Jen is passionate about using AI to improve businesses and lives. She received her Bachelor’s degree in Applied Mathematics with a Concentration in Statistics from Yale University and earned her MBA from the Yale School of Management.
Rakshith Dasenahalli
Rakshith is a Data Scientist with the IBM Data Science Elite (DSE) Team and has 3 years of experience on the team. He has worked with various enterprises across different industries, helping them understand and use their data to improve their businesses. He has experience working on machine learning, data visualization, and AI fairness customer engagements. He holds an M.S. in Electrical Engineering from the University of Southern California.
Jorge Castañón
Jorge Castañón hails from Mexico City and received his Ph.D. in Computational and Applied Mathematics from Rice University. He has a genuine passion for data science and machine learning business applications. For 15+ years he has been developing data science and AI models as well as algorithms to solve numerical optimization and regularized inverse problems. At IBM, Jorge is the Lead Data Scientist of the Machine Learning Hub North America, a team that empowers organizations to create business value through data science and AI. In 2020, Jorge was certified as a Level 3 Thought Leader and Distinguished Data Scientist by The Open Group.
RECORDED ON: September 3rd @ 2pm EDT
Euphoria Convenience is big data and predictive analytics perfectly attuned: satisfying goods and services arrive even before consumers sense that they want them. Already we see slivers of it on Netflix, when the next curated video begins even before the previous one ends. We experience it on Amazon, when suggestions occasionally fulfil an urge we didn’t know we had.
The key to perfected entertainment and service is our personal information, and the underlying human dilemma is about how much of ourselves we should expose to get what we want. In this webinar, real world cases and philosophical thought experiments will help us investigate by exploring questions including: What is privacy? Why is Kim Kardashian the most private person in the world? Is it more satisfying to want and then get, or to get before wanting? Answers are elusive, but the goal is to identify and clarify the ethical dilemmas arising when personal information is gathered and leveraged in the media, entertainment, marketing and retail sectors. This webinar is tailored for professionals in data governance, but open to all those interested in the regulatory and human dilemmas inhabiting privacy and AI powered convenience.
Presented by
James Brusseau
James Brusseau (PhD, Philosophy) is author of books, articles, and digital media in the history of philosophy and ethics. He has taught in Europe, Mexico, and currently at Pace University near his home in New York City. As Director of AI Ethics Site, a research institute currently incubating at Pace University, he explores the human experience of artificial intelligence.
Do you want to build a career in AI? Join this session, where two IBM data scientists from the Data Science and AI Elite Team will share their journeys into AI, the challenges they faced, interview tips, and useful resources, along with their experience applying ML to real-world client problems.
Presented by
Safura Suleymanova - Data Scientist at the IBM Data Science & AI Elite Team
Safura Suleymanova is a Data Scientist on the IBM Data Science and AI Elite Team and develops machine learning-powered solutions for customers across industries including finance, telecom, and supply chain. She is passionate about how data can be used to streamline routine processes and ultimately improve business decisions. Leveraging data to improve operational processes at her previous workplaces is what led her to the field of Data Science. Safura holds a bachelor’s degree in Mathematics from the University of Waterloo, with a specialization in Risk Management and Statistics.
Aishwarya Srinivasan - Data Scientist, Data Science Elite Team, IBM Data & AI
Aishwarya joined the DSE Team in June 2018 as a Data Scientist. She is a researcher in machine learning and reinforcement learning who strives for innovation. During her engagements, she focuses on bringing machine learning research into practice to create business value.
With her interest in integrating machine learning into finance, she has presented her work to leading financial firms such as Morgan Stanley, Credit Suisse, and the World Bank. She is focused on expanding her horizons in the machine learning research community, and was awarded a patent in 2018 for developing a reinforcement learning model for machine trading. She holds a postgraduate degree in Data Science from Columbia University.
She is an ambassador for the Women in Data Science community, which originated at Stanford University. She has a 50k+ follower base on LinkedIn and actively participates in and organizes events and conferences to inspire budding data scientists. Besides being a data junkie, she is a fitness fanatic who is into martial arts (Krav Maga) and yoga. To share her knowledge of data science and inspire budding data scientists, she writes blogs on machine learning at her website: https://aishgrt.wixsite.com/thepeculiarunicorn
Python is a powerful programming language and a good choice for many types of analytics; it is rapidly becoming the language of choice for scientists and researchers of many kinds. By combining a massively parallel processing (MPP) database like Vertica with Python, you can overcome many of the scale and analytics challenges that limit Python users. Join this session to see a live demo of Vertica’s built-in Python integration used to measure and predict key numbers of the COVID-19 pandemic, using historical per-state case and fatality counts to predict the results for the following day.
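To make the prediction task concrete, here is a minimal, hypothetical sketch of the idea behind the demo: fitting a linear trend to a short history of daily counts and extrapolating one day ahead. The actual webinar uses Vertica’s built-in Python integration to supply the historical rows; the numbers below are fabricated for illustration.

```python
def fit_linear_trend(ys):
    """Fit y = a*x + b by ordinary least squares over x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def predict_next_day(history):
    """Extrapolate the fitted trend one step past the observed window."""
    a, b = fit_linear_trend(history)
    return a * len(history) + b

# Illustrative daily case counts for one state (fabricated numbers).
cases = [100, 120, 138, 161, 179]
print(round(predict_next_day(cases)))  # -> 199
```

In practice the history would come from a SQL query against the database rather than a hard-coded list, and a real forecast would use a richer model than a straight line, but the shape of the workflow is the same.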
Presented by
Badr Ouali, Data Scientist at Vertica
I hate to admit it, but your data is lying to you — and more often than you think. Having clean data with high volume, velocity, and variety doesn’t necessarily protect you from reaching faulty conclusions about the research questions you care about. Despite what you may have learned in Statistics 101, a significant p-value isn’t always groundbreaking. All data can be coerced and bribed to tell any story; thus, as data practitioners, it’s our duty to be cognizant of the possible pitfalls that abound and how to navigate around common traps — responsibly.
By showcasing enterprise use cases and sample projects in Dataiku’s Data Science Studio, Christopher Peter Makris will address the following questions:
– Is more data always better? How can the inclusion or exclusion of data obfuscate a previously held conclusion?
– Is an observed event truly a trend? How can previously noted behaviors be a marker for the complete opposite behavior in the future?
– Is an association worth my time/money/effort? When should a valid association nevertheless lead us not to act?
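The first question above — how including or excluding data can obfuscate a conclusion — has a classic illustration in Simpson’s paradox. The following hypothetical sketch (not from the webinar; all numbers are fabricated, patterned after well-known textbook examples) shows a treatment that wins within every subgroup yet loses in the aggregate, simply because group sizes are unbalanced:

```python
# (successes, trials) per severity subgroup for two treatments.
data = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, trials):
    return successes / trials

# Within each subgroup, treatment A has the higher success rate.
for group in ("mild", "severe"):
    a, b = data["A"][group], data["B"][group]
    print(group, "A:", rate(*a), "B:", rate(*b))

# Aggregated over both subgroups, the ranking reverses: A treated far
# more severe (hard) cases, dragging its overall rate below B's.
agg = {t: (sum(s for s, _ in g.values()), sum(n for _, n in g.values()))
       for t, g in data.items()}
print("overall A:", rate(*agg["A"]), "B:", rate(*agg["B"]))
```

The same arithmetic that makes A look better per subgroup makes it look worse overall, which is exactly why dropping (or adding) a grouping variable can silently flip a conclusion.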
Presented by
Christopher Peter Makris, Lead Data Scientist at Dataiku
Christopher Peter Makris (CPM) is a Lead Data Scientist at Dataiku. With a background in Logic, Discrete Mathematics, & Statistics, he brings experience from both industry and academia. CPM previously served as Director of Data Science at the NYC Data Science Academy and Executive Director of the Master of Statistical Practice program at Carnegie Mellon University; at both institutions, CPM took pride in professionally developing students, aiding in the redesign of technical curricula, and delivering over 3,000 hours of lectures to graduate audiences. In industry, CPM most recently helped thwart cybercrime as the Director of Security Data Science at Empower Retirement. When not crunching numbers on his computer, CPM is crunching his abs in the studio as he’s an avid dancer, choreographer, and CrossFitter (so, while he considers himself a data scientist, he can only count to 5-6-7-8…)
Deep learning frameworks allow for the easy construction of neural networks and have made Python the go-to language for artificial intelligence development. This webinar will show how to choose a deep learning framework that aligns developers and data scientists, demonstrate examples of how IBM is extending the value of deep learning frameworks via open-source projects, and explain how new advancements in NLP, like transformers, are changing how and why to use a particular framework.
Presented by
Nick Acosta, IBM
Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and utilizes it to understand machine learning developers of various communities, startups, and enterprises in order to help them succeed on IBM’s data science platform and in open source. He has a strong interest in open source software and all things Kardashian.
Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.
Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.
During the webinar, we’ll discuss:
- Popular open-source tools for DS/ML
- Use cases
- Helpful resources to get started with data science with Python
Presented by
Stan Seibert - Sr. Director, Community Innovation at Anaconda
Stan leads the Community Innovation team at Anaconda, where his work focuses on high performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, the University of Pennsylvania, and the Sudbury Neutrino Observatory.
Learn explainable workflows using open source software, semi-automated pipelines, and Python! Across data ingestion, cleaning, pipelines, and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn best practices for codifying and relaying explainable data science to stakeholders, management, and end users in a reproducible and portable fashion.
Presented by
Austin Eovito - Data Scientist, IBM
Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. A recent graduate of Florida State University, Austin focuses on balancing the bleeding-edge research produced by academia with the tools used in applied data science. His Master’s thesis was on white collar crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM’s Data Science Masterclass on AI Explainability and resides in San Francisco with his fiancé, dog, and two cats.
Interested in putting your brand in front of the most diverse data science community in the space?