Recorded on: Tuesday, September 14 @ 2:00 p.m. ET
In recent years, enterprise AI initiatives have made great strides in solving the challenges of training massive, distributed computational models. Data wrangling, experimentation management, GPU resource allocation…these challenges have spawned an ever-growing market of new tools and considerable investment. But despite all this, the ability to productize AI is still stunted in most enterprises.
Our survey of more than 200 data scientists and MLOps/IT experts found that a majority of AI/ML models still aren’t making it to production. AI/ML teams are now under pressure to optimize and manage AI inference workloads in production and deliver a return on investment.
In this webinar, we will walk through the distinct characteristics of each stage in the ML lifecycle and their computational requirements. We’ll discuss solutions to improve throughput and reduce latency, and finally, we’ll show how one organization built an efficient inference platform on top of Kubernetes to support their scaling AI initiatives.
“Rapid AI development is what this is all about for us. What Run:AI helps us do is to move from a company doing pure research, to a company with results in production.”
Siddharth Sharma, Sr. Research Engineer, Wayve
Dr. Ronen Dar
As Run:AI CTO, Dr. Ronen Dar manages the research and product roadmap for Run:AI, a start-up he co-founded in 2018. Previously, Ronen worked at Bell Labs, Anobit Technologies (acquired by Apple), and in the algorithms research group in Intel’s mobile wireless division. He is the co-author of many patents in the fields of communication, coding, and signal processing. Ronen received his B.Sc., M.Sc., and Ph.D. degrees from Tel Aviv University.
Recorded on: Tuesday, August 31 @ 3:00 p.m. ET
New to Graph Databases or to Advanced Analytics? Don’t know how to use graph algorithms for Machine Learning? No worries!
Join this workshop, and we will teach you the concepts of TigerGraph Cloud, graph databases, advanced graph analytics, and machine learning algorithms – all in under 90 minutes!
This hands-on workshop will cover:
- Graph Fundamentals
- Graph Use-Cases
- Introduction to TigerGraph Cloud
- Integrating Python with TigerGraph Cloud
- Feature Generation for Supervised Machine Learning. Bonus algorithms will be shared!
Jeff has over 25 years’ experience in database and data management and has worked at companies such as Apple, Oracle, and Accenture in both solution consulting and technology delivery. At TigerGraph, Jeff helps customers understand TigerGraph’s graph-native distributed database and how they can successfully apply it to their most pressing use cases.
Jeff has a B.S. in Information Systems and an M.S. in Library and Information Science.
Vlad has worked with some of TigerGraph’s largest financial services customers strategizing ML and graph analytics approaches. He is intimately familiar with the complexities and existing limitations of deep data exploration, having worked on complex government data initiatives with the State Department and customer data integrations and development for Deloitte LLP.
Vlad has a BS in Computer and Information Science from the University of Maryland College Park.
Recorded on: Tuesday, August 24 @ 3:00 p.m. ET
Join DSS and IBM to learn how the key to a successful data science practice is storytelling.
Do you want to step into the data science profession? Showcasing your work through a portfolio is as important as building the required skills. After all, the key to a successful data science practice is storytelling.
In this session, you will learn:
What is GitHub?
What is a repository and how do I create one?
Where to develop your project?
Ideas for projects
Share your projects on social media (LinkedIn, Medium, Dev.to)
Join us for this valuable webinar and learn more about Florida Hacks, organized by the University of Florida in partnership with IBM. We’re calling on the country’s brightest minds to join a virtual innovation challenge aimed at addressing the Sunshine State’s most pressing conservation issues.
Innovators of all experience levels are invited to participate in the hackathon, where students will be asked to help combat the significant observable effects occurring in the environment. Ideation: Aug 25-Sep 10, Hack: Sep 13-Oct 22.
Recorded on: August 11th @ 2:00 p.m ET
Extracting data formatted as a table (tabular data) is a common task — whether you’re analyzing financial statements, academic research papers, or clinical trial documentation. Table-based information varies heavily in appearance, fonts, borders, and layouts. This makes the data extraction task challenging even when the text is searchable – but more so when the table is only available as an image.
This webinar presents how Spark OCR automatically extracts tabular data from images. This end-to-end solution includes computer vision models for table detection and table structure recognition, as well as OCR models for extracting text & numbers from each cell. The implemented approach provides state-of-the-art accuracy for the ICDAR 2013 and TableBank benchmark datasets.
Mykola Melnyk is a senior Scala, Python, and Spark software engineer with 15 years of industry experience. He has led teams and projects building machine learning and big data solutions in a variety of industries – and is currently the lead developer of the Spark OCR library at John Snow Labs.
Recorded on: Thursday, June 24 @ 3:00 p.m ET
It’s happened again. You built another AI model that will never see the light of day because it won’t make it past the AI “valley of death” – the crossover of model development to model deployment across your enterprise. The handoff between data science and engineering teams is fraught with friction, outstanding questions around governance and accountability, and who is responsible for different parts of the pipeline and process. Even worse? The patchwork approach when building an AI pipeline leaves many organizations open to risks because of a lack of a holistic approach to security and monitoring.
Join us to learn about approaches and solutions for configuring a ModelOps pipeline that’s right for your organization. You’ll discover why it’s never too early to plan for operationalization of models, regardless of whether your organization has 1, 10, 100, or 1,000 models in production.
The discussion will also reveal the merits of an open container specification that lets you easily package and deploy models in production from anywhere. Finally, we will present new approaches to monitoring model drift and explainability that help manage expectations with business leaders, all through a centralized AI software platform called Modzy®.
Clayton Davis - Head of Data Science at Modzy
Clayton Davis is Head of Data Science at Modzy where he oversees model development, operational data science capability development, and AI research. Prior to his role at Modzy, Mr. Davis spent over 15 years leading data science work for commercial and government organizations. His experience has spanned the data science spectrum, from analytic macro creation to cloud based deep learning research and petabyte scale big data processing on Hadoop clusters. He has a passion for solving complex puzzles and holds a graduate degree in Physics.
Recorded on: Wednesday, June 23 @ 1:00 p.m. ET
Today, NLP (Natural Language Processing) algorithms power a wide range of intelligent applications, from smart devices, customer service chatbots, and document processing to search and targeting. It’s hard to develop a state-of-the-art NLP application, and it’s even harder to monitor and guarantee quality and consistency in production.
With models making key product and business decisions, it’s imperative that we have access to specialized production monitoring tools and techniques designed with the complexity and unique approaches of NLP algorithms in mind. For example, knowing whether your production model is making inaccurate predictions requires ground truth, which is complex and time-consuming to obtain once you consider languages, geographies, context, emotions, and other NLP nuances. On top of that, ground truth for NLP is ambiguous and not always black and white.
In this talk, we will discuss why monitoring your NLP models is a fundamentally complex problem and cover key considerations for a model monitoring system. Finally, we will dig into a specific NLP use case and demonstrate how to use the new Verta Model Monitoring capability to monitor any NLP model’s performance, identify model and data drift and errors, segment model inputs and outputs by cohort, and perform root cause analysis.
Meeta Dash - VP Product at Verta
As VP Product at Verta, Meeta Dash is building MLOps tools to help data science teams track, deploy, operate, and monitor models and bring order to Enterprise AI/ML chaos. Prior to Verta, Meeta held several product leadership roles at Appen, Figure Eight, Cisco Systems, Tokbox/Telefonica, and Computer Associates, building ML data platforms, Voice & Conversation AI products, and Analytics/Operational Monitoring Tools. Meeta has an MBA from UC Davis and an engineering degree from the National Institute of Technology, India.
Recorded on: Wednesday, June 16 @ 2:00 p.m. ET
Spark NLP is the most widely used NLP library in the enterprise, thanks to implementing production-grade, trainable, and scalable versions of state-of-the-art deep learning & transfer learning NLP research. It is also open source under the permissive Apache 2.0 license, officially supports Python, Java, and Scala, and is backed by a highly active community and JSL members.
The Spark NLP library implements core NLP algorithms including lemmatization, part of speech tagging, dependency parsing, named entity recognition, spell checking, multi-class and multi-label text classification, sentiment analysis, emotion detection, unsupervised keyword extraction, and state-of-the-art Transformers such as BERT, ELECTRA, ELMO, ALBERT, XLNet, and Universal Sentence Encoder.
The latest release, Spark NLP 3.0, comes with more than 1,100 pretrained models, pipelines, and Transformers in 190+ languages. It also delivers massive speedups on both CPU & GPU devices while extending support for the latest computing platforms such as new Databricks runtimes and EMR versions.
The talk will focus on how to scale Apache Spark / PySpark applications in YARN clusters, use GPUs in Databricks’ new Apache Spark 3.x runtimes, and efficiently manage large-scale datasets in resource-demanding NLP applications. We will share benchmarks, tips & tricks, and lessons learned when scaling Spark NLP.
Maziyar Panahi - Spark NLP Lead at John Snow Labs
Maziyar Panahi is a Senior Data Scientist and Spark NLP Lead at John Snow Labs with over a decade of experience in public research. He is a senior Big Data engineer and a Cloud architect with extensive experience in computer networks and software engineering. He has been developing software and planning networks for the last 15 years. In the past, he also worked as a network engineer in high-level places after he completed his Microsoft and Cisco training (MCSE, MCSA, and CCNA).
He has been designing and implementing large-scale databases and real-time Web services in public and private Clouds such as AWS, Azure, and OpenStack for the past decade. He is one of the early adopters and main maintainers of the Spark NLP library. He is currently employed by The French National Centre for Scientific Research (CNRS) as a Big Data engineer and System/Network Administrator working at the Institute of Complex Systems of Paris (ISCPIF).
Recorded on: Tuesday, June 15 @ 2pm ET
TigerGraph Cloud, a cloud-agnostic database-as-a-service offering, enables users to leverage graph analytics on AWS, Azure, and GCP. TigerGraph Cloud not only fully manages clusters, but also provides a use case library via starter kits, which greatly reduces customers’ time to business value and shortens the learning curve for specific graph database use cases such as fraud detection, recommendation, customer 360, AI, and in-database machine learning. Come learn from our TigerGraph Cloud Product Team, ask them questions, and watch an end-to-end overview of TigerGraph Cloud with a short demonstration that showcases starter kits and other unique functionality.
Benyue (Emma) Liu
Emma has a mission to empower developers, data scientists, and enterprise data analysts with easier access to graph analytics and features. She is a senior product manager at TigerGraph leading efforts in Kafka integration, cloud, enterprise features, and ecosystem connectors. Emma has a background in data management, cloud computing, and complex system designs. Prior to TigerGraph, she worked at Oracle and MarkLogic. Emma holds a Bachelor of Science degree from Harvey Mudd College and a Master of Science degree from MIT.
Engineering Manager at TigerGraph
Master in Management Information Systems from Carnegie Mellon University
Lead Developer for TigerGraph Cloud, GraphStudio and Admin Portal
Areas of specialty: Full-stack Development, Cloud, Containers and Connectors.
Recorded on: Tuesday, May 25 @ 6:00 p.m. ET
Travis Oliphant, CEO of OpenTeams and founder of Anaconda, joins Coiled to discuss the adoption of Python in the enterprise, why Open Source has been such a productive development for businesses, and roadmaps for how the Enterprise can leverage Open Source to deliver business value.
Python is the de facto language for machine learning and AI, and is used across many verticals, including healthcare, finance, retail, and space exploration! What are the key features of Python and its community that made this happen? What gaps still stand in the way of even more robust adoption and usage in the Enterprise?
Python was created in 1991 as a better scripting language. In the late 1990s, many scientists and engineers started to use it to write scientific code that “fit their brains”. That community swelled organically to tens of millions of users over the next 20 years despite the lack of consistent and targeted funding. What are the aspects of both the language and the community behind Python that contributed to this? What lessons can be learned about making the most of the many open-source projects, frameworks, and languages that are available today? Despite Python’s success, there are still gaps in features and systems that can be a barrier to Enterprise adoption – what are some of these challenges, and what solutions are being proposed?
After attending, you’ll know:
– How Python and PyData (NumPy, pandas, scikit-learn, and many more!) got started;
– What the key reasons are that AI and ML practitioners use Python;
– What challenges had to be overcome to get Python adopted in the Enterprise;
– What challenges remain for Python adoption in organizations;
– What companies and service providers are helping the enterprise get more value faster out of their data.
Join us Tuesday, May 25 at 6 p.m. US Eastern time and dive into the wonderful world of unlocking business value with Open Source Software!
CEO of OpenTeams & founder of Anaconda
Recorded on: May 19th @ 11am ET
In recent years, enterprises have invested heavily in AI and Deep Learning initiatives. The three stages of most AI implementations (data ingestion and preparation, model building and training, and inference in production) require Data Science, IT, and Ops teams to work in new and sometimes confounding ways. Adding complexity, many organizations are choosing to build AI infrastructure from the ground up using containers and Kubernetes.
Join Data Science Salon as we host NVIDIA, Run:AI, and the AI Centre for Value-Based Healthcare (at King’s College London), who will share best practices for building an optimized AI infrastructure stack using NVIDIA GPUs, Kubernetes, and Run:AI.
In the webinar, we will walk through an example of a well-architected AI Infrastructure stack and discuss how Kubernetes and advanced GPU scheduling can be used to maximize efficiency and speed up data science initiatives. We’ll cover how one organization solved the orchestration challenges that enterprises face when they begin to scale.
Prof. M. Jorge Cardoso - Associate Professor & Senior Lecturer, King’s College London & CTO, AI Centre for Value-Based Healthcare
The London Medical Imaging & Artificial Intelligence Centre for Value Based Healthcare is a consortium of academic, healthcare and industry partners, led by King’s College London. It uses medical images and electronic healthcare data held by the UK National Health Service to train sophisticated computer vision and natural-language processing deep learning algorithms.
Michael Balint, Sr. Product Manager, NVIDIA
NVIDIA’s (NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming market and has redefined modern computer graphics, high performance computing and artificial intelligence. The company’s pioneering work in accelerated computing and AI is reshaping trillion-dollar industries, such as transportation, healthcare and manufacturing, and fueling the growth of many others.
Omri Geller - CEO and co-founder of Run:AI
Run:AI provides a cloud-native computer resource management platform for the AI era. With Run:AI, data scientists get access to all of the pooled compute power they need to accelerate AI experimentation – whether on-premises or cloud. The company’s Kubernetes-based platform provides IT and MLOps with real-time visibility and control over scheduling and dynamic provisioning of GPUs – and gains of more than 2X in utilization of existing infrastructure.
Recorded on: October 28th @ 2PM ET
In this talk, Jen Burningham, Jorge Castañón, and Rakshith Dasenahalli will discuss the applications of Machine Learning and AI Fairness techniques in credit risk models for banking institutions. They will cover the typical use cases and the approaches the Data Science Elite (DSE) team used to address the challenges and necessity of mitigating model bias. This session will introduce the Credit Risk accelerator that was developed based on the experience of the DSE team.
Jen Burningham is a manager, Senior Data Scientist, and Machine Learning Engineer in the IBM Data Science and AI Elite Team. She has a strong background in healthcare, tech, and startups. Jen is passionate about using AI to improve businesses and lives. She received her Bachelor’s Degree in Applied Mathematics with a Concentration in Statistics from Yale University and earned her MBA from the Yale School of Management.
Rakshith is a Data Scientist with the IBM Data Science Elite (DSE) Team and has 3 years of experience on the team. He has worked with various enterprises across different industries, helping them understand and use their data to improve their businesses. He has experience working in Machine Learning, Data Visualization, and AI Fairness customer engagements. He holds an M.S. in Electrical Engineering from the University of Southern California.
Jorge Castañón hails from Mexico City and received his Ph.D. in Computational and Applied Mathematics from Rice University. He has a genuine passion for data science and machine learning business applications. For 15+ years he has been developing data science and AI models as well as algorithms to solve numerical optimization and regularized inverse problems. At IBM, Jorge is the Lead Data Scientist of the Machine Learning Hub North America, a team that empowers organizations to create business value through data science and AI. In 2020, Jorge was certified as a Level 3 Thought Leader and Distinguished Data Scientist by The Open Group.
RECORDED ON: September 3rd @ 2pm EDT
Euphoria Convenience is big data and predictive analytics perfectly attuned: it means satisfying goods and services arriving even before consumers sense that they want them. Already we see slivers of it on Netflix when the next, curated video begins even before the previous one ends. We experience it on Amazon when suggestions occasionally fulfil an urge we didn’t know we had.
The key to perfected entertainment and service is our personal information, and the underlying human dilemma is about how much of ourselves we should expose to get what we want. In this webinar, real world cases and philosophical thought experiments will help us investigate by exploring questions including: What is privacy? Why is Kim Kardashian the most private person in the world? Is it more satisfying to want and then get, or to get before wanting? Answers are elusive, but the goal is to identify and clarify the ethical dilemmas arising when personal information is gathered and leveraged in the media, entertainment, marketing and retail sectors. This webinar is tailored for professionals in data governance, but open to all those interested in the regulatory and human dilemmas inhabiting privacy and AI powered convenience.
James Brusseau (PhD, Philosophy) is author of books, articles, and digital media in the history of philosophy and ethics. He has taught in Europe, Mexico, and currently at Pace University near his home in New York City. As Director of AI Ethics Site, a research institute currently incubating at Pace University, he explores the human experience of artificial intelligence.
WATCH LIVE: August 20th @ 2pm EDT
Do you want to build a career in the field of AI? Join this session, where two IBM data scientists from the Data Science and AI Elite Team will share their journey to AI, the challenges they faced, interview tips, and useful resources, along with how they draw on their earlier experiences when applying ML to real-world client problems.
Safura Suleymanova - Data Scientist at the IBM Data Science & AI Elite Team
Safura Suleymanova is a Data Scientist at the IBM Data Science and AI Elite Team and develops Machine Learning powered solutions for customers across various industries, including finance, telecom, and supply chain, among others. She is passionate about how data can be used to streamline routine processes, ultimately improving business decisions. To improve existing operational processes at her previous workplaces, she started leveraging the data, which led her to the field of Data Science. Safura has a bachelor’s degree in Mathematics, with a specialization in Risk Management and Statistics, from the University of Waterloo.
Aishwarya Srinivasan - Data Scientist, Data Science Elite Team, IBM Data & AI
Aishwarya joined the DSE Team in June 2018 as a Data Scientist. She is a researcher in machine learning and reinforcement learning, and she strives for innovation. During her engagements, she primarily focuses on bringing in machine learning research to create business value.
With her interest in integrating machine learning in finance, she has presented her work to leading financial firms like Morgan Stanley, Credit Suisse, and the World Bank. She is very focused on expanding her horizons in the machine learning research community, including her recent Patent Award, won in 2018 for developing a Reinforcement Learning model for Machine Trading. She is a post-graduate in Data Science from Columbia University.
She is an ambassador for the Women in Data Science community, originating from Stanford University. She has a follower base of 50k+ on LinkedIn and actively participates in and organizes events and conferences to inspire budding data scientists. Besides being a data junkie, she is a fitness fanatic who is into martial arts (Krav Maga) and yoga. To spread her knowledge in the space of data science, and to inspire budding Data Scientists, she actively writes blogs related to machine learning on her website https://aishgrt.wixsite.com/thepeculiarunicorn
RECORDED ON: July 7th @ 11am PDT
Python is a powerful programming language that is a good choice for many types of analytics. It is rapidly becoming the language of choice for scientists and researchers of many types. Now, by combining a massively parallel processing (MPP) database like Vertica with Python, you can overcome many of the scale and analytics challenges that can limit Python users. Join this session to see a live demo of Vertica’s built-in integration with Python to measure and predict key numbers of the COVID-19 pandemic by using historical data on the number of cases and fatalities per state to predict the results for the following day.
Badr Ouali, Data Scientist at Vertica
Recorded on: June 30th @ 11am PDT
I hate to admit it, but your data is lying to you — and more often than you think. Having clean data with high volume, velocity, and variety doesn’t necessarily protect one from the possibility of reaching faulty conclusions to research questions of interest. Despite what you may have learned in Statistics 101, a significant p-value isn’t always groundbreaking. All data can be coerced and bribed to tell any story; thus, as data practitioners, it’s our duty to be cognizant of the possible pitfalls that abound and how to navigate around common traps — responsibly.
By showcasing enterprise use cases and sample projects in Dataiku’s Data Science Studio, Christopher Peter Makris will address the following questions:
– Is more data always better? How can the inclusion or exclusion of data obfuscate a previously held conclusion?
– Is an observed event truly a trend? How can previously noted behaviors be a marker for the complete opposite behavior in the future?
– Is an association worth my time/money/effort? When do conclusive conclusions lead us not to act on a valid association?
Christopher Peter Makris, Lead Data Scientist at Dataiku
Christopher Peter Makris (CPM) is a Lead Data Scientist at Dataiku. With a background in Logic, Discrete Mathematics, & Statistics, he brings experience from both industry and academia. CPM previously served as Director of Data Science at the NYC Data Science Academy and Executive Director of the Master’s of Statistical Practice Program at Carnegie Mellon University; at both institutions, CPM took pride in professionally developing students, aiding in the redesign of technical curricula, and delivering over 3,000 hours of lectures to graduate audiences. In industry, CPM most recently helped thwart cybercrime as the Director of Security Data Science at Empower Retirement. When not crunching numbers on his computer, CPM is crunching his abs in the studio as he’s an avid dancer, choreographer, and CrossFitter (so, while he considers himself a data scientist, he can only count to 5-6-7-8…)
Recorded on: May 20th @ 2pm EST
Deep Learning Frameworks allow for the easy construction of neural networks and have made Python the go-to language for Artificial Intelligence development. This webinar will show how deep learning can be used to help align developers and data scientists to a particular framework as well as demonstrate some examples of how IBM is extending the value of deep learning frameworks via open source projects, and how new advancements in NLP like transformers are changing how and why to use a particular framework.
Nick Acosta, IBM
Before becoming an AI Advocate at IBM, Nick studied Computer Science at Purdue University and the University of Southern California, and was an HPC Consultant for Hewlett-Packard in Grenoble, France. He now specializes in machine learning and utilizes it to understand machine learning developers of various communities, startups, and enterprises in order to help them succeed on IBM’s data science platform and in open source. He has a strong interest in open source software and all things Kardashian.
RECORDED ON: May 14th @ 11am PST
Python is becoming a common language between developers and data scientists, which is why so many of the open-source tools and libraries are built for Python. However, navigating through the hundreds of thousands of available packages for data science and machine learning (DS/ML) can be a daunting task.
Are you ready to benefit from Python for data science? Join us to discover new and innovative ways to enhance your models, algorithms, and visualizations.
During the webinar, we’ll discuss:
- Popular open-source tools for DS/ML
- Use cases
- Helpful resources to get started with data science with Python
Stan Seibert - Sr. Director, Community Innovation at Anaconda
Stan leads the Community Innovation team at Anaconda, where his work focuses on high performance GPU computing and designing data analysis, simulation, and processing pipelines. He is a longtime advocate of the use of Python and GPU computing for research. Prior to joining Anaconda, Stan served as Chief Data Scientist at Mobi, where he worked on vehicle fleet tracking and route planning. Stan received a PhD in experimental high energy physics from the University of Texas at Austin, and performed research at Los Alamos National Laboratory, University of Pennsylvania, and the Sudbury Neutrino Observatory.
RECORDED ON: Wed, Apr 15 @ 11:00am PST
Learn explainable workflows using open source software, semi-automated pipelines, and Python! From data ingestion and cleaning to piping and modeling, explainability and trust are at the forefront of enterprise data science initiatives. In this talk, learn the best practices for codifying and relaying explainable data science to stakeholders, management, and the end user in a reproducible and portable fashion.
Austin Eovito - Data Scientist, IBM
Austin is a Data Scientist on the Technical Marketing and Evangelism team in San Francisco, California. As a recent graduate student of Florida State University, Austin is focused on balancing the bleeding-edge research produced by academia and the tools used in applied data science. His Master’s thesis was on White Collar Crime using Time-aware Joint-Topic-Sentiment Analysis (TTS). Austin leads IBM’s Data Science Masterclass on AI Explainability and currently resides in San Francisco with his fiance, dog, and two cats.