an IBM Community Virtual Event
July 24th, 2018
9am-6pm EDT / 6am-3pm PDT
TALKS
IBM Code
Join developers and their advocates as they talk about projects and technologies they contribute to and depend upon
Svetlana Levitan
Developer Advocate
IBM
SPSS Models in Watson Studio
Watson Studio makes many SPSS models available in the graphical user interface, but there are also many models that can be called through the Scala or Python API. This talk will present them, as well as output components that can be used with SPSS or open source models to produce clear visualizations of the model results.
Justin McCoy
Developer Advocate
IBM
Accelerate Training of Deep Learning Tasks with PowerAI
A brief history from ML to DL and the rise of GPU-accelerated hardware, followed by an introduction to the PowerAI platform and the advantages it provides when working on deep learning problems.
Luciano Resende
Data Science Platform Architect
IBM
Scaling Jupyter Notebooks with Jupyter Enterprise Gateway
The Jupyter Notebook stack has become the "de facto" platform used by data scientists to build interactive applications. With the popularity of deep learning, there is an increasing need for resources to make deep learning effective. In this session, we will discuss how you can scale your Jupyter Notebook deployment by enabling kernels to run in a distributed mode using multiple compute nodes, an analytical engine such as Apache Spark, or even containers managed by Kubernetes.
Raj Singh
Sr. Developer Advocate
IBM
Automate Data Science Drudgery with Pixiedust
The Jupyter notebook has quickly become one of data scientists’ favorite tools. When you use notebooks in IBM's Watson Studio, you get a complete platform for building an application, from data preparation and analytics to building and deploying machine learning models. Jupyter notebooks are a big step up from executing code at the command line, but the basic notebook environment doesn’t do much to automate repetitive tasks. This is where Pixiedust comes in. It puts some of the most common visualization tasks behind a convenient GUI so you don’t have to remember all those obscure arguments that go into the creation of a simple bar chart. Even better, Pixiedust is extensible, so if the function you want to automate isn’t available, you can write a “PixieApp” (a Python class that extends Pixiedust) to do the job. Come learn how to use Pixiedust and build PixieApps.
Victor Terpstra
Senior Data Scientist,
Data Science Elite Team
IBM
Getting Started with Decision Optimization for DSX
Walk-through of how to develop and evaluate a decision optimization model in DSX. How to use the CPLEX Python API in a Jupyter notebook to code an optimization model. Use the DO for DSX add-on to create, run, do what-if analysis and compare multiple scenarios using dashboards. Scenario management. Discussion of best practices.
Gabriela de Queiroz
Senior Data Scientist,
Data Science Elite Team
IBM
Statistics for Data Science: What You Should Know and Why
Data science is not only about machine learning. To be a successful data person, you also need a significant understanding of statistics. Gabriela de Queiroz walks you through the top five statistical concepts every Data Scientist should know to work with data.
Augustina Ragwitz
Open Sourceress
IBM
Level Up your Open SourceRy with Data Science + R
IBM is leading the industry in open source community engagement. Reflecting this external engagement in individual performance goals is challenging for both employees and managers. R provides an easy way to publish your code as a personal performance report with R Markdown!
Jean Francois Puget
Distinguished Engineer, Machine Learning, Optimization, Advanced Analytics
IBM
Machine Learning Competitions : Engaging, Learning, and Winning
This session will introduce the Kaggle machine learning competition platform and describe an effective methodology for faring well on it, developed by a Kaggle Grand Master. The methodology covers exploratory data analysis, feature engineering, model evaluation, and algorithm tuning. Machine learning competitions are a simplified version of real-world machine learning, and we will discuss additional steps that need to be added to the methodology.
Hear from IBM product experts who will discuss the latest technologies in Data Science
IBM Sessions
Shadi Copty
Director, Data Science & Machine Learning Offerings
IBM
Mega Trends in Data Science
Data Science continues to evolve in importance, adoption and impact across industries. In this talk we will go over the mega trends that are shaping this landscape, from the algorithmic and computational all the way to the people and cultural.
Ted Fisher
Senior Offering Manager, Data Science Cross-Product Initiatives
IBM
Data Science Best and Worst Practices
In this session, a data science leader will provide best practices for your data science project as well as tips to avoid landmines that could cause significant problems.
Virginie Granhaye
Offering Manager - IBM Decision Optimization
IBM
How Decision Optimization Can Complement ML in Decision Making
Machine learning, AI, better insight... How can you combine all these techniques to put better decision-making systems in place? Learn about the Data Science Experience platform and how to benefit from Decision Optimization in it. Prescriptive analytics is a good complement to predictive models, taking them one step further. Learn about real use cases and the products available to accomplish this.
Daniel Zilio
Senior Software Developer, IBM Db2 Event Store
IBM
IBM Db2 Event Store - Deriving Deep Insights from Fast Data
Deriving deep insights from Fast Data has historically been challenging. System designers have typically had to choose between fast insights on a window of the data (the streaming analytics solution) or analyzing all of the data, but after a considerable delay (the data lake/warehouse approach). Unfortunately, this tradeoff is no longer acceptable. We’ll discuss IBM Db2 Event Store, a new offering capable of ingesting millions of events per second while at the same time making that data immediately available for analytics. The solution is built on an open source stack (Spark, Parquet) and is integrated into IBM’s Data Science Experience for first-class data analysis. We will provide examples of Db2 Event Store and show how it can be used to quickly ingest data and analyze it through Scala and Python notebooks. Come learn about IBM's newest offering for Fast Data Management.
Sidney Phoon
Chief Data Scientist
IBM
Extend SPSS Modeler Capabilities with Open Source
Extend and embrace open source in SPSS Modeler to programmatically perform tasks you can’t easily accomplish with out-of-the-box Modeler nodes. In this session, you will learn about the Python and R programming frameworks as implemented in SPSS Modeler V18.1. You will see examples of how Python and R code is embedded in the extension nodes, and how those nodes programmatically read data, manipulate data, build a Spark ML model, and pass the results to Modeler nodes downstream.
Steve Barbee
Offering Manager of SPSS Predictive Analytics Algorithms; Data Scientist
IBM
Tips on Generalizing Feature Creation and Assessing Imbalanced-Data Trees
Successful deployments in machine learning require that models generalize well to new data. When building predictive workflows, it is easy to overlook where bias may enter when creating features, thereby leading to an over-optimistic expectation of a successful deployment. The correct placement and use of the Partition Node when creating targeted features can help to avoid this problem. When trees are built using sampling methods to address imbalanced data, the coverage and accuracies reported for the resulting rules are based on the balanced training data. A method starting with the Rule Trace Node is shown that finds the coverage and accuracy of individual tree rules on the test data. This will indicate how each rule generalizes to new data.
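The two ideas in this abstract can be illustrated outside SPSS Modeler with a small plain-Python sketch. It is not the Partition Node or Rule Trace Node workflow from the talk, only an analogue under stated assumptions: the toy rows, the `region` and `churn` field names, the target-mean feature, and the example rule are all hypothetical. The target-based feature is learned from the training partition only, and a rule's coverage and accuracy are then measured on the test partition.

```python
from collections import defaultdict

def fit_target_means(train_rows, key, target):
    """Learn per-category target means from the training partition only."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in train_rows:
        sums[row[key]] += row[target]
        counts[row[key]] += 1
    overall = sum(sums.values()) / sum(counts.values())
    means = {k: sums[k] / counts[k] for k in sums}
    return means, overall

def apply_target_means(rows, key, means, overall):
    """Attach the learned feature; unseen categories get the overall training mean."""
    return [dict(row, target_mean=means.get(row[key], overall)) for row in rows]

def rule_coverage_accuracy(rows, rule, target, predicted):
    """Coverage: share of rows the rule fires on. Accuracy: share of those rows
    whose target equals the rule's prediction (None if the rule never fires)."""
    covered = [r for r in rows if rule(r)]
    coverage = len(covered) / len(rows)
    if not covered:
        return coverage, None
    accuracy = sum(1 for r in covered if r[target] == predicted) / len(covered)
    return coverage, accuracy

# Hypothetical toy data: "region" is a categorical input, "churn" is a 0/1 target.
train = [
    {"region": "north", "churn": 1},
    {"region": "north", "churn": 0},
    {"region": "south", "churn": 0},
    {"region": "south", "churn": 0},
]
test = [
    {"region": "north", "churn": 1},
    {"region": "south", "churn": 1},
    {"region": "west", "churn": 0},
    {"region": "north", "churn": 0},
]

# The feature is fitted on train and only applied to test, never fitted on it.
means, overall = fit_target_means(train, "region", "churn")
test_features = apply_target_means(test, "region", means, overall)

# Hypothetical tree rule: "if target_mean >= 0.5 then predict churn = 1".
coverage, accuracy = rule_coverage_accuracy(
    test_features, lambda r: r["target_mean"] >= 0.5, "churn", 1
)
```

Fitting the target means on all rows before partitioning would let test-set outcomes leak into the feature, which is the over-optimism the abstract warns about.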
Susara van Den Heever
IBM Data Science Elite Team, Program Director
IBM
Make Better, Faster, Smarter Decisions by Combining Machine Learning and Decision Optimization
What if you could reduce your planning process from 1 week to 1 hour, or from 1 hour to 1 second? What if you could, at the click of a button, improve your bottom line by double digits? In this session, you will learn to do just that by leveraging IBM's powerful Machine Learning (ML) and Decision Optimization (DO) technologies together. You will learn the differences and complementary strengths of ML and DO, learn best practices, and see examples of combining these technologies to achieve financial gains and efficiencies based on some IBM Data Science Elite client projects. The session also includes a demo of combining ML & DO in IBM Data Science Experience (DSX).
Jean Francois Puget
Distinguished Engineer, Machine Learning, Optimization, Advanced Analytics
IBM
Machine Learning Competitions : Engaging, Learning, and Winning
This session will introduce the Kaggle machine learning competition platform and describe an effective methodology for faring well on it, developed by a Kaggle Grand Master. The methodology covers exploratory data analysis, feature engineering, model evaluation, and algorithm tuning. Machine learning competitions are a simplified version of real-world machine learning, and we will discuss additional steps that need to be added to the methodology.
Ferenc Katai, Ph.D.
Offering Manager for CPLEX Optimization Studio (CPLEX/CPO/OPL)
IBM
Integrating Predictive (SPSS Modeler) and Prescriptive Analytics (CPLEX Optimization Studio) - Campaign Optimization
Optimization is about making plans and schedules; in short, decisions. However, optimization has to build on good data, which most of the time comes from forecasts and predictions. This talk presents a specific use case showing the integration between prediction and optimization. Imagine that a bank introduces new products and wants to target its clients in such a way that it reaches out to those clients who are most likely to buy some of the new products. This means that the bank has to predict what each client or client group is most likely to buy and then build a campaign to advertise the products to the best candidates. The bank has a given budget for the campaign and different channels through which to approach clients, each channel has a different cost, and the bank wants to maximize the expected revenue minus cost. SPSS Modeler will be used for the prediction part, and solving an OPL model in CPLEX Optimization Studio will give the decisions about which clients should be approached with which offer through which channel.
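The decision problem described above can be sketched on a toy scale in plain Python. This is not the OPL model or the SPSS Modeler stream from the talk: the client names, purchase probabilities, revenues, channel costs, and the `channel_reach` multiplier are all made-up assumptions, and exhaustive search stands in for a real solver such as CPLEX, so it only works for a handful of clients.

```python
from itertools import product

def best_campaign(prob, revenue, channel_cost, channel_reach, budget):
    """Pick at most one (offer, channel) per client to maximize expected
    revenue minus cost, subject to a total budget on channel costs."""
    clients = sorted(prob)
    # None means "do not contact this client".
    options = [None] + [(o, c) for o in revenue for c in channel_cost]
    best_plan, best_value = None, float("-inf")
    for plan in product(options, repeat=len(clients)):
        cost = sum(channel_cost[choice[1]] for choice in plan if choice)
        if cost > budget:
            continue  # plan violates the budget constraint
        expected = sum(
            prob[client][choice[0]] * revenue[choice[0]] * channel_reach[choice[1]]
            for client, choice in zip(clients, plan)
            if choice
        )
        value = expected - cost
        if value > best_value:
            best_plan, best_value = dict(zip(clients, plan)), value
    return best_plan, best_value

# Hypothetical inputs: in the talk's setting, the probabilities would come
# from the predictive model rather than being typed in by hand.
prob = {
    "ann": {"loan": 0.5, "card": 0.1},
    "bob": {"loan": 0.1, "card": 0.4},
    "cy": {"loan": 0.05, "card": 0.05},
}
revenue = {"loan": 100.0, "card": 50.0}
channel_cost = {"email": 1.0, "call": 10.0}
channel_reach = {"email": 0.5, "call": 1.0}  # assumed effectiveness multiplier

plan, value = best_campaign(prob, revenue, channel_cost, channel_reach, budget=12.0)
```

With this budget only one client can be called, so the search spends the call on the client with the highest expected return and falls back to email for the others.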
Diane Reynolds
Chief Data Scientist, Client Insights
IBM Watson FSS
Natural Language Classification in Analyzing Regulation
The application of natural language classification continues to grow. Over a period of 18 months, my team looked at how to effectively classify, map and assess regulations (financial services industry) to gain insights into the critical requirements and “map” them onto available existing controls.
This presentation provides an overview of the process used to train Watson Natural Language Classifier to interpret such financial regulations. What were the critical steps, and what did we learn along the way that you might apply to your own projects? We’ll share those insights during this session.
Hemant Suri
Sr. Manager, Hybrid Data Warehouse Offering Management
IBM
The Hybrid Enterprise Data Warehouse of the Future – Do More with your Data
The Enterprise Data Warehouse (EDW) has traditionally been the foundation for data storage. So how do you leverage current investments while remaining relevant and competitive? It is important for your organization to continue to evolve, accelerate development and deployment times, and provide a high-performance, cloud-ready platform to drive the future of advanced analytics. In this webinar we will talk about current data challenges and how Data Science and Machine Learning are driving better, faster data-driven decisions. We will also discuss the need for a hybrid data strategy, and how the IBM Integrated Analytics System, as the mainstay of the Enterprise Data Warehouse of the future, remains an integral part of that strategy. We will talk about IBM’s future vision and how current and future innovative, scalable, and flexible technology is harnessing growing data for even your most advanced workloads, while providing your data scientists with a platform for advanced analytics.
Partner
Learn how IBM Community Partners can help you tackle your Data Science challenges
Dean Wampler
VP, Fast Data Engineering
Lightbend
What You Need to Know About Fast Data
Streaming data systems, so-called "Fast Data", promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just "faster" versions of Big Data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. This talk tells you what you need to know to exploit Fast Data successfully, especially from the point of view of a data science team.
Boris Lublinsky
Architect
Lightbend
Operationalizing Model Serving
The traditional approach to model serving is to treat the model as code, which means that the machine learning implementation has to be somehow adapted for model serving. As the number of machine learning tools and techniques grows, such an approach is becoming more questionable. Additionally, machine learning and model serving are driven by very different quality-of-service requirements: while machine learning is typically batch, mostly concerned with scalability and processing power, model serving is mostly concerned with performance and stability.
An alternative approach to model serving, proposed in the presentation, is treating the model as data. Such an approach allows complete decoupling between the model implementation for machine learning and for model serving, and allows for easier standardization of the model serving implementation. Additionally, such an approach allows for dynamic update of the served model without requiring a restart of the system.
The presentation explores the use of TensorFlow and PMML as model representations and their use for building a “real time updatable” model serving architecture. Boris will also present options for implementing such an architecture leveraging Akka Streams and Flink.
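The "model as data" idea can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the Akka Streams or Flink architecture from the talk: the `ModelServer` class, the event tuples, and the dictionary of weights (standing in for a PMML document or an exported TensorFlow model) are all hypothetical. The point it shows is that a model arriving on a stream replaces the one being served without restarting the scoring loop.

```python
class ModelServer:
    """Scores records with whichever model arrived most recently."""

    def __init__(self):
        self.model = None  # no model until the first update arrives

    def on_model(self, model):
        # The model is plain data (weights and bias), so swapping it
        # requires no code change and no restart.
        self.model = model

    def on_record(self, record):
        if self.model is None:
            return None  # nothing to score with yet
        score = self.model["bias"] + sum(
            weight * record.get(name, 0.0)
            for name, weight in self.model["weights"].items()
        )
        return {"model_version": self.model["version"], "score": score}

def serve(events):
    """Consume one merged stream of ("model", ...) and ("data", ...) events."""
    server = ModelServer()
    results = []
    for kind, payload in events:
        if kind == "model":
            server.on_model(payload)
        else:
            results.append(server.on_record(payload))
    return results

# Hypothetical event stream: two model updates interleaved with data records.
events = [
    ("data", {"x": 1.0}),
    ("model", {"version": 1, "bias": 0.0, "weights": {"x": 2.0}}),
    ("data", {"x": 1.0}),
    ("model", {"version": 2, "bias": 1.0, "weights": {"x": 3.0}}),
    ("data", {"x": 1.0}),
]
results = serve(events)
```

The same record is scored differently before and after each model event, which is the dynamic-update behavior the abstract describes.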
Hemen Kapadia
Senior Solution Architect, Customer Success Team
H2O.ai
A Look Under the Hood of Driverless AI
Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling and model deployment. Hemen Kapadia will give a quick overview on Driverless AI and its features – Automatic Feature Engineering, Machine Learning Interpretability, Automatic Visualization.
Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as under or overfitting, data leakage or improper model validation. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy.
With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API through a variety of languages such as Python, Java, C++, Go, C# and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware.
For example, Driverless AI runs orders of magnitude faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud and on premises. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts to quickly validate the data and models.
Erin LeDell
Chief Machine Learning Scientist
H2O.ai
Scalable Automatic Machine Learning with H2O
In this presentation, Erin LeDell (Chief Machine Learning Scientist, H2O.ai), will provide an overview of the field of "Automatic Machine Learning" and introduce the new AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface which automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard.
Vincent Granville
Co-Founder
Data Science Central
Feature Selection for Unsupervised Learning
After reviewing popular techniques used in supervised, unsupervised and semi-supervised machine learning, this presentation will cover feature selection methods in these different contexts, especially the metrics used to assess the value of a feature or set of features, whether the variables are binary, continuous or categorical. In this session we will go into deeper detail and review modern feature selection techniques for unsupervised learning, typically relying on entropy-like criteria. While these criteria are usually model-dependent or scale-dependent, we introduce a new model-free, data-driven methodology in this context, with an application to an interesting number theory problem (simulated data set) in which each feature has a known theoretical entropy.
We will also briefly discuss high precision computing as it is relevant to this peculiar data set, as well as units of information smaller than the bit. Despite the apparent advanced level of this presentation, it is made accessible to a large audience of data scientists, ranging from practitioners to executives.
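As a point of reference for the entropy-like criteria mentioned above, here is a minimal Python sketch that scores discrete features by their empirical Shannon entropy in bits. It is not the model-free methodology introduced in the talk; the function names and the toy rows are hypothetical, and ranking by raw entropy is only one simple criterion among many.

```python
from collections import Counter
from math import log2

def shannon_entropy(values):
    """Empirical Shannon entropy, in bits, of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def rank_features_by_entropy(rows):
    """Return (feature, entropy) pairs sorted from highest to lowest entropy."""
    names = rows[0].keys()
    scores = {name: shannon_entropy([row[name] for row in rows]) for name in names}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data set with three discrete features.
rows = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 2},
    {"a": 1, "b": 0, "c": 3},
]
ranking = rank_features_by_entropy(rows)
```

A constant feature such as `b` has zero entropy and carries no information, so it lands at the bottom of the ranking; continuous features would need binning or a different estimator before this score applies.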
Kirk Haslbeck
Principal Solutions Engineer
Hortonworks
Data Science at Scale – A Platform Decision
Data science isn’t just creeping into areas of modern business; it’s being targeted in every department. Gone are the days when data science (DS) was a one-off project meant to improve a single area of the company. Organizations currently look to take advantage of DS advancements in every business aspect. The challenge arises when you want to operationalize your DS practice. How do you make a DS project repeatable from discovery to feature selection, through training, implementation and deployment? What measures can you use to state a return on investment (ROI) when traditional SDLC won’t satisfy customer demands for faster time to market and data volumes are growing too large for single-console solutions? Come see how Hortonworks (HWX) & IBM provide a connected platform for data science. The combination of HWX and IBM provides the best of both open-source standards and business solutions. I will walk through building a machine learning model, pushing processing to a massive amount of data, running in a secure environment and deploying the model, all on Hortonworks Data Platform & DSX. This talk will cover notebook development, Spark distributed execution, containerizing custom libraries and model deployment. Join me for the discussion and see why driving your company to be data science ready is a platform decision.
Can Cars Think Like Humans?
Robert Hryniewicz
Data Evangelist
Hortonworks
Self-driving cars are here, and how they are deployed will change the world. In this session, you will see how a miniature race car can be powered by open source to achieve faster time to innovation. Learn how data is captured, and how deep learning models are created, trained in TensorFlow using pooled GPUs, and deployed back to the car to improve functionality as new data is fed in. Witness how new innovations from Apache Hadoop 3.1 support leading technologies from Hortonworks and IBM to transform all industries.
Harsh Shah
Partner Solutions Engineer
Hortonworks