FREE  |  ONLINE  |  CONFERENCE

July 24th, 2018

9am-6pm EDT / 6am-3pm PDT

an IBM Community Virtual Event

TALKS

Svetlana Levitan

Developer Advocate

IBM

SPSS Models in Watson Studio

Watson Studio makes many SPSS models available in the graphical user interface, but there are also many models that can be called through the Scala or Python API. This talk will present them, as well as output components that can be used with SPSS or open source models to produce clear visualizations of the model results.


Justin McCoy

Developer Advocate

IBM

Accelerate Training of Deep Learning Tasks with PowerAI

A brief history of the move from ML to DL and the rise of GPU-accelerated hardware, followed by an introduction to the PowerAI platform and the advantages it provides when working on deep learning problems.

Luciano Resende

Data Science Platform Architect

IBM

Scaling Jupyter Notebooks with Jupyter Enterprise Gateway

The Jupyter Notebook stack has become the de facto platform used by data scientists to build interactive applications. With the popularity of deep learning, there is an increasing need for resources to make deep learning effective. In this session, we will discuss how you can scale your Jupyter Notebook deployment by enabling kernels to run in a distributed mode using multiple compute nodes, an analytical engine such as Apache Spark, or even containers managed by Kubernetes.

Raj Singh

Sr. Developer Advocate

IBM

Automate Data Science Drudgery with Pixiedust

The Jupyter notebook has quickly become one of data scientists’ favorite tools. When you use notebooks in IBM's Watson Studio, you get a complete platform for building an application, from data preparation and analytics to building and deploying machine learning models. Jupyter notebooks are a big step up from executing code at the command line, but the basic notebook environment doesn’t do much to automate repetitive tasks. This is where Pixiedust comes in. It puts some of the most common visualization tasks behind a convenient GUI so you don’t have to remember all those obscure arguments that go into the creation of a simple bar chart. Even better, Pixiedust is extensible, so if the function you want to automate isn’t available, you can write a “PixieApp”, a Python class that extends Pixiedust, to do the job. Come learn how to use Pixiedust and build PixieApps.

Victor Terpstra

Senior Data Scientist,

Data Science Elite Team

IBM

Getting Started with Decision Optimization for DSX

Walk-through of how to develop and evaluate a decision optimization model in DSX. How to use the CPLEX Python API in a Jupyter notebook to code an optimization model. Use the DO for DSX add-on to create, run, do what-if analysis and compare multiple scenarios using dashboards. Scenario management. Discussion of best practices.


Shadi Copty

Director, Data Science & Machine Learning Offerings

IBM

Mega Trends in Data Science

Data Science continues to evolve in importance, adoption and impact across industries. In this talk we will go over the mega trends that are shaping this landscape, from the algorithmic and computational all the way to the people and cultural.

Ted Fisher

Senior Offering Manager, Data Science Cross-Product Initiatives

IBM

Data Science Best and Worst Practices

In this session, a data science leader will provide best practices for your data science project as well as tips to avoid landmines that could cause significant problems. 

Join developers and their advocates as they talk about projects and technologies they contribute to and depend upon

IBM Code

IBM Sessions

Hear from IBM product experts who will discuss the latest technologies in Data Science

Virginie Granhaye

Offering Manager - IBM Decision Optimization

IBM

How Decision Optimization Can Complement ML in Decision Making

Machine learning, AI, better insight: how do you combine all these techniques to put better decision-making systems in place? Learn about the Data Science Experience platform and how to benefit from Decision Optimization in it. Prescriptive analytics is a good complement to predictive models and lets you go one step further. Learn about real use cases and the products available to accomplish this.


Daniel Zilio

Senior Software Developer, IBM Db2 Event Store

IBM

IBM Db2 Event Store - Deriving Deep Insights from Fast Data

Deriving deep insights from Fast Data has historically been challenging. System designers have typically had to choose between fast insights on a window of the data (the streaming analytics solution) or analyzing all of the data, but after a considerable delay (the data lake/warehouse approach). Unfortunately, this tradeoff is no longer acceptable. We’ll discuss IBM Db2 Event Store, a new offering capable of ingesting millions of events per second while at the same time making that data immediately available for analytics. The solution is built on an open source stack (Spark, Parquet) and is integrated into IBM’s Data Science Experience for first-class data analysis. We will provide examples of Db2 Event Store and show how it can be used to quickly ingest data and analyze it through Scala and Python notebooks. Come learn about IBM's newest offering for Fast Data Management.

Sidney Phoon

Chief Data Scientist

IBM

Extend SPSS Modeler Capabilities with Open Source

Extend and embrace open source in SPSS Modeler to programmatically perform tasks you can’t easily accomplish with out-of-the-box Modeler nodes. In this session, you will learn about the Python and R programming frameworks as implemented in SPSS Modeler V18.1. You will see examples of how Python and R code is embedded in the extension nodes, and how those nodes programmatically read data, manipulate data, build a Spark ML model and pass the results to Modeler nodes downstream.

Dean Wampler

VP, Fast Data Engineering

Lightbend

What You Need to Know About Fast Data

Streaming data systems, so-called "Fast Data", promise accelerated access to information, leading to new innovations and competitive advantages. But they aren't just "faster" versions of Big Data. They force architecture changes to meet new demands for reliability and dynamic scalability, more like microservices. This talk tells you what you need to know to exploit Fast Data successfully, especially from the point of view of a data science team.


Learn how IBM Community Partners can help you tackle your Data Science challenges

Partner

Boris Lublinsky

Architect

Lightbend

Operationalizing Model Serving

The traditional approach to model serving is to treat the model as code, which means that the machine learning implementation has to be somehow adapted for model serving. As the number of machine learning tools and techniques grows, this approach is becoming more questionable. Additionally, machine learning and model serving are driven by very different quality of service requirements: while machine learning is typically batch, mostly concerned with scalability and processing power, model serving is mostly concerned with performance and stability.


An alternative approach to model serving, proposed in the presentation, is to treat the model as data. This approach allows complete decoupling between the model implementation for machine learning and model serving, and allows for easier standardization of the model serving implementation. It also allows the served model to be updated dynamically without restarting the system.


The presentation explores the use of TensorFlow and PMML as model representations and their use in building a “real time updatable” model serving architecture. Boris will also present options for implementing such an architecture leveraging Akka Streams and Flink.

A Look Under the Hood of Driverless AI

Driverless AI speeds up data science workflows by automating feature engineering, model tuning, ensembling and model deployment. Hemen Kapadia will give a quick overview on Driverless AI and its features – Automatic Feature Engineering, Machine Learning Interpretability, Automatic Visualization.


Driverless AI turns Kaggle-winning recipes into production-ready code and is specifically designed to avoid common mistakes such as underfitting or overfitting, data leakage or improper model validation. Avoiding these pitfalls alone can save weeks or more for each model, and is necessary to achieve high modeling accuracy.


With Driverless AI, everyone can now train and deploy modeling pipelines with just a few clicks from the GUI. Advanced users can use the client/server API through a variety of languages such as Python, Java, C++, Go, C# and many more. To speed up training, Driverless AI uses highly optimized C++/CUDA algorithms to take full advantage of the latest compute hardware.


For example, Driverless AI runs orders of magnitude faster on the latest Nvidia GPU supercomputers on Intel and IBM platforms, both in the cloud and on premises. There are two more product innovations in Driverless AI: statistically rigorous automatic data visualization and interactive model interpretation with reason codes and explanations in plain English. Both help data scientists and analysts to quickly validate the data and models.

Hemen Kapadia

Senior Solution Architect, Customer Success Team

H2O.ai

Erin LeDell

Chief Machine Learning Scientist

H2O.ai

Scalable Automatic Machine Learning with H2O

In this presentation, Erin LeDell (Chief Machine Learning Scientist, H2O.ai) will provide an overview of the field of "Automatic Machine Learning" and introduce the new AutoML functionality in H2O. H2O's AutoML provides an easy-to-use interface that automates the process of training a large, comprehensive selection of candidate models and a stacked ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard.

Vincent Granville

Co-Founder

Data Science Central

Feature Selection for Unsupervised Learning

After reviewing popular techniques used in supervised, unsupervised and semi-supervised machine learning, this presentation will cover feature selection methods in these different contexts, especially the metrics used to assess the value of a feature or set of features, whether the variables are binary, continuous or categorical. In this session we will go into deeper detail and review modern feature selection techniques for unsupervised learning, which typically rely on entropy-like criteria. While these criteria are usually model-dependent or scale-dependent, we introduce a new model-free, data-driven methodology in this context, with an application to an interesting number theory problem (simulated data set) in which each feature has a known theoretical entropy.


We will also briefly discuss high precision computing as it is relevant to this peculiar data set, as well as units of information smaller than the bit. Despite its apparently advanced level, this presentation is made accessible to a large audience of data scientists, ranging from practitioners to executives.

Gabriela de Queiroz

Senior Data Scientist,

Data Science Elite Team

IBM

Statistics for Data Science: What You Should Know and Why

Data science is not only about machine learning. To be a successful data person, you also need a significant understanding of statistics. Gabriela de Queiroz walks you through the top five statistical concepts every Data Scientist should know to work with data. 

Augustina Ragwitz

Open Sourceress

IBM

Level Up your Open SourceRy with Data Science + R

IBM leads the industry in open source community engagement. Reflecting this external engagement in individual performance goals is challenging for both employees and managers. R provides an easy way to publish your code as a personal performance report with R Markdown!

Steve Barbee

Offering Manager of SPSS Predictive Analytics Algorithms; Data Scientist

IBM

Tips on Generalizing Feature Creation and Assessing Imbalanced-Data Trees

Successful deployments in machine learning require that models generalize well to new data. When building predictive workflows, it is easy to overlook where bias may enter when creating features, thereby leading to an over-optimistic expectation of a successful deployment. The correct placement and use of the Partition Node when creating targeted features can help to avoid this problem. When trees are built using sampling methods to address imbalanced data, the coverage and accuracies reported for the resulting rules are based on the balanced training data. A method starting with the Rule Trace Node is shown that finds the coverage and accuracy of individual tree rules on the test data. This will indicate how each rule generalizes to new data.

Susara van Den Heever

IBM Data Science Elite Team, Program Director

IBM

Make Better, Faster, Smarter Decisions by Combining Machine Learning and Decision Optimization

What if you could reduce your planning process from 1 week to 1 hour, or from 1 hour to 1 second? What if you could, at the click of a button, improve your bottom line by double digits? In this session, you will learn to do just that by leveraging IBM's powerful Machine Learning (ML) and Decision Optimization (DO) technologies together. You will learn the differences and complementary strengths of ML and DO, learn best practices, and see examples of combining these technologies to achieve financial gains and efficiencies based on some IBM Data Science Elite client projects. The session also includes a demo of combining ML & DO in IBM Data Science Experience (DSX).

Jean Francois Puget

Distinguished Engineer, Machine Learning, Optimization, Advanced Analytics

IBM

Machine Learning Competitions : Engaging, Learning, and Winning

This session will introduce the Kaggle machine learning competition platform and describe an effective methodology, developed by a Kaggle Grand Master, for faring well on it. The methodology covers exploratory data analysis, feature engineering, model evaluation, and algorithm tuning. Machine learning competitions are a simplified version of real world machine learning, and we will discuss additional steps that need to be added to the methodology.

Ferenc Katai, Ph.D.

Offering Manager for CPLEX Optimization Studio (CPLEX/CPO/OPL)

IBM

Integrating Predictive (SPSS Modeler) and Prescriptive Analytics (CPLEX Optimization Studio) - Campaign Optimization

Optimization is about making plans and schedules; in short, decisions. However, optimization has to build on good data, which most of the time comes from forecasts and predictions. This talk presents a specific use case showing the integration between prediction and optimization. Imagine that a bank introduces new products and wants to reach out to those clients who are most likely to buy some of them. This means that the bank has to predict what each client or client group is most likely to buy and then build a campaign to advertise the products to the best candidates. The bank has a given budget for the campaign and different channels for approaching clients, each with a different cost, and it wants to maximize the expected revenue minus cost. SPSS Modeler will be used for the prediction part, and solving an OPL model in CPLEX Optimization Studio will give the decisions on which clients should be approached with which offer through which channel.

Diane Reynolds

Chief Data Scientist, Client Insights

IBM Watson FSS

Natural Language Classification in Analyzing Regulation

The application of natural language classification continues to grow. Over a period of 18 months, my team looked at how to effectively classify, map and assess regulations (financial services industry) to gain insights into the critical requirements and “map” them onto available existing controls.  


This presentation provides an overview of the process used to train Watson Natural Language Classifier to interpret such financial regulations. What were the critical steps, and what did we learn along the way that you might apply to your own projects? We’ll share those insights during this session. 

Hemant Suri

Sr. Manager, Hybrid Data Warehouse Offering Management

IBM

The Hybrid Enterprise Data Warehouse of the Future: Do More with Your Data

The Enterprise Data Warehouse (EDW) has traditionally been the foundation for data storage. So how do you leverage current investments while remaining relevant and competitive? It is important for your organization to continue to evolve, accelerate development and deployment times, and provide a high performance, cloud-ready platform to drive the future of advanced analytics. In this webinar we will talk about current data challenges and how Data Science and Machine Learning are driving better, faster data-driven decisions. We will also discuss the need for a hybrid data strategy, and how the IBM Integrated Analytics System, as the mainstay of the Enterprise Data Warehouse of the future, remains an integral part of that strategy. We will talk about IBM’s future vision and how current and future innovative, scalable, and flexible technology is harnessing growing data for even your most advanced workloads, while providing your data scientists with a platform for advanced analytics.

Kirk Haslbeck

Principal Solutions Engineer

Hortonworks

Data Science at Scale – A Platform Decision

Data science isn’t just creeping into areas of modern business, it’s being targeted in every department. Gone are the days when data science (DS) was a one-off project meant to improve a single area of the company. Organizations currently look to take advantage of DS advancements in every business aspect. The challenge arises when you want to operationalize your DS practice. How do you make a DS project repeatable from discovery to feature selection, through training, implementation and deployment? What measures can you use to state a return on investment (ROI) when the traditional SDLC won’t satisfy customer demands for faster time to market and data volumes are growing too large for single console solutions? Come see how Hortonworks (HWX) & IBM provide a connected platform for data science. The combination of HWX and IBM provides the best of both open-source standard and business solutions. I will walk through building a machine learning model, pushing processing to a massive amount of data, running in a secure environment and deploying the model, all on Hortonworks Data Platform & DSX. This talk will cover notebook development, Spark distributed execution, containerizing custom libraries and model deployment. Join me for the discussion and see why driving your company to be data science ready is a platform decision.

Robert Hryniewicz

Data Evangelist

Hortonworks

Can Cars Think Like Humans?

Self-driving cars are here, and how they are deployed will change the world. In this session, you will see how a miniature race car can be powered by open source to speed time to innovation. Learn how data is captured, and how deep learning models are created, trained in TensorFlow using pooled GPUs, and deployed back to the car to improve functionality as new data is fed in. Witness how new innovations from Apache Hadoop 3.1 support leading technologies from Hortonworks and IBM to transform all industries.

Harsh Shah

Partner Solutions Engineer

Hortonworks

More to be announced soon

Talk titles and abstracts are subject to change

This virtual event is brought to you by IBM Community.

©2018 BEMYAPP - All rights reserved.