Data science with Python: from analytics scripts to services at scale

Giuseppe Broccolo

Abstract

How to get from a Jupyter notebook prototype to a distributed service running at scale in the cloud. A 5-step recipe.

Description

There is a scenario that is quite common when doing data science at scale.
The Data Science team has developed a good algorithm that suits our purpose, and the prototype works well on a test dataset. But how do you transform it into a reliable, responsive service, ready for a production payload? We will go through the steps involved in evolving a Jupyter notebook into an auto-scaling service: changes in data ingestion, asynchronous processing, dockerisation, Kubernetes, and cloud technologies, with a small illustrative sketch below.
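To give a flavour of the kind of transformation involved, here is a minimal sketch of a notebook-style scoring function wrapped as an asynchronous HTTP endpoint. The function and payload names are purely illustrative, and FastAPI is just one possible choice for the service layer, not necessarily the one covered in the talk:

    # Minimal sketch: expose a prototype scoring function as an async service.
    # "score" stands in for whatever the notebook prototype computes.
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class Payload(BaseModel):
        features: list[float]  # the input the prototype read from a DataFrame row

    def score(features: list[float]) -> float:
        # Placeholder for the model developed in the notebook prototype.
        return sum(features) / len(features)

    @app.post("/predict")
    async def predict(payload: Payload) -> dict:
        # In production, CPU-bound scoring would be offloaded to a worker
        # pool or queue so the event loop stays responsive under load.
        return {"prediction": score(payload.features)}

A service like this can then be dockerised and deployed behind a Kubernetes autoscaler, which is where the later steps of the recipe come in.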

Bio

I hold a PhD in Physics and worked at the CERN laboratories, where I started my computing career as a C/C++ programmer writing Monte Carlo simulations for particle physics.

Outside academia, I started working in data science/engineering 7 years ago, using Python with the main scientific libraries (pandas, SciPy, scikit-learn) and the big data Apache stack Kafka-Druid-Hadoop.

A big fan of the open source ecosystem, I'm a member of the board of the Italian PostgreSQL User Group, and I contributed to implementing support for a new type of geospatial index in two of the main PostgreSQL geospatial extensions (PostGIS and pgSphere).

In my free time, I like to experiment with the new methodologies introduced in data science and machine learning, and to keep studying Python :)