Yeah, I know this is a lot. But you’ll understand why I’m so hyped if you read my previous article, “Building Snowflake applications using Streamlit”.
Let’s deconstruct this title. I’ve built an ML model using TensorFlow to predict the salary of an NBA player based on years played. In this article I’m going to show you how to deploy the model to Snowflake as a Python UDF and how to run that model directly out of Snowflake using Streamlit, so anyone can calculate their fantasy NBA salary.
What you’ll read about in this article:
· Help, I’m a Data Scientist and need to show my work to others
· Streamlit + Snowflake’s Snowpark
· Deploying a Keras model to Snowflake as Python UDF
· That’s not all!
· TL;DR
Help, I’m a Data Scientist and need to show my work to others
Let’s talk about Tracey. She is a data scientist specialising in machine learning. In the current sprint she is building a cool ML model. You guessed it — using this model she will be able to predict the salary of an NBA player, based on the years played.
She builds the model, finds it pretty cool and is eager to show it to the team, the product owner and the customers. But, she’s struggling to explain and show the results of her work. Why?
Typically, Tracey and other data scientists prepare the data and build ML models in Jupyter Notebooks. That’s their natural environment and they feel at their best there. But, outside of the data science community, notebooks are just not used much.
For instance, the customers of the software that Tracey’s team builds and even the product owners are having a hard time following and understanding what’s going on in Tracey’s notebooks. They understand Excel and tables and graphs well, but they don’t understand Python, DNN, features, labels and “funny” data structures.
And it’s hard to make a software product out of a Jupyter notebook, because it’s really a software development tool, similar to VS Code or IntelliJ IDEA, tools that some of Tracey’s team colleagues use on a regular basis.
How should she go about this? How do data scientists demonstrate their work, i.e. explain the process and show the results?
Many data scientists build web applications to demonstrate what their models do. In these web apps they show tables with data that the models are producing and visualise these data with graphs.
However, this requires an additional skillset, namely full-stack development skills. And that’s a whole other area of software development. Data scientists that are versed in both machine learning and full-stack web development are a rare find on the market. And, if they are good at their work, they are very expensive employees.
Tracey is a very skilled data scientist but she’s not a full-stack web developer. And building web apps is time consuming if you have to go for the full-blown application development framework. It’s a lot of fun, but it’s hard work.
Streamlit + Snowflake’s Snowpark
Enter Streamlit + Snowflake’s Snowpark. Streamlit is a low-code web application framework invented for data scientists who don’t have the time to learn or use full-blown frameworks such as Flask (not that I don’t find Flask cool, I really do).
In short, Streamlit democratises web application development and deployment. Now any data scientist with zero knowledge of MVC, MVT, Spring Boot, JavaScript, HTML… is able to build web applications, and build them rapidly.
Exactly what Tracey needs at the end of the sprint to show her ML model’s results. She doesn’t get a whole new sprint to build an app so she can show what she did in the previous sprint. She has an hour on a Friday, the last day of the sprint, to do that.
And since the data that her ML model will be applied on is stored in Snowflake, she will use Snowpark for Python to deploy her ML model to Snowflake and run it on top of Snowflake’s data from her Streamlit application. Snowpark is a natural environment for a Data Engineer and a Data Scientist because it uses DataFrames to work with data in, you guessed it, Python.
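To give a flavour of what that looks like (a minimal sketch; the table name nba_players is made up here, and the session setup is shown later in this article):

from snowflake.snowpark.functions import col

# Snowpark DataFrame operations are pushed down to Snowflake as SQL;
# nothing is pulled to the client until you ask for results.
df = session.table("nba_players").filter(col("years_played") > 5)
df.show()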
As I’ve shown in my previous article, “Building Snowflake applications using Streamlit”, it’s super simple to connect a Streamlit application to Snowflake.
Today, we’re developing our app further and we’ll add a part to it that runs a Keras DNN model deployed as a Snowflake UDF and visualises the result of that model.
You can find the complete Git project here, and I’m going to walk you through some of the code snippets.
Deploying a Keras model to Snowflake as Python UDF
Here’s what my Keras DNN does:
In this model, NBA salary is a non-linear function of years played in the NBA. This is, of course, very naive: years played is not the biggest contributing factor to an NBA salary. In fact, points scored per game and field goal percentage are the two biggest contributing factors, and I’d sure like to get my hands on that data and train another model. If you know where to find it, let me know.
Years played still has an influence on the salary, and I like this curve showing that there’s obviously a peak in salary as you play longer in the NBA, after which a plateau is reached. As the blue dots (data points) indicate, there are some players who kept getting more money with each additional year they played.
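The exact architecture isn’t the point here, but a minimal sketch of a model along these lines could look as follows (the layer sizes and training call are assumptions, not the code from the repo):

import tensorflow as tf

# One input (years played), a couple of hidden layers to capture the
# non-linearity, one output (predicted salary).
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(years_played, salaries, epochs=500)  # trained on NBA salary data

# Keras saves the model as a folder, which matters for the next section.
model.save("nba_salary_model")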
Let’s see how we deploy a Keras model to Snowflake using Snowpark for Python. Since Keras saves the model in a folder, I’m doing the following to make that folder available on a Snowflake stage (a sketch of the first step follows this list):
- Create a ZIP file from the folder contents.
- Upload the ZIP file into a stage.
- Inside the UDF, unzip the ZIP file into a location in /tmp (see https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-creating.html#unzipping-a-staged-file for how to do this).
- Point Keras to the unzipped folder in /tmp.
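For the first step, something as simple as shutil does the job (a sketch; the folder and archive names are assumptions I’ll reuse in the snippets below):

import shutil

# Turn the contents of the saved-model folder into a single model.zip file.
shutil.make_archive("model", "zip", "nba_salary_model")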
This is the code of the UDF. Notice the “@udf” decorator: it pushes the code to Snowflake, and the name of the UDF function will be “predict_nba_salary”.
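Here’s a sketch of what that UDF can look like, following the documented pattern linked above for unzipping a staged file (the stage name @udf_stage and the archive name model.zip are my assumptions):

import fcntl
import os
import sys
import threading
import zipfile

from snowflake.snowpark.functions import udf

# File lock so that only one process per node unzips the model
# (the pattern from the Snowflake docs page linked above).
class FileLock:
    def __enter__(self):
        self._lock = threading.Lock()
        self._lock.acquire()
        self._fd = open("/tmp/lockfile.LOCK", "w+")
        fcntl.lockf(self._fd, fcntl.LOCK_EX)

    def __exit__(self, type, value, traceback):
        self._fd.close()
        self._lock.release()

@udf(name="predict_nba_salary", is_permanent=True, replace=True,
     stage_location="@udf_stage", imports=["@udf_stage/model.zip"])
def predict_nba_salary(years_played: int) -> float:
    import tensorflow as tf

    # The staged ZIP shows up in the UDF's import directory.
    import_dir = sys._xoptions["snowflake_import_directory"]
    zip_file_path = os.path.join(import_dir, "model.zip")
    extracted = "/tmp/nba_salary_model"

    # Unzip into /tmp once, then point Keras at the unzipped folder.
    with FileLock():
        if not os.path.isdir(extracted):
            with zipfile.ZipFile(zip_file_path, "r") as myzip:
                myzip.extractall(extracted)

    model = tf.keras.models.load_model(extracted)
    return float(model.predict([[years_played]])[0][0])

Executing this definition requires an active Snowpark session with the right packages and the staged ZIP in place, which is exactly what the next few snippets set up.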
First, we’ll create a Snowflake session (connection); don’t forget to add the necessary imports:
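A minimal sketch (fill in your own account details):

from snowflake.snowpark import Session

connection_parameters = {
    "account": "<your_account>",
    "user": "<your_user>",
    "password": "<your_password>",
    "role": "<your_role>",
    "warehouse": "<your_warehouse>",
    "database": "<your_database>",
    "schema": "<your_schema>",
}
session = Session.builder.configs(connection_parameters).create()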
Then, we’ll tell Snowflake that we’ll be using some server-side packages in this session:
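For this model, TensorFlow is the package that matters (a sketch; add whatever else your UDF imports server-side):

# Tell Snowflake which server-side (Anaconda) packages the UDF will use.
session.add_packages("tensorflow")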
These packages are provided and managed by Anaconda. We’re partnering with them so we can avoid the dependency hell that so many developers quickly find themselves in.
We’ll now upload the zip file:
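A sketch, reusing the stage and file names assumed earlier:

# Create the stage if needed, then upload model.zip without re-compressing it.
session.sql("CREATE STAGE IF NOT EXISTS udf_stage").collect()
session.file.put("model.zip", "@udf_stage", auto_compress=False, overwrite=True)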
Finally, we can deploy the UDF:
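With the session active, the package added and the ZIP staged, executing the decorated predict_nba_salary definition shown earlier is what actually creates the UDF in Snowflake. One way to confirm it’s there (a sketch):

# The UDF should now show up in the current schema.
session.sql("SHOW USER FUNCTIONS LIKE 'predict_nba_salary'").show()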
The UDF is now defined on Snowflake and we can test the functionality, like this:
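For example (a sketch):

# Sanity check: predict the salary for a player with 10 years in the league.
session.sql("SELECT predict_nba_salary(10) AS predicted_salary").show()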
And a more realistic use case, where the data is in Snowflake, would look like this:
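A sketch; the table and column names are assumptions based on the description below:

from snowflake.snowpark.functions import call_udf, col

# Build a DataFrame over the 'employee' table and run the UDF on the
# years-of-experience column; the whole query runs inside Snowflake.
df = session.table("employee")
df.select(
    col("name"),
    col("years_of_experience"),
    call_udf("predict_nba_salary", col("years_of_experience")).alias("nba_salary"),
).show()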
In this example, I have a table ‘employee’ that I used to create a DataFrame, and then I ran the function on one employee’s years of experience to see how much more she would earn in the NBA than she’s making right now.
We can visualise that by embedding this code into our Streamlit app:
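One possible shape of that code (a sketch; the widgets and labels are my choices, and session is the Snowpark session created earlier):

import streamlit as st

st.title("What would I make in the NBA?")
years = st.number_input("Years of professional experience",
                        min_value=1, max_value=25, value=5)
current_salary = st.number_input("Current yearly salary ($)",
                                 min_value=0, value=100000)

# Run the model inside Snowflake and fetch the single predicted value.
predicted = session.sql(
    f"SELECT predict_nba_salary({int(years)}) AS salary"
).collect()[0]["SALARY"]

st.metric("Predicted NBA salary", f"${predicted:,.0f}",
          delta=f"${predicted - current_salary:,.0f} vs. today")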
Here’s how that looks in the Streamlit app.
For Elon Musk, on the other hand, switching to the NBA would bring a significant salary decrease.
And that’s how easy it is to visualise your model in Streamlit. As a non-data scientist, I’d take that any day over a raw Jupyter notebook.
Let me know in the comments what you think and what you’d like me to write about.
I also create YT videos in my spare time to help Sales Engineers get better at their job. Go and have a look, some people find them entertaining (admittedly, very few): https://www.youtube.com/watch?v=HujphQY8xKA
That’s not all!
Stay tuned for more exciting things coming out of the Snowpark & Streamlit kitchen. We’re just starting!
TL;DR
Shame, but I get it, you don’t have time to read this — you have to quickly build a web application so that you can show your work to people who don’t understand Jupyter notebooks.
Before you leave, I just wanted to let you know that there’s an easier way to do exactly that and you don’t have to be a full-stack web developer. It’s called Streamlit.
Instead of doing

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"
You can do the same thing much more simply with Streamlit:

import streamlit as st

st.write("Hello, World!")

Seriously, that’s how easy it is.
Views and opinions expressed in this article are my own and do not represent those of my place of work. I expressly disclaim any liability or loss incurred by any person who acts on the information, ideas or strategies discussed in my stories on Medium.com. While I make every effort to ensure that the information I’m sharing is accurate, I welcome any comments, suggestions, or correction of errors.