We are going to briefly follow the Quickstart in the official MLFlow Documentation. We will create, serve and invoke a Machine Learning Model with MLFlow.
pip3 install mlflow
Create a new folder for our little project and create a new file called generate_model.py.
import pandas as pd
def __init__(self, n):
self.n = n
def predict(self, context, model_input):
return model_input.apply(lambda column: column + self.n)
# Construct and save the model
model_path = "Churn_one"
Churn_one = Churn_one(n=5)
# Load the model in `python_function` format
loaded_model = mlflow.pyfunc.load_model(model_path)
model_input = pd.DataFrame([range(10)])
model_output = loaded_model.predict(model_input)
assert model_output.equals(pd.DataFrame([range(5, 15)]))
Briefly explained: we import mlflow. We create a Class that which is kind of our Class Model/Classifier. With the next three lines we save our Class as a “pyfunc” model. You can read what a pyfunc is here. Now execute this file with
MLFlow will create a new Folder “Churn_one”. This Folder is a self-contained Version of the Class-Model from the previous file. Inside this Folder we see following:
The conda.yaml is the definition of your python environment that is needed to make your Model work. MLmodel is a File that contains meta-data about your model – the syntax is very poorly explained in the official docs. The pkl will contain pickled data for our model.
Serving the Model with MLFlow
Now we need to serve our Machine Learning Model. You can either create a Flask server, like we discussed in this post. But we are going to use the built-in feature of MLFlow serve. This will spawn a Flask Server for us and do all the work. We start a ML Server with
Now we can invoke prediction process for our model. We have to create a POST Request to http://localhost:5000/invocations . Obviously, if you deploy this server into an EC2 Instance or another public server, you’ll be able to call your public ip to invoke your ML Model like: http:PUBLIC_IP:5000/invocations.
Check out the Mlflow Online Courses
MLflow for Beginners
MLOps is a breed of Machine Learning and DevOps. The most popular framework in this niche is the MLFlow Framework. Do you need to create trackable, reproducible and scalable Machine Learning Applications? This Online Course willl teach you MLflow from scratch. Pre-Order Course – still in development.
I’ll use a UI Request Generator and create a POST Request. I expect the ML Server to receive this Request, make some predictions with the model, and give me back some results. The data that our Model needs to run predictions, will be supplied in the POST Request in JSON Format.
In the Screenshot above you see the POST Request to our Server and the JSON Payload with our Prediction Data. The data has to be in JSON Format. And this Format has to have keys of Columns and Data. These Columns will be converted to a pandas DataFrame. This DataFrame will then be used to insert into your model to run the predictions.
Below is the result of running predictions on our Model with the supplied data. And that is it. The Workflow is pretty simple once you got it. You create an MLModel out of your current Machine Learning Model. You use this new MLModel Format to serve your model. The hardest part is most probably the correct converting of your Python Model Classes/Function into the MLModel Format without any limits.
We compare two Machine Learning and Data Science frameworks – MLFlow vs. Metaflow. These Data Science and Machine Learning Frameworks are the most popular in their category – ML. They provide you a fixed set or best practices, methods, classed and helping tools (like UIs or APIs). They support your ML or DS Project Lifecycle.
MLFlow was developer and open sourced by Databricks. Here is how MetaFlow describes itself in their Intro Blog Post:
MLflow is designed to work with any ML library, algorithm, deployment tool or language. It’s built around REST APIs and simple data formats (e.g., a model can be viewed as a lambda function) that can be used from a variety of tools, instead of only providing a small set of built-in functionality. This also makes it easy to add MLflow to your existing ML code so you can benefit from it immediately, and to share code using any ML library that others in your organization can run.
The official Metaflow description of itself is very good, so here goes copy & paste:
Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix to boost productivity of data scientists who work on a wide variety of projects from classical statistics to state-of-the-art deep learning.
Our framework provides a unified API to the infrastructure stack. It’s required to execute data science projects – from prototype to production.
Community, Popularity and Reliability
The popularity of framework is pretty important. Here is why. When you run into a problem, you will turn to Google and StackOverflow for help and you will beg that someone knows what framework you are using. So here is a Google Trends for MLFlow vs Metaflow. As you can see below, metaflow peaked shortly in late 2019. But overall, MLFlow wins.
Another critical point is the Community behind a Codebase. The bigger the community the more reliable (in most cases) the codebase. The following Github Screenshots speak for themselves. And MLFLow wins again.
Meta-Logging, Training, Deploying
A ML Framework should provide us with a healthy balance between concrete ways to implement a new project – which reduces complexity. But it also should give us enough freedom to experiment and stay flexible for unusual business/technical environments.
The most pressing issue, in my experience, is the fact that when you train your model locally, you don’t save any data about this process. Re-producing the same results may be difficult. Or even a more simple task, just retrieve the training results from last week…you can’t, cause you started to tune your hyperparameters and retrained your model 10 times more that day. An ML framework should collect metadata about these and other similar processes.
Let’s take a look at MLFlow from a practical side. Installation, Usage, UI, etc.
pip3 install mlflow
Usage: mlflow [OPTIONS] COMMAND [ARGS]...
--version Show the version and exit.
--help Show this message and exit.
artifacts Upload, list, and download artifacts from an MLflow artifact...
azureml Serve models on Azure ML.
db Commands for managing an MLflow tracking database.
experiments Manage experiments.
models Deploy MLflow models locally.
run Run an MLflow project from the given URI.
runs Manage runs.
sagemaker Serve models on SageMaker.
server Run the MLflow tracking server.
ui Launch the MLflow tracking UI for local viewing of run...
Serving on http://localhost:5000
Now let’s run the basic dummy script below to generate some meta/tracking data.
# Start an MLflow run
# Log a parameter (key-value pair)
# Log a metric; metrics can be updated throughout the run
mlflow.log_metric("foo", 2, step=1)
mlflow.log_metric("foo", 4, step=2)
mlflow.log_metric("foo", 6, step=3)
# Log an artifact (output file)
with open("output.txt", "w") as f:
MLFlow UI Experiments
On the frontpage you can see all the executions that i executed either with “mlflow run …” or with “python mlflow_code.py”.
You can see even those executions that failed. Which is pretty awesome for developing and debugging.
Extremely important is the feature of comparing different experiments with one another which you can do on this page.
MLFlow UI Params
On the Single Experiment Page you’ll see stuff like Date, User, Source, Duration of this particular Experiment (Execution, Flow).
You also can see Tags and a List of Artifacts with Preview mode – which is awesome. In our case the dummy code from above generated a text file with “Hello World” in it!
MLFlow UI Metric Chart
Another awesome feature of MLFLow is the chart which will display your metrics (that you set up manually) in a chart.
First of all, i’m on Windows. You can’t use MetaFlow on Windows without some crazy tweaks. In fact, MetaFlow said they won’t support Windows. So this is already is a nogo for me (shouldn’t be a problem if you are on Linux). For the purpose of demonstration i’ll boot a Linux EC2 Instance and play around there – obviously, you can’t develop effectively with EC2.
$ pip3 install metaflow==2.0.1
Metaflow (2.0.1): More data science, less engineering
http://docs.metaflow.org - Read the documentation
http://chat.metaflow.org - Chat with us
email@example.com - Get help by email
metaflow tutorials Browse and access metaflow tutorials.
metaflow configure Configure metaflow to access the cloud.
metaflow status Display the current working tree.
metaflow help Show all available commands to run.
$ metaflow tutorials pull
Pulling episode "00-helloworld" into your current working directory.
Pulling episode "01-playlist" into your current working directory.
Pulling episode "02-statistics" into your current working directory.
Pulling episode "03-playlist-redux" into your current working directory.
Pulling episode "04-playlist-plus" into your current working directory.
Pulling episode "05-helloaws" into your current working directory.
Pulling episode "06-statistics-redux" into your current working directory.
Pulling episode "07-worldview" into your current working directory.
To know more about an episode, type:
metaflow tutorials info [EPISODE]
$ cd metaflow-tutorials/00-helloworld
That’s it. MetaFlow has a fancy CLI and no UI (to my knowledge). Compared to MLFlow, Metaflow has a more granular process control. For example, you can track and manage every single method in your Python code. They are so called steps in MetaFlow and they (python methods – metaflow steps) are controlled via Python decorators. The file “helloworld.py” file contains a so called “Flow” (Collection of “steps” or python methods).
$ python3 helloworld.py show
Metaflow 2.0.1 executing HelloFlow for user:ubuntu
A flow where Metaflow prints 'Hi'.
Run this flow to validate that Metaflow is installed correctly.
This is the 'start' step. All flows must have a step named 'start' that
is the first step in the flow.
A step for metaflow to introduce itself.
This is the 'end' step. All flows must have an 'end' step, which is the
last step in the flow.
$ python3 helloworld.py run
Metaflow 2.0.1 executing HelloFlow for user:ubuntu
Validating your flow...
The graph looks good!
Pylint not found, so extra checks are disabled.
Creating local datastore in current directory (/home/ubuntu/metaflow-tutorials/00-helloworld/.metaflow)
2020-01-30 14:50:52.288 Workflow starting (run-id 1580395852284119):
2020-01-30 14:50:52.291 [1580395852284119/start/1 (pid 18456)] Task is starting.
2020-01-30 14:50:52.660 [1580395852284119/start/1 (pid 18456)] HelloFlow is starting.
2020-01-30 14:50:52.691 [1580395852284119/start/1 (pid 18456)] Task finished successfully.
2020-01-30 14:50:52.695 [1580395852284119/hello/2 (pid 18462)] Task is starting.
2020-01-30 14:50:53.068 [1580395852284119/hello/2 (pid 18462)] Metaflow says: Hi!
2020-01-30 14:50:53.101 [1580395852284119/hello/2 (pid 18462)] Task finished successfully.
2020-01-30 14:50:53.105 [1580395852284119/end/3 (pid 18468)] Task is starting.
2020-01-30 14:50:53.478 [1580395852284119/end/3 (pid 18468)] HelloFlow is all done.
2020-01-30 14:50:53.511 [1580395852284119/end/3 (pid 18468)] Task finished successfully.
2020-01-30 14:50:53.512 Done!
You see a pretty verbose logging over what step (method) is being executed. Every single Step like “start”, “hello” and “end” corresponds to a python method inside the “helloworld.py” file.
Conclusion – MLFlow vs. Metaflow
Short version: go with MLFlow!
And here is why: Metaflow lacks an overview or a UI that will make this metadata, logging & tracking more accessible to us developers. Also the easy comparison between flows or models isn’t there. Metaflow seems to be highly intertwined with AWS (Sagemaker), which is great.
Compared to a framework that integrates also with Google Cloud, MS Azure, etc. – not so great. The concept of steps gives you a granular control over your Project. Also, you can define python dependencies on method level – meaning that every step could have it’s own versions of libraries – which is awesome.
The Framework MLFlow tops with the less intrusive structure – it doesn’t try gain control over your methods with decorators. It has a useful UI and integrations to many Cloud providers.