By Martin Heller, Contributor

Review: Azure Machine Learning challenges Amazon SageMaker

Reviews
Feb 04, 2019 | 12 mins
Cloud Computing, Deep Learning, Machine Learning

Microsoft Azure’s plush, Python-based environment covers the full machine learning and deep learning development cycle


Azure Machine Learning Service is Microsoft’s latest offering for developers and data scientists in the custom cloud machine learning and deep learning category. Azure Machine Learning Service adds to a suite of Azure AI products that includes numerous AI toolkits, chatbot and IoT edge services, data science VMs, and pre-built services for vision, speech, language, knowledge, and search.  

[InfoWorld Editors' Choice award]

The AI toolkits include Visual Studio Code Tools for AI, the older drag-and-drop Azure Machine Learning Studio, MMLSpark deep learning tools for Apache Spark, and the Microsoft Cognitive Toolkit, previously known as CNTK, which is being de-emphasized in favor of other machine learning and deep learning frameworks.

Using cloud resources to train deep learning models makes eminent sense in many cases. The cloud doesn’t necessarily replace the convenience and low operating cost of building models on your own computer, especially if you have one with lots of RAM and a capable GPU such as an Nvidia Titan RTX. On the other hand, the cloud lets you add compute resources as needed, potentially reducing the time it takes to complete your experiments and find a sufficiently accurate predictive model.

All of the major cloud services now offer machine learning and deep learning development environments. On AWS, that’s primarily Amazon SageMaker, which I reviewed in May 2018. On the Google Cloud Platform, that’s primarily Cloud Machine Learning Engine and the beta Cloud AutoML. On the IBM cloud, that’s primarily IBM Watson Studio. I’ll compare Azure Machine Learning Service with Amazon SageMaker later in this review.

Azure Machine Learning represents an evolution of Microsoft’s approach to a cloud machine learning development environment. The old (but still active) Machine Learning Studio product is an inexpensive, rather dumbed-down, drag-and-drop design environment for the whole data analysis pipeline. It provides a good selection of basic machine learning algorithms, but little support for deep learning and no support at all for GPUs or FPGAs.

[Figure: Azure Machine Learning service overview. Credit: IDG]

The Azure Machine Learning Service is part of an integrated suite of models, frameworks, services, infrastructure, and deployment options. The new features of Azure Machine Learning include AutoML, hyperparameter tuning, distributed deep learning, and FPGA support for specific models.

Azure Machine Learning overview

Azure Machine Learning, by contrast, embraces Python programming and builds on several APIs and frameworks, including all of the currently popular deep learning frameworks. The new features of Azure Machine Learning include AutoML, hyperparameter tuning, distributed deep learning, and FPGA support for specific models—all hot-button items desired by data scientists and model developers.

The overall workflow for Azure Machine Learning, shown in the figure below, moves from data preparation to model building, to training and testing, to model management and deployment. This linear view of the process is grossly oversimplified; a more realistic diagram would show arrows going both forward and backward.

[Figure: Azure Machine Learning workflow. Credit: IDG]

This figure is a high-level view of the Azure Machine Learning workflow. In real life, most projects require iteration among the steps to find the best models, and to adapt the models to data that changes over time.

Azure Machine Learning model development

Azure Machine Learning supports five environments for model development: Azure Notebooks, the Data Science Virtual Machine (DSVM), Jupyter Notebooks, Visual Studio Code, and Azure Databricks. In all cases the Azure Machine Learning Python SDK is a required additional component.

Azure Notebooks is a Jupyter Notebooks service that is hosted in the Azure cloud. The Azure Machine Learning SDK is already installed in Azure Notebooks. The Azure team recommends using Azure Notebooks to get started, and has provided a selection of Azure ML samples as Azure Notebooks.

The Data Science Virtual Machine already has Python 3, Conda, Jupyter Notebooks, and the Azure Machine Learning SDK installed. The DSVM comes with popular machine learning and deep learning frameworks, tools, and editors for developing machine learning solutions.

You may already have Jupyter Notebooks installed in your development environment. You may also have Visual Studio Code. Both can be enhanced with the Azure Machine Learning SDK and plug-ins.
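For a local setup, the work boils down to a pip install plus a connection to your workspace. The sketch below assumes a supported Python 3 environment and a config.json file downloaded from the workspace page in the Azure portal; the bracketed extras on the package are optional.

    # Shell: install the SDK (the [notebooks,automl] extras are optional)
    #   pip install --upgrade azureml-sdk[notebooks,automl]

    # Python: connect to an existing Azure Machine Learning workspace.
    # Workspace.from_config() reads config.json from the current directory.
    from azureml.core import Workspace

    ws = Workspace.from_config()
    print(ws.name, ws.resource_group, ws.location, sep="\t")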

Azure Databricks is an Apache Spark-based service with a Jupyter Notebooks UI. You can install the Azure Machine Learning SDK in that environment and deploy models to run on big data.

Azure Machine Learning framework support

Azure Machine Learning supports any Python-based machine learning or deep learning framework. According to Microsoft, the most popular ones—i.e., Scikit-learn for classical machine learning and PyTorch, TensorFlow, and Chainer for deep learning—have first-class support in the SDK via the Estimator class. (I only found specific Estimator classes for PyTorch and TensorFlow when I viewed the documentation on January 18, 2019.) You can create and use an Estimator to submit any training code you want to run on remote compute, whether it’s a single-node run or distributed training across a GPU cluster.

The Estimator support for TensorFlow includes two flavors of distributed back-end training, MPI/Horovod and Parameter Server. The Parameter Server option uses a native TensorFlow facility. The Estimator support for PyTorch only includes one flavor of distributed back-end training, MPI/Horovod. The ability to set up distributed training by changing a few arguments in the Estimator constructor is very nice.
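As a rough sketch of what that looks like, here is a TensorFlow estimator configured for two-node MPI/Horovod training, based on the SDK documentation current at the time of this review. The script and compute target names (train.py, gpu-cluster) are placeholders, and some parameter names have shifted between SDK releases.

    from azureml.core import Workspace, Experiment
    from azureml.train.dnn import TensorFlow

    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name="tf-distributed-demo")

    # Run train.py on two nodes of an existing GPU cluster, distributing
    # the training with MPI/Horovod ("ps" would select Parameter Server).
    estimator = TensorFlow(
        source_directory="./scripts",   # folder containing train.py
        entry_script="train.py",
        compute_target="gpu-cluster",   # an Azure ML compute target created earlier
        node_count=2,
        process_count_per_node=1,
        distributed_backend="mpi",
        use_gpu=True,
    )

    run = experiment.submit(estimator)
    run.wait_for_completion(show_output=True)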

If you know about the new Estimator class in TensorFlow, you may find the Azure Machine Learning Estimator class a little confusing. To avoid that, think of both classes in terms of their fully-qualified names, tf.estimator and azureml.train.estimator.

Azure Machine Learning services

Azure Machine Learning provides most (if not all) of the services you need to perform data science in the cloud end to end. That includes data preparation, experiment tracking, model training, model management and deployment, and model monitoring. It also includes hyperparameter tuning, AutoML, and pipelines.

Data prep

The Azure Machine Learning Data Prep SDK helps you load, transform, and write data for machine learning workflows. The azureml.dataprep package was designed to handle both the small data sets you’re likely to work with on your own computer and the huge data sets you’ll encounter when you work with Spark or data lakes. Don’t underestimate the importance of data preparation: It often accounts for the bulk of the effort required to create a good model, even though the data prep APIs tend to use modest amounts of CPU time.
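Here’s a minimal taste of the Data Prep SDK, assuming a hypothetical local CSV file; the Dataflow methods shown (keep_columns, drop_nulls, head) are a tiny subset of what the package offers.

    import azureml.dataprep as dprep

    # Reading is lazy: nothing is pulled into memory until you materialize it.
    dflow = dprep.read_csv(path="data/green_taxi.csv")   # hypothetical file

    # Keep only the columns of interest and drop rows with missing values.
    columns = ["fare_amount", "trip_distance", "passenger_count"]
    dflow = dflow.keep_columns(columns).drop_nulls(columns)

    # Materialize the first few rows as a Pandas DataFrame for inspection.
    print(dflow.head(5))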

Model training

Training is the piece of the deep learning workflow that takes the most compute time. Azure Machine Learning can use essentially any Python framework for machine learning or deep learning, as discussed in the section on supported frameworks and the Estimator class above. Azure Machine Learning can train your model locally or in the cloud, and can send training jobs to a variety of compute targets including Azure Machine Learning Compute, remote VMs, Azure Databricks, Azure Data Lake Analytics, and Azure HDInsight.

Your Azure compute targets may use VMs with many CPUs, one or more GPUs, and/or one or more FPGAs. The targets can be configured as scalable clusters that instantiate on demand and stop themselves when training runs are complete. Using expensive GPU instances only for short periods while actively training models keeps the costs down. Amazon SageMaker does essentially the same thing.
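Creating one of those autoscaling clusters takes only a few lines with the SDK. This sketch provisions a GPU cluster that scales down to zero nodes when idle; the cluster name and size limits are arbitrary choices.

    from azureml.core import Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget

    ws = Workspace.from_config()

    # An autoscaling GPU cluster: min_nodes=0 means you pay for GPU time
    # only while training runs are active.
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_NC6",
        min_nodes=0,
        max_nodes=4,
        idle_seconds_before_scaledown=600,
    )

    cluster = ComputeTarget.create(ws, "gpu-cluster", config)
    cluster.wait_for_completion(show_output=True)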

Experiment tracking

Your Azure Machine Learning workspace keeps track of all of your experiments, training runs within experiments, pipelines, compute targets, models, images, deployments, and activities. You and your colleagues can view these at any time. The screenshot below focuses on the Experiments list.

[Screenshot: Azure Machine Learning experiments list. Credit: IDG]

Azure Machine Learning helps you manage your experiments in the dashboard. Drilling down into one of the listed experiments will show you the runs you have done.
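Runs land in that history whether they are submitted to remote compute or logged interactively. Here’s a small sketch of interactive logging, with made-up experiment, metric names, and values.

    from azureml.core import Workspace, Experiment

    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name="sklearn-demo")

    # Metrics logged here show up in the experiment's run history dashboard.
    run = experiment.start_logging()
    run.log("alpha", 0.03)     # hypothetical hyperparameter
    run.log("rmse", 4.27)      # hypothetical result
    run.complete()

    # Enumerate the runs recorded for this experiment.
    for r in experiment.get_runs():
        print(r.id, r.get_metrics())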

AutoML and hyperparameter tuning

Sometimes finding the best model for a given data set feels like finding a needle in a haystack. Azure Machine Learning has two mechanisms to help you with your search: AutoML and hyperparameter tuning. With AutoML, you provide the data set, and Azure Machine Learning automatically sweeps through features, algorithms, and hyperparameters. You can control the number of models generated and trained, the maximum number of iterations, the primary metric, and the number of cross-validation splits.

The algorithms are chosen from a list that depends on the type of experiment (classification, regression, or prediction). You can optionally blacklist algorithms that you know don’t work well for your data, but when you are first working with a data set you’ll want to let AutoML explore all possibilities.
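A bare-bones AutoML regression setup looks something like the sketch below, with a toy synthetic data set standing in for real features and labels. In the SDK version I used, you pass the features and target as X and y; the exact arguments have evolved across releases.

    import numpy as np
    from azureml.core import Workspace, Experiment
    from azureml.train.automl import AutoMLConfig

    ws = Workspace.from_config()
    experiment = Experiment(workspace=ws, name="automl-regression-demo")

    # Toy data: 200 rows, 3 features, and a noisy linear target.
    X = np.random.rand(200, 3)
    y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * np.random.randn(200)

    automl_config = AutoMLConfig(
        task="regression",
        primary_metric="spearman_correlation",
        iterations=30,              # cap on the number of models to try
        n_cross_validations=5,
        X=X,
        y=y,
    )

    run = experiment.submit(automl_config, show_output=True)
    best_run, fitted_model = run.get_output()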

Hyperparameter tuning is more focused. You provide an existing experiment, specify the hyperparameters you want to sweep (such as the learning rate and dropout rate), and Azure Machine Learning runs the sweep for you, automatically terminating training runs that aren’t performing well before they reach the maximum number of iterations.
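The sweep is driven by a HyperDrive configuration wrapped around an estimator like the TensorFlow one shown earlier. This is a sketch only: the script argument names (--learning-rate, --dropout) and the metric name are placeholders, and earlier SDK releases called the configuration class HyperDriveRunConfig.

    from azureml.train.hyperdrive import (
        HyperDriveConfig, RandomParameterSampling, BanditPolicy,
        PrimaryMetricGoal, uniform, choice,
    )

    # Randomly sample learning rate and dropout values to pass to the training script.
    sampling = RandomParameterSampling({
        "--learning-rate": uniform(1e-4, 1e-1),
        "--dropout": choice(0.25, 0.5, 0.75),
    })

    # Bandit early termination stops runs that lag too far behind the best run so far.
    policy = BanditPolicy(slack_factor=0.1, evaluation_interval=2)

    hyperdrive_config = HyperDriveConfig(
        estimator=estimator,                   # the estimator defined earlier
        hyperparameter_sampling=sampling,
        policy=policy,
        primary_metric_name="validation_acc",  # must match a metric the script logs
        primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
        max_total_runs=20,
    )

    run = experiment.submit(hyperdrive_config)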

Pipelines

Azure Machine Learning Pipelines allow the data scientist to modularize model training into discrete steps such as data movement, data transforms, feature extraction, training, and evaluation. Pipelines are a mechanism for sharing and reproducing models, and help you to skip steps that have already run when you are trying variations on a specific step, such as training. Not every compute target is “pipeline-friendly,” and some compute targets can only be called as pipeline steps.
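Here’s a skeletal two-step pipeline, assuming prep.py and train.py scripts and compute targets named cpu-cluster and gpu-cluster already exist; in a real pipeline you’d also wire up data inputs and outputs between the steps.

    from azureml.core import Workspace, Experiment
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep

    ws = Workspace.from_config()

    # Each step is a modular, reusable unit of work.
    prep_step = PythonScriptStep(
        name="prepare data",
        script_name="prep.py",
        source_directory="./scripts",
        compute_target="cpu-cluster",
    )

    train_step = PythonScriptStep(
        name="train model",
        script_name="train.py",
        source_directory="./scripts",
        compute_target="gpu-cluster",
    )
    train_step.run_after(prep_step)   # explicit ordering with no data dependency

    pipeline = Pipeline(workspace=ws, steps=[train_step])
    run = Experiment(ws, "pipeline-demo").submit(pipeline)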

Model management, deployment, and monitoring

Once you have a model that is good enough for your purposes, you can register and deploy it. On the Azure cloud, you’ll typically package models into Docker containers and deploy them in the Azure Kubernetes Service (AKS). If you’re deploying to an edge device, which can be anything from a single smart sensor to the controller for an entire factory, the Docker containers are fed into Azure IoT Edge. No matter where you deploy, Azure Machine Learning can monitor both service health and model health.
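In SDK terms, the register-and-deploy path looks roughly like the sketch below. The model file, scoring script, and AKS cluster name are hypothetical, and the deployment API has changed shape across SDK versions, so treat this as an outline rather than gospel.

    from azureml.core import Workspace
    from azureml.core.model import Model, InferenceConfig
    from azureml.core.webservice import AksWebservice

    ws = Workspace.from_config()

    # Register a trained model file with the workspace's model registry.
    model = Model.register(
        workspace=ws,
        model_path="outputs/taxi_fare_model.pkl",   # hypothetical training output
        model_name="taxi-fare-regressor",
    )

    # score.py must define init() and run(raw_data) for the scoring container.
    inference_config = InferenceConfig(entry_script="score.py")

    # Deploy the containerized model to an AKS cluster attached to the workspace.
    deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)
    service = Model.deploy(ws, "taxi-fare-service", [model],
                           inference_config, deployment_config,
                           deployment_target=ws.compute_targets["aks-cluster"])
    service.wait_for_deployment(show_output=True)
    print(service.scoring_uri)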

Azure Machine Learning examples

I spent about a day going through many of the supplied Azure Machine Learning samples. My initial inclination was to develop with Visual Studio Code locally on my MacBook Pro. However, I was unable to install the Azure Machine Learning SDK from the command line; the details are in the sidebar.

Fortunately, using Azure Notebooks doesn’t require any local installations, other than a working web browser, and all of the examples were set up for Notebooks, so I did my experiments that way.

[Screenshot: Azure Machine Learning sample notebooks in Azure Notebooks. Credit: IDG]

The “Welcome” Azure Machine Learning notebooks demonstrate approximating the value of Pi. If you dive into the how-to-use-azureml folder or load the Azure/MachineLearningNotebooks GitHub page you’ll find some more interesting samples.

Among the more interesting examples was one for AutoML regression after data prep and merging of two NYC taxi data sets, with the summary shown below. The sample is in the tutorials folder of the Welcome project.

[Screenshot: Jupyter widget showing AutoML regression results. Credit: IDG]

A Jupyter widget in Azure Notebooks showing the best results from a 30-model AutoML regression run.

Another interesting sample, from the Azure/MachineLearningNotebooks GitHub repository, shows how to do distributed Horovod training of a language-based TensorFlow model, using two Standard_NC6 VMs, each of which has 6 vCPU, 56 GiB RAM, and 1 K80 GPU with 8 GiB of GPU memory.

[Screenshot: Notebook creating a distributed TensorFlow estimator. Credit: IDG]

The notebook above shows how to create a TensorFlow estimator that will run on two nodes using MPI/Horovod distribution.

My day’s work and five-plus days of storage incurred charges of a whopping $1.27. Most of that was for the container registry ($0.17 per day). The charge for the distributed training on NC6 VMs with K80 GPUs was $0.29.

A for Azure Machine Learning

Overall, I found Azure Machine Learning to be an excellent environment for machine learning and deep learning development and deployment. Aside from the issue I encountered installing the SDK on my MacBook Pro—due to my running an unsupported version of Python (whoops)—everything worked well, and nothing cost more than I expected.

I like the way that Azure Notebooks integrates Jupyter with Azure Machine Learning. I like the way you can configure estimators to run on an autoscaling cluster. I was annoyed by the time it took to configure a compute cluster on the first run, but in future runs with the same specs that overhead pretty much disappears.

I like the way AutoML and hyperparameter tuning allow you to delegate the search for a best model to the service. I also like how easily you can incorporate TensorFlow and PyTorch models as estimators, and perform distributed training on CPUs and GPUs.

I haven’t tested all of the features of Azure Machine Learning, and I haven’t used it for real data science on large, dirty data sets. From what I did test, however, Azure Machine Learning seems to be a worthy competitor to Amazon SageMaker.

Cost: $0.051 to $18.46 per hour, depending on the number of CPUs, GPUs, and FPGAs and the amount of RAM; storage costs are additional.

Platform: Microsoft Azure, plus an optional local Windows, macOS, or Linux development environment with Python 3.


Martin Heller is a contributing editor and reviewer for InfoWorld. Formerly a web and Windows programming consultant, he developed databases, software, and websites from his office in Andover, Massachusetts, from 1986 to 2010. More recently, he has served as VP of technology and education at Alpha Software and chairman and CEO at Tubifi. Disclosure: He also writes for Hewlett-Packard’s TechBeacon marketing website.
