by Michael Berthold

How to manage machine learning models

Jan 09, 2019

Analytics | Machine Learning | Software Development

Monitoring and management not only keep models in tune, but can help streamline the machine learning process


In today’s fast-paced analytics development environments, data scientists are often tasked with far more than building a machine learning model and deploying it into production. Now they’re charged with regularly monitoring, fine-tuning, updating, retraining, replacing, and jump-starting models—and in some cases, hundreds or even thousands of models collectively.

As a result, different levels of model management have emerged. In the following, I try to highlight each, from single model management all the way through building an entire model factory.

Machine learning workflow basics

You may be wondering: how do I use the result of my training procedure to score new incoming data? There are many options. You can score within the same system that was used for training, or export the model in a standardized format. Alternatively, you can push the model into another system, for example by translating it into SQL statements that run inside a database or by containerizing it for an entirely different runtime environment. From the model management perspective, you just need to be able to support whichever options are required.

The standard process looks like this:

[Figure: machine learning model management, part 1 (KNIME)]

Note: In reality, the model alone is often not very useful unless at least part of the data processing (transformation and integration) is deployed as part of the “model” in production. This is where many deployment options show surprising weaknesses: they support deploying only the predictive model itself.
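As a rough illustration, here is a minimal sketch, assuming scikit-learn and joblib are available, of bundling the preprocessing and the predictor into one pipeline so a single artifact is deployed; the file name is made up.

# A minimal sketch: ship preprocessing and model together as one artifact.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
import joblib

X, y = load_iris(return_X_y=True)

# The "model" sent to production is the whole pipeline: scaling plus classifier.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)

# Export a single artifact; the scoring side only loads it and calls predict().
joblib.dump(pipeline, "model_v1.joblib")
scorer = joblib.load("model_v1.joblib")
print(scorer.predict(X[:5]))

Because the scaler travels with the classifier, the scoring side cannot accidentally feed the model untransformed data.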

Machine learning model evaluation and monitoring

One vital part of model management is making sure the model keeps performing as it should. Periodically testing against data collected in the past—as many data scientists are forced to do—only provides assurance that the model has not suddenly changed. Continuous monitoring lets you measure whether the model is starting to “drift,” that is, to become outdated because reality has changed. Sometimes it is also advisable to include manually annotated data to test border cases or simply to make sure the model is not making gross mistakes.

Ultimately, model evaluation should result in a score measuring some form of model quality, such as classification accuracy. Sometimes you will want a more application-dependent measurement such as expected cost or a risk measure. What you do with that score, however, is another story.
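As a sketch of what such a check might look like, assuming scikit-learn-style models serialized with joblib, the following compares accuracy on a window of recent labeled data against a baseline recorded at deployment time; the baseline, tolerance, and file name are illustrative assumptions.

# Sketch of a monitoring check: score the deployed model on recent labeled data
# and flag possible drift when the score falls below a recorded baseline.
import joblib
from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.92   # assumed accuracy measured at deployment time
DRIFT_TOLERANCE = 0.05     # assumed allowed drop before we flag drift

def monitor(model_path, X_recent, y_recent):
    model = joblib.load(model_path)
    current = accuracy_score(y_recent, model.predict(X_recent))
    return {"accuracy": current,
            "drifting": current < BASELINE_ACCURACY - DRIFT_TOLERANCE}

# Example: report = monitor("model_v1.joblib", X_window, y_window)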

Updating and retraining machine learning models

At the next stage, we move beyond monitoring to actually managing something. Suppose your monitoring solution starts reporting more and more errors. You can trigger automatic model updating, retraining, or even complete replacement of the model.

Some model management setups simply train a new model and then deploy it. However, since training can take significant resources and time, a more sensible approach is to make this switch dependent on performance. A performance threshold ensures that it is actually worth replacing the existing model. Run an evaluation procedure to take the previous model (often called the champion) and the newly (re)trained model (the challenger); score them and decide whether the new model should be deployed or the old one kept in place. In some cases, you may only want to go through the hassle of model deployment when the new model significantly outperforms the old one.
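A minimal sketch of that champion/challenger decision, assuming scikit-learn-style models and accuracy as the score; the improvement margin is an illustrative assumption.

# Sketch of a champion/challenger decision with a minimum improvement margin.
from sklearn.metrics import accuracy_score

MIN_IMPROVEMENT = 0.02  # assumed margin the challenger must beat the champion by

def choose_model(champion, challenger, X_eval, y_eval):
    champ_score = accuracy_score(y_eval, champion.predict(X_eval))
    chall_score = accuracy_score(y_eval, challenger.predict(X_eval))
    if chall_score >= champ_score + MIN_IMPROVEMENT:
        return challenger, chall_score   # worth the hassle of deployment
    return champion, champ_score         # keep the existing model in place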

Even with continuous monitoring, retraining, and replacement, machine learning models can still struggle with seasonality if you don’t take precautions elsewhere in your management system. For example, if the model is predicting sales quotas of clothing, seasons will affect those predictions dramatically. If you monitor and retrain on a monthly basis, year after year, you can effectively train models to adjust to the current season. You can also manually set up a mix of seasonal models that are weighted differently, depending on the season.
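One way to sketch such a weighted mix, assuming binary classifiers that expose predict_proba and a simple two-season split (both assumptions chosen only for illustration):

# Sketch of blending two seasonal models with month-dependent weights.
def seasonal_weights(month):
    # Hypothetical scheme: weight a "summer" and a "winter" model by distance from July.
    summer = 1.0 - abs(month - 7) / 6.0
    return {"summer": summer, "winter": 1.0 - summer}

def blended_predict(models, month, X):
    weights = seasonal_weights(month)
    return sum(weights[name] * models[name].predict_proba(X)[:, 1] for name in models)

# Example: scores = blended_predict({"summer": m_summer, "winter": m_winter}, 12, X_new)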

Sometimes models need to guarantee specific behavior for certain cases. Injecting expert knowledge into model learning is one way to do this, but having a separate rule model in place that can override the output of the trained model is a more transparent solution.
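A sketch of such a rule layer, with a made-up expert rule and a scikit-learn-style model interface:

# Sketch of a rule layer that overrides the trained model for specific cases.
def predict_with_rules(model, record, features):
    # Made-up expert rule: negative balances are always flagged, whatever the model says.
    if record.get("account_balance", 0) < 0:
        return "high_risk"
    return model.predict([features])[0]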

While some models can be updated incrementally, many of the updating algorithms are forgetful: data from a long time ago plays less and less of a role in determining the model’s parameters. This is sometimes desirable, but it is hard to properly adjust the rate of forgetting.

An alternative is to retrain a model, building a new model from scratch. This lets you use an appropriate data sampling (and scoring) strategy to make sure the new model is trained on the right mix of past and recent data.
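One possible sampling strategy, sketched with pandas; the 90-day cutoff and 10 percent sampling rate are illustrative assumptions:

# Sketch of one sampling strategy: keep all recent records plus a down-sampled
# slice of older history, so old data contributes without dominating.
import pandas as pd

def build_training_set(df, timestamp_col, cutoff_days=90, old_sample_rate=0.10):
    cutoff = df[timestamp_col].max() - pd.Timedelta(days=cutoff_days)
    recent = df[df[timestamp_col] >= cutoff]
    older = df[df[timestamp_col] < cutoff].sample(frac=old_sample_rate, random_state=0)
    return pd.concat([recent, older]).sort_values(timestamp_col)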

Now, the management process looks a bit more like this:

[Figure: machine learning model management, part 2 (KNIME)]

Managing multiple machine learning models

Suppose you now want to continuously monitor and update/retrain an entire set of models. You could handle this in the same way as the single-model case, but with more than one model, new issues arise around the interface and the management itself. How do you communicate the status of many models to users and let them interact with those models, and who controls the execution of all of those processes? You need a dashboard view of all of the models, with the ability to manage and control each one individually.

Most workflow tools allow their internals to be exposed as services, so you can envision a separate program making sure your individual model management process is being called properly. You can either build a separate application or use existing open source software that orchestrates the modeling workflows, supervises these processes, and summarizes their outputs.
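As a sketch of such a supervisor, assume each model’s management workflow is exposed as a Python callable (in practice it would often be a REST call to a workflow service); the registry and status fields below are assumptions.

# Sketch of a thin supervisor: loop over all model workflows and collect
# their statuses for a dashboard instead of halting on the first failure.
def run_all(model_registry):
    dashboard = []
    for name, workflow in model_registry.items():
        try:
            result = workflow()   # e.g. monitor, then retrain/replace if needed
            dashboard.append({"model": name, "status": "ok", **result})
        except Exception as exc:
            dashboard.append({"model": name, "status": "failed", "error": str(exc)})
    return dashboard

# Example: run_all({"churn": churn_workflow, "upsell": upsell_workflow})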

Managing large numbers of machine learning models gets even more interesting when you group them into model families. Models that predict very similar behavior can be handled in similar ways. This is particularly useful if you regularly need a new model: when models are similar, you can save time and effort by initializing a new model from existing models in the family rather than starting from scratch or training only on isolated past data. You can use either the most similar model (determined by some measure of similarity between the modeled objects) or a mix of models for initialization.
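A sketch of initializing from the most similar family member, assuming you have a similarity function over the modeled objects and models that can be copied and then fine-tuned (both assumptions):

# Sketch of jump-starting a new model from the most similar member of its family.
import copy

def init_from_family(family, similarity, new_object):
    # family maps each modeled object to its trained model (an assumption).
    closest = max(family, key=lambda obj: similarity(obj, new_object))
    # Start the new model as a copy of the closest one, to be fine-tuned on new data.
    return copy.deepcopy(family[closest])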

The model management setup now looks like this:

[Figure: machine learning model management, part 3 (KNIME)]

If you abstract the interfaces between model families sufficiently, you should be able to mix and match at will. This allows new models to reuse load, transformation, (re)training, evaluation, and deployment strategies and combine them in arbitrary ways. For each model, you just need to define which specific process steps are used in each stage of this generic model management pipeline.

Take a look:

[Figure: machine learning model management, part 4 (KNIME)]
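In code, that mixing and matching might look like the following sketch, where each stage of the generic pipeline is just a named callable; the step names and signatures are assumptions, not any specific product’s API.

# Sketch of the generic pipeline as composable, named steps.
def run_pipeline(steps, context):
    data = steps["load"](context)
    data = steps["transform"](data)
    model = steps["train"](data)
    score = steps["evaluate"](model, data)
    steps["deploy"](model, score)
    return score

# Two models can share most steps and differ only where needed, for example:
# run_pipeline({**default_steps, "load": load_from_database}, context_a)
# run_pipeline({**default_steps, "deploy": deploy_as_sql}, context_b)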

There may be only two different ways to deploy a model but a dozen different ways to access data, plus a handful of options for transformation, training, and evaluation. If you had to build a separate end-to-end process for every combination, you would quickly end up with over a hundred variations to maintain.

Machine learning model factories

The final step in machine learning model management is to make the jump to creating model factories. This can be done by defining only the individual pieces (process steps) described above and combining them flexibly, for example through a configuration file. Then, whenever someone later wants to change the data access or the preferred model deployment, you only need to adjust that particular process step instead of fixing every process that uses it. This is a fantastic time saver.

At this stage, it makes sense to split the evaluation step into two parts, the part that computes the score of a model and the part that decides what to do with that score. The latter can include different strategies to handle champion/challenger scenarios and be independent of how you compute the actual score.

Then, putting a model factory to work is actually straightforward. Configuration setups define which incarnation of each process step is used for each model pipeline. For each model, you can automatically compare past and current performance and trigger retraining and updating. This is described in detail in this white paper on scaling model processes for the enterprise.
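A sketch of such a configuration-driven setup, with evaluation split into a scoring step and a separate decision step; all of the names below are illustrative assumptions.

# Sketch of a configuration-driven factory: each model's config names which
# incarnation of each process step to use; evaluation is split into scoring
# and a separate decision step.
STEP_IMPLEMENTATIONS = {
    "load":     {"csv": "load_csv", "database": "load_db"},
    "deploy":   {"rest": "deploy_rest", "sql": "deploy_sql"},
    "score":    {"accuracy": "score_accuracy", "cost": "score_cost"},
    "decision": {"threshold": "decide_threshold", "champion_challenger": "decide_cc"},
}

MODEL_CONFIGS = {
    "churn":  {"load": "database", "deploy": "rest",
               "score": "accuracy", "decision": "champion_challenger"},
    "demand": {"load": "csv", "deploy": "sql",
               "score": "cost", "decision": "threshold"},
}

def resolve(model_name):
    # Map one model's configuration onto concrete step implementations.
    cfg = MODEL_CONFIGS[model_name]
    return {step: STEP_IMPLEMENTATIONS[step][choice] for step, choice in cfg.items()}

print(resolve("churn"))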

This is a lot of information, but data scientists can master every level because they must. Today’s massive trove of information will soon seem minuscule. It is essential that we develop sound, reliable management practices now to handle the ever-growing volumes of data and the accompanying flood of models, so that we can ultimately make sense of it all.

Michael Berthold, Ph.D., is the founder and CEO at KNIME. His 25-plus years of research and industry expertise span data analytics, machine learning, artificial intelligence, and rule induction. Michael has a long history of working in academia as a professor at University of Konstanz as well as Carnegie Mellon and UC Berkeley, and in industry at Intel, Utopy, and Tripos. Follow Michael on Twitter, LinkedIn, and the KNIME blog.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.