Isaac Sacolick
Contributing writer

5 takeaways on scaling machine learning

feature
May 02, 2019 | 6 mins
Analytics, Artificial Intelligence, Machine Learning

Twitter and Facebook can teach us a lot about effective artificial intelligence


Many companies are just starting their machine learning journeys: according to a recent Gartner survey, 37% of organizations have implemented artificial intelligence in some form. If you’ve opened the door to machine learning, you might want to review the 10 questions to ask before starting a machine learning proof of concept or the complete guide to AI, machine learning, and deep learning.

Machine learning is evolving, with new commercial breakthroughs, scientific advancements, framework improvements, and best practices frequently reported.

We have a lot to learn from organizations that run large-scale machine learning programs and view artificial intelligence as core to their business. At the O’Reilly Artificial Intelligence Conference in New York last month, I saw several common trends across Facebook’s and Twitter’s machine learning programs.

Understand business needs and competitive factors

At Facebook, machine learning is used in many areas. On the Facebook home page, it powers search, language translation, news feed ranking, facial recognition in uploaded photos, and the selection of ads that are presented. Behind the scenes, machine learning is used for content understanding, speech recognition, content integrity, sentiment analysis, objectionable content detection, and fraudulent account detection.

Similarly, you can see Twitter’s machine learning at work in its tweet ranking, ad selection, search functions, and user recommendations. Machine learning is also used to flag abusive tweets, spam, and images not safe for work.

What may be less obvious is the scale of each machine learning operation and how the two companies invest in differentiating capabilities.

Facebook performs more than 200 trillion predictions per day for its 2.6 billion users. Its user base is global, many users face bandwidth limitations, and a significant share of interactions happen on mobile phones.

This poses challenges: 61% of global mobile users have phones that are six or more years old, and less than 10% are on the most advanced smartphones. Part of Facebook’s strategy is to shift more neural network computing to edge devices to drive scale, lower latency, and offer more personalized machine learning models. Facebook’s machine learning technology stack reflects its goals of making it easy to research new models while delivering inference at scale and offloading some computation to edge devices.

Twitter optimizes its models around scale and latency requirements. It performs tens of millions of predictions per second and trains some models on tens of terabytes of data per day. The company focuses on optimizing latency, the time it takes a model to respond, and has defined a prediction latency budget of tens of milliseconds.
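
To make the latency budget concrete, here is a minimal sketch, not Twitter’s code, of checking a model’s 99th-percentile prediction latency against a budget of tens of milliseconds. The 50 ms figure and the placeholder model are assumptions for illustration.

```python
import time

# Hypothetical latency check: compare p99 prediction latency against a
# budget of tens of milliseconds (50 ms here, chosen as an assumption).
LATENCY_BUDGET_MS = 50.0

def predict(features):
    # Placeholder for a real model call.
    return sum(features)

def p99_latency_ms(feature_batches, n_runs=1000):
    samples = []
    for i in range(n_runs):
        features = feature_batches[i % len(feature_batches)]
        start = time.perf_counter()
        predict(features)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[int(0.99 * len(samples)) - 1]

p99 = p99_latency_ms([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print(f"p99 latency: {p99:.3f} ms (budget: {LATENCY_BUDGET_MS} ms)")
if p99 > LATENCY_BUDGET_MS:
    raise RuntimeError("prediction latency exceeds the budget")
```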

Standardize platforms to drive scale

Both Facebook and Twitter had early starts to their machine learning programs. They began with unstructured approaches but are now taking steps to standardize their platforms, frameworks, and pipelines. Twitter aims to make it easier to share models and wants to reduce duplicative work. Facebook is addressing pain points in reliability, scalability, efficiency of running models, and the developer experience of its scientists and engineers.

Both companies’ platforms are optimized around similar data pipeline processing principles. Both have stages to process data, extract features, train models, and deploy models to production environments.
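
As a rough illustration of those four stages, the Python sketch below wires together placeholder process, feature-extraction, training, and deployment steps. The function bodies are stand-ins, not either company’s pipeline code.

```python
# A minimal sketch of the four pipeline stages described above; each
# function body is a placeholder for the real work at that stage.

def process_data(raw_rows):
    """Clean raw events, dropping anything unusable."""
    return [row for row in raw_rows if row is not None]

def extract_features(rows):
    """Turn cleaned rows into numeric feature vectors."""
    return [[float(len(str(row)))] for row in rows]

def train_model(features):
    """'Train' a trivial model: a threshold on the mean feature value."""
    mean = sum(f[0] for f in features) / max(len(features), 1)
    return {"threshold": mean}

def deploy_model(model):
    """Publish the trained model to a serving environment (stubbed)."""
    print(f"deploying model: {model}")

raw = ["click", None, "view", "like"]
deploy_model(train_model(extract_features(process_data(raw))))
```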

The two social media giants are taking steps to standardize on selected machine learning frameworks. Facebook was using PyTorch to enable easy research and Caffe2 to run production inference models at scale. It has consolidated these into PyTorch 1.0, which combines both capabilities, and it uses Caffe2Go to run its mobile neural networks. Twitter was using a mix of Lua Torch, TensorFlow, scikit-learn, PyTorch, and other platforms. It is now standardizing on Scalding, PySpark, TensorFlow, and Apache Airflow.
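
The appeal of consolidating on PyTorch 1.0 is that a model written for research can be compiled to TorchScript and served by a production or mobile runtime without a Python interpreter. Here is a minimal sketch using an illustrative two-layer model, not anything Facebook has described:

```python
import torch
import torch.nn as nn

# Define a small model in eager (research-friendly) mode. The architecture
# and sizes are illustrative only.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))
example_input = torch.randn(1, 16)

# Compile to TorchScript by tracing, then save an artifact that a C++ or
# mobile runtime can load for production inference.
scripted = torch.jit.trace(model, example_input)
scripted.save("ranker_ts.pt")
print(scripted(example_input))
```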

Enable scientists, developers, and engineers to collaborate

Twitter and Facebook described different efforts to improve productivity, knowledge sharing, and code reusability among data scientists, developers, and engineers.

Many data teams establish data catalogs and dictionaries as part of their data governance programs. These tools make it easier for everyone to understand the underlying data models, field definitions, and quality constraints when using data for analytics or machine learning experiments.

Twitter takes this one step further by standardizing features used in machine learning experiments and capturing them in a feature store catalog. This reduces duplication and helps scientists train new models with less effort spent processing data into features.
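
The idea behind a feature store can be shown with a toy registry: a feature is defined once under a shared name, and any model assembles its inputs from the catalog instead of recomputing the feature from raw data. The registry and feature names below are hypothetical, not Twitter’s.

```python
from typing import Callable, Dict

# Toy feature store: a catalog mapping feature names to the functions
# that compute them, so features are defined once and reused everywhere.
FEATURE_REGISTRY: Dict[str, Callable[[dict], float]] = {}

def register_feature(name: str):
    def wrapper(fn: Callable[[dict], float]):
        FEATURE_REGISTRY[name] = fn
        return fn
    return wrapper

@register_feature("user_tweet_count_7d")
def user_tweet_count_7d(user: dict) -> float:
    return float(user.get("tweets_last_7d", 0))

@register_feature("user_account_age_days")
def user_account_age_days(user: dict) -> float:
    return float(user.get("account_age_days", 0))

def build_feature_vector(user: dict, feature_names) -> list:
    # Any model can assemble its inputs from the shared catalog.
    return [FEATURE_REGISTRY[name](user) for name in feature_names]

user = {"tweets_last_7d": 12, "account_age_days": 400}
print(build_feature_vector(user, ["user_tweet_count_7d", "user_account_age_days"]))
```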

Facebook is also cataloging and standardizing its features, automating training, and developing tools for managing and deploying models. FBLearner is its standard platform to support these capabilities.

In addition, Facebook is standardizing the types of machine learning models it uses. For example, rankings for the news feed, ads, search, and anomaly detection use multilayer perceptrons. It also uses convolutional neural networks and support vector machines for facial recognition, and recurrent neural networks for language translation.
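
For readers unfamiliar with the term, a multilayer perceptron is just a stack of fully connected layers. The toy PyTorch example below trains one to score items for ranking; the synthetic data, layer sizes, and training loop are illustrative only.

```python
import torch
import torch.nn as nn

# A toy multilayer perceptron (MLP) for a pointwise ranking task:
# it learns to score an item from a feature vector. Everything here
# (data, sizes, labels) is synthetic and for illustration only.
torch.manual_seed(0)
X = torch.randn(256, 10)                      # synthetic item features
y = (X.sum(dim=1, keepdim=True) > 0).float()  # synthetic "clicked" labels

mlp = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(mlp.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(50):
    optimizer.zero_grad()
    loss = loss_fn(mlp(X), y)
    loss.backward()
    optimizer.step()

# Rank five candidate items by predicted click probability.
candidates = torch.randn(5, 10)
scores = torch.sigmoid(mlp(candidates)).squeeze(1)
print(scores.argsort(descending=True))
```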

Automate continuous training of machine learning models

Just like software applications, machine learning models require ongoing training and modification. Both Facebook and Twitter automate this training so that models get retuned with fresh data.

Twitter recognized that pushing models into production creates new requirements: keeping models trained on the latest data and updating them when data scientists have model improvements. Apache Airflow automates both the training and deployment pipelines.
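
A skeletal Airflow DAG illustrates the pattern: retrain on fresh data on a schedule, then deploy the updated model. The task bodies and the daily schedule below are assumptions for illustration, not Twitter’s actual pipeline.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def retrain_model():
    # Placeholder: pull the latest data and retrain the model.
    print("retraining model on fresh data...")

def deploy_model():
    # Placeholder: push the newly trained model to production.
    print("deploying the retrained model...")

with DAG(
    dag_id="retrain_and_deploy",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",  # assumed cadence for illustration
    catchup=False,
) as dag:
    retrain = PythonOperator(task_id="retrain", python_callable=retrain_model)
    deploy = PythonOperator(task_id="deploy", python_callable=deploy_model)
    retrain >> deploy
```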

Facebook was specific about its retraining strategies. Frequently changing models, such as those behind the news feed, are retrained on cycles of an hour or less, whereas language translation and facial recognition models are retrained on cycles of weeks to months.

Computing costs and availability of computing resources are also factors in how often models are retrained. Facebook may have a strategic computing advantage as it has developed hardware stacks optimized for different types of machine learning workloads. Twitter focuses on optimizing algorithm performance and scheduling training at non-peak hours when computing resources across the globe are underutilized.

Plan for the long term

Compared to most organizations, Twitter and Facebook are far ahead on the maturity curve in applying and scaling machine learning. What can you learn from their success?

Start with small efforts, prove the business value by getting models trained and running in production, and then increase efforts to scale and mature practices. Maturing practices requires disciplines similar to application development, including standardizing frameworks, defining architecture, selecting maintenance cycles, optimizing performance, and automating deployment pipelines.

Machine learning clearly delivers significant value, but it also requires ongoing attention to performance and continued investment in improvements. Models get trained, deployed, optimized, and eventually replaced with even better models. Machine learning is a new tool and skill set, but it will become increasingly important to organizations that need to improve user experiences or drive competitive value with their data.

Isaac Sacolick
Contributing writer

Isaac Sacolick, President of StarCIO, a digital transformation learning company, guides leaders on adopting the practices needed to lead transformational change in their organizations. He is the author of Digital Trailblazer and the Amazon bestseller Driving Digital and speaks about agile planning, devops, data science, product management, and other digital transformation best practices. Sacolick is a recognized top social CIO, a digital transformation influencer, and has over 900 articles published at InfoWorld, CIO.com, his blog Social, Agile, and Transformation, and other sites.

The opinions expressed in this blog are those of Isaac Sacolick and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.
