by Rosaria Silipo, Kathrin Melcher, Adrian Nembach, Corey Weisinger

10 questions about deep learning

Analysis
Mar 12, 2020 | 9 mins

Learn why neural networks are so powerful, how and where they’re used, and how to get started — no programming necessary


It seems that everywhere you look nowadays, you will find an article describing a winning strategy that uses deep learning for a data science problem, or more broadly in the field of artificial intelligence (AI). However, clear explanations of what deep learning is, why it is so powerful, and the various forms it takes in practice are not so easy to come by.

To learn more about deep learning and neural networks, the major innovations, the most widely used paradigms, where deep learning works and where it doesn’t, and even a little of the history, we have asked and answered a few basic questions.

What is deep learning exactly?

Deep learning is the modern evolution of traditional neural networks. It takes the classic feed-forward, fully connected, backpropagation-trained multilayer perceptron (MLP) and adds “deeper” architectures. Deeper means more hidden layers, plus a few additional neural paradigms, as in recurrent networks and convolutional networks.

What is the difference between deep learning and neural networks?

There is no difference. Deep learning networks are neural networks, just with more complex architectures than could feasibly be trained in the 1990s. For example, long short-term memory (LSTM) units in recurrent neural networks (RNNs) were introduced in 1997 by Hochreiter and Schmidhuber but never found wide adoption at the time because of the long training times and heavy computational resources they required. Multilayer perceptrons with more than one hidden layer have also been around for a long time, and their benefits were clear. The main difference is that modern computational resources have made their implementation feasible.

Is deep learning mainly about faster and more powerful computational resources?

In general, yes. Faster and more powerful computational resources have allowed for the implementation of, and experimentation with, more powerful and more promising neural architectures. Clearly, spending days training a network on a CPU cannot compete with training the same network in a few minutes with the help of GPU acceleration.

What was the breakthrough project that triggered the deep learning popularity?

The big breakthrough came in 2012, when the deep-learning-based AlexNet network won the ImageNet challenge by an unprecedented margin. AlexNet’s top-5 error rate was around 15 percent, while the next best competitor ended up at around 26 percent. This victory kicked off a surge of deep learning networks, and the best models nowadays attain error rates below the 3 percent mark.

That’s particularly impressive if you consider that the human error rate is around 5 percent.

What makes deep learning so powerful?

In a word, flexibility. On the one hand, neural networks are universal function approximators, which is smart talk for saying that you can approximate almost anything using a neural network—if you make it complex enough. On the other hand, you can use the trained weights of a network to initialize the weights of another network that performs a similar task. This is called transfer learning, and you would be surprised how well it works, even for tasks that seem quite dissimilar at first glance.
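The weight-copying mechanism behind transfer learning can be sketched in a few lines of NumPy. This is a hypothetical illustration, not KNIME or Keras code: the lower-layer weights of a “pretrained” source network are copied into a target network that keeps the same feature-extraction layers but gets a fresh, task-specific output layer.

```python
import numpy as np

rng = np.random.default_rng(42)

def init_mlp(layer_sizes, rng):
    """Random weight matrices for a fully connected network."""
    return [rng.normal(0, 0.1, (n_in, n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# Pretend this network was already trained on a large source task
# (e.g., 10-class image classification on 784-pixel inputs).
source_net = init_mlp([784, 128, 64, 10], rng)

# New target task: same inputs, but only 3 classes.
target_net = init_mlp([784, 128, 64, 3], rng)

# Transfer: copy every layer except the task-specific output layer,
# which stays randomly initialized and is trained on the new task.
for i in range(len(source_net) - 1):
    target_net[i] = source_net[i].copy()
```

In practice the copied layers are often frozen at first and only the new output layer is trained, which is why transfer learning needs far less data than training from scratch.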

What are the most widely used neural network paradigms?

There are four very successful and widely adopted deep learning paradigms: LSTM units in recurrent neural networks, convolutional layers in convolutional neural networks (CNNs), encoder-decoder structures, and generative adversarial networks (GANs).

RNNs are a family of neural networks used for processing sequential data, like text (e.g., a sequence of words or characters) or time series data. The idea is to apply a copy of the same network at each time step and connect the different copies via some state vectors. This allows the network to remember information from the past. Popular unit structures in RNNs are gated recurrent units (GRUs) and LSTM units.
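The idea of applying a copy of the same network at each time step can be sketched in plain NumPy. This is a minimal vanilla RNN cell, not an LSTM (an LSTM adds gates and a cell state on top of this recurrence):

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over a sequence: the same weights (W_x, W_h, b)
    are applied at every time step, while the state vector h carries
    information from the past forward."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return states

# A toy sequence of 5 time steps, each a 3-dimensional input vector.
x_seq = rng.normal(size=(5, 3))
W_x = rng.normal(scale=0.5, size=(4, 3))   # input-to-state weights
W_h = rng.normal(scale=0.5, size=(4, 4))   # state-to-state weights
b = np.zeros(4)

states = rnn_forward(x_seq, W_x, W_h, b)
```

An LSTM unit replaces the single tanh update with gated updates to a separate cell state, which makes remembering information over long sequences much easier to train.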

CNN layers are especially powerful for data with spatial dependencies, such as images. Instead of connecting every neuron in one layer to every neuron in the next, a small sliding window of shared weights is moved across the input, working like a filter. Some convolutions may detect edges or corners, while others, deeper in the network, may detect cats, dogs, or street signs inside an image.
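The sliding-window filter can be sketched in NumPy. Here a hand-crafted vertical-edge kernel (a hypothetical example; in a CNN the kernel values are learned, not hard-coded) is slid over a small grayscale image:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and take the
    elementwise product-and-sum at each window position."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# An 8x8 image: dark on the left half, bright on the right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# A Sobel-like vertical-edge detector.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, kernel)
```

The response is nonzero only in the columns where the dark-to-bright edge falls inside the window, which is exactly the “filter” behavior described above.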

Another often-used neural network structure is the encoder-decoder network. A simple example is the autoencoder, where a neural network with a bottleneck layer is trained to reconstruct its input at the output. A second application of encoder-decoder networks is neural machine translation, where the encoder-decoder structure is built from RNNs. An LSTM-based encoder extracts a dense representation of the content in the source language, and an LSTM-based decoder generates the output sequence in the target language.
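The bottleneck idea can be sketched with a linear autoencoder trained by plain NumPy gradient descent. This hypothetical example squeezes 8-dimensional inputs through a 2-dimensional bottleneck and learns to reconstruct them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions that actually live near a
# 2-dimensional subspace, so a 2-unit bottleneck can capture them.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 8))

W_enc = rng.normal(scale=0.1, size=(8, 2))   # encoder: 8 -> 2 (bottleneck)
W_dec = rng.normal(scale=0.1, size=(2, 8))   # decoder: 2 -> 8

lr = 0.01
losses = []
for _ in range(500):
    Z = X @ W_enc                 # encode into the bottleneck
    X_hat = Z @ W_dec             # decode: try to reconstruct the input
    err = X_hat - X
    losses.append(np.mean(err ** 2))
    # Gradient descent on the mean squared reconstruction error.
    grad_dec = Z.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc
```

Because the bottleneck is narrower than the input, the network is forced to learn a compressed representation; real autoencoders add nonlinearities and more layers, but the reconstruction objective is the same.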

And, of course, there are generative adversarial networks. A generative adversarial network is composed of two deep learning networks, the generator and the discriminator, which are trained in alternating steps, each competing to outdo the other. GANs have been successfully applied to images to create anime characters, human figures, and even van Gogh-like masterpieces.
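The alternating training scheme can be sketched with a deliberately tiny toy: a two-parameter generator tries to turn noise into samples from a target Gaussian, while a logistic discriminator learns to tell real from fake. This illustrates the competition, not a production GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# "Real" data: samples from N(4, 1). The generator maps noise z ~ N(0, 1)
# through a learnable affine map g(z) = w_g * z + b_g.
def sample_real(n):
    return rng.normal(4.0, 1.0, n)

w_g, b_g = 1.0, 0.0          # generator parameters
w_d, b_d = 0.1, 0.0          # discriminator parameters (logistic on a scalar)
lr = 0.05

for step in range(2000):
    z = rng.normal(0.0, 1.0, 32)
    fake = w_g * z + b_g
    real = sample_real(32)

    # --- Discriminator step: push D(real) toward 1 and D(fake) toward 0 ---
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = sigmoid(w_d * x + b_d)
        grad = p - label                      # d(cross-entropy)/d(logit)
        w_d -= lr * np.mean(grad * x)
        b_d -= lr * np.mean(grad)

    # --- Generator step: push D(fake) toward 1, updating only the generator ---
    fake = w_g * z + b_g
    p = sigmoid(w_d * fake + b_d)
    grad = (p - 1.0) * w_d                    # chain rule through D into g(z)
    w_g -= lr * np.mean(grad * z)
    b_g -= lr * np.mean(grad)
```

As the discriminator gets better at spotting fakes, the generator is forced to shift its output distribution toward the real data; at equilibrium the discriminator can no longer tell the two apart.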

Has deep learning taken over the entire machine learning world?

No, at least not yet. There are certain domains, like computer vision, where you can’t get around deep learning anymore, but there are other areas, such as tabular data, that have proven to be a challenge for deep learning.

In the case of tabular data, which is still the main format used for storing business data, deep learning actually does not perform badly. However, training a deep learning model for days on an expensive GPU server is hard to justify if you can get similar accuracy using random forests or gradient boosted trees, which you can train within a few minutes on a decent laptop.
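To illustrate the point, a random forest on a synthetic tabular dataset (a hypothetical stand-in for real business data) trains in well under a second with scikit-learn, on a plain CPU:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic "tabular" data: 2,000 rows, 20 feature columns.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 100 trees train in moments on a laptop CPU -- no GPU required.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

For many tabular problems, a result like this is competitive with a deep network that would take orders of magnitude longer, and far more hardware, to train.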

Do I need to know how to code to use deep learning?

Not really. It is true that most deep learning paradigms are available in TensorFlow and Keras and that both of them require Python skills. However, in our open source KNIME Analytics Platform, we provide a graphical user interface (GUI) to handle exactly those Keras and deep learning libraries using TensorFlow in the back end. You can build a neural architecture as complex as you wish just by dragging and dropping the appropriate nodes one after the other.

An example is shown in Figure 1 below, where we trained an LSTM-based RNN to generate free text. The model creates fake names that resemble mountain names for a new outdoor clothing line. At the top (the brown nodes), you can see where we built the neural architecture, which we then trained using the Keras Network Learner node. The trained network, suitably modified, is then saved in TensorFlow format.


Figure 1. Constructing and training an LSTM-based RNN to generate free text. At the top, the brown nodes build the network architecture. Then, the Keras Network Learner node trains the network, which after some appropriate post-processing is saved in a TensorFlow file.

Where can I find examples of deep learning networks?

You can find plenty on our community KNIME Hub. For example, recurrent neural networks with LSTM units can be found in this example for free text generation (also shown in Figure 1) or in this other example for time series prediction. Also, there are several example workflows using convolutional neural networks to process images, such as “Building a CNN from scratch” or “Train simple CNN.” A simple feed-forward, fully connected multilayer autoencoding structure was built and used as a solution to a fraud detection task. I am sure many more have been uploaded by the community as we speak.

If I use the KNIME Analytics Platform, do I need to host my work on the cloud?

KNIME Analytics Platform is an open source application, and so are its integrations, including the Keras and TensorFlow integrations. You can install them wherever you wish, whether in the public cloud of your choosing or on your own machine. It is clear, though, that the more powerful the machine, the faster the execution. You can even apply GPU acceleration in the KNIME Keras integration. You just need a GPU-equipped machine with CUDA installed, a Conda environment with the GPU version of Keras, and the KNIME Keras integration on top of that.

Rosaria Silipo is principal data scientist at KNIME. She is the author of more than 50 technical publications, including her most recent book Practicing Data Science: A Collection of Case Studies. She holds a doctorate in bioengineering and has spent 25 years working on data science projects for companies in a broad range of fields, including IoT, customer intelligence, the financial industry, and cybersecurity. Follow Rosaria on Twitter, LinkedIn, and the KNIME blog.

Kathrin Melcher is a data scientist at KNIME. She holds a master’s degree in mathematics from the University of Konstanz, Germany. She enjoys teaching and applying her knowledge to data science, machine learning and algorithms. Follow Kathrin on LinkedIn.

Adrian Nembach is a KNIME software engineer, specializing in machine learning algorithms including deep learning since 2015. He has an MSc in computer and information science from the University of Konstanz, where he focused on deep learning for computer vision. Follow Adrian on LinkedIn.

Corey Weisinger is a data scientist at KNIME in Austin, Texas. He studied mathematics at Michigan State University, focusing on actuarial techniques and functional analysis. Prior to KNIME, he worked as an analytics consultant for the auto industry in Detroit, Michigan. He currently focuses on signal processing and numeric prediction techniques and is the author of the guidebook, “From Alteryx to KNIME.” Follow Corey on LinkedIn.

For more information on KNIME, please visit www.knime.com and the KNIME blog.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.