Keras sequential models make deep neural network modeling about as simple as it can be

As I discussed in my review of PyTorch, the foundational deep neural network (DNN) frameworks such as TensorFlow (Google) and CNTK (Microsoft) tend to be hard to use for model building. However, TensorFlow now contains three high-level APIs for creating models, one of which, tf.keras, is a bespoke version of Keras.

Keras proper, a high-level front end for building neural network models, ships with support for three back-end deep learning frameworks: TensorFlow, CNTK, and Theano. Amazon is currently working on developing an MXNet back end for Keras. It’s also possible to use PlaidML (an independent project) as a back end for Keras to take advantage of PlaidML’s OpenCL support for all GPUs.

As an aside, the name Keras is from the Greek for horn, κέρας, and refers to a passage from the Odyssey. The dream spirits that come through the gate made of horn are the ones that announce a true future; the ones that come through the gate made of ivory, ἐλέφας, deceive men with false visions.

TensorFlow is the default back end for Keras, and the one recommended for many use cases involving GPU acceleration on Nvidia hardware via CUDA and cuDNN, as well as for TPU acceleration in the Google Cloud. I used the TensorFlow back end configured for CPU-only to do my basic Keras testing on a MacBook Pro.

Keras vs. PyTorch

Keras (Google) and PyTorch (Facebook) are often mentioned in the same breath, especially when the subject is easy creation of deep neural networks. Both are designed to make it as simple as possible to build models. PyTorch says it’s designed for “fast, flexible experimentation.” Keras “was developed with a focus on enabling fast experimentation.” Both expose Python APIs. There are some practical differences between the two.
While Keras is a front end for three DNN frameworks, PyTorch provides its own back ends, primarily C/C++ code adapted from Torch, with some production features from Caffe2.

Keras has a high-level environment that reduces adding a layer to a neural network to one line of code in its sequential model, and needs one function call each for compiling and training a model. PyTorch model-building code can look very similar if you add layers using its sequential model, but PyTorch requires you to write your own optimization loop for training, as opposed to making a single call in Keras. Frankly, writing that loop isn’t a big deal.

Both Keras and PyTorch let you work at a lower level if you want. Keras calls that level its model or functional API. Keras also allows you to drop down even farther, to the Python coding level, by subclassing keras.Model, but prefers the functional API when possible.

PyTorch claims two distinctions: the ability to change the model dynamically from step to step during training, and the ability to compute gradients using tape-based back-propagation. Keras lacks dynamic modeling, but it does have tape-based gradients, courtesy of the TensorFlow back end’s GradientTape class.

Keras also has a Scikit-learn API, so that you can use the Scikit-learn grid search to perform hyperparameter optimization in Keras models. In a way, that ability can replace the need for PyTorch-like dynamic models, especially if you’re doing your training on multiple GPUs. Essentially, you’re doing the hyperparameter optimizations in parallel training runs instead of within a single training.

Keras simplicity

The 30-second intro to Keras explains that the Keras model, a way to organize layers in a neural network, is the framework’s core data structure. The sequential model is a linear stack of layers, and the layers can be described with one call each. By contrast, describing a layer in TensorFlow takes multiple lines of code.
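To give a feel for what one of those one-line layer calls stands for, here is a minimal sketch in plain Python of the arithmetic a densely connected layer performs: output = activation(dot(input, kernel) + bias). It uses no Keras or TensorFlow and is not how either library implements the layer; the function names dense_forward, relu, and softmax, and the tiny weights, are made up for illustration.

```python
import math


def relu(values):
    """Rectified linear unit: negative entries become 0."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Normalized exponential; subtracting the max keeps exp() from overflowing."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense_forward(inputs, kernel, bias, activation):
    """Compute activation(dot(inputs, kernel) + bias) for one input vector.

    kernel is a list of rows, one row per input feature; each row holds
    one weight per output unit.
    """
    units = len(bias)
    pre_activation = [
        sum(inputs[i] * kernel[i][j] for i in range(len(inputs))) + bias[j]
        for j in range(units)
    ]
    return activation(pre_activation)


# Hypothetical weights: 3 input features -> 2 hidden units -> 2 output classes
hidden = dense_forward(
    [1.0, 2.0, 0.5],
    kernel=[[0.2, -0.5], [0.4, 0.1], [-0.6, 0.3]],
    bias=[0.1, 0.0],
    activation=relu,
)
probabilities = dense_forward(
    hidden,
    kernel=[[1.0, -1.0], [0.5, 0.5]],
    bias=[0.0, 0.0],
    activation=softmax,
)
print(hidden)
print(probabilities)
```

A real framework does the same computation on whole batches at once and hands it to optimized back-end kernels, which is the bookkeeping that Keras hides behind a single layer call.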
The code for a simple Keras sequential model might look like this:

    import keras
    from keras.models import Sequential
    from keras.layers import Dense

    # Create Sequential model with Dense layers, using the add method
    model = Sequential()

    # Dense implements the operation:
    #   output = activation(dot(input, kernel) + bias)
    # Units are the dimensionality of the output space for the layer,
    #   which equals the number of hidden units
    # Activation and loss functions may be specified by strings or classes
    model.add(Dense(units=64, activation='relu', input_dim=100))
    model.add(Dense(units=10, activation='softmax'))

    # The compile method configures the model's learning process
    model.compile(loss='categorical_crossentropy',
                  optimizer='sgd',
                  metrics=['accuracy'])

    # The fit method does the training in batches
    # x_train and y_train are NumPy arrays, just like in the Scikit-Learn API.
    model.fit(x_train, y_train, epochs=5, batch_size=32)

    # The evaluate method calculates the losses and metrics
    #   for the trained model
    loss_and_metrics = model.evaluate(x_test, y_test, batch_size=128)

    # The predict method applies the trained model to inputs
    #   to generate outputs
    classes = model.predict(x_test, batch_size=128)

To understand that a little better, let’s dive into the architecture.

Keras architecture

As noted above, the model is the core Keras data structure. There are two main types of models available in Keras: the sequential model, and the Model class used with the functional API. Both sequential and functional models have methods or attributes for layers, inputs, outputs, summary(), get_config(), from_config(config), get_weights(), set_weights(weights), to_json(), to_yaml(), save_weights(), and load_weights(). I won’t dwell on keras.Model subclassing, which doesn’t have all the listed methods and attributes.

Keras sequential models

As I discussed earlier, you can use the model.add() method to add layers to sequential models.
You can also list layer instances inside the Sequential() constructor:

    from keras.models import Sequential
    from keras.layers import Dense, Activation

    model = Sequential([
        Dense(32, input_shape=(784,)),
        Activation('relu'),
        Dense(10),
        Activation('softmax'),
    ])

The first layer in a sequential model normally specifies its input shape or dimension. The other layers get their input shapes from the output of the previous layers; in the code above, the relu activation layer has an input dimension of 32, and the softmax activation layer has an input dimension of 10. There is a mechanism for delayed sequential model building that infers the input shape the first time fit() is called if you don’t specify the shape or dimension; it only seems to be mentioned in the sequential.py source code.

Model compilation configures the learning process. It sets the optimizer, the loss function, and a list of metrics. Keras has a full set of all of these predefined, and calls the back end when appropriate. You can pass string identifiers for these, or instances of the appropriate classes.

Training takes NumPy arrays of input data and labels as input. You normally call the fit() method to run the entire training process, but you can also feed in data batch by batch with the train_on_batch() method. If you need even more control, you can train a model on data from a Python generator function, using the fit_generator() method.

Keras layers

Keras has numerous layers pre-defined, organized into categories: core, convolutional, pooling, locally connected, recurrent, embedding, merge, advanced activations, normalization, and noise. There are also two layer wrappers, TimeDistributed (which applies a layer to every time step of a sequence) and Bidirectional (for bidirectional RNNs), and an API for writing custom layers.
For example, the core layers include Dense, the regular densely-connected neural network layer that does a dot product with optional bias and activation function; Activation, which applies an activation function; Dropout, which randomly sets a fraction of the input units to 0 during training to help prevent overfitting; and several more. Convolutional layers can be 1D (temporal convolution), 2D (spatial convolution), 3D (spatial convolution over volumes), separable, transposed, cropping, upsampling, and so on.

In general, layers pass most of the work to the back end (TensorFlow, etc.) where compute-intensive operations such as convolution of large tensors can be optimized with, for example, GPU or TPU support.

Keras functional API

The Keras functional API is useful for creating complex models, such as multi-input/multi-output models, directed acyclic graphs (DAGs), and models with shared layers. The functional API uses the same layers as the sequential model, but provides more flexibility in putting them together. In the functional API you define the layers first, and then create the model, compile it, and fit (train) it.

The functional model that follows takes an input, runs it through two 64-unit Dense layers with ReLU (rectified linear unit) activation, and finally runs it through a 10-unit Dense layer with softmax (normalized exponential function) activation. It could just as easily have been created with a sequential model. The input could be the MNIST data set of handwritten numerals or something else that has 10 classes of 28×28 (784) pixel images.
    from keras.layers import Input, Dense
    from keras.models import Model

    # This returns a tensor
    inputs = Input(shape=(784,))

    # A layer instance is callable on a tensor, and returns a tensor
    x = Dense(64, activation='relu')(inputs)
    x = Dense(64, activation='relu')(x)
    predictions = Dense(10, activation='softmax')(x)

    # This creates a model that includes
    # the Input layer and three Dense layers
    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='rmsprop',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(data, labels)  # starts training

You can do much cooler things with functional models than you can with sequential models, since you can blithely apply models (both the model architecture and the trained weights) to tensors. For example, the code that follows turns the image classification model defined above into a video classification model:

    from keras.layers import TimeDistributed

    # Input tensor for sequences of 20 timesteps,
    # each containing a 784-dimensional vector
    input_sequences = Input(shape=(20, 784))

    # This applies our previous model to every timestep in the input sequences.
    # The output of the previous model was a 10-way softmax,
    # so the output of the layer below will be a sequence of 20 vectors of size 10.
    processed_sequences = TimeDistributed(model)(input_sequences)

Installing Keras

Keras installation is basically a two-step process, meaning you have to install a back end as well as Keras. On my MacBook, I started by upgrading pip; then I upgraded TensorFlow and installed Keras, both with pip. I also freshened the source code for both repositories so that I could use the code for reference in areas where the documentation wasn’t complete enough for me.

    sudo pip install --upgrade pip
    sudo pip install --upgrade tf-nightly
    sudo pip install keras

At this point I tested TensorFlow and discovered that I had a rogue copy of the protobuf lying around that kept TensorFlow from importing.
It turned out to be a version that I had installed with Homebrew, so I uninstalled it:

    brew uninstall protobuf

Finally TensorFlow imported and worked:

    Martins-Retina-MacBook:~ martinheller$ python
    Python 2.7.10 (default, Oct 6 2017, 22:29:07)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import tensorflow as tf
    >>> hello = tf.constant('Hello, TensorFlow!')
    >>> sess = tf.Session()
    >>> print(sess.run(hello))
    Hello, TensorFlow!

Keras worked fine once TensorFlow had been repaired. I copied the code for a simple Keras sequential model with random input tensors into a Python REPL a few lines at a time. As you can see from the timings after the model.fit() call, this little five-layer classification network ran quite quickly (~12 ms per epoch after the first epoch) even on a CPU:

    Martins-Retina-MacBook:~ martinheller$ python
    Python 2.7.10 (default, Oct 6 2017, 22:29:07)
    [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import keras
    Using TensorFlow backend.
    >>> from keras.models import Sequential
    >>> from keras.layers import Dense, Dropout, Activation
    >>> from keras.optimizers import SGD
    >>>
    >>> import numpy as np
    >>> x_train = np.random.random((1000, 20))
    >>> y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
    >>> x_test = np.random.random((100, 20))
    >>> y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
    >>>
    >>> model = Sequential()
    >>> model.add(Dense(64, activation='relu', input_dim=20))
    >>> model.add(Dropout(0.5))
    >>> model.add(Dense(64, activation='relu'))
    >>> model.add(Dropout(0.5))
    >>> model.add(Dense(10, activation='softmax'))
    >>>
    >>> sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    >>> model.compile(loss='categorical_crossentropy',
    ...               optimizer=sgd,
    ...               metrics=['accuracy'])
    >>> model.fit(x_train, y_train,
    ...           epochs=20,
    ...           batch_size=128)
    Epoch 1/20
    1000/1000 [==============================] - 0s 319us/step - loss: 2.3804 - acc: 0.0910
    Epoch 2/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.3517 - acc: 0.0860
    Epoch 3/20
    1000/1000 [==============================] - 0s 13us/step - loss: 2.3573 - acc: 0.1000
    Epoch 4/20
    1000/1000 [==============================] - 0s 16us/step - loss: 2.3303 - acc: 0.0990
    Epoch 5/20
    1000/1000 [==============================] - 0s 15us/step - loss: 2.3177 - acc: 0.1090
    Epoch 6/20
    1000/1000 [==============================] - 0s 11us/step - loss: 2.3180 - acc: 0.1050
    Epoch 7/20
    1000/1000 [==============================] - 0s 14us/step - loss: 2.3125 - acc: 0.1310
    Epoch 8/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.3068 - acc: 0.1330
    Epoch 9/20
    1000/1000 [==============================] - 0s 11us/step - loss: 2.3066 - acc: 0.0970
    Epoch 10/20
    1000/1000 [==============================] - 0s 13us/step - loss: 2.2954 - acc: 0.1100
    Epoch 11/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.3065 - acc: 0.1110
    Epoch 12/20
    1000/1000 [==============================] - 0s 11us/step - loss: 2.3057 - acc: 0.1140
    Epoch 13/20
    1000/1000 [==============================] - 0s 11us/step - loss: 2.2993 - acc: 0.1200
    Epoch 14/20
    1000/1000 [==============================] - 0s 13us/step - loss: 2.2978 - acc: 0.1240
    Epoch 15/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.2989 - acc: 0.1230
    Epoch 16/20
    1000/1000 [==============================] - 0s 11us/step - loss: 2.3001 - acc: 0.1180
    Epoch 17/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.2892 - acc: 0.1210
    Epoch 18/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.2993 - acc: 0.1060
    Epoch 19/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.2990 - acc: 0.1110
    Epoch 20/20
    1000/1000 [==============================] - 0s 12us/step - loss: 2.2936 - acc: 0.1200
    <keras.callbacks.History object at 0x112847990>
    >>> score = model.evaluate(x_test, y_test, batch_size=128)
    100/100 [==============================] - 0s 537us/step
    >>> print (score)
    [2.2815165519714355, 0.1599999964237213]

Here Keras is reporting a categorical cross-entropy loss of 2.28 and an accuracy of 16 percent from the evaluation of the model on the test data. That would be bad, except for the fact that the inputs were random, so no model could fit them. With 10 equally likely classes, pure guessing gives about 10 percent accuracy and a loss near ln(10) ≈ 2.30, which is roughly where these numbers sit.

Deploying Keras

Keras models can be deployed across a great range of platforms, perhaps greater than any other deep learning framework. That includes iOS, via CoreML; Android, via the TensorFlow Android runtime; in a browser, via Keras.js and WebDNN; on Google Cloud, via TensorFlow-Serving; in a Python webapp backend; on the JVM, via DL4J model import; and on Raspberry Pi.

Keras applications, data sets, and examples

Keras supplies seven of the common deep learning sample data sets via the keras.datasets module. That includes cifar10 and cifar100 small color images, IMDB movie reviews, Reuters newswire topics, MNIST handwritten digits, MNIST fashion images, and Boston housing prices.

Keras also supplies 10 well-known models pre-trained against ImageNet: Xception, VGG16, VGG19, ResNet50, InceptionV3, InceptionResNetV2, MobileNet, DenseNet, NASNet, and MobileNetV2. You can use these models to predict the classification of images, extract features from them, and fine-tune the models on a different set of classes.

By the way, fine-tuning existing models is a good way to speed up training. For example, you can add layers as you wish, freeze the base layers to train the new layers, then unfreeze some of the base layers to fine-tune the training. You can freeze a layer by setting layer.trainable = False.

The Keras examples repository contains more than 40 sample models. They cover vision models, text and sequences, and generative models.
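The freeze-then-unfreeze recipe for fine-tuning described above can be sketched without any framework. What follows is a toy in plain Python, not Keras code: the "model" is just two numbers, the trainable dictionary is a hypothetical stand-in for the layer.trainable attribute, and the data, step counts, and learning rates are made up for illustration.

```python
# Toy model: prediction = head * (base * x).
# "base" plays the pre-trained part, "head" the newly added layer.
params = {"base": 2.0, "head": 1.0}
trainable = {"base": False, "head": True}  # stand-in for layer.trainable

xs = [1.0, 2.0, 3.0]
ys = [6.0, 12.0, 18.0]  # made-up targets: y = 6x


def loss():
    """Mean squared error of the toy model on the made-up data."""
    return sum(
        (params["head"] * params["base"] * x - y) ** 2 for x, y in zip(xs, ys)
    ) / len(xs)


def train(steps, learning_rate):
    """Plain gradient descent that skips every parameter marked as frozen."""
    for _ in range(steps):
        errors = [params["head"] * params["base"] * x - y for x, y in zip(xs, ys)]
        grads = {
            "base": sum(2 * e * params["head"] * x for e, x in zip(errors, xs)) / len(xs),
            "head": sum(2 * e * params["base"] * x for e, x in zip(errors, xs)) / len(xs),
        }
        for name in params:
            if trainable[name]:
                params[name] -= learning_rate * grads[name]


loss_start = loss()

# Phase 1: base frozen, only the new head learns
train(steps=5, learning_rate=0.01)
base_after_phase1 = params["base"]
loss_phase1 = loss()

# Phase 2: unfreeze the base and fine-tune everything with smaller steps
trainable["base"] = True
train(steps=20, learning_rate=0.001)
loss_phase2 = loss()

print(loss_start, loss_phase1, loss_phase2)
```

The point of the two phases is that the frozen base keeps its pre-trained value while the new head catches up, and only then is the base allowed to move, with a smaller learning rate so that it is nudged rather than overwritten.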
If I were starting a new deep learning project today, I would most likely do the research with Keras. Keras is really about as simple as it could be, given that the hard part of building deep neural network models is finding a network topology that fits the data as accurately as possible without overfitting.

Cost: Free open source under the MIT license.

Platform: Linux, MacOS, Windows, or Raspbian; TensorFlow, Theano, or CNTK back end.