Blog

A Deep Learning Overview with Python

This course proposes a quick introduction to deep learning and two of its major networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The purpose is to give an intuitive sense of how to implement deep learning approaches for various tasks. To use this iPython notebook, run the python code in separate files for each cell. The content below each cell of this notebook is the output for running those cells.

Simple perceptron

In [1]:
import numpy as np

# sigmoid function
def sigmoid(x,deriv=False):
    if(deriv==True):
        return x*(1-x)
    return 1/(1+np.exp(-x))
    
# input dataset
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])
    
# output dataset            
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for j in range(100000):

    # forward propagation
    l0 = X
    l1 = sigmoid(np.dot(l0,syn0))

    # how much did we miss?
    l1_error = y - l1
    if (j% 10000) == 0:
        print("Error:" + str(np.mean(np.abs(l1_error))))

    # multiply how much we missed by the 
    # slope of the sigmoid at the values in l1
    l1_delta = l1_error * sigmoid(l1,True)

    # update weights
    syn0 += np.dot(l0.T,l1_delta)

print()
print("Prediction after Training:")
print(l1)
Error:0.517208275438
Error:0.00795484506673
Error:0.0055978239634
Error:0.00456086918013
Error:0.00394482243339
Error:0.00352530883742
Error:0.00321610234673
Error:0.00297605968522
Error:0.00278274003022
Error:0.0026227273927

Prediction after Training:
[[ 0.00301758]
 [ 0.00246109]
 [ 0.99799161]
 [ 0.99753723]]

What is the loss function here? How is it calculated?

Any idea how it would perform on non-linearly separable data? How could we test it?

Multilayer perceptron

Let’s use the fact that the sigmoid is differenciable (while the step function we saw in the slides is not). This allows us to add more layers (hence more modelling power).

In [2]:
import numpy as np

def sigmoid(x,deriv=False):
	if(deriv==True):
	    return x*(1-x)

	return 1/(1+np.exp(-x))
    
X = np.array([[0,0,1],
              [0,1,1],
              [1,0,1],
              [1,1,1]])
                
y = np.array([[0],
			  [1],
			  [1],
			  [0]])

np.random.seed(1)

# randomly initialize our weights with mean 0
syn0 = 2*np.random.random((3,4)) - 1
syn1 = 2*np.random.random((4,1)) - 1

for j in range(100000):

	# Feed forward through layers 0, 1, and 2
    l0 = X
    l1 = sigmoid(np.dot(l0,syn0))
    l2 = sigmoid(np.dot(l1,syn1))

    # how much did we miss the target value?
    l2_error = y - l2
    
    if (j% 10000) == 0:
        print("Error:" + str(np.mean(np.abs(l2_error))))
        
    # in what direction is the target value?
    # were we really sure? if so, don't change too much.
    l2_delta = l2_error*sigmoid(l2,deriv=True)

    # how much did each l1 value contribute to the l2 error (according to the weights)?
    l1_error = l2_delta.dot(syn1.T)
    
    # in what direction is the target l1?
    # were we really sure? if so, don't change too much.
    l1_delta = l1_error * sigmoid(l1,deriv=True)

    syn1 += l1.T.dot(l2_delta)
    syn0 += l0.T.dot(l1_delta)
    
print()
print(l2)
Error:0.496410031903
Error:0.00858452565325
Error:0.00578945986251
Error:0.00462917677677
Error:0.00395876528027
Error:0.00351012256786
Error:0.00318350238587
Error:0.00293230634228
Error:0.00273150641821
Error:0.00256631724004

[[ 0.00199094]
 [ 0.99751458]
 [ 0.99771098]
 [ 0.00294418]]

Setting up the environment

We have done toy examples for feedforward networks. Things quickly become complicated, so let’s go deeper by relying on high-level frameworks: TensorFlow and Keras. Most technicalities are thus avoided so that you can directly play with networks.

In [ ]:
!conda install tensorflow keras
In [3]:
import tensorflow as tf
import keras
/Users/syedather/.local/lib/python3.6/site-packages/matplotlib/__init__.py:1067: UserWarning: Duplicate key in file "/Users/syedather/.matplotlib/matplotlibrc", line #2
  (fname, cnt))
Using TensorFlow backend.
In [4]:
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
b'Hello, TensorFlow!'

CNNs

We are going to use the MNIST dataset for our first task. The code below loads the dataset and shows one training example and its label.

In [5]:
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
from pylab import *

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

print("The first training instance is labeled as: "+str(y_train[0]))
The first training instance is labeled as: 5
In [6]:
figure(1)
imshow(x_train[0], interpolation='nearest')
Out[6]:
<matplotlib.image.AxesImage at 0x1259b2320>

Now study the following code. What is the network we use? How many layers? What hyper parameters?

In [7]:
# Setup some hyper parameters
batch_size = 128
num_classes = 10
epochs = 15

# input image dimensions
img_rows, img_cols = 28, 28

# This is some technicality regarding Keras' dataset
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# We convert the matrices to floats as we will use real numbers
x_train = x_train.astype('float32')[:1000]
x_test = x_test.astype('float32')[:200]
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)[:1000]
y_test = keras.utils.to_categorical(y_test, num_classes)[:200]


# Build network
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate on test data
score = model.evaluate(x_test, y_test, verbose=0)
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Evaluate on training data
score = model.evaluate(x_train, y_train, verbose=0)
print()
print('Train loss:', score[0])
print('Train accuracy:', score[1])
x_train shape: (1000, 28, 28, 1)
1000 train samples
200 test samples
Train on 1000 samples, validate on 200 samples
Epoch 1/15
1000/1000 [==============================] - 4s 4ms/step - loss: 1.7244 - acc: 0.5660 - val_loss: 0.9116 - val_acc: 0.7900
Epoch 2/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.5967 - acc: 0.8320 - val_loss: 0.5148 - val_acc: 0.8100
Epoch 3/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.4394 - acc: 0.8670 - val_loss: 0.3056 - val_acc: 0.8600
Epoch 4/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.3296 - acc: 0.9050 - val_loss: 0.3263 - val_acc: 0.9000
Epoch 5/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.2205 - acc: 0.9360 - val_loss: 0.2092 - val_acc: 0.9200
Epoch 6/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.1684 - acc: 0.9560 - val_loss: 0.1870 - val_acc: 0.9450
Epoch 7/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.1325 - acc: 0.9690 - val_loss: 0.1597 - val_acc: 0.9350
Epoch 8/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0990 - acc: 0.9740 - val_loss: 0.1617 - val_acc: 0.9400
Epoch 9/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0636 - acc: 0.9840 - val_loss: 0.1434 - val_acc: 0.9450
Epoch 10/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0393 - acc: 0.9960 - val_loss: 0.1545 - val_acc: 0.9400
Epoch 11/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0267 - acc: 0.9950 - val_loss: 0.1444 - val_acc: 0.9400
Epoch 12/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.0158 - acc: 1.0000 - val_loss: 0.1642 - val_acc: 0.9350
Epoch 13/15
1000/1000 [==============================] - 3s 3ms/step - loss: 0.0090 - acc: 1.0000 - val_loss: 0.1475 - val_acc: 0.9450
Epoch 14/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.0057 - acc: 1.0000 - val_loss: 0.1556 - val_acc: 0.9350
Epoch 15/15
1000/1000 [==============================] - 4s 4ms/step - loss: 0.0041 - acc: 1.0000 - val_loss: 0.1651 - val_acc: 0.9350

Test loss: 0.165074422359
Test accuracy: 0.935

Train loss: 0.00311407446489
Train accuracy: 1.0

Is there anything wrong here?

How do you think a linear classifier performs?

In [8]:
# Setup some hyper parameters
batch_size = 128
num_classes = 10
epochs = 15

# input image dimensions
img_rows, img_cols = 28, 28

# This is some technicality regarding Keras' dataset
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

# We convert the matrices to floats as we will use real numbers
x_train = x_train.astype('float32')[:1000]
x_test = x_test.astype('float32')[:200]
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)[:1000]
y_test = keras.utils.to_categorical(y_test, num_classes)[:200]


# Build network
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate on test data
score = model.evaluate(x_test, y_test, verbose=0)
print()
print('Test loss:', score[0])
print('Test accuracy:', score[1])

# Evaluate on training data
score = model.evaluate(x_train, y_train, verbose=0)
print()
print('Train loss:', score[0])
print('Train accuracy:', score[1])
x_train shape: (1000, 28, 28, 1)
1000 train samples
200 test samples
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-a1470fe28059> in <module>()
     53           epochs=epochs,
     54           verbose=1,
---> 55           validation_data=(x_test, y_test))
     56 
     57 # Evaluate on test data

~/anaconda3/lib/python3.6/site-packages/keras/models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
    961                               initial_epoch=initial_epoch,
    962                               steps_per_epoch=steps_per_epoch,
--> 963                               validation_steps=validation_steps)
    964 
    965     def evaluate(self, x=None, y=None,

~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1628             sample_weight=sample_weight,
   1629             class_weight=class_weight,
-> 1630             batch_size=batch_size)
   1631         # Prepare validation data.
   1632         do_validation = False

~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
   1478                                     output_shapes,
   1479                                     check_batch_axis=False,
-> 1480                                     exception_prefix='target')
   1481         sample_weights = _standardize_sample_weights(sample_weight,
   1482                                                      self._feed_output_names)

~/anaconda3/lib/python3.6/site-packages/keras/engine/training.py in _standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
    111                         ': expected ' + names[i] + ' to have ' +
    112                         str(len(shape)) + ' dimensions, but got array '
--> 113                         'with shape ' + str(data_shape))
    114                 if not check_batch_axis:
    115                     data_shape = data_shape[1:]

ValueError: Error when checking target: expected dense_4 to have 2 dimensions, but got array with shape (1000, 10, 10)

Let’s use this model to predict a value for the first training instance we vizualized.

In [ ]:
print(model.predict(np.expand_dims(x_train[0], axis=0)))

Is the model correct here? What is the output of the network?

RNNs

We will now switch to RNNs. These require more resources, so we can’t do the fanciest applications during the workshop. We will do some sentiment classification of movie reviews.

In [9]:
from __future__ import print_function
import numpy as np
import keras
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Dropout, Embedding, LSTM, Bidirectional
from keras.datasets import imdb

# Number of considered words, based on frequencies
max_features = 20000
# cut texts after this number of words
maxlen = 100
batch_size = 32

print('Loading data...')
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=max_features, index_from=3)

# This is just for pretty printing the sentences...
word_to_id = keras.datasets.imdb.get_word_index()
word_to_id = {k:(v+3) for k,v in word_to_id.items()}
word_to_id["<PAD>"] = 0
word_to_id["<START>"] = 1
word_to_id["<UNK>"] = 2
id_to_word = {value:key for key,value in word_to_id.items()}

print("Here's the input for the first training instance:")
print(' '.join(id_to_word[id] for id in x_train[0] ))
Loading data...
Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
17465344/17464789 [==============================] - 2s 0us/step
Downloading data from https://s3.amazonaws.com/text-datasets/imdb_word_index.json
1646592/1641221 [==============================] - 0s 0us/step
Here's the input for the first training instance:
<START> this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert <UNK> is an amazing actor and now the same being director <UNK> father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for retail and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also congratulations to the two little boy's that played the <UNK> of norman and paul they were just brilliant children are often left out of the praising list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all

What do you think about this text? Is it a positive or negative review?

In [10]:
print("Here are the dataset shapes")
print(len(x_train), 'train sequences')
print(len(x_test), 'test sequences')

print("And the input for the first instance is represented as:")
print(x_train[0])
Here are the dataset shapes
25000 train sequences
25000 test sequences
And the input for the first instance is represented as:
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]

What do these numbers represent? Is there any limitation you can imagine coming from this?

In [11]:
print('Pad sequences (samples x time)')
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)[:5000]
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)[:5000]
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
y_train = np.array(y_train)[:5000]
y_test = np.array(y_test)[:5000]

model = Sequential()
model.add(Embedding(max_features, 128, input_length=maxlen))
model.add(Bidirectional(LSTM(64)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

model.compile('adam', 'binary_crossentropy', metrics=['accuracy'])

print('Train...')
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=4,
          validation_data=[x_test, y_test])
Pad sequences (samples x time)
x_train shape: (5000, 100)
x_test shape: (5000, 100)
Train...
Train on 5000 samples, validate on 5000 samples
Epoch 1/4
5000/5000 [==============================] - 54s 11ms/step - loss: 0.6032 - acc: 0.6570 - val_loss: 0.4283 - val_acc: 0.8056
Epoch 2/4
5000/5000 [==============================] - 54s 11ms/step - loss: 0.2761 - acc: 0.8918 - val_loss: 0.4403 - val_acc: 0.7948
Epoch 3/4
5000/5000 [==============================] - 61s 12ms/step - loss: 0.1101 - acc: 0.9670 - val_loss: 0.6366 - val_acc: 0.8026
Epoch 4/4
5000/5000 [==============================] - 56s 11ms/step - loss: 0.0478 - acc: 0.9868 - val_loss: 0.6637 - val_acc: 0.7954
Out[11]:
<keras.callbacks.History at 0x1392d76d8>
In [12]:
print("The neural net predicts that the first instance sentiment is:")
print(model.predict(np.expand_dims(x_train[0], axis=0)))
The neural net predicts that the first instance sentiment is:
[[ 0.99445081]]

Remarks? Comments?

How do the training scores compare to the test scores? How can we improve this? What are the current limitations?

This RNN use case takes more time to train but it is definitely more impressive. We will model the language, by training on a novel. For each (set of) word(s) in the novel, the objective is to predict the following word. This can be done on any text, and we don’t need annotated data – the text itself is enough.

Have a look at the following piece of code and try to understand what it does. Then, run it and see the network generating text! At first, the output is not meaningful, but it becomes so over time. This is the magic I was referring to.

Beware: this will take longer to run on a CPU. A GPU is recommended, but you can still try to run it for a while to see the predictions evolve. On my laptop, an epoch takes 6mins so the full training takes 6hrs. About 20 epochs are required for the generated text to be somewhat meaningful.

Note, however, that although this seems long, training actual deep learning models for concrete tasks takes days, even on multiple GPUs. This is mostly because of the data size and the much deeper networks.

In [ ]:
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.utils.data_utils import get_file
import numpy as np
import random
import sys
import io

# We load a text from Nietzsche
path = get_file('nietzsche.txt', origin='https://s3.amazonaws.com/text-datasets/nietzsche.txt')
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

# We create dictionaries of character > index and the other way around
chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=np.bool)
y = np.zeros((len(sentences), len(chars)), dtype=np.bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1


# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

optimizer = RMSprop(lr=0.01)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)


def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)


def on_epoch_end(epoch, logs):
    # Function invoked at end of each epoch. Prints generated text.
    print()
    print('----- Generating text after Epoch: %d' % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print('----- diversity:', diversity)

        generated = ''
        sentence = text[start_index: start_index + maxlen]
        generated += sentence
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.

            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]

            generated += next_char
            sentence = sentence[1:] + next_char

            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y,
          batch_size=128,
          epochs=60,
          callbacks=[print_callback])
Downloading data from https://s3.amazonaws.com/text-datasets/nietzsche.txt
606208/600901 [==============================] - 0s 0us/step
corpus length: 600893
total chars: 57
nb sequences: 200285
Vectorization...
Build model...
Epoch 1/60
200285/200285 [==============================] - 281s 1ms/step - loss: 1.9553

----- Generating text after Epoch: 0
----- diversity: 0.2
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no and it is the the of the same the of the sention of the strenge of the most the self-our of the inderent that the sensive indeed the one of the constitute of the most of the semple of the desire of the sensive of the most of the semple of the sempathy of the one of the into the every to a soul of the some of the persent the free of the semple of the most of the sention of the of the spiritual the 
----- diversity: 0.5
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no may a suptimes and also orage mankind the one of indeed of one streng the possible the sensition and the inderenation of a sul the in a sould be the orting a solitiarity of religions in a man of such and a scient, in every of and the self-to and of a revilued it is the most in the indeed, and it is assual that the ord of the of the distiture in its all the manter of the soul permans the decours of
----- diversity: 1.0
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no causest and hew the fown of every groktulr
destined a the art it noteriness of one it all and
and cothinded of that rendercaterfroe to doe," in the pational the is the onl yutre
allor upitsoon,--one
viburan mused a "master in the that niver if
a pridicle quesiles of
the shoold enss nowxing to
feef ma.t--wute disequerly that then her rewadd finale the eeblive alse rusurefver" a selovery catte he re
----- diversity: 1.2
----- Generating with seed: "to
agree with many people. "good" is no "
to
agree with many people. "good" is no likeurenes, it is novamentstisuser'stone, indos paces. fund, wethel feel the
que let doee new eveny that is that the catel. thotgy is
within ceoks of theregeritades) and itwas brutmes ageteron
clyrelogilabl freephi; its. by an? andaver happ
one of his absuman artificss? itself old a
ooker himsood and bus hray
fined in smuch is sudtirers of rerarder from and
afutty
mest utfered with to "bewnook one
Epoch 2/60
 81664/200285 [===========>..................] - ETA: 2:37 - loss: 1.6395