Here are the dataset shapes
25000 train sequences
25000 test sequences
And the input for the first instance is represented as:
[1, 14, 22, 16, 43, 530, 973, 1622, 1385, 65, 458, 4468, 66, 3941, 4, 173, 36, 256, 5, 25, 100, 43, 838, 112, 50, 670, 2, 9, 35, 480, 284, 5, 150, 4, 172, 112, 167, 2, 336, 385, 39, 4, 172, 4536, 1111, 17, 546, 38, 13, 447, 4, 192, 50, 16, 6, 147, 2025, 19, 14, 22, 4, 1920, 4613, 469, 4, 22, 71, 87, 12, 16, 43, 530, 38, 76, 15, 13, 1247, 4, 22, 17, 515, 17, 12, 16, 626, 18, 19193, 5, 62, 386, 12, 8, 316, 8, 106, 5, 4, 2223, 5244, 16, 480, 66, 3785, 33, 4, 130, 12, 16, 38, 619, 5, 25, 124, 51, 36, 135, 48, 25, 1415, 33, 6, 22, 12, 215, 28, 77, 52, 5, 14, 407, 16, 82, 10311, 8, 4, 107, 117, 5952, 15, 256, 4, 2, 7, 3766, 5, 723, 36, 71, 43, 530, 476, 26, 400, 317, 46, 7, 4, 12118, 1029, 13, 104, 88, 4, 381, 15, 297, 98, 32, 2071, 56, 26, 141, 6, 194, 7486, 18, 4, 226, 22, 21, 134, 476, 26, 480, 5, 144, 30, 5535, 18, 51, 36, 28, 224, 92, 25, 104, 4, 226, 65, 16, 38, 1334, 88, 12, 16, 283, 5, 16, 4472, 113, 103, 32, 15, 16, 5345, 19, 178, 32]
This course proposes a quick introduction to deep learning and two of its major networks, convolutional neural networks (CNNs) and recurrent neural networks (RNNs). The purpose is to give an intuitive sense of how to implement deep learning approaches for various tasks. To use this iPython notebook, run the python code in separate files for each cell. The content below each cell of this notebook is the output for running those cells.
Simple perceptron¶
What is the loss function here? How is it calculated?
Any idea how it would perform on non-linearly separable data? How could we test it?
Multilayer perceptron¶
Let’s use the fact that the sigmoid is differenciable (while the step function we saw in the slides is not). This allows us to add more layers (hence more modelling power).
Setting up the environment¶
We have done toy examples for feedforward networks. Things quickly become complicated, so let’s go deeper by relying on high-level frameworks: TensorFlow and Keras. Most technicalities are thus avoided so that you can directly play with networks.
!conda install tensorflow keras
CNNs¶
We are going to use the MNIST dataset for our first task. The code below loads the dataset and shows one training example and its label.
Now study the following code. What is the network we use? How many layers? What hyper parameters?
Is there anything wrong here?
How do you think a linear classifier performs?
Let’s use this model to predict a value for the first training instance we vizualized.
Is the model correct here? What is the output of the network?
RNNs¶
We will now switch to RNNs. These require more resources, so we can’t do the fanciest applications during the workshop. We will do some sentiment classification of movie reviews.
What do you think about this text? Is it a positive or negative review?
What do these numbers represent? Is there any limitation you can imagine coming from this?
Remarks? Comments?
How do the training scores compare to the test scores? How can we improve this? What are the current limitations?
This RNN use case takes more time to train but it is definitely more impressive. We will model the language, by training on a novel. For each (set of) word(s) in the novel, the objective is to predict the following word. This can be done on any text, and we don’t need annotated data – the text itself is enough.
Have a look at the following piece of code and try to understand what it does. Then, run it and see the network generating text! At first, the output is not meaningful, but it becomes so over time. This is the magic I was referring to.
Beware: this will take longer to run on a CPU. A GPU is recommended, but you can still try to run it for a while to see the predictions evolve. On my laptop, an epoch takes 6mins so the full training takes 6hrs. About 20 epochs are required for the generated text to be somewhat meaningful.
Note, however, that although this seems long, training actual deep learning models for concrete tasks takes days, even on multiple GPUs. This is mostly because of the data size and the much deeper networks.
Share this:
Like this: