My deeplearning.ai Experience
At the end of July 2018, as some of the readers might have guessed
from
Before getting into the review, here is a rough idea of how many hours you would have to sink in (including the time spent writing your own notes) to learn the content of each course:
| Topic | Time Needed |
|---|---|
| Neural Networks and Deep Learning | 10h |
| Improving Neural Networks | 8.5h |
| Structuring Machine Learning Projects | 3.5h |
| Convolutional Neural Networks | 10h |
| Sequence Models | 8h |
In total, it would be roughly 40h, but I would estimate 50h to 60h in order to have the knowledge really make its way into your long-term memory.
As usual, (most of) the source code is on my GitHub profile, more specifically in this repo. There you will also find my course notes, written in Markdown with all the necessary formulas in LaTeX. I would like to believe this material is useful even for those who have already taken the specialization, since it summarizes the whole experience in only 5 files (named `dlCS#.md`, for Deep Learning Cheat Sheet plus the number of the course).
This first course was the one that got me hooked.
I had already taken the first week of it 2 years ago, but back then I had neither the necessary mindset nor the belief that it would be worth the time to go through it to the end. This time, since I was much more in tune with online courses and much more interested in the subject, I decided to take it again, and I think I can safely say that this course is a must-have.
The concepts discussed in this course are fundamental to the field and to all of the more complex networks you will learn later, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). Some of them are:
- Activation Functions
- Sigmoid and Softmax
- Logistic Regression Cost Function
- Backpropagation & Gradient Descent (and how to implement it)
- General Equations for Neural Networks
- Random Initializations
All of the important parts are coded from scratch in Python, and you will end up feeling very proud of yourself for being able to go from zero to a working NN with very few lines of code. You can consult the Jupyter Notebooks in the aforementioned GitHub repo (or search the web) to get an idea of how it is done.
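To give a flavor of what "from scratch" means for the items listed above, here is a minimal sketch of logistic regression trained with gradient descent in plain Python. It is not taken from the course notebooks: the function names, the toy dataset, the learning rate and the number of steps are all assumptions made for illustration, and it handles a single input feature only.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, xs, ys):
    """Average cross-entropy cost of logistic regression on (xs, ys)."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in zip(xs, ys):
        a = sigmoid(w * x + b)
        total += -(y * math.log(a + eps) + (1 - y) * math.log(1 - a + eps))
    return total / len(xs)

def train(xs, ys, learning_rate=0.5, steps=1000):
    """Batch gradient descent on the cross-entropy cost (hypothetical defaults)."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        dw = db = 0.0
        for x, y in zip(xs, ys):
            a = sigmoid(w * x + b)   # forward pass
            dz = a - y               # gradient of the cost w.r.t. z = w*x + b
            dw += dz * x             # backpropagate to the weight
            db += dz                 # backpropagate to the bias
        w -= learning_rate * dw / m
        b -= learning_rate * db / m
    return w, b

# Made-up, linearly separable toy data: negative inputs are class 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
print("cost before:", cost(0.0, 0.0, xs, ys), "cost after:", cost(w, b, xs, ys))
```

A full neural network repeats the same forward and backward pattern layer by layer, with matrices in place of the scalars used here.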
The next course enters the realm of art, since many of the techniques involving hyperparameters don’t work in all cases and thus rely heavily on the engineer’s intuition. This is another essential chapter for a Deep Learning practitioner: sometimes you can only make an NN work properly if you optimize it well.
Most of the topics here are studied at both a high and a low level of abstraction, with very clear explanations from Andrew Ng (I haven’t yet found clearer ones anywhere else). They are:
- Regularization
- Dropout
- Data Augmentation
- Early Stopping
- Vanishing/Exploding Gradients
- Gradient Checking
- Batch vs Mini-Batch Gradient Descent
- RMSprop
- ADAM
- Learning Rate Decay
- The Problem of Local Optima
- Batch Normalization
- Deep Learning Frameworks
Again, surprisingly, you will find yourself implementing basically all of the above from scratch, a quite impressive feat in my opinion.
Nonetheless, I believe the last coding exercise is quite a strong indication of what’s to come. In the Deep Learning Frameworks section you get a very brief introduction to one of the core tools used by most of today’s practitioners: TensorFlow. Despite its importance, only a very small percentage of the course is devoted to it, and you will inevitably have to scour the web for tutorials to figure out how to deal with TensorFlow’s very strange way of thinking.
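To make one of the optimizers from the list above concrete, here is a minimal sketch of the Adam update rule applied to a single scalar parameter. The function name `adam_minimize`, the quadratic objective and the hyperparameter values are assumptions chosen for illustration, not the course's exercise. The update combines a momentum-style running average of the gradient with an RMSprop-style running average of its square, both with bias correction.

```python
import math

def adam_minimize(grad, theta, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-dimensional function given its gradient, using Adam."""
    m = 0.0  # running average of the gradient (momentum term)
    v = 0.0  # running average of the squared gradient (RMSprop term)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + eps)
    return theta

# Hypothetical objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = adam_minimize(lambda th: 2 * (th - 3), theta=0.0)
print(theta)
```

In a real network the same update is applied element-wise to every weight matrix and bias vector, usually on mini-batches rather than on the full dataset.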
This part of the set feels like an interlude. It is centered around two longer quizzes that test your ability to deal with the implementation of real-world systems (a bird classifier and a self-driving car). Andrew Ng created these two problems to train and test his students on what they had learned, but soon realized that their use could be broadened, since he later witnessed engineering teams stuck for months on problems that were solved within his simulations/quizzes.
Some students who are more experienced in the field have reviewed this course as a huge breakthrough in their Deep Learning careers. However, I don’t think I will be able to fully assess its value until I face similar problems in real life.
As previously mentioned, the quality of the courses starts to drop from here, mostly because the coding exercises are too short for the complexity of the algorithms. The developers probably had to fit everything into short pills, but I still feel uneasy about implementing CNNs after this course, even though you do implement the most important steps of the various CNN algorithms. That’s one of the reasons why I’ve been going around the web looking for tutorials with different implementations.
The other factor behind the diminishing value of the courses is the lack of a detailed explanation of the peculiarities of the Keras library, which sits on top of TensorFlow and PyTorch and is widely used by researchers and practitioners. Though mostly an easy-to-understand framework, Keras also has its own specific ways of handling some aspects of NNs.
Anyway, the intuition and theory of the course seem, as usual, good enough to avoid any complaints, even though it hasn’t been updated since around the end of 2016. The topics covered are:
- How (and Why) Convolutions Work
- Padding and Strided Convolutions
- Pooling (Max and Average) Layers
- Various Classical Architectures (LeNet-5, AlexNet, VGG-16, ResNet, Inception)
- 1×1 Convolutions
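As a rough illustration of the padding and stride items in the list above, the sketch below implements a naive single-channel 2D convolution in plain Python, along with the usual output-size formula floor((n + 2p - f) / s) + 1. The function names and the toy image are assumptions made for this example. As is common in deep learning, the "convolution" here is really a cross-correlation (the kernel is not flipped).

```python
def conv_output_size(n, f, padding=0, stride=1):
    """Output length along one axis: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * padding - f) // stride + 1

def conv2d(image, kernel, padding=0, stride=1):
    """Naive single-channel 2D convolution (cross-correlation) on nested lists."""
    n_h, n_w = len(image), len(image[0])
    f_h, f_w = len(kernel), len(kernel[0])

    # Zero-pad the image on all four sides.
    padded = [[0.0] * (n_w + 2 * padding) for _ in range(n_h + 2 * padding)]
    for i in range(n_h):
        for j in range(n_w):
            padded[i + padding][j + padding] = image[i][j]

    out_h = conv_output_size(n_h, f_h, padding, stride)
    out_w = conv_output_size(n_w, f_w, padding, stride)
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Slide the kernel: the top-left corner moves by `stride` each step.
            total = 0.0
            for a in range(f_h):
                for b in range(f_w):
                    total += padded[i * stride + a][j * stride + b] * kernel[a][b]
            out[i][j] = total
    return out

# Made-up 6x6 image: bright left half, dark right half.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
# A simple vertical edge detector.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
edges = conv2d(image, kernel)
print(edges[0])
```

With no padding and a stride of 1, the 6×6 input shrinks to 4×4, and the nonzero responses sit where the bright and dark halves meet. Passing `padding=1` keeps the output at 6×6.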
Out of the last 3 courses, this is my favorite. The notation for LSTMs and GRUs, and the distinction between them, is probably its strongest point and certainly not easy to find elsewhere. Sequence Models are a very peculiar and newer area compared to normal NNs and CNNs, so everything here can be quite eye-opening, much like when you see recurrent algorithms for the first time. The main topics are:
- Why and When to choose RNNs (when there is a step-wise dependency in the independent variables)
- How RNNs measure conditional probabilities
- Language Modeling
- Vanishing Gradients in RNNs (more usual than exploding gradients)
- GRUs & LSTMs
- Bidirectional and Deep RNNs
- Embedding Matrices and Transfer Learning
- Similarity Functions, Linguistic Regularities and t-SNE
- Algorithms for learning embedding matrices (Word2Vec, Negative Sampling, GloVe)
- Beam Search
- Translation Metrics (e.g. BLEU Score)
- Attention Models
- Trigger Word Detection
I wouldn’t say I can implement any of the above methods from scratch with only this course, although I could if I studied the particular application; the coding exercises only show higher-level applications. The end of this last section of the specialization is the clearest example of the issue I mentioned earlier: implementation details are skimmed over to fit the time slot.
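One item from the topic list that is small enough to sketch in full is the similarity function used to probe linguistic regularities in word embeddings. The example below computes cosine similarity and completes an analogy of the form "a is to b as c is to ?". The function names and the 2-dimensional toy vectors are invented for illustration; real embeddings such as Word2Vec or GloVe are learned and have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def complete_analogy(a, b, c, embeddings):
    """Return the word d that best satisfies 'a is to b as c is to d'.

    The target vector is e_b - e_a + e_c; the answer is the remaining word
    whose embedding has the highest cosine similarity to that target.
    """
    target = [eb - ea + ec
              for ea, eb, ec in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine_similarity(target, embeddings[w]))

# Hypothetical embeddings: first axis is roughly "gender", second "royalty".
toy = {
    "man": [1.0, 0.0],
    "woman": [-1.0, 0.0],
    "king": [1.0, 1.0],
    "queen": [-1.0, 1.0],
    "apple": [0.0, -1.0],
}
print(complete_analogy("man", "king", "woman", toy))
```

The sketch only covers the lookup step; learning the embedding matrix itself is the harder part.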
The more you use the internet, the more you realize how ludicrous it is to give complex experiences a rating that usually has at most 5 levels. If I give this specialization a 4-star rating, you will believe it is rather mediocre; if, on the other hand, I give it 5 stars, you might believe it is the bomb. It’s neither of those. To be more precise, I would give it 85% (a 4.25-star rating), because I think they oversimplified the implementations of the CNNs and RNNs.
However, the grade is not the most important takeaway here. What matters is the realization that the current state of online courses is almost never enough to turn people into real practitioners, simply because they cannot give the student enough experience, i.e., they mostly deal with toy and classical problems. What the student should realize is that no course, online or not, can replace working with a team of trained developers on novel problems, that is: real-life experience.