My deeplearning.ai Experience

At the end of July 2018, as some readers might have guessed from , I decided to steer my career back towards engineering. I dedicated the first week solely to getting acquainted with Python, but the following weeks revolved around the deeplearning.ai Specialization on Coursera. I had first tried to learn from something more cutting-edge, such as fast.ai, but couldn’t get very far without the basic concepts. Now that roughly two weeks have passed since I finished the whole set of courses, I can give a fairly thorough review of the full experience.

Before getting into the review, here is an idea of how many hours you would have to put in (including the time spent writing your notes) to learn the content of each course:

Course	Time Needed
Neural Networks and Deep Learning	10h
Improving Neural Networks	8.5h
Structuring Machine Learning Projects	3.5h
Convolutional Neural Networks	10h
Sequence Models	8h

In total, that comes to 40h, but I would budget 50h to 60h if you want the knowledge to really make its way into your long-term memory.

As usual, (most of) the source code is on my GitHub profile, more specifically in this repo. There you will also find my course notes, written in Markdown with all the necessary formulas in LaTeX. I would like to believe they are useful even for those who have already taken the specialization, since they summarize the whole experience in only 5 files (named dlCS#.md, for Deep Learning Cheat Sheet plus the course number).

This first course, Neural Networks and Deep Learning, was the one that got me hooked.

I had already taken its first week 2 years earlier, but had neither the necessary mindset nor the belief that it would be worth going through to the end. This time, since I was much more attuned to online courses and much more interested, I decided to take it again, and I think I can safely say that this course feels like a must-have.

Andrew Ng, the main teacher (and the only one you will see), is not only one of the most competent Deep Learning practitioners in the world, but also an amazing teacher with an incredibly simple way of explaining complex concepts. His explanations always start with the ideas behind a system or algorithm and only then move on to the harder-to-understand set of equations, a technique that made learning feel pleasant and fulfilling. Another of his strengths is notation: there is no real consensus among practitioners or researchers, so beginners can get very confused; thankfully, Andrew chose a very simple, consistent and practical notation.

The concepts discussed in this course are paramount to the field and underpin all of the more complex networks you will learn later, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Some of them are:

Activation Functions
Sigmoid and Softmax
Logistic Regression Cost Function
Backpropagation & Gradient Descent (and how to implement them)
General Equations for Neural Networks
Random Initializations
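To make the list above a bit more concrete, here is a minimal NumPy sketch of a 2-layer network (tanh hidden layer, sigmoid output) with small random initialization, the logistic regression cost, backpropagation and batch gradient descent. It follows the course’s convention, as I remember it, of stacking examples as columns (X has shape (n_x, m)) and using W1, b1, W2, b2 for the layer parameters. The toy dataset, layer size, learning rate and iteration count are my own assumptions, not the course’s assignment code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_x, n_h, n_y, seed=0):
    rng = np.random.default_rng(seed)
    # Small random weights break the symmetry between hidden units; biases can start at zero
    return {
        "W1": rng.standard_normal((n_h, n_x)) * 0.01,
        "b1": np.zeros((n_h, 1)),
        "W2": rng.standard_normal((n_y, n_h)) * 0.01,
        "b2": np.zeros((n_y, 1)),
    }

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])     # hidden activations, shape (n_h, m)
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])    # output probabilities, shape (1, m)
    return A1, A2

def cost(A2, Y):
    # Logistic regression (cross-entropy) cost averaged over the m examples
    m = Y.shape[1]
    eps = 1e-12  # avoids log(0)
    return -np.sum(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps)) / m

def backward(X, Y, A1, A2, p):
    m = X.shape[1]
    dZ2 = A2 - Y                                  # sigmoid + cross-entropy simplifies to this
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)       # tanh'(z) = 1 - tanh(z)^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def train(X, Y, n_h=4, lr=1.0, iters=2000, seed=0):
    p = init_params(X.shape[0], n_h, Y.shape[0], seed)
    costs = []
    for _ in range(iters):
        A1, A2 = forward(X, p)
        costs.append(cost(A2, Y))
        grads = backward(X, Y, A1, A2, p)
        for k in p:                               # plain (batch) gradient descent
            p[k] -= lr * grads[k]
    return p, costs

def predict(X, p):
    return (forward(X, p)[1] > 0.5).astype(int)

# Usage on a hypothetical toy problem: label is 1 when x0 + x1 > 0
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200))
Y = (X[0] + X[1] > 0).astype(int).reshape(1, -1)
params, costs = train(X, Y)
```

The cost starts near log(2), since with tiny weights every prediction is close to 0.5, and then falls as the hidden units differentiate.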

All of the important parts are coded from scratch in Python, and you will end up feeling very proud of yourself for being able to go from zero to a working NN with very few lines of code. You can consult the Jupyter Notebooks in the aforementioned GitHub repo (or search for them online) to get an idea of how it is done.

The second course, Improving Neural Networks, then enters the realm of art, since many of the techniques involving hyperparameters don’t work in every case and thus rely heavily on the engineer’s intuition. This is another essential chapter for a Deep Learning practitioner: sometimes you can only make an NN work properly if you optimize it well.

Most of the topics here are studied at both a high and a low level of abstraction, with very clear explanations from Andrew Ng (I haven’t yet found clearer ones anywhere else). They are:

Regularization
Dropout
Data Augmentation
Early Stopping
Vanishing/Exploding Gradients
Gradient Checking
Batch vs Mini-Batch Gradient Descent
RMSprop
ADAM
Learning Rate Decay
The Problem of Local Optima
Batch Normalization
Deep Learning Frameworks

Again, and surprisingly, you will find yourself implementing basically all of the above from scratch, which is quite an impressive feat in my opinion.
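To give an idea of what “from scratch” means here, below is a condensed NumPy sketch of three items from the list: shuffled mini-batches, the Adam update (a momentum-style first moment plus an RMSprop-style second moment, both bias-corrected), and two-sided gradient checking. The function names, the toy cost and the learning rate are my own choices; beta1 = 0.9, beta2 = 0.999 and eps = 1e-8 are the commonly quoted Adam defaults. This is an illustration, not the course’s notebook code.

```python
import numpy as np

def random_mini_batches(X, Y, batch_size=64, seed=0):
    """Shuffle the columns (examples) of X and Y together, then cut them into mini-batches.
    The last batch is smaller when m is not a multiple of batch_size."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[1])
    X, Y = X[:, perm], Y[:, perm]
    return [(X[:, k:k + batch_size], Y[:, k:k + batch_size])
            for k in range(0, X.shape[1], batch_size)]

def adam_step(w, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients (RMSprop)
    m_hat = m / (1 - beta1 ** t)             # undo the bias from initializing m and v at 0
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def gradient_check(f, grad_f, theta, eps=1e-7):
    """Relative difference between an analytic gradient and a two-sided numerical estimate."""
    approx = np.zeros_like(theta)
    for i in range(theta.size):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += eps
        minus[i] -= eps
        approx[i] = (f(plus) - f(minus)) / (2 * eps)
    grad = grad_f(theta)
    return np.linalg.norm(grad - approx) / (np.linalg.norm(grad) + np.linalg.norm(approx))

# Usage: Adam on the toy cost sum((w - target)^2), whose gradient is 2 (w - target)
target = np.array([3.0, -1.0])
w, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * (w - target), m, v, t, lr=0.1)

# Usage: a correct gradient gives a tiny relative difference, a buggy one does not
theta = np.array([1.0, -2.0, 0.5])
good = gradient_check(lambda th: np.sum(th ** 3), lambda th: 3 * th ** 2, theta)
bad = gradient_check(lambda th: np.sum(th ** 3), lambda th: 2 * th ** 2, theta)
```

In a real training loop you would call adam_step once per mini-batch for every parameter array, and run gradient_check only occasionally, since the numerical estimate costs two cost evaluations per parameter.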

Nonetheless, I believe the last coding exercise is a strong indication of what’s to come. In the Deep Learning Frameworks section you get a very brief introduction to one of the core tools used by most of today’s practitioners, TensorFlow; despite its importance, only a very small part of the course is devoted to it, and you will inevitably have to scour the web for tutorials to figure out TensorFlow’s rather strange way of thinking. By the end of this course I had started to suspect that I would not be a fully independent practitioner or researcher after finishing the whole specialization.

The third course, Structuring Machine Learning Projects, feels like an interlude. It is centered around two longer quizzes that test your ability to deal with the implementation of real-world systems (a bird classifier and a self-driving car). Andrew Ng created these two problems to train and test his students on what they had learned, but soon realized that their use could be broadened, since he later witnessed engineering teams stuck for months on problems that were solved within his simulations/quizzes.

Some students who are more experienced in the field have described this course as a huge breakthrough in their Deep Learning careers; however, I don’t think I will be able to fully assess its value until I face similar problems in real life. For now, it feels like a more complicated repetition of the two previous courses. My only complaint is that submitting the quizzes was a painstaking process, as my answers were involuntarily changed many times due to some kind of weird minor bug.

With the fourth course, Convolutional Neural Networks, the quality starts to drop, as hinted at earlier, mostly because the coding exercises are too short for the complexity of the algorithms. The developers probably had to fit everything into short pills, but I can’t help feeling uneasy about implementing CNNs after this course, even though you do implement the most important steps of the various CNN algorithms. That’s one of the reasons why I’ve been going around the web looking for tutorials with different implementations.

The other factor behind the diminishing value of the courses is the lack of a detailed explanation of the peculiarities of the Keras library, which sits on top of TensorFlow (among other backends) and is widely used by researchers and practitioners. Though mostly an easy-to-understand framework, Keras also has its own specific ways of handling some aspects of NNs (it also comes from Google, in a way), so adapting to it can be quite annoying.

Anyway, the intuition and theory in the course seem, as usual, good enough to avoid any complaints, even though the course hasn’t been updated since around the end of 2016:

How (and Why) Convolutions Work
Padding and Strided Convolutions
Pooling (Max and Average) Layers
Various Classical Architectures (LeNet-5, AlexNet, VGG-16, ResNet, Inception)
1×1 Convolutions
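Several of these topics reduce to a little index arithmetic, so here is a deliberately naive NumPy sketch: the output-size rule floor((n + 2p - f) / s) + 1 for padding p, filter size f and stride s; a single-channel convolution (strictly a cross-correlation, which is what deep learning libraries call convolution); max and average pooling; and a 1×1 convolution as a per-pixel mix of channels. The (channels, height, width) layout for the 1×1 case and all function names are my own assumptions; real frameworks vectorize this and handle batches and multiple channels.

```python
import numpy as np

def conv_output_size(n, f, pad=0, stride=1):
    """Output size of a convolution or pooling window: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1

def conv2d(x, w, b=0.0, pad=0, stride=1):
    """Single-channel 2-D convolution (cross-correlation) with zero padding and stride."""
    out_h = conv_output_size(x.shape[0], w.shape[0], pad, stride)
    out_w = conv_output_size(x.shape[1], w.shape[1], pad, stride)
    x = np.pad(x, pad)  # zero padding on every side
    fh, fw = w.shape
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = np.sum(x[r:r + fh, c:c + fw] * w) + b
    return out

def pool2d(x, f=2, stride=2, mode="max"):
    """Max or average pooling over f x f windows; pooling has no learnable parameters."""
    reduce = np.max if mode == "max" else np.mean
    out_h = conv_output_size(x.shape[0], f, 0, stride)
    out_w = conv_output_size(x.shape[1], f, 0, stride)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = reduce(x[r:r + f, c:c + f])
    return out

def one_by_one_conv(x, w):
    """1x1 convolution: x is (C_in, H, W), w is (C_out, C_in).
    Each output pixel is a linear combination of the input channels at that pixel."""
    return np.tensordot(w, x, axes=1)  # shape (C_out, H, W)

# Usage on a 6x6 input with a 3x3 filter of ones
x = np.arange(36, dtype=float).reshape(6, 6)
w = np.ones((3, 3))
valid = conv2d(x, w)                    # 4x4 output, no padding, stride 1
strided = conv2d(x, w, pad=1, stride=2) # 3x3 output
pooled = pool2d(x)                      # 3x3 output of 2x2 max pooling
```

The usage lines also show why 1×1 convolutions are handy: they change the number of channels (here they would shrink C_in to C_out) without touching height and width.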

Of the last three courses, the fifth, Sequence Models, is my favorite. Its notation and its distinction between LSTMs and GRUs are probably its strongest points, and certainly not easy to find elsewhere. Sequence Models are a very peculiar and newer area compared to plain NNs and CNNs, so everything here can be quite eye-opening, much like seeing recursive algorithms for the first time. The main topics are:

Why and When to choose RNNs (when there is a step-wise dependency in the independent variables)
How RNNs measure conditional probabilities
Language Modeling
Vanishing Gradients in RNNs (more common than exploding gradients)
GRUs & LSTMs
Bidirectional and Deep RNNs
Embedding Matrices and Transfer Learning
Similarity Functions, Linguistic Regularities and t-SNE
Algorithms for learning embedding matrices (Word2Vec, Negative Sampling, GloVe)
Beam Search
Translation Metrics (e.g. BLEU Score)
Attention Models
Trigger Word Detection
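Since the GRU/LSTM distinction is what I liked most, here is a small NumPy sketch of a single time step of each cell, written with the gate names I remember from the course (relevance and update gates for the GRU; forget, update and output gates for the LSTM). Each gate gets one weight matrix acting on the concatenation of the previous state and the current input. Vectors are 1-D (a single example), and init_gates is a hypothetical helper of mine with arbitrary scaling; treat this as an illustration of the equations rather than a reference implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_gates(gates, n_a, n_x, seed=0):
    """Hypothetical helper: a weight matrix over [a_prev; x_t] and a bias for each gate letter."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in gates:
        p["W" + g] = rng.standard_normal((n_a, n_a + n_x)) * 0.1
        p["b" + g] = np.zeros(n_a)
    return p

def gru_step(x_t, c_prev, p):
    """Full GRU step; the hidden state a<t> is the memory cell c<t> itself."""
    concat = np.concatenate([c_prev, x_t])
    gamma_r = sigmoid(p["Wr"] @ concat + p["br"])  # relevance gate
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])  # update gate
    c_tilde = np.tanh(p["Wc"] @ np.concatenate([gamma_r * c_prev, x_t]) + p["bc"])
    # One gate interpolates between the candidate and the old memory
    return gamma_u * c_tilde + (1 - gamma_u) * c_prev

def lstm_step(x_t, a_prev, c_prev, p):
    """LSTM step; separate forget and update gates, and an output gate so a<t> != c<t>."""
    concat = np.concatenate([a_prev, x_t])
    gamma_f = sigmoid(p["Wf"] @ concat + p["bf"])  # forget gate
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])  # update gate
    gamma_o = sigmoid(p["Wo"] @ concat + p["bo"])  # output gate
    c_tilde = np.tanh(p["Wc"] @ concat + p["bc"])
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t

# Usage: run both cells over a short random sequence (5 steps, n_x = 2, n_a = 3)
rng = np.random.default_rng(1)
xs = rng.standard_normal((5, 2))
gru_p = init_gates("ruc", n_a=3, n_x=2)
lstm_p = init_gates("fuoc", n_a=3, n_x=2)
c = np.zeros(3)
a, c_l = np.zeros(3), np.zeros(3)
for x_t in xs:
    c = gru_step(x_t, c, gru_p)
    a, c_l = lstm_step(x_t, a, c_l, lstm_p)
```

The additive update of the memory cell is what helps with vanishing gradients: when the update gate is close to 0, the GRU copies c<t-1> almost unchanged to the next step.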

I wouldn’t say I can implement any of the above methods from scratch with only this course; in the coding exercises we only see higher-level applications, though I could do it if I studied the particular application. The end of this last course is the biggest example of the issue I mentioned earlier: skimming through the implementation details to fit the time slot.
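Beam search is one of the pieces that only gets a high-level treatment in the exercises, so here is a small framework-free sketch. It keeps the beam_width best partial sequences by summed log-probability, sets aside those that emit an end token, and picks the winner with length normalization (summed log-probability divided by length**alpha). The alpha = 0.7 default, the "<EOS>" token and the toy_model "decoder" are my own assumptions; in a real system the next-token log-probabilities would come from an RNN’s softmax output.

```python
import math

def beam_search(next_log_probs, beam_width=3, max_len=10, end="<EOS>", alpha=0.7):
    """next_log_probs(prefix) returns a dict {token: log P(token | prefix)}."""
    beams = [((), 0.0)]  # (partial sequence, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = [
            (prefix + (tok,), score + lp)
            for prefix, score in beams
            for tok, lp in next_log_probs(prefix).items()
        ]
        candidates.sort(key=lambda cand: cand[1], reverse=True)  # stable: ties keep order
        beams = []
        for seq, score in candidates[:beam_width]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # sequences that hit max_len without the end token
    # Length normalization keeps the search from always preferring short outputs
    return max(finished, key=lambda cand: cand[1] / len(cand[0]) ** alpha)

def toy_model(prefix):
    """Hypothetical decoder where the greedy first choice leads to a worse sentence."""
    if len(prefix) == 0:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if len(prefix) == 1:
        if prefix == ("a",):
            return {"x": math.log(0.5), "y": math.log(0.5)}
        return {"z": 0.0}  # log(1.0)
    return {"<EOS>": 0.0}

greedy = beam_search(toy_model, beam_width=1)  # equivalent to greedy decoding
beam = beam_search(toy_model, beam_width=2)
```

With a width of 1 the search commits to "a" (probability 0.6) and ends with an overall probability of 0.3, while a width of 2 keeps "b" alive long enough to find the 0.4 sequence.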

The more you use the internet, the more you realize how ludicrous it is to give complex experiences a rating that usually has at most 5 levels. If I give this specialization 4 stars, you will believe it is rather mediocre; if, on the other hand, I give it 5, you might believe it is the bomb. It’s neither. To be more precise, I would give it 85% (a 4.25-star rating), because I think the CNN and RNN implementations were oversimplified.

However, the grade is not the most important point here; what matters more is the realization that today’s online courses are almost never enough to turn people into real practitioners, simply because they can’t give students enough experience (i.e., they mostly deal with toy and classical problems). What students should realize is that no course, online or not, can make up for a team of trained developers and novel problems, that is, real-life experience.