Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
New blog series: Deep Learning Papers visualized
This is the first post of a new series in which I explain the content of a paper in a visual, picture-based way. To me, this helps tremendously to grasp the ideas and remember them, and I hope it will do the same for many of you.
Today’s paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour by Goyal et al.
PyTorch multi-GPU training for faster machine learning results
When you have a big data set and a complicated machine learning problem, chances are that training your model takes a couple of days even on a modern GPU.
However, it is well known that the cycle of having a new idea, implementing it and then verifying it should be as quick as possible, so that you can test out new ideas efficiently.
If you need to wait for a whole week for your training run, this becomes very inefficient.
Do you know which inputs your neural network likes most?
Recent advances in training deep neural networks have led to a whole bunch of impressive machine learning models which are able to tackle a very diverse range of tasks. When you are developing such a model, one of the notable downsides is that it is considered a “black-box” approach in the sense that your model learns from data you feed it, but you don’t really know what is going on inside the model.
Shapeshifting PyTorch
An important consideration in machine learning is the shape of your data and your variables. You are often shifting and transforming data and then combining it. Thus, it is essential to know how to do this and what shortcuts are available.
Let’s start with a tensor with a single dimension:
    import torch
    test = torch.tensor([1,2,3])
    test.shape
    torch.Size([3])

Now assume we have built some machine learning model which takes batches of such single dimensional tensors as input and returns some output.
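To make the batch idea concrete, here is a minimal sketch of two common ways to get from a single one-dimensional tensor to a batch. The variable names `batch` and `stacked` are my own for illustration, and I am assuming the model expects the batch dimension to come first, which is the usual PyTorch convention but not something stated above.

```python
import torch

test = torch.tensor([1, 2, 3])

# unsqueeze(0) inserts a new leading dimension of size 1,
# so a tensor of shape [3] becomes a batch of one sample with shape [1, 3].
batch = test.unsqueeze(0)

# torch.stack joins several tensors of equal shape along a new dimension,
# here producing a batch of two samples with shape [2, 3].
stacked = torch.stack([test, test * 2])

# Convert the shapes to plain tuples for easy comparison and printing.
batch_shape = tuple(batch.shape)
stacked_shape = tuple(stacked.shape)
print(batch_shape, stacked_shape)
```

Which of the two fits depends on whether you have one sample that merely needs a batch dimension or several samples that should be combined.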
What are embeddings in machine learning?
Every now and then, you need embeddings when training machine learning models. But what exactly is such an embedding and why do we use it?
Basically, an embedding is used when we want to map some representation into another dimensional space. Doesn’t make things much clearer, does it?
So, let’s consider an example: we want to train a recommender system on a movie database (the typical Netflix use case). We have many movies, along with the ratings that users have given to them.
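As a rough sketch of what this looks like in code, an embedding can be thought of as a lookup table that holds one small vector per movie. The movie titles, the dimension of 4 and the helper names `embed` and `dot` below are all made up for illustration, and the vectors are random rather than learned; in a real recommender they would be adjusted during training based on the ratings.

```python
import random

random.seed(0)  # fixed seed so the random vectors are reproducible

# Hypothetical toy catalogue; a real database would hold many more movies.
movies = ["Alien", "Amelie", "Heat"]
embedding_dim = 4  # assumed size of the embedding space

# The embedding table: one vector of length embedding_dim per movie.
# Here the values are random; training would tune them.
embedding_table = [
    [random.uniform(-1.0, 1.0) for _ in range(embedding_dim)]
    for _ in movies
]

# Map each title to its row in the table.
movie_to_index = {title: i for i, title in enumerate(movies)}


def embed(title):
    """Return the embedding vector stored for the given movie title."""
    return embedding_table[movie_to_index[title]]


def dot(a, b):
    """Dot product of two vectors, a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


vec = embed("Heat")
similarity = dot(embed("Alien"), embed("Heat"))
print(len(vec), similarity)
```

The point of the lookup is that a movie, which is just an ID in the database, becomes a vector of numbers that a model can compute with, for example by comparing two movies through their dot product.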