Hyperparameter tuning on Numerai data with fastai and Weights & Biases
Today we will try to tackle the Numerai tournament using the fastai deep learning library. Because the results likely depend on many different hyperparameters, let’s take advantage of the Weights & Biases library and its Sweeps API. A sweep is a set of training runs that tries out different combinations of your model’s hyperparameters.
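To make the idea of a sweep concrete, here is a minimal sketch of what a grid-style sweep explores. The dictionary follows the method / metric / parameters layout that W&B sweep configurations use, but the hyperparameter names and values (`lr`, `layers`, `dropout`, `valid_loss`) are illustrative assumptions, not the settings from the post. The helper `grid_combinations` is hypothetical and only enumerates the combinations locally, without calling W&B.

```python
from itertools import product

# Hypothetical sweep configuration; names and values are assumptions
# chosen for illustration, not the ones used in the post.
sweep_config = {
    "method": "grid",
    "metric": {"name": "valid_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"values": [1e-3, 1e-2]},
        "layers": {"values": [[200, 100], [500, 250]]},
        "dropout": {"values": [0.0, 0.5]},
    },
}


def grid_combinations(config):
    """Enumerate every hyperparameter combination of a grid-style config."""
    names = list(config["parameters"])
    value_lists = [config["parameters"][name]["values"] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


combos = grid_combinations(sweep_config)
print(len(combos))  # 2 * 2 * 2 = 8 combinations
for combo in combos[:2]:
    print(combo)
```

With the `wandb` package installed, a dictionary of this shape would typically be handed to `wandb.sweep`, and `wandb.agent` would then call your training function once per combination.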
What is Numerai? Numerai is a hedge fund that trades stocks in a market-neutral fashion. That means they try to make money regardless of where the overall market is heading, without taking on a lot of risk for their customers.
P-DIFF: Learning Classifier with Noisy Labels based on Probability Difference Distributions
Label noise in digital pathology
In digital pathology and other health-related deep learning applications, label noise is an important challenge to consider during training.
It is inherent to the medical field: the problems are extremely challenging even for trained experts, so both intra- and inter-observer variability are high.
This blog post dives into the idea behind the paper “P-DIFF: Learning Classifier with Noisy Labels based on Probability Difference Distributions”, written by researchers at Microsoft in China.
Git config
I like to have a global git config that takes care of my usual Git setup, such as the commands and abbreviations I typically use, my username, and my email address.
It can be helpful to override some of these settings for a local project, e.g. when your regular email address is set up globally, but in one local folder you develop for a company and want to use your work email address instead.
Bash string manipulation
When I write bash scripts in my terminal, I often need to manipulate strings.
Unfortunately, I often forget how to do this properly in bash, so I thought I’d write a blog article to help me remember in the future. Hopefully it will be helpful for some of you developers out there as well.
String manipulation in bash is not hard, but I find some of the notation a bit cumbersome, especially since I normally work more with Python or other languages.
Meta-learning from noisy labels
Label noise introduction
Training machine learning models requires a lot of data. Often, it is quite costly to obtain sufficient data for your problem. Sometimes, you might even need domain experts, who are expensive and don’t have much time.
One option you can look into is getting cheaper, lower-quality data, e.g. by having less experienced people annotate the data. This usually has the side effect of making your labels noisier.
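To get a feeling for what noisier labels look like in practice, the following sketch simulates them. It assumes symmetric label noise, where each label is replaced by a different class chosen uniformly at random with a fixed probability. That noise model, the function name `add_symmetric_noise`, and the 20% noise rate are assumptions for illustration; real annotator noise is usually more structured.

```python
import random


def add_symmetric_noise(labels, num_classes, noise_rate, seed=0):
    """Flip each label to a different, uniformly chosen class with probability noise_rate."""
    rng = random.Random(seed)
    noisy = []
    for label in labels:
        if rng.random() < noise_rate:
            # Choose among all classes except the true one.
            other_classes = [c for c in range(num_classes) if c != label]
            noisy.append(rng.choice(other_classes))
        else:
            noisy.append(label)
    return noisy


# Toy data set: 1000 clean labels spread over 3 classes.
clean = [i % 3 for i in range(1000)]
noisy = add_symmetric_noise(clean, num_classes=3, noise_rate=0.2)

flipped = sum(c != n for c, n in zip(clean, noisy))
print(f"fraction of corrupted labels: {flipped / len(clean):.3f}")
```

Training the same model on `clean` and on `noisy` labels is a simple way to measure how much a given noise rate hurts before trying methods that are designed to cope with it.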