ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) Paper
The net is heavily optimized for training on GPUs (they split it across two GTX 580s)
Significantly more parameters than neurons (60M vs 650k)
Trained with augmented data: label-preserving transformations (horizontal reflections, random translations/crops, altering RGB channel intensities) to artificially enlarge the dataset
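A minimal sketch of what that kind of augmentation looks like, assuming NumPy and a simple per-channel intensity shift (the paper actually uses PCA over all RGB pixel values; this is a stand-in):

```python
import numpy as np

def augment(img, rng):
    # Random horizontal reflection with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # Hypothetical per-channel intensity jitter (simplified; not the
    # paper's PCA-based color scheme).
    shift = rng.normal(0.0, 10.0, size=3)
    return np.clip(img + shift, 0.0, 255.0)

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224, 3))  # fake image, H x W x RGB
augmented = augment(img, rng)
```

Each training image yields many slightly different variants, effectively enlarging the dataset for free.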
Dropout in the first two fully connected layers
Dropout: during training, randomly set each hidden neuron's output to 0 with probability 0.5
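A minimal dropout sketch in NumPy (using the modern "inverted" form, which rescales at train time; the paper instead multiplies outputs by 0.5 at test time, which has the same effect):

```python
import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    # At test time, use all neurons unchanged.
    if not training:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p      # each unit survives with prob 1 - p
    return h * mask / (1.0 - p)          # rescale so the expected activation is unchanged

h = np.ones(8)
print(dropout(h, rng=np.random.default_rng(0)))  # roughly half the units zeroed
```

Because a different random subset of neurons is active on every forward pass, neurons can't co-adapt; the paper frames it as training exponentially many networks that share weights.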
Depth is super important: performance degraded (~2% top-1 loss) when any middle conv layer was removed. (So... could they have gotten more improvement just by adding more conv layers?)
Deep makes sense.. more transformations = greater ability to organize input data into useful subsets
More layers also = more information stored.. whoa, so the transformations themselves are stores of information (it lives in the learned weights). Makes sense, I just never thought of a function as doubling as a storage mechanism. Dang ok ok
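To make the "weights as storage" idea concrete, here's the arithmetic for AlexNet's first conv layer (96 filters of size 11×11 over 3 input channels, per the paper):

```python
# Learned numbers "stored" in AlexNet's first conv layer.
filters, kh, kw, in_channels = 96, 11, 11, 3
weights = filters * kh * kw * in_channels   # 96 * 11 * 11 * 3 = 34,848
biases = filters                            # one bias per filter
print(weights + biases)                     # 34,944 learned values in one layer
```

Every one of those values is set by training, so the layer's transformation literally is the stored information.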
Your dataset is a subset of the population
You want the model to perform well on the population, not just the dataset
Generalized models perform well on the population
Overfitting is when a model 'considers' factors in the dataset that are not in the population
Size of their net caused overfitting issues. Makes sense: the model has more storage, so it can memorize the 80% of factors that only explain 20% of the outcome, storing details specific to the dataset rather than learning the fundamental patterns, which would also show up in the population and therefore generalize
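A toy illustration of that memorization problem (assuming NumPy, with polynomial degree standing in for network capacity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy "dataset"
x_pop = np.linspace(-1, 1, 200)
y_pop = np.sin(np.pi * x_pop)                        # the "population" pattern

for degree in (3, 11):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    pop_mse = np.mean((np.polyval(coeffs, x_pop) - y_pop) ** 2)
    print(degree, round(train_mse, 4), round(pop_mse, 4))
```

The degree-11 model has enough capacity to drive training error to ~0 by fitting the noise, and its population error typically blows up; the degree-3 model is forced to keep only the fundamental pattern.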
Interesting, they 'centered' the inputs around 0 by subtracting the mean (computed over the entire training set's pixels) from each pixel value. Makes sense
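A quick sketch of that centering step (assuming NumPy and a per-channel mean; the paper subtracts the mean activity per pixel position, a minor variation):

```python
import numpy as np

images = np.random.default_rng(0).uniform(0, 255, (100, 32, 32, 3))  # fake N x H x W x RGB

channel_mean = images.mean(axis=(0, 1, 2))   # one mean per RGB channel
centered = images - channel_mean             # inputs now roughly centered on 0
print(centered.mean(axis=(0, 1, 2)))         # ~[0. 0. 0.]
```

Zero-centered inputs keep early-layer activations in a sane range, which generally helps optimization.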
Someone on StackOverflow answered my exact question lol (StackOverflow Answer)
ReLU seems much better than saturating functions (tanh, sigmoid), so why even use those???
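The paper's answer is speed: saturating nonlinearities squash gradients to near zero away from the origin, so gradient descent crawls. A quick numeric check of the derivatives (taking tanh as the saturating function):

```python
import numpy as np

x = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])

tanh_grad = 1 - np.tanh(x) ** 2       # tanh'(x): vanishes for large |x|
relu_grad = (x > 0).astype(float)     # ReLU'(x): constant 1 wherever active

print(tanh_grad)   # ~[2.5e-05, 0.07, 0.79, 0.07, 2.5e-05]
print(relu_grad)   # [0. 0. 1. 1. 1.]
```

The paper reports a ReLU net reaching 25% training error on CIFAR-10 about six times faster than an equivalent tanh net, which is a big part of why ReLU became the default.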