ImageNet Classification with Deep Convolutional Neural Networks (AlexNet) Paper
The net is heavily optimized for training on GPUs (they split it across two GTX 580s)
Significantly more parameters than neurons (60M vs 650k)
Trained with augmented data: label-preserving transformations (horizontal reflections, random translations/crops, altering RGB channel intensities) to artificially enlarge the dataset
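A minimal sketch of what that kind of augmentation looks like, assuming NumPy and a simple per-channel intensity shift (the paper actually uses PCA over all RGB pixel values; this is a stand-in):

```python
import numpy as np

def augment(img, rng):
    # Random horizontal reflection with probability 0.5.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # Hypothetical per-channel intensity jitter (simplified; not the
    # paper's PCA-based color scheme).
    shift = rng.normal(0.0, 10.0, size=3)
    return np.clip(img + shift, 0.0, 255.0)

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224, 3))  # fake image, H x W x RGB
augmented = augment(img, rng)
```

Each training image yields many slightly different variants, effectively enlarging the dataset for free.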
Dropout in the first two fully connected layers
Dropout: during training, randomly set each hidden neuron's output to 0 with probability 0.5
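A minimal dropout sketch in NumPy (using the modern "inverted" form, which rescales at train time; the paper instead multiplies outputs by 0.5 at test time, which has the same effect):

```python
import numpy as np

def dropout(h, p=0.5, training=True, rng=None):
    # At test time, use all neurons unchanged.
    if not training:
        return h
    rng = rng or np.random.default_rng()
    mask = rng.random(h.shape) >= p      # each unit survives with prob 1 - p
    return h * mask / (1.0 - p)          # rescale so the expected activation is unchanged

h = np.ones(8)
print(dropout(h, rng=np.random.default_rng(0)))  # roughly half the units zeroed
```

Because a different random subset of neurons is active on every forward pass, neurons can't co-adapt; the paper frames it as training exponentially many networks that share weights.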
Depth is super important: performance degraded (~2% top-1 loss) when any middle conv layer was removed. (So... could they have gotten more improvement just by adding more conv layers?)
Deep makes sense.. more transformations = greater ability to organize input data into useful subsets
More layers also = more information stored.. whoa, so the transformations themselves are stores of information (it lives in the learned weights). Makes sense, I just never thought of a function as doubling as a storage mechanism. Dang ok ok
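To make the "weights as storage" idea concrete, here's the arithmetic for AlexNet's first conv layer (96 filters of size 11×11 over 3 input channels, per the paper):

```python
# Learned numbers "stored" in AlexNet's first conv layer.
filters, kh, kw, in_channels = 96, 11, 11, 3
weights = filters * kh * kw * in_channels   # 96 * 11 * 11 * 3 = 34,848
biases = filters                            # one bias per filter
print(weights + biases)                     # 34,944 learned values in one layer
```

Every one of those values is set by training, so the layer's transformation literally is the stored information.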
Your dataset is a subset of the population
You want the model to perform well on the population, not just the dataset
Generalized models perform well on the population
Overfitting is when a model 'considers' factors in the dataset that are not in the population
Size of their net caused overfitting issues. Makes sense: the model has more storage, so it can memorize the 80% of factors that only explain 20% of the outcome, storing details specific to the dataset rather than learning the fundamental patterns, which would also show up in the population and therefore generalize
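A toy illustration of that memorization problem (assuming NumPy, with polynomial degree standing in for network capacity):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 12)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, x.size)   # noisy "dataset"
x_pop = np.linspace(-1, 1, 200)
y_pop = np.sin(np.pi * x_pop)                        # the "population" pattern

for degree in (3, 11):
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    pop_mse = np.mean((np.polyval(coeffs, x_pop) - y_pop) ** 2)
    print(degree, round(train_mse, 4), round(pop_mse, 4))
```

The degree-11 model has enough capacity to drive training error to ~0 by fitting the noise, and its population error typically blows up; the degree-3 model is forced to keep only the fundamental pattern.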
Interesting, they 'centered' the inputs around 0 by subtracting the mean (computed over the entire training set's pixels) from each pixel value. Makes sense
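A quick sketch of that centering step (assuming NumPy and a per-channel mean; the paper subtracts the mean activity per pixel position, a minor variation):

```python
import numpy as np

images = np.random.default_rng(0).uniform(0, 255, (100, 32, 32, 3))  # fake N x H x W x RGB

channel_mean = images.mean(axis=(0, 1, 2))   # one mean per RGB channel
centered = images - channel_mean             # inputs now roughly centered on 0
print(centered.mean(axis=(0, 1, 2)))         # ~[0. 0. 0.]
```

Zero-centered inputs keep early-layer activations in a sane range, which generally helps optimization.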
Someone on StackOverflow answered my exact question lol (StackOverflow Answer)
ReLU seems much better than saturating functions (tanh, sigmoid), so why even use those???
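The paper's answer is speed: saturating nonlinearities squash gradients to near zero away from the origin, so gradient descent crawls. A quick numeric check of the derivatives (taking tanh as the saturating function):

```python
import numpy as np

x = np.array([-6.0, -2.0, 0.5, 2.0, 6.0])

tanh_grad = 1 - np.tanh(x) ** 2       # tanh'(x): vanishes for large |x|
relu_grad = (x > 0).astype(float)     # ReLU'(x): constant 1 wherever active

print(tanh_grad)   # ~[2.5e-05, 0.07, 0.79, 0.07, 2.5e-05]
print(relu_grad)   # [0. 0. 1. 1. 1.]
```

The paper reports a ReLU net reaching 25% training error on CIFAR-10 about six times faster than an equivalent tanh net, which is a big part of why ReLU became the default.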