Identity Mappings in Deep Residual Networks

Link to paper: Identity Mappings in Deep Residual Networks

Plan:

My Guess:

The paper is about identity mappings in deep residual nets. Residual nets were created to make the identity function easy for a block to represent. With a residual/skip connection, a block reproduces the identity when its residual branch outputs all zeros, rather than when its weights form an identity map, and all zeros is an easier target for the model to converge to.
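
To make the guess concrete, here is a minimal sketch in plain Python. The two-layer branch, the helper names (`matvec`, `residual_branch`, `residual_block`) and the numbers are all made up for illustration and are not taken from the paper. It only checks that zeroing the last layer of the residual branch turns the whole block into the identity, whereas the same branch without a skip would output zeros.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(a, 0.0) for a in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_branch(x, w1, w2):
    """Hypothetical residual function F(x, W): linear -> ReLU -> linear."""
    return matvec(w2, relu(matvec(w1, x)))


def residual_block(x, w1, w2):
    """y = x + F(x, W): identity skip plus the residual branch."""
    f = residual_branch(x, w1, w2)
    return [xi + fi for xi, fi in zip(x, f)]


# Illustrative values only.
x = [1.5, -2.0, 0.5]
w1 = [[0.2, -0.1, 0.4], [0.7, 0.3, -0.5], [-0.6, 0.8, 0.1]]
w2_zero = [[0.0] * 3 for _ in range(3)]  # last layer of F set to all zeros

y = residual_block(x, w1, w2_zero)      # block with the skip connection
plain = residual_branch(x, w1, w2_zero)  # same layers without the skip

print(y == x)      # the block acts as the identity
print(plain)       # without the skip, the output collapses to zeros
```

With the skip in place, "do nothing" corresponds to zero weights in the last layer, which is the sense in which the identity becomes "all zeros".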

Abstract Notes:

They use an identity mapping as the skip connection. Why use the construct h(x) = x at all? Why not just write 'x', i.e. y = x + F(x, W) instead of y = h(x) + F(x, W)? (F is the residual function.) I assume there is a reason for this, but on its face it seems unnecessary.
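
One possible reading, hedged because these notes stop at the abstract: keeping h as a named function lets the shortcut be swapped for something other than the identity and compared. The sketch below is a toy under that assumption, using scalars, a zero residual, and a made-up 0.5 scaling shortcut. The function names (`stack_blocks`, `identity_shortcut`, `scaled_shortcut`) are hypothetical. It stacks blocks of the form y = h(x) + F(x) and looks at what reaches the output.

```python
def stack_blocks(x, h, residual, depth):
    """Apply y = h(x) + F(x) repeatedly, `depth` times."""
    for _ in range(depth):
        x = h(x) + residual(x)
    return x


def identity_shortcut(x):
    """h(x) = x, the shortcut the notes ask about."""
    return x


def scaled_shortcut(x):
    """An alternative h(x) = 0.5 * x, chosen only for illustration."""
    return 0.5 * x


def zero_residual(x):
    """F(x) = 0, so only the shortcut path carries the signal."""
    return 0.0


out_identity = stack_blocks(1.0, identity_shortcut, zero_residual, depth=10)
out_scaled = stack_blocks(1.0, scaled_shortcut, zero_residual, depth=10)

print(out_identity)  # signal passes through unchanged
print(out_scaled)    # signal shrinks by 0.5 per block, i.e. 0.5 ** 10 overall
```

In this toy setting the identity shortcut passes the input through all ten blocks untouched, while the scaled shortcut multiplies it by 0.5 at every block. Writing y = h(x) + F(x, W) makes that kind of comparison expressible, which y = x + F(x, W) alone does not.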

Paper Notes:

Main Takeaways: