Understanding LSTM Networks

Paper 4/30 of the Ilya 30u30

Link to paper: Understanding LSTM Networks

Summary of notes

LSTMs = Factorio

My Notes:

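They are essentially a conveyor belt of information (called the 'cell state'). Each token in the input sequence arrives at its own workstation (one time step), where employees ('gates') may or may not alter the product moving through. At each workstation, there are three employees ('gates') working the conveyor belt: the forget gate decides what to throw off the belt, the input gate decides what new info to put on it, and the output gate decides how much of the belt's contents to show as that step's output (the hidden state).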
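To make the conveyor belt concrete, here is a minimal sketch of one workstation (one time step) in plain Python, following the standard LSTM equations from the article. The names `lstm_step`, `zero_params`, `affine`, and the `params` dictionary layout are my own assumptions for illustration, not anything defined in the article, and a real implementation would use a tensor library instead of lists.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def zero_params(hidden, n_in):
    """Hypothetical helper: all-zero weights and biases for the four layers."""
    def mat():
        return [[0.0] * (hidden + n_in) for _ in range(hidden)]
    def vec():
        return [0.0] * hidden
    return {"Wf": mat(), "bf": vec(),   # forget gate
            "Wi": mat(), "bi": vec(),   # input gate
            "Wc": mat(), "bc": vec(),   # candidate values
            "Wo": mat(), "bo": vec()}   # output gate

def lstm_step(x_t, h_prev, c_prev, params):
    """One workstation: take the current token x_t plus the previous
    hidden state and cell state, return the new (h_t, c_t)."""
    v = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]

    f = [sigmoid(z) for z in affine(params["Wf"], params["bf"], v)]    # what to keep on the belt
    i = [sigmoid(z) for z in affine(params["Wi"], params["bi"], v)]    # how much new info to add
    g = [math.tanh(z) for z in affine(params["Wc"], params["bc"], v)]  # the candidate new info
    o = [sigmoid(z) for z in affine(params["Wo"], params["bo"], v)]    # how much of the belt to show

    # The belt itself: scale the old contents, then add the gated new info.
    c_t = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    # The step's output is a filtered view of the belt.
    h_t = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c_t)]
    return h_t, c_t

# Usage: with the forget gate pinned open and the input gate pinned shut,
# the cell state rides the belt through the workstation unchanged.
p = zero_params(hidden=2, n_in=3)
p["bf"] = [50.0, 50.0]    # forget gate ~ 1 (keep everything)
p["bi"] = [-50.0, -50.0]  # input gate ~ 0 (add nothing)
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [0.5, -0.25], p)
```

The point of the usage example is that the cell state update is just a scale and an add, so when the gates say "leave it alone" the information passes through untouched, which is why the belt can carry things over long distances.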

LSTM Variants:

Greff et al. (2015) compared a bunch of popular variants and found they all perform about the same.

Jozefowicz et al. (2015) tested over 10k RNN architectures and found some that worked better than LSTMs on certain tasks.

Article's conclusion: RNN -> LSTM -> (we should add attention!) (he said this in 2015 btw lol)