Summary of notes
LSTMs = factorio
My Notes:
- RNNs are bad with gaps in context/long-term dependencies, for example, a sentence at the beginning of a book that affects the predicted token at the end of the book.
- Point of LSTMs is to fix this.
- LSTMs are repeating networks.
They are essentially a conveyor belt of information (called the 'cell state'). Each step in the input sequence is a workstation where employees ('gates') can alter the product moving along the belt:
- At each workstation, employees use the current token + the last workstation's output (the hidden state) to figure out how to change the product. None of them can actually see the product itself; they work only from those two inputs.
- First employee figures out the parts of the product that need to be thrown away.
- Second employee figures out what parts need to be updated, and how they should be changed.
- Third employee actually makes the changes that the first two figured out.
- Fourth employee figures out, based on the current token and the (freshly updated) product state, what to output. This is like writing a report on the state of the workstation and product. The report is sent back to the boss (the output), and is also sent to the next workstation down the assembly line.
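The four employees above can be sketched as one "workstation" in numpy. This is a minimal sketch, not a real implementation: the weight/bias names (`W["f"]`, `b["f"]`, etc.) are made up for illustration, and everything is a single dense layer per gate.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One 'workstation' on the conveyor belt.

    x_t: current token embedding; h_prev: last workstation's output;
    c_prev: the product on the belt (cell state).
    W, b: dicts of per-gate weights/biases (hypothetical names).
    """
    z = np.concatenate([h_prev, x_t])        # employees see token + last output only
    f = sigmoid(W["f"] @ z + b["f"])         # employee 1: what to throw away
    i = sigmoid(W["i"] @ z + b["i"])         # employee 2: what to update...
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # ...and the proposed changes
    c = f * c_prev + i * c_tilde             # employee 3: actually apply the changes
    o = sigmoid(W["o"] @ z + b["o"])         # employee 4: what to put in the report
    h = o * np.tanh(c)                       # report goes to the boss + next station
    return h, c
```

Note that the gates (`f`, `i`, `o`) only ever read `z` = token + previous output, never `c_prev` directly, which matches the "none of them can see the product" rule.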
LSTM Variants:
- Peephole: All the employees can see the conveyor belt/product (add a feed from cell state into each gate).
- Gated Recurrent Unit (GRU): The delete employee and update employee are combined into one employee.
Greff et al. 2015 tested tons of popular variants and found they all performed about the same.
Jozefowicz et al. (2015) tested over 10k RNN architectures and found some worked better than LSTMs on some tasks.
Article's conclusion: RNN -> LSTM -> (we should add attention!) (he said this in 2015 btw lol)