Do pre-article thinking to strengthen memory as I read via active recall
Generate summary of article with AI and read it
Read the paper
Sets are unordered so how do you put them in a seq2seq RNN?????
Apparently order matters, which makes sense but is a super interesting premise for a paper
Ok so, how do you deal with this?
Read-Process-Write model uses an attention mechanism to make order not matter
Read: embed
Process: attend over the embeddings
Write: make the output
Ok makes sense lol
Ok so order matters with respect to performance even when it shouldn't
Authors picked an attention mechanism that doesn't care about order
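Quick toy check I wrote to convince myself why content-based attention ignores order (my own example, not from the paper): permuting the memories permutes the softmax weights the same way, so the weighted sum comes out identical.

```python
import numpy as np

rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 8))     # embeddings of 5 set elements
query = rng.normal(size=8)             # e.g. the process LSTM's current state

def attention_readout(q, m):
    scores = m @ q                                   # content-based relevance per element
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax
    return weights @ m                               # weighted sum over the set

perm = rng.permutation(5)
print(np.allclose(attention_readout(query, memories),
                  attention_readout(query, memories[perm])))   # True: order didn't matter
```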
Then they do encode -> process -> decode in their RPW network
Ok lol
Read step is a small MLP to embed each input
Process step is done with an LSTM repeatedly attending to the embedded input values
Write block is an LSTM pointer network (the Pointer Networks paper, nice) which takes q_T from the process step and uses it to point to elements in the input, one step at a time
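A rough PyTorch sketch of how I picture the three blocks fitting together. The dimensions, the greedy masked decoding, and seeding the decoder input with q_T are my guesses for illustration, not the authors' actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReadProcessWrite(nn.Module):
    def __init__(self, in_dim=1, hid=32, process_steps=5):
        super().__init__()
        # Read: small MLP that embeds each set element independently
        self.read = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU(), nn.Linear(hid, hid))
        # Process: LSTM cell whose input at every step is an attention readout over the set
        self.process = nn.LSTMCell(hid, hid)
        # Write: pointer-style decoder LSTM cell
        self.write = nn.LSTMCell(hid, hid)
        self.process_steps = process_steps

    @staticmethod
    def attend(q, m):
        # content-based attention: scores depend only on content, so set order is irrelevant
        w = F.softmax(m @ q, dim=0)   # (n,)
        return w @ m                  # (hid,) readout

    def forward(self, x):
        # x: (n, in_dim) tensor holding the input "set"
        m = self.read(x)                                   # (n, hid) memories
        h = x.new_zeros(1, m.size(1))
        c = x.new_zeros(1, m.size(1))
        for _ in range(self.process_steps):                # Process: repeatedly attend to the embeddings
            r = self.attend(h.squeeze(0), m)
            h, c = self.process(r.unsqueeze(0), (h, c))
        # Write: the final process state q_T seeds a pointer decoder over the input
        idx = []
        inp = h                                            # feed q_T in as the first decoder input (my choice)
        mask = torch.zeros(m.size(0), dtype=torch.bool)
        for _ in range(m.size(0)):
            h, c = self.write(inp, (h, c))
            scores = m @ h.squeeze(0)                      # dot products point back into the input set
            scores = scores.masked_fill(mask, float("-inf"))  # don't pick the same element twice
            i = int(scores.argmax())
            mask[i] = True
            idx.append(i)
            inp = m[i].unsqueeze(0)                        # next decoder input is the chosen embedding
        return idx

model = ReadProcessWrite()
x = torch.rand(6, 1)       # a "set" of 6 scalars
print(model(x))            # e.g. [3, 5, 0, 1, 4, 2] -- untrained, so the pointers are arbitrary
```

For training you would keep the softmax over the pointer scores and apply cross-entropy against the target indices instead of taking the argmax.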
Slower convergence requires not only more time and compute, but also more data!! Makes sense but I hadn't seen it that way before. Dang, so data is a resource that gets burned through.
The network takes a set of inputs
Embeds them
Does order-invariant attention to process the embeddings
Decodes this encoding into a list of indices pointing back into the input
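For the sorting task in the paper this makes the output format concrete: the target index list is just the argsort of the input numbers.

```python
import numpy as np

x = np.array([0.7, 0.1, 0.4])   # the input "set" of numbers
idx = np.argsort(x)             # the pointer sequence the write block should emit
print(idx, x[idx])              # [1 2 0] [0.1 0.4 0.7]
```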
Seq2Seq
Chain rule for probabilities
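The chain rule factorization these seq2seq models train with, writing x for the input and y_1..y_T for the output tokens:

P(y_1, ..., y_T | x) = ∏_{t=1}^{T} P(y_t | y_1, ..., y_{t-1}, x)

i.e. each output token is predicted conditioned on the input plus everything emitted so far.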
Seq2Seq is fundamental to these types of papers and I should read more about it. I get the general idea, but not as well as I should.