Seq2Seq for Sets

Seq2Seq for Sets Paper

Procedure:

Do pre-article thinking to strengthen memory as I read via active recall

Generate summary of article with AI and read it

Read the paper

Pre-Article Thinking for Active Recall as I Read

Sets are unordered, so how do you feed them into a seq2seq RNN?????

Apparently order matters, which makes sense but is a super interesting premise for a paper

Ok so, how do you deal with this?

AI Summary of the Paper Notes

Read-Process-Write model uses an attention mechanism over the input so that order doesn't matter

Read: embed

Process: run attention over the embeddings to build an order-invariant representation

Write: decode the output, one element at a time

Ok makes sense lol

Paper Notes

Ok so the order you feed data in affects performance, even when the data has no inherent order

Authors picked an attention mechanism that doesn't care about order

Then they do read (encode) -> process -> write (decode) in their Read-Process-Write (RPW) network

Ok lol

Read step is a small MLP to embed each input
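
A minimal sketch of what that read step could look like in PyTorch (the name ReadBlock and the exact layer sizes are my assumptions, not the paper's):

```python
import torch
import torch.nn as nn

class ReadBlock(nn.Module):
    """Embed each set element independently with a small MLP."""
    def __init__(self, in_dim: int, hidden_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, set_size, in_dim) -> memories m_i: (batch, set_size, hidden_dim)
        # Each element is embedded on its own, so no order information leaks in here.
        return self.mlp(x)
```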

Process step is done with an LSTM repeatedly attending to the embedded input values
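
Roughly: the process LSTM evolves a query q_t, attends over the memories with it, and concatenates the readout r_t back onto q_t to form q*_t for the next step. A sketch of that loop under my own simplifications (dot-product attention, fixed number of steps; ProcessBlock is my name, not the paper's):

```python
import torch
import torch.nn as nn

class ProcessBlock(nn.Module):
    """Repeatedly attend over the set memories with an LSTM that consumes only its own query/readout."""
    def __init__(self, hidden_dim: int, steps: int = 5):
        super().__init__()
        self.steps = steps
        self.hidden_dim = hidden_dim
        # The cell's "input" is the previous [query; readout] vector q*_{t-1}
        self.cell = nn.LSTMCell(2 * hidden_dim, hidden_dim)

    def forward(self, memories: torch.Tensor) -> torch.Tensor:
        # memories: (batch, set_size, hidden_dim) from the read step
        batch = memories.size(0)
        h = memories.new_zeros(batch, self.hidden_dim)
        c = memories.new_zeros(batch, self.hidden_dim)
        q_star = memories.new_zeros(batch, 2 * self.hidden_dim)
        for _ in range(self.steps):
            h, c = self.cell(q_star, (h, c))              # q_t = LSTM(q*_{t-1})
            scores = torch.bmm(memories, h.unsqueeze(2))  # e_{i,t} = <m_i, q_t>
            attn = torch.softmax(scores, dim=1)           # softmax over set elements
            readout = (attn * memories).sum(dim=1)        # r_t: permutation-invariant weighted sum
            q_star = torch.cat([h, readout], dim=1)       # q*_t = [q_t; r_t]
        return q_star                                     # q*_T, handed to the write step
```

Because the readout is an attention-weighted sum over all elements, shuffling the input set leaves q*_T unchanged.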

Write block is an LSTM pointer network (from the Pointer Networks paper, nice) which takes q_T from the process step and uses it to point at elements of the input, one step at a time
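
A rough sketch of a pointer-style write step, again with my own simplifications (greedy decoding, no masking of already-chosen elements, and init_state is just my way of feeding q*_T in):

```python
import torch
import torch.nn as nn

class WriteBlock(nn.Module):
    """Pointer-network style decoder: each step emits a distribution over input elements."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.cell = nn.LSTMCell(hidden_dim, hidden_dim)
        self.init_state = nn.Linear(2 * hidden_dim, hidden_dim)  # map q*_T to an initial hidden state

    def forward(self, memories: torch.Tensor, q_star: torch.Tensor, out_len: int) -> torch.Tensor:
        # memories: (batch, set_size, hidden), q_star: (batch, 2*hidden)
        batch, _, hidden = memories.shape
        h = torch.tanh(self.init_state(q_star))
        c = memories.new_zeros(batch, hidden)
        inp = memories.new_zeros(batch, hidden)
        pointers = []
        for _ in range(out_len):
            h, c = self.cell(inp, (h, c))
            scores = torch.bmm(memories, h.unsqueeze(2)).squeeze(2)  # (batch, set_size)
            idx = scores.argmax(dim=1)                               # greedy: point at the best element
            pointers.append(idx)
            inp = memories[torch.arange(batch, device=memories.device), idx]  # feed chosen element back in
        return torch.stack(pointers, dim=1)  # (batch, out_len) indices into the input set
```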

Slower convergence requires not only more time and more compute, but also more data!! Makes sense, but I didn't see it before. Dang, so data is a resource that gets burned through.

Summary

The network takes a set of inputs

Embeds them

Does order-invariant attention to process the embeddings

Decodes from this encoding a list of indices pointing back into the input set
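
Tying the pieces together in a toy forward pass (using the hypothetical ReadBlock / ProcessBlock / WriteBlock sketches above; the network is untrained, so the pointers are arbitrary and this just checks shapes):

```python
import torch

read, process, write = ReadBlock(1, 64), ProcessBlock(64, steps=5), WriteBlock(64)

x = torch.randn(2, 8, 1)                       # batch of 2 sets, 8 scalar elements each
memories = read(x)                             # (2, 8, 64)  per-element embeddings
q_star = process(memories)                     # (2, 128)    order-invariant summary of the set
pointers = write(memories, q_star, out_len=8)  # (2, 8)      indices back into each input set
print(pointers.shape)
```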

To Learn in Future

Seq2Seq

Chain rule for probabilities
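
For my own reference, the chain-rule factorization that seq2seq decoders are built on (standard probability, not specific to this paper):

```latex
P(y_1, \dots, y_T \mid x) = \prod_{t=1}^{T} P(y_t \mid y_1, \dots, y_{t-1}, x)
```

Each output token is predicted conditioned on everything emitted so far plus the input, which is exactly what the decoder LSTM models step by step.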

Post Paper Notes

Seq2Seq is fundamental to these types of papers and I should read more about it. I get the general idea, but not as well as I should.