Standard seq2seq RNN/LSTM models can't handle problems where the output dictionary size depends on the input length (e.g., outputs that are positions in the input)
Pointer networks can generalize to sequence lengths larger than those seen during training
Pointer networks use an attention mechanism
Net1 (encoder): input sequence -> per-element embeddings/hidden states
Net2 (decoder): embeddings -> output; at each step it attends over the encoder states
Create pointers to input elements using the attention mechanism: instead of blending encoder states into a context vector, the softmax over input positions is used directly as the output distribution
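A minimal sketch of the pointing step, assuming additive (Bahdanau-style) scoring as in the pointer-network paper; all weight names (`W1`, `W2`, `v`) and shapes here are my own illustrative choices, and the weights are random rather than trained:

```python
import numpy as np

def pointer_attention(enc_states, dec_state, W1, W2, v):
    """Return a distribution over input positions ("the pointer").

    enc_states: (n, d) encoder hidden states e_j
    dec_state:  (d,)   current decoder hidden state d_i
    Scores: u_j = v^T tanh(W1 e_j + W2 d_i)
    """
    u = np.tanh(enc_states @ W1.T + dec_state @ W2.T) @ v
    # Softmax over the n input positions; the output "vocabulary"
    # is exactly the input, so n can vary freely between examples.
    p = np.exp(u - u.max())
    return p / p.sum()

rng = np.random.default_rng(0)
d = 8
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=(d,))

# Same weights work for any input length n -- the key property.
for n in (3, 5, 11):
    probs = pointer_attention(rng.normal(size=(n, d)), rng.normal(size=(d,)), W1, W2, v)
    print(n, probs.shape, round(probs.sum(), 6))
```

The loop at the bottom is the point: one fixed set of parameters yields a valid distribution over 3, 5, or 11 input positions, which is why a trained model can be run on longer sequences than it saw in training.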
I think it solves problems of the form: `given X={x1, x2, ..., xn}, output a sequence of indices into X` (e.g., an m-element subset or ordering S satisfying some property f(S)) for arbitrary n