Neural Machine Translation Paper
Jointly learning to align and translate? Hmm no idea what this means, will find out I guess.
LLM summary (copy/paste paper into LLM and ask for a summary of key points)
Then YouTube if possible (this was ok: YouTube Video)
Then read the paper
Explain it out loud as if I were teaching it to an imaginary person
What does 'jointly tuned' mean here? Each component can be tuned on its own?
AI: You got this backwards; it actually means the whole model gets trained at once. Traditional statistical MT systems have subcomponents that are often trained separately. The NMT encoder and decoder are tuned together, so the encoder's output is optimized directly to be useful for the decoder's task of generating translations, which aligns the components better than training them apart (whoa!! ok so training components together aligns them better than training them separately, at least in this case...)
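To make "jointly trained" concrete, here's a minimal sketch (assuming PyTorch and toy dimensions I made up, not the paper's actual RNNsearch architecture): encoder and decoder live in one computation graph with one loss and one optimizer, so gradients from the translation loss flow back into the encoder too.

```python
# Minimal sketch of joint end-to-end training, assuming PyTorch and toy sizes
# (this is NOT the paper's exact RNNsearch model, just the training pattern).
import torch
import torch.nn as nn

V_SRC, V_TGT, H = 100, 100, 32                 # toy vocab sizes and hidden size

src_emb = nn.Embedding(V_SRC, H)
tgt_emb = nn.Embedding(V_TGT, H)
encoder = nn.GRU(H, H, batch_first=True)
decoder = nn.GRU(H, H, batch_first=True)
out_proj = nn.Linear(H, V_TGT)

# ONE optimizer over every parameter -- this is the "jointly" part.
params = (list(src_emb.parameters()) + list(tgt_emb.parameters()) +
          list(encoder.parameters()) + list(decoder.parameters()) +
          list(out_proj.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)

src = torch.randint(0, V_SRC, (1, 7))          # fake 7-token source sentence
tgt = torch.randint(0, V_TGT, (1, 6))          # fake 6-token target sentence

_, enc_h = encoder(src_emb(src))               # encode the source
dec_out, _ = decoder(tgt_emb(tgt[:, :-1]), enc_h)  # teacher-forced decoding
logits = out_proj(dec_out)                     # predict the next target tokens

loss = nn.functional.cross_entropy(logits.reshape(-1, V_TGT),
                                   tgt[:, 1:].reshape(-1))
loss.backward()    # gradients from the decoder's loss reach the ENCODER weights
opt.step()         # both components update from the same translation objective
```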
Traditional approach compressed the source into a fixed-length vector, which is an obvious bottleneck: eventually sequences get too long for a vector of that size to hold enough information
AI: Correct
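A tiny illustration of that bottleneck (assuming PyTorch; the setup is mine, not from the paper): however long the sentence is, a plain RNN encoder's final hidden state has the same fixed size, and that single vector is all a vanilla encoder-decoder hands to the decoder.

```python
# Sketch of the fixed-length bottleneck, assuming PyTorch (toy dimensions).
import torch
import torch.nn as nn

enc = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
for length in (5, 50, 500):
    x = torch.randn(1, length, 8)        # a fake "embedded sentence" of this length
    _, h_final = enc(x)
    print(length, tuple(h_final.shape))  # always (1, 1, 16), no matter the length
```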
Interesting, no masking for future tokens in the attention mechanism? Allows for filling text in, is this correct?
What are source annotations? Are these just embeddings?
AI: No. Annotations are more context-rich than embeddings: each annotation is the bidirectional RNN's hidden state at that word, so it is roughly the embedding plus the surrounding context from both directions
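My own sketch of what that looks like (assuming PyTorch, toy sizes): run a bidirectional RNN over the source embeddings and take one hidden state per word, with the forward and backward states concatenated.

```python
# Annotations as bidirectional-RNN hidden states (sketch, assuming PyTorch).
import torch
import torch.nn as nn

V, E, H = 100, 16, 32
emb = nn.Embedding(V, E)
bi_enc = nn.GRU(E, H, batch_first=True, bidirectional=True)

src = torch.randint(0, V, (1, 7))    # fake 7-word source sentence
annotations, _ = bi_enc(emb(src))    # one annotation per word, forward+backward concatenated
print(annotations.shape)             # torch.Size([1, 7, 64]), i.e. 2*H per word
```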
Process:
Make a better translator lol (it worked, beat the plain RNN encoder-decoder by a lot, especially on long sentences)
Computationally expensive, does not handle rare/unknown words well
Basically the main change is that the input isn't smooshed into a single fixed-length vector; instead the decoder attends over all the source annotations at every output step (see the sketch below)
Translation is about finding the sentence in the new language which maximizes the conditional probability given the original sentence, i.e. y* = argmax_y p(y | x) with x the source and y a candidate translation.
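Here's the sketch referenced above (assuming PyTorch; dimensions and names are mine, not the paper's exact RNNsearch sizes): at each output step the decoder scores every source annotation against its current state, softmaxes the scores into alignment weights, and takes a weighted sum as the context vector, so nothing has to be smooshed into one fixed vector up front.

```python
# Additive ("Bahdanau"-style) attention sketch, assuming PyTorch and toy dimensions.
import torch
import torch.nn as nn

H = 32
annotations = torch.randn(1, 7, 2 * H)   # encoder annotations for a 7-word source
s_prev = torch.randn(1, H)               # decoder hidden state at the current step

# Alignment model: a small feed-forward net scoring each source position.
W_a = nn.Linear(H, H, bias=False)        # projects the decoder state
U_a = nn.Linear(2 * H, H, bias=False)    # projects each annotation
v_a = nn.Linear(H, 1, bias=False)        # collapses to one score per position

scores = v_a(torch.tanh(W_a(s_prev).unsqueeze(1) + U_a(annotations)))  # (1, 7, 1)
alpha = torch.softmax(scores, dim=1)        # alignment weights over the 7 source words
context = (alpha * annotations).sum(dim=1)  # (1, 2*H): fresh source summary for THIS step
print(alpha.squeeze(-1), context.shape)
```

Because the weights are recomputed for every target word, long sentences never have to squeeze through a single vector, which is exactly the fix for the bottleneck noted earlier; the argmax_y p(y | x) objective stays the same, only how p is computed changes.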