Practical Notes on Transformer Implementation

A short technical note on building decoder-style transformers from scratch.

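Start with a minimal decoder block (causal self-attention followed by a feed-forward layer, each wrapped in a residual connection with layer normalization), then change the architecture one variable at a time so that every effect can be attributed to a single change.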
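To make "minimal" concrete, here is a dependency-free Python sketch of one decoder block. It assumes a single attention head, pre-layer-norm placement, a ReLU feed-forward layer without biases, layer norm without learned scale or shift, and small random weights. All function names (`decoder_block`, `init_params`, and the helpers) are hypothetical. It is meant for checking shapes and masking logic, not for training.

```python
import math
import random

# Matrices are plain lists of rows so the sketch runs without third-party packages.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale or shift)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, wq, wk, wv, wo):
    """Single-head scaled dot-product attention; position i sees only positions 0..i."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])
    mixed = []
    for i in range(len(x)):
        # Scoring only the prefix j <= i is the causal mask.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        mixed.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(d)])
    return matmul(mixed, wo)

def feed_forward(x, w1, w2):
    """Position-wise Linear -> ReLU -> Linear (biases omitted)."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def decoder_block(x, params):
    """Pre-LN block: x + Attn(LN(x)), then x + FFN(LN(x))."""
    x = add(x, causal_self_attention(layer_norm(x), *params["attn"]))
    return add(x, feed_forward(layer_norm(x), *params["ffn"]))

def random_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def init_params(d_model, d_ff, seed=0):
    rng = random.Random(seed)
    return {
        "attn": [random_matrix(d_model, d_model, rng) for _ in range(4)],  # wq, wk, wv, wo
        "ffn": [random_matrix(d_model, d_ff, rng), random_matrix(d_ff, d_model, rng)],
    }

# Usage: a 5-token sequence with d_model = 8.
params = init_params(d_model=8, d_ff=32, seed=0)
x = random_matrix(5, 8, random.Random(1), scale=1.0)
y = decoder_block(x, params)
print(len(y), len(y[0]))  # 5 8
```

Because each output position depends only on inputs at or before it, perturbing the last input row should leave every earlier output row unchanged. That check is cheap enough to rerun after each single-variable change.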
Keep training and evaluation scripts simple and reproducible (fixed seeds, logged configurations), and add ablations only once the baseline trains stably.
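A minimal sketch of that bookkeeping, assuming a Python training script driven by a frozen dataclass config. `RunConfig`, its fields, `set_seed`, `ablation_grid`, and `config_fingerprint` are hypothetical names. `set_seed` only seeds Python's `random` module here; a real script would also seed whichever numerical libraries it uses. Each generated run differs from the baseline in exactly one field, which keeps ablation results interpretable.

```python
import json
import random
from dataclasses import dataclass, asdict, replace

@dataclass(frozen=True)
class RunConfig:
    # Hypothetical baseline fields; substitute whatever your training script reads.
    seed: int = 1234
    n_layers: int = 4
    d_model: int = 256
    lr: float = 3e-4
    norm_position: str = "pre"

def set_seed(seed):
    """Seed the RNGs the script uses (only Python's `random` in this sketch)."""
    random.seed(seed)

def ablation_grid(baseline, variations):
    """Yield (name, config) pairs that each differ from the baseline in exactly one field."""
    yield "baseline", baseline
    for field, values in variations.items():
        for value in values:
            if getattr(baseline, field) == value:
                continue  # identical to the baseline, so not a separate run
            yield f"{field}={value}", replace(baseline, **{field: value})

def config_fingerprint(cfg):
    """Stable JSON string to log next to each result so the run can be reproduced."""
    return json.dumps(asdict(cfg), sort_keys=True)

# Usage: two single-variable ablations around the baseline.
baseline = RunConfig()
runs = list(ablation_grid(baseline, {"norm_position": ["pre", "post"], "n_layers": [2, 8]}))
print([name for name, _ in runs])
# ['baseline', 'norm_position=post', 'n_layers=2', 'n_layers=8']
```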
Track not only loss but also generation quality, and look specifically for failure modes on long-context prompts.
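One way to make this tracking routine is to bucket evaluation prompts by length and report loss alongside a simple degeneration signal for each bucket. The sketch below assumes hypothetical evaluation records with `prompt_len`, `loss`, and `generated` (a list of token ids) keys, uses illustrative bucket edges, and takes the fraction of repeated n-grams as a rough proxy for looping output. It does not replace reading samples or task-specific quality metrics.

```python
from collections import defaultdict

def repeated_ngram_fraction(tokens, n=3):
    """Fraction of n-grams that already appeared earlier; high values flag looping output."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    seen, repeats = set(), 0
    for g in ngrams:
        if g in seen:
            repeats += 1
        seen.add(g)
    return repeats / len(ngrams)

def bucket_of(length, edges):
    """Index of the first edge >= length; longer prompts go to an extra overflow bucket."""
    for i, edge in enumerate(edges):
        if length <= edge:
            return i
    return len(edges)

def summarize_by_context_length(records, edges=(512, 2048, 8192)):
    """Group hypothetical eval records by prompt length and average loss and repetition."""
    buckets = defaultdict(list)
    for r in records:
        buckets[bucket_of(r["prompt_len"], edges)].append(r)
    summary = {}
    for b in sorted(buckets):
        rs = buckets[b]
        label = f"<= {edges[b]}" if b < len(edges) else f"> {edges[-1]}"
        summary[label] = {
            "n": len(rs),
            "mean_loss": sum(r["loss"] for r in rs) / len(rs),
            "mean_repetition": sum(repeated_ngram_fraction(r["generated"]) for r in rs) / len(rs),
        }
    return summary

# Usage with two made-up records: a short prompt and a very long one.
records = [
    {"prompt_len": 100, "loss": 2.0, "generated": [1, 2, 3, 4]},
    {"prompt_len": 10000, "loss": 3.0, "generated": [7, 7, 7, 7, 7]},
]
print(summarize_by_context_length(records))
```

A bucket whose loss looks healthy but whose repetition score climbs is exactly the kind of long-context failure that a single averaged loss curve hides.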