Practical Notes on Transformer Implementation
A short technical note on building decoder-style transformers from scratch.
- Start with a minimal decoder block and make architecture changes one variable at a time.
- Keep training and evaluation scripts simple and reproducible; add ablations only once the baseline trains stably.
- Track not only loss but also generation quality and failure modes under long-context prompts.
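
As one possible illustration of the first two points, here is a dependency-free sketch of a single-head decoder block. It is not a reference implementation: the names `BlockConfig`, `init_params`, and `decoder_block`, the `pre_norm` flag, the ReLU MLP, and the layer norm without learnable gain or bias are all assumptions made for this example. Plain Python lists are used so the sketch runs anywhere; a real implementation would normally use a tensor library.

```python
import math
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BlockConfig:
    # All field names are hypothetical, chosen for this sketch.
    d_model: int = 8
    d_ff: int = 16
    pre_norm: bool = True  # the one variable an ablation might flip
    seed: int = 0          # fixed seed so initialization is reproducible


def matvec(w, x):
    # w is a list of rows (out_dim x in_dim); x is a vector of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def layer_norm(x, eps=1e-5):
    # Simplified layer norm: no learnable gain or bias.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(cfg):
    rng = random.Random(cfg.seed)

    def mat(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    d, f = cfg.d_model, cfg.d_ff
    return {"wq": mat(d, d), "wk": mat(d, d), "wv": mat(d, d), "wo": mat(d, d),
            "w1": mat(f, d), "w2": mat(d, f)}


def causal_attention(xs, p):
    d = len(xs[0])
    qs = [matvec(p["wq"], x) for x in xs]
    ks = [matvec(p["wk"], x) for x in xs]
    vs = [matvec(p["wv"], x) for x in xs]
    out = []
    for t, q in enumerate(qs):
        # Causal mask: position t only scores positions 0..t.
        scores = [sum(a * b for a, b in zip(q, ks[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        mixed = [sum(w * vs[s][i] for s, w in enumerate(weights)) for i in range(d)]
        out.append(matvec(p["wo"], mixed))
    return out


def mlp(x, p):
    hidden = [max(0.0, h) for h in matvec(p["w1"], x)]  # ReLU
    return matvec(p["w2"], hidden)


def decoder_block(xs, p, cfg):
    # xs is a list of token vectors, one per position.
    if cfg.pre_norm:
        attn = causal_attention([layer_norm(x) for x in xs], p)
        xs = [add(x, a) for x, a in zip(xs, attn)]
        return [add(x, mlp(layer_norm(x), p)) for x in xs]
    # Post-norm variant: normalize after each residual addition.
    attn = causal_attention(xs, p)
    xs = [layer_norm(add(x, a)) for x, a in zip(xs, attn)]
    return [layer_norm(add(x, mlp(x, p))) for x in xs]
```

With this layout, a baseline and an ablation differ in exactly one field, for example `replace(BlockConfig(), pre_norm=False)`, and both start from the same seeded weights. A cheap sanity check before any training run is to perturb the last input position and confirm that the outputs at earlier positions are unchanged, which is what the causal mask should guarantee.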