Abstract BPE token tiles used as the cover for LLM Engineering From Scratch

LLM Engineering From Scratch

Overview: LLM Engineering From Scratch is a public learning series that rebuilds core LLM mechanics one project at a time. Each project pairs a small runnable implementation with plots, stress cases, explanations, and a reproducible artifact.

Roadmap inspiration: Ahmad Osman (@TheAhmadOsman) and his article, “Step-By-Step LLM Engineering Projects (2026 Edition)”. The repository uses the article as a roadmap reference; the implementations, experiments, traces, demos, and writeups are independent.

Links: GitHub Repository, Tokenizer From Scratch Blog Post, BPE Merge Microscope Demo.

Status: The first project, Tokenizer From Scratch, is implemented with a byte-level BPE tokenizer, deterministic artifacts, and an interactive static demo. The planned sequence continues through embeddings, positional methods, attention, Transformer blocks, training loops, and objectives. ...

Jun 30, 2026 · 2 min · Avishek Saha
Colored token tiles representing byte-pair encoding merges

Tokenizer From Scratch: BPE as Learned Compression

A from-scratch byte-level BPE tokenizer with runnable Python, failure cases, charts, and an interactive trace that shows every merge step.

Jun 30, 2026 · 3 min · Avishek Saha