Intuitive Understanding of Rotary Position Embedding (RoPE)

This post explains Rotary Position Embedding (RoPE), which is now widely used in modern LLMs, including LLaMA, PaLM, and others. This note was polished by AI from the author's original learning notes.

The Problem: Vanilla Attention Ignores Position

In the transformer attention mechanism, each token produces three vectors: a query (representing “what am I looking for?”), a key (representing “what do I contain?”), and a value (the actual content to retrieve). The attention score between two tokens is computed as the dot product of the query and key: ...
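As a minimal sketch of the point above (not code from the post; the names q_i, k_j, v_j and the dimension d are illustrative, and numpy is assumed), the score between two tokens is just a dot product of their query and key vectors, with nothing in it that depends on where the tokens sit in the sequence:

```python
import numpy as np

d = 4  # head dimension (illustrative)
rng = np.random.default_rng(0)

# Per-token projections: query ("what am I looking for?"),
# key ("what do I contain?"), value (the content to retrieve).
q_i = rng.normal(size=d)  # query of token i
k_j = rng.normal(size=d)  # key of token j
v_j = rng.normal(size=d)  # value of token j (unused in the score itself)

# Vanilla attention score between tokens i and j: a plain dot product.
# Note that nothing here depends on the positions i or j --
# the score is identical wherever the two tokens appear.
score = q_i @ k_j
print(score)
```

This position-blindness of the raw dot product is the problem that positional encodings, and RoPE in particular, are meant to solve.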

January 5, 2026 · 8 min · Licheng Guo