📔 Change logs

2026-06

Published: Extra: Model Distillation — pouring big-model behavior into smaller models
Published: 13: AI Native Product Design — making probabilistic systems feel reliable
Published: 12: LLM Engineering — KV cache, inference cost, and deployment systems
Published: 11: Agents — from chatbots to task-execution systems
Published: 10: Tool Use — from saying things to doing things
Published: “Understanding LLMs from First Principles” 06: Scaling Laws and Emergence, 07: Inference and Generation, 08: The Nature of Hallucination, and 09: RAG
Updated: gave the “Understanding LLMs from First Principles” series a full polish pass — tightened mechanism accuracy and technical details, standardized and completed section illustrations, and added series entrance cards to the home page

2026-05

Published: 01: The First Principle of LLMs: Token Prediction
Updated: 01: The First Principle of LLMs: Token Prediction — added 5 section illustrations
Published: 02: Token and Embedding: How Language Becomes Numbers
Updated: 02: Token and Embedding: How Language Becomes Numbers — added 5 section illustrations
Published: 03: Transformer and Attention: How Models “See” Context
Updated: 03: Transformer and Attention: How Models “See” Context — added 5 explanatory diagrams
Published: 04: Language as Compression of the World: Why Prediction Can Become Intelligence
Updated: 04: Language as Compression of the World: Why Prediction Can Become Intelligence — added 5 explanatory diagrams
Published: 05: Pretraining, Fine-tuning, and Alignment: From Continuation Machine to Assistant — bundled with 6 explanatory diagrams
Published: The Math Behind LLM Pricing 01: How Inference Actually Works
Published: The Math Behind LLM Pricing 02: Writing Inference as Equations — bundled with an interactive T_compute / T_memory simulator
Published: The Math Behind LLM Pricing 03: From Inference Latency to Inference Cost
Published: The Math Behind LLM Pricing 04: Cracking Open the KV Cache
Updated: The Math Behind LLM Pricing 03: From Inference Latency to Inference Cost — added a Cost / Token interactive simulator
Published: The Math Behind LLM Pricing 05: From One GPU to a Cluster — Parallelism and Interconnect
Updated: 👋 Hello, World! — added a “Recently Published” section listing the latest posts

2026-01

Published: Vibe Coding

2025-12

Published: Intent Recognition

2025-06

Site upgraded to Nextra 4
Updated: User Value
Published: Case Study: Yuque’s Consumer Product Line

2024-04

Updated: User Value
Updated: 🔗 RAG Intro

2024-03

Published: RAG (Retrieval-Augmented Generation) Practice Sharing
Published: Annotation Reply

2024-02

i18n Support
Published: Software as a Service
Published: User Value
Published: The 7 Questions of Product Design
Structure Adjusted
Updated the site’s domain: https://insights.kaho.io
Published: Dictionary
Published: Japan Journey Gallery

2024-01

Site Up! 👋 Hello, world!
Published: 🇯🇵 Japan Journey

Last updated: June 17, 2026
