Build a database engine. Upgrade from library consumer to systems architect.
Storing table data in TLV encoded binary files.
Using a Write-Ahead Log for fault tolerance.
Using 4KB data pages with a page-level LRU cache for efficiency.
Building B-Tree and hash-based indexes for fast lookups.
Through 13 years of building applications - across languages, frameworks, and trends - one thing became clear: databases outlive every tech stack. And the truth is? We developers know very little about them. So let's build one!
I used to think systems programming required a genius-level IQ.
Then I navigated to /var/lib/mysql and found… files.
Plain, editable files that store table data.
That's when I realized: database engines aren't magic. But understanding these concepts still seemed daunting.
Then a random YouTube video changed everything.
The guy had a funny mustache so I clicked. Instantly. ThePrimeagen was building his own network protocol in Golang as a fun side project.
Then came tsoding - building a websocket protocol in his own programming language while being 100% cringe.
Come on… I was inspired and humiliated at the same time. A unique feeling that only those guys can give you. And your wife.
After a while, inspiration won. If this guy, with his mustache, can do stuff like that, then I'm going to build my own database engine.
So why should you build your own database engine?
Because you will be unstoppable.
Move out of your comfort zone. Understand what 90% of developers never will.
You'll architect solutions while others fight symptoms.
You won't be replaced by LLMs.
That's why.
Exploring how real database engines store your data.
Starting from a naive CSV-based approach, we slowly work our way up to TLV-encoded binary files that can encode any type of data into an efficient, language-agnostic format.
We're going to store column definitions, records, B-Tree indexes, WAL entries, and hash indexes in TLV format.
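To make the idea concrete, here's a minimal sketch of TLV (type-length-value) encoding in Go. The type tag, function names, and 4-byte length prefix are illustrative assumptions, not the book's exact format:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Illustrative type tag -- the book's actual tag values may differ.
const TypeString byte = 1

// encodeString produces a TLV triplet: a 1-byte type tag,
// a 4-byte big-endian length, then the raw value bytes.
func encodeString(s string) []byte {
	buf := make([]byte, 5+len(s))
	buf[0] = TypeString
	binary.BigEndian.PutUint32(buf[1:5], uint32(len(s)))
	copy(buf[5:], s)
	return buf
}

// decodeString reverses the process and returns the remaining bytes,
// so several values can be decoded from one stream.
func decodeString(b []byte) (string, []byte, error) {
	if len(b) < 5 || b[0] != TypeString {
		return "", nil, fmt.Errorf("not a string TLV value")
	}
	end := 5 + int(binary.BigEndian.Uint32(b[1:5]))
	if len(b) < end {
		return "", nil, fmt.Errorf("truncated TLV value")
	}
	return string(b[5:end]), b[end:], nil
}

func main() {
	encoded := encodeString("hello")
	value, rest, err := decodeString(encoded)
	fmt.Println(value, len(rest), err) // hello 0 <nil>
}
```

Because every value carries its own type and length, a decoder can walk any record byte by byte without knowing the schema up front.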
Exploring how real database engines handle indexing.
Each table is organized into 4KB data pages. That simple trick reduces the number of I/O operations.
After that, we introduce B-Tree indexes where each node points to a specific page.
Using the index, the engine can locate the right page and read it from disk blazingly fast.
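As a rough illustration (the file name and function names are assumptions, not the book's code), a fixed page size means the engine can compute a page's byte offset and fetch it with a single positioned read:

```go
package main

import (
	"fmt"
	"os"
)

const PageSize = 4096 // 4KB pages

// readPage fetches the n-th fixed-size page from a table file.
// A B-Tree lookup would produce the page number; the read itself
// is one positioned I/O instead of a full-file scan.
func readPage(f *os.File, pageNumber int64) ([]byte, error) {
	page := make([]byte, PageSize)
	if _, err := f.ReadAt(page, pageNumber*PageSize); err != nil {
		return nil, err
	}
	return page, nil
}

func main() {
	f, err := os.Open("users.bin") // hypothetical table file
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()

	page, err := readPage(f, 3) // read the 4th page
	fmt.Println(len(page), err)
}
```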
Exploring how real database engines cache your query results.
Caching entire data pages instead of individual records or result sets. This technique exploits data locality and requires less I/O.
Implemented by an LRU (least recently used) cache backed by a linked list and a hash map.
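A minimal sketch of that structure, assuming pages are keyed by page number (the names and eviction details are illustrative, not the book's exact implementation):

```go
package main

import (
	"container/list"
	"fmt"
)

// pageCache is a minimal LRU cache for data pages, keyed by page number.
// The hash map gives O(1) lookups; the linked list tracks recency.
type pageCache struct {
	capacity int
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // page number -> list element
}

type entry struct {
	pageNumber int64
	data       []byte
}

func newPageCache(capacity int) *pageCache {
	return &pageCache{capacity: capacity, order: list.New(), items: map[int64]*list.Element{}}
}

func (c *pageCache) Get(pageNumber int64) ([]byte, bool) {
	el, ok := c.items[pageNumber]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as most recently used
	return el.Value.(*entry).data, true
}

func (c *pageCache) Put(pageNumber int64, data []byte) {
	if el, ok := c.items[pageNumber]; ok {
		el.Value.(*entry).data = data
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		// Evict the least recently used page from the back of the list.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).pageNumber)
	}
	c.items[pageNumber] = c.order.PushFront(&entry{pageNumber, data})
}

func main() {
	cache := newPageCache(2)
	cache.Put(1, make([]byte, 4096))
	cache.Put(2, make([]byte, 4096))
	cache.Get(1)                     // page 1 becomes most recently used
	cache.Put(3, make([]byte, 4096)) // evicts page 2
	_, ok := cache.Get(2)
	fmt.Println(ok) // false
}
```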
Building a database engine is a 2-part book. Part I is 129 pages and contains these chapters:
From a naive CSV-based approach, through a fixed-size format, to a variable-length TLV-based storage system.
An efficient storage format is the foundation of the entire database engine. In this chapter, we implement reusable TLV encoders and decoders with generics.
Before writing data to tables, the engine needs to handle columns and column options such as `nullable`.
The project will follow package-oriented design. In this chapter, we establish the main packages.
It's time to create and store databases and tables on the disk. We'll follow Postgres' format. Each database is a folder, each table is a file.
Implementing insert, which encodes a hash map into a TLV-encoded record and stores it in the table file.
Implementing select. At this stage, it's a full table scan, meaning the engine reads and decodes the entire table file sequentially.
Implementing delete, which works the same way as MySQL's: it doesn't actually remove the bytes, it only marks them as "deleted," which is more efficient than rewriting the file.
Implementing update, which is a combination of delete and insert.
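Here's a rough sketch of that mark-as-deleted idea behind delete and update. It assumes each record begins with a one-byte flag at a known offset; the layout and names are illustrative, not the book's actual record format:

```go
package main

import (
	"io"
	"os"
)

const deletedFlag byte = 1

// markDeleted flips a single "deleted" byte in place instead of
// rewriting the rest of the table file.
func markDeleted(f *os.File, recordOffset int64) error {
	_, err := f.WriteAt([]byte{deletedFlag}, recordOffset)
	return err
}

// update is just delete + insert: mark the old record as deleted,
// then append the new version of the record at the end of the file.
func update(f *os.File, oldOffset int64, newRecord []byte) error {
	if err := markDeleted(f, oldOffset); err != nil {
		return err
	}
	if _, err := f.Seek(0, io.SeekEnd); err != nil {
		return err
	}
	_, err := f.Write(newRecord)
	return err
}

func main() {
	f, err := os.OpenFile("users.bin", os.O_RDWR, 0o644) // hypothetical table file
	if err != nil {
		return
	}
	defer f.Close()
	_ = update(f, 128, []byte("TLV-encoded record bytes"))
}
```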
Building a database engine is a 2-part book. Part II is 135 pages and contains the following chapters. This is where the fun begins.
Adding a Write-Ahead Log, which guarantees a certain degree of fault tolerance: even if the engine fails while inserting, the data won't be lost.
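The core write-ahead idea, sketched below: log the change and flush it to disk before touching the table file, so a crash mid-insert can be recovered by replaying the log. The file names and function signatures are illustrative assumptions:

```go
package main

import (
	"fmt"
	"os"
)

// appendWAL writes the encoded record to the write-ahead log and flushes
// it to disk *before* the table file is touched. If the engine crashes
// mid-insert, the record can be replayed from the log on startup.
func appendWAL(wal *os.File, record []byte) error {
	if _, err := wal.Write(record); err != nil {
		return err
	}
	return wal.Sync() // force the OS to persist the log entry
}

func insert(wal, table *os.File, record []byte) error {
	// 1. Log first...
	if err := appendWAL(wal, record); err != nil {
		return err
	}
	// 2. ...then apply the change to the table file.
	_, err := table.Write(record)
	return err
}

func main() {
	wal, _ := os.OpenFile("users.wal", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	table, _ := os.OpenFile("users.bin", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	defer wal.Close()
	defer table.Close()
	fmt.Println(insert(wal, table, []byte("encoded record")))
}
```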
Organizing each table into 4KB data pages. This is a crucial step to support B-Tree indexes. On top of that, it's more efficient in lots of situations.
In this chapter, we're going to implement a B-Tree based primary index for tables. It will result in blazing fast lookups.
Adding an LRU-based, page-level cache backed by a linked list and a hash map. This will also increase the performance of SELECT queries.
Finally, another type of index, backed by a hash map instead of a B-Tree. We'll use it to support basic full-text search functionality.
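As a minimal sketch of the idea (the tokenization and names are illustrative, not the book's implementation), a hash-backed index maps each token to the offsets of the records containing it:

```go
package main

import (
	"fmt"
	"strings"
)

// textIndex maps a lower-cased token to the offsets of the records
// that contain it -- an inverted index backed by a plain hash map.
type textIndex map[string][]int64

func (idx textIndex) Add(offset int64, text string) {
	for _, token := range strings.Fields(strings.ToLower(text)) {
		idx[token] = append(idx[token], offset)
	}
}

// Lookup is O(1) on the token, unlike scanning every record.
func (idx textIndex) Lookup(token string) []int64 {
	return idx[strings.ToLower(token)]
}

func main() {
	idx := textIndex{}
	idx.Add(0, "Build a database engine")
	idx.Add(4096, "Database indexes are fun")
	fmt.Println(idx.Lookup("database")) // [0 4096]
}
```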
There's a 98-page appendix that teaches you MySQL-related concepts:
This massive 60-page chapter teaches everything about database indexing. Theory and practice are both included.
This chapter covers the most interesting MySQL features that lots of developers don't know about: things like CTEs, window functions, and partitioning.
Discover the quickest and easiest tips and tricks you can apply to optimize your queries.
One of the most important properties of a relational database is ACID: atomicity, consistency, isolation, durability.
Download a sample chapter and take a look at the content
You don't need prior Go experience to understand the content. It's a very simple language with only 25 reserved keywords. We'll use 20 of those. I'll explain every Go-specific thing but there aren't many of them (defer is the most "advanced").
If you pre-order now, you'll get the content on the 14th of April, 2025, at 09:37 AM CET.
On the 15th of April, 2025
On the 14th of April, 2025 09:37AM CET
Yes. No questions asked. Just reach out to me at martin@martinjoo.dev
The same applies to pre-orders. If you pre-ordered but changed your mind before the book was published, just send me an e-mail. No hard feelings.
A database storage engine that implements data storage, insert, update, delete, select, a WAL, a B-Tree-based primary index, and a hash-based full-text index.
It is not a DBMS so it does not include a server, a client, or a SQL parser.
It is similar to InnoDB: it exposes functions like Select or Insert that you can call from main.go to test things.
For simplicity, the engine does not support transactions or concurrency.
Every feature we build in the book is based on production databases, but simplified. Otherwise, the book would be 1,000+ pages long.
No. Everything is explained from scratch. While reading the book you'll gain important knowledge about real-world database engines. Especially about indexing.
The engine is written in Go.
No.
First of all, Golang is a very simple language. It contains only 25 reserved keywords, of which we use 20. In comparison, PHP comes with 73, while JavaScript has 56.
Secondly, I'll explain every Go-specific thing in the book. But there aren't that many.
So, the only language you need to speak is English.
Also, if you'd like to learn Go, this book is a great starting point, in my opinion.
Basic package: book part I in PDF and HTML format.
Plus package: book part I in PDF and HTML format and the source code to the database engine.
Premium package: book part I & part II in PDF and HTML format. The source code to the database engine. This package also includes future updates.
Published: 2025.04.15