The story of this book

Through 13 years of building applications - across languages, frameworks, and trends - one thing became clear: databases outlive every tech stack. And the truth is? We developers know very little about them. So let's build one!

I'm not good enough for low-level systems

I used to think systems programming required a genius-level IQ.

Then I navigated to /var/lib/mysql and found… files.

Plain, editable files that store table data.

That's when I realized: database engines aren't magic. But understanding these concepts still seemed daunting.

ThePrimeagen has a funny mustache

Then a random YouTube video changed everything.

The guy had a funny mustache so I clicked. Instantly. ThePrimeagen was building his own network protocol in Golang as a fun side project.

Then came tsoding - building a websocket protocol in his own programming language while being 100% cringe.

Come on… I was inspired and humiliated at the same time. A unique feeling that only those guys can give you. And your wife.

After a while, inspiration won. If this guy, with his mustache, can do stuff like that, then I’m going to build my own database engine.

Why build a DB engine

So why should you build your own database engine?

Because you will be unstoppable.

Move out of your comfort zone. Understand what 90% never will.

You'll architect solutions while others fight symptoms.

You won't be replaced by LLMs.

That's why.

PRE-ORDER NOW

Storage layer

Exploring how real database engines store your data.

Starting from a naive CSV-based approach we slowly work our way up to TLV-encoded binary files that can encode any type of data into an efficient, language-agnostic format.

We're going to store column definitions, records, B-Tree indexes, WAL entries, and hash indexes in TLV format.

B-Tree indexes and data pages

Exploring how real database engines handle indexing.

Each table is organized into 4KB data pages. That simple trick reduces the number of I/O operations.

After that, we introduce B-Tree indexes where each node points to a specific page.

Using the index, it's blazing fast to read an entire page from the disk.


Page-level caching

Exploring how real database engines cache your data.

Caching entire data pages instead of individual records or result sets. This technique exploits data locality and requires less I/O.

Implemented by an LRU (least recently used) cache backed by a linked list and a hash map.
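
As a minimal sketch of that structure, the example below uses Go's standard `container/list` as the linked list and a map from page number to list node. The `PageCache` type and its method names are assumptions for illustration, not the book's actual code.

```go
package main

import (
	"container/list"
	"fmt"
)

type entry struct {
	pageID int
	data   []byte
}

// PageCache is a hypothetical LRU cache keyed by page number.
type PageCache struct {
	capacity int
	order    *list.List            // front = most recently used
	items    map[int]*list.Element // page number -> list node
}

func NewPageCache(capacity int) *PageCache {
	return &PageCache{capacity: capacity, order: list.New(), items: make(map[int]*list.Element)}
}

// Get returns the cached page and marks it as most recently used.
func (c *PageCache) Get(pageID int) ([]byte, bool) {
	el, ok := c.items[pageID]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).data, true
}

// Put stores a page and evicts the least recently used one when over capacity.
func (c *PageCache) Put(pageID int, data []byte) {
	if el, ok := c.items[pageID]; ok {
		el.Value.(*entry).data = data
		c.order.MoveToFront(el)
		return
	}
	c.items[pageID] = c.order.PushFront(&entry{pageID, data})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).pageID)
	}
}

func main() {
	cache := NewPageCache(2)
	cache.Put(1, []byte("page 1"))
	cache.Put(2, []byte("page 2"))
	cache.Get(1)                   // page 1 becomes most recently used
	cache.Put(3, []byte("page 3")) // evicts page 2
	_, ok := cache.Get(2)
	fmt.Println(ok) // false
}
```

The map gives constant-time lookups and the list gives constant-time reordering and eviction, which is why the two are paired.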

Part I

Building a database engine is a 2-part book. Part I is 129 pages and contains these chapters:

01

Storage format

From a naive CSV-based approach, through a fixed-size format, to a variable-length TLV-based storage system.

02

TLV implementation

An efficient storage format is the foundation of the entire database engine. In this chapter, we implement reusable TLV encoders and decoders with generics.

03

Columns

Before writing data to tables, the engine needs to handle columns and column options such as `nullable`.

04

Project structure

The project will follow package-oriented design. In this chapter, we establish the main packages.

05

Databases and tables

It's time to create and store databases and tables on the disk. We'll follow Postgres' format. Each database is a folder, each table is a file.

06

Insert

Implementing insert, which needs to encode a hash map into a TLV-encoded record and store it in the table file.

07

Select

Implementing select. At this stage, it's a full table scan, meaning it reads and decodes the entire table file sequentially.

08

Delete

Implementing delete, which works like MySQL's delete: it doesn't actually remove the bytes, it only marks them as "deleted." It's more efficient.
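
The mark-as-deleted idea can be sketched as follows. The record layout here (a 1-byte flag, a 4-byte length, then the payload) and the helper names are assumptions made for this example only. The sequential walk also shows what a full table scan looks like.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	flagLive    byte = 0
	flagDeleted byte = 1
)

// appendRecord appends [flag][4-byte length][payload] to the table bytes.
func appendRecord(table []byte, payload string) []byte {
	header := make([]byte, 5)
	header[0] = flagLive
	binary.LittleEndian.PutUint32(header[1:], uint32(len(payload)))
	return append(append(table, header...), payload...)
}

// walk calls fn with the offset and payload of every record, live or not.
func walk(table []byte, fn func(off int, payload []byte)) {
	for off := 0; off+5 <= len(table); {
		n := int(binary.LittleEndian.Uint32(table[off+1 : off+5]))
		fn(off, table[off+5:off+5+n])
		off += 5 + n
	}
}

// markDeleted flips the flag byte of matching records in place.
func markDeleted(table []byte, target string) {
	walk(table, func(off int, payload []byte) {
		if string(payload) == target {
			table[off] = flagDeleted // one byte changes; nothing is shifted
		}
	})
}

// liveRecords scans the whole table and skips records marked as deleted.
func liveRecords(table []byte) []string {
	var out []string
	walk(table, func(off int, payload []byte) {
		if table[off] == flagLive {
			out = append(out, string(payload))
		}
	})
	return out
}

func main() {
	var table []byte
	for _, name := range []string{"alice", "bob", "carol"} {
		table = appendRecord(table, name)
	}
	size := len(table)
	markDeleted(table, "bob")
	fmt.Println(liveRecords(table), len(table) == size) // [alice carol] true
}
```

The table keeps its size after the delete: removing the bytes would mean rewriting everything after them, while flipping a flag touches a single byte.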

09

Update

Implementing update, which is a combination of delete and insert.

Part II

Building a database engine is a 2-part book. Part II is 135 pages and contains the following chapters. This is where the fun begins.

01

WAL

Adding a Write-Ahead Log, which guarantees a certain degree of fault tolerance. Even if the engine fails while inserting, the data won't be lost.

02

Data pages

Organizing each table into 4KB data pages. This is a crucial step to support B-Tree indexes. On top of that, it's more efficient in lots of situations.

03

B-Tree indexes

In this chapter, we're going to implement a B-Tree-based primary index for tables. It will result in blazing fast lookups.

04

Page-level caching

Adding an LRU-based page-level cache backed by a linked list and a hash map. This will also increase the performance of SELECT queries.

05

Hash-based full-text indexes

Finally, another type of index. It is backed by a hash map instead of a B-Tree. We'll use this to support some basic full-text search functionality.
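
One common way to get basic full-text search out of a hash map is an inverted index: each word maps to the IDs of the records that contain it. The sketch below assumes whitespace tokenization and lowercasing. The `FullTextIndex` type and its methods are hypothetical and only illustrate the general technique.

```go
package main

import (
	"fmt"
	"strings"
)

// FullTextIndex maps a lowercased word to the IDs of records containing it.
type FullTextIndex map[string][]int

// Add indexes every distinct word of text under the given record ID.
func (idx FullTextIndex) Add(id int, text string) {
	seen := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(text)) {
		if !seen[w] {
			seen[w] = true
			idx[w] = append(idx[w], id)
		}
	}
}

// Search returns the IDs of records that contain the word.
func (idx FullTextIndex) Search(word string) []int {
	return idx[strings.ToLower(word)]
}

func main() {
	idx := FullTextIndex{}
	idx.Add(1, "Databases outlive every tech stack")
	idx.Add(2, "Every table is a file")
	fmt.Println(idx.Search("every")) // [1 2]
	fmt.Println(idx.Search("file"))  // [2]
}
```

A lookup is a single map access regardless of table size, but unlike a B-Tree a hash map cannot answer range or prefix queries.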

Appendix

There's a 98-page appendix that teaches you MySQL-related concepts:

01

Database indexing

This massive 60-page chapter teaches everything about database indexing. Theory and practice are both included.

02

MySQL can do more than you think

This chapter includes the most interesting MySQL features that lots of developers don't know about. Things like CTEs, window functions, and partitioning.

03

Query optimization 101

Discover the fastest and easiest tricks and tips you can apply to optimize your queries.

04

Understanding ACID

One of the most important properties of a relational database is ACID. Atomicity, consistency, isolation, durability.


LOOK INSIDE

Download a 40-page sample chapter

Download a sample chapter and take a look at the content

Implemented in Golang

You don't need prior Go experience to understand the content. It's a very simple language with only 25 reserved keywords. We'll use 20 of those. I'll explain every Go-specific thing but there aren't many of them (defer is the most "advanced").


Packages

If you pre-order now, you'll get the content on the 14th of April, 2025 at 09:37 AM CET.

Basic

  • The 129-page part I
  • Money-back guarantee
  • Not included: part II, source code to the DB engine, lifetime updates

Regular price $39

Pre-order for $29

Premium

  • The 129-page part I
  • The 135-page part II
  • Source code to the DB engine
  • Lifetime updates
  • Money-back guarantee

Regular price $129

Pre-order for $99

Plus

  • The 129-page part I
  • Source code to the DB engine
  • Money-back guarantee
  • Not included: part II, lifetime updates

Regular price $79

FAQs

When will the book be published?

On the 15th of April, 2025

When do I receive the book if I pre-order?

On the 14th of April, 2025 09:37AM CET

Do you offer a money-back guarantee?

Yes. No questions asked. Just reach out to me at martin@martinjoo.dev

The same applies to pre-orders. If you pre-ordered but change your mind before the book is published, just send me an e-mail. No hard feelings.

What do we build in the book?

A database storage engine that implements data storage, insert, update, delete, select, a WAL, a B-Tree-based primary index, and a hash-based full-text index.

It is not a DBMS so it does not include a server, a client, or a SQL parser.

It is similar to InnoDB: it exposes functions like Select or Insert, and you can call them from main.go to test things.

For simplicity reasons, the engine does not support transactions or concurrency.

Every feature we build in the book is based on production databases, but each one is heavily simplified. Otherwise, the book would be 1,000+ pages long.

Do I need prior knowledge about database internals?

No. Everything is explained from scratch. While reading the book you'll gain important knowledge about real-world database engines. Especially about indexing.

What programming language do you use?

The engine is written in Go.

Do I need Go experience to understand the content?

No.

First of all, Golang is a very simple language. It contains only 25 reserved keywords, of which we use 20. In comparison, PHP comes with 73, while JavaScript has 56 of them.

Secondly, I'll explain every Go-specific thing in the book. But there aren't that many.

So, the only language you need to speak is English.

Also, if you'd like to learn Go, this book is a great starting point, in my opinion.

Exactly what do I get?

Basic package: book part I in PDF and HTML format.

Plus package: book part I in PDF and HTML format and the source code to the database engine.

Premium package: book part I & part II in PDF and HTML format. The source code to the database engine. This package also includes future updates.

About the author

I'm Martin Joo, a software engineer since 2012 and obsessed with databases. Especially with boring ones such as MySQL.

You can find me on Twitter and Substack. You can also check out my blog.

Martin Joo