The Message Queue I Built in 45 Minutes (And Why You Should Too)

There’s a specific kind of panic I’ve seen in every backend team at least once. Someone says “the queue is backed up,” and the room goes quiet. Not because the problem is hard — but because nobody in the room can explain what’s actually happening inside the queue. They know how to restart it. They don’t know how it works.

And that gap — between operating a message queue and understanding one — is exactly what costs you at 2 AM when messages start disappearing and Grafana just shows a red line going up.


The black box problem

Here’s the pattern. A team adopts Kafka. They follow a tutorial. They configure group.id and auto.offset.reset and max.poll.records. It works. Nobody asks why.

Then reality shows up.

Queue latency spikes. Someone suggests adding partitions — but nobody on the team has ever implemented message ordering, so they don’t realize partitioning will break it. Memory usage climbs. Someone blames the JVM — but the real issue is unbounded in-flight messages that were never acknowledged. A consumer crashes mid-processing. Messages vanish. Because nobody understood the difference between at-most-once and at-least-once delivery at the implementation level.

These aren’t hypothetical. I’ve seen all three.

The root cause is always the same: engineers using complex systems they’ve never looked inside. They can’t reason about the data structures. They can’t tell you what state the queue is in. They’ve never written queue code, so they’re afraid to touch queue behavior.

What if you built a message broker from scratch? Not a toy. Not a half-finished tutorial that falls apart under concurrency. A real broker with a binary wire protocol, bounded queues, ACK/NACK semantics, and disk persistence. In 45 minutes.

Then you’d be the person in the room who can actually answer the question.

What DMS looks like from the outside

DMS is a message broker in C. Producers send messages to named queues. Consumers pull from those queues. That’s it.

But the implementation is where every interesting thing happens.

The producer doesn’t wait for the consumer. Doesn’t care if the consumer exists. It pushes a message onto the queue and returns. The consumer can connect now, tomorrow, next week. The queue sits in between — a buffer between two systems that don’t need to know about each other. Once you see the queue_enqueue call that copies data into a ring buffer and signals a condition variable, you understand decoupling at a level no diagram can give you.

Messages stay until a consumer explicitly acknowledges them. Not retained by time window like Kafka. Not auto-deleted on read. The consumer gets the message, processes it, sends an ACK — only then is the slot freed. A NACK sends the message back for redelivery. No silent data loss. No ambiguity.

The protocol is binary, not REST. A 16-byte header followed by the payload:

 Wire format (16-byte header)
┌──────────┬──────────┬──────┬─────────┬────────┬────────┐
│  Magic   │ Payload  │ Type │ Version │ Status │ Msg ID │
│ 4 bytes  │ 4 bytes  │ 1 B  │  1 B    │ 2 B    │ 4 B    │
│ DEADBEEF │  length  │      │         │        │        │
└──────────┴──────────┴──────┴─────────┴────────┴────────┘

The magic number 0xDEADBEEF catches framing errors immediately. The length prefix eliminates parsing ambiguity. You can read the entire protocol in one sitting.
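
For orientation, here is roughly how that header looks as a C struct. The field names and exact integer types are assumptions read off the diagram, not the literal definitions in protocol.h, and multi-byte fields are converted to network byte order (htonl/htons) before they hit the wire.

// Sketch of the 16-byte header as a packed struct (names assumed).
#include <stdint.h>

#define DMS_MAGIC 0xDEADBEEFu

typedef struct __attribute__((packed)) {
    uint32_t magic;           // always 0xDEADBEEF, catches framing errors
    uint32_t payload_length;  // length of the payload that follows
    uint8_t  type;            // message type
    uint8_t  version;         // protocol version
    uint16_t status;          // status code
    uint32_t msg_id;          // message identifier
} dms_header_t;               // 4 + 4 + 1 + 1 + 2 + 4 = 16 bytes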

Thread safety isn’t bolted on. Each queue has its own mutex and two condition variables — one for not-empty, one for not-full. Producers and consumers block on the right condition and wake each other through signals. Lock ordering is enforced: registry lock first, then queue lock, never reversed.
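
All of that state fits in one small struct per queue. A sketch reconstructed from the snippets in this post, with assumed types, an assumed size limit, and a couple of assumed names; the real definition lives in queue.h:

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MESSAGE_SIZE 4096            // assumed limit; DMS defines its own

typedef struct {
    uint32_t delivery_id;                // assigned when the message is enqueued
    size_t   data_length;
    char     data[MAX_MESSAGE_SIZE];     // payload copied in, no per-message malloc
} queue_entry_t;

typedef struct {
    queue_entry_t  *entries;             // fixed-size ring buffer storage
    size_t          capacity;
    size_t          head, tail, count;   // ring indices plus occupancy
    pthread_mutex_t lock;                // one mutex per queue
    pthread_cond_t  not_empty;           // consumers sleep here
    pthread_cond_t  not_full;            // producers can wait here on a full queue
    atomic_uint     next_delivery_id;    // monotonically increasing delivery IDs
} queue_t;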

What happens when you build one

Reading about queues gives you maybe 10% understanding. Building one gives you the rest.

When you implement queue_enqueue, you learn how a ring buffer manages head and tail pointers. When you add pthread_mutex_lock around the critical section, you feel why concurrent access without synchronization corrupts shared state. When you write pthread_cond_signal — the call that wakes a sleeping consumer — you understand the mechanism behind every blocking queue in every language you’ll ever use.

// The entire enqueue critical path. ~20 lines of real logic.
pthread_mutex_lock(&queue->lock);

if (queue->count >= queue->capacity) {
    pthread_mutex_unlock(&queue->lock);
    return ERR_QUEUE_FULL;     // backpressure, not a crash
}

queue_entry_t *entry = &queue->entries[queue->tail];
entry->delivery_id = atomic_fetch_add(&queue->next_delivery_id, 1);
entry->data_length = data_length;
memcpy(entry->data, data, data_length);

queue->tail = (queue->tail + 1) % queue->capacity;
queue->count++;

pthread_cond_signal(&queue->not_empty);   // wake one sleeping consumer
pthread_mutex_unlock(&queue->lock);
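
The consumer side is the mirror image. Here is a minimal sketch of the blocking dequeue under the same assumed names; the real queue_dequeue in queue.h also has to keep the entry around until the consumer ACKs it, which this sketch skips:

// Sleep until a producer signals not_empty, then read from head.
int queue_dequeue(queue_t *queue, char *out, size_t *out_length) {
    pthread_mutex_lock(&queue->lock);

    while (queue->count == 0) {
        // Releases the lock while sleeping, re-acquires it on wake-up.
        pthread_cond_wait(&queue->not_empty, &queue->lock);
    }

    queue_entry_t *entry = &queue->entries[queue->head];
    memcpy(out, entry->data, entry->data_length);
    *out_length = entry->data_length;

    queue->head = (queue->head + 1) % queue->capacity;
    queue->count--;

    pthread_cond_signal(&queue->not_full);    // a blocked producer can retry
    pthread_mutex_unlock(&queue->lock);
    return 0;
}
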

Next time a dashboard shows your queue at 90% capacity, you won’t just see a number. You’ll see the ring buffer filling up, the tail approaching the head, the count nearing capacity. You’ll know exactly what that means.

This isn’t toy code, either. DMS compiles with -Wall -Wextra -Werror and zero warnings. The queue handles every allocation failure — if calloc returns NULL, it cleans up the mutex, the condition variables, and propagates the error. The wire protocol handles partial TCP reads with a recv loop that retries on EINTR:

// TCP does NOT guarantee a single recv() gets all bytes.
while (remaining > 0) {
    ssize_t received = read(fd, ptr, remaining);
    if (received < 0) {
        if (errno == EINTR) continue;  // interrupted by signal, retry
        return -1;
    }
    if (received == 0) return -1;      // connection closed
    ptr += received;
    remaining -= (size_t)received;
}
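
The allocation-failure path mentioned above follows the same discipline: build up state step by step, and on failure unwind exactly what was already initialized before returning the error. A sketch under the same assumed names as the struct above, with the init-return checks elided:

queue_t *queue_create(size_t capacity) {
    queue_t *queue = calloc(1, sizeof(*queue));
    if (!queue) return NULL;

    pthread_mutex_init(&queue->lock, NULL);
    pthread_cond_init(&queue->not_empty, NULL);
    pthread_cond_init(&queue->not_full, NULL);

    queue->capacity = capacity;
    queue->entries  = calloc(capacity, sizeof(queue_entry_t));
    if (!queue->entries) {
        // calloc failed: destroy the condition variables and the mutex,
        // free the queue, and propagate the error to the caller.
        pthread_cond_destroy(&queue->not_full);
        pthread_cond_destroy(&queue->not_empty);
        pthread_mutex_destroy(&queue->lock);
        free(queue);
        return NULL;
    }
    return queue;
}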

Most engineers would spend two or three weeks building this from zero. With DMS, you start from working code and spend your time understanding it instead of debugging it.

The patterns you’ll see everywhere

After building DMS, the producer/consumer pattern stops being abstract. You see it in everything.

 DMS                          Kafka
┌───────────────────┐        ┌───────────────────────┐
│ Ring buffer       │  ───►  │ Partition log         │
│ head/tail/count   │        │ base offset / LEO     │
├───────────────────┤        ├───────────────────────┤
│ queue_enqueue()   │  ───►  │ Producer append       │
│ queue_dequeue()   │        │ Consumer fetch        │
├───────────────────┤        ├───────────────────────┤
│ registry_lock     │  ───►  │ ZooKeeper / KRaft     │
│ (queue discovery) │        │ (topic coordination)  │
├───────────────────┤        ├───────────────────────┤
│ ERR_QUEUE_FULL    │  ───►  │ max.block.ms          │
│ (backpressure)    │        │ (buffer memory limit) │
└───────────────────┘        └───────────────────────┘

Kafka distributes across machines. DMS runs on one. Kafka persists to an append-only log. DMS keeps messages in memory, with optional persistence to .dmsq files. Kafka has hundreds of knobs. DMS has six command-line flags.

The insight isn’t that DMS is better. It’s that DMS makes Kafka legible.

The partition log is a queue with a different storage engine. The consumer group protocol is coordination over who reads from which queue. The replication protocol is copying queue state across machines.

When you read Kafka source after building DMS, you recognize everything. You see a more sophisticated version of something you already understand. That recognition — seeing the simple structure inside the complex system — is what separates engineers who understand distributed systems from engineers who just deploy them.

Inside the broker

The architecture is deliberately simple. That’s the point.

                    ┌─────────────────┐
  Client A ──TCP──► │                 │
  Client B ──TCP──► │  Server (main)  │──► spawns one thread per client
  Client C ──TCP──► │                 │
                    └────────┬────────┘

                    ┌────────▼────────┐
                    │ Queue Registry  │  (array of queue_t*, protected
                    │ registry_lock   │   by a global mutex)
                    └────────┬────────┘

              ┌──────────────┼──────────────┐
              ▼              ▼              ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │ "orders" │  │ "events" │  │ "logs"   │
        │ mutex    │  │ mutex    │  │ mutex    │
        │ ring buf │  │ ring buf │  │ ring buf │
        └──────────┘  └──────────┘  └──────────┘

Each queue is a ring buffer — a fixed-size array with head and tail indices. Enqueueing copies the message into the slot at tail and advances the pointer. Dequeueing reads from head. Both O(1). The ring wraps with modular arithmetic. No linked list, no heap allocation per message.

Thread safety is where most queue implementations break. DMS gives each queue its own mutex. Only one thread modifies the queue at a time. The condition variables handle coordination — a consumer on an empty queue sleeps until a producer signals it. A producer hitting a full queue gets ERR_QUEUE_FULL back immediately. No deadlocks. No spinning.

The decoupling is the real insight. Producer sends a message, gets confirmation, disconnects. Consumer connects later, requests a message, blocks until one arrives. They never coordinate directly. They never need to be online at the same time. The queue is the mediator.

This is why message queues transformed distributed systems — and once you see it working in code you read line by line, the abstraction clicks permanently.

The file structure

DMS/
├── protocol.h    383 lines  — Wire format, message types, I/O helpers
├── queue.h       483 lines  — Ring buffer, mutex, condition variables
├── server.c      942 lines  — Broker: registry, client handling, persistence
├── client.c                 — CLI: produce, consume, stress test
└── Makefile       50 lines  — Two targets, zero dependencies

No build system generator. No vendored libraries. No abstraction layers. The entire broker fits in your head.

Getting started

git clone https://github.com/devwail/DMS.git
cd DMS
make

GCC, C11, pthreads. No external dependencies. Two binaries.

./server -p 9999 -l 0 &
./client create -q test -p 9999
./client produce -q test -m "hello world" -p 9999
./client consume -q test -p 9999

You just created a named queue, pushed a message, and consumed it with acknowledgment. The broker handled framing, deserialization, synchronization, and delivery confirmation.

Now read the code. Start with queue_enqueue in queue.h — 20 lines of real logic. Then the recv/send helpers in protocol.h — see how they handle partial TCP reads and network byte order. Then server.c — one thread per client, dispatching by message type. The whole system, readable in one sitting.

Forty-five minutes to build. Forty-five minutes to read. Ninety minutes total.


Systems knowledge compounds. Every queue you understand makes the next distributed system easier. Every mutex you’ve written makes the next concurrency bug faster to find. You get that from reading code and building systems — not from documentation. DMS is on GitHub.
