Engineering Blog

Written by the people who built Opteryx.

Engine internals, query planning, storage, and the occasional field note. No fluff.

Latest post

A JSONL Reader That Only Reads What You Ask For

How Opteryx's new JSONL reader pushes projections and filters into the document scan itself, and why that changes the cost profile for log queries.

Justin Joyce · 2026-06-05

More posts

STRINGS REDESIGNED
Engineering

Redesigning the String Header — Prefix or Hash?

Why we deviated from the textbook long-string layout and moved to a hybrid 16-byte header tuned for equality-heavy SQL workloads.

2026-05-22
PERFECT HASH
Engineering

A Toolbox of Hash Structures — Why One Hash Table Isn't Enough

How replacing a single general-purpose hash table with a family of specialised structures — including a direct-addressed bit-array — drove meaningful performance gains across joins, filters, and deduplication in Opteryx.

2026-05-15
BENCHMARKS
Engineering

Five Benchmarks, One Engine

How we run five industry benchmarks to close blind spots exposed by our execution-layer rewrite and measure Opteryx against other engines.

2026-05-08
100X AGGREGATES
Engineering

100x Aggregates — making aggregation faster

How we replaced Python-materialised aggregates with native methods and saw 10–100x speedups.

2026-05-02
REGEX REPLACE
Engineering

When We Stopped Using Regex for REGEXP_REPLACE

REGEXP_REPLACE dominated query time. Swapping regex engines didn't help. We built a specialised DFA instead.

2026-04-24
MEMORY MODEL
Engineering

Rewriting the Memory Model: Moving Beyond Arrow

Why we replaced Arrow in Opteryx to break through a fundamental performance barrier.

2026-04-16
LIKE PERFORMANCE
Engineering

Making LIKE Faster: From 93 Seconds to Single Digits

How optimising the LIKE operator turned a 93-second query into sub-10-second execution through algorithmic improvements and GIL-aware design.

2026-04-10
MEMORY POOL
Engineering

10x Faster Memory Management: Optimising Opteryx's Core Memory Pool

How a targeted change to the MemoryPool implementation produced a 10x improvement by moving metadata into C++ while keeping the Python API unchanged.

2026-04-03
CARCHAR
Engineering

Building a Specialised Hash Table to Beat Abseil

Our custom hash table achieves faster build times than Abseil; here's what the actual benchmarks show on the probe path.

2026-03-26
AUTO DOCS
Engineering

What If the Docs Wrote Themselves?

Why Opteryx is shifting documentation generation closer to the code.

2026-03-19
REWRITING IO STACK
Engineering

Rewriting the I/O Stack

The new I/O layer and why it matters.

2026-03-12