Written by the people who built Opteryx.
Engine internals, query planning, storage, and the occasional field note. No fluff.
A JSONL Reader That Only Reads What You Ask For
How Opteryx's new JSONL reader pushes projections and filters into the document scan itself, and why that changes the cost profile for log queries.
More posts
Redesigning the String Header — Prefix or Hash?
Why we deviated from the textbook long-string layout and moved to a hybrid 16-byte header tuned for equality-heavy SQL workloads.
A Toolbox of Hash Structures — Why One Hash Table Isn't Enough
How replacing a single general-purpose hash table with a family of specialised structures — including a direct-addressed bit-array — drove meaningful performance gains across joins, filters, and deduplication in Opteryx.
Five Benchmarks, One Engine
How we run five industry benchmarks to close blind spots exposed by our execution-layer rewrite and measure Opteryx against other engines.
100x Aggregates — making aggregation faster
How we replaced Python-materialised aggregates with native methods and saw 10–100x speedups.
When We Stopped Using Regex for REGEXP_REPLACE
REGEXP_REPLACE dominated query time. Swapping regex engines didn't help. We built a specialised DFA instead.
Rewriting the Memory Model: Moving Beyond Arrow
Why we replaced Arrow in Opteryx to break through a fundamental performance barrier.
Making LIKE Faster: From 93 Seconds to Single Digits
How optimising the LIKE operator turned a 93-second query into sub-10-second execution through algorithmic improvements and GIL-aware design.
10x Faster Memory Management: Optimising Opteryx's Core Memory Pool
How a targeted change to the MemoryPool implementation produced a 10x improvement by moving metadata into C++ while keeping the Python API unchanged.
Building a specialized hash table to beat Abseil
Our custom hash table achieves faster build times than Abseil; here's what the actual benchmarks show on the probe path.
What If the Docs Wrote Themselves?
Why Opteryx is shifting documentation generation closer to the code.
Rewriting the I/O Stack
The new I/O layer and why it matters.