Went looking for free analytics for a side project, ended up shipping code to 200k+ customers.
I discovered PostHog while hunting for a cheap way to add analytics to a side project. I was so impressed by what they offer for free that I started digging into their codebase to see if it was real. That led me to the batch exports team’s sprint board, and I figured: why not just pick something up and build it?
3 weeks later: 3,200+ lines of code across 3 PRs, touching Temporal workflows, Django APIs, the React frontend, and CI. The Azure Blob Storage batch export feature is now live.
Understanding the Problem First
Before writing code, I studied how the existing export destinations were built. PostHog already supported exporting to S3, BigQuery, Snowflake, and Postgres. Just plugging in an SDK wasn’t going to cut it: at PostHog’s data volume, I needed to understand how Azure actually handles uploads, including chunking, parallel connections, retry behavior, and performance tuning options.
The batch export system uses Temporal for workflow orchestration. Each export destination is a Temporal workflow that queries ClickHouse for events within a time interval, transforms them into the target format, and uploads to the destination. The architecture handles retries, backpressure, and failure recovery automatically.
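To make that orchestration concrete, here is a minimal sketch of the shape of such a workflow using the temporalio Python SDK. The class, activity, and input names are illustrative, not PostHog’s actual identifiers, and the activity body is stubbed out.

```python
# Minimal sketch of a Temporal batch export workflow (temporalio Python SDK).
# Names and structure are illustrative, not PostHog's actual code.
from dataclasses import dataclass
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy


@dataclass
class AzureBlobBatchExportInputs:
    team_id: int
    data_interval_start: str  # ISO timestamp: start of the interval to export
    data_interval_end: str
    container: str
    prefix: str


@activity.defn
async def insert_into_azure_blob_activity(inputs: AzureBlobBatchExportInputs) -> int:
    # In a real implementation: query ClickHouse for events in the interval,
    # transform them into the requested format, and upload to Azure Blob Storage.
    raise NotImplementedError("sketch only")


@workflow.defn(name="azure-blob-export")
class AzureBlobBatchExportWorkflow:
    @workflow.run
    async def run(self, inputs: AzureBlobBatchExportInputs) -> None:
        # Temporal handles retries and failure recovery declaratively: the
        # activity is retried with backoff, and the workflow resumes from
        # durable history if a worker crashes mid-export.
        await workflow.execute_activity(
            insert_into_azure_blob_activity,
            inputs,
            start_to_close_timeout=timedelta(hours=1),
            retry_policy=RetryPolicy(maximum_attempts=5),
        )
```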
The Implementation
The feature touches several layers of the stack:
Backend (Python/Django):
The core workflow lives in azure_blob_batch_export.py. It follows the same pattern as other destinations: a workflow class that orchestrates activities, and activities that do the actual work (querying ClickHouse, uploading blobs). Credentials are stored encrypted using PostHog’s Integration model, decrypted only at activity runtime.
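The key property of that credential handling is the boundary: the workflow only carries a reference, and the secret is resolved inside the running activity. The sketch below shows that shape; the decrypt_integration_config helper and the config field names are hypothetical, not PostHog’s actual Integration API.

```python
# Sketch: resolve credentials at activity runtime rather than passing them
# through workflow inputs. Helper and field names are hypothetical.
from dataclasses import dataclass

from azure.storage.blob import BlobServiceClient


@dataclass
class AzureBlobInsertInputs:
    # The workflow passes an opaque integration id; no secrets travel through
    # Temporal's workflow history.
    integration_id: int
    container: str


def decrypt_integration_config(integration_id: int) -> dict:
    """Hypothetical stand-in for loading the Integration row and decrypting
    its sensitive config. Stubbed here; shown only to mark where it happens."""
    raise NotImplementedError


def get_blob_service_client(inputs: AzureBlobInsertInputs) -> BlobServiceClient:
    # Decrypt only now, inside the activity.
    config = decrypt_integration_config(inputs.integration_id)

    if "connection_string" in config:
        return BlobServiceClient.from_connection_string(config["connection_string"])

    # Account name + key is the alternative auth path exposed in the form.
    return BlobServiceClient(
        account_url=f"https://{config['account_name']}.blob.core.windows.net",
        credential=config["account_key"],
    )
```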
Handling Large Exports:
Azure’s SDK supports chunking for large uploads: when the payload exceeds a configurable threshold, it switches to staged block uploads (Azure’s equivalent of S3 multipart). I configured max_single_put_size and max_block_size to let the SDK handle this transparently rather than managing blocks manually.
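As a rough illustration of that tuning with azure-storage-blob (the byte thresholds below are placeholders, not the values shipped in the PR):

```python
# Sketch: letting the Azure SDK stage blocks automatically for large uploads.
# The thresholds are illustrative placeholders, not PostHog's settings.
from azure.storage.blob import BlobServiceClient

ONE_MIB = 1024 * 1024


def make_blob_service_client(connection_string: str) -> BlobServiceClient:
    return BlobServiceClient.from_connection_string(
        connection_string,
        # Payloads larger than this go through staged block uploads + commit
        # (Azure's equivalent of S3 multipart) instead of a single PUT.
        max_single_put_size=64 * ONE_MIB,
        # Size of each staged block when the SDK splits the payload.
        max_block_size=32 * ONE_MIB,
    )


def upload_export(client: BlobServiceClient, container: str, name: str, path: str) -> None:
    blob = client.get_blob_client(container=container, blob=name)
    with open(path, "rb") as stream:
        # max_concurrency lets the SDK push several staged blocks in parallel.
        blob.upload_blob(stream, overwrite=True, max_concurrency=4)
```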
Frontend (React/TypeScript): Added the destination to the batch export form with fields for authentication (connection string or account name + key), container configuration, file format selection (Parquet/JSONLines), and compression options.
Testing at PostHog’s Bar
The team holds a high testing bar, and I wanted to match it. The implementation includes comprehensive test coverage, parametrized across the dimensions below (sketched after the list):
- Data models (events, persons, sessions)
- File formats (Parquet, JSONLines)
- Compression types (gzip, brotli, zstd)
- Error handling scenarios
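Here is a sketch of what that matrix can look like with pytest; the test name, parameter values, and body are placeholders rather than the actual suite, which covers more combinations and error cases.

```python
# Sketch of the parametrized test matrix with pytest. The test body and
# parameter values are placeholders, not PostHog's actual test suite.
import pytest


@pytest.mark.parametrize("model", ["events", "persons", "sessions"])
@pytest.mark.parametrize("file_format", ["Parquet", "JSONLines"])
@pytest.mark.parametrize("compression", [None, "gzip", "brotli", "zstd"])
def test_azure_blob_export(model, file_format, compression):
    # Each combination exercises the full export path: generate fixture data,
    # run the export, then download the blob and assert the contents round-trip.
    ...
```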
Tests run against Azurite, Microsoft’s Azure Storage emulator, so CI doesn’t need real Azure credentials. Each test gets its own container to avoid conflicts.
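For illustration, a fixture along these lines gives each test its own container against a locally running Azurite instance; the connection string is Azurite’s published development default, not a real secret.

```python
# Sketch of a per-test container fixture backed by Azurite. Assumes Azurite is
# listening on its default port; the account key is Azurite's well-known
# development default.
import uuid

import pytest
from azure.storage.blob import BlobServiceClient

AZURITE_CONNECTION_STRING = (
    "DefaultEndpointsProtocol=http;"
    "AccountName=devstoreaccount1;"
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
    "BlobEndpoint=http://127.0.0.1:10000/devstoreaccount1;"
)


@pytest.fixture
def blob_container():
    """Create a uniquely named container for one test, then tear it down."""
    client = BlobServiceClient.from_connection_string(AZURITE_CONNECTION_STRING)
    container_name = f"test-{uuid.uuid4().hex}"
    container = client.create_container(container_name)
    try:
        yield container
    finally:
        client.delete_container(container_name)
```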
What Users Get
The feature lets users export PostHog data directly to Azure Blob Storage for their data lakes and downstream analytics. Previously this wasn’t an option. If your company uses Azure as its cloud provider, you can now keep your analytics data in your own infrastructure without routing through S3.
Links
Thanks to Tomás and Ross Gray for the thorough reviews and for making this a genuinely enjoyable collaboration. The founders, Tim Glaser and James Hawkins, have built something unusual at PostHog: a billion-dollar company with every line of code open sourced. The combination of radical transparency, high ownership, and lightning-fast shipping at that scale is genuinely rare.