PostHog Open Source: What You Get, What You Give Up, and Whether It Matters


Anya is the CTO of a fintech startup with 40 employees and about 15,000 monthly active users. Her company processes financial data, which means every vendor conversation includes the same question: "Where does our data go?" When she first looked at PostHog, the pitch was appealing -- open source, self-host it, own your data. But after digging in, she found the answer to "what's actually open source?" was more nuanced than the marketing suggested.

I've had this conversation with enough CTOs and engineering leads to know the confusion is widespread. So let me lay it out clearly.

What's Actually Open Source

PostHog's core product analytics engine is open source under an MIT license. That includes event capture, basic analytics (trends, funnels, retention), the API, and the ingestion pipeline. The code is on GitHub. You can read it, fork it, run it, modify it. That part is real.

Here's where it gets complicated. PostHog is not one product anymore. It's a platform. And the platform has grown to include session replay, feature flags, A/B testing (experimentation), surveys, a data warehouse connector, and a CDP (customer data platform). Not all of these are open source, and the lines have shifted over time.

Session replay is available on the free tier of PostHog Cloud, but the self-hosted open-source deployment has limitations on recording volume and retention. Feature flags are available in the open-source version with basic functionality, but advanced features like multivariate flags and payloads require the paid plan. A/B testing and experimentation require the paid plan. Surveys require the paid plan.

The pattern is common in open-core companies: the foundational product is open source, and the features that drive revenue are proprietary or gated. PostHog is more transparent about this than most -- they publish their pricing publicly and the free tier is genuinely generous for small teams. But if Anya read "open source" and assumed she'd get the entire platform for free by self-hosting, she'd be wrong.

The Self-Hosting Reality

Self-hosting PostHog was once the default recommendation for teams that wanted data sovereignty. PostHog provided Helm charts for Kubernetes deployments, and plenty of companies ran their own instances on AWS, GCP, or bare metal.

That story has changed. PostHog has been steering teams toward their managed cloud offering for a while now. The self-hosted deployment still works, but it requires a Kubernetes cluster, ongoing maintenance, upgrades, and someone on your team who's comfortable operating ClickHouse at scale (PostHog uses ClickHouse as its analytics database).

For Anya's 40-person fintech, self-hosting means dedicating engineering time to operating analytics infrastructure instead of building the product. She estimated it would take about 10-15 hours per month of ops work to keep a self-hosted PostHog instance healthy -- patching, monitoring, scaling ClickHouse, debugging ingestion hiccups.

That's not a dealbreaker for every team. Some companies, especially in regulated industries like fintech, healthcare, and defense, need data to stay in their infrastructure for compliance reasons. For those teams, the ops overhead is a cost of doing business. But for a 40-person startup competing on product speed, 10-15 hours a month of analytics infrastructure maintenance is a real tradeoff.

PostHog Cloud, by contrast, runs in the US and EU, offers SOC 2 compliance, and handles all the infrastructure. For most teams, this is the right call. You get the same product, PostHog handles the ops, and you focus on building.

Data Ownership: What It Actually Means

"Own your data" is a phrase that gets thrown around a lot. Let me be specific about what it means in practice with PostHog.

If you self-host, you own your data in the most literal sense. Events, user properties, session recordings -- all of it lives in your infrastructure. You control access, retention, encryption, and deletion. You can query the underlying ClickHouse database directly. Nobody else can see it. If PostHog the company disappeared tomorrow, your instance would keep running (though you'd lose access to future updates and support).

If you use PostHog Cloud, you still have strong data control, but it's contractual rather than physical. PostHog processes your data on their infrastructure. You can export it via the API or data pipelines. They provide data deletion tools for GDPR compliance. But at the end of the day, your event data lives on PostHog's servers.

For Anya, this distinction mattered. Her compliance team wanted to know exactly where event data was stored and who had access. PostHog Cloud's EU hosting option and SOC 2 certification satisfied most of their requirements. But a few edge cases around financial transaction data required additional review.

The pragmatic answer: PostHog Cloud's data controls are sufficient for most companies, even in regulated industries. Self-hosting gives you absolute control but at a real operational cost. The decision usually comes down to whether your compliance requirements specifically mandate that analytics data never leave your infrastructure.

Community vs. Enterprise: What's Behind the Paywall

PostHog's free tier (they call it the "free allocation" on their Cloud product) gives you quite a lot:

1 million events per month
5,000 session recordings per month
Unlimited team members
All core analytics features (trends, funnels, retention, paths)
Basic feature flags
API access

For a startup with under 10,000 MAUs, this is genuinely enough to run a real analytics practice. I've seen teams operate on the free tier for 6-12 months before needing to pay anything.
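If you want to sanity-check that claim for your own product, a rough estimate is enough. The sketch below uses the free-allocation limits listed above; the per-user event and recording averages are assumptions I made up for illustration, not PostHog figures, so swap in your own.

```python
# Free-allocation limits as quoted in this article; confirm current values on PostHog's pricing page.
FREE_EVENTS_PER_MONTH = 1_000_000
FREE_RECORDINGS_PER_MONTH = 5_000


def fits_free_tier(mau, events_per_user, recordings_per_user):
    """Return (fits, estimated_events, estimated_recordings) for one month.

    events_per_user and recordings_per_user are hypothetical monthly averages
    that you would measure or guess for your own product.
    """
    events = mau * events_per_user
    recordings = mau * recordings_per_user
    fits = events <= FREE_EVENTS_PER_MONTH and recordings <= FREE_RECORDINGS_PER_MONTH
    return fits, events, recordings


if __name__ == "__main__":
    # Assumed averages: 80 events and 0.4 recordings per active user per month.
    for mau in (5_000, 10_000, 15_000):
        fits, events, recordings = fits_free_tier(mau, 80, 0.4)
        print(f"{mau:>6} MAU: {events:>9,} events, {recordings:>7,.0f} recordings, fits={fits}")
```

Under those assumed averages, 5,000 and 10,000 MAUs stay inside the free allocation and 15,000 MAUs does not. A chattier product (more events per user) crosses the line much earlier, which is why the per-user numbers matter more than the MAU count.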

When you hit the paid tiers, here's what you're paying for: higher volume limits, advanced experimentation (A/B testing with statistical significance calculations), group analytics (tracking company-level behavior, not just individual users), advanced feature flag targeting, priority support, and SSO/SAML authentication.

The pricing is usage-based and published on their website. Events are priced on a sliding scale -- the more you send, the cheaper per-event it gets. Session replay recordings are priced separately. This transparency is one of PostHog's genuine strengths. You can calculate your bill before you sign up. Try doing that with Amplitude's enterprise tier.
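To make "calculate your bill before you sign up" concrete, here is a minimal sketch of a graduated price calculation. The tier boundaries and per-event rates are hypothetical placeholders, not PostHog's real prices, and I'm assuming the scale is marginal (each slice of volume is billed at its own tier's rate). Copy the real numbers from the published pricing page before trusting the output.

```python
# HYPOTHETICAL tiers for illustration only; these are NOT PostHog's real rates.
# Each tuple is (upper bound of the tier in events, price per event in USD).
TIERS = [
    (1_000_000, 0.0),          # free allocation
    (2_000_000, 0.00005),
    (15_000_000, 0.00003),
    (float("inf"), 0.00001),
]


def monthly_event_cost(events, tiers=TIERS):
    """Graduated pricing: each slice of volume is billed at its own tier's rate."""
    cost = 0.0
    lower = 0
    for upper, rate in tiers:
        if events <= lower:
            break  # no volume reaches this tier
        billable = min(events, upper) - lower
        cost += billable * rate
        lower = upper
    return cost


if __name__ == "__main__":
    for volume in (800_000, 1_500_000, 3_000_000):
        print(f"{volume:>10,} events -> ${monthly_event_cost(volume):.2f}")
```

Session replay would need its own tier table, since recordings are priced separately from events.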

The gotcha that catches some teams: group analytics is a paid feature. For B2B SaaS companies, where account-level behavior matters more than individual user behavior, this is table stakes. Anya's team hit this wall when they wanted to answer "which companies are most engaged?" instead of just "which users are most engaged?" That question requires group analytics, which requires a paid plan.
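The difference between the two questions is easy to see on toy data. This is only an illustration of user-level versus company-level aggregation, not how PostHog's group analytics works internally; the event records and field names are invented.

```python
from collections import Counter

# Hypothetical events: each one records the user and the company that user belongs to.
SAMPLE_EVENTS = [
    {"user": "u1", "company": "acme", "event": "report_exported"},
    {"user": "u2", "company": "acme", "event": "report_exported"},
    {"user": "u3", "company": "acme", "event": "transfer_submitted"},
    {"user": "u4", "company": "globex", "event": "report_exported"},
    {"user": "u4", "company": "globex", "event": "transfer_submitted"},
]


def engagement_by(events, key):
    """Count events per value of `key` (for example per user or per company)."""
    return Counter(e[key] for e in events)


if __name__ == "__main__":
    print("Top user:   ", engagement_by(SAMPLE_EVENTS, "user").most_common(1))
    print("Top company:", engagement_by(SAMPLE_EVENTS, "company").most_common(1))
```

In this sample the most active user works at globex, but the most active company is acme, because three different people there each did something. User-level reports alone would point you at the wrong account.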

The Open-Source Ecosystem Around PostHog

One underappreciated benefit of PostHog being open source is the ecosystem of integrations and extensions that exist around it. Because the ingestion API is well-documented and the event schema is standard, you can build on top of PostHog in ways that proprietary analytics platforms don't easily support.

A few examples: custom data pipelines that transform events before they're stored; reverse ETL workflows that sync PostHog data back to your CRM or marketing tools; and AI agents that programmatically capture events, identify users, and create aliases through the API.

That last point is where Anya's team found the most value. They set up an event tracking agent that automated their event capture workflow -- ensuring consistent naming conventions, attaching the right properties to every event, and flagging when new user-facing features shipped without corresponding analytics instrumentation. The agent wrote events into PostHog through the same API that any integration would use. The fact that PostHog's API is open and well-documented made this straightforward.
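Here is a minimal sketch of the kind of check such an agent might run before sending anything. The naming rule (snake_case, at least two words) and the required properties are hypothetical conventions, not Anya's actual ones. The payload fields are modeled on the general shape of PostHog's capture endpoint as I understand it; treat that as an assumption and verify it against the current API docs. Nothing is sent over the network here.

```python
import re
from datetime import datetime, timezone

# Hypothetical team convention: snake_case with at least two words, e.g. "transfer_submitted".
EVENT_NAME_PATTERN = re.compile(r"[a-z]+(_[a-z]+)+")
# Hypothetical properties that every event must carry.
REQUIRED_PROPERTIES = {"plan", "platform"}


def validate_event(name, properties):
    """Return a list of convention violations (empty if the event is clean)."""
    problems = []
    if not EVENT_NAME_PATTERN.fullmatch(name):
        problems.append(f"event name {name!r} is not snake_case object_action")
    missing = REQUIRED_PROPERTIES - set(properties)
    if missing:
        problems.append(f"missing properties: {sorted(missing)}")
    return problems


def build_capture_payload(api_key, distinct_id, name, properties):
    """Build a capture-style payload, refusing events that break the conventions.

    The field names are an assumption about the capture API's shape.
    """
    problems = validate_event(name, properties)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "api_key": api_key,
        "event": name,
        "distinct_id": distinct_id,
        "properties": dict(properties),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    print(validate_event("Transfer Submitted", {"plan": "pro"}))
    print(build_capture_payload("phc_placeholder", "user_123", "transfer_submitted",
                                {"plan": "pro", "platform": "web"}))
```

Rejecting bad events at build time, rather than cleaning them up later, is the design choice that keeps naming consistent: a malformed event never reaches the analytics database.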

This is the quieter benefit of open source that doesn't show up in feature comparison charts. Proprietary platforms give you their API on their terms. Open-source platforms give you the ability to understand exactly how the API works by reading the source code. When Anya's team hit an edge case with the identify call behavior, they could read the PostHog source to understand exactly how distinct ID merging worked. That saved them a week of trial-and-error debugging.

When Open Source Doesn't Matter

I want to push back on something, because I think the "open source" label sometimes carries more weight than it should in buying decisions.

For many teams, whether the analytics platform is open source is completely irrelevant to their daily work. They're not reading the source code. They're not contributing patches. They're not self-hosting. They're using PostHog Cloud through the same web UI they'd use with Amplitude or Mixpanel.

Open source matters when: you're in a regulated industry that requires on-premise deployment, you have specific compliance requirements around data residency, you want to audit exactly how your data is processed, you need to extend or modify the platform's behavior, or you're philosophically committed to open-source infrastructure.

Open source doesn't matter when: you're a 20-person startup that needs product analytics and will use whatever cloud-hosted tool gets the job done fastest. In that case, evaluate PostHog on its features, pricing, and usability -- not on its license.

Anya cared about open source because her compliance board needed to see exactly how the analytics platform processes data. If your requirements don't call for that, don't let "open source" be the deciding factor. Evaluate the tool on whether it solves your actual problem.

Where Agents Fit Into the Open-Source Stack

The piece that ties this together for teams like Anya's is the automation layer. PostHog gives you data collection and analysis. Open source gives you transparency and control. But neither gives you action.

An AI agent connected to PostHog's API can monitor event data for anomalies, identify users who match risk or opportunity profiles, and trigger downstream workflows -- all without anyone opening a dashboard. The open-source API makes this integration reliable because you can understand exactly what the API does, test against a local instance, and deploy with confidence.
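The monitoring half of that can be very simple. The sketch below flags days whose event count is far from the trailing week's average. The trailing z-score method, the window, and the threshold are my own choices for illustration, and the daily counts are invented; in practice they would come from a PostHog query or export, which is omitted here.

```python
from statistics import mean, stdev


def find_anomalies(daily_counts, window=7, threshold=3.0):
    """Return indexes of days whose count deviates from the trailing
    `window` days by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: the z-score is undefined, so skip this day
        z = (daily_counts[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    # Invented daily event counts; day 8 simulates an ingestion outage.
    counts = [1000, 1020, 980, 1010, 990, 1005, 995, 1002, 400, 1001]
    print(find_anomalies(counts))
```

An agent would run something like this on a schedule and trigger the downstream workflow (a page, a ticket, a Slack message) only when the list is non-empty.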

For Anya, the stack ended up being: PostHog Cloud (EU region) for data collection and analysis, with an agent layer handling the continuous monitoring and alerting that her team didn't have bandwidth to do manually. The open-source nature of PostHog gave her confidence in the integration because her team could verify the API behavior at the source code level.

Not every team needs that level of verification. But for a fintech CTO who has to explain her vendor choices to a compliance board, "I can show you the source code for how our analytics processes data" is a conversation-ender in the best possible way.

Try These Agents

PostHog Event Tracking Setup -- Set up event tracking with consistent naming and automated instrumentation checks
PostHog User Identification Agent -- Identify users and link sessions for complete behavioral profiles
PostHog Product Usage Tracker -- Monitor feature usage patterns and surface insights automatically
