Every year, companies invest millions into data warehouse projects — and a significant portion of them end up as expensive, hard-to-maintain systems that nobody trusts. We've seen it happen across retail, finance, logistics, and healthcare. The failure modes are almost always the same.
This post is a candid breakdown of what goes wrong, and a practical architecture checklist based on what we've seen work at scale.
The Three Most Common Failure Modes
1. Building for today's questions, not tomorrow's
Most data warehouses start with a specific reporting need. Finance wants a P&L dashboard. Marketing wants campaign attribution. Fair enough — you build exactly that. The problem? You encode all the business logic directly into the ETL jobs and SQL views. When requirements change (and they always do), you're untangling spaghetti logic that's tightly coupled to yesterday's business model.
The pattern we see: A star schema designed around "current" reporting needs, with no separation between raw data, transformed data, and serving layer. Every change becomes a high-risk migration.
2. No single source of truth
Teams start building their own data pipelines because the central warehouse is too slow to update or too painful to change. Within 18 months, you have six different "revenue" numbers depending on which system you query. Trust collapses. Analysts spend more time reconciling figures than generating insights.
3. Treating data quality as an afterthought
Pipelines are built fast. Data quality checks — if they exist at all — are tacked on at the end. When a source system changes its schema silently, nobody notices until a senior leader presents a wrong number in a board meeting.
"A data warehouse nobody trusts is worse than no data warehouse at all — because at least with none, people know they don't have the answer."
A Framework That Actually Works
We've refined this approach across a dozen warehouse builds and migrations. It isn't revolutionary — it draws heavily on the Medallion Architecture pattern popularised by Databricks — but the implementation details matter enormously.
Layer 1: Raw / Bronze
Ingest everything, change nothing. Raw data lands here exactly as it came from the source — schema included. This is your audit trail. You can always replay from here. Never transform in place.
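As a rough sketch of "ingest everything, change nothing", a bronze loader can wrap each source record in load metadata and append it untouched. All names here (`land_raw`, `crm_export`, the field names, and the in-memory list standing in for append-only storage) are illustrative, not a specific product's API.

```python
import datetime

def land_raw(records, source_name, sink):
    """Append each source record unchanged, wrapped with load metadata.

    The payload is stored exactly as received; no parsing, no casting.
    """
    loaded_at = datetime.datetime.now(datetime.timezone.utc).isoformat()
    for rec in records:
        sink.append({
            "source": source_name,      # where this record came from
            "loaded_at": loaded_at,     # when it landed
            "payload": rec,             # verbatim source record: the audit trail
        })

bronze = []  # stands in for append-only object storage
land_raw([{"id": 1, "amount": "10.50"}], "crm_export", bronze)
```

Because the payload is never mutated, downstream layers can always be rebuilt by replaying from here.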
Layer 2: Cleaned / Silver
Apply standardisation, deduplication, and type casting here. No business logic. If your source sends dates as strings, parse them here. If there are known duplicates in the CRM export, deduplicate here. This layer is about structural consistency, not semantic meaning.
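A minimal silver-layer pass might look like the following: parse string dates into real date types and drop known duplicates, and nothing more. The field names (`customer_id`, `signup_date`) and the dedup-on-id rule are assumptions for illustration.

```python
from datetime import date

def to_silver(raw_rows):
    """Structural cleanup only: cast types and deduplicate.

    Deliberately contains no business logic -- no segmentation,
    no revenue rules, just consistency.
    """
    seen, out = set(), []
    for row in raw_rows:
        if row["customer_id"] in seen:
            continue  # known CRM export duplicate: keep first occurrence
        seen.add(row["customer_id"])
        out.append({
            "customer_id": row["customer_id"],
            "signup_date": date.fromisoformat(row["signup_date"]),  # str -> date
        })
    return out

rows = to_silver([
    {"customer_id": "C1", "signup_date": "2023-04-01"},
    {"customer_id": "C1", "signup_date": "2023-04-01"},  # duplicate row
])
```

The point of the separation: if this logic later needs to change, nothing semantic breaks downstream, because meaning was never assigned here.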
Layer 3: Business / Gold
Business logic lives here — and only here. Revenue recognition rules, customer segmentation definitions, KPI calculations. Managed with dbt. Version-controlled. Tested. Documented.
Key principle: Business definitions belong in version-controlled SQL models, not in BI tool calculated fields or ad hoc analyst scripts. If "active customer" means something specific to your business, that definition should live in one place and propagate everywhere.
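One way to picture "one definition, propagated everywhere" is a single version-controlled function that every consumer imports rather than re-implements. The `is_active` name and the 90-day window are hypothetical; the real definition would live in your dbt models, but the single-source principle is the same.

```python
from datetime import date, timedelta

# Hypothetical canonical definition of "active customer". Every report,
# dashboard, and analysis imports this one predicate instead of hard-coding
# its own variant in a BI tool or ad hoc script.
ACTIVE_WINDOW_DAYS = 90  # illustrative threshold, agreed once with the business

def is_active(last_order_date: date, as_of: date) -> bool:
    """A customer is 'active' if they ordered within the agreed window."""
    return (as_of - last_order_date) <= timedelta(days=ACTIVE_WINDOW_DAYS)
```

When the business changes the window, the change happens in one reviewed, tested place, and every downstream number moves together.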
The Operational Non-Negotiables
- Data contracts between source teams and data platform teams. Source systems should notify before schema changes. This sounds political — it is — but it's the difference between a managed migration and a production incident.
- Automated data quality testing on every pipeline run. dbt tests, Great Expectations, or even custom SQL assertions. Fail loudly when data doesn't meet expectations.
- Lineage tracking. When an executive asks "where does this number come from?", you should be able to answer in under five minutes, not five days.
- A data catalogue, even a lightweight one. Knowing what tables exist, what they mean, and who owns them is not optional at scale.
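To make the "fail loudly" point concrete, here's a sketch of a custom SQL assertion in the spirit of a dbt test: a check query that must return zero rows, or the run aborts. The table, column names, and the in-memory SQLite database are illustrative stand-ins for your warehouse.

```python
import sqlite3

# Illustrative warehouse table; in practice this is a real warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("o1", 10.0), ("o2", 25.5)])

def assert_no_rows(conn, name, check_sql):
    """Run a check query; any returned row is a data quality failure."""
    bad = conn.execute(check_sql).fetchall()
    if bad:
        raise AssertionError(f"data check '{name}' failed, sample rows: {bad[:5]}")

# Orders must never have null or negative amounts.
assert_no_rows(conn, "non_negative_amounts",
               "SELECT order_id FROM orders WHERE amount IS NULL OR amount < 0")
```

Run on every pipeline execution, checks like this turn a silent upstream schema or data change into an immediate, attributable failure instead of a wrong number in a board deck.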
The Mindset Shift
The biggest unlock isn't technical — it's organisational. Data platforms fail when they're treated as IT projects rather than product investments. The warehouse needs a product owner. It needs to be maintained, evolved, and governed like any other critical piece of infrastructure.
Teams that ship reliable data do so because they've invested in the unsexy parts: testing, documentation, on-call processes, and stakeholder communication. Not just the cool transformation logic.
Where to Start
If you're inheriting a struggling data warehouse, resist the urge to rewrite everything. Instead:
- Map what actually exists — tables, pipelines, BI reports — and who relies on them.
- Identify the two or three metrics the business runs on. Make those the first things you stabilise and test.
- Introduce the layered architecture incrementally, starting with new data sources rather than migrating old ones.
- Get one business stakeholder who champions the platform. Progress follows trust.
If any of this resonates with what you're dealing with, we're happy to talk through your specific situation — no pitch, just a conversation.