Data Warehouse Best Practices: What Still Earns Its Place in 2026
Data warehouse best practices in 2026: which Kimball-era rules still hold, what AI has changed, and the AUD cost discipline behind a warehouse that lasts.
Updated May 2026. Rewritten to reflect what has actually changed in data warehouse best practices since the last cloud platform consolidation: ELT-first pipelines, dbt-as-the-default, cost governance as a first-class concern, and where the old Kimball rules still hold.
Data warehouse best practices used to be a fixed list. Dimensional model, conformed dimensions, careful indexing, nightly batch. The list still mostly holds. The cloud has changed enough underneath it that some of the old rules now make things worse, and some new rules have become non-negotiable.
We are an AI and automation consultancy. We have built and rescued warehouses on Snowflake, BigQuery, Redshift, and Databricks for clients in healthcare, recruitment, finance, and professional services. The seven practices below are the ones that survive a real audit on the way to production. The rest is folklore.
This piece walks through what has changed in data warehouse best practices for 2026, what we have actively dropped from older guides, where AUD numbers usually land, and how to know when a warehouse is the wrong tool. For wider data tooling context see our piece on data integration solutions and the practitioner view in data validation techniques.
Data Warehouse Best Practices: The 2026 Landscape
The platforms have consolidated. Snowflake, BigQuery, Databricks, and Redshift cover the vast majority of mid-market and enterprise workloads. The on-premises Oracle and Teradata implementations we used to scope have largely migrated, and the ones that have not are usually on the way. dbt has won the transformation layer. Fivetran and Airbyte have won most of the ingestion. The architectural questions are now narrower and more interesting.
What has changed in data warehouse best practices is not the modelling theory. Kimball still works. What has changed is that storage and compute are now decoupled and metered. A bad query costs money in real time. A bad partitioning decision shows up on a bill. The discipline has shifted from “make it fit the disk” to “make it cheap to ask questions of.”
The other shift is AI workloads sitting alongside BI workloads. The same warehouse that powers the executive dashboards now powers a vector store for retrieval augmented generation and a feature store for production models. The data warehouse best practices that work in 2026 accommodate both without becoming two warehouses badly stitched together.
Practice 1: Dimensional Modelling Still Wins for Analytics
The temptation in 2026 is to dump raw data into the warehouse and run analytics directly on it. SQL is powerful enough now, the argument goes, that the modelling step is overhead. This is wrong, and the rescue projects prove it. The teams that skip dimensional modelling end up with dashboards that take 40 seconds to load, queries nobody can read, and a finance director who cannot answer “what was revenue by region last quarter” without three Slack messages.
Dimensional modelling earns its place because the questions a business asks have a shape, and the shape matches a star schema. Fact tables hold the measures, dimension tables hold the context, and the joins are predictable. Across the seven warehouses we have built or rescued in the last two years, the dashboard p95 query time on a properly modelled star is 1.5 to 4 seconds. On an unmodelled raw schema it is 12 to 60 seconds.
The practical update is that dbt has made dimensional modelling cheaper to build and easier to maintain. Source models, staging models, intermediate models, marts. The naming is conventional now and a new analyst can read a project on first sight. Tests run on every model. Documentation generates automatically. The Kimball book still belongs on the shelf. The tooling around it is much better than it was.
Practice 2: ELT Over ETL (and What dbt Got Right)
The ETL versus ELT debate is settled. Extract, load, then transform inside the warehouse, using its compute. The reason is not ideological. It is that warehouse compute is now cheaper per row than the equivalent transformation in a separate Python or Spark cluster, and the dependency graph is in one place. dbt has crystallised this into a workflow: ref() between models, tests on every join, incremental materialisation where it matters.
The data warehouse best practice that follows is to push transformation logic into the warehouse and out of bespoke pipeline code. We rescued a Sydney recruitment client whose pipeline was 18 separate Python scripts feeding into BigQuery. The migration to dbt took six weeks. Build time on the full pipeline dropped from 90 minutes to 11 minutes because of incremental materialisation. The on-call burden dropped to near zero because dbt’s testing caught most regressions in CI.
Two cautions. First, dbt is not a free lunch. It pushes complexity into SQL, which means SQL has to be readable. We enforce dbt-style guides via SQLFluff in CI. Second, dbt’s incremental materialisation has sharp edges. Decide your unique key carefully, decide your is_incremental() filter carefully, and back it with a full-refresh schedule that runs weekly. The dbt docs are good. Read the incremental models guide before you push anything to production.
Practice 3: Data Quality at the Boundary, Not the Dashboard
The most expensive data quality bug we have seen cost a client around $310,000 AUD over four months. A boolean field in their source CRM started persisting NULL instead of false. The downstream funnel report quietly under-reported qualified leads by 8 percent. The board got the numbers, planned headcount against them, and discovered the gap when the sales team’s actual pipeline reconciled differently. The fix was a dbt test that should have been there from day one.
The data warehouse best practice is to apply data quality tests at the boundary between layers. Source models test for nulls in known-required fields, unique constraints, accepted values, referential integrity. Staging models test for the business rules that have to hold post-cleanup. Marts test for the executive-facing invariants. Our piece on data validation techniques covers the four-layer model in depth.
The cheap fix that catches most issues is enabling dbt’s store_failures on every test. Failures persist to an audit table you can query. Combine that with an Elementary or re_data deployment for freshness and volume anomaly detection. Most data quality bugs are not “this column is wrong.” They are “this load did not run” or “this load loaded a fraction of what it usually does.” Catch those.
Practice 4: Slowly Changing Dimensions Done Properly
Slowly changing dimensions remain a non-optional data warehouse best practice for any dimension where the business asks “what did this look like at the time.” Customer tier, sales territory assignment, product category, employee cost centre. These all change. Pretending they do not changes historical reporting silently every time someone updates a record.
What has changed is that SCD Type 2 is now trivially easy in dbt thanks to the dbt_utils and dbt-snapshot macros. Snapshot a source table, get effective_from and effective_to columns for free, join in the fact table with the appropriate timestamp predicate. The implementation effort dropped from days to hours.
The judgement is which attributes need history. Default is “history off, overwrite.” Turn it on selectively for the attributes the business genuinely asks “what was this at the time” about. A construction client we work with snapshots project assignment and cost code on every job. They do not snapshot job title. The choice was deliberate, documented, and revisited every six months.
Practice 5: Partitioning That Matches the Query, Not the Calendar
Partitioning is one of the data warehouse best practices most often done by reflex. Date column, daily partition, done. This is correct for the largest tables most of the time. It is wrong often enough to be worth thinking about.
The rule is: partition on the column most frequently in the WHERE clause of expensive queries. For event tables that is usually event date. For multi-tenant SaaS that is often tenant_id, with date as a secondary. For BigQuery, cluster on the columns most frequently in the GROUP BY. The single biggest BigQuery cost win we have shipped was reclustering a 380GB table by customer_id and date, which dropped slot time on the daily aggregates by 73 percent. The bill went from around $1,800 AUD per month to $480 AUD.
Two production gotchas. First, do not over-partition. Tables with more than around 4,000 partitions on Snowflake or BigQuery have measurable metadata overhead. Aim for partitions of at least 1GB. Second, query review needs to be a habit. We run a weekly cost report on the top 50 queries by spend. Re-partitioning or rewriting one or two queries per month keeps the bill flat as data grows.
Practice 6: Cost Governance Is a Data Warehouse Best Practice Now
In 2024 we used to put cost monitoring on the “nice to have” list. In 2026 it sits in the top five data warehouse best practices. The reason is straightforward: a single bad query on a large table can produce a five-figure AUD bill before anyone notices. We have seen it twice. Both times the offending query was a junior analyst joining without a date filter.
The practical setup we deploy:
- Resource monitors on Snowflake or query quotas on BigQuery with hard caps per user and per warehouse.
- Per-user query cost dashboards visible to the team and to finance.
- An alert that fires when a single query exceeds a threshold (we typically set this at $50 AUD).
- Automatic query timeout at the warehouse level (10 to 30 minutes for analytics, longer for ELT runs).
- Monthly cost review where the top 20 queries by spend get audited.
The cultural piece is harder. Analysts have to be comfortable being told a query is too expensive and rewriting it. The CFO has to be comfortable seeing the data team’s spend grow when usage grows, because the warehouse is doing useful work. Both these conversations are easier when the cost data is in everyone’s face.
Practice 7: Governance and Access Without the Bureaucracy
Data governance is the data warehouse best practice most often weaponised into pointless ceremony. The committee meets, the spreadsheet of “data owners” gets compiled, and nothing changes operationally. The version that actually works is small and pragmatic.
Access in groups, not individuals. We use role-based access control in Snowflake or IAM groups in BigQuery, mapped to job roles. New analyst, add to “analytics_reader.” Promoted to senior, add to “analytics_writer.” Leaving, removed from all roles by SSO deprovisioning. The audit log captures every grant. Quarterly access review takes 30 minutes for a 40-person org.
Column-level security for the PII columns that genuinely need it. Most don’t. Mask email addresses for the BI seats. Restrict the raw HR table to HR roles. Use dynamic data masking rather than separate copies of tables, which always drift. For regulated workloads (APP, APRA CPS 230, My Health Records), document the policy. The auditor wants the document, not a clever architecture diagram. The OAIC’s APP guidance is the canonical reference for Australian workloads.
What We Have Removed From Our List Since 2024
Three data warehouse best practices we used to recommend have aged out of relevance for most clients.
Master data management as a separate platform. The dedicated MDM tooling we used to scope (Informatica, Reltio, Profisee) is overkill for almost all mid-market clients. dbt models on top of a well-modelled set of source dimensions does the same job for one-tenth the cost. The exception is enterprises with 20+ source systems and active mergers. There the MDM platform still earns its place.
Aggressive indexing strategies. Cloud warehouses do not have traditional indexes in the same sense. Clustering keys and partitioning have replaced the index discipline of the on-premises era. Time spent agonising over index design is now time better spent on partitioning and clustering.
Nightly batch as the default rhythm. Most warehouses we ship now run incrementally on a 15-to-60-minute cadence for the high-value pipelines. Nightly batch is still fine for slow-moving dimensions and end-of-period reporting. It is no longer the default everywhere.
AUD Cost Ranges for a Modern Warehouse
Pricing varies wildly with data volume and query patterns. Honest ranges from current Australian clients:
- Small startup (under 100GB, light query load): $400 to $1,500 AUD per month across Snowflake or BigQuery, dbt Cloud or self-hosted, and Fivetran or Airbyte.
- Mid-market (1 to 10TB, daily executive dashboards plus analyst usage): $3,500 to $12,000 AUD per month for the full stack.
- Enterprise (10TB+, multi-tenant, real-time pipelines): $20,000 AUD per month and up.
- Build cost for a greenfield warehouse with conformed dimensions, dbt, ingestion, and 30 to 60 models: $60,000 to $250,000 AUD depending on source system count and complexity.
The line that surprises clients is Fivetran. On a high-volume CDC feed it can easily outpace the warehouse cost itself. Airbyte self-hosted or custom ingestion become attractive above around $4,000 AUD per month of Fivetran spend.
When a Data Warehouse Is the Wrong Tool
Honest section. Three cases where we have told prospects not to build a warehouse.
Tiny scale. A 12-person consultancy with one CRM, one accounting tool, and one project tracker does not need a warehouse. Native dashboards or a Metabase pointed at read replicas does the job for a fraction of the cost.
Real-time decisioning. If the use case is sub-second decisioning (fraud detection on transactions, recommendation systems serving traffic), a warehouse is the wrong tool. You want a feature store and a serving layer. The warehouse can be where the offline training data lives, but production reads do not go there.
Unstructured-only workloads. If 90 percent of the data is documents, audio, or video for AI processing, the warehouse plays a smaller role. A document store, a vector database, and an object store carry most of the weight. The warehouse stores the metadata and the structured outputs.
Frequently Asked Questions
What are the most important data warehouse best practices in 2026?
Seven practices anchor a modern warehouse: dimensional modelling, ELT with dbt, data quality at the boundary, slowly changing dimensions, query-aware partitioning, cost governance, and proportionate access controls. The order matters. Modelling and ELT are foundational. Cost governance was optional in 2024 and is non-negotiable in 2026.
What is the difference between ETL and ELT in a data warehouse?
ETL transforms data before loading into the warehouse, typically in a separate compute layer. ELT loads first and transforms inside the warehouse using its compute. ELT has won for cloud warehouses because warehouse compute is cheap, scalable, and keeps the dependency graph in one place. dbt has become the standard tool for the transformation step.
How much does a cloud data warehouse cost in AUD?
A small startup warehouse runs $400 to $1,500 AUD per month, mid-market $3,500 to $12,000 AUD, and enterprise $20,000 AUD and up. Build cost for a properly modelled greenfield warehouse with 30 to 60 dbt models, ingestion, and conformed dimensions ranges from $60,000 to $250,000 AUD. Fivetran ingestion can be a surprise cost line on high-volume CDC feeds.
Snowflake, BigQuery, Redshift, or Databricks?
All four work. Snowflake wins for cross-cloud and concurrent BI workloads. BigQuery wins for organisations already deep in Google Cloud and for serverless cost simplicity. Redshift wins where the rest of the AWS bill is significant. Databricks wins for warehouses with heavy notebook and ML workloads alongside BI. The decision rarely comes down to features any more. It comes down to existing cloud commitments and team skills.
Do I still need a data warehouse if my main workload is AI?
Probably yes, but it plays a smaller role. The warehouse becomes the source of truth for structured business data and the offline training data for models. Vector storage for retrieval moves to a dedicated store. Feature serving moves to a feature store or low-latency cache. The warehouse and the AI stack live alongside each other, not one inside the other.
What is dimensional modelling and is it still needed?
Dimensional modelling organises data into fact tables (measures) and dimension tables (context) in a star schema. It is still the right starting point for analytics in 2026 because business questions have predictable shapes that match the model. Skipping it produces dashboards that are slow, queries that are hard to read, and ongoing arguments about metric definitions.
How do I keep data warehouse costs down?
Five habits: partition large tables on the column most frequently in WHERE clauses, set resource monitors with hard caps per user and warehouse, alert on single-query cost over a threshold, run a weekly review of the top 50 queries by spend, and revisit clustering when bills grow faster than data volume. The single biggest cost optimisation we have shipped was reclustering one BigQuery table, which dropped a monthly bill by 73 percent.
How do Australian privacy rules apply to a data warehouse?
The Australian Privacy Principles apply if your warehouse stores personal information about Australians. APP 11 requires reasonable security steps. For regulated industries, APRA CPS 230 and the My Health Records Act add specific obligations on data residency and access auditing. Practical implications: pick the ap-southeast-2 region (or equivalent) for storage, log all access, and document the data classification and retention policies. The OAIC’s published guidance is the reference.
If you are scoping a new warehouse, rescuing one that has drifted, or trying to get a handle on the bill, get in touch. We will look at the schema, the bill, and the team and tell you which of these data warehouse best practices to apply first.
Jump to a section
Ready to streamline your operations?
Get in touch for a free consultation to see how we can streamline your operations and increase your productivity.