5 dbt Implementation Mistakes That Kill Data Trust


dbt changed the game for analytics engineering. But like any powerful tool, it can create as many problems as it solves — especially when the implementation is rushed, the team is learning on the job, or the project was scoped by someone who never had to explain the numbers to a CFO.

Here are the five mistakes I see most often when companies implement dbt, and what to do instead.

1. No Testing Strategy

This is the most common and most damaging mistake. Teams build dozens of models but write zero tests. Then they wonder why stakeholders do not trust the numbers.

What happens in practice: A source schema changes upstream — maybe Salesforce adds a picklist value, maybe the billing system starts sending null timestamps on a new plan type. A join condition breaks silently. Your revenue dashboard shows a number that is 15% off for two weeks before anyone catches it. By then your CFO has already used the wrong number in a board deck, and now your team is the one that “can’t get the data right.”

I have seen this exact scenario play out at three different companies in the last year. In every case, two not_null tests and one accepted_values test would have flagged the problem on the same day.

What to do instead: Start with three tests on every model: not_null on your primary key, unique on your primary key, and at least one accepted_values or relationships test that validates business logic. That is the minimum. You can get more sophisticated later — custom schema tests, freshness checks, row-count anomaly detection. But shipping untested models is shipping broken trust.
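As a sketch, that baseline looks like this in a schema.yml file (the model and column names here are illustrative, not from any particular project):

```yaml
# models/marts/schema.yml -- illustrative model and column names
version: 2

models:
  - name: dim_customer
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
      - name: customer_status
        tests:
          # flags any status value outside the expected set
          - accepted_values:
              values: ['active', 'churned', 'trial']
      - name: account_id
        tests:
          # every customer must reference a real account
          - relationships:
              to: ref('dim_account')
              field: account_id
```

All four tests shown are dbt's built-in generic tests, so this costs a few minutes per model and runs with a plain dbt test.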

One practical rule that works well: no model gets promoted to a mart layer until it has at least five tests. That single gate catches most of the damage before it reaches a dashboard. If you want a structured way to evaluate your current testing coverage, the dbt Project Health Scorecard walks through each layer.

2. Building for Flexibility Instead of Specificity

Teams design overly generic data models because “we might need it later.” The result is a warehouse full of tables nobody understands and nobody uses.

What happens in practice: You build a model called dim_entity instead of dim_customer because “we might want to track partners too someday.” Six months later, nobody remembers what dim_entity is, and every query against it requires a Slack thread to understand. The original developer left two months ago. The team that inherited it builds a parallel customers_v2 table in a personal schema because figuring out dim_entity was going to take longer than just writing the SQL from scratch.

Now you have two customer tables, neither fully tested, and the marketing team does not know which one to use.

What to do instead: Build for the questions people are actually asking this quarter. If the business needs a customer table, build dim_customer. If you later need a partner table, build dim_partner. Two clear, specific models beat one flexible, confusing one every time.

This is the 80/20 rule applied to data modeling: solve the problem in front of you. The cost of building a second specific model later is almost always lower than the cost of maintaining one generic model that nobody trusts now.

A useful forcing function: before you create a new model, write down the three business questions it is supposed to answer. If you cannot name three real questions that real people are waiting on, the model probably does not need to exist yet.

3. No Documentation That Humans Actually Read

dbt has excellent built-in documentation support. Most teams either do not use it or write documentation that is technically accurate but practically useless.

What happens in practice: Your schema.yml describes total_revenue as “The total revenue.” Technically correct. Completely unhelpful. When a new analyst joins, they still have to ask five questions in Slack to understand what is included and excluded. When finance asks why your revenue number is $40K different from theirs, nobody can explain it without digging through SQL.

The gap is almost always between what the data team considers documented and what a business user considers explained. Column names and data types are not documentation. They are metadata.

What to do instead: Document the business context, not just the technical definition. A useful template for any metric:

  • What it includes
  • What it specifically excludes
  • Which report or dashboard it should match
  • Who owns the definition if there is a dispute

For example: “Total revenue: Includes all closed-won deals in the current fiscal year. Excludes professional services, refunds, and deals under $1,000. Matches the number reported in the monthly board deck. Definition owned by RevOps.”
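In schema.yml, that template might read like this (the model name fct_revenue is a hypothetical example):

```yaml
# models/marts/schema.yml -- business context, not just data types
version: 2

models:
  - name: fct_revenue
    columns:
      - name: total_revenue
        description: >
          Includes all closed-won deals in the current fiscal year.
          Excludes professional services, refunds, and deals under $1,000.
          Matches the number reported in the monthly board deck.
          Definition owned by RevOps.
```

Because this lives next to the code, dbt docs generate renders it automatically, and it cannot silently drift the way a wiki page can.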

That is documentation someone can actually use without asking in Slack. If this resonates because your team has the same metric three different ways, the metric definition governance playbook lays out a repeatable process for fixing it.

One more thing: keep your docs close to the models. If a definition lives in a Confluence page that was last updated eight months ago, it is not documentation — it is a fossil. dbt’s schema.yml stays next to the code. Use it.

4. Ignoring the Source Layer

Teams jump straight to building marts and dashboards without properly staging and cleaning their source data. Then every downstream model inherits the same data quality problems.

What happens in practice: Your CRM has duplicate contacts (the same person entered by two different sales reps with slightly different email addresses). Your billing system has test transactions from QA that were never cleaned out. Your marketing platform counts internal team clicks as real engagement. All of this garbage flows into your models, and your “clean” dashboard is built on dirty data.

The worst version of this I have seen: a company spent six weeks building a sophisticated attribution model in dbt, only to discover that 8% of their “leads” were internal test accounts that had been flowing through the pipeline for over a year. The attribution math was technically correct — it was just attributing revenue to events that never happened.

What to do instead: Invest in a proper staging layer. Clean, deduplicate, and validate your sources before anything else touches them. This is the least glamorous part of a dbt project, and it is the most important.

A good staging layer does three things:

  1. Renames and types columns consistently so downstream code does not depend on whatever naming convention each SaaS vendor chose
  2. Filters out known garbage — test accounts, internal traffic, duplicate records, transactions with impossible dates
  3. Adds basic validation markers so you can tell which records passed quality checks and which were kept but flagged

If your staging models are just select * from {{ source(...) }} with a column rename, they are not doing enough.
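A staging model that does all three jobs might look like the following sketch. The source name, column names, and garbage filters are illustrative, and the qualify clause assumes a warehouse that supports it (e.g. Snowflake or BigQuery):

```sql
-- models/staging/stg_crm__contacts.sql
-- Illustrative: adapt source, columns, and filters to your own systems.

with source as (

    select * from {{ source('crm', 'contacts') }}

),

cleaned as (

    select
        -- 1. rename and type consistently
        id::varchar                  as contact_id,
        lower(trim(email))           as email,
        created_date::date           as created_at,

        -- 3. validation marker: kept, but flagged for downstream review
        (email is null or email not like '%@%') as has_invalid_email

    from source

    -- 2. filter known garbage
    where coalesce(email, '') not ilike '%@ourcompany.com'  -- internal accounts
      and coalesce(is_test_record, false) = false           -- QA test rows

),

deduplicated as (

    -- keep the most recent record per email address
    select *
    from cleaned
    qualify row_number() over (
        partition by email
        order by created_at desc
    ) = 1

)

select * from deduplicated
```

The point is not this exact SQL; it is that every downstream model gets to assume contacts are typed, deduplicated, and free of internal noise.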

5. Treating It as a Technical Project Instead of a Business One

This is the meta-mistake that enables all the others. Someone says “we need dbt,” the data team implements it in isolation, and three months later you have a technically sound warehouse that nobody outside the data team uses.

What happens in practice: The data team builds exactly what they think the business needs. They model every entity, build elegant DAGs, follow every dbt best practice. The business, meanwhile, has different questions than the data team anticipated. The dashboards go unused. Leadership asks why they invested in “data infrastructure” with no visible ROI. The data team feels underappreciated. The business feels underserved. Both are right.

I have watched this play out enough times to know the root cause is almost never technical. It is a scoping problem. The project was defined as “implement dbt” instead of “answer this specific business question.” (This is the same gap described in why your data team and your marketing team don’t speak the same language — the translation problem is upstream of the tooling.)

What to do instead: Start every dbt project with a business outcome. Not “implement dbt” — that is a technology goal. “Enable the marketing team to see channel-level ROAS by next quarter” — that is a business outcome. Build toward that specific outcome, validate with the stakeholders who need it, and expand from there.

A practical way to enforce this: require every new mart model to have a named stakeholder who has agreed to validate the output within two weeks of deployment. If nobody is waiting for the answer, the model can wait.

Quick Diagnostic: Where Is Your dbt Project Stuck?

If you already have a dbt implementation and something feels off, this table can help you narrow down where the trust is breaking:

| Symptom | Likely root cause | First fix |
| --- | --- | --- |
| Stakeholders do not trust dashboard numbers | No testing strategy; silent failures go undetected | Add not_null, unique, and one business-logic test to every model in your critical path |
| New analysts cannot onboard without extensive Slack help | Documentation describes columns, not business context | Rewrite docs for your top 10 metrics using the includes/excludes/matches template |
| Multiple versions of the same table exist across schemas | Generic models that nobody fully understands | Consolidate into specific, well-named models scoped to real business entities |
| Dashboards show numbers that do not match finance | Dirty source data flowing through without staging | Build or strengthen your staging layer with deduplication and known-garbage filters |
| Data team ships work that nobody uses | Project scoped to technology goals instead of business outcomes | Attach every mart model to a named stakeholder and a specific decision it supports |

The Pattern

Notice the thread? Every mistake comes from the same root cause: prioritizing technical elegance over business usefulness.

dbt is a tool. Its value is measured not by how clean your DAG looks or how many models you have, but by whether it changed how your organization makes decisions. A project with 20 well-tested, well-documented models that answer real questions is worth more than one with 200 models that nobody outside the data team touches.

If your data team has implemented dbt but your stakeholders still do not trust the numbers, the problem is not dbt — it is the implementation. A data audit can help you find the gaps.

If you are about to start a dbt implementation and want to get it right the first time, let’s talk about your data foundation.

Download the worksheet

Use this worksheet to sort the next remediation conversation into the buckets that usually get muddled together: testing gaps, unclear model scope, stale documentation, source cleanliness, and stakeholder fit.

Download the dbt trust triage worksheet (PDF). Instant download, no email required.

Go deeper: This post covers the most common pitfalls. For a full guide to building a modern data foundation — from architecture to dbt to governance — read The Complete Guide to Building a Modern Data Foundation. If you want to assess your existing dbt project’s health, the dbt Project Health Scorecard gives you a structured way to find the weak spots.



Common dbt Implementation Questions

What is the most common dbt implementation mistake?

No testing strategy. Teams build dozens of models with zero tests, then wonder why stakeholders don’t trust the numbers. Start with not_null, unique, and at least one business-logic test on every model.

How do I know if my dbt implementation has a trust problem?

If stakeholders still distrust your dashboards or make decisions on gut feel despite having a warehouse, the implementation probably prioritized technical elegance over business usefulness. A data audit can identify where the gaps are.

Should I build generic or specific data models in dbt?

Specific. Build dim_customer if you need a customer table — not dim_entity because you might track partners later. Generic models create tables nobody understands and nobody uses.

Why is the dbt staging layer important?

The staging layer is where you clean, deduplicate, and validate source data. Without it, dirty data from your CRM, billing system, and marketing platform flows into every model and every dashboard downstream.

How should I scope a dbt implementation project?

Start with a specific business outcome, not a technology goal. Anchor the project to one question a real stakeholder needs answered, build toward that, validate it, then expand from there.

How long should a dbt implementation take for a mid-size SaaS company?

A focused first phase — one business question, the staging layer it depends on, tested models, and real documentation — can ship in 4 to 6 weeks. The mistake is trying to model everything in month one. Start narrow, validate with stakeholders, then expand.

What is the difference between a dbt project that builds trust and one that does not?

Trust comes from three things: tests that catch problems before stakeholders do, documentation that answers business questions without a Slack thread, and models scoped to decisions people are actually trying to make. A project can be technically clean and still fail all three.

About the author

Jason B. Hart

Founder & Principal Consultant

Helps mid-size SaaS and ecommerce teams turn messy marketing and revenue data into decisions leaders trust.
