The dbt Project Health Scorecard

What Is a dbt Project Health Scorecard?

A dbt project health scorecard is a practical way to measure whether your dbt project is organized and governed well enough to support trusted reporting, faster analytics delivery, and fewer avoidable data incidents.

A lot of teams think they have a healthy dbt project because:

  • the repo exists
  • the DAG looks impressive in screenshots
  • the warehouse is running
  • somebody can still get the dashboard out on time

That is not the same thing as health.

A dbt project can be active and still be fragile.

It can compile cleanly while the important models are under-tested. It can have dozens of marts while nobody can explain which one leadership should trust. It can look modern while one source-table change is still enough to break a revenue number quietly.

That is why this scorecard matters.

It gives heads of data, analytics engineering leads, and overstretched operators a way to separate “we use dbt” from “our dbt project actually deserves trust.”

Why This Scorecard Exists

This is not just an internal hygiene topic for data teams.

The broader market is moving in the same direction. In dbt Labs’ 2025 State of Analytics Engineering report, poor data quality remained the #1 hurdle for 56% of respondents, and 38% said they planned to increase investment in data quality and observability. Their 2024 report also found that increasing data trust had emerged as the top focus area for many data organizations.

That combination is worth paying attention to.

Teams are not investing in data quality because it sounds mature. They are investing because brittle models, weak tests, and silent source failures are still the fastest way to turn a modern analytics stack into an expensive trust problem.

dbt usually enters the stack for a good reason.

The company outgrows spreadsheet logic. The reporting layer gets too important to leave inside dashboards. Metric definitions start drifting across teams. The warehouse exists, but the transformation layer still feels improvised.

So the team adopts dbt.

That is often the right move.

But a lot of dbt projects plateau in an awkward middle state:

  • better than the old ad hoc SQL mess
  • not stable enough to support high-stakes reporting cleanly
  • too central to ignore
  • too under-governed to trust fully

That is the moment this scorecard is built for.

How to Use the Scorecard

Score each of the seven categories from 1 to 5.

  • 1 = fragile
  • 2 = reactive
  • 3 = usable but risky
  • 4 = strong
  • 5 = disciplined and repeatable

You are not grading elegance. You are grading whether the project helps the business make decisions without leaning on hidden heroics.

Score Bands: What the Total Usually Means

With seven categories, the total score ranges from 7 to 35.

| Total score | What it usually means | Practical interpretation |
| --- | --- | --- |
| 7-14 | Fragile prototype | dbt exists, but the project is still too dependent on one person, one mental model, or one calm week |
| 15-20 | Working but reactive | the team can ship, but trust and maintainability break down under pressure |
| 21-27 | Operationally useful | the project is good enough for real use, but there are still clear weak points to tighten |
| 28-35 | Trustworthy operating asset | dbt is no longer just a tool in the stack; it is a maintained business system |

A useful rule of thumb:

If your total is below 21, the project probably feels healthier to the data team than it looks to the business.

The 7 Dimensions That Actually Matter

1. Model Organization and Naming

This is the first thing I look at because messy structure usually predicts messy thinking.

A healthy project has:

  • a clear staging/intermediate/marts pattern or another equally explicit structure
  • model names that tell a human what the table is for
  • folders that reflect real business logic rather than one person’s memory
  • marts that are aligned to decisions, not just tool outputs

A weak project usually has:

  • business logic spread across oddly named models
  • duplicated logic in multiple marts
  • folders organized by who happened to build them
  • “final_v2” style artifacts still in active use
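As a concrete reference point, the layering the first bullet describes often looks something like this. The folder and model names here are illustrative, following dbt's commonly published style conventions, not a prescription:

```text
models/
├── staging/                      # one folder per source system, light cleanup only
│   ├── stripe/
│   │   ├── _stripe__sources.yml
│   │   └── stg_stripe__charges.sql
│   └── salesforce/
│       └── stg_salesforce__opportunities.sql
├── intermediate/                 # reusable business logic, not exposed to BI tools
│   └── int_payments_pivoted.sql
└── marts/                        # one folder per business domain, aligned to decisions
    ├── finance/
    │   └── fct_revenue.sql
    └── marketing/
        └── fct_campaign_performance.sql
```

The test is not whether you use these exact folder names. It is whether a new hire can guess where a model lives without asking.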

How to score model organization

| Score | What it looks like |
| --- | --- |
| 1 | models are hard to navigate, naming is inconsistent, and important logic is duplicated or hidden |
| 2 | some structure exists, but conventions are weak and the project still depends on tribal knowledge |
| 3 | the basic layering works, but a few marts or subject areas are already drifting |
| 4 | structure is clear, consistent, and understandable to a new team member |
| 5 | the project structure reflects business domains cleanly and is easy to extend without confusion |

2. Test Coverage

This is usually the category that determines whether the project feels trustworthy or performative.

You do not need every possible test. You do need enough testing that bad data stops being a stakeholder surprise.

At minimum, serious projects should have:

  • primary key uniqueness where it matters
  • not_null coverage on important fields
  • relationship or accepted-values tests on business-critical models
  • a habit of adding tests when incidents reveal blind spots
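As a sketch, that minimum bar might look like this in a schema file. The model, column, and status values are hypothetical; note that dbt 1.8+ also accepts `data_tests:` as the key:

```yaml
version: 2

models:
  - name: fct_orders              # hypothetical mart
    columns:
      - name: order_id
        tests:
          - unique                # primary key uniqueness
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:        # every order must point at a real customer
              to: ref('dim_customers')
              field: customer_id
      - name: order_status
        tests:
          - accepted_values:      # catches quietly added status values
              values: ['placed', 'shipped', 'completed', 'returned']
```

A project with this coverage on its decision-critical marts will catch most of the failures listed below before a stakeholder does.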

What good test coverage is really protecting against

It protects against:

  • schema drift upstream
  • duplicate joins that inflate revenue or user counts
  • status values that change quietly
  • executives making decisions on stale or broken marts

How to score test coverage

| Score | What it looks like |
| --- | --- |
| 1 | almost no meaningful tests exist, or tests only cover cosmetic cases |
| 2 | some generic tests exist, but important marts still have obvious blind spots |
| 3 | key models have baseline coverage, but business-logic testing is inconsistent |
| 4 | important marts and sources are tested well enough to catch most damaging failures |
| 5 | testing is treated as a core operating discipline and expands when the business risk expands |

3. Documentation Completeness

dbt docs are useful only if the team writes documentation another human can actually use.

“Total revenue: total revenue” is not documentation.

Good documentation explains:

  • what the model is for
  • what the metric includes and excludes
  • which team cares about it
  • where caveats still exist
  • who owns changes when the definition moves
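A description that meets that bar might look something like this. The model name, ownership details, and caveats are invented for illustration; the point is the shape, not the specifics:

```yaml
version: 2

models:
  - name: fct_revenue             # hypothetical finance mart
    description: >
      Recognized revenue, one row per order line. Includes refunds as
      negative lines. Excludes internal test accounts and unpaid invoices.
      Primary consumer is the finance team; definition changes require
      finance sign-off before merge.
    meta:
      owner: analytics-engineering
    columns:
      - name: total_revenue
        description: "Net revenue after refunds and discounts, in USD"
```

Compare that to "Total revenue: total revenue" and the difference in change-management value is obvious.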

Why this category matters more than most teams think

Weak documentation does not just slow onboarding. It also makes definition drift harder to catch, because nobody can tell whether a change is a bug, a business update, or a silent rewrite of the rules.

How to score documentation

| Score | What it looks like |
| --- | --- |
| 1 | descriptions are missing, vague, or only technically correct |
| 2 | some models are documented, but critical metrics still require Slack archaeology |
| 3 | important models have usable descriptions, though consistency is uneven |
| 4 | documentation gives both technical and business context for the models that matter |
| 5 | docs are part of the team’s operating rhythm and clearly support onboarding, trust, and change management |

4. Source Freshness and Observability

A lot of dbt projects look healthy until a source changes quietly.

That is why this category matters.

You need to know:

  • when a source stopped landing
  • when row volume changed abnormally
  • when freshness drift is putting dashboards at risk
  • when upstream schema changes are about to cascade through the DAG

If nobody notices those problems until a stakeholder asks why yesterday’s numbers vanished, the project is not healthy.
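One way to encode the first and third expectations is dbt's built-in source freshness checks, run with `dbt source freshness`. The source, table, and timestamp column names below are hypothetical:

```yaml
version: 2

sources:
  - name: stripe                  # hypothetical source system
    database: raw
    loaded_at_field: _loaded_at   # timestamp written by the loader
    freshness:                    # default for all tables in this source
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: charges             # inherits the default thresholds
      - name: payouts
        freshness:                # looser override for a lower-stakes table
          warn_after: {count: 24, period: hour}
          error_after: {count: 48, period: hour}
```

Row-volume anomalies and schema-change detection usually need a separate observability layer; freshness checks alone cover only the "source stopped landing" case.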

How to score freshness and observability

| Score | What it looks like |
| --- | --- |
| 1 | freshness is mostly assumed, and incidents are discovered by end users |
| 2 | a few alerts or manual checks exist, but they do not cover the critical path |
| 3 | important sources are monitored, though coverage and escalation are still inconsistent |
| 4 | the team has reliable freshness checks and catches most upstream issues early |
| 5 | source monitoring is systematic, actionable, and tied to clear ownership and response patterns |

5. DAG Depth and Ref Chain Discipline

A complex DAG is not automatically a problem.

An unnecessary one is.

When important marts depend on too many brittle layers, the project becomes hard to reason about and slow to change. Every request turns into a lineage archaeology exercise.

A healthy project does not optimize for the prettiest graph. It optimizes for understandable dependencies.

Signs the ref chain has become a liability

  • teams are afraid to change shared models because the blast radius is unclear
  • key marts rely on long chains of transformations that few people understand end to end
  • “temporary” intermediate models have become permanent architecture
  • debugging one bad number means opening twelve files and three mental models
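One lightweight way to make blast radius visible is a named selector in `selectors.yml`, so anyone can list or rebuild the full upstream chain of a critical mart without memorizing graph syntax. The selector and model names here are illustrative:

```yaml
selectors:
  - name: revenue_critical_path
    description: "Every model the revenue mart depends on, plus the mart itself"
    definition: "+fct_revenue"    # CLI-style graph selector: all ancestors
```

Running `dbt ls --selector revenue_critical_path` then prints the whole dependency chain, which is a fast honesty check: if the list surprises the team, the DAG is deeper than anyone's mental model of it.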

How to score DAG depth and dependency discipline

| Score | What it looks like |
| --- | --- |
| 1 | lineage is tangled, dependency chains are excessive, and nobody is confident about blast radius |
| 2 | some domains are manageable, but the important paths are already too deep or brittle |
| 3 | the DAG is workable, though several areas need simplification before they become chronic drag |
| 4 | dependency chains are intentional and understandable for critical business models |
| 5 | the DAG is lean, explicit, and supports fast iteration without hidden complexity tax |

6. Materialization and Cost Strategy

This category exposes whether the team is actually operating the project or just accumulating models.

Materialization strategy is not only a performance question. It is also a clarity question.

The team should be able to explain:

  • why a model is incremental, view, or table
  • where rebuild cost matters
  • where freshness expectations justify heavier storage
  • which marts are expensive because they are valuable versus expensive because nobody revisited the design
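A minimal folder-level starting point can be expressed in `dbt_project.yml`. The project name is a placeholder, and incremental models are usually configured per-model in the model file itself rather than at the folder level:

```yaml
models:
  my_project:                     # hypothetical project name
    staging:
      +materialized: view         # cheap, always reflects the latest source data
    intermediate:
      +materialized: view         # no storage cost for logic that only feeds marts
    marts:
      +materialized: table        # stakeholders query these; pay storage for speed
```

The specific choices matter less than the fact that they are written down and defensible. A team that can answer "why is this a table?" is operating the project; a team that cannot is accumulating models.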

How to score materialization discipline

| Score | What it looks like |
| --- | --- |
| 1 | materializations feel accidental and cost/performance trade-offs are mostly unknown |
| 2 | a few patterns exist, but many models still use defaults without clear reasoning |
| 3 | the team has basic discipline, though some high-cost or slow models need redesign |
| 4 | materialization choices are mostly intentional and aligned to usage, cost, and freshness needs |
| 5 | cost, performance, and reliability trade-offs are reviewed deliberately as part of project maintenance |

7. CI/CD and Change Management

This category usually separates “dbt repo” from “reliable dbt operating system.”

A healthy project has a sane path from change to production. That usually includes some mix of:

  • pull request review
  • environment separation
  • build or test checks before merge
  • a release process the team can explain
  • incident handling when a bad change still gets through
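As one possible shape for the pre-merge checks (not the only valid tooling), a minimal GitHub Actions workflow might look like this. The adapter, Python version, and target name are assumptions you would swap for your own stack, and credentials would come from repository secrets:

```yaml
name: dbt-ci
on: pull_request               # run checks before anything merges

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-snowflake   # hypothetical adapter choice
      - run: dbt deps                    # install package dependencies
      - run: dbt build --target ci       # runs and tests models in a CI schema
        env:
          DBT_PROFILES_DIR: .
```

Teams often tighten this further with `--select state:modified+` to build only changed models and their descendants, which keeps CI fast as the project grows.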

The exact tooling can vary. The discipline cannot.

How to score CI/CD and delivery discipline

| Score | What it looks like |
| --- | --- |
| 1 | changes go live informally with little review or pre-merge validation |
| 2 | there is some review, but release discipline depends heavily on the person shipping |
| 3 | pull requests and basic checks exist, though production safety still feels uneven |
| 4 | dbt changes move through a clear, reviewable workflow with dependable validation |
| 5 | the team has a stable delivery system that supports iteration without making production trust fragile |

A One-Page Scoring Table

Use this as the fast version.

| Category | Your score (1-5) | Biggest risk right now | Owner |
| --- | --- | --- | --- |
| Model organization | | | |
| Test coverage | | | |
| Documentation completeness | | | |
| Source freshness and observability | | | |
| DAG depth and ref-chain discipline | | | |
| Materialization and cost strategy | | | |
| CI/CD and change management | | | |

Which Low Scores Matter Most First?

Not every weak score is equally dangerous.

If you need a prioritization rule, start here:

  1. Test coverage and source observability first, because silent failure destroys trust fastest
  2. CI/CD discipline next, because weak shipping process keeps reintroducing avoidable risk
  3. Documentation and model structure after that, because they determine whether the project can scale beyond one heroic operator
  4. DAG depth and materialization discipline once the project is stable enough to refactor intentionally

That order is not universal. But it is usually a more useful sequence than “let’s clean up the whole repo.”

Red Flags That Usually Mean the Project Is Less Healthy Than the Team Thinks

If two or more of these are true, the scorecard is probably telling you something important:

  • leadership trusts the dashboard only after a verbal caveat from the data team
  • a source change has broken production metrics more than once this quarter
  • nobody can explain the highest-stakes marts without opening multiple files live
  • new team members struggle to tell which model is canonical
  • the repo has tests, but the important business rules are still mostly untested
  • costs or runtimes are rising, but nobody owns the trade-offs
  • the project keeps absorbing unclear business requests faster than it clarifies them

That last point matters more than most technical teams want to admit.

A lot of dbt health problems are upstream translation problems wearing a tooling costume. If the business keeps changing the question, the dbt project often ends up encoding confusion instead of resolving it. That is exactly the moment to start with Translate the Ask before throwing more refactoring hours at the symptom.

What a 30-Day Repair Plan Usually Looks Like

If your project scores in the fragile or reactive range, do not try to fix everything at once.

A practical 30-day sequence:

Week 1: identify the critical path

Pick the models tied to revenue, pipeline, lifecycle reporting, or other executive metrics. Score those first. Ignore low-stakes cleanup until the trust path is visible.

Week 2: patch the trust leaks

Add or improve tests on the most decision-critical models. Make sure source freshness checks exist where broken upstream data would create leadership confusion.

Week 3: clarify the models humans actually rely on

Improve descriptions, owner fields, naming consistency, and the relationship between staging, intermediate, and mart layers.

Week 4: tighten the release path

Make sure dbt changes pass through a reviewable delivery flow with enough validation that production risk stops feeling casual.

That will not make the project perfect. It will make it less fragile. And that is usually the right next goal.

Download the Worksheet

Use the worksheet with your analytics lead, analytics engineer, or head of data.

You do not need an elaborate maturity model. You need an honest score, a clear owner, and a short list of fixes the business will actually feel.

Download the dbt Project Health Scorecard Worksheet (PDF)

A text-first worksheet for grading your dbt project across structure, testing, documentation, freshness monitoring, DAG depth, materialization strategy, and CI/CD discipline.

Or download the PDF directly.

If the score says the project is under-governed because the requirements are still moving, start with Translate the Ask. If the score says the whole warehouse and transformation layer need stronger foundations, the next step is Data Foundation.


Common questions about dbt project health

What is a healthy dbt project?

A healthy dbt project is organized, testable, documented, observable, and maintainable enough that the business can trust the models behind important decisions. It does not need to be perfect, but it should not depend on tribal knowledge and luck.

What score should worry a data leader?

Anything below about 21 out of 35 usually means the project is still too fragile for high-stakes reporting. A team can operate from there, but it should treat the score as a warning that trust or delivery risk is building underneath the surface.

Should every dbt project optimize for advanced architecture?

No. The point is not to imitate the cleanest open-source demo project on the internet. The point is to create a dbt workflow that fits your business complexity, your team size, and the reporting risk you are carrying.

What usually breaks trust first in a weak dbt project?

Testing gaps, undocumented business logic, and silent source changes are usually the first trust killers. Once stakeholders catch one wrong executive metric, the rest of the project becomes harder to defend.

About the author

Jason B. Hart, Founder & Principal Consultant at Domain Methods. He helps mid-size SaaS and ecommerce teams turn messy marketing and revenue data into decisions leaders trust.

Get posts like this in your inbox

Subscribe for practical analytics insights — no spam, unsubscribe anytime.

Related Posts

Book a Discovery Call