
The dbt Project Health Scorecard
- Jason B. Hart
- Data engineering
- April 7, 2026
What Is a dbt Project Health Scorecard?
A dbt project health scorecard is a practical way to measure whether your dbt project is organized and governed well enough to support trusted reporting, faster analytics delivery, and fewer avoidable data incidents.
A lot of teams think they have a healthy dbt project because:
- the repo exists
- the DAG looks impressive in screenshots
- the warehouse is running
- somebody can still get the dashboard out on time
That is not the same thing as health.
A dbt project can be active and still be fragile.
It can compile cleanly while the important models are under-tested. It can have dozens of marts while nobody can explain which one leadership should trust. It can look modern while one source-table change is still enough to break a revenue number quietly.
That is why this scorecard matters.
It gives heads of data, analytics engineering leads, and overstretched operators a way to separate “we use dbt” from “our dbt project actually deserves trust.”
Why This Scorecard Exists
This is not just an internal hygiene topic for data teams.
The broader market is moving in the same direction. In dbt Labs’ 2025 State of Analytics Engineering report, poor data quality remained the #1 hurdle for 56% of respondents, and 38% said they planned to increase investment in data quality and observability. Their 2024 report also found that increasing data trust had emerged as the top focus area for many data organizations.
That combination is worth paying attention to.
Teams are not investing in data quality because it sounds mature. They are investing because brittle models, weak tests, and silent source failures are still the fastest way to turn a modern analytics stack into an expensive trust problem.
dbt usually enters the stack for a good reason.
The company outgrows spreadsheet logic. The reporting layer gets too important to leave inside dashboards. Metric definitions start drifting across teams. The warehouse exists, but the transformation layer still feels improvised.
So the team adopts dbt.
That is often the right move.
But a lot of dbt projects plateau in an awkward middle state:
- better than the old ad hoc SQL mess
- not stable enough to support high-stakes reporting cleanly
- too central to ignore
- too under-governed to trust fully
That is the moment this scorecard is built for.
How to Use the Scorecard
Score each of the seven categories from 1 to 5.
- 1 = fragile
- 2 = reactive
- 3 = usable but risky
- 4 = strong
- 5 = disciplined and repeatable
You are not grading elegance. You are grading whether the project helps the business make decisions without leaning on hidden heroics.
Score Bands: What the Total Usually Means
With seven categories, the total score ranges from 7 to 35.
| Total score | What it usually means | Practical interpretation |
|---|---|---|
| 7-14 | Fragile prototype | dbt exists, but the project is still too dependent on one person, one mental model, or one calm week |
| 15-20 | Working but reactive | the team can ship, but trust and maintainability break down under pressure |
| 21-27 | Operationally useful | the project is good enough for real use, but there are still clear weak points to tighten |
| 28-35 | Trustworthy operating asset | dbt is no longer just a tool in the stack; it is a maintained business system |
A useful rule of thumb:
If your total is below 21, the project probably feels healthier to the data team than it looks to the business.
The 7 Dimensions That Actually Matter
1. Model Organization and Naming
This is the first thing I look at because messy structure usually predicts messy thinking.
A healthy project has:
- a clear staging/intermediate/marts pattern or another equally explicit structure
- model names that tell a human what the table is for
- folders that reflect real business logic rather than one person’s memory
- marts that are aligned to decisions, not just tool outputs
A weak project usually has:
- business logic spread across oddly named models
- duplicated logic in multiple marts
- folders organized by who happened to build them
- “final_v2” style artifacts still in active use
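The layering pattern above can be made explicit in `dbt_project.yml` instead of living in tribal knowledge. A minimal sketch, assuming a project named `analytics` with `finance` and `marketing` domains (all placeholder names, not prescriptions):

```yaml
# dbt_project.yml (excerpt) -- project and domain names are illustrative
models:
  analytics:
    staging:        # stg_<source>__<entity>: one model per source table, light cleanup only
      +schema: staging
    intermediate:   # int_<entity>__<verb>: reusable business logic, not queried by BI
      +schema: intermediate
    marts:
      finance:      # grouped by business domain, not by whoever built them
        +schema: finance
      marketing:
        +schema: marketing
```

When the folder config matches the naming convention like this, a new team member can infer where a model belongs without asking anyone.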
How to score model organization
| Score | What it looks like |
|---|---|
| 1 | models are hard to navigate, naming is inconsistent, and important logic is duplicated or hidden |
| 2 | some structure exists, but conventions are weak and the project still depends on tribal knowledge |
| 3 | the basic layering works, but a few marts or subject areas are already drifting |
| 4 | structure is clear, consistent, and understandable to a new team member |
| 5 | the project structure reflects business domains cleanly and is easy to extend without confusion |
2. Test Coverage
This is usually the category that determines whether the project feels trustworthy or performative.
You do not need every possible test. You do need enough testing that bad data stops being a stakeholder surprise.
At minimum, serious projects should have:
- primary key uniqueness where it matters
- `not_null` coverage on important fields
- relationship or accepted-values tests on business-critical models
- a habit of adding tests when incidents reveal blind spots
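That minimum bar fits in a single properties file. A sketch of what it looks like for a hypothetical orders mart (model and column names are illustrative; shown with the classic `tests:` key, which dbt 1.8+ renames to `data_tests:`):

```yaml
# models/marts/schema.yml (excerpt) -- names are illustrative
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique      # primary key: duplicates here silently inflate revenue downstream
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:            # every order must join to a real customer
              to: ref('dim_customers')
              field: customer_id
      - name: order_status
        tests:
          - accepted_values:          # catches status values that change quietly upstream
              values: ['placed', 'shipped', 'completed', 'returned']
```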
What good test coverage is really protecting against
It protects against:
- schema drift upstream
- duplicate joins that inflate revenue or user counts
- status values that change quietly
- executives making decisions on stale or broken marts
How to score test coverage
| Score | What it looks like |
|---|---|
| 1 | almost no meaningful tests exist, or tests only cover cosmetic cases |
| 2 | some generic tests exist, but important marts still have obvious blind spots |
| 3 | key models have baseline coverage, but business-logic testing is inconsistent |
| 4 | important marts and sources are tested well enough to catch most damaging failures |
| 5 | testing is treated as a core operating discipline and expands when the business risk expands |
3. Documentation Completeness
dbt docs are useful only if the team writes documentation another human can actually use.
“Total revenue: total revenue” is not documentation.
Good documentation explains:
- what the model is for
- what the metric includes and excludes
- which team cares about it
- where caveats still exist
- who owns changes when the definition moves
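All five of those questions can be answered in the model's properties file rather than in Slack. A hedged sketch, assuming a hypothetical `fct_revenue` mart (the `meta` keys are a team convention, not a dbt requirement):

```yaml
# models/marts/schema.yml (excerpt) -- names and meta fields are illustrative
models:
  - name: fct_revenue
    description: >
      Recognized revenue by day and customer. Includes invoiced subscription
      revenue; excludes refunds, credits, and usage not yet invoiced.
      Primary audience: finance. Known caveat: FX rates refresh once daily.
    meta:
      owner: analytics-engineering     # who approves definition changes
      consumers: [finance, exec-reporting]
```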
Why this category matters more than most teams think
Weak documentation does not just slow onboarding. It also makes definition drift harder to catch, because nobody can tell whether a change is a bug, a business update, or a silent rewrite of the rules.
How to score documentation
| Score | What it looks like |
|---|---|
| 1 | descriptions are missing, vague, or only technically correct |
| 2 | some models are documented, but critical metrics still require Slack archaeology |
| 3 | important models have usable descriptions, though consistency is uneven |
| 4 | documentation gives both technical and business context for the models that matter |
| 5 | docs are part of the team’s operating rhythm and clearly support onboarding, trust, and change management |
4. Source Freshness and Observability
A lot of dbt projects look healthy until a source changes quietly.
That is why this category matters.
You need to know:
- when a source stopped landing
- when row volume changed abnormally
- when freshness drift is putting dashboards at risk
- when upstream schema changes are about to cascade through the DAG
If nobody notices those problems until a stakeholder asks why yesterday’s numbers vanished, the project is not healthy.
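The first of those questions is the cheapest to answer, because dbt ships source freshness checks natively. A minimal sketch, assuming a hypothetical `stripe` source with a `_loaded_at` timestamp column (names and thresholds are illustrative):

```yaml
# models/staging/sources.yml (excerpt) -- source, table, and SLA values are illustrative
sources:
  - name: stripe
    loaded_at_field: _loaded_at
    freshness:                         # default policy inherited by all tables below
      warn_after: {count: 6, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: charges
      - name: events
        freshness:                     # tighter SLA for the critical path
          warn_after: {count: 1, period: hour}
```

Running `dbt source freshness` on a schedule then turns "the source stopped landing" from a stakeholder discovery into an alert the team sees first.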
How to score freshness and observability
| Score | What it looks like |
|---|---|
| 1 | freshness is mostly assumed, and incidents are discovered by end users |
| 2 | a few alerts or manual checks exist, but they do not cover the critical path |
| 3 | important sources are monitored, though coverage and escalation are still inconsistent |
| 4 | the team has reliable freshness checks and catches most upstream issues early |
| 5 | source monitoring is systematic, actionable, and tied to clear ownership and response patterns |
5. DAG Depth and Ref Chain Discipline
A complex DAG is not automatically a problem.
An unnecessary one is.
When important marts depend on too many brittle layers, the project becomes hard to reason about and slow to change. Every request turns into a lineage archaeology exercise.
A healthy project does not optimize for the prettiest graph. It optimizes for understandable dependencies.
Signs the ref chain has become a liability
- teams are afraid to change shared models because the blast radius is unclear
- key marts rely on long chains of transformations that few people understand end to end
- “temporary” intermediate models have become permanent architecture
- debugging one bad number means opening twelve files and three mental models
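One way to make blast radius inspectable instead of guessed is to declare downstream consumers as dbt exposures. A sketch, assuming a hypothetical revenue dashboard (names and email are placeholders):

```yaml
# models/marts/exposures.yml (excerpt) -- names are illustrative
exposures:
  - name: revenue_dashboard
    type: dashboard
    maturity: high
    owner:
      name: Finance Analytics
      email: finance-data@example.com
    depends_on:                 # the marts this dashboard reads directly
      - ref('fct_revenue')
      - ref('dim_customers')
```

With that in place, `dbt ls --select +exposure:revenue_dashboard` lists every upstream model on the dashboard's critical path, so "what breaks if I change this?" becomes a query instead of an archaeology exercise.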
How to score DAG depth and dependency discipline
| Score | What it looks like |
|---|---|
| 1 | lineage is tangled, dependency chains are excessive, and nobody is confident about blast radius |
| 2 | some domains are manageable, but the important paths are already too deep or brittle |
| 3 | the DAG is workable, though several areas need simplification before they become chronic drag |
| 4 | dependency chains are intentional and understandable for critical business models |
| 5 | the DAG is lean, explicit, and supports fast iteration without hidden complexity tax |
6. Materialization and Cost Strategy
This category exposes whether the team is actually operating the project or just accumulating models.
Materialization strategy is not only a performance question. It is also a clarity question.
The team should be able to explain:
- why a model is incremental, view, or table
- where rebuild cost matters
- where freshness expectations justify heavier storage
- which marts are expensive because they are valuable versus expensive because nobody revisited the design
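Those answers can be encoded where the next maintainer will find them. A hedged sketch in `dbt_project.yml`, assuming a project named `analytics` (the paths and choices are illustrative, not prescriptions):

```yaml
# dbt_project.yml (excerpt) -- paths and materialization choices are illustrative
models:
  analytics:
    staging:
      +materialized: view          # cheap to build, always fresh, rarely queried directly
    marts:
      +materialized: table         # queried constantly by BI; pay the build cost once
      events:
        +materialized: incremental # large append-only history; full rebuilds are wasteful
        +on_schema_change: append_new_columns
```

The point is not these specific choices. It is that each choice has a stated reason the team can revisit.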
How to score materialization discipline
| Score | What it looks like |
|---|---|
| 1 | materializations feel accidental and cost/performance trade-offs are mostly unknown |
| 2 | a few patterns exist, but many models still use defaults without clear reasoning |
| 3 | the team has basic discipline, though some high-cost or slow models need redesign |
| 4 | materialization choices are mostly intentional and aligned to usage, cost, and freshness needs |
| 5 | cost, performance, and reliability trade-offs are reviewed deliberately as part of project maintenance |
7. CI/CD and Change Management
This category usually separates “dbt repo” from “reliable dbt operating system.”
A healthy project has a sane path from change to production. That usually includes some mix of:
- pull request review
- environment separation
- build or test checks before merge
- a release process the team can explain
- incident handling when a bad change still gets through
The exact tooling can vary. The discipline cannot.
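As one concrete shape, here is a minimal GitHub Actions sketch that builds and tests only the models a pull request touches, using dbt's state comparison. This is an assumption-laden example, not the only valid setup: the adapter, the `ci` target, and the `prod-artifacts/` directory holding a production `manifest.json` are all things your project would have to supply:

```yaml
# .github/workflows/dbt-ci.yml -- a minimal sketch; credentials and profiles are assumed
name: dbt CI
on: pull_request

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: pip install dbt-core dbt-snowflake   # swap the adapter for your warehouse
      - run: dbt deps
      # Build and test only models changed in this PR, plus everything downstream.
      # Requires a production manifest available at ./prod-artifacts/manifest.json.
      - run: dbt build --select state:modified+ --state prod-artifacts --target ci
```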
How to score CI/CD and delivery discipline
| Score | What it looks like |
|---|---|
| 1 | changes go live informally with little review or pre-merge validation |
| 2 | there is some review, but release discipline depends heavily on the person shipping |
| 3 | pull requests and basic checks exist, though production safety still feels uneven |
| 4 | dbt changes move through a clear, reviewable workflow with dependable validation |
| 5 | the team has a stable delivery system that supports iteration without making production trust fragile |
A One-Page Scoring Table
Use this as the fast version.
| Category | Your score (1-5) | Biggest risk right now | Owner |
|---|---|---|---|
| Model organization | | | |
| Test coverage | | | |
| Documentation completeness | | | |
| Source freshness and observability | | | |
| DAG depth and ref-chain discipline | | | |
| Materialization and cost strategy | | | |
| CI/CD and change management | | | |
Which Low Scores Matter Most First?
Not every weak score is equally dangerous.
If you need a prioritization rule, start here:
- Test coverage and source observability first, because silent failure destroys trust fastest
- CI/CD discipline next, because weak shipping process keeps reintroducing avoidable risk
- Documentation and model structure after that, because they determine whether the project can scale beyond one heroic operator
- DAG depth and materialization discipline once the project is stable enough to refactor intentionally
That order is not universal. But it is usually a more useful sequence than “let’s clean up the whole repo.”
Red Flags That Usually Mean the Project Is Less Healthy Than the Team Thinks
If two or more of these are true, the scorecard is probably telling you something important:
- leadership trusts the dashboard only after a verbal caveat from the data team
- a source change has broken production metrics more than once this quarter
- nobody can explain the highest-stakes marts without opening multiple files live
- new team members struggle to tell which model is canonical
- the repo has tests, but the important business rules are still mostly untested
- costs or runtimes are rising, but nobody owns the trade-offs
- the project keeps absorbing unclear business requests faster than it clarifies them
That last point matters more than most technical teams want to admit.
A lot of dbt health problems are upstream translation problems wearing a tooling costume. If the business keeps changing the question, the dbt project often ends up encoding confusion instead of resolving it. That is exactly the moment to start with Translate the Ask before throwing more refactoring hours at the symptom.
What a 30-Day Repair Plan Usually Looks Like
If your project scores in the fragile or reactive range, do not try to fix everything at once.
A practical 30-day sequence:
Week 1: identify the critical path
Pick the models tied to revenue, pipeline, lifecycle reporting, or other executive metrics. Score those first. Ignore low-stakes cleanup until the trust path is visible.
Week 2: patch the trust leaks
Add or improve tests on the most decision-critical models. Make sure source freshness checks exist where broken upstream data would create leadership confusion.
Week 3: clarify the models humans actually rely on
Improve descriptions, owner fields, naming consistency, and the relationship between staging, intermediate, and mart layers.
Week 4: tighten the release path
Make sure dbt changes pass through a reviewable delivery flow with enough validation that production risk stops feeling casual.
That will not make the project perfect. It will make it less fragile. And that is usually the right next goal.
Download the Worksheet
Use the worksheet with your analytics lead, analytics engineer, or head of data.
You do not need an elaborate maturity model. You need an honest score, a clear owner, and a short list of fixes the business will actually feel.
Download the dbt Project Health Scorecard Worksheet (PDF)
A text-first worksheet for grading your dbt project across structure, testing, documentation, freshness monitoring, DAG depth, materialization strategy, and CI/CD discipline.
If the score says the project is under-governed because the requirements are still moving, start with Translate the Ask. If the score says the whole warehouse and transformation layer need stronger foundations, the next step is Data Foundation.

About the author
Jason B. Hart
Founder & Principal Consultant, Domain Methods
Jason B. Hart is the founder of Domain Methods, where he helps mid-size SaaS and ecommerce teams build analytics they can trust and operating systems they can actually use. He has spent the better …

