Data validation toolkit for assessing and monitoring data quality.
Pointblank is a data validation framework for Python that makes data quality checks beautiful, powerful, and stakeholder-friendly. Instead of cryptic error messages, get stunning interactive reports that turn data issues into conversations.
Here’s what a validation looks like:
```python
import pointblank as pb
import polars as pl

validation = (
    pb.Validate(
        data=pb.load_dataset(dataset="game_revenue", tbl_type="polars"),
        tbl_name="game_revenue",
        label="Comprehensive validation of game revenue data",
        thresholds=pb.Thresholds(warning=0.10, error=0.25, critical=0.35),
        brief=True,
    )
    .col_vals_regex(columns="player_id", pattern=r"^[A-Z]{12}[0-9]{3}$")  # STEP 1
    .col_vals_gt(columns="session_duration", value=20)                    # STEP 2
    .col_vals_ge(columns="item_revenue", value=0.20)                      # STEP 3
    .col_vals_in_set(columns="item_type", set=["iap", "ad"])              # STEP 4
    .col_vals_in_set(                                                     # STEP 5
        columns="acquisition",
        set=["google", "facebook", "organic", "crosspromo", "other_campaign"],
    )
    .col_vals_not_in_set(columns="country", set=["Mongolia", "Germany"])  # STEP 6
    .col_vals_between(                                                    # STEP 7
        columns="session_duration",
        left=10, right=50,
        pre=lambda df: df.select(pl.median("session_duration")),
        brief="Expect that the median of `session_duration` should be between `10` and `50`.",
    )
    .rows_distinct(columns_subset=["player_id", "session_id", "time"])    # STEP 8
    .row_count_match(count=2000)                                          # STEP 9
    .col_count_match(count=11)                                            # STEP 10
    .col_vals_not_null(columns="item_type")                               # STEP 11
    .col_exists(columns="start_day")                                      # STEP 12
    .interrogate()
)

validation.get_tabular_report(title="Game Revenue Validation Report")
```

That’s the kind of report you get from Pointblank: clear, interactive, and designed for everyone on your team. And if you need help getting started or want to work faster, Pointblank has built-in AI support through the assistant() function to guide you along the way. You can also use DraftValidation to quickly generate a validation plan from your existing data (great for getting started fast).
Ready to validate? Start with our Installation guide or jump straight to the User Guide.
By the way, Pointblank is made with 💙 by Posit.
What is Data Validation?
Data validation ensures your data meets quality standards before it’s used in analysis, reports, or downstream systems. Pointblank provides a structured way to define validation rules, execute them, and communicate results to both technical and non-technical stakeholders.
With Pointblank you can:
- Validate data through a fluent, chainable API with 25+ validation methods
- Set thresholds to define acceptable levels of data quality (warning, error, critical)
- Take actions when thresholds are exceeded (notifications, logging, custom functions)
- Generate reports that make data quality issues immediately understandable
- Inspect data with built-in tools for previewing, summarizing, and finding missing values
Pointblank is designed for the entire data team, not just engineers:
- 🎨 Beautiful Reports: Interactive validation reports that stakeholders actually want to read
- 📊 Threshold Management: Define quality standards with warning, error, and critical levels
- 🔍 Error Drill-Down: Inspect failing data to get to root causes quickly
- 🔗 Universal Compatibility: Works with Polars, Pandas, DuckDB, MySQL, PostgreSQL, SQLite, and more
- 🌍 Multilingual Support: Reports available in 40 languages for global teams
- 📝 YAML Support: Write validations in YAML for version control and team collaboration
- ⚡ CLI Tools: Run validations from the command line for CI/CD pipelines or as quick checks
- 📋 Rich Inspection: Preview data, analyze columns, and visualize missing values
Quick Examples
Threshold-Based Quality
Set expectations and react when data quality degrades (with alerts, logging, or custom functions):
```python
validation = (
    pb.Validate(data=sales_data, thresholds=(0.01, 0.02, 0.05))  # Three threshold levels set
    .col_vals_not_null(columns="customer_id")
    .col_vals_in_set(columns="status", set=["pending", "shipped", "delivered"])
    .interrogate()
)
```

YAML Workflows
Works wonderfully for CI/CD pipelines and team collaboration:
```yaml
validate:
  data: sales_data
  tbl_name: "sales_data"
  thresholds: [0.01, 0.02, 0.05]
  steps:
    - col_vals_not_null:
        columns: "customer_id"
    - col_vals_in_set:
        columns: "status"
        set: ["pending", "shipped", "delivered"]
```

```python
validation = pb.yaml_interrogate("validation.yaml")
```

Command Line Power
Run validations without writing code:
```bash
# Quick validation
pb validate sales_data.csv --check col-vals-not-null --column customer_id

# Run YAML workflows
pb run validation.yaml --exit-code  # <- Great for CI/CD!

# Explore your data
pb scan sales_data.csv
pb missing sales_data.csv
```

Installation
Install Pointblank using pip or conda:
```bash
pip install pointblank
# or
conda install conda-forge::pointblank
```

For specific backends:

```bash
pip install "pointblank[pl]"        # Polars support
pip install "pointblank[pd]"        # Pandas support
pip install "pointblank[duckdb]"    # DuckDB support
pip install "pointblank[postgres]"  # PostgreSQL support
```

See the Installation guide for more details.
Text Formats
The docs are also available in llms.txt format:
- llms.txt: a sitemap listing all documentation pages
- llms-full.txt: all the documentation in one file