
How to Check Data Quality Before Loading Into a Database or Power BI Dashboard

Loading bad data into Power BI or a SQL database causes errors, wrong totals and broken dashboards. Learn the exact checks every BI developer should run before any data load, and how to run them in seconds for free.

9 min read · TOOLBeans Team
📊

Free Tool

Data Profiler

No account. No install. Runs in browser.

Open Tool

The Expensive Mistake BI Developers Make Every Week

You receive a dataset. A CSV from a client, an Excel export from an ERP system or a JSON response from a third-party API. You connect it to Power BI, run your data model refresh and the dashboard breaks. Or the SQL import runs for twenty minutes and fails at row 8,432 with a type conversion error. Or the report goes live and a stakeholder emails to say the total revenue figure is wrong by forty thousand dollars.

These are not edge cases. They are routine events for anyone who loads external data into dashboards, databases or analytics pipelines. The root cause is almost always the same: the data was not checked before it was loaded.

Data quality issues do not announce themselves. A CSV file with one column containing ninety-seven numeric values and three cells containing the text N/A looks identical to a file where that column contains one hundred numeric values. The column header is the same. The first ten rows look fine. The problem is invisible until your database constraint fires or your Power BI measure returns BLANK instead of a number.

The solution is a pre-load data quality check. A systematic review of the dataset before it goes anywhere downstream. This guide explains exactly which checks matter, why each one can break your data pipeline and how to run all of them automatically in seconds using a free online data profiling tool.

Why Pre-Load Data Quality Checks Are Not Optional

In a well-designed data pipeline, data quality checks happen before data moves to the next stage. This is not a best practice reserved for enterprise teams with dedicated data engineers. It is a basic protective step that prevents costly debugging downstream.

The cost of catching a data quality issue at the source is near zero. Open the file, run a profiler, see the problems, fix them before loading. Five minutes.

The cost of catching the same issue after loading is much higher. A failed SQL import may leave the database in a partial state requiring a rollback. A Power BI report with incorrect calculations goes to stakeholders before anyone notices, requiring a correction and re-distribution. A machine learning model trained on data with type mismatches or duplicate records produces results that cannot be trusted without retraining.

Pre-load data quality checks are the cheapest possible insurance against all of these outcomes.

The Seven Checks Every Data Load Needs

Years of data engineering and BI development experience point to seven checks that cover the majority of issues that cause data loads to fail or produce wrong results. Running all seven before every data load eliminates most downstream problems.

Check One: Column Count and Header Names

The first thing to verify is that the file has the expected number of columns with the correct header names. A file sent weekly from the same source can change structure without any notification. A new column can be added, a column can be renamed, or columns can be reordered. Any of these breaks a pipeline that references columns by name or position.

Check that the column count matches what you expect. Check that every required column name is present and spelled correctly. Check for extra columns that were not present in previous versions of the file, which may indicate a schema change at the source.
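
If you want to script this check rather than eyeball it, a minimal sketch in Python with pandas looks like the following; the file name and expected column list are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical expected schema for this feed.
EXPECTED_COLUMNS = ["order_id", "order_date", "customer_id", "amount"]

df = pd.read_csv("orders.csv")

missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
extra = [c for c in df.columns if c not in EXPECTED_COLUMNS]

if missing:
    raise ValueError(f"Missing required columns: {missing}")
if extra:
    # Extra columns may signal a schema change at the source.
    print(f"Warning: unexpected columns: {extra}")
```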

Check Two: Row Count

Verify the row count is within a reasonable range for the dataset. A file that normally contains 50,000 records and arrives with 500 records has a truncation problem at the export. A file with 100,000 records when you expected 50,000 has a duplication problem. Both indicate something went wrong at the data source before the file reached you.

Row count checks are especially important for recurring data loads where you have a historical baseline to compare against. A 20 percent drop in row count with no corresponding business explanation is worth investigating before loading.
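
A scripted version of the same sanity check, again a sketch with a hypothetical baseline and tolerance:

```python
import pandas as pd

df = pd.read_csv("orders.csv")

BASELINE_ROWS = 50_000  # hypothetical historical average for this feed
TOLERANCE = 0.20        # flag anything more than 20 percent off baseline

deviation = abs(len(df) - BASELINE_ROWS) / BASELINE_ROWS
if deviation > TOLERANCE:
    print(f"Row count {len(df):,} is {deviation:.0%} off baseline "
          f"{BASELINE_ROWS:,}; investigate before loading")
```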

Check Three: Null and Empty Values Per Column

Every column that is supposed to always have a value needs a null check. This includes foreign key columns, date columns, amount columns and any column that will be used in a JOIN or WHERE clause.

A null in a foreign key column means that row cannot be joined to any reference table. It will drop out of every query that joins on that key, producing silently incomplete results. A null in a date column used in a time intelligence calculation causes that row to be excluded from period comparisons. A null in an amount column changes the result of every aggregation involving that column.

The null check should report both the absolute count of nulls per column and the fill rate as a percentage. A column with 0.1 percent nulls may be acceptable. A column with 30 percent nulls has a systemic collection problem.
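
Here is one way to produce that report in Python with pandas; the extra placeholder strings passed to na_values are examples, not a fixed list:

```python
import pandas as pd

# read_csv already treats "N/A", "NA" and empty cells as missing;
# na_values adds any custom placeholders the source uses.
df = pd.read_csv("orders.csv", na_values=["TBD", "missing"])

null_counts = df.isna().sum()
fill_rates = 1 - null_counts / len(df)

for column in df.columns:
    if null_counts[column] > 0:
        print(f"{column}: {null_counts[column]} nulls, "
              f"fill rate {fill_rates[column]:.1%}")
```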

Check Four: Data Type Consistency Per Column

Every column should contain values of a consistent type. An amount column should contain only numbers. A date column should contain only date-shaped values. A category column should contain only text strings from an expected set of values.

Type inconsistency is one of the most common causes of SQL import failures. A column defined as INT or DECIMAL in the database target cannot accept a row where that column contains the text value NULL or TBD. The database driver throws a type conversion error and the import fails.

Type inconsistency also causes problems in Power BI when it infers column types from the data. If a column contains mostly numbers but a few text values, Power BI may infer the column type as text, making all arithmetic operations on that column return errors.

The type check should identify not just the dominant type of each column but also flag any values that do not match the dominant type. Knowing that an amount column has 97 numeric values and 3 text values tells you exactly what to fix.
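
A sketch of this per-column check in pandas, assuming a hypothetical amount column; reading everything as text first keeps mixed values intact so they can be counted:

```python
import pandas as pd

# Read everything as text so mixed values survive intact.
df = pd.read_csv("amounts.csv", dtype=str)

# errors="coerce" turns non-numeric values into NaN, which makes them countable.
as_numbers = pd.to_numeric(df["amount"], errors="coerce")
mismatches = df.loc[as_numbers.isna() & df["amount"].notna(), "amount"]

if not mismatches.empty:
    print(f"{len(mismatches)} non-numeric values in 'amount': "
          f"{mismatches.unique().tolist()}")
```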

Check Five: Duplicate Rows

An exact duplicate row is a row where every column contains the same value as another row. Duplicates inflate every count and sum calculation. They cause double-counting in dashboards. They violate primary key constraints in databases.

The duplicate check needs to compare complete rows, not just a single column. A check that looks only for repeated values in an ID column catches the obvious cases but misses rows where the same record appears twice under slightly different IDs; catching those requires comparing the rest of the row as well.

Knowing the exact row numbers of duplicates lets you remove them precisely before loading.
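
A minimal pandas sketch that reports exact duplicate rows with their file row numbers (file name hypothetical); passing a subset= of columns to duplicated() would additionally catch re-keyed duplicates that differ only in the ID:

```python
import pandas as pd

df = pd.read_csv("orders.csv")

# keep=False marks every member of a duplicate group, originals included.
dupe_mask = df.duplicated(keep=False)
if dupe_mask.any():
    # +2 converts the zero-based index to file row numbers (header is row 1).
    rows = (df.index[dupe_mask] + 2).tolist()
    print(f"{df.duplicated().sum()} exact duplicate rows; file rows involved: {rows}")
```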

Check Six: Statistical Outliers in Numeric Columns

Outliers in numeric columns are values that are so far from the typical range that they indicate a data entry error or a system problem. An order amount of 1,250 in a file where all other orders are between 10 and 500 might be correct but is worth verifying. A customer age of 847 is not correct.

Outliers do not necessarily mean the data is wrong but they do mean the data deserves a second look before loading. Including a genuinely incorrect extreme value in a dataset skews averages and distorts visualisations.

A common statistical rule of thumb defines an outlier as a value more than three standard deviations from the mean. This threshold catches the most extreme values while not flagging normal variation as problematic.
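
Applied in pandas to a hypothetical amount column, the three-standard-deviation rule looks like this:

```python
import pandas as pd

df = pd.read_csv("orders.csv")

values = df["amount"].dropna()
mean, std = values.mean(), values.std()

# Flag values more than three standard deviations from the mean.
outliers = values[(values - mean).abs() > 3 * std]
print(f"{len(outliers)} outlier(s) in 'amount': {sorted(outliers.tolist())}")
```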

Check Seven: Schema Consistency with Previous Loads

For recurring data loads, the schema of the incoming file should match the schema of previous loads. New columns, removed columns, renamed columns and type changes all need to be reviewed before loading to ensure the downstream data model and dashboard are updated to handle them.

This check is most important for automated pipelines where data loads without human review. Documenting the expected schema and comparing each incoming file against it catches breaking changes before they cause silent failures.
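
One way to script this comparison, assuming the expected schema was saved to a JSON file after a previous good load:

```python
import json

import pandas as pd

df = pd.read_csv("orders.csv")
current = {col: str(dtype) for col, dtype in df.dtypes.items()}

# expected_schema.json holds the documented schema from a previous good load.
with open("expected_schema.json") as f:
    expected = json.load(f)

added = set(current) - set(expected)
removed = set(expected) - set(current)
retyped = {c for c in set(current) & set(expected) if current[c] != expected[c]}

for label, cols in (("Added", added), ("Removed", removed), ("Type changed", retyped)):
    if cols:
        print(f"{label}: {sorted(cols)}")
```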

How to Run All Seven Checks in One Step

Running these seven checks manually for every data file takes time. Writing scripts to automate them requires code. For teams without dedicated data engineers, neither option is practical for every data load.

The TOOLBeans Data Profiler at toolbeans.com/tools/data-profiler runs all seven checks automatically for any CSV, Excel or JSON file, and for any REST API endpoint. The analysis takes seconds and requires no code, no installation and no account.

Here is what the tool reports for each of the seven checks.

For column count and header names: the summary cards show the total column count. The Schema tab lists every column name with its inferred type, giving you a complete view of the file structure in one place.

For row count: the summary card shows the exact total row count immediately on analysis completion.

For null and empty values: the Issues tab shows every column with any null values, with the absolute count and fill rate. The Column Stats tab shows the fill rate for every column as a percentage bar, making low fill-rate columns immediately visible.

For data type consistency: the Column Stats cards show the dominant inferred type for each column and flag any type mismatches with the count of non-conforming values. A column inferred as integer with three text values shows the mismatch count directly on the card.

For duplicate rows: the Issues tab shows the exact count of duplicate rows and, when expanded, the row number of each duplicate paired with the row number of its original.

For statistical outliers: any numeric column where outliers are detected shows the outlier count on the column stats card. The outlier definition is values more than three standard deviations from the column mean.

For schema documentation: the Schema tab shows the complete inferred schema as a table. The Copy Schema JSON button exports it as a JSON array that can be saved as documentation or used as the basis for a CREATE TABLE statement.
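
If you want to turn such a schema export into a starting CREATE TABLE statement, a sketch like the following works; the shape of the JSON, the table name and the type mapping here are assumptions, so adjust them to the actual export format:

```python
import json

# Assumed export shape: a list of {"name": ..., "type": ...} entries.
TYPE_MAP = {"integer": "BIGINT", "float": "DECIMAL(18,2)",
            "date": "DATE", "boolean": "BOOLEAN", "string": "TEXT"}

with open("schema.json") as f:
    schema = json.load(f)

columns = ",\n  ".join(
    f'{col["name"]} {TYPE_MAP.get(col["type"], "TEXT")}' for col in schema
)
print(f"CREATE TABLE staging_orders (\n  {columns}\n);")
```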

Using the Data Profiler for Power BI Pre-Load Checks

Power BI Desktop has its own data profiling view in the Power Query Editor. It shows column quality, column distribution and column profile for each column in a query. It is useful, but it requires opening Power BI Desktop, loading the data into a query and enabling the profiling options from the View tab.

For a quick check before deciding whether to even start the Power BI load, a standalone browser-based profiler is faster. Upload the CSV or Excel file, see the quality score and issues summary in under ten seconds and decide immediately whether the file needs cleaning before you open Power BI at all.

This is especially useful when you receive a file from an external source and need to quickly assess whether it is ready to use or whether it needs to go back to the sender for corrections.

Using the Data Profiler for Database Import Checks

For SQL database imports, the most valuable checks are null counts in NOT NULL columns and type mismatches in columns with strict type definitions.

Before running any INSERT or COPY INTO command, upload the source file to the Data Profiler and check two things. First, look at the null counts for every column that is defined as NOT NULL in the target table. Any nulls in those columns will cause the import to fail. Second, look for type mismatches in numeric and date columns. Any text values in those columns will cause type conversion errors.

These two checks take about thirty seconds in the profiler and prevent the majority of SQL import failures. If either check finds issues, fix the source data before running the import. Adding a step to clean the problematic values takes far less time than debugging a failed import and rolling back a partial load.
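
Both checks can also be scripted as a single pre-import gate; in this sketch the column lists are hypothetical and should mirror the target table's definition:

```python
import pandas as pd

df = pd.read_csv("orders.csv")

NOT_NULL_COLUMNS = ["order_id", "order_date"]  # mirror the target's NOT NULL constraints
NUMERIC_COLUMNS = ["amount"]                   # columns defined as INT or DECIMAL

problems = []
for col in NOT_NULL_COLUMNS:
    nulls = int(df[col].isna().sum())
    if nulls:
        problems.append(f"{col}: {nulls} nulls would violate NOT NULL")
for col in NUMERIC_COLUMNS:
    bad = pd.to_numeric(df[col], errors="coerce").isna() & df[col].notna()
    if bad.any():
        problems.append(f"{col}: {int(bad.sum())} values would fail type conversion")

if problems:
    raise SystemExit("Fix before import:\n" + "\n".join(problems))
```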

Working With API Data Before Building Dashboards

When building a dashboard that connects to a live API endpoint rather than a static file, the same data quality concerns apply. API responses can contain null values, type inconsistencies and duplicate records just as file-based datasets can.

Switch to the API tab in the Data Profiler, paste the endpoint URL, add a Bearer token if required and click Analyze. The tool fetches the JSON response, flattens nested objects into a flat column structure and runs the same analysis as a file upload.

This is useful before building any Power BI or Looker Studio connector to a new API. Profiling the endpoint first shows you exactly what the data looks like before you invest time building the data model. You see how many records the endpoint returns, which fields are consistently populated, what the value ranges are for numeric fields and whether any structural inconsistencies exist that would complicate the data model design.
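
If you prefer to inspect an endpoint in code, a rough equivalent in Python looks like this; the URL and token are placeholders:

```python
import pandas as pd
import requests

# Placeholder endpoint and token; substitute your own.
response = requests.get(
    "https://api.example.com/v1/orders",
    headers={"Authorization": "Bearer YOUR_TOKEN"},
    timeout=30,
)
response.raise_for_status()

# json_normalize flattens nested objects into dotted column names (e.g. customer.id).
df = pd.json_normalize(response.json())
print(len(df), "records,", len(df.columns), "columns")
print(df.isna().mean().sort_values())  # null fraction per flattened field
```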

Exporting and Documenting Quality Reports

After profiling a dataset, click Export Report to download a structured JSON quality report. This report contains the complete analysis including all column statistics, the duplicate row list, the null column list, the type mismatch list and the overall quality score.

For teams that do regular data loads, saving these reports creates a quality history for each dataset. Comparing quality reports over time shows whether a data source is improving or deteriorating. A vendor whose monthly data file has a quality score of 88 one month and 71 the next has introduced new issues worth investigating.

For data handover situations, the quality report provides documented evidence of what was checked and what was found before a dataset was loaded. This is useful for audit purposes and for communicating data issues back to the source.

The Practical Workflow: Five Minutes Before Every Data Load

Building a pre-load data quality check into your regular workflow takes about five minutes for any dataset and eliminates the majority of downstream load failures and calculation errors.

Open toolbeans.com/tools/data-profiler in your browser. Upload the file you are about to load. Read the summary cards. Check the quality score. If the score is below 80, open the Issues tab and review what the profiler found. Fix the issues in the source file using Excel, a simple script or by sending it back to the source for correction. Upload the cleaned file and confirm the quality score improved. Load the clean file into your database or dashboard.

This workflow catches data problems before they reach production. It takes less time than debugging a broken dashboard and far less time than explaining to a stakeholder why the revenue figure in last week's report was wrong.

The Tools That Work Alongside the Data Profiler

After profiling and cleaning a dataset, the next common steps are often converting it to a format suitable for a database load or analysing it further. The TOOLBeans CSV to SQL converter at toolbeans.com/tools/csv-to-sql converts a cleaned CSV directly to SQL INSERT statements for MySQL, PostgreSQL, SQLite, SQL Server, MariaDB and Oracle. The JSON Formatter at toolbeans.com/tools/json-formatter validates and formats any JSON data the profiler helps you analyse. The API Request Tester at toolbeans.com/tools/api-request-tester lets you inspect raw API responses before profiling them, useful for understanding the response structure of a new endpoint.

All of these tools are free, require no account and run in your browser.

Summary

Data quality issues in CSV, Excel and API datasets cause SQL import failures, wrong dashboard calculations and broken Power BI reports. The seven checks that catch the majority of these issues before a load are column count and header verification, row count sanity check, null value detection per column, data type consistency per column, duplicate row detection, outlier identification in numeric columns and schema consistency with previous loads.

The TOOLBeans Data Profiler at toolbeans.com/tools/data-profiler runs all seven checks automatically for any CSV, Excel, JSON or API dataset in seconds. The analysis is free, requires no account and processes your data entirely in your browser with nothing ever uploaded to a server.

Running a data quality check before every load is the single most effective step you can take to prevent the downstream errors that consume debugging time and damage stakeholder trust in your data work.

Explore More Free Tools

TOOLBeans offers 39 free developer and PDF tools. No account needed.

Browse all 39 free tools

Related Topics

data quality check before database load · check data quality before power bi · data validation before etl pipeline · bi developer data quality tool free · csv data quality checker online · data profiling tool free 2026 · check column types csv before import · data quality score tool online free · power bi data load errors fix · etl data validation free tool online

Frequently Asked Questions

Is Data Profiler free to use?

Yes. Data Profiler is completely free on TOOLBeans with no usage limits, no account and no credit card required.

Is my data safe when using TOOLBeans tools?

Browser-based tools run entirely in your browser so your data never leaves your device. PDF server tools process your file on a secure server and delete it immediately after conversion.

Do I need to install anything to use Data Profiler?

No installation is required. Data Profiler runs directly in your browser on any device, including mobile. Just visit TOOLBeans and start using it instantly.

How is TOOLBeans different from other online tools?

TOOLBeans offers 39 free tools with no paywalls, no account requirements and no usage limits. Browser tools process your data locally for maximum privacy.

📊

Try it yourself

Data Profiler

Everything in this article is available in the free tool. No account, no subscription, no install.

Open Data Profiler