
How to Find Duplicate Rows and Null Values in CSV and Excel Files Without Writing Code

Duplicate rows and null values silently break dashboards, reports and database loads. Learn exactly how to detect them in any CSV or Excel file in seconds without pandas, SQL or any code.

8 min read · TOOLBeans Team

Free Tool

Data Profiler

No account. No install. Runs in browser.

Open Tool

The Problem: Your Data Looks Clean But It Is Not

You download a report, open it in Excel, scroll through a few hundred rows and everything looks fine. Column headers are there. Values appear. You load it into Power BI, run your SQL import or hand it to a colleague and then the problems start. The dashboard shows duplicate revenue figures. A database constraint fires because a supposedly mandatory column has empty rows. A pivot table returns wrong totals because two rows for the same customer ID were counted separately.

This is one of the most common and most frustrating situations in data work. The file looked fine on a quick scroll. The problems were invisible until something downstream broke.

Duplicate rows and null values are the two most damaging data quality issues in everyday data work. They are also two of the hardest to detect manually, especially in files with hundreds of columns and thousands of rows. And they almost always arrive silently, with no warning in the file itself.

This guide explains exactly what causes these issues, why they are so hard to find manually, and how to detect both in any CSV or Excel file in under thirty seconds using a free browser-based data profiler tool, with no code, no formulas and no software installation required.

What Duplicate Rows Actually Are and Why They Appear

A duplicate row is a row that contains exactly the same values in every column as at least one other row in the same dataset. The entire record is repeated. Not just a repeated value in one column, but a complete identical copy of an entire row.

Duplicate rows appear for several common reasons. Data exported from two different systems and combined into one file often contains overlapping records that were not deduplicated before the merge. ETL pipelines that run multiple times without checking for existing records produce duplicates on every re-run. Manual data entry over time produces duplicates when the same event or transaction is entered twice. Database exports with certain JOIN configurations multiply rows when one-to-many relationships are not handled correctly.
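Although this article's whole point is that you do not need code, readers who do have Python at hand can express the same full-row check in a single pandas call. The four-row frame below is invented sample data:

```python
import pandas as pd

# Invented sample: row 3 is an exact full-row copy of row 0.
df = pd.DataFrame({
    "customer_id": [101, 102, 103, 101],
    "amount":      [50.0, 75.0, 20.0, 50.0],
})

# duplicated() marks a row True only when EVERY column matches an
# earlier row -- a repeated value in one column alone is not enough.
dup_mask = df.duplicated(keep="first")
duplicate_rows = df.index[dup_mask].tolist()   # row numbers of the copies
```

With `keep="first"`, the original row stays unflagged and only the later copies are marked, which mirrors the original/duplicate pairing described later in this article.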

The danger of duplicate rows is that they are statistically invisible until you calculate something. A table with 1000 rows looks the same whether 50 of those rows are duplicates or not. You cannot see the duplication by scrolling. The column headers look fine. The values look normal. Only when you run a COUNT DISTINCT query, build a revenue aggregation or load the data into a dashboard tool does the inflated count reveal itself.

In SQL terms, a 10 percent duplicate rate means every COUNT you run is 10 percent too large, and every SUM is inflated by whatever the duplicated rows contribute, roughly 10 percent if the duplicates are typical of the rest. In a financial report, that is a material error. In a machine learning training dataset, duplicate records cause the model to overfit to those repeated examples.

What Null Values Are and Why They Cause Problems

A null value is an empty cell: a field where a value was expected but nothing was recorded. In CSV files, null values appear as consecutive commas with nothing between them, or as fields containing only whitespace. In Excel files, they are simply empty cells.
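You can see what those consecutive commas become once parsed by using Python's standard csv module. The two-row file below is made up:

```python
import csv
import io

# Made-up CSV: the second data row has an empty customer_id field
# (note the consecutive commas at the start of that line).
raw = "customer_id,region,amount\n101,EU,50\n,US ,75\n"

rows = list(csv.reader(io.StringIO(raw)))
header, data = rows[0], rows[1:]

# An empty field parses as an empty string, not as None -- so null
# detection has to treat "" (and whitespace-only cells) as null.
nulls = [(i, j) for i, row in enumerate(data)
         for j, cell in enumerate(row) if cell.strip() == ""]
```

Note that `"US "` with a trailing space is not flagged: only fields that are empty after stripping whitespace count as null here.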

Null values break things in very specific ways depending on what you do with the data downstream.

In SQL databases, null values in columns defined as NOT NULL cause the entire import to fail with a constraint violation error. The import stops, sometimes after loading thousands of rows, leaving the database in a partial state.

In Power BI and Tableau, null values in measure columns produce blank values in visualisations. A revenue chart with nulls shows gaps. An average calculation ignores null rows entirely, which changes the result. A filter on a column with nulls does not return those rows even when the filter should have matched them.

In Python pandas and R, null values propagate through element-wise calculations: any arithmetic operation involving a null produces a null result. In R, summing a vector that contains a single NA returns NA, not the sum of the non-NA values, unless you pass na.rm = TRUE. pandas is subtler: its sum() skips nulls by default, so an incomplete column silently yields a total over only the non-null rows rather than failing loudly.
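A short Python sketch, using invented numbers, makes the difference concrete:

```python
import numpy as np
import pandas as pd

values = pd.Series([10.0, 20.0, np.nan])

# Element-wise arithmetic propagates the null: NaN + anything is NaN.
shifted = values + 5                      # third element stays NaN

# NumPy's sum propagates the null too ...
np_total = np.sum(values.to_numpy())      # NaN

# ... but pandas' own sum() skips nulls by default (skipna=True),
# silently totalling only the non-null rows.
pd_total = values.sum()                   # 30.0
strict_total = values.sum(skipna=False)   # NaN
```

Passing `skipna=False` is the way to make pandas behave like SQL aggregates or R without `na.rm`, so missing data surfaces instead of being quietly dropped.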

In Excel, null values cause VLOOKUP and INDEX MATCH to return #N/A on those rows. Pivot tables treat null cells differently from zero values, which can produce misleading totals.

The most damaging null values are in columns you did not expect to be nullable. If a customer ID column should always have a value but 3 percent of rows are empty, every join, every lookup and every aggregation that uses that column will silently drop those rows from the result.
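A tiny SQLite session shows that silent drop in action: because NULL = NULL is not true in SQL, an order with a missing customer ID simply vanishes from the join, with no error raised. The tables and values here are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?)",
                [(1, 50.0), (None, 75.0), (2, 20.0)])
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

# The 75.0 order has a NULL customer_id, so the join condition is
# never true for it -- the row is dropped without any warning.
joined = cur.execute(
    "SELECT o.amount FROM orders o "
    "JOIN customers c ON o.customer_id = c.customer_id"
).fetchall()
```

Three orders went in; only two come back out, and nothing in the output hints that revenue went missing.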

Why Checking Manually Does Not Work

The natural instinct is to open the file in Excel and scroll through it. For small files this is reasonable. For anything over a few hundred rows it becomes unreliable, time-consuming and error-prone.

For duplicate detection, a visual scan is almost useless. Two identical rows will look identical when you see them next to each other, but they are only next to each other if the file happens to be sorted by a column that puts them adjacent. Most real-world files are not sorted. The duplicate row could be anywhere in the file. Row 12 and row 8,447 could be identical and you would never find them by scrolling.

Excel's conditional formatting can highlight duplicate values in a single column but it does not detect full row duplicates across all columns simultaneously. You would need to add a helper column that concatenates every cell in a row into one string, then apply conditional formatting to that helper column. For files with many columns this becomes unwieldy and slow.

For null detection, Excel's Go To Special feature can highlight blank cells but it gives you no summary of where the nulls are, which columns have the most nulls or what percentage of each column is empty. You get a visual highlight across the entire spreadsheet with no column-level breakdown.

Neither approach gives you a systematic, column-by-column data quality report. Both approaches break down completely for large files. Neither works at all for API data that you fetch programmatically.

The Right Approach: Automated Data Profiling

Data profiling is the process of systematically examining a dataset and producing a structured report of its contents, completeness and quality issues. Professional data engineers run profiling scripts using Python pandas or SQL before loading any dataset into a pipeline. The profiling step catches issues before they propagate downstream.
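In pandas terms, the kind of profiling pass described above might look like this minimal sketch. The sample frame is invented, and this is a simplified illustration, not the tool's actual implementation:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> dict:
    """Minimal column-level profile: nulls, fill rate, uniques, dtype."""
    report = {}
    for col in df.columns:
        s = df[col]
        nulls = int(s.isna().sum())
        report[col] = {
            "nulls": nulls,
            "fill_rate": round(100 * (len(s) - nulls) / len(s), 1),
            "unique": int(s.nunique(dropna=True)),
            "dtype": str(s.dtype),
        }
    # Full-row duplicate count across all columns simultaneously.
    report["_duplicate_rows"] = int(df.duplicated().sum())
    return report

# Invented sample with two nulls and one exact duplicate row.
df = pd.DataFrame({"id": [1, 2, 2, 4],
                   "region": ["EU", None, None, "US"]})
result = profile(df)
```

Even this twenty-line version answers the questions a scroll through Excel cannot: which columns are incomplete, by how much, and how many rows are exact copies.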

The problem for most people is that running a profiling script requires writing code, setting up a Python environment, installing libraries and knowing which questions to ask. That is a significant barrier if you are a business analyst, BI developer or data manager who works with data regularly but does not write Python daily.

A browser-based data profiler eliminates all of that. You upload the file. The profiler runs the analysis automatically. You see the results immediately. No code, no setup, no installation.

How the TOOLBeans Data Profiler Works

The TOOLBeans Data Profiler at toolbeans.com/tools/data-profiler is a free browser-based data quality tool that accepts CSV files, Excel files in both XLSX and XLS format, and JSON files. It also connects directly to REST API endpoints to profile live data.

Everything runs in your browser. Your file is never uploaded to any server. This is important for files containing sensitive business data, financial records or personally identifiable information. The analysis happens locally using JavaScript and the results are displayed immediately.

When you upload a file or connect an API endpoint, the profiler automatically detects all column headers and analyses each column independently. The analysis covers: null and empty value counts per column; fill rate as a percentage showing how complete each column is; unique value count and unique rate; dominant data type inferred from the actual values in each column; type mismatch detection for columns where the dominant type is numeric but some values are non-numeric strings; the top five most frequent values; and numeric statistics (minimum, maximum, mean, median, standard deviation and sum) for columns that contain numbers.

Duplicate row detection compares every row as a complete record across all columns simultaneously. Any row that is an exact copy of an earlier row is flagged as a duplicate with both the duplicate row number and the original row number reported.

The overall data quality score from zero to one hundred summarises all findings into a single number. It accounts for fill rates across all columns, the presence and volume of type mismatches, and the number of duplicate rows. A score above ninety indicates excellent data quality. A score below fifty indicates significant issues that need to be resolved before using the data.
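The exact scoring formula is not published. Purely as an illustration of how the three signals listed above could be combined into a single zero-to-one-hundred number, here is a hypothetical weighting; both the function and its weights are my own assumption, not the tool's:

```python
def quality_score(fill_rates, mismatch_cols, dup_rows,
                  total_rows, total_cols):
    """Hypothetical 0-100 score: average completeness, penalised for
    type-mismatch columns and duplicate rows.  Weights are illustrative
    only -- the real TOOLBeans formula is not documented."""
    completeness = sum(fill_rates) / len(fill_rates)     # 0-100
    mismatch_penalty = 20 * (mismatch_cols / total_cols)
    dup_penalty = 30 * (dup_rows / total_rows)
    return max(0.0, round(completeness - mismatch_penalty - dup_penalty, 1))

# Invented example: three columns, one with mismatches, 5% duplicates.
score = quality_score([100, 97, 60], mismatch_cols=1,
                      dup_rows=5, total_rows=100, total_cols=3)
```

Whatever the real weights, the shape is the same: start from completeness and subtract penalties for the issue categories, so a score in the nineties really does imply near-full columns and few duplicates.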

Step-by-Step: Checking a CSV File for Duplicates and Nulls

Here is the exact process for profiling any CSV or Excel file using the TOOLBeans Data Profiler.

Go to toolbeans.com/tools/data-profiler in any modern browser. No account creation, no signup and no login is required.

Click the upload area or drag your CSV or Excel file onto it. The tool immediately reads the file locally in your browser and begins analysis.

The summary cards at the top show the total row count, total column count, overall completeness percentage, total null value count across the entire file, duplicate row count and the data quality score. These six numbers give you an immediate overview of the data health before looking at any detail.

Click the Issues tab to see the complete breakdown of every detected problem. The Duplicate Rows section shows the exact count of duplicates and, when you expand the list, the row number of each duplicate paired with the row number of the original it matches. For a file imported from two systems that overlapped, this list shows you exactly which rows to remove.

The Columns with Null Values section shows every column that contains any empty cells, with the null count and fill rate for each. A column with a 97 percent fill rate has 3 percent nulls. A column with a 60 percent fill rate is missing 40 percent of its expected values and likely has a data collection or export problem at the source.

The Type Mismatch Columns section shows any column where the dominant type is numeric but some values are text strings. This is the category that breaks SQL imports silently. An amount column that is mostly numbers but contains a few cells with values like n/a or TBD will fail any database load that expects a numeric type. The profiler flags it before you hit that error.
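If you later want to reproduce this check in code, pandas' to_numeric with errors="coerce" isolates exactly those offending values. The sample column is invented:

```python
import pandas as pd

# Invented "amount" column: mostly numeric, two placeholder strings.
amount = pd.Series(["100", "250.5", "n/a", "75", "TBD"])

# Coercing turns anything unparseable into NaN, so values that were
# present before but null after coercion are the type mismatches.
coerced = pd.to_numeric(amount, errors="coerce")
mismatches = amount[coerced.isna() & amount.notna()].tolist()
```

This is the programmatic equivalent of the amber highlighting in the profiler's Data Table tab: the dominant type is numeric, and the list holds the handful of strings that would break a typed database load.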

Click the Column Stats tab to see the full per-column analysis in card format. Each card shows the column name, inferred type, fill rate, unique value count, null count and quality score. Numeric columns also show the statistical summary. String columns show the character length range. The top frequent values for each column are displayed so you can immediately see whether a column contains the variety of values you expected or is dominated by one or two values.

The Data Table tab shows your actual data with inline visual highlights. Empty cells appear with a null label in a muted style so they are immediately visible. Cells that contain type mismatches are highlighted in amber. Duplicate rows are marked with a small indicator in the row number column.

The Schema tab shows the complete inferred schema: every column name, its detected type, whether it is nullable, whether it has all unique values and an example value. This is useful for documenting a dataset before loading it into a database or sharing it with another team.

Profiling API Data: Checking Live Endpoint Responses

The Data Profiler also works directly with REST API endpoints. Switch to the API tab, paste any URL that returns JSON and click Analyze. The tool fetches the response, flattens any nested JSON objects automatically and runs the same analysis as a file upload.
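pandas performs the same kind of flattening with json_normalize, which is useful for checking your own API responses in a notebook. The nested records below are invented:

```python
import pandas as pd

# Invented API response shape: each record has a nested address object.
records = [
    {"id": 1, "name": "Ada",   "address": {"city": "Paris",  "zip": "75001"}},
    {"id": 2, "name": "Linus", "address": {"city": "Berlin", "zip": None}},
]

# Nested objects become dotted column names like "address.zip",
# after which the usual column-level null checks apply.
df = pd.json_normalize(records)
null_zip_count = int(df["address.zip"].isna().sum())
```

Once flattened, a nested field with missing values shows up as an ordinary incomplete column rather than hiding inside a JSON object.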

This is particularly useful for BI developers who are building dashboards on top of live APIs. Before connecting Power BI or Tableau to an API endpoint, profiling it shows exactly what the data looks like: how many records are returned, which fields have nulls, whether any type inconsistencies exist and what the value ranges are for numeric fields.

Several public demo endpoints are pre-loaded in the tool for immediate testing without any API credentials. The JSONPlaceholder posts and users endpoints are useful for seeing how the profiler handles typical REST API response structures.

For authenticated APIs, enter a Bearer token in the token field. The profiler adds it as an Authorization header in the request. This covers the majority of modern REST APIs that use Bearer token authentication.
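In code terms, the header the profiler sends can be built with Python's standard library; the URL and token below are made up for illustration:

```python
import urllib.request

def authorized_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request using the Bearer scheme: the raw token goes
    into a standard Authorization header, prefixed with 'Bearer '."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )

# Hypothetical endpoint and token -- substitute your own.
req = authorized_request("https://api.example.com/v1/orders", "abc123")
```

If an API instead expects an API key in a custom header or a query parameter, this simple Bearer scheme will not fit and the request has to be built accordingly.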

Exporting the Data Quality Report

After profiling, click Export Report to download a structured JSON file containing the complete analysis. The report includes a summary section with total rows, total columns, quality score and completeness percentage, a column-level section with the full analysis for every column, and an issues section listing duplicate row indices, columns with nulls, columns with type mismatches and columns with low fill rates.

This report can be shared with a data source owner to document the quality issues found, included in a data handover checklist, used as input for a data cleaning script or archived as a quality baseline for the dataset.

The Schema tab also has a Copy Schema JSON button that copies the inferred schema as a JSON array. Each object in the array contains the column name, inferred type, nullable flag, unique flag and an example value. This is directly usable as a starting point for writing a CREATE TABLE statement or defining a data model.
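Assuming the copied schema uses field names like name, type, nullable and unique (the exact keys in the export may differ), turning it into a CREATE TABLE statement is a short transformation. The sample schema here is invented:

```python
import json

# Invented sample of the described shape: one object per column.
schema_json = """[
  {"name": "customer_id", "type": "number", "nullable": false, "unique": true},
  {"name": "region", "type": "string", "nullable": true, "unique": false}
]"""

# Mapping of inferred types to SQL types -- adjust for your database.
TYPE_MAP = {"number": "NUMERIC", "string": "TEXT", "boolean": "BOOLEAN"}

def to_create_table(schema: list, table: str) -> str:
    cols = []
    for col in schema:
        line = f'{col["name"]} {TYPE_MAP.get(col["type"], "TEXT")}'
        if not col["nullable"]:
            line += " NOT NULL"
        if col["unique"]:
            line += " UNIQUE"
        cols.append(line)
    return f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n)"

ddl = to_create_table(json.loads(schema_json), "customers")
```

The nullable and unique flags translate directly into NOT NULL and UNIQUE constraints, which is exactly why profiling before writing DDL saves a round of failed imports.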

Fixing the Issues You Find

The Data Profiler tells you where the problems are. Fixing them depends on your workflow.

For duplicate rows, the row numbers in the Issues tab tell you exactly which rows to delete. In Excel, you can navigate to those row numbers directly. For large files, using Excel's Remove Duplicates feature under the Data menu removes all duplicate rows automatically. For CSV files loaded into Python, a single df.drop_duplicates() call removes them.
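As a concrete illustration of that pandas call, with an invented frame:

```python
import pandas as pd

# Invented frame: row 2 is an exact copy of row 1.
df = pd.DataFrame({"id": [1, 2, 2, 3], "amount": [10, 20, 20, 30]})

# keep="first" retains the original and drops the later exact copies,
# matching the original/duplicate pairing shown in the Issues tab.
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)
```

If only certain columns define identity (for example, a transaction ID), passing `subset=["id"]` restricts the comparison to those columns instead of the full row.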

For null values, the decision is whether to drop the rows with nulls, fill them with a default value or investigate why they are empty. The profiler tells you which columns have nulls and what percentage. If a column is 70 percent null, filling the missing values is not appropriate. The data collection process at the source is the actual problem.
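In pandas, the two viable code-side fixes look like this; the sample data is invented, and filling with zero is only correct when a blank genuinely means zero:

```python
import pandas as pd

df = pd.DataFrame({"customer_id": [1, None, 3],
                   "amount": [10.0, 20.0, None]})

# Option 1: drop rows missing a mandatory key column.
keyed = df.dropna(subset=["customer_id"])

# Option 2: fill a measure column with an explicit default.
filled = df.assign(amount=df["amount"].fillna(0.0))
```

Which option applies is a business decision, not a technical one: dropping hides the row, filling invents a value, and neither fixes a broken collection process upstream.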

For type mismatches, the most common fix is to clean the text values in the affected column before loading. Replace text like n/a, TBD, or - with actual null values or with zero, depending on what those entries mean in context.
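A pandas sketch of that cleanup, using an invented column:

```python
import numpy as np
import pandas as pd

amount = pd.Series(["100", "n/a", "250.5", "TBD", "-"])

# Map the placeholder strings to real nulls first, then convert.
# After the replace, to_numeric succeeds without coercion errors.
PLACEHOLDERS = {"n/a": np.nan, "TBD": np.nan, "-": np.nan}
cleaned = pd.to_numeric(amount.replace(PLACEHOLDERS))
```

Replacing with null rather than zero preserves the distinction between "no value recorded" and "a value of zero", which matters for averages and pivot tables as described earlier.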

Summary

Duplicate rows and null values are the two most common silent data quality problems in CSV and Excel files. They are invisible to a quick visual scan, they cause calculation errors in dashboards and they break database loads.

The TOOLBeans Data Profiler at toolbeans.com/tools/data-profiler detects both automatically for any CSV, Excel, JSON or API dataset. Upload your file, see the complete quality report in seconds and know exactly where the problems are before they reach your downstream tools.

The tool is free, requires no account and processes your file entirely in your browser with no data ever leaving your device.

Explore More Free Tools

TOOLBeans offers 39 free developer and PDF tools. No account needed.

Browse all 39 free tools

Related Topics

find duplicate rows in csv free · null value checker csv excel online · how to detect duplicates in excel without code · csv data quality check free tool · find missing values in dataset online · duplicate row detector free 2026 · data profiler tool online · check csv file for errors free · excel null value finder online · data quality check no code tool

Frequently Asked Questions

Is Data Profiler free to use?

Yes. Data Profiler is completely free on TOOLBeans with no usage limits, no account and no credit card required.

Is my data safe when using TOOLBeans tools?

Browser-based tools run entirely in your browser so your data never leaves your device. PDF server tools process your file on a secure server and delete it immediately after conversion.

Do I need to install anything to use Data Profiler?

No installation is required. Data Profiler runs directly in your browser on any device, including mobile. Just visit TOOLBeans and start using it instantly.

How is TOOLBeans different from other online tools?

TOOLBeans offers 39 free tools with no paywalls, no account requirements and no usage limits. Browser tools process your data locally for maximum privacy.
