Find Statistical Outliers in CSV and Excel Data
Upload any CSV or Excel file and instantly detect outliers in every numeric column using the 3-sigma rule. See which values are more than 3 standard deviations from the mean, the standard statistical definition of an outlier.
CSV & Excel Data Analyzer
Find duplicates, nulls & errors · Clean & export · Auto dashboard
Free CSV and Excel Data Analyzer: Find Duplicates, Nulls and Errors Instantly
Upload any CSV or Excel file to instantly find duplicate rows, null values, type mismatches and data quality issues. The Clean Data tab lets you remove duplicates, fill nulls and standardize headers in one click, then download the cleaned file. The Dashboard tab auto-generates charts from your data. No Python, no SQL, no formulas required.
What Is the 3-Sigma Rule for Outlier Detection?
The 3-sigma rule, also called the empirical rule or the 68-95-99.7 rule, is the standard statistical method for identifying unusual values in a dataset. In a normal distribution, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2, and about 99.7% fall within 3. Any value more than 3 standard deviations from the mean is therefore in the most extreme 0.3% of the distribution and is considered a statistical outlier. The percentages assume roughly normal data, so in a heavily skewed column the rule can flag more or fewer values than 0.3%.
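For readers who want to reproduce the check by hand, here is a minimal Python sketch of the 3-sigma rule as described above. It is an illustration only, not the tool's actual implementation: the function name three_sigma_outliers, the parameter k, and the sample amounts are all assumptions made for this example, as is the choice of the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

def three_sigma_outliers(values, k=3.0):
    """Return (value, z_score) pairs more than k standard deviations from the mean."""
    if len(values) < 2:
        return []  # a standard deviation needs at least two values
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (divides by n - 1)
    if sigma == 0:
        return []  # every value is identical, so nothing can be an outlier
    return [(v, (v - mu) / sigma) for v in values if abs(v - mu) > k * sigma]

# Illustrative data: twenty orders near 10,000 plus one mistyped 100000.
amounts = [9800, 10200, 10000, 9900, 10100] * 4 + [100000]
flagged = three_sigma_outliers(amounts)
print([value for value, _ in flagged])  # only the mistyped 100000 is flagged
```

One consequence of this definition is worth knowing. A single value can never lie more than (n - 1) / sqrt(n) sample standard deviations from the mean, so under this sketch a column with 10 or fewer values cannot produce a 3-sigma outlier, however extreme one value looks.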
Four Common Causes of Outliers in Business Data
A person typed 100000 instead of 10000. Or a form accepted scientific notation like 1e5, which then appears as a very large number. Or a copy-paste duplicated a digit. These are the most common outliers in manually entered spreadsheets, and they are genuinely wrong values that should be corrected.
An API returned -1 as a default for a missing reading. A sensor produced a null reading that got stored as 0 or 9999. A database default value of 99999 was used for unknown records. These are not real data points and should be treated as missing values, not extreme but valid ones.
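Treating placeholders as missing can be sketched in a few lines of Python. The sentinel set below (-1, 9999, 99999) comes from the examples in this section and is an assumption. The real codes depend on the data source, and 0 is left out on purpose because it is often a legitimate reading. The name mask_sentinels and the sample readings are hypothetical.

```python
# Hypothetical sentinel codes; adjust to whatever the data source actually uses.
SENTINELS = {-1, 9999, 99999}

def mask_sentinels(values, sentinels=SENTINELS):
    """Replace placeholder codes with None so they count as missing, not extreme."""
    return [None if v in sentinels else v for v in values]

# Illustrative sensor readings containing two placeholder codes.
readings = [21.5, 22.0, -1, 21.8, 9999, 22.3]
cleaned = mask_sentinels(readings)

# Only the values that are actually present should feed the mean and standard deviation.
present = [v for v in cleaned if v is not None]
print(cleaned)
print(present)
```

Masking before computing statistics matters because a single 9999 among readings near 22 would inflate both the mean and the standard deviation, which can hide real outliers.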
A VIP customer who placed a $500,000 order while every other order is under $10,000. A transaction on Black Friday that is 20x the daily average. These are genuine values that accurately represent reality and should not be removed: they are the signal, not the noise.
Records inserted during testing that have amounts like 12345.67 or dates in 1970 or 2099. These should be filtered out before any production analysis. Outlier detection often catches them because test values are chosen to be distinctive rather than realistic.
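One possible way to filter such records before analysis is a plausibility check on dates, sketched below in Python. The date window, the field names (id, order_date, amount), and the sample records are all hypothetical. A real pipeline would pick bounds that match the business and might also rely on an explicit test flag where one exists.

```python
from datetime import date

def in_plausible_range(d, earliest=date(2000, 1, 1), latest=date(2030, 12, 31)):
    """Return True when a date falls inside the assumed production window."""
    return earliest <= d <= latest

# Illustrative records: one real order and two that look like test inserts.
records = [
    {"id": 1, "order_date": date(2024, 3, 14), "amount": 249.00},
    {"id": 2, "order_date": date(1970, 1, 1), "amount": 12345.67},
    {"id": 3, "order_date": date(2099, 12, 31), "amount": 10.00},
]

# Keep only records whose date is plausible for production data.
production = [r for r in records if in_plausible_range(r["order_date"])]
print([r["id"] for r in production])
```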