The ultimate goal of data cleansing is to improve the organization's confidence
in its data. First, set the bar for the level of quality you are trying to
achieve. I usually shoot for a 99% level of confidence in my data. Then list the
types of data errors that need to be addressed, such as:
1) Missing data: nulls, zeros, zero-length strings, and corrupted rows
2) Data that contains unwanted junk, such as an apostrophe, a comma, or an extra
space
3) Numeric data errors such as a negative value that should be positive
4) Telephone numbers in the wrong format
Note that some errors are database errors and others are business rule errors.
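The four error types listed above can be checked record by record before any
aggregation. The Python sketch below is one hypothetical way to do it: the field
names (name, amount, phone), the NNN-NNN-NNNN phone format, and the rule that
amounts must not be negative are all assumptions made for illustration, not
requirements from this answer.

```python
import re
from typing import Any

# Assumed target format for telephone numbers: NNN-NNN-NNNN.
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")
# Unwanted characters from the list above: apostrophes and commas.
JUNK_CHARS = ("'", ",")


def is_missing(value: Any) -> bool:
    """Missing data: None, a numeric zero, or a zero-length/all-blank string."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    return value == 0


def has_junk(text: str) -> bool:
    """Junk: apostrophes, commas, or extra (leading, trailing, doubled) spaces."""
    return (
        any(ch in text for ch in JUNK_CHARS)
        or text != text.strip()
        or "  " in text
    )


def check_record(record: dict[str, Any]) -> list[str]:
    """Return the error types found in one record.

    The field names 'name', 'amount', and 'phone' are hypothetical.
    """
    errors = []
    for field in ("name", "amount", "phone"):
        if is_missing(record.get(field)):
            errors.append(f"missing:{field}")

    name = record.get("name")
    if isinstance(name, str) and has_junk(name):
        errors.append("junk:name")

    # Business rule (assumed): amounts should never be negative.
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:
        errors.append("negative:amount")

    # Only check the format of phones that are present; blanks count as missing.
    phone = record.get("phone")
    if isinstance(phone, str) and phone.strip() and not PHONE_PATTERN.fullmatch(phone):
        errors.append("format:phone")
    return errors
```

For example, check_record({"name": "O'Brien ", "amount": -5, "phone": "5551234567"})
returns ["junk:name", "negative:amount", "format:phone"], while a clean record
returns an empty list.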
Next, write aggregate queries (or use an ETL tool) to find the errors. Analyze
the query results or transformation reports, measure the impact if the errors
go unfixed, and so on.
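The aggregate-query step can be sketched with Python's built-in sqlite3 module:
one query counts the rows that break each rule and the rows that break any rule,
and the clean-row rate is compared against a target such as the 99% bar
mentioned above. The table name customers, its columns, the phone format, and
the non-negative amount rule are hypothetical; the per-rule counts are only a
starting point for measuring impact, not a full impact analysis.

```python
import sqlite3

# Hypothetical table "customers"(name TEXT, amount REAL, phone TEXT).
# Each rule is a SQL condition that is true when a row has that kind of error.
RULES = {
    # Missing data: NULLs, zeros, and zero-length (or all-blank) strings.
    "missing": "name IS NULL OR TRIM(name) = '' OR amount IS NULL OR amount = 0 "
               "OR phone IS NULL OR TRIM(phone) = ''",
    # Junk: apostrophes, commas, leading/trailing or doubled spaces.
    "junk_name": "name LIKE '%''%' OR name LIKE '%,%' OR name <> TRIM(name) "
                 "OR name LIKE '%  %'",
    # Business rule (assumed): amounts must not be negative.
    "negative_amount": "amount < 0",
    # Assumed target phone format NNN-NNN-NNNN.
    "bad_phone": "phone NOT GLOB '[0-9][0-9][0-9]-[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]'",
}


def build_profile_sql(table: str = "customers") -> str:
    """One aggregate query: per-rule error counts plus rows failing any rule."""
    per_rule = ", ".join(
        f"COALESCE(SUM(CASE WHEN {cond} THEN 1 ELSE 0 END), 0) AS {name}"
        for name, cond in RULES.items()
    )
    # A CASE with several WHEN branches counts each failing row exactly once.
    any_rule = " ".join(f"WHEN {cond} THEN 1" for cond in RULES.values())
    return (
        f"SELECT COUNT(*), {per_rule}, "
        f"COALESCE(SUM(CASE {any_rule} ELSE 0 END), 0) FROM {table}"
    )


def profile(conn: sqlite3.Connection, target: float = 0.99) -> dict:
    """Run the aggregate query and compare the clean-row rate to the target."""
    total, *rule_counts, dirty = conn.execute(build_profile_sql()).fetchone()
    report = dict(zip(RULES, rule_counts))
    clean_rate = (total - dirty) / total if total else 1.0
    report.update(
        total_rows=total,
        dirty_rows=dirty,
        clean_rate=clean_rate,
        meets_target=clean_rate >= target,
    )
    return report


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, amount REAL, phone TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            ("Ann Lee", 12.5, "555-123-4567"),
            ("O'Brien ", -5.0, "5551234567"),
            ("", 0.0, None),
            ("Bo Chen", 7.0, "555-987-6543"),
        ],
    )
    print(profile(conn))
```

With the four sample rows, two rows fail at least one rule, so clean_rate is 0.5
and meets_target is False. Keeping every rule in one query means the table is
scanned once, and adding a new check is just another entry in RULES.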