|
| Total Answers and Comments: 8 |
Last Update: May 01, 2008 Asked by: Ram |
|
| | |
|
Submitted by: jryan999 Data Quality checks come in a number of forms:-
1. For FACT table rows, is there a valid lookup against each of the Dimensions
2. For FACT or DIMENSION rows, for each value:-
- Is it Null when it shouldn’t be
- Is the Data Type correct (eg. Number, Date)
- Is the range of values or format correct
- Is the row valid with relation to all the other source system business rules?
There is no magic way of checking the integrity of data.
You could simply count the number of rows in and out again and assume it’s all OK, but for a fact table (at the very minimum) you’ll need to cope with failed Dimension lookups (typically from late arriving Dimension rows).
Classic solution is, include a Dimension Key Zero and Minus One (Null and Invalid) in your Dimension Table. Null columns are set to the Zero key, and a lookup failure to the Minus One. You may need to store and re-cycle rows with failed lookups and treat these as updates – so if the missing Dimension row appears, the data is corrected.
Otherwise, you’ve no option. If the incoming data is from an unreliable source, you’ll need to check it’s validity or accept the warehouse includes wrong results.
If the warehouse includes a high percentage of incorrect or misleading values – what’s the point of having it ?
Above answer was rated as good by the following members: njvijay | Go To Top
|