Total Answers and Comments: 5
Last Update: July 09, 2007
Asked by: GeekAdmin
Submitted by: Hakoonamatata
Data cleaning is a self-explanatory term. Most data warehouses source data from multiple systems, systems that were created long before data warehousing was well understood, and hence without any plan to consolidate their data in a single repository of information. In such a scenario, the following problems can arise:
1. Missing information for a column from one of the data sources
2. Inconsistent information among different data sources
3. Orphan records
4. Outlier data points
5. Different data types for the same information among various data sources, leading to improper conversion
6. Data that breaches business rules
To ensure that the data warehouse is not contaminated by any of these discrepancies, it is important to cleanse the data against a set of business rules before it makes its way into the data warehouse.
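As a rough illustration of what such a cleansing step might look like, here is a minimal Python sketch. Everything in it is hypothetical: the `clean` function, the field names, the sample rows, the "amount must be positive" business rule, and the outlier threshold are assumptions made up for the example, not part of any particular ETL tool. It covers only some of the problems listed above (missing values, orphan records, bad data types, rule violations, and outliers).

```python
from statistics import median

# Hypothetical staging rows from two source systems (illustrative data only).
customers = [
    {"customer_id": 1, "country": "US"},
    {"customer_id": 2, "country": None},                      # missing information
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": "25.00"},
    {"order_id": 11, "customer_id": 9, "amount": "30.00"},    # orphan: no customer 9
    {"order_id": 12, "customer_id": 1, "amount": "abc"},      # wrong data type
    {"order_id": 13, "customer_id": 2, "amount": "-5.00"},    # breaks rule: amount > 0
    {"order_id": 14, "customer_id": 1, "amount": "27.50"},
    {"order_id": 15, "customer_id": 1, "amount": "9000.00"},  # outlier
]


def clean(customers, orders, outlier_factor=10.0):
    """Return (accepted order ids, rejected (id, reason) pairs, customer ids with no country)."""
    known_ids = {c["customer_id"] for c in customers}
    missing_country = {c["customer_id"] for c in customers if not c["country"]}

    rejected = []
    candidates = []  # (order_id, amount) pairs that passed the row-level checks
    for row in orders:
        # Data type check: the amount arrives as text and must convert to a number.
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            rejected.append((row["order_id"], "bad type"))
            continue
        # Orphan check: every order must point at a known customer.
        if row["customer_id"] not in known_ids:
            rejected.append((row["order_id"], "orphan"))
        # Assumed business rule: order amounts must be positive.
        elif amount <= 0:
            rejected.append((row["order_id"], "rule violation"))
        else:
            candidates.append((row["order_id"], amount))

    # Outlier check (a crude assumption): flag amounts far above the median.
    accepted = []
    if candidates:
        mid = median([amount for _, amount in candidates])
        for order_id, amount in candidates:
            if amount > outlier_factor * mid:
                rejected.append((order_id, "outlier"))
            else:
                accepted.append(order_id)
    return accepted, rejected, missing_country


accepted, rejected, missing_country = clean(customers, orders)
```

In this sketch rejected rows are kept along with a reason rather than silently dropped, so that they could be reviewed or corrected at the source; whether to reject, repair, or default a bad value is itself a business decision.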
Above answer was rated as good by the following members: nitin_sikka