GeekInterview.com


Question:  What is data cleaning? How can we do that?



July 07, 2007 10:34:14 #5
 Hakoonamatata   Member Since: July 2007    Total Comments: 5 

RE: What is data cleaning? How can we do that?
 

Data cleaning is a self-explanatory term. Most data warehouses source data from multiple systems, many of which were created long before data warehousing was well understood and were therefore never designed with consolidation into a single repository in mind. In such a scenario, the following problems can arise:
1. Missing information for a column from one of the data sources;
2. Inconsistent information among different data sources;
3. Orphan records;
4. Outlier data points;
5. Different data types for the same information among various data sources, leading to improper conversion;
6. Data that violates business rules.


To keep these discrepancies out of the data warehouse, it is important to cleanse the data against a set of business rules before it is loaded into the warehouse.
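As a rough illustration of what such a cleansing step might look like, here is a minimal Python sketch that checks incoming rows for the six problems listed above before anything is loaded. Everything in it is hypothetical and only for illustration: the `orders` rows, the `customers` set, the `COUNTRY_ALIASES` mapping, the "amount must not be negative" rule, and the outlier threshold of 10 times the median are all assumptions. A real warehouse would normally do this inside its ETL tool, with rules supplied by the business.

```python
from statistics import median

# Hypothetical parent table: the customer ids known to the warehouse.
customers = {"C1", "C2", "C3"}

# Hypothetical order rows merged from several source systems.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": "120.50", "country": "US"},
    {"order_id": 2, "customer_id": "C2", "amount": "99", "country": "usa"},
    {"order_id": 3, "customer_id": "C3", "amount": None, "country": "US"},
    {"order_id": 4, "customer_id": "C9", "amount": "80", "country": "US"},
    {"order_id": 5, "customer_id": "C3", "amount": "-15", "country": "US"},
    {"order_id": 6, "customer_id": "C1", "amount": "abc", "country": "US"},
    {"order_id": 7, "customer_id": "C2", "amount": "95000", "country": "US"},
    {"order_id": 8, "customer_id": "C3", "amount": "110", "country": "US"},
]

# Assumed mapping used to reconcile inconsistent country codes.
COUNTRY_ALIASES = {"usa": "US", "us": "US"}


def check_row(row, known_customers):
    """Return (cleaned_row, None) if the row passes, else (None, reason)."""
    if row["amount"] is None:                      # 1. missing information
        return None, "missing amount"
    try:
        amount = float(row["amount"])              # 5. data type conversion
    except ValueError:
        return None, "amount is not numeric"
    if row["customer_id"] not in known_customers:  # 3. orphan record
        return None, "orphan record"
    if amount < 0:                                 # 6. business rule (assumed)
        return None, "negative amount"
    # 2. inconsistent values are standardized rather than rejected
    country = COUNTRY_ALIASES.get(row["country"].lower(), row["country"])
    return {**row, "amount": amount, "country": country}, None


def cleanse(rows, known_customers, outlier_factor=10):
    """Split rows into those fit to load and those rejected, with reasons."""
    passed, rejects = [], []
    for row in rows:
        cleaned, reason = check_row(row, known_customers)
        if cleaned is None:
            rejects.append((row["order_id"], reason))
        else:
            passed.append(cleaned)
    if passed:
        # 4. outliers: anything far above the median of the valid amounts
        typical = median(r["amount"] for r in passed)
        kept = []
        for r in passed:
            if r["amount"] > outlier_factor * typical:
                rejects.append((r["order_id"], "outlier amount"))
            else:
                kept.append(r)
        passed = kept
    return passed, rejects


clean_rows, rejects = cleanse(orders, customers)
```

Only `clean_rows` would go on to the warehouse. The `rejects` list keeps the order id and the reason for each rejected row, so that the source system owners can be asked to correct the data instead of it being silently dropped.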

     

 
