GeekInterview.com
What is data cleaning? How can we do that?

  
Total Answers and Comments: 5 Last Update: July 09, 2007     Asked by: GeekAdmin 
  
 Best Rated Answer
Submitted by: Hakoonamatata
 

Data cleaning is a largely self-explanatory term. Most data warehouses source data from multiple systems, many of which were created long before data warehousing was well understood, and hence without any plan to consolidate their data in a single repository of information. In such a scenario, the following problems are possible:
1. Missing information for a column from one of the data sources;
2. Inconsistent information among different data sources;
3. Orphan records;
4. Outlier data points;
5. Different data types for the same information among various data sources, leading to improper conversion;
6. Data that breaches business rules.

To ensure that the data warehouse is not contaminated by any of these discrepancies, it is important to cleanse the data using a set of business rules before it makes its way into the data warehouse.
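
As a rough illustration of cleansing with business rules, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `orders` and `customers` sample data, the field names, the "amount must be positive" rule, and the "more than 10 times the median" outlier cutoff are all hypothetical. It checks for missing values, orphan records, failed type conversion, rule violations, and outliers; inconsistencies between sources are not covered.

```python
from statistics import median

# Hypothetical sample data: a customer master list and orders pulled from source systems.
customers = {"C1", "C2"}
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": "120.50", "country": "US"},
    {"order_id": 2, "customer_id": "C2", "amount": "98", "country": None},     # missing value
    {"order_id": 3, "customer_id": "C9", "amount": "110", "country": "US"},    # orphan record
    {"order_id": 4, "customer_id": "C1", "amount": "abc", "country": "US"},    # bad data type
    {"order_id": 5, "customer_id": "C2", "amount": "-15", "country": "US"},    # breaks a rule
    {"order_id": 6, "customer_id": "C1", "amount": "99000", "country": "US"},  # outlier
]


def to_float(text):
    """Convert a source value to float, returning None if conversion fails."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def check_record(rec, typical_amount):
    """Return the list of problems found in one record (empty if it is clean)."""
    problems = []
    if any(value is None or value == "" for value in rec.values()):
        problems.append("missing value")
    if rec["customer_id"] not in customers:
        problems.append("orphan record")
    amount = to_float(rec["amount"])
    if amount is None:
        problems.append("type conversion failed")
    elif amount <= 0:  # assumed business rule: amounts must be positive
        problems.append("business rule violated")
    elif amount > 10 * typical_amount:  # assumed outlier cutoff
        problems.append("outlier")
    return problems


def clean(records):
    """Split records into accepted rows and (order_id, problems) rejections."""
    amounts = [a for a in (to_float(r["amount"]) for r in records) if a is not None]
    typical = median(amounts)
    accepted, rejected = [], []
    for rec in records:
        problems = check_record(rec, typical)
        if problems:
            rejected.append((rec["order_id"], problems))
        else:
            accepted.append(rec)
    return accepted, rejected


accepted, rejected = clean(orders)
for order_id, problems in rejected:
    print(order_id, problems)
```

In practice the rejected rows would usually be routed to an error table for review rather than silently dropped, so that the source systems can be corrected.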



Above answer was rated as good by the following members:
nitin_sikka
October 06, 2006 05:28:06   #1  
srinivas vadlakonda        

RE: What is data cleaning? How can we do that?
Data cleaning means cleaning up the data, for example by filtering and merging it, before loading it into the data warehouse.
 
October 25, 2006 11:32:37   #2  
jessa        

RE: What is data cleaning? How can we do that?

Data cleaning is removing discrepancies from the records, such as duplicates and redundancy. In short, it means making the data as relevant as possible for the ultimate purpose of business analysis.
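
A minimal Python sketch of duplicate removal, assuming records are considered duplicates when their name and email match after trimming whitespace and ignoring case. The `rows` sample, the field names, and the matching rule are hypothetical; a real warehouse would choose its own match key.

```python
def normalize(rec):
    """Build a comparison key: collapse whitespace and ignore letter case."""
    name = " ".join(rec["name"].lower().split())
    email = rec["email"].strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = {}
    for rec in records:
        seen.setdefault(normalize(rec), rec)
    return list(seen.values())


# Hypothetical sample rows: the first two describe the same person.
rows = [
    {"name": "Asha Rao", "email": "asha@example.com"},
    {"name": "  asha   rao ", "email": "ASHA@example.com "},
    {"name": "Ben Lee", "email": "ben@example.com"},
]

unique = deduplicate(rows)
print(len(unique))
```

Keeping the first occurrence is only one possible policy; keeping the most recent or most complete record is equally common.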


 
February 10, 2007 05:16:24   #3  
jyothi        

RE: What is data cleaning? How can we do that?
Data cleaning is the process of removing unnecessary information as part of data maintenance. Data cleaning can be done using clustering.
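
One possible reading of "clustering" here is grouping near-identical values so that spelling variants can be reviewed and merged. The sketch below is an assumption along those lines, not necessarily the technique the answer had in mind: it uses Python's standard `difflib.SequenceMatcher` with a simple greedy grouping, and the `cities` sample and the 0.8 similarity threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster_values(values, threshold=0.8):
    """Greedily group each value with the first cluster whose first member is similar."""
    clusters = []
    for value in values:
        for cluster in clusters:
            if similarity(value, cluster[0]) >= threshold:
                cluster.append(value)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append([value])
    return clusters


# Hypothetical sample: city names as they might arrive from different sources.
cities = ["Bangalore", "Bangalor", "bangalore", "Hyderabad", "Hyderbad", "Chennai"]

for group in cluster_values(cities):
    print(group)
```

Each resulting group can then be mapped to a single standard spelling. The greedy approach depends on input order and is only suitable for small value lists.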
 
March 02, 2007 08:36:03   #4  
chinnodu        

RE: What is data cleaning? How can we do that?
It is a process of identifying and correcting inconsistencies and inaccuracies.
 


 

Copyright © 2005 - 2009 GeekInterview.com. All Rights Reserved
