GeekInterview.com

Question:  Clean Data before Loading

Why is it necessary to clean data before loading it into the warehouse?


June 06, 2009 00:05:45 #1
 SQLGal   Member Since: October 2008    Total Comments: 6 

RE: Clean Data before Loading
 
Warehouse data is the source data for analysis and reporting. The data is organized into groups and categories (dimensions) and then summarized over those groups (aggregation). Grouping works on exact matches between values.

For example, "house", "houses", and "home" would fall into three separate groups because they are not exact matches. Logically, though, they mean the same thing and belong in the same group. Data cleansing corrects this. This is only one example of data cleansing.
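To make the "house" example concrete, here is a minimal Python sketch of that kind of cleansing step. The synonym table and the function name clean_category are made up for illustration; in a real ETL job the mapping would come from business rules or a reference table, not from the code itself.

```python
# Hypothetical synonym table: deciding which raw values mean the same
# thing is a business decision, so this mapping is only an assumption.
CANONICAL = {
    "house": "house",
    "houses": "house",
    "home": "house",
}

def clean_category(raw: str) -> str:
    """Trim and lowercase the value, then map known variants to one canonical value."""
    key = raw.strip().lower()
    # Unknown values pass through (normalized) instead of being dropped.
    return CANONICAL.get(key, key)

raw_values = ["house", "Houses ", "home", "HOME"]
cleaned = [clean_category(v) for v in raw_values]

print(len(set(raw_values)))  # 4 distinct groups before cleansing
print(set(cleaned))          # {'house'} after cleansing
```

Values that are not in the table are kept as they are (only trimmed and lowercased), so a step like this never silently loses data.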


The point is that if data is not cleansed, the resulting reports and OLAP cubes will contain too many categories, making them hard to read. The results will also be skewed, because the facts (totals, counts, etc.) get split across the good and bad categories. Once the data is loaded into the data warehouse, it is very difficult, if not impossible, to change.
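The skew is easy to see with a few made-up fact rows. This is only a sketch: the amounts, the FIXES mapping, and the function totals_by_category are invented for the example and do not come from any particular ETL tool.

```python
from collections import defaultdict

# Made-up fact rows: (category as entered at the source, sale amount)
rows = [("house", 100), ("houses", 250), ("home", 50), ("house", 100)]

# Hypothetical cleansing rule that maps the bad values onto the good one
FIXES = {"houses": "house", "home": "house"}

def totals_by_category(rows, fixes=None):
    """Sum amounts per category, applying the cleansing fixes first if given."""
    fixes = fixes or {}
    totals = defaultdict(int)
    for category, amount in rows:
        totals[fixes.get(category, category)] += amount
    return dict(totals)

dirty = totals_by_category(rows)         # aggregated as loaded
clean = totals_by_category(rows, FIXES)  # aggregated after cleansing

print(dirty)  # {'house': 200, 'houses': 250, 'home': 50}
print(clean)  # {'house': 500}
```

The grand total is the same in both cases (500), but without cleansing a report on "house" shows only 200 of it, and the rest sits in two extra categories.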

     

 
