Let's suppose we have some 10,000-odd records in the source system. When we load them into the target, how do we ensure that none of the 10,000 records loaded into the target contains garbage values? How do we test this? We can't check every record, as the number of records is huge.

Showing Answers 1 - 18 of 18 Answers

jyothy

  • Sep 12th, 2006
 

Select count(*) from both the source table and the target table and compare the results.
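As a rough sketch of this count comparison (using SQLite in Python; the table and column names here are made up for illustration):

```python
# Sketch: compare source and target row counts in an in-memory SQLite
# database. Table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE target (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
cur.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

src_count = cur.execute("SELECT count(*) FROM source").fetchone()[0]
tgt_count = cur.execute("SELECT count(*) FROM target").fetchone()[0]
print(src_count == tgt_count)  # True
```

Note that matching counts only prove no rows were lost or duplicated; they say nothing about garbage values inside the rows, which is why the later answers add column-level checks.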


prav2000

  • Dec 21st, 2006
 

You should have proper testing conditions in your ETL jobs for validating all the important columns before they are loaded into the target. Always have proper reject handling to capture records containing garbage values.


RufusA

  • Feb 15th, 2007
 

To do this, you must profile the data at the source to learn the domain of all the values, get the actual number of rows in the source, and get the types of the data in the source. After the data is loaded into the target, the process is repeated: check the data values with respect to range, type, etc., and check the actual number of rows inserted. If the results before and after match, we are OK. This process is typically automated in ETL tools.

Maharishi

  • May 7th, 2007
 

This is the baseline for error and reject handling. Use functions such as IS_DATE and IS_NUMBER to check that the data we are loading follows the standard format. Don't forget to identify null values and the existence of special characters in the data.
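A minimal sketch of this kind of reject handling, written in plain Python rather than any specific ETL tool (the column names and date format are assumptions):

```python
# Sketch: rows failing IS_DATE / IS_NUMBER style checks are routed to a
# reject list instead of the target. Column names are hypothetical.
from datetime import datetime

def is_number(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def is_date(value, fmt="%Y-%m-%d"):
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

rows = [
    {"amount": "100.5", "order_date": "2024-01-15"},
    {"amount": "N/A",   "order_date": "2024-01-16"},  # garbage amount
    {"amount": "42",    "order_date": None},          # null date
]

loaded, rejects = [], []
for row in rows:
    if is_number(row["amount"]) and is_date(row["order_date"]):
        loaded.append(row)
    else:
        rejects.append(row)

print(len(loaded), len(rejects))  # 1 2
```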


abhishek

  • Aug 23rd, 2007
 

Go into the Workflow Monitor. After the session shows the status Succeeded, right-click it and open its properties; there you can see the number of source rows, successful target rows, and rejected rows.

Abhishek


masatjai

  • Apr 22nd, 2008
 

1. Check the number of records on the source and the target:
Select count(*) from source
Select count(*) from Target

2. Consider a column of numeric data type, say A, on the source and the target:
select sum(A) from source group by some key
select sum(A) from Target group by some key
Paste these two results into Excel and take the difference between them; the differences should all be zeros.
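The same grouped-sum comparison can be done without the Excel step; here is a sketch using SQLite in Python (table names and the key column are assumptions):

```python
# Sketch: compare SUM(A) grouped by a key between source and target;
# the per-key differences should all be zero.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (key TEXT, A REAL)")
cur.execute("CREATE TABLE target (key TEXT, A REAL)")
data = [("x", 10.0), ("x", 5.0), ("y", 7.5)]
cur.executemany("INSERT INTO source VALUES (?, ?)", data)
cur.executemany("INSERT INTO target VALUES (?, ?)", data)

src = dict(cur.execute("SELECT key, sum(A) FROM source GROUP BY key"))
tgt = dict(cur.execute("SELECT key, sum(A) FROM target GROUP BY key"))

# These per-key differences play the role of the Excel subtraction step.
diffs = {k: src.get(k, 0) - tgt.get(k, 0) for k in set(src) | set(tgt)}
print(all(d == 0 for d in diffs.values()))  # True
```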

Thanks
masatjai


Data Quality checks come in a number of forms:-

1. For FACT table rows, is there a valid lookup against each of the Dimensions

2. For FACT or DIMENSION rows, for each value:-

  • Is it Null when it shouldn’t be
  • Is the Data Type correct (eg. Number, Date)
  • Is the range of values or format correct
  • Is the row valid with relation to all the other source system business rules?
There is no magic way of checking the integrity of data.

You could simply count the number of rows in and out again and assume it’s all OK, but for a fact table (at the very minimum) you’ll need to cope with failed Dimension lookups (typically from late arriving Dimension rows).

Classic solution is, include a Dimension Key Zero and Minus One (Null and Invalid) in your Dimension Table. Null columns are set to the Zero key, and a lookup failure to the Minus One. You may need to store and re-cycle rows with failed lookups and treat these as updates – so if the missing Dimension row appears, the data is corrected.
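The zero/minus-one convention described above can be sketched in a few lines of Python (the dimension table, natural keys, and amounts here are made up for illustration):

```python
# Sketch of the Zero / Minus One dimension-key convention: null natural
# keys map to surrogate key 0, failed lookups map to -1. All names and
# values are hypothetical.
dim_customer = {
    0:   "Unknown (null source value)",
    -1:  "Invalid (failed lookup)",
    101: "Acme Corp",
}
natural_to_surrogate = {"ACME": 101}

def resolve_dim_key(natural_key):
    if natural_key is None:
        return 0  # null column -> Zero key
    return natural_to_surrogate.get(natural_key, -1)  # failure -> Minus One

fact_rows = [("ACME", 500.0), (None, 75.0), ("GLOBEX", 120.0)]
resolved = [(resolve_dim_key(k), amt) for k, amt in fact_rows]
print(resolved)  # [(101, 500.0), (0, 75.0), (-1, 120.0)]
```

Rows resolved to -1 are the candidates to store and re-cycle as updates once the late-arriving dimension row appears.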

Otherwise, you've no option. If the incoming data is from an unreliable source, you'll need to check its validity or accept that the warehouse includes wrong results.

If the warehouse includes a high percentage of incorrect or misleading values, what's the point of having it?

geetanjali bhatia

  • Feb 1st, 2013
 

Run a Minus query between source and target


akshay tanksale

  • Jun 29th, 2013
 

We can carry out the following steps:

1. Check the count of records for both the source & target tables.

2. If the source column is
a) Number: Select count(column_name) from source;
Select count(column_name) from target;
b) Character: Select sum(length(trim(column_name))) from source;
Select sum(length(trim(column_name))) from target;
(This is a "checksum" method to test columns like customer name, address, etc.)

3. If the number of records is huge, we can group the data by some columns & test the results.
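A sketch of the character-column "checksum" method from step 2b, using SQLite in Python (the table and column names are assumptions):

```python
# Sketch: compare SUM(LENGTH(TRIM(col))) between source and target to
# detect truncation or padding changes in character columns.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (customer_name TEXT)")
cur.execute("CREATE TABLE target (customer_name TEXT)")
names = [(" Alice ",), ("Bob",), ("Charlie",)]
cur.executemany("INSERT INTO source VALUES (?)", names)
cur.executemany("INSERT INTO target VALUES (?)", names)

q = "SELECT sum(length(trim(customer_name))) FROM {}"
src_sum = cur.execute(q.format("source")).fetchone()[0]
tgt_sum = cur.execute(q.format("target")).fetchone()[0]
print(src_sum, tgt_sum, src_sum == tgt_sum)  # 15 15 True
```

Keep in mind this catches changes in total length only; two different corruptions that cancel out would slip through, so it complements rather than replaces a minus query.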


Copy and paste the table data into an Excel sheet and compare the source and target using macros.

If the file is too large, say the table has 100,000 (1 lakh) records, export the table data to a flat file such as a .csv or .txt file.

Then split those files into smaller chunks, convert them to Excel, and again compare using macros or simple Excel formulas.


Ravi

  • May 15th, 2014
 

Hi,

Use the MINUS operator between source and target; then you can get the difference.
Or
take the source count and the target count; both should be the same.


Grace

  • May 29th, 2014
 

As other posts have mentioned, I would do some of the following:


Code
  SELECT COLUMN, count(*) FROM TABLE GROUP BY COLUMN ORDER BY COLUMN

  SELECT min(COLUMN), max(COLUMN) FROM TABLE

  SELECT count(DISTINCT COLUMN) FROM TABLE

  -- The query below is useful if you want to do manual analysis of the first 5 records per column data category
  SELECT * FROM
       (SELECT COLUMN, row_number() over(partition BY COLUMN ORDER BY COLUMN) ALIAS_FOR_ROWNUMBER
       FROM TABLE) ALIAS_FOR_TABLE
      WHERE ALIAS_FOR_ROWNUMBER <= 5

  -- The query below is useful for ensuring no data has been truncated after migration
  SQL Server - SELECT max(len(COLUMN)) FROM TABLE
  ORACLE - SELECT max(length(COLUMN)) FROM TABLE

  /*
  It is also a good idea to check datatypes.
  For instance, the SOURCE datatype may be varchar and the TARGET datatype may be numeric; in that case, non-numeric data such as "Unk" may be read as 0, or any characters following commas may be truncated.
  Ex. Source varchar data "UNK" could be digested as 0 in the Target.
  Ex. Source varchar data "35,000" could be digested as 35 in the Target.
  Likewise, it's good to check that fields are appropriately nullable/not-nullable.
  */

  IF the TABLES are within the same DB/Schema AND the ETL is one-to-one:
  SQL Server - SELECT COLUMN_A FROM TABLE_A EXCEPT SELECT COLUMN_B FROM TABLE_B
  Oracle - SELECT COLUMN_A FROM TABLE_A MINUS SELECT COLUMN_B FROM TABLE_B;
  -- You can also use the INTERSECT operator, which works similarly to the above example


Best of luck.


Shwetha Bakkappa

  • Jun 5th, 2014
 

It requires 2 steps:

1. Select count(*) from source
Select count(*) from target

2. If the source and target tables have the same attributes and datatypes:

Select * from source
MINUS
Select * from target
Else
we have to go for attribute-wise testing of each attribute according to the design doc.


Santhosh Gujja

  • Oct 15th, 2015
 

Step 1: Select count(*) from source
        Select count(*) from target
Step 2: Select all columns from source
        MINUS
        Select all columns from target   (it should return zero rows)
Step 3: Select all columns from target
        MINUS
        Select all columns from source   (it should return zero rows)
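This bidirectional minus check can be sketched in SQLite, which uses EXCEPT in place of Oracle's MINUS (the tables and rows are made up for illustration):

```python
# Sketch: run the set difference in both directions; both results
# should be empty if source and target hold identical rows.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE source (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE target (id INTEGER, name TEXT)")
rows = [(1, "a"), (2, "b")]
cur.executemany("INSERT INTO source VALUES (?, ?)", rows)
cur.executemany("INSERT INTO target VALUES (?, ?)", rows)

src_minus_tgt = cur.execute(
    "SELECT * FROM source EXCEPT SELECT * FROM target").fetchall()
tgt_minus_src = cur.execute(
    "SELECT * FROM target EXCEPT SELECT * FROM source").fetchall()
print(len(src_minus_tgt), len(tgt_minus_src))  # 0 0
```

Running the difference both ways matters: source-minus-target alone misses extra rows that exist only in the target.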


Raja

  • Jun 12th, 2019
 

If you have Unix, you can use the cksum command on the source and target files; its output has the form: checksum bytecount filename. Run it on your source file and, once the target is also loaded, on the target file, and compare the byte counts.
The checksum values might differ, but the byte count should remain the same; if there is a change in the byte count, there is an issue between source and target.
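The same byte-count (and checksum) comparison can be done in Python; this sketch uses temporary files in place of real source/target extracts:

```python
# Sketch: compare byte counts and checksums of two files, mirroring
# what the Unix cksum command reports. File contents are hypothetical.
import hashlib
import tempfile

payload = b"10,Alice\n20,Bob\n"
with tempfile.NamedTemporaryFile(delete=False) as f_src:
    f_src.write(payload)
with tempfile.NamedTemporaryFile(delete=False) as f_tgt:
    f_tgt.write(payload)

def file_stats(path):
    with open(path, "rb") as f:
        data = f.read()
    return len(data), hashlib.md5(data).hexdigest()

src_bytes, src_md5 = file_stats(f_src.name)
tgt_bytes, tgt_md5 = file_stats(f_tgt.name)
print(src_bytes == tgt_bytes)  # True
```

If the two files are meant to be byte-for-byte identical, the checksums should match as well; differing checksums with equal byte counts still indicate changed content.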

