Let's suppose we have some 10,000-odd records in the source system. When we load them into the target, how do we ensure that none of the 10,000 loaded records contains garbage values? How do we test it? We can't check every record, as the number of records is huge.

2000reddy

Aug 17th, 2006

dddf

jyothy

Sep 12th, 2006

Run Select count(*) against both the source table and the target table and compare the results.
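A minimal sketch of the count comparison, assuming both tables can be reached from one connection. It uses Python's built-in sqlite3 with in-memory tables; the table names `source` and `target`, the helper `row_counts_match`, and the sample rows are hypothetical.

```python
import sqlite3

def row_counts_match(conn, source_table, target_table):
    """Return (source_count, target_count, match) for two tables."""
    # Table names are interpolated directly, so pass only trusted names.
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src, tgt, src == tgt

# Hypothetical sample data: the target is missing one row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, "a"), (2, "b")])

counts = row_counts_match(conn, "source", "target")
print(counts)
```

Matching counts only show that no rows were lost or duplicated in aggregate; they say nothing about whether the values themselves are clean.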

prav2000

Dec 21st, 2006

You should have proper testing conditions in your ETL jobs to validate all the important columns before they are loaded into the target. Always have proper rejects to capture records containing garbage values.
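One way to picture the reject handling described here is a small routing step, sketched below in plain Python. The column names, the validation rules, and the `split_rejects` helper are all hypothetical; a real ETL tool would have its own reject mechanism.

```python
def split_rejects(rows, validators):
    """Route each row to accepted or rejected, recording why it failed."""
    accepted, rejected = [], []
    for row in rows:
        # Collect the name of every column whose check fails.
        failed = [col for col, check in validators.items() if not check(row.get(col))]
        if failed:
            rejected.append({"row": row, "failed_columns": failed})
        else:
            accepted.append(row)
    return accepted, rejected

# Hypothetical rules for two important columns.
validators = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and "@" in v,
}
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": -5, "email": "b@example.com"},
    {"customer_id": 3, "email": None},
]

accepted, rejected = split_rejects(rows, validators)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

Keeping the failed column names with each rejected row makes it easier to see later which rule a record broke.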

RufusA

Feb 15th, 2007

To do this, you must profile the data at the source: learn the domain of all the values, get the actual number of rows, and get the data types. After the data is loaded into the target, this process can be repeated, i.e. checking the data values with respect to range, type, etc. and also checking the actual number of rows inserted. If the results before and after match, then we are OK. This process is typically automated in ETL tools.
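The before-and-after profiling could look roughly like the sketch below, which assumes SQLite (via Python's sqlite3) and a single hypothetical column `amount`. The `profile` and `profile_diff` helpers are illustrative names, not part of any ETL tool.

```python
import sqlite3

def profile(conn, table, column):
    """Collect row count, null count, min, max and storage types for a column."""
    count, nulls, lo, hi = conn.execute(
        f"SELECT COUNT(*), SUM({column} IS NULL), MIN({column}), MAX({column}) FROM {table}"
    ).fetchone()
    types = {r[0] for r in conn.execute(f"SELECT DISTINCT typeof({column}) FROM {table}")}
    return {"rows": count, "nulls": nulls, "min": lo, "max": hi, "types": types}

def profile_diff(before, after):
    """Return only the profile entries that changed between source and target."""
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}

# Hypothetical data: one target value arrived as the text 'UNK'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (amount INTEGER)")
conn.execute("CREATE TABLE target (amount INTEGER)")
conn.executemany("INSERT INTO source VALUES (?)", [(10,), (250,), (35,)])
conn.executemany("INSERT INTO target VALUES (?)", [(10,), (250,), ("UNK",)])

diff = profile_diff(profile(conn, "source", "amount"), profile(conn, "target", "amount"))
print(diff)
```

An empty diff means the two profiles agree; here the row counts match, yet the type and maximum differ, which is the kind of problem a count alone would miss.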

Maharishi

May 7th, 2007

This is the baseline for error and reject handling. Use functions such as IS_date and IS_number to check that the data we are loading follows the standard format. Don't forget to identify null values and the existence of special characters in the data.
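For illustration, the checks listed above can be approximated in plain Python as below. These helpers are hypothetical stand-ins, not the IS_date and IS_number functions the post refers to; the date format and the set of allowed characters are assumptions that would need to match your own standard.

```python
import re
from datetime import datetime

def is_number(value):
    """True if the value can be parsed as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def is_date(value, fmt="%Y-%m-%d"):
    """True if the value is a real calendar date in the assumed format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

def is_null(value):
    """Treat None and blank strings as null."""
    return value is None or str(value).strip() == ""

# Assumed whitelist: letters, digits, space and a few punctuation marks.
ALLOWED = re.compile(r"[A-Za-z0-9 .,'-]*")

def has_special_chars(value):
    """True if the string contains anything outside the whitelist."""
    return ALLOWED.fullmatch(value) is None

print(is_number("12.5"), is_date("2007-13-01"), has_special_chars("J@ne#"))
```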

abhishek

Aug 23rd, 2007

Go into the Workflow Monitor. Once the status shows as succeeded, right-click and open the properties; there you can see the number of source rows, the successful target rows, and the rejected rows.

Abhishek

masatjai

Apr 22nd, 2008

1. Check the number of records in the source and the target.
Select count(*) from source
Select count(*) from Target

2. Consider a column of numeric data type, say A, in the source and the target.
select sum(A) from source group by some key
select sum(A) from Target group by some key
Paste these two results into Excel and take the difference between them; every difference should be zero.
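As an alternative to pasting into Excel, the grouped sums can be differenced in SQL itself. The sketch below assumes SQLite through Python's sqlite3; the grouping key `region` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (region TEXT, a INTEGER);
    CREATE TABLE target (region TEXT, a INTEGER);
    INSERT INTO source VALUES ('east', 10), ('east', 20), ('west', 5);
    INSERT INTO target VALUES ('east', 10), ('east', 20), ('west', 50);
""")

# Sum column A per key on each side, join on the key, keep only mismatches.
query = """
    SELECT s.region, s.total AS source_total, t.total AS target_total,
           s.total - t.total AS difference
    FROM (SELECT region, SUM(a) AS total FROM source GROUP BY region) AS s
    JOIN (SELECT region, SUM(a) AS total FROM target GROUP BY region) AS t
      ON s.region = t.region
    WHERE s.total <> t.total
    ORDER BY s.region
"""
mismatches = conn.execute(query).fetchall()
print(mismatches)
```

Note that the inner join only compares keys present on both sides, so a key missing entirely from one table needs a separate check.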

Thanks
masatjai

jryan999

May 1st, 2008

Data Quality checks come in a number of forms:-

1. For FACT table rows, is there a valid lookup against each of the Dimensions

2. For FACT or DIMENSION rows, for each value:-
Is it Null when it shouldn’t be
Is the Data Type correct (eg. Number, Date)
Is the range of values or format correct
Is the row valid with relation to all the other source system business rules?
There is no magic way of checking the integrity of data.

You could simply count the number of rows in and out again and assume it’s all OK, but for a fact table (at the very minimum) you’ll need to cope with failed Dimension lookups (typically from late arriving Dimension rows).

The classic solution is to include Dimension Keys Zero and Minus One (Null and Invalid) in your Dimension table. Null columns are set to the Zero key, and lookup failures to the Minus One key. You may need to store and recycle rows with failed lookups and treat these as updates, so that if the missing Dimension row appears, the data is corrected.
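A rough sketch of the Zero and Minus One key idea follows, in plain Python. The dimension contents, the column name `product_code`, and the helper names are hypothetical; the recycle list stands in for whatever store you use to retry failed lookups.

```python
NULL_KEY = 0      # the source value was null
INVALID_KEY = -1  # the source value had no match in the dimension

def resolve_dimension_key(natural_key, dimension_lookup):
    """Map a natural key to a surrogate key, falling back to 0 or -1."""
    if natural_key is None:
        return NULL_KEY
    return dimension_lookup.get(natural_key, INVALID_KEY)

def load_facts(fact_rows, dimension_lookup):
    """Resolve keys for every fact row and set aside failed lookups."""
    loaded, recycle = [], []
    for row in fact_rows:
        key = resolve_dimension_key(row["product_code"], dimension_lookup)
        loaded.append({"product_key": key, "amount": row["amount"]})
        if key == INVALID_KEY:
            recycle.append(row)  # retry once the dimension row arrives
    return loaded, recycle

# Hypothetical dimension and incoming fact rows.
product_dim = {"P1": 101, "P2": 102}
facts = [
    {"product_code": "P1", "amount": 10},
    {"product_code": None, "amount": 20},
    {"product_code": "P9", "amount": 30},
]

loaded, recycle = load_facts(facts, product_dim)
print([r["product_key"] for r in loaded], len(recycle))
```

Every fact row is still loaded, so row counts in and out agree, while the Minus One rows remain identifiable for later correction.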

Otherwise, you’ve no option. If the incoming data is from an unreliable source, you’ll need to check its validity or accept that the warehouse includes wrong results.

If the warehouse includes a high percentage of incorrect or misleading values – what’s the point of having it ?

geetanjali bhatia

Feb 1st, 2013

Run a Minus query between source and target

akshay tanksale

Jun 29th, 2013

We can carry out the following steps:

1. Check the count of records for both source and target tables.

2. If the source column is
a) Number: Select count(column_name) from source;
Select count(column_name) from target;
b) Character: Select sum(length(trim(column_name))) from source;
Select sum(length(trim(column_name))) from target;
(This is a "check sum" method to test columns like customer name, address, etc.)

3. If the number of records is huge, we can group the data by some columns and test the results.
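Steps 2 and 3 above can be combined into one grouped comparison, sketched here with SQLite via Python's sqlite3. The tables, the grouping column `city`, and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (city TEXT, customer_name TEXT);
    CREATE TABLE target (city TEXT, customer_name TEXT);
    INSERT INTO source VALUES ('Pune', 'Asha Rao'), ('Pune', 'Vikram'), ('Delhi', 'Meera Nair');
    INSERT INTO target VALUES ('Pune', 'Asha Rao'), ('Pune', 'Vik'), ('Delhi', 'Meera Nair');
""")

def length_sums(conn, table):
    """Per city: (non-null name count, total trimmed name length)."""
    rows = conn.execute(
        f"SELECT city, COUNT(customer_name), SUM(LENGTH(TRIM(customer_name))) "
        f"FROM {table} GROUP BY city"
    )
    return {city: (n, total) for city, n, total in rows}

src = length_sums(conn, "source")
tgt = length_sums(conn, "target")

# Groups whose count or length total differs between source and target.
bad_groups = sorted(city for city in src if src[city] != tgt.get(city))
print(bad_groups)
```

Grouping narrows a mismatch down to a subset of rows. Equal length totals do not prove equal content, since two different strings can have the same length, so this is a quick screen rather than a full comparison.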

govkj

Nov 7th, 2013

Copy and paste the table data into an Excel sheet and compare the source and target using macros.

If the file is too large, for example if the table has 1 lakh (100,000) records, export the table data to a flat file such as a .csv or .txt file.

Then split those files into smaller chunks, convert them to Excel, and again compare using macros or a simple Excel formula.

Ravi

May 15th, 2014

Hi,

Use the MINUS operator between source and target; then you can get the difference.
or

Take the source count and the target count; both should be the same.

Grace

May 29th, 2014

As other posts have mentioned, I would do some of the following:

Code
SELECT COLUMN, count(*) FROM TABLE GROUP BY COLUMN ORDER BY COLUMN

SELECT min(COLUMN), max(COLUMN) FROM TABLE

SELECT count(DISTINCT COLUMN) FROM TABLE

-- The query below is useful if you want to do manual analysis of the first 5 records per column data category
SELECT * FROM
     (SELECT COLUMN, row_number() over(partition BY COLUMN ORDER BY COLUMN) ALIAS_FOR_ROWNUMBER
     FROM TABLE) ALIAS_FOR_TABLE
    WHERE ALIAS_FOR_ROWNUMBER <= 5

-- The queries below are useful for ensuring no data has been truncated after migration
SQL - SELECT max(len(COLUMN)) FROM TABLE
ORACLE - SELECT max(length(COLUMN)) FROM TABLE

/*
Also a good idea to check datatypes.
For instance, the SOURCE datatype may be varchar and the TARGET datatype may be numeric, in which case all non-numeric data such as Unk may be read as 0, OR any characters following commas may be truncated.
Ex. Source varchar data UNK could be digested as 0 in Target
Ex. Source varchar data 35,000 could be digested as 35 in Target
Likewise, it's good to check that fields are appropriately nullable/not-nullable.
*/

If the tables are within the same DB/schema AND the ETL is one-to-one:
SQL - SELECT COLUMN_A FROM TABLE_A EXCEPT SELECT COLUMN_B FROM TABLE_B
Oracle - SELECT COLUMN_A FROM TABLE_A MINUS SELECT COLUMN_B FROM TABLE_B;
-- You can also use the INTERSECT operator, which works similarly but returns the rows common to both queries
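The truncation check above can be run on both sides and compared automatically. This is a small sketch assuming SQLite through Python's sqlite3, where the function is `LENGTH`; the `address` column, the `max_length` helper, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (address TEXT);
    CREATE TABLE target (address TEXT);
    INSERT INTO source VALUES ('12 Long Meadow Lane'), ('4 Elm St');
    INSERT INTO target VALUES ('12 Long Me'), ('4 Elm St');
""")

def max_length(conn, table, column):
    """Longest value of a column, in characters."""
    return conn.execute(f"SELECT MAX(LENGTH({column})) FROM {table}").fetchone()[0]

src_len = max_length(conn, "source", "address")
tgt_len = max_length(conn, "target", "address")

# A shorter maximum in the target suggests the column was cut off on load.
possibly_truncated = tgt_len < src_len
print(src_len, tgt_len, possibly_truncated)
```

If the target maximum happens to equal the declared width of the target column, that is a further hint that longer values were cut to fit.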

Best of luck.

Shwetha Bakkappa

Jun 5th, 2014

It requires 2 steps:

1. Select count(*) from source
Select count(*) from target

2. If the source and target tables have the same attributes and datatypes:

Select * from source
MINUS
Select * from target

Otherwise, we have to test attribute by attribute, according to the design doc.

PAvel

Sep 28th, 2015

Code
SELECT * FROM source
MINUS
SELECT * FROM target

AND THEN

SELECT * FROM target
MINUS
SELECT * FROM source

Santhosh Gujja

Oct 15th, 2015

Step 1: Select count(*) from source
Select count(*) from target
Step 2: Select all columns from source
minus
Select all columns from target -- it should return zero rows
Step 3: Select all columns from target
minus
Select all columns from source -- it should return zero rows

Ayaskanta Ratha

Nov 17th, 2015

You can UNION both MINUS queries and then inspect the combined result.
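A sketch of that combined query is below, assuming SQLite through Python's sqlite3. SQLite spells the operator `EXCEPT` where Oracle uses `MINUS`; the `side` label column and the sample rows are hypothetical additions to show which table each difference came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (id INTEGER, name TEXT);
    CREATE TABLE target (id INTEGER, name TEXT);
    INSERT INTO source VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO target VALUES (1, 'Ann'), (2, 'Bobby'), (4, 'Dee');
""")

# Both directions of the difference in one result set, labelled by side.
query = """
    SELECT 'only_in_source' AS side, id, name
    FROM (SELECT id, name FROM source EXCEPT SELECT id, name FROM target)
    UNION ALL
    SELECT 'only_in_target' AS side, id, name
    FROM (SELECT id, name FROM target EXCEPT SELECT id, name FROM source)
    ORDER BY side, id
"""
differences = conn.execute(query).fetchall()
for row in differences:
    print(row)
```

An empty result means every distinct row appears on both sides. Because these set operators work on distinct rows, duplicated rows are not detected this way, which is why the count check is usually run alongside it.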

Raja

Jun 12th, 2019

If you have Unix, please use this command on the source and target.
(checksum bytecount filename). For the file name, provide your source file name and, once the target is also loaded, provide the same file name and check the bytes.
With this command the checksum value might be different, but the byte count should remain the same. If the byte count changes, there is an issue between source and target.
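Where a scripted check is easier than the command line, a similar comparison can be sketched in Python. This is an analogous check, not the Unix command itself: `zlib.crc32` is an assumption of convenience, and its value is not guaranteed to match the checksum printed by any particular Unix utility. The file names and contents are hypothetical.

```python
import os
import tempfile
import zlib

def file_fingerprint(path, chunk_size=65536):
    """Return (crc32, byte_count) for a file, reading it in chunks."""
    crc, size = 0, 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
            size += len(chunk)
    return crc, size

# Write two small hypothetical extract files and compare them.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "source.dat")
    tgt_path = os.path.join(tmp, "target.dat")
    for path in (src_path, tgt_path):
        with open(path, "wb") as handle:
            handle.write(b"1,Ann\n2,Bob\n")
    src_fp = file_fingerprint(src_path)
    tgt_fp = file_fingerprint(tgt_path)

print(src_fp, tgt_fp, src_fp == tgt_fp)
```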
