A sequential file has 8 records with one column. The values in the column, separated by spaces, are: 1 1 2 2 3 4 5 6
In a parallel job, after reading the sequential file, two more sequential files should be created: one with the duplicate records and the other with the non-duplicate records. File 1 records, separated by spaces: 1 1 2 2. File 2 records, separated by spaces: 3 4 5 6. How would you do it?
We can split the file into two files by using the Aggregator stage: one file with the records whose count is 1 and the other with the records whose count is more than 1.
In the Properties tab of the Aggregator stage, set the Aggregation Type to Count Rows and the Count Output Column to Count. Then use a Transformer to divide the file as desired.
The condition used in the Transformer is DSLink.Count = 1 for the non-duplicate rows. The rest of the records are duplicates.
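The counting step above can be sketched outside DataStage in plain Python. This is only an illustration of the logic, not DataStage code: `Counter` stands in for the Aggregator stage and the two list comprehensions stand in for the Transformer constraints. The variable names are made up for the example.

```python
from collections import Counter

values = [1, 1, 2, 2, 3, 4, 5, 6]

# Stand-in for the Aggregator: group by the column and count rows per group.
counts = Counter(values)

# Stand-in for the Transformer constraints on the count column.
non_duplicates = [v for v, c in counts.items() if c == 1]
duplicates = [v for v, c in counts.items() if c > 1]

print(non_duplicates)  # [3, 4, 5, 6]
print(duplicates)      # [1, 2]
```

In this sketch the grouping collapses each value to a single row, so the duplicate side holds 1 and 2 once each rather than 1 1 2 2.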
Using a Transformer, the data from the source sequential file is split into two links (Link A and Link B). Link A feeds an Aggregator whose aggregation type is set to Count Rows and whose count output column is named XXX. Then perform a left outer join between Link B and the link from the Aggregator, on the data column. After that, use a second Transformer to split the data in two with the constraints XXX = 1 and XXX > 1, so the output is 1 1 2 2 and 3 4 5 6.
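The same flow can be sketched in plain Python to check the expected output. It is a rough model under stated assumptions, not a DataStage job: the function name `split_by_duplication` is hypothetical, the dictionary lookup stands in for the left outer join, and `xxx` mirrors the count column named XXX above.

```python
from collections import Counter


def split_by_duplication(rows):
    """Return (duplicate_rows, unique_rows), keeping every input row."""
    # Link A -> Aggregator: count rows per value (the XXX column).
    xxx = Counter(rows)

    # Link B left outer joined with the Aggregator output on the value:
    # every original row is kept and gains its count.
    joined = [(value, xxx.get(value)) for value in rows]

    # Final Transformer constraints: XXX > 1 and XXX = 1.
    duplicate_rows = [value for value, count in joined if count is not None and count > 1]
    unique_rows = [value for value, count in joined if count == 1]
    return duplicate_rows, unique_rows


dups, uniques = split_by_duplication([1, 1, 2, 2, 3, 4, 5, 6])
print(dups)     # [1, 1, 2, 2]
print(uniques)  # [3, 4, 5, 6]
```

Joining the counts back to the original rows is what keeps both copies of 1 and 2 in the duplicates output.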