Sequential file with Duplicate Records

A sequential file has 8 records with a single column; the column values, separated by spaces, are:
1 1 2 2 3 4 5 6

In a parallel job, after reading the sequential file, two more sequential files should be created: one containing the duplicate records and the other containing the non-duplicate records.
File 1 records, separated by spaces: 1 1 2 2
File 2 records, separated by spaces: 3 4 5 6
How will you do it?

Question by rajivkumar23us; answers by rajivkumar23us



  • Apr 16th, 2009

We can segregate the file into two files by using the Aggregator stage: one file with records having a count of 1, and the other with records having a count greater than 1.

In the properties tab of the Aggregator stage, select the aggregation type "Count Rows" and set the count output column to "Count". Then, using a Transformer, you can divide the file as desired.

The condition used in the Transformer is DSLink.Count = 1 for the non-duplicate rows; the rest of the records are the duplicates.
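The counting step this answer describes can be sketched in plain Python (variable names are my own, not DataStage syntax) to show what the Aggregator's "Count Rows" output looks like:

```python
from collections import Counter

# Values read from the source sequential file (from the question).
rows = [1, 1, 2, 2, 3, 4, 5, 6]

# Aggregator with "Count Rows": one output row per key with its count.
counts = Counter(rows)

# Transformer constraint DSLink.Count = 1 selects the non-duplicate keys;
# the remaining keys are the duplicated ones.
non_dup_keys = [k for k, c in counts.items() if c == 1]
dup_keys = [k for k, c in counts.items() if c > 1]

print(non_dup_keys)  # [3, 4, 5, 6]
print(dup_keys)      # [1, 2]
```

Note that the Aggregator output has one row per key, so the duplicated values appear once here rather than the required n times; a join back to the original rows is needed to restore them.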


  • Aug 20th, 2009

Using a Transformer, the data from the source sequential file is split into two links (Link A and Link B). Link A is followed by an Aggregator whose aggregation type is set to Count Rows, with the count output column named XXX. Then perform a left outer join between Link B and the link from the Aggregator. After that, using a Transformer, we segregate the data into two outputs with the constraints XXX = 1 and XXX > 1, so we get the outputs 1 1 2 2 and 3 4 5 6.
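The count-then-join-back flow above can be sketched in plain Python (a sketch of the logic only, not DataStage syntax):

```python
from collections import Counter

rows = [1, 1, 2, 2, 3, 4, 5, 6]

# Aggregator branch: count rows per key (the column called XXX in the answer).
counts = Counter(rows)

# Left outer join of the original rows (Link B) with the counts, then the
# Transformer constraints XXX > 1 and XXX = 1 split the stream.
dups = [r for r in rows if counts[r] > 1]      # -> File 1
non_dups = [r for r in rows if counts[r] == 1] # -> File 2

print(dups)      # [1, 1, 2, 2]
print(non_dups)  # [3, 4, 5, 6]
```

Because every original row is joined back to its key's count, the duplicated values keep their full multiplicity, which is what the question requires.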


1) We have a stage called Remove Duplicates through which we can delete the duplicate records.

2) Use the Aggregator stage and specify the particular column on which you want to remove the duplicates.



  • Apr 27th, 2010

First, note that the length of the record is 15.

Link the source to a Transformer, and give the Transformer two output links, each going to a sequential file.

Double-click the Transformer stage and map the source to the target for the first sequential file (column name file1), and do the same for the second sequential file (column name file2). Then, in the derivation of the file1 column, specify linkname.sourcecolumnname(8,1), and in the derivation of the file2 column, specify linkname.sourcecolumnname(15,8).

Derivation: Lo3.data_Line(9,6)    Column: currency
(positions 6 to 9 of the record hold the currency value)



Given the data, by using the Sort stage we can send the duplicates into one link and the non-duplicates into another.
In the Sort stage, using the keyChange column, we can identify the duplicates with the codes 0 and 1.
After that, we use a Filter stage to separate the duplicates and non-duplicates by the keyChange column.
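The keyChange column's behaviour on sorted input can be sketched in plain Python (flag convention as this answer describes it: 1 for the first row of a key group, 0 for repeats; names are mine, not DataStage syntax):

```python
# Sort stage: order the rows by the key.
rows = sorted([1, 1, 2, 2, 3, 4, 5, 6])

# keyChange column: 1 when the key differs from the previous row, else 0.
flagged = []
prev = object()  # sentinel that equals no real key
for r in rows:
    flagged.append((r, 1 if r != prev else 0))
    prev = r

print(flagged)
# [(1, 1), (1, 0), (2, 1), (2, 0), (3, 1), (4, 1), (5, 1), (6, 1)]
```

Note that the flag marks first occurrences versus repeats, not whole duplicate groups: filtering on keyChange = 0 collects only the second-and-later copies, which is the objection raised in the Jun 30th, 2016 answers.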

vinod uppuuturi

  • Jul 28th, 2011


First of all, take a source file and connect it to a Copy stage. Then one link is connected to an Aggregator stage, and the other link is connected to a Lookup (or Join) stage. In the Aggregator stage, using the count function, calculate how many times each value repeats in the key column.

After calculating that, connect it to a Filter stage where we filter on cnt = 1 (cnt is the new count column).
The output from the Filter is connected to the Lookup stage as the reference link. In the Lookup stage, set Lookup Failure = Reject.

Then place two output links on the Lookup: one collects the non-repeated values, and the reject link collects the repeated values.
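The lookup-with-reject mechanics above can be sketched in plain Python (a sketch of the logic, with my own variable names):

```python
from collections import Counter

rows = [1, 1, 2, 2, 3, 4, 5, 6]

# Aggregator + Filter branch: keep only the keys whose count is 1
# (the cnt = 1 reference link in the answer).
reference = {k for k, c in Counter(rows).items() if c == 1}

# Lookup with "Lookup Failure = Reject": rows that match the reference go
# to the main output; rows that miss go to the reject link.
non_dups = [r for r in rows if r in reference]      # main output
dups = [r for r in rows if r not in reference]      # reject link

print(non_dups)  # [3, 4, 5, 6]
print(dups)      # [1, 1, 2, 2]
```

Like the join-based approach, this keeps every original row, so the duplicates come out with their full multiplicity on the reject link.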



  • Feb 14th, 2012

Hi friends,
I think, first of all, the Sequential File stage doesn't allow duplicate records.


Hemant Kanthed

  • Feb 19th, 2012

After the source Sequential File stage, we can use a Sort stage with dump_key, in which 0 is assigned to duplicate records and 1 is assigned to non-duplicate records. After the Sort stage, we can use a Transformer stage with the constraints: if dump_key = 0, the records go to the Seq_Duplicate stage; if dump_key = 1, the records go on to further stages as required.



  • Dec 5th, 2012

It's very simple:
1. Introduce a Sort stage right after the Sequential File stage.
2. Select the Key Change Column property in the Sort stage; you can have it assign 0 for unique and 1 for duplicate, or vice versa, as you wish.
3. Put a Filter or Transformer next to it, and now you have the unique records on one link and the duplicates on the other.
Hope this suits your question.



  • Jun 30th, 2016

This will not give the correct output. For example, if the file contains 1 1 1 2 2 3 4 5, then the output will be:

col  keyChange
1    1
1    0
1    0
2    1
2    0
3    1
4    1
5    1

So basically it just removes the duplicates from the file.


Pooja Trivedi

  • Jun 30th, 2016

This will not give the desired output, as we also want each duplicate record n times, where n is the number of times that record is present in the file.



  • Jul 1st, 2016

Hi Pooja,
It's absolutely possible:

Src --> Copy (link sort) --> Aggr (count rows)
Another link from Copy --> Join (Copy & Aggr) --> Filter (count = 1 for trg1, count > 1 for trg2) --> trg1
                                                  (count > 1) --> trg2

Thank you!

