Sequential File with Duplicate Records

A sequential file has 8 records with a single column; the column values, separated by spaces, are:
1 1 2 2 3 4 5 6

In a parallel job, after reading the sequential file, two more sequential files should be created: one with the duplicate records and the other with the records that have no duplicates.
File 1 records, separated by spaces: 1 1 2 2
File 2 records, separated by spaces: 3 4 5 6
How will you do it?

Questions by rajivkumar23us   answers by rajivkumar23us


khasimsyda

  • Apr 16th, 2009
 

We can segregate the file into two files by using an Aggregator stage: one file with records whose count is 1, and the other with records whose count is more than 1.

In the Properties tab of the Aggregator stage, set the Aggregation Type to "Count Rows" and the Count Output Column to "Count". Then use a Transformer to divide the file as desired.

The Transformer constraint for the non-duplicate rows is DSLink.Count = 1; the rest of the records are the duplicates.
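The Aggregator-plus-Transformer split described above can be sketched in plain Python (not DataStage code) as a two-pass count, assuming a single-column input:

```python
from collections import Counter

def split_by_count(rows):
    """Two-pass split: first count each key (the Aggregator's
    'Count Rows'), then route rows whose count is 1 to the
    non-duplicate file and the rest to the duplicate file
    (the Transformer constraint Count = 1)."""
    counts = Counter(rows)
    duplicates = [r for r in rows if counts[r] > 1]
    uniques = [r for r in rows if counts[r] == 1]
    return duplicates, uniques

dups, uniq = split_by_count([1, 1, 2, 2, 3, 4, 5, 6])
# dups -> [1, 1, 2, 2], uniq -> [3, 4, 5, 6]
```

Note that every copy of a duplicated value lands in the duplicates output, which is exactly what the question asks for.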

rameshkm

  • Aug 20th, 2009
 

Using a Transformer, split the data from the source sequential file into two links (Link A and Link B). Link A feeds an Aggregator with the Aggregation Type set to "Count Rows" and the count output column named XXX. Then perform a left outer join between Link B and the Aggregator link. After that, a Transformer segregates the data into two outputs using the constraints XXX = 1 and XXX > 1, so we get the outputs 1 1 2 2 and 3 4 5 6.


1) There is a Remove Duplicates stage through which we can delete the duplicate records.

2) Alternatively, use the Aggregator stage and specify the particular column on which you want to detect the duplicates.


winslong

  • Apr 27th, 2010
 

First, determine the record length; say it is 15.

Link the source to a Transformer, and give the Transformer two output links, each feeding a sequential file.

Open the Transformer stage. Map the source column to the target column of the first sequential file (name that column file1), and likewise map it to the second sequential file (column file2). In the derivation of the file1 column, specify the substring syntax linkname.sourcecolumnname(8,1); in the derivation of the file2 column, specify linkname.sourcecolumnname(15,8).

Example:
Derivation: Lo3.data_Line(9,6)    Column: currency
(positions 6 through 9 of the record hold the currency value)


Hi,

We have the data:
1 1 2 2 3 4 5 6

Using a Sort stage, we can send the duplicates to one link and the non-duplicates to another. In the Sort stage, enable the key change column; it identifies the duplicates by marking each row with the code 0 or 1.
After that, use a Filter stage to separate the duplicates and non-duplicates on the key change column.
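A minimal Python sketch (not DataStage code) of what the Sort stage's key change column produces on already-sorted input. Note that the first occurrence of a duplicated key is also flagged 1, which matters when you try to split on this flag alone:

```python
def add_key_change(sorted_rows):
    """Simulate the Sort stage's key change column on sorted input:
    flag 1 when the key differs from the previous row, else 0."""
    out = []
    prev = object()  # sentinel that never equals a real key
    for r in sorted_rows:
        out.append((r, 1 if r != prev else 0))
        prev = r
    return out

flags = add_key_change([1, 1, 2, 2, 3, 4, 5, 6])
# [(1, 1), (1, 0), (2, 1), (2, 0), (3, 1), (4, 1), (5, 1), (6, 1)]
```

Because the first copy of each duplicated key gets flag 1, filtering on key change = 1 yields a deduplicated stream rather than the unique-only file; a count of the group is still needed to split the data as the question requires.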

vinod uppuuturi

  • Jul 28th, 2011
 

Hello,

First of all, take the source file and connect it to a Copy stage. One link is connected to an Aggregator stage, and the other link is connected to a Lookup (or Join) stage. In the Aggregator stage, using the count function, calculate how many times each value repeats in the key column.

Connect the Aggregator output to a Filter stage, where we filter on cnt = 1 (cnt is the new count column). The output from the Filter is connected to the Lookup stage as the reference link. In the Lookup stage, set Lookup Failure = Reject.

Then give the Lookup two output links: one collects the non-repeated values, and the reject link collects the repeated values.
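The Copy → Aggregator → Filter → Lookup flow above can be sketched in Python (a simulation, not DataStage code), with a set standing in for the filtered reference link:

```python
from collections import Counter

def lookup_split(rows):
    """Simulate the lookup with a reject link: the reference link
    holds only keys whose count is 1 (the Filter on cnt = 1);
    rows that find a match leave on the main output link, and
    lookup failures leave on the reject link."""
    counts = Counter(rows)
    reference = {r for r in rows if counts[r] == 1}   # Filter: cnt = 1
    matched = [r for r in rows if r in reference]      # lookup hit
    rejected = [r for r in rows if r not in reference] # Lookup Failure = Reject
    return matched, rejected

non_repeated, repeated = lookup_split([1, 1, 2, 2, 3, 4, 5, 6])
# non_repeated -> [3, 4, 5, 6], repeated -> [1, 1, 2, 2]
```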


sudheer

  • Feb 14th, 2012
 

Hi friends,
I think, first of all, the Sequential File stage doesn't allow duplicate records.


Hemant Kanthed

  • Feb 19th, 2012
 

After the source sequential file, we can use a Sort stage with a key change column, in which 0 is assigned to duplicate records and 1 to non-duplicate records. After the Sort stage, we can use a Transformer stage with constraints: if key_change = 0, the record goes to the Seq_Duplicate file; if key_change = 1, the record goes on to further stages as required.


hussy

  • Dec 5th, 2012
 

It's very simple:
1. Introduce a Sort stage right after the sequential file.
2. Select the key change column property in the Sort stage; it assigns 0 to unique rows and 1 to duplicates, or vice versa, as you wish.
3. Put a Filter or Transformer next to it, and now you have the unique rows on one link and the duplicates on the other.
Hope this suits your question.


pooja

  • Jun 30th, 2016
 

This will not give the correct output. For example, if the file contains:

col
1
1
1
2
2
3
4
5

then the output will be:

col  keychange
1    1
1    0
1    0
2    1
2    0
3    1
4    1
5    1

so basically it just removes the duplicates from the file.


Pooja Trivedi

  • Jun 30th, 2016
 

This will not give the desired output, because we want each duplicate record to appear n times, where n is the number of times it is present in the file.


Ram

  • Jul 1st, 2016
 

Hi Pooja,
It's absolutely possible:

Src --> Copy (link sort) --> Aggr (count rows) --\
                                                  +--> Join (Copy & Aggr) --> Filter --> trg1 (count = 1)
Src --> Copy (second link) ----------------------/                                  \--> trg2 (count > 1)

Thank you!
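Ram's fork-aggregate-join flow can be simulated in Python (an illustration, not DataStage code); the joined (value, count) tuples stand in for the Join stage's output:

```python
from collections import Counter

def fork_aggregate_join(rows):
    """Simulate Src -> Copy with one leg through an Aggregator
    (count rows per key), joined back onto the other leg on the
    key, then a Filter on the count column."""
    counts = Counter(rows)                   # Aggregator leg
    joined = [(r, counts[r]) for r in rows]  # Join on the key
    trg1 = [r for r, c in joined if c == 1]  # Filter: count = 1
    trg2 = [r for r, c in joined if c > 1]   # Filter: count > 1
    return trg1, trg2

trg1, trg2 = fork_aggregate_join([1, 1, 2, 2, 3, 4, 5, 6])
# trg1 -> [3, 4, 5, 6], trg2 -> [1, 1, 2, 2]
```

Unlike a plain key change filter, the join preserves every copy of each duplicated value, which is why this flow answers Pooja's objection above.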

