Sequential File with Duplicate Records

A sequential file has 8 records with a single column; the column values, separated by spaces, are:
1 1 2 2 3 4 5 6

In a parallel job, after reading the sequential file, two more sequential files should be created: one with the duplicate records and the other with the records that have no duplicates.
File 1 records, separated by spaces: 1 1 2 2
File 2 records, separated by spaces: 3 4 5 6
How will you do it?

Questions by rajivkumar23us   answers by rajivkumar23us


khasimsyda

  • Apr 16th, 2009
 

We can segregate the file into two files by using an Aggregator stage: one file with records whose count is 1, and the other with records whose count is more than 1.

In the Properties tab of the Aggregator stage, set the Aggregation Type to "Count Rows" and the Count Output Column to "Count". Then use a Transformer to divide the file as desired.

The Transformer constraint for the non-duplicate rows is DSLink.Count = 1; the rest of the records are the duplicates.
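The Aggregator-plus-Transformer split described above can be sketched in plain Python (not DataStage code) as a two-pass count, assuming a single-column input:

```python
from collections import Counter

def split_by_count(rows):
    """Two-pass split: first count each key (the Aggregator's
    'Count Rows'), then route rows whose count is 1 to the
    non-duplicate file and the rest to the duplicate file
    (the Transformer constraint Count = 1)."""
    counts = Counter(rows)
    duplicates = [r for r in rows if counts[r] > 1]
    uniques = [r for r in rows if counts[r] == 1]
    return duplicates, uniques

dups, uniq = split_by_count([1, 1, 2, 2, 3, 4, 5, 6])
# dups -> [1, 1, 2, 2], uniq -> [3, 4, 5, 6]
```

Note that every copy of a duplicated value lands in the duplicates output, which is exactly what the question asks for.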

rameshkm

  • Aug 20th, 2009
 

Using a Transformer, split the data from the source sequential file into two links (Link A and Link B). Link A feeds an Aggregator with the Aggregation Type set to "Count Rows" and the count output column named XXX. Then perform a left outer join between Link B and the Aggregator link. After that, a Transformer segregates the data into two outputs using the constraints XXX = 1 and XXX > 1, so we get the outputs 1 1 2 2 and 3 4 5 6.


1) There is a Remove Duplicates stage through which we can delete the duplicate records.

2) Alternatively, use the Aggregator stage and specify the particular column on which you want to detect the duplicates.


winslong

  • Apr 27th, 2010
 

First, determine the record length; say it is 15.

Link the source to a Transformer, and give the Transformer two output links, each feeding a sequential file.

Open the Transformer stage. Map the source column to the target column of the first sequential file (name that column file1), and likewise map it to the second sequential file (column file2). In the derivation of the file1 column, specify the substring syntax linkname.sourcecolumnname(8,1); in the derivation of the file2 column, specify linkname.sourcecolumnname(15,8).

Example:
Derivation: Lo3.data_Line(9,6)    Column: currency
(positions 6 through 9 of the record hold the currency value)


Hi,

We have the data:
1 1 2 2 3 4 5 6

Using a Sort stage, we can send the duplicates to one link and the non-duplicates to another. In the Sort stage, enable the key change column; it identifies the duplicates by marking each row with the code 0 or 1.
After that, use a Filter stage to separate the duplicates and non-duplicates on the key change column.
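A minimal Python sketch (not DataStage code) of what the Sort stage's key change column produces on already-sorted input. Note that the first occurrence of a duplicated key is also flagged 1, which matters when you try to split on this flag alone:

```python
def add_key_change(sorted_rows):
    """Simulate the Sort stage's key change column on sorted input:
    flag 1 when the key differs from the previous row, else 0."""
    out = []
    prev = object()  # sentinel that never equals a real key
    for r in sorted_rows:
        out.append((r, 1 if r != prev else 0))
        prev = r
    return out

flags = add_key_change([1, 1, 2, 2, 3, 4, 5, 6])
# [(1, 1), (1, 0), (2, 1), (2, 0), (3, 1), (4, 1), (5, 1), (6, 1)]
```

Because the first copy of each duplicated key gets flag 1, filtering on key change = 1 yields a deduplicated stream rather than the unique-only file; a count of the group is still needed to split the data as the question requires.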

vinod uppuuturi

  • Jul 28th, 2011
 

Hello,

First of all, take the source file and connect it to a Copy stage. One link is connected to an Aggregator stage, and the other link is connected to a Lookup (or Join) stage. In the Aggregator stage, using the count function, calculate how many times each value repeats in the key column.

Connect the Aggregator output to a Filter stage, where we filter on cnt = 1 (cnt is the new count column). The output from the Filter is connected to the Lookup stage as the reference link. In the Lookup stage, set Lookup Failure = Reject.

Then give the Lookup two output links: one collects the non-repeated values, and the reject link collects the repeated values.
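The Copy → Aggregator → Filter → Lookup flow above can be sketched in Python (a simulation, not DataStage code), with a set standing in for the filtered reference link:

```python
from collections import Counter

def lookup_split(rows):
    """Simulate the lookup with a reject link: the reference link
    holds only keys whose count is 1 (the Filter on cnt = 1);
    rows that find a match leave on the main output link, and
    lookup failures leave on the reject link."""
    counts = Counter(rows)
    reference = {r for r in rows if counts[r] == 1}   # Filter: cnt = 1
    matched = [r for r in rows if r in reference]      # lookup hit
    rejected = [r for r in rows if r not in reference] # Lookup Failure = Reject
    return matched, rejected

non_repeated, repeated = lookup_split([1, 1, 2, 2, 3, 4, 5, 6])
# non_repeated -> [3, 4, 5, 6], repeated -> [1, 1, 2, 2]
```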


sudheer

  • Feb 14th, 2012
 

Hi friends,
I think, first of all, the Sequential File stage doesn't allow duplicate records.


Hemant Kanthed

  • Feb 19th, 2012
 

After the source sequential file, we can use a Sort stage with a key change column, in which 0 is assigned to duplicate records and 1 to non-duplicate records. After the Sort stage, we can use a Transformer stage with constraints: if key_change = 0, the record goes to the Seq_Duplicate file; if key_change = 1, the record goes on to further stages as required.


hussy

  • Dec 5th, 2012
 

It's very simple:
1. Introduce a Sort stage right after the sequential file.
2. Select the key change column property in the Sort stage; it assigns 0 to unique rows and 1 to duplicates, or vice versa, as you wish.
3. Put a Filter or Transformer next to it, and now you have the unique rows on one link and the duplicates on the other.
Hope this suits your question.


pooja

  • Jun 30th, 2016
 

This will not give the correct output. For example, if the file contains:

col
1
1
1
2
2
3
4
5

then the output will be:

col  keychange
1    1
1    0
1    0
2    1
2    0
3    1
4    1
5    1

so basically it just removes the duplicates from the file.


Pooja Trivedi

  • Jun 30th, 2016
 

This will not give the desired output, because we want each duplicate record to appear n times, where n is the number of times it is present in the file.


Ram

  • Jul 1st, 2016
 

Hi Pooja,
It's absolutely possible:

Src --> Copy (link sort) --> Aggr (count rows) --\
                                                  +--> Join (Copy & Aggr) --> Filter --> trg1 (count = 1)
Src --> Copy (second link) ----------------------/                                  \--> trg2 (count > 1)

Thank you!
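Ram's fork-aggregate-join flow can be simulated in Python (an illustration, not DataStage code); the joined (value, count) tuples stand in for the Join stage's output:

```python
from collections import Counter

def fork_aggregate_join(rows):
    """Simulate Src -> Copy with one leg through an Aggregator
    (count rows per key), joined back onto the other leg on the
    key, then a Filter on the count column."""
    counts = Counter(rows)                   # Aggregator leg
    joined = [(r, counts[r]) for r in rows]  # Join on the key
    trg1 = [r for r, c in joined if c == 1]  # Filter: count = 1
    trg2 = [r for r, c in joined if c > 1]   # Filter: count > 1
    return trg1, trg2

trg1, trg2 = fork_aggregate_join([1, 1, 2, 2, 3, 4, 5, 6])
# trg1 -> [3, 4, 5, 6], trg2 -> [1, 1, 2, 2]
```

Unlike a plain key change filter, the join preserves every copy of each duplicated value, which is why this flow answers Pooja's objection above.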

