How can I identify the duplicate rows in a sequential or comma-delimited file?

The case is this: the source has 4 values, such as agent ID, agent name, etc. Our requirement is that the ID must not be repeated. How can I identify the duplicate rows, set a flag, and send the rejects to a specified reject file? The source system's data is given to us directly, which is why we are getting these duplicates. If a primary key had already been set up, this would have been very easy. Thanks in advance.

Interview Candidate
Apr 16th, 2006


Mansoor

Jan 29th, 2007

Sort the sequential file on the key AGENT_ID and set the option "Create Key Change Column" to True in the Sort stage. The first record for each AGENT_ID gets the value 1 in the KeyChange field, and every later record with the same key gets the value 0 (zero). Now reject the records that have the value 0.
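To make the key-change idea concrete outside DataStage, here is a minimal Python sketch of the same logic. It is only an illustration, not how the Sort stage is implemented; the function names, the `key_change` column name, and the sample data are all made up for the example.

```python
import csv
import io


def flag_key_changes(rows, key):
    """Sort rows by `key` and add key_change: 1 for the first row of a key, 0 for repeats."""
    ordered = sorted(rows, key=lambda r: r[key])  # stable sort keeps input order within a key
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        change = 0 if row[key] == previous else 1
        flagged.append({**row, "key_change": change})
        previous = row[key]
    return flagged


def split_rejects(flagged):
    """Rows with key_change == 1 are kept; rows with key_change == 0 go to the rejects."""
    keep = [r for r in flagged if r["key_change"] == 1]
    rejects = [r for r in flagged if r["key_change"] == 0]
    return keep, rejects


# Hypothetical sample standing in for the comma-delimited source file.
sample = io.StringIO("agent_id,agent_name\n101,Ann\n102,Bob\n101,Ann B\n")
rows = list(csv.DictReader(sample))
keep, rejects = split_rejects(flag_key_changes(rows, "agent_id"))
```

In a real job the `rejects` list would be written to the reject file and `keep` would continue downstream.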

the_xxx

Jul 3rd, 2007

Hi, if you're working on Server jobs, sort the data first and then use a Hashed File stage, which eliminates duplicates on its key.
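As a rough analogy only, a keyed store behaves like a dictionary: writing a row whose key already exists replaces the earlier row, so one row per key survives. The Python sketch below assumes that last-write-wins behaviour; the function name and sample rows are hypothetical.

```python
def dedupe_last_wins(rows, key):
    """Keep one row per key; a later row with the same key overwrites the earlier one."""
    table = {}
    for row in rows:
        table[row[key]] = row  # same key: the previous row is replaced
    return list(table.values())


# Hypothetical sample rows.
rows = [
    {"agent_id": "101", "agent_name": "Ann"},
    {"agent_id": "102", "agent_name": "Bob"},
    {"agent_id": "101", "agent_name": "Ann B"},
]
unique_rows = dedupe_last_wins(rows, "agent_id")
```

Note that this approach silently drops the duplicates, so on its own it does not give you a flag or a reject file.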

ds_ng

Jul 23rd, 2008

If you're working on Server jobs, use a Sort stage followed by an Aggregator stage. Group on the key and use a property like 'Last' or 'First'. The duplicate rows will then be removed.
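The sort-then-aggregate idea can be sketched in Python with `itertools.groupby`, which, like an aggregator fed sorted input, groups adjacent rows that share a key. This is an illustration under that assumption, not DataStage code; the helper names and sample rows are invented.

```python
from itertools import groupby
from operator import itemgetter


def first_per_key(rows, key):
    """Sort on `key`, then keep the first row of each group."""
    ordered = sorted(rows, key=itemgetter(key))  # stable, so input order survives within a key
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


def last_per_key(rows, key):
    """Sort on `key`, then keep the last row of each group."""
    ordered = sorted(rows, key=itemgetter(key))
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical sample rows.
rows = [
    {"agent_id": "101", "agent_name": "Ann"},
    {"agent_id": "102", "agent_name": "Bob"},
    {"agent_id": "101", "agent_name": "Ann B"},
]
firsts = first_per_key(rows, "agent_id")
lasts = last_per_key(rows, "agent_id")
```

Whether 'First' or 'Last' is the right choice depends on which copy of a duplicated agent should be treated as the valid one.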