GeekInterview.com

DataStage
How can I identify the duplicate rows in a sequential or comma-delimited file?
The case is this: the source has 4 values such as agent ID, agent name, etc. Our requirement is that the ID must not be repeated. How can I identify the duplicate rows, set a flag, and send the rejects to a specified reject file? The source system's data is given to us directly, which is why we are getting these duplicates. If a primary key had already been set up, this would have been very easy.
Thanks in advance.

  
Total Answers and Comments: 3 Last Update: July 23, 2008     Asked by: DS beginer 
  
January 29, 2007 14:24:28   #1  
Mansoor        

RE: How can I identify the duplicate rows in a seq or ...
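Sort the sequential file on the key AGENT_ID and set the option "Create Key Change Column" to True in the Sort stage. The stage adds a keyChange column that holds 1 for the first record of each key value and 0 for every later record with the same key, so the duplicate records carry the value 0 (zero). Now reject the records that have the value 0, for example with a constraint in a downstream Transformer or Filter stage that routes them to the reject file.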
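To make the key-change idea concrete, here is a rough Python sketch of the same logic outside DataStage. It is only an emulation, not DataStage configuration: the function names, the AGENT_NAME column, and the sample rows are made up for illustration, and only AGENT_ID comes from the answer above.

```python
import csv
import io


def flag_key_changes(rows, key="AGENT_ID"):
    """Sort rows on the key and add keyChange: 1 for the first row of
    each key value, 0 for every repeat (hypothetical helper)."""
    ordered = sorted(rows, key=lambda r: r[key])  # stable sort keeps input order within a key
    flagged = []
    previous = object()  # sentinel that never equals a real key value
    for row in ordered:
        change = 1 if row[key] != previous else 0
        flagged.append({**row, "keyChange": change})
        previous = row[key]
    return flagged


def split_rejects(flagged):
    """Route keyChange == 1 rows to the main output and keyChange == 0 rows to rejects."""
    accepted = [r for r in flagged if r["keyChange"] == 1]
    rejects = [r for r in flagged if r["keyChange"] == 0]
    return accepted, rejects


# Illustrative comma-delimited input; the values are invented.
SAMPLE = """AGENT_ID,AGENT_NAME
101,Asha
102,Ben
101,Asha K
103,Chen
102,Benjamin
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
accepted, rejects = split_rejects(flag_key_changes(rows))
```

In this sketch the first row seen for each AGENT_ID goes to `accepted` and the later ones go to `rejects`, which would then be written to the reject file.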
 
July 03, 2007 01:54:09   #2  
the_xxx Member Since: March 2007   Contribution: 12    

RE: How can I identify the duplicate rows in a seq or ...
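Hi, if you're working on Server jobs, sort the data first and then write it to a Hashed File stage, which eliminates duplicates because a later row with the same key overwrites the earlier one. Note that this removes the duplicates rather than flagging them, so on its own it does not produce a reject file.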
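The overwrite-on-key behaviour can be pictured with a plain Python dictionary. This is only an analogy for how a keyed store drops duplicates, not how the Hashed File stage is implemented; the function name, the AGENT_NAME column, and the sample rows are hypothetical.

```python
def load_keyed_store(rows, key="AGENT_ID"):
    """Emulate a keyed write: a later row with the same key replaces
    the earlier one, so one row per key survives (hypothetical helper)."""
    store = {}
    for row in rows:
        store[row[key]] = row  # same key: the previous row is overwritten
    return list(store.values())


# Invented sample rows, in arrival order.
rows = [
    {"AGENT_ID": "101", "AGENT_NAME": "Asha"},
    {"AGENT_ID": "102", "AGENT_NAME": "Ben"},
    {"AGENT_ID": "101", "AGENT_NAME": "Asha K"},
    {"AGENT_ID": "103", "AGENT_NAME": "Chen"},
]

deduplicated = load_keyed_store(rows)
```

Because the last write wins, the order in which rows arrive decides which duplicate survives, which is why sorting beforehand matters. The overwritten rows are simply lost here, so this approach alone cannot feed a reject file.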
 
July 23, 2008 03:21:31   #3  
ds_ng Member Since: July 2008   Contribution: 3    

RE: How can I identify the duplicate rows in a seq or comma delimited file?
If you're working on Server jobs, use a Sort stage followed by an Aggregator stage. Group on the key and use a property such as 'First' or 'Last' for the other columns. The output then has one row per key, so the duplicate rows are removed.
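As a rough illustration of the sort-then-aggregate approach, the Python sketch below groups sorted rows by key and keeps either the first or the last row of each group. It emulates the idea only and is not Aggregator stage configuration; the function name, the `keep` parameter, the AGENT_NAME column, and the sample rows are assumptions made for the example.

```python
from itertools import groupby
from operator import itemgetter


def keep_one_per_key(rows, key="AGENT_ID", keep="first"):
    """Sort on the key, group equal keys together, and keep the first
    or last row of each group (hypothetical helper)."""
    ordered = sorted(rows, key=itemgetter(key))  # groupby needs sorted input
    result = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        members = list(group)
        result.append(members[0] if keep == "first" else members[-1])
    return result


# Invented sample rows.
rows = [
    {"AGENT_ID": "102", "AGENT_NAME": "Ben"},
    {"AGENT_ID": "101", "AGENT_NAME": "Asha"},
    {"AGENT_ID": "101", "AGENT_NAME": "Asha K"},
    {"AGENT_ID": "102", "AGENT_NAME": "Benjamin"},
]

first_rows = keep_one_per_key(rows, keep="first")
last_rows = keep_one_per_key(rows, keep="last")
```

The sort step matters because the grouping only merges adjacent rows with equal keys. Like the previous answer, this removes duplicates without flagging them, so a separate step would still be needed to capture rejects.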
 


 

Copyright © 2005 - 2008 GeekInterview.com. All Rights Reserved