How to remove reverse duplicates in datastage ?

Objective: The interface aims to consolidate the round-trip ticket cost of passengers.
Description: The source file is a .txt file, train_route_src.txt, consolidated with trains from a particular source to destination. There are many repetitions in the source file (Reverse Duplication: eg. BLR --> BBS and BBS --> BLR). Remove the reverse duplicates and maintain the target in the following manner.
SRC-DES-SRC Distance Fare
SRC1-DES1-SRC1 2*Distance 2*fare
Source File:
SBC BBS 1500 1500
GWH PUN 1750 1800
PUR CTC 90 90
FAR GHZ 750 750
BBS SBC 1500 1500
PUN GWH 1750 1800
CTC PUR 90 90
SIM OOT 500 500
GHZ FAR 750 750
OOT SIM 500 500
BRD REI 900 900
SRI CHN 4500 4500
REI BRD 900 900
GHN MNG 1650 1650
HWH BBS 550 550
MNG GHN 1650 1650
BBS HWH 550 550
CHN SRI 4500 4500
BBS GHN 1800 1800
BBS OOT 1500 1500
BBS DEL 2200 2200
OOT BBS 1500 1500
DEL BBS 2200 2200

Showing Answers 1 - 9 of 9 Answers

D Hemakumar

  • Apr 24th, 2015

generate the two columns col1 and col2 and generate the sequence numbers for both columns in transformer write constraint col1>col2
take the source destination and distancefare columns in the output link

i hope this may work

  Was this answer useful?  Yes

Pavan Kumar

  • May 7th, 2015

Source >> Pivot enterprise {(horizontal pivot) combine both source and destination} >> remove duplicate stage >> Pivot enterprise stage {(vertical pivote) group by fare and set array size to 2, this will split the data into two columns (source and destination column)} >> Dataset/ Fileset.

  Was this answer useful?  Yes


  • Jul 15th, 2015

source -> Sort -> filter > output
We can have sort stage and set the property create key change column give it a name "KeyChange".Map this column to output link.
Put a filter stage after sort satge and write a where condition KeyChange =1 and set output Reject =False
Hope this will work

  Was this answer useful?  Yes

  • Aug 10th, 2015

See if this works


  2. SELECT a.Source1,a.destination,b.destination, 2*a.distance,2*a.fare

  3. FROM train A, train B

  4. WHERE (A.Source1=B.destination

  5. AND A.destination=B.Source1)

  Was this answer useful?  Yes


  • Jun 21st, 2016

ANS) seq---->Pivot---->RD----->Pivot----->DS
first take PIVOT Horizontal -->pivot properties -->city1,city2(derivation)
In RD we can give combination of key =city & key=distance
In Pivot Vertical select Group by Distance & Pivot city.
we will get out put
i tried this its me working fine..

  Was this answer useful?  Yes

Naga Prasad

  • Jul 7th, 2016

I Think Sreenivass soultion will work.

  Was this answer useful?  Yes

Kumar Sonal

  • Jul 8th, 2016

1.Take another column called KEY in transformer. Derivation for key will be like If Source
2. Use sort stage to remove duplicate values on the basis of key column derived in transformer

  Was this answer useful?  Yes


  • Feb 15th, 2018

Ans--> Seq_File------>Sort------>Filter------->Seq_file
Sort Stage--->Generate key change column(it return for first value 1 and for duplicate 0) then used filter stahe
Filter Stage----> Filter by Key change column=1.
I think you will get the result.

  Was this answer useful?  Yes


  • Mar 2nd, 2018

I don't think you got the output for city in the format of SRC1-DES1-SRC1 (I mean two city names)? I tired your way but didn't get the expected output.

  Was this answer useful?  Yes

Give your answer:

If you think the above answer is not correct, Please select a reason and add your answer below.


Related Answered Questions


Related Open Questions