Duplicate Record in Datastage

I have one scenario below, how can we achieve.
City1,City2,Distance
==============
blr, pune,1000 km
pune, blr, 1000 km
As we have same data in 2 records but need to delete any one of the duplicate record.

Interview Candidate
Mar 17th, 2017
16
12771

DataStage

Answer

Showing Answers 1 - 16 of 16 Answers

Ranajoy Gupta

Mar 20th, 2017

Use sort stage, put city1, city2 and distance as key columns, set allow duplicates as false.

Karan

May 4th, 2017

Use Remove duplicate Stage..And Select Distance as Key coloum.. one record will be deleted.

Naresh

May 11th, 2017

By using the sort stage we cant remove the duplicates. It sorts the values by mentioned columns and it wont delete the records such scenario

Bosemani

Jun 13th, 2017

Hi Priya,

In Transformer stage,

Right(City1,1)

Prashanth

Jun 13th, 2017

1. Create 2 Stage variables in transformer stage as NewCity1 & NewCity2
2. Give value for NewCity1 = if City1 < City2 then City1 else City2 end
3. Give value for NewCity2 = if City1 < City2 then City2 else City1 end
4. Use remove duplicates stage and give Key as NewCity1 & NewCity2

siva krishna

Jun 16th, 2017

Hi, Ive solved this as mentioned below.
seq stage (source)----->sort stage ------> transformer stage -----> remove dupli stage ---->seq stage(target)
read the source and add FILENAME COLUMN column and sort the 1st column in SORT STAGE and in TRANSFORMER stage take in StageVariable and write an expression like "if DSLink18.COL1 < DSLink18.COL2 then DSLink18.COL1 : DSLink18.COL2 else DSLink18.COL1 : DSLink18.COL2".
and assign this sv to a new column COl4 and in REMOVEDUPLCATE stage use COL4.
thats it. irreseptive of no of records this job will run.
thanks.
siva

Sirisha d
Profile Answers by Sirisha d

Aug 8th, 2017

Job Design :
Sequential file(I/p) --> Transformer --> Sequential file (o/p)
In the Transformer stage , define 2 stage varibles st1 and st2.
Derivation for st1 => If city2:city1=st2 Then 1 Else 0
Derivation for st2 => city1:city2
Now, in the transformer give a constraint => st1=0.
This should remove duplicate records in this scenario.

kalyanreddy

Jan 30th, 2024

use the remove duplicate stage and use the hash partition, select the key column as 3 rd column (1000 km).
and select record retained = first , it will remove the duplicate record, you can also use sort stage for this.

Duplicate Record in Datastage

Ranajoy Gupta

Karan

Naresh

Bosemani

Prashanth

siva krishna

Sirisha d
Profile Answers by Sirisha d

kalyanreddy

Give your answer:

Related Answered Questions

Related Open Questions

Latest News

It looks like you are using an AD Blocker!

Login

Duplicate Record in Datastage

Ranajoy Gupta

Karan

Naresh

Bosemani

Prashanth

siva krishna

Sirisha d Profile Answers by Sirisha d

kalyanreddy

Give your answer:

Related Answered Questions

Related Open Questions

Latest News

It looks like you are using an AD Blocker!

Sirisha d
Profile Answers by Sirisha d