Join on partitioned flow

Suppose I have two files with fields file1(A,B,C) and file2(A,B,D). If I partition both files on key A using Partition by Key and pass the outputs to a Join component whose join key is (A,B), will the join work or not, and WHY?

abinitio17
May 24th, 2008

Abinitio

Showing Answers 1 - 24 of 24 Answers

Puneet123
Jul 1st, 2008

The Partition by Key component divides the data into different partitions depending on the key. The Join component expects its inputs to be sorted on the join key if "Input must be sorted" is checked.
In this case the join will not fail, but it will not give the correct output.

sixto.dsilva
Jul 24th, 2008

The key is always important in the Join component; otherwise you may not get the desired result. In Ab Initio everything is key-based: if the key is wrong, everything can go wrong, yet the graph will still run successfully. Sometimes you may not get any result at all.

srinivas.rao.etl
Aug 13th, 2008

.dbc : Database Connectivity - In the Input Table component, specify the DB version, host location, user name, password, etc.

.cfg : Server Connectivity

Nayak_AbIntio
Jan 16th, 2009

I believe the Join component expects its data in sorted order if you check "Input must be sorted", so the input to the JOIN has to be a sorted set of data.
In that case I believe the join results would be as expected.

Could anyone please comment if you think the expected output won't be produced this way, and if so, why?

anujaja
Jan 27th, 2009

Yes, you can join, and you can get the desired result.

Subhra Dhar
Mar 25th, 2009

I do not think the join output would be correct. The partition key fields for the two input streams should be the same as the join key fields in the Join component; otherwise the data from stream 1 would be partitioned differently from the data from stream 2, and the Join component would not find all matches.

vss34
Jul 16th, 2009

The partition key and join key do NOT have to be the exact same. In order to join properly, you just have to make sure the records being compared are in the same partition.

So if the partition key is broader than the join key (which it is in this case since the partition key is just field A, and the join key is A and B), then the join will work fine as long as you sort the data after the partition or make it an in-memory join. For example, all records on both datasets with a value of 1 for field A will be placed in the same partition regardless of the value of field B. So then values for field A,B as (1,X) where X is any value on both datasets will join up correctly since they will be in the same partition.

If the partition key is narrower than the join key (for example, the partition key is A and B, and the join key is just A), then the join will most likely not work correctly since you cannot guarantee the hashing algorithm of partition by key will place the proper records in the same partition.
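
The argument above can be checked with a small simulation. This is a plain Python sketch, not Ab Initio code: the functions `partition_by_key`, `inner_join`, and `partitioned_join` are hypothetical stand-ins, `crc32` is an assumed hash (the hash Partition by Key really uses is not stated in this thread), and the sample records are made up.

```python
from collections import defaultdict
from zlib import crc32

def partition_by_key(records, key_fields, n_partitions):
    """Hash-partition dict records on key_fields (stand-in for Partition by Key)."""
    parts = defaultdict(list)
    for rec in records:
        key = "|".join(str(rec[f]) for f in key_fields)
        parts[crc32(key.encode()) % n_partitions].append(rec)
    return parts

def inner_join(left, right, key_fields):
    """In-memory inner join on key_fields; returns merged records."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[f] for f in key_fields)].append(r)
    return [{**l, **r}
            for l in left
            for r in index.get(tuple(l[f] for f in key_fields), [])]

def partitioned_join(file1, file2, partition_key, join_key, n_partitions=4):
    """Partition both inputs the same way, then join partition by partition."""
    p1 = partition_by_key(file1, partition_key, n_partitions)
    p2 = partition_by_key(file2, partition_key, n_partitions)
    out = []
    for p in range(n_partitions):
        out.extend(inner_join(p1.get(p, []), p2.get(p, []), join_key))
    return out

# Made-up sample data: file1(A,B,C) and file2(A,B,D)
file1 = [{"A": a, "B": b, "C": f"c{a}{b}"} for a in range(1, 6) for b in "XYZ"]
file2 = [{"A": a, "B": b, "D": f"d{a}{b}"} for a in range(1, 6) for b in "XY"]

by_ab = lambda r: (r["A"], r["B"])

# Case from the question: partition on A, join on (A,B)
reference = inner_join(file1, file2, ["A", "B"])          # unpartitioned join
wide = partitioned_join(file1, file2, ["A"], ["A", "B"])
print(sorted(wide, key=by_ab) == sorted(reference, key=by_ab))

# Reverse case: partition on (A,B), join on A only
reference_a = inner_join(file1, file2, ["A"])
narrow = partitioned_join(file1, file2, ["A", "B"], ["A"])
print(len(narrow), "of", len(reference_a), "matches found")
```

In the first case every pair of records that agrees on (A,B) also agrees on A, so both records land in the same partition and no match is missed. In the second case two records with the same A but different B can be hashed to different partitions, so the partitioned join can only find a subset of the matches.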

Abhishek

Feb 5th, 2013

Yes, this is going to work fine provided you do it as an in-memory join. Let me explain why. First, whenever you use field A as the partition key, records with the same value of A in both files will definitely go into the same partition. For example, say the values in my key field are 2, 3, 4 in both files. Now say that, by hash value calculation, 2 and 3 go to partition 1 and 4 goes to partition 2. Since the WHOLE RECORD is available in that particular partition, the join works just fine.

If the partitioning had been done on keys A,B, the performance of the graph would have been better.

This scenario would have failed if you had given B,A as the key in the join instead of A,B.

Hope this helps.
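
To illustrate why the answers distinguish an in-memory join from a join with "Input must be sorted" checked, here is a minimal Python sketch of the two strategies applied to the records of a single partition. It is not Ab Initio code: `hash_join` and `merge_join` are hypothetical names, and the records are made-up (A, B, value) tuples.

```python
def hash_join(left, right, key):
    """In-memory join: build a lookup on `right`, probe with `left`. No sort needed."""
    index = {}
    for r in right:
        index.setdefault(key(r), []).append(r)
    return [(l, r) for l in left for r in index.get(key(l), [])]

def merge_join(left, right, key):
    """Sort-merge join: assumes BOTH inputs are already sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # ...and pair it with every left record carrying the same key.
            while i < len(left) and key(left[i]) == kl:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out

# Records of one partition after partitioning on A; arrival order is not sorted on (A,B).
left = [(2, "y", "c1"), (1, "x", "c2"), (2, "x", "c3")]
right = [(2, "x", "d1"), (1, "x", "d2"), (2, "y", "d3")]
join_key = lambda rec: (rec[0], rec[1])  # join key (A,B)

hash_result = hash_join(left, right, join_key)
unsorted_result = merge_join(left, right, join_key)
sorted_result = merge_join(sorted(left, key=join_key),
                           sorted(right, key=join_key), join_key)

print(len(hash_result), len(unsorted_result), len(sorted_result))
```

The merge-based join silently skips matches when its inputs are not in key order, while the lookup-based join does not care about order. Sorting each partition on (A,B) before the merge makes both strategies agree, which matches the advice to either sort after partitioning or use an in-memory join.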
