GeekInterview.com


Question:  Join on partitioned flow

Suppose I have 2 files with fields file1(A,B,C) and file2(A,B,D). If we partition both files on key A using partition by key and pass the outputs to a join component whose join key is (A,B), will the join work or not, and WHY?


July 07, 2009 15:16:29 #7
 vss34   Member Since: July 2009    Total Comments: 1 

RE: Join on partitioned flow
 
The partition key and join key do NOT have to be exactly the same.  For the join to work properly, you just have to make sure that the records being compared are in the same partition.

So if the partition key is broader than the join key, meaning it uses only a subset of the join fields (which is the case here, since the partition key is just field A and the join key is A and B), then the join will work fine as long as you sort the data on the join key after the partition or make it an in-memory join.  For example, all records in both datasets with a value of 1 for field A will be placed in the same partition regardless of the value of field B.  So records with (A,B) = (1,X), where X is any value, will join correctly since they will be in the same partition.
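The reasoning above can be checked with a small simulation. This is only a sketch in plain Python, not Ab Initio code: `partition_by_key`, `inner_join`, and `partitioned_join` are hypothetical helpers, `crc32` is an assumed stand-in for whatever hash the real partitioner uses, and the sample records are made up. It partitions both inputs on A, joins each partition on (A,B), and compares the result with an unpartitioned join.

```python
from collections import defaultdict
from zlib import crc32

def partition_by_key(records, key_fields, n_partitions):
    """Assign each record to a partition by hashing its key fields (stand-in hash)."""
    parts = defaultdict(list)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        parts[crc32(repr(key).encode()) % n_partitions].append(rec)
    return parts

def inner_join(left, right, join_fields):
    """Hash-based inner join on join_fields (roughly like an in-memory join)."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[f] for f in join_fields)].append(r)
    out = []
    for l in left:
        for r in index.get(tuple(l[f] for f in join_fields), []):
            out.append({**l, **r})
    return out

def partitioned_join(left, right, partition_fields, join_fields, n_partitions=4):
    """Partition both inputs the same way, then join partition by partition."""
    lp = partition_by_key(left, partition_fields, n_partitions)
    rp = partition_by_key(right, partition_fields, n_partitions)
    out = []
    for p in range(n_partitions):
        out.extend(inner_join(lp.get(p, []), rp.get(p, []), join_fields))
    return out

def canon(rows):
    """Order-independent form of a result set, for comparison."""
    return sorted(tuple(sorted(r.items())) for r in rows)

# Made-up sample data: file1(A,B,C) and file2(A,B,D).
file1 = [{"A": a, "B": b, "C": f"c{a}{b}"} for a in range(1, 6) for b in "XYZ"]
file2 = [{"A": a, "B": b, "D": f"d{a}{b}"} for a in range(1, 6) for b in "XYZ"]

expected = canon(inner_join(file1, file2, ["A", "B"]))
actual = canon(partitioned_join(file1, file2, ["A"], ["A", "B"]))
print(actual == expected)  # True
```

The result does not depend on the hash chosen: any two records that agree on (A,B) also agree on A, so they always receive the same partition number.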

If the partition key is narrower than the join key, meaning it includes fields that are not part of the join key (for example, the partition key is A and B, and the join key is just A), then the join will most likely not work correctly.  Records with the same value of A but different values of B can hash to different partitions, so you cannot guarantee that partition by key will place all matching records in the same partition.
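A two-record counterexample makes the failure concrete. This is a sketch under stated assumptions: `toy_partition` is a hypothetical stand-in for the partitioner's hash (the real algorithm is not given in this thread), and the two records are invented for illustration.

```python
def toy_partition(rec, key_fields, n_partitions):
    """Hypothetical stand-in for a hash partitioner: mix the key fields, take modulo."""
    h = 0
    for f in key_fields:
        h = h * 31 + rec[f]
    return h % n_partitions

left = {"A": 1, "B": 1, "C": "left"}     # record from file1
right = {"A": 1, "B": 2, "D": "right"}   # record from file2

# Both records agree on join key A, so a join on A should pair them.
same_join_key = left["A"] == right["A"]

# Partitioned on (A, B): the keys differ, so the records can end up apart.
p_left = toy_partition(left, ["A", "B"], 2)    # (1*31 + 1) % 2 = 0
p_right = toy_partition(right, ["A", "B"], 2)  # (1*31 + 2) % 2 = 1
missed = same_join_key and p_left != p_right

# Partitioned on A alone: same key, so always the same partition.
q_left = toy_partition(left, ["A"], 2)
q_right = toy_partition(right, ["A"], 2)

print(missed)             # True
print(q_left == q_right)  # True
```

With a different hash these two particular records might happen to land together, which is why the failure shows up as intermittently missing matches rather than a consistent error.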
     

 
