Which partitioning method should we use for the Aggregator stage in parallel jobs?

izack
Jan 25th, 2007


srinivasguptha

Jan 28th, 2007

By default, this stage uses Auto partitioning. The best partitioning method depends on the operating mode of this stage and of the preceding stage. If the Aggregator is operating in sequential mode, it first collects the data using the default Auto collection method. If the Aggregator is in parallel mode, any partitioning type can be selected from the drop-down list on the Partitioning tab. Generally, Auto or Hash is used.
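To illustrate the sequential case, here is a minimal sketch in plain Python. It is not DataStage code or its API; `collect`, `aggregate`, and the sample rows are hypothetical names and data chosen for illustration. The idea is that once all upstream partitions are collected into one stream, a single aggregation sees every row for a key, so the upstream partitioning does not affect the totals.

```python
from collections import Counter

def collect(partitions):
    """Merge all upstream partitions into a single stream (hypothetical helper)."""
    return [row for part in partitions for row in part]

def aggregate(rows):
    """Sum the amount per grouping key over one stream of (key, amount) rows."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

# Two upstream partitions; key "A" is split across both of them.
upstream = [[("A", 10), ("B", 5)], [("A", 20), ("C", 3)]]

# Sequential mode: collect first, then aggregate once.
sequential_result = aggregate(collect(upstream))
```

Because the aggregation runs once over the collected data, the split of key "A" across the two upstream partitions does not matter here.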

Thanks

Srinivas

harishsj

Aug 9th, 2007

I think the above answer is a little misleading. Most of the time you'll be using the Aggregator stage in parallel mode. If you use Auto partitioning, there is no guarantee that rows with the same grouping key values will land in the same partition, in which case the result of the aggregation will not be useful.

1) Identify the grouping keys you want to aggregate on.
2) In a stage prior to the Aggregator, hash partition on the grouping keys. This ensures that all rows with the same grouping key values land in the same partition.
3) The result of the aggregation will then be correct.
4) I even think the Entire partitioning method can be useful, but it will have slightly higher overhead than hash partitioning.
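The effect described in the steps above can be sketched in plain Python. This is only a simulation under stated assumptions, not DataStage code: the function names, the two-partition setup, the sample rows, and the use of `zlib.crc32` as the hash function are all hypothetical choices for illustration. Each partition is aggregated independently, as a parallel stage would do.

```python
from collections import defaultdict
import zlib

def round_robin_partition(rows, n):
    """Deal rows to partitions in turn, ignoring the grouping key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is used only because it is stable across runs (an assumption
        # made for this sketch, not the hash a real engine uses).
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

def aggregate_per_partition(parts, key, value):
    """Sum `value` by `key` inside each partition, then concatenate the outputs."""
    out = []
    for part in parts:
        totals = defaultdict(int)
        for row in part:
            totals[row[key]] += row[value]
        out.extend(sorted(totals.items()))
    return out

rows = [
    {"cust": "A", "amt": 10},
    {"cust": "A", "amt": 20},
    {"cust": "B", "amt": 5},
    {"cust": "B", "amt": 7},
    {"cust": "A", "amt": 1},
    {"cust": "C", "amt": 3},
]

# Key-agnostic partitioning: "A" and "B" each end up in both partitions,
# so the output contains partial totals for them.
rr_result = aggregate_per_partition(round_robin_partition(rows, 2), "cust", "amt")

# Hash partitioning on the grouping key: one total per key.
hash_result = aggregate_per_partition(hash_partition(rows, 2, "cust"), "cust", "amt")
```

With the key-agnostic split, `rr_result` holds five rows because "A" and "B" are each summed twice, once per partition. With hash partitioning, `hash_result` holds exactly one total per customer.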

Hope that helps....

Thanks
Harish

manoharkolukula

Feb 12th, 2008

Same as Harish.

swapnilverma

Feb 19th, 2008

It is always preferable to use a Sort stage before the Aggregator stage.
So, based on the aggregation logic, we should hash partition the incoming data on the keys and sort it.

Then we can use Same partitioning on the Aggregator stage.

This is the most commonly used approach.
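Why the sort matters can be sketched in plain Python. Again, this is a simulation and not DataStage code: `hash_partition_rows`, `sorted_mode_sum`, and the sample data are hypothetical, and `zlib.crc32` stands in for whatever hash a real engine uses. The sketch assumes a streaming aggregation that emits a total each time the key changes, which is only correct when each partition is already sorted on the grouping key.

```python
from itertools import groupby
from operator import itemgetter
import zlib

def hash_partition_rows(rows, n):
    """Route (key, amount) rows so that equal keys share a partition."""
    parts = [[] for _ in range(n)]
    for key, amount in rows:
        parts[zlib.crc32(key.encode()) % n].append((key, amount))
    return parts

def sorted_mode_sum(part):
    """Streaming sum that closes a group whenever the key changes.

    Only correct if `part` is already sorted on the key.
    """
    return [(key, sum(amount for _, amount in group))
            for key, group in groupby(part, key=itemgetter(0))]

rows = [("A", 10), ("B", 5), ("A", 20), ("C", 3), ("B", 7), ("A", 1)]

# Without a sort, the streaming aggregation fragments the groups.
fragmented = sorted_mode_sum(rows)

# Hash partition on the key, sort inside each partition, then aggregate
# each partition as it is (the equivalent of keeping the same partitioning).
parts = hash_partition_rows(rows, 2)
sorted_out = [row
              for part in parts
              for row in sorted_mode_sum(sorted(part, key=itemgetter(0)))]
```

On the unsorted input, `fragmented` contains six rows because every key change starts a new group. After hash partitioning and sorting, `sorted_out` contains one total per key.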

yassine

Jul 12th, 2017

Hello Harish, I would like to ask you a question: how can I choose the appropriate partitioning for each stage and job, and how should I analyse the situation?
Thank you.

Anjaneyulu Pagadala

Mar 15th, 2018

Hash partitioning with in-link sorting on the grouping keys gives better performance and correct results when the stage runs in parallel mode. Auto partitioning will give correct results if no sorting on only one of the grouping keys happened in the previous stage.
