RE: Which partition we have to use for Aggregate Stag...
By default, this stage uses Auto partitioning. The best partitioning depends on the execution mode (sequential or parallel) of this stage and of the preceding stage. If the Aggregator runs in sequential mode, it first collects the data from all partitions into a single stream, using the default Auto collection method, before aggregating it. If the Aggregator runs in parallel mode, we can choose any partitioning type from the drop-down list on the Partitioning tab. Generally, Auto or Hash is used.
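Here is a minimal Python simulation of the sequential case. It is not DataStage code: `partitions`, `collect` and `aggregate_sum` are hypothetical names, and the collector is a simple concatenation standing in for Auto collection. Because every row reaches one stream before aggregating, each key gets one correct total no matter how the upstream data was partitioned.

```python
from collections import defaultdict

# Hypothetical (key, amount) rows already spread over 3 upstream partitions.
partitions = [
    [("A", 10), ("B", 5)],
    [("A", 7), ("C", 1)],
    [("B", 3), ("A", 2)],
]

def collect(parts):
    """Simulate a collector: merge all partitions into one sequential stream."""
    stream = []
    for part in parts:
        stream.extend(part)
    return stream

def aggregate_sum(rows):
    """Sum the amount per key over a single stream of rows."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)

print(aggregate_sum(collect(partitions)))  # {'A': 19, 'B': 8, 'C': 1}
```

The collection order does not matter for a sum; the point is only that no group is split once everything is in one stream.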
RE: Which partition we have to use for Aggregate Stag...
I think the above answer is a little misleading. Most of the time you'll be using the Aggregator stage in parallel mode. If you use Auto partitioning, nothing guarantees that all rows sharing the same values of the grouping key columns will land in the same partition. A group can then be split across partitions, and the Aggregator will output several partial results for the same key instead of one correct result.
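The split-group problem can be illustrated with a small Python simulation (not DataStage code). Round-robin partitioning is used here as a stand-in for any partitioning that ignores the grouping key; it is an assumption for illustration, not a claim about what Auto actually picks. The function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (key, amount) input rows.
rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 3), ("A", 2)]

def round_robin(rows, n):
    """Deal rows to n partitions in turn, ignoring the key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def aggregate_per_partition(parts):
    """Aggregate each partition independently, as a parallel stage would."""
    output = []
    for part in parts:
        totals = defaultdict(int)
        for key, amount in part:
            totals[key] += amount
        output.extend(totals.items())
    return output

result = aggregate_per_partition(round_robin(rows, 3))
print(result)  # [('A', 10), ('C', 1), ('B', 8), ('A', 9)]
```

Key "A" comes out twice (10 and 9) instead of once as 19, which is exactly the partial-result problem described above.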
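1) Identify the grouping keys you want to aggregate on.
2) In a stage before the Aggregator, hash partition the data on the grouping keys. This ensures that all rows with the same key values land in the same partition.
3) The aggregation results will then be correct, because each group is processed entirely within one partition.
4) I even think the Entire partitioning method can be useful, since every partition receives a full copy of the data and so sees every group whole. But it has much higher overhead than hash partitioning, and each partition will then output its own copy of every group.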
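The steps above can be sketched in Python (a simulation, not DataStage code). `hash_partition` and `entire_partition` are hypothetical stand-ins for Hash and Entire partitioning; `zlib.crc32` is used because Python's built-in `hash()` of a string is salted per process, which would make the partition assignment change between runs.

```python
import zlib
from collections import defaultdict

# Hypothetical (key, amount) input rows.
rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 3), ("A", 2)]

def hash_partition(rows, n):
    """Send every row with the same key to the same partition."""
    parts = [[] for _ in range(n)]
    for key, amount in rows:
        parts[zlib.crc32(key.encode()) % n].append((key, amount))
    return parts

def entire_partition(rows, n):
    """Give every partition a full copy of the data."""
    return [list(rows) for _ in range(n)]

def aggregate_per_partition(parts):
    """Aggregate each partition independently, as a parallel stage would."""
    output = []
    for part in parts:
        totals = defaultdict(int)
        for key, amount in part:
            totals[key] += amount
        output.extend(totals.items())
    return output

hashed = aggregate_per_partition(hash_partition(rows, 3))
entire = aggregate_per_partition(entire_partition(rows, 3))
print(sorted(hashed))  # [('A', 19), ('B', 8), ('C', 1)]
print(len(entire))     # 9: three partitions each emit all three groups
```

With hashing, each key appears once with its full total. With Entire, the totals are right but every group is repeated once per partition, on top of copying the whole data set to every partition.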
RE: Which partition we have to use for Aggregate Stage in parallel jobs?
It's always preferable to use a Sort stage before the Aggregator stage. Based on the aggregation logic, we should hash partition the incoming data on the grouping keys and sort it on those same keys.
Then we can set Same partitioning on the Aggregator stage, so it keeps the partitions produced by the Sort stage instead of repartitioning.
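A rough Python sketch of the hash-partition, sort, then Same flow (a simulation under stated assumptions, not DataStage code; all function names are hypothetical). Sorting each partition on the key makes every group one contiguous run, so `itertools.groupby` can total one group at a time, and reusing the partitions unchanged models Same partitioning.

```python
import zlib
from itertools import groupby
from operator import itemgetter

# Hypothetical (key, amount) input rows.
rows = [("A", 10), ("B", 5), ("A", 7), ("C", 1), ("B", 3), ("A", 2)]

def hash_partition(rows, n):
    """Hash partition on the key (crc32 for a stable, unsalted hash)."""
    parts = [[] for _ in range(n)]
    for key, amount in rows:
        parts[zlib.crc32(key.encode()) % n].append((key, amount))
    return parts

def sort_partitions(parts):
    """Like a Sort stage: sort rows within each partition; partitions stay put."""
    return [sorted(part, key=itemgetter(0)) for part in parts]

def sorted_aggregate(parts):
    """Aggregate with Same partitioning: reuse the upstream partitions as-is.

    groupby only merges adjacent rows with equal keys, so it is correct here
    only because each partition was sorted on the key beforehand.
    """
    output = []
    for part in parts:
        for key, group in groupby(part, key=itemgetter(0)):
            output.append((key, sum(amount for _, amount in group)))
    return output

result = sorted_aggregate(sort_partitions(hash_partition(rows, 3)))
print(sorted(result))  # [('A', 19), ('B', 8), ('C', 1)]
```

The hash step is what keeps groups together; the sort only changes how each partition is consumed. Sorting without hash partitioning first would still leave groups split across partitions.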