DataStage Performance Tuning

How can you improve performance in a DataStage job? Give any 5 points.

Questions by rameshkk


Raju Nath

  • Jun 27th, 2013
 

1. Query optimization, i.e., review and tune your extract and insert queries (see the SQL sketch after this list).
2. On the server side, increase the available memory.
3. Remove unnecessary stages that are not required, i.e., use a proper job design.
4. Use constraints to filter out unneeded records early, which also improves performance.
5. Use parameterized jobs and sequences with the proper parallel/sequential flow.
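
A minimal sketch of the kind of extract-query tuning point 1 refers to, assuming a relational source and a hypothetical table customer_txn; the idea is to project only the columns the job needs and push the filter down to the database instead of pulling the whole table:

    -- Avoid SELECT * : it pulls every column and every row across the network.
    -- Prefer projecting only the required columns and filtering at the source.
    SELECT cust_id,
           txn_date,
           txn_amount
    FROM   customer_txn                      -- placeholder table name
    WHERE  txn_date >= DATE '2013-01-01';    -- hypothetical load window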


Prabhakar Achyuta

  • Jul 23rd, 2013
 

1. First filter, then extract; do not extract and then filter. Use the user-defined SQL read method instead of the Table method when extracting. For example, say 1 million records are coming from the input table, but a business filter condition (Acct_Type = S) in the job leaves only a few records, say 100 (see the sketch after this list).
2. Reduce the number of Transformer stages as much as possible.
3. Reduce the number of stage variables.
4. Remove Sort stages where possible and apply partitioning techniques at the stage level (e.g., hash for Join, entire for Lookup).
5. Be careful when working with Joins. Stick to an inner join unless the business requires a left outer join.
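
As a rough illustration of point 1, the business filter mentioned above (Acct_Type = S) can be applied in the source SQL itself, so only the roughly 100 qualifying rows leave the database instead of the full 1 million; the table and remaining column names below are placeholders:

    -- Filter during extraction instead of extracting everything and filtering in the job
    SELECT acct_id,
           acct_type,
           balance
    FROM   account              -- placeholder table name
    WHERE  acct_type = 'S';     -- business filter pushed to the source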

Use a Copy stage instead of a Transformer for simple operations such as:

• acting as a placeholder between stages
• renaming columns
• dropping columns
• implicit (default) type conversions

Use stage variables wisely. The more of them there are, the slower the Transformer and the job become. A job should not be overloaded with stages, so split large designs into smaller jobs.

Reading or writing data from/to a sequential file is slow and can become a bottleneck with large data volumes. In such cases, to read faster from the Sequential File stage, the number of readers per node can be increased (the default value is one), or the "Read from multiple nodes" option can be set.

Ensure that RCP (Runtime Column Propagation) is not propagating unnecessary metadata to downstream stages.

Do a volumetric analysis before introducing Lookup (Normal/Sparse), Join, or Merge stages into the design.
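
A quick, hedged example of what such a volumetric check might look like when the inputs are database tables (placeholder names): a small reference set generally favors a Normal Lookup, while two large inputs generally favor a Join or Merge on pre-sorted, partitioned data.

    -- Rough row counts to decide between Lookup, Join and Merge (placeholder names)
    SELECT COUNT(*) AS driving_rows   FROM sales_fact;
    SELECT COUNT(*) AS reference_rows FROM product_dim;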

Always use Connector stages for connecting to databases, as they are more robust and faster. If a Connector is not available, use the Enterprise stage; plug-in stages come last.

Examine the execution plans of the SQL queries used in jobs and create indexes on the appropriate columns. Proper indexing improves performance drastically.
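
A hedged sketch of checking a plan and adding an index, assuming an Oracle source (the syntax differs on other databases) and the placeholder names used earlier:

    -- Inspect the execution plan of an extract query
    EXPLAIN PLAN FOR
      SELECT acct_id, balance
      FROM   account
      WHERE  acct_type = 'S';

    SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

    -- If the plan shows a full table scan on a selective predicate,
    -- an index on the filter column may help
    CREATE INDEX account_acct_type_ix ON account (acct_type);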

Sometimes dropping the indexes, loading the table, and re-creating the indexes once the load is done can be a good option.
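
A minimal sketch of that pattern, again with placeholder names; whether it actually pays off depends on the data volume and how long the index rebuild takes:

    -- Drop the index before the bulk load...
    DROP INDEX account_acct_type_ix;

    -- ...run the DataStage load job here...

    -- ...and re-create the index once the load has finished
    CREATE INDEX account_acct_type_ix ON account (acct_type);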


Lalit

  • Oct 6th, 2017
 

1) Set up a proper configuration file.
2) Handle null and duplicate values properly (see the sketch after this list).
3) Reduce the number of Transformer stages as much as possible.
4) Do not use more than about 20 stages in a particular job; split larger designs.
5) Use the Data Set stage instead of the Sequential File stage for intermediate data.
6) Use Join, Merge, and Lookup stages appropriately.
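
Point 2 is usually handled inside the job (Transformer null handling, Remove Duplicates stage), but when the source is a database some of that work can be pushed into the extract SQL instead; a hedged sketch with placeholder names:

    -- Default NULLs and keep only the latest row per key at extraction time
    SELECT cust_id,
           COALESCE(email, 'UNKNOWN') AS email
    FROM  (SELECT cust_id,
                  email,
                  ROW_NUMBER() OVER (PARTITION BY cust_id
                                     ORDER BY updated_ts DESC) AS rn
           FROM   customer_stg) t
    WHERE  rn = 1;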

