How can I improve the performance of graphs in Ab Initio? Please give some examples or tips. Thanks.




  • Jun 13th, 2006

There are many ways to improve the performance of graphs in Ab Initio.

Here are a few points from my side.

1. Use a multifile system (MFS) with Partition by Round-robin.

2. If needed, use lookup_local rather than lookup when the data is large.

3. Take out unnecessary components such as Filter by Expression; put the condition in Reformat/Join/Rollup instead.

4. Use Gather instead of Concatenate.

5. Tune max-core for optimal performance.

6. Try to avoid extra phases.
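
As a rough illustration of point 1, here is what round-robin partitioning does to a stream of records. This is plain Python, not Ab Initio DML, and the function name `partition_round_robin` is made up for the sketch.

```python
from itertools import cycle

def partition_round_robin(records, n_partitions):
    """Deal records out to partitions in turn, like dealing cards."""
    partitions = [[] for _ in range(n_partitions)]
    for target, record in zip(cycle(range(n_partitions)), records):
        partitions[target].append(record)
    return partitions

# Ten records spread over four partitions; sizes differ by at most one.
parts = partition_round_robin(list(range(10)), 4)
```

Because no key is involved, the partitions stay evenly sized, which is why round-robin suits steps that do not need records with the same key in the same place.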


satish kumar

  • Jun 22nd, 2006


To improve the performance of the graph:

  1. Go parallel as soon as possible using Ab Initio partitioning techniques.
  2. Once data is partitioned, do not bring it back to serial and then to parallel again. Repartition instead.
  3. For small processing jobs, serial may be better than parallel.
  4. Do not access large files across NFS; use the FTP component.
  5. Use an ad hoc MFS to read many serial files in parallel, and use the Concatenate component.
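
A minimal sketch of point 2, in plain Python rather than Ab Initio: records that are already spread across partitions are routed straight to new key-based partitions, without first being collected into one serial list. The names `stable_hash` and `repartition_by_key` are hypothetical, and CRC32 is only a stand-in for whatever hash a real partitioner uses.

```python
import zlib

def stable_hash(value):
    # CRC32 gives the same result on every run, unlike Python's built-in hash() for strings.
    return zlib.crc32(str(value).encode("utf-8"))

def repartition_by_key(partitions, key, n_out):
    """Route each record from its current partition directly to a key-based partition."""
    out = [[] for _ in range(n_out)]
    for part in partitions:  # each input partition is handled on its own
        for rec in part:
            out[stable_hash(rec[key]) % n_out].append(rec)
    return out

round_robin_parts = [
    [{"cust": "A", "amt": 10}, {"cust": "C", "amt": 5}],
    [{"cust": "B", "amt": 7}, {"cust": "A", "amt": 3}],
]
keyed_parts = repartition_by_key(round_robin_parts, "cust", 2)
```

After repartitioning, all records for a given `cust` value sit in the same partition, so a join or rollup on that key can run in parallel.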


1.      Using phase breaks lets you allocate more memory to individual components and makes your graph run faster.

2.      Use a checkpoint after the sort rather than landing the data on disk.

3.      Use the in-memory feature of Join and Rollup.

4.      Best performance is gained when components can work in memory within MAX-CORE.

5.      MAX-CORE for SORT is calculated from the size of the input data file.

6.      For an in-memory join, the memory needed equals the size of the non-driving data plus overhead.

7.      If an in-memory join cannot fit its non-driving inputs in the provided MAX-CORE, it drops all the inputs to disk, and running in memory no longer makes sense.

8.      Use Rollup and Filter by Expression as early as possible to reduce the number of records.

9.      When joining a very small dataset to a very large dataset, it is more efficient to broadcast the small dataset to the MFS using the Broadcast component, or to use the small file as a lookup.
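
To make point 9 concrete, here is a small sketch in plain Python (not Ab Initio): every partition of the large dataset receives its own full copy of the small dataset and joins against it in memory, so the large data never has to be sorted or moved. The function name `broadcast_join` and the sample data are invented for illustration.

```python
def broadcast_join(large_partitions, small_records, key):
    """Inner-join each partition of the large data against a full copy of the small data."""
    joined_partitions = []
    for part in large_partitions:
        # "Broadcast": each partition builds its own in-memory copy of the small dataset.
        local_copy = {row[key]: row for row in small_records}
        joined_partitions.append(
            [{**rec, **local_copy[rec[key]]} for rec in part if rec[key] in local_copy]
        )
    return joined_partitions

countries = [{"code": "US", "name": "United States"}, {"code": "IN", "name": "India"}]
orders = [
    [{"order": 1, "code": "US"}, {"order": 2, "code": "IN"}],
    [{"order": 3, "code": "IN"}, {"order": 4, "code": "FR"}],
]
result = broadcast_join(orders, countries, "code")
```

The cost is one copy of the small dataset per partition, which is only worthwhile when that dataset comfortably fits in memory.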







  • Jul 5th, 2006

1. Use an MFS; use round-robin partitioning or load balancing if you are not joining or rolling up.

2. Filter the data at the beginning of the graph.

3. Take out unnecessary components such as Filter by Expression; use the select expression in Join, Rollup, Reformat, etc. instead.

4. Use lookups instead of joins if you are joining a small table to a large table.

5. Replace old components with new ones, for example Join instead of math merge.

6. Use Gather instead of Concatenate.

7. Use phasing if you have too many components.

8. Tune max-core for optimal performance.

9. Avoid sorting data by using an in-memory join for smaller datasets.

10. Use an Ab Initio layout instead of the database default to achieve parallel loads.

11. Change the AB_REPORT parameter to increase the monitoring duration ( ).

12. Use catalogs for reusability.
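
Point 3 can be pictured with a short Python sketch (not Ab Initio code): the selection test is folded into the transform step, so records pass through one step instead of two. The `reformat` function and its `select` parameter are hypothetical names that only mimic the idea of a select expression.

```python
def reformat(records, transform, select=None):
    """Apply transform to each record, skipping records rejected by select."""
    for rec in records:
        if select is None or select(rec):
            yield transform(rec)

def to_output(rec):
    return {"id": rec["id"], "amt_cents": rec["amt"] * 100}

def is_positive(rec):
    return rec["amt"] > 0

rows = [{"id": 1, "amt": 50}, {"id": 2, "amt": -5}, {"id": 3, "amt": 20}]

# Separate filter step followed by a transform step.
two_steps = [to_output(r) for r in [r for r in rows if is_positive(r)]]
# Same result with the selection folded into the transform step.
one_step = list(reformat(rows, to_output, select=is_positive))
```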




  • Jul 6th, 2006

Performance can be improved in several ways. Here are some that I remember:

1. Sort after the partition component instead of before it.

2. Partition the data as early as possible and departition it as late as possible.

3. Filter out unwanted fields and records as early as possible.

4. Try to avoid using the Join with DB component.
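
Points 1 and 2 can be sketched in plain Python (not Ab Initio): partition first, sort each smaller partition on its own, and only merge at the very end. The modulo partitioner is a simplification chosen for the example.

```python
import heapq

def partition_by_key(values, n):
    """Assign each value to a partition based on the value itself."""
    parts = [[] for _ in range(n)]
    for v in values:
        parts[v % n].append(v)
    return parts

data = [42, 7, 19, 3, 88, 61, 14, 25]

# Each partition sorts only its own share; in a real graph these sorts run in parallel.
sorted_parts = [sorted(p) for p in partition_by_key(data, 3)]

# Departition as late as possible: merging sorted partitions preserves order.
merged = list(heapq.merge(*sorted_parts))
```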


Niranjan D

  • Jan 20th, 2016

Can anybody explain #9? In which cases or situations can we use the Broadcast component to join the smaller dataset and improve graph performance?



  • Apr 8th, 2016

Avoid using SORT components.
If the input data is small, use component folding.
Use a file as a lookup only when its size is reasonable; otherwise use a Join component.
Avoid unloading a full table and then using Reformat to get the required fields; list only the required fields in the select query of the table component.
If two heavy components are used in one flow, put them in different phases.
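
The point about selecting only the required fields can be shown with a generic SQL example. This sketch uses Python's built-in `sqlite3` module and an invented `customers` table; it says nothing about how Ab Initio table components are configured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, address TEXT, notes TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [(1, "Ann", "1 Main St", "long text"), (2, "Bob", "2 Oak Ave", "more long text")],
)

# Full unload: every column crosses the wire, then most of it is discarded downstream.
full = conn.execute("SELECT * FROM customers ORDER BY id").fetchall()

# Narrow unload: the database returns only the fields the graph actually needs.
narrow = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
conn.close()
```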



  • Dec 28th, 2017

For point no. 2): lookup_local (which is now replaced by lookup_first_local) can be used efficiently only when the data is available in the same partition of the parent lookup file; otherwise it returns NULL.
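
Here is a small Python sketch of why that happens. It is not Ab Initio code: the `lookup_local` function below is a made-up stand-in that only searches one partition of the lookup data, with `None` playing the role of NULL and a simple modulo as the partitioning rule.

```python
N = 2
names = {0: "a", 1: "b", 2: "c", 3: "d"}

# The lookup data is partitioned by key: partition p holds keys where key % N == p.
lookup_parts = [{k: v for k, v in names.items() if k % N == p} for p in range(N)]

def lookup_local(partition_index, key):
    """Search only the lookup partition that lives alongside the caller."""
    return lookup_parts[partition_index].get(key)  # None stands in for NULL

keys = [1, 0, 3, 2]
round_robin = [keys[p::N] for p in range(N)]                    # not aligned with the lookup
by_key = [[k for k in keys if k % N == p] for p in range(N)]    # aligned with the lookup

misaligned = [[lookup_local(p, k) for k in part] for p, part in enumerate(round_robin)]
aligned = [[lookup_local(p, k) for k in part] for p, part in enumerate(by_key)]
```

With this particular data every misaligned lookup misses, while partitioning the input on the same key as the lookup makes every lookup hit.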


Ketan Khot

  • Jan 12th, 2018

Use the profile transform option.

Run your graph with the profile transform option to track which statement or function takes the most time, so you can update the code accordingly.

Use the graph tracking option (Control+F2) to analyze CPU utilization.

Understand the data. Do not use in-memory components unless necessary.

If you are using a Rollup component in your graph, first understand the behavior of the key: if the key is mostly unique, go with sorted input; if the key has many duplicates, you may go with in-memory rollup.

Use a dynamic lookup and keep load_once set to true so the lookup is not loaded again for each record.
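
The rollup trade-off can be sketched in plain Python (not Ab Initio). The sorted version holds one group at a time, while the in-memory version holds one running total per distinct key, which stays small only when keys repeat a lot. Both function names are invented for the example.

```python
from itertools import groupby
from operator import itemgetter

def rollup_sorted(sorted_records):
    """Sum amounts per key; input must already be sorted by key."""
    return {
        key: sum(amount for _, amount in group)
        for key, group in groupby(sorted_records, key=itemgetter(0))
    }

def rollup_in_memory(records):
    """Sum amounts per key in any input order, keeping one total per distinct key."""
    totals = {}
    for key, amount in records:
        totals[key] = totals.get(key, 0) + amount
    return totals

records = [("b", 2), ("a", 1), ("b", 3), ("a", 4)]
from_sorted = rollup_sorted(sorted(records))
from_memory = rollup_in_memory(records)
```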


