Join Stratagies

There are 2 tables, table A with 10 million records, table B has 100 million records, now we are joining both tables, when we seen Explain Plan the plan showing TD will took the table A and it will redistributes it

Now the Question is: By that plan is the optimizer is correct job or not ? Justify Ans

2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it,

Now is the optimizer is doing best? and How you avoid this situation

Questions by satishlive

Showing Answers 1 - 11 of 11 Answers

Teradata is smart enough to decide when to redistribute and when to copy....
It compares the tables. Are they comparable? or one is big as compared to the other?
Based on simple logic it decides whether to distribute the smaller table on all the AMPs or to copy....what I mean is the small table is copied into all the AMPs in the SPOOL space...Remember all always the JOINs has to take place on the AMPs SPOOL Space...By redistributing it is making sure that the 100 million rows table gets the feeling that it is  making AMP local JOIN...

Remember the basic thing what ever Teradata does...it does keeping in consideration for Space and Performance and not to forget the Effiiciency...

My simple formula: If the table is small redistribute them to all the AMPs to have the AMP local Join. Always JOINs are made AMP local if it cannot then you have the high chance of running out of SPOOL space.

  Was this answer useful?  Yes

senthang

  • Jun 22nd, 2011
 

For your case 1:

Ideally the smaller table has to be redistributed or duplicated and product join. That's the way optimizer was built to perform, this is proper.

For your case 2:

In this case optimizer assumption was wrong, so we can help the optimizer for a better plan by...
1) Collecting stats
2) avoiding any functions used in the joining columns like cast, coalesce, substring etc

  Was this answer useful?  Yes

Rafi

  • Jul 14th, 2011
 

Hi friend
The table A has 10 million rows and table B has 100 million rows
the optimizer take A table to join with B its Based on the join conditions .First you choose which join condition u can apply
inner -takes optimizer should comparing each equality rows
outer-its more complex to compare inner join ,so optimizer is take too time to comparing
First you choose join condition ,based upon join condition the optimizer change execution plan

  Was this answer useful?  Yes

  • Sep 18th, 2012
 

There are 2 tables, table A with 10 million records, table B has 100 million records, now we are joining both tables, when we seen Explain Plan the plan showing TD will took the table A and it will redistributes it.
Now the Question is: By that plan is the optimizer is correct job or not ? Justify Ans

There are 3 types of joining strategies:
1- duplicate all rows of one table onto every AMP (The duplication of all rows is done when the non-PI column is on a small table),
2- redistribute the rows of one table by hashing the non-PI join column and sending them to the AMP containing the matching PI row,
3- redistribute both tables by hashed join column value

Now, in this case if the table A joining column is Non - PI then, TD redistributes the rows by hashing the non-PI join column and sending them to the AMP containing the matching PI row of Table B.


2. From the same above example now the optimizer is taking Table B (100 million records) and it is distributing it,

In this it might happen that the Joining column of Table B is Non - PI and that of Table A is PI, due to which it is trying to redistribute.

  Was this answer useful?  Yes

yuvaevergreen

  • May 22nd, 2013
 

Merge join strategies are

1. Big and small table joins - duplicate small tables
2. Column used in join is PI for one table - redistribute the other table in spool.
3. Column used in join is non PI for both tables - redistribute both tables in spool.

In the above case, table A has got 10 million and table B has got 100 million rows. Since table a is 1/10th of table b, TD redistributes table a on all amps.
In the second scenario, redistribution of table b by TD is highly unlikely, because TD optimizer perfectly chooses the join strategy. IF it chooses this scenario, then the query has to be tuned.

Try to use a nested join, which would have better performance.

  Was this answer useful?  Yes

Give your answer:

If you think the above answer is not correct, Please select a reason and add your answer below.

 

Related Answered Questions

 

Related Open Questions