What is skew and skew measurement?

Questions by sekr   answers by sekr


gour

  • Apr 14th, 2006
 

The skew of a data partition is the amount by which its size deviates from the average partition size, expressed as a percentage of the largest partition:

skew = (partition size - average partition size) / (size of largest partition) x 100%
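The formula above can be sketched in Python as follows. This is only an illustration: the function name partition_skew and the sample sizes are made up for the example, and it is not how any particular ETL tool computes skew internally.

```python
def partition_skew(sizes):
    """Return each partition's skew as a percentage of the largest partition."""
    if not sizes or max(sizes) == 0:
        raise ValueError("need at least one non-empty partition")
    average = sum(sizes) / len(sizes)
    largest = max(sizes)
    return [(size - average) / largest * 100 for size in sizes]


# Hypothetical partition sizes (any unit, as long as it is the same for all).
print(partition_skew([250, 250, 250, 250]))  # [0.0, 0.0, 0.0, 0.0]
print(partition_skew([100, 400, 400, 300]))  # [-50.0, 25.0, 25.0, 0.0]
```

A partition smaller than the average gets a negative value, a larger one gets a positive value, and a partition exactly at the average gets 0.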

  Was this answer useful?  Yes

Skew is the measure of data flow to each partition.

Suppose the input comes from 4 files with sizes of 100 MB, 200 MB, 300 MB, and 500 MB, about 1.1 GB in total.

Average size = 1100 MB / 4 = 275 MB

Skew for the 100 MB file = (100 - 275) / 500 = -175 / 500 = -35%. The value is negative because the file is smaller than the average.

Calculate the skew for the 200, 300, and 500 MB files in the same way.

A positive skew value is always desirable.

Skew is an indirect measure of the graph.

Correct me if I'm wrong.


IND

  • May 25th, 2006
 

I heard that partitioning is better (more even) as the skew nears 0. You can check the formula for the reason: the closer a partition's size is to the average, the closer its skew is to zero, and the more even the partitioning is. When the skew is zero, the data is partitioned as well as it can be.

Please correct me if I am wrong.


sunny

  • Jul 5th, 2006
 

Hi IND,

Can you please send the formula to calculate the skew?


satya

  • Sep 21st, 2011
 

The skew measure formula is as follows:

Measure: (N - AVERAGE) / MAX


PN Reddy

  • Feb 7th, 2012
 

Skew is the measure of the data flow on a particular partition.
Take an example with 4-way partitioning:
Flow 1: 200 records
Flow 2: 600 records
Flow 3: 400 records
Flow 4: 800 records

Average = (200 + 600 + 400 + 800) / 4 = 500

Skew on 1st flow = (200 - 500) / 800 * 100 = -3/8 * 100 = -37.5 (negative: less data than average)
...
...
Skew on 4th flow = (800 - 500) / 800 * 100 = 3/8 * 100 = 37.5 (positive: more data than average)
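
For anyone who wants to check the arithmetic for all four flows, here is a small Python sketch using the record counts from the example above. The variable names are just for illustration.

```python
# Records per flow, taken from the 4-way example above.
flows = {1: 200, 2: 600, 3: 400, 4: 800}

average = sum(flows.values()) / len(flows)  # 500.0
largest = max(flows.values())               # 800

# Skew of each flow as a percentage of the largest flow.
skew = {flow: (recs - average) / largest * 100 for flow, recs in flows.items()}

for flow, pct in skew.items():
    print(f"flow {flow}: {pct:+.1f}%")
# flow 1: -37.5%
# flow 2: +12.5%
# flow 3: -12.5%
# flow 4: +37.5%
```

The four values add up to zero, because the deviations from the average always cancel out.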

I think this will help you.

Thanks,
PN Reddy


vSudheer

  • Mar 2nd, 2012
 

Statistically, skew represents the distribution of data.
When all partitions share an equal amount of data, it is the best use of partitioning. This can be achieved with Partition by Round-robin or by using equal percentages in Partition by Percentage.

Please note that, mathematically, when all partitions get the same amount of data the standard deviation is 0, so the statistical skewness cannot be calculated. But in practice you can say the skew is 0.
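
A rough Python sketch of both points, under some assumptions: round_robin_sizes is a simplified stand-in for round-robin partitioning (record i goes to partition i % n), not the actual component, and statistical_skewness uses the usual population definition (third central moment divided by the cube of the standard deviation). All function names here are hypothetical.

```python
from statistics import mean, pstdev


def round_robin_sizes(num_records, num_partitions):
    """Count records per partition when record i goes to partition i % n."""
    sizes = [0] * num_partitions
    for i in range(num_records):
        sizes[i % num_partitions] += 1
    return sizes


def partition_skew_percent(sizes):
    """Partition skew: (size - average) / largest * 100 for each partition."""
    avg = mean(sizes)
    largest = max(sizes)
    return [(s - avg) / largest * 100 for s in sizes]


def statistical_skewness(sizes):
    """Population skewness; returns None when the standard deviation is 0."""
    sd = pstdev(sizes)
    if sd == 0:
        return None  # division by zero: skewness is undefined
    avg = mean(sizes)
    third_moment = sum((s - avg) ** 3 for s in sizes) / len(sizes)
    return third_moment / sd ** 3


even = round_robin_sizes(1000, 4)
print(even)                          # [250, 250, 250, 250]
print(partition_skew_percent(even))  # [0.0, 0.0, 0.0, 0.0]
print(statistical_skewness(even))    # None
```

So with perfectly even partitions the partition skew formula gives exactly 0 for every partition, while the statistical skewness is the one that is undefined.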


Rajat Singh

  • Feb 20th, 2013
 

The skew of a partition is the amount by which its size deviates from the average partition size.
