Process 1TB Data and get Max age for each gender group

I have 1 TB of records in the following format:
CUST_ID
CUST_NAME
GENDER
MOBILE_NO
AGE
I want to fetch the max age for each gender group using only the Reformat component. How can this be achieved?

Interview Candidate
Sep 24th, 2016

Abinitio

Answer

Showing Answers 1 - 14 of 14 Answers

Shalini Sharma

Oct 5th, 2016

Use Rollup with gender as the key and compute max(age) into the age field of the output. Run the Rollup in in-memory mode, so the input does not have to be sorted. The Rollup's in-memory requirement is based on its expected output, not its expected input. Since there are only two output rows here, in-memory mode will give faster results. For large outputs, use a Sort component before the Rollup.
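
To illustrate why memory depends on the output rather than the input, here is a minimal Python sketch of what a keyed rollup does. It is not Ab Initio code; the function name `max_age_by_gender` and the sample data are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple


def max_age_by_gender(records: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Single pass over (gender, age) pairs, keeping one running maximum per gender."""
    max_age: Dict[str, int] = {}
    for gender, age in records:
        # The dictionary holds one entry per distinct key, however large the input is.
        if gender not in max_age or age > max_age[gender]:
            max_age[gender] = age
    return max_age


# Hypothetical sample data standing in for the 1 TB input.
sample = [("M", 34), ("F", 41), ("M", 58), ("F", 29)]
result = max_age_by_gender(sample)
print(result)  # {'M': 58, 'F': 41}
```

The state held in memory is one entry per gender, which is why an unsorted input of any size is not a problem here.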

shreya gupta

Oct 22nd, 2016

Hello,
Here is a step-by-step solution:

1) Configure the input file.

2) Add a Sort component and sort on age in descending order. Then add a Reformat immediately after it that populates an extra output column with next_in_sequence(). This gives each record a serial number that follows the order from max to min age.

3) The oldest person is now the first record and the youngest is the last record.

4) Apply a Filter by Expression and keep the record whose serial number is 1.

5) This is your record with the max age.
P.S. You can also achieve this with Sort + Dedup Sorted. Let me know if you require that.
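
The steps above can be sketched in Python as follows. This is only an illustration with made-up sample records; `count(1)` stands in for next_in_sequence(). Note that numbering the rows after a sort on age alone yields a single oldest record overall, so the sketch also shows one reading of the Sort + Dedup variant from the P.S., keyed on gender, which keeps one record per gender.

```python
from itertools import count, groupby

# Hypothetical sample records; only the fields needed here are included.
records = [
    {"CUST_ID": 1, "GENDER": "M", "AGE": 34},
    {"CUST_ID": 2, "GENDER": "F", "AGE": 41},
    {"CUST_ID": 3, "GENDER": "M", "AGE": 58},
    {"CUST_ID": 4, "GENDER": "F", "AGE": 29},
]

# Steps 2-4: sort on age descending, number the rows, keep serial number 1.
by_age_desc = sorted(records, key=lambda r: r["AGE"], reverse=True)
serial = count(1)  # stand-in for next_in_sequence()
numbered = [{**r, "SEQ": next(serial)} for r in by_age_desc]
oldest_overall = [r for r in numbered if r["SEQ"] == 1]

# P.S. variant (assumed reading): sort on gender, then age descending,
# and keep the first record of each gender group.
by_gender_age = sorted(records, key=lambda r: (r["GENDER"], -r["AGE"]))
oldest_per_gender = [
    next(group) for _, group in groupby(by_gender_age, key=lambda r: r["GENDER"])
]

print(oldest_overall)
print(oldest_per_gender)
```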

Aadi

Dec 7th, 2016

Here is the flow of components.
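Input file > Partition by Round-robin (to process the 1 TB file) > Rollup {key gender} to take max(age) > Gather > Rollup {key gender} to take max(age) > output file.
Note: we cannot use Partition by Key here, because a key with only two distinct values (gender) would send all the records to at most two partitions.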
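
A rough Python sketch of this two-stage flow is below. The helper names `partition_round_robin` and `rollup_max`, the partition count of 3, and the sample data are all assumptions for illustration; in a real graph the partitions would run in parallel rather than in a loop.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (gender, age)


def partition_round_robin(records: Iterable[Record], n: int) -> List[List[Record]]:
    """Deal records out to n partitions in turn, so each gets an even share."""
    parts: List[List[Record]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def rollup_max(records: Iterable[Record]) -> Dict[str, int]:
    """Rollup keyed on gender that keeps max(age)."""
    out: Dict[str, int] = {}
    for gender, age in records:
        out[gender] = max(age, out.get(gender, age))
    return out


sample = [("M", 34), ("F", 41), ("M", 58), ("F", 29), ("F", 63), ("M", 19), ("F", 22)]

partitions = partition_round_robin(sample, 3)
partials = [rollup_max(p) for p in partitions]  # first Rollup, once per partition
gathered = [(g, a) for partial in partials for g, a in partial.items()]  # Gather
final = rollup_max(gathered)  # second Rollup on the small gathered set
print(final)  # {'M': 58, 'F': 63}
```

After the first rollup each partition emits at most one row per gender, so the gather and the final rollup only see a handful of records.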

irfan1patel1

Jun 18th, 2017

Step 1: Use the output index in the Reformat to separate male and female records into two flows.
Step 2: Sort each flow by age in descending order.
Step 3: Apply a Filter by Expression where next_in_sequence() == 1.
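
As a small illustration of these three steps, here is a Python sketch. The `output_index` function below is a hypothetical stand-in for the Reformat's output index logic (0 for "M", 1 for anything else), and the sample records are made up.

```python
from typing import Dict, List

# Hypothetical sample records; only the fields needed here are included.
records = [
    {"CUST_ID": 1, "GENDER": "M", "AGE": 34},
    {"CUST_ID": 2, "GENDER": "F", "AGE": 41},
    {"CUST_ID": 3, "GENDER": "M", "AGE": 58},
    {"CUST_ID": 4, "GENDER": "F", "AGE": 29},
]


def output_index(record: Dict) -> int:
    """Assumed routing rule: flow 0 for male records, flow 1 for the rest."""
    return 0 if record["GENDER"] == "M" else 1


# Step 1: route each record to one of two flows.
flows: List[List[Dict]] = [[], []]
for rec in records:
    flows[output_index(rec)].append(rec)

# Steps 2 and 3: sort each flow on age descending and keep the first record.
oldest = [
    sorted(flow, key=lambda r: r["AGE"], reverse=True)[0]
    for flow in flows
    if flow  # skip a flow that received no records
]
print(oldest)
```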

Mahesh

May 26th, 2021

Input file --> Reformat (as asked), using output_indexes to separate the flows --> Sort (desc) on age --> FBE where next_in_sequence() == 1 --> Concatenate/Gather both output_indexes flows --> output file

Manish

Jun 24th, 2021

Hi Mahesh, will this solution work if the input data file is a mfs file and we are supposed to run this in parallel?

Sohil

Sep 12th, 2021

Input -> Sort on age -> Partition by Round-robin (as there is 1 TB of data) -> Filter by Expression (GENDER == M) -> 2 Dedup Sorted components (on the select and deselect ports of the FBE) with a blank key and keep first -> Concatenate -> Gather -> output file
