Pipeline, Component and Data Parallelism

Give examples for pipeline, component and data parallelism


inamdnik

  • Jul 3rd, 2018
 

1. Pipeline Parallelism
In pipeline parallelism, multiple components process data simultaneously. Each component in the pipeline continuously reads from upstream components, processes data, and writes to downstream components. Since a downstream component can process records previously written by an upstream component, both components can operate in parallel.
For example, the input component may already have read 10 records from the input file while the downstream component has processed only 6 of them. This is pipeline parallelism: a component does not wait for all the data to arrive, but starts processing records as they come through the pipe, in parallel with the component feeding it.
NOTE: Pipeline parallelism breaks in certain cases, for example at a Sort component, because sorting requires all the data to be read first. It also breaks at a phase break.
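The record flow described above can be sketched in plain Python. This is only an illustration, not the ETL tool's own components: the function names (read_records, transform, sort_stage) are made up for the example, and Python generators run on a single thread, so the sketch models the record-at-a-time hand-off that makes pipeline parallelism possible rather than truly concurrent processes. The event log shows that the downstream step starts before the upstream step has finished reading, and that a sort stage forces every record to be read first.

```python
def read_records(records, log):
    """Upstream component: emits one record at a time."""
    for rec in records:
        log.append(f"read {rec}")
        yield rec


def transform(stream, log):
    """Downstream component: handles each record as soon as it arrives."""
    for rec in stream:
        log.append(f"transform {rec}")
        yield rec * 10


def sort_stage(stream):
    """Sort must consume the whole upstream before it can emit anything."""
    yield from sorted(stream)


def run_streaming(records):
    log = []
    out = list(transform(read_records(records, log), log))
    return out, log


def run_with_sort(records):
    log = []
    out = list(transform(sort_stage(read_records(records, log)), log))
    return out, log


streamed_out, streamed_log = run_streaming([3, 1, 2])
sorted_out, sorted_log = run_with_sort([3, 1, 2])

print(streamed_log)  # reads and transforms alternate
print(sorted_log)    # all reads come before the first transform
```

In the first log the reads and transforms are interleaved; in the second, the sort stage sits between them and the transform cannot start until the last record has been read.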
2. Data Parallelism
In data parallelism, the data is divided into partitions that are processed in parallel, possibly on different servers. It most commonly occurs with multifiles, that is, with partitioned data.
For example, with a 4-way multifile, the data is divided into 4 partitions after partitioning, and the same component runs 4 times in parallel, once per partition.
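As a rough sketch of the 4-way case, the Python snippet below splits a list of records into 4 partitions and runs the same function on each partition at once. The names (partition_round_robin, component, run_data_parallel) and the round-robin partitioning scheme are assumptions made for the illustration, and a thread pool stands in for the separate processes or servers a real graph would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(records, ways):
    """Deal records out to `ways` partitions, one record at a time."""
    parts = [[] for _ in range(ways)]
    for i, rec in enumerate(records):
        parts[i % ways].append(rec)
    return parts


def component(partition):
    """The same logic is applied to every partition."""
    return [rec * 2 for rec in partition]


def run_data_parallel(records, ways=4):
    parts = partition_round_robin(records, ways)
    # One worker per partition; map() returns results in partition order.
    with ThreadPoolExecutor(max_workers=ways) as pool:
        return list(pool.map(component, parts))


results = run_data_parallel(list(range(8)))
print(results)
```

Each inner list in the result is the output of one partition, so 8 input records on a 4-way layout give 4 lists of 2 records each.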
3. Component Parallelism
A graph in which multiple processes run simultaneously on separate data uses component parallelism.
This kind of parallelism depends on how your graph is built: it occurs when 2 different components are not connected to each other and process their data in parallel. For example, suppose you have 2 input files and sort the data of each one in its own flow. The 2 Sort components then run under component parallelism.
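The two-file example can be approximated in Python as two independent sorts submitted to a thread pool. The function names and the sample data are hypothetical, and threads are used here only as a stand-in for the separate processes of a real graph.

```python
from concurrent.futures import ThreadPoolExecutor


def sort_flow(records):
    """One flow: sort its own input, independent of any other flow."""
    return sorted(records)


def run_component_parallel(input_a, input_b):
    # The two flows share no data, so they can run at the same time.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(sort_flow, input_a)
        future_b = pool.submit(sort_flow, input_b)
        return future_a.result(), future_b.result()


sorted_a, sorted_b = run_component_parallel([5, 2, 9], ["pear", "apple", "fig"])
print(sorted_a)
print(sorted_b)
```

Neither flow waits for or reads from the other, which is what distinguishes this from pipeline parallelism (connected components) and data parallelism (one component over partitions of the same data).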


Prasanth

  • Nov 17th, 2020
 

Pipeline: When data is processed component by component as it flows through the graph, it is called pipeline parallelism.
Remember that the Sort component breaks pipeline parallelism.
Component: When different data is processed in different flows, it is called component parallelism.
Data: When the same data is processed in different partitions by using partition components, it is called data parallelism.

