What is the difference between partition by key and partition by round robin?

PARTITION BY KEY:
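Here you specify the key on which the partitioning is based, and records with the same key value always go to the same partition. Because placement depends on the key, how well balanced the data is depends on how evenly the key values are distributed; a skewed key gives skewed partitions. It is useful for key-dependent parallelism.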
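The idea can be sketched in a few lines of Python. This is only an illustration, not how the component is implemented: the function name `partition_by_key`, the `cust_id` field, and the use of a CRC32 hash are all assumptions made for the example.

```python
import zlib

def partition_by_key(records, key, num_partitions):
    """Route each record to a partition chosen from a hash of its key value.

    Hypothetical helper for illustration; the hash (CRC32) is an assumption.
    """
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        key_bytes = str(record[key]).encode("utf-8")
        # Same key value -> same hash -> same partition, every time.
        index = zlib.crc32(key_bytes) % num_partitions
        partitions[index].append(record)
    return partitions

# Made-up sample data: customer "A" appears far more often than the others.
records = [
    {"cust_id": "A", "amount": 10},
    {"cust_id": "B", "amount": 20},
    {"cust_id": "A", "amount": 30},
    {"cust_id": "C", "amount": 40},
    {"cust_id": "A", "amount": 50},
    {"cust_id": "A", "amount": 60},
]

parts = partition_by_key(records, "cust_id", 3)
for number, part in enumerate(parts):
    print(number, [r["cust_id"] for r in part])
```

All the "A" records land in one partition, so a per-customer rollup can run on each partition independently. The same run also shows the catch: one frequent key value makes its partition larger than the rest.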

PARTITION BY ROUND ROBIN:
Here the records are distributed in sequence, in blocksize chunks, across the output partitions: the first blocksize records go to the first partition, the next blocksize records to the second, and so on, wrapping around after the last partition. It is not key based and results in well-balanced data, especially with a blocksize of 1. It is useful for record-independent parallelism.
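A minimal Python sketch of the same scheme, assuming records are dealt out purely by their position in the input. The function name `partition_round_robin` and the `block_size` parameter are made up for the example and are not the component's actual interface.

```python
def partition_round_robin(records, num_partitions, block_size=1):
    """Deal records out in order, block_size records at a time per partition.

    Hypothetical helper for illustration only.
    """
    partitions = [[] for _ in range(num_partitions)]
    for position, record in enumerate(records):
        # Which block the record falls in decides its partition; keys are ignored.
        index = (position // block_size) % num_partitions
        partitions[index].append(record)
    return partitions

records = list(range(10))

by_one = partition_round_robin(records, 3, block_size=1)
print(by_one)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]

by_two = partition_round_robin(records, 3, block_size=2)
print(by_two)  # [[0, 1, 6, 7], [2, 3, 8, 9], [4, 5]]
```

With a block size of 1 the partition sizes differ by at most one record (4, 3, 3). With a block size of 2 the last partition can fall short by a whole block (4, 4, 2), which is why the smallest block size gives the best balance.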

