Sorting using Spark

I need to sort an RDD. The sort needs to be on multiple fields of my record, so I need a custom Comparator.
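
To give an idea of what I mean, here is a minimal sketch of the kind of record and compound ordering involved (the case class and field names are hypothetical, not my actual data):

    // Hypothetical record; the real one has more fields.
    case class LogRecord(user: String, timestamp: Long, action: String)

    // The compound comparison I want: user first, then timestamp --
    // exactly what a custom Comparator / Ordering would express.
    val byUserThenTime: Ordering[LogRecord] =
      Ordering.by((r: LogRecord) => (r.user, r.timestamp))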

I see that sortBy accepts only a single key. I chanced upon http://codingjunkie.net/spark-secondary-sort/ and so used repartitionAndSortWithinPartitions to achieve this.
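
Roughly what I ended up with, following that post (the record type, field names, and partitioner below are illustrative assumptions, not my exact code):

    import org.apache.spark.{Partitioner, SparkConf, SparkContext}

    // Same hypothetical record as in the sketch above.
    case class LogRecord(user: String, timestamp: Long, action: String)

    // Partition on the "natural" key only (user), so all records for a user
    // land in the same partition; sorting then happens within each partition.
    class UserPartitioner(override val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = key match {
        case (user: String, _) => math.abs(user.hashCode) % numPartitions
        case _                 => 0
      }
    }

    object SecondarySortSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("secondary-sort").setMaster("local[*]"))

        val records = sc.parallelize(Seq(
          LogRecord("alice", 3L, "click"),
          LogRecord("alice", 1L, "view"),
          LogRecord("bob",   2L, "view")
        ))

        // Composite key; the implicit Ordering on (String, Long) tuples sorts
        // by user, then timestamp, within each partition.
        val keyed = records.map(r => ((r.user, r.timestamp), r))
        val sorted = keyed.repartitionAndSortWithinPartitions(new UserPartitioner(4))

        sorted.values.collect().foreach(println)
        sc.stop()
      }
    }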

Why doesn't sortBy accept a custom Comparator and sort? Why do I have to repartition just in order to use a custom Comparator?
