Fork/Join
The fork/join framework from Java 7 has been on my list of things to check out for a while now. Basically, it provides an easy way to parallelize work that can be broken up into smaller parts. Work is distributed to threads in a pool, and idle threads can steal work from busy threads (work stealing). There are many parallel algorithms to choose from, and I decided to try parallel merge sort, since the two halves of the array can be sorted independently and then merged. I have two processors, so I would expect to see some speedup over the canonical top-down merge sort.
My fork/join parallel implementation does not become faster than the non-parallel implementation until I sort somewhere over about 25 million integers. Fork/join obviously comes with significant overhead, but I suspect that would be less noticeable if I had more processors to work with.
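Roughly, the fork/join version looks like the sketch below. The class name, the sequential-sort cutoff, and the pool setup are simplified choices for illustration rather than the exact code I benchmarked, and it assumes the shared merge helper shown under Notes.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ForkJoinMergeSort extends RecursiveAction {

    private static final int THRESHOLD = 8_192; // below this size, just sort sequentially

    private final int[] array;
    private final int lo;  // inclusive
    private final int hi;  // exclusive

    public ForkJoinMergeSort(int[] array, int lo, int hi) {
        this.array = array;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(array, lo, hi); // small ranges are not worth forking
            return;
        }
        int mid = lo + (hi - lo) / 2;
        // Sort both halves in parallel, wait for them, then merge.
        invokeAll(new ForkJoinMergeSort(array, lo, mid),
                  new ForkJoinMergeSort(array, mid, hi));
        merge(array, lo, mid, hi); // merge helper shown under Notes
    }

    public static void sort(int[] array) {
        new ForkJoinPool().invoke(new ForkJoinMergeSort(array, 0, array.length));
    }
}
```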
Streams
As I have been trying to become more adept at functional programming, and am also a Java fan, I have started exploring what's new in Java 8 as well. This is probably not at all practical, but I had the idea of re-implementing merge sort using streams. To do this I had to use the non-recursive, bottom-up merge sort algorithm, since streams do not lend themselves to the recursive top-down formulation.
This is faster than the fork/join version up to about 3 million elements, after which it becomes slower. It was not really clear to me how to parallelize this one properly; converting it to a parallel stream before doing the merges seems to improve performance only slightly. Again, maybe it is difficult to really see the benefit with only two processors.
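Roughly, the bottom-up version looks like the following sketch. The structure is the standard non-recursive algorithm driven by an IntStream of run offsets; the names are illustrative, and it reuses the same merge helper shown under Notes.

```java
import java.util.stream.IntStream;

public class StreamMergeSort {

    static void sort(int[] a) {
        int n = a.length;
        // One pass per run width: 1, 2, 4, ... merging adjacent runs of that width.
        for (int width = 1; width < n; width *= 2) {
            final int w = width;                      // effectively final copy for the lambda
            int runPairs = (n + 2 * w - 1) / (2 * w); // number of adjacent run pairs this pass
            IntStream.range(0, runPairs)
                     .map(i -> i * 2 * w)             // starting index of each pair of runs
                     // adding .parallel() here is the variant discussed above
                     .forEach(lo -> merge(a, lo,
                                          Math.min(lo + w, n),
                                          Math.min(lo + 2 * w, n))); // merge helper shown under Notes
        }
    }
}
```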
It is also worth noting that since all parallel streams share the same common fork/join thread pool, they will affect each other's performance when used concurrently, as explained in "Java Parallel Streams Are Bad for Your Health".
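A commonly cited workaround is to run the stream's terminal operation inside a dedicated ForkJoinPool, which keeps a heavy parallel stream off the common pool (this relies on undocumented behavior). A rough illustration, with an arbitrary pool size and workload:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolExample {
    public static void main(String[] args) throws Exception {
        // The parallel stream below runs its work in this pool instead of the common pool,
        // because the terminal operation is invoked from one of this pool's threads.
        ForkJoinPool pool = new ForkJoinPool(2); // arbitrary pool size
        long sum = pool.submit(() -> {
            return IntStream.rangeClosed(1, 1_000_000).parallel().asLongStream().sum();
        }).join();
        pool.shutdown();
        System.out.println(sum);
    }
}
```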
Notes
For completeness, all three merge sort implementations share the same merge method.
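A standard merge of two adjacent sorted ranges, using the half-open upper bound assumed in the sketches above, looks like this:

```java
import java.util.Arrays;

// Merges the sorted ranges a[lo, mid) and a[mid, hi) back into a[lo, hi).
static void merge(int[] a, int lo, int mid, int hi) {
    int[] left = Arrays.copyOfRange(a, lo, mid);
    int[] right = Arrays.copyOfRange(a, mid, hi);
    int i = 0, j = 0, k = lo;
    while (i < left.length && j < right.length) {
        a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++]; // stable: prefer left on ties
    }
    while (i < left.length) {
        a[k++] = left[i++];   // copy any leftovers from the left run
    }
    while (j < right.length) {
        a[k++] = right[j++];  // copy any leftovers from the right run
    }
}
```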