Sunday 27 July 2014

Parallel Linq

Parallel Linq has been around for a long while, but I've hardly had the need or the chance to play with it. Some discussions at work about whether parts of our current Perl application could get noticeable performance improvements by running them in parallel have whetted my appetite for running operations like sorting and filtering in parallel, and this has led me to take a look at Parallel Linq.

Apart from all the parallel magic, I find the way these parallel operations have been made available to collections interesting, so I'll write down some notes about it here.

Given that Parallel Linq is a parallel implementation of Linq to Objects, the design is very similar. While Linq to Objects is implemented as a set of Extension Methods that apply to IEnumerable objects (I'm using IEnumerable to refer both to IEnumerable<TSource> and IEnumerable), Parallel Linq is implemented as a set of Extension Methods that apply to ParallelQuery objects. The former are located in the System.Linq.Enumerable class, and the latter in the System.Linq.ParallelEnumerable class.
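Since Extension Methods are just static methods with some syntactic sugar, both families can also be called explicitly through the classes that declare them. The sketch below does exactly that to filter even numbers, once with Enumerable.Where and once with ParallelEnumerable.Where. The class name StaticCallDemo is hypothetical, and the AsOrdered call is my own addition so that the parallel result comes back in source order and can be compared with the sequential one.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class: calls both operator families explicitly, without extension syntax.
public static class StaticCallDemo
{
    // Equivalent to source.Where(n => n % 2 == 0).ToArray() on a plain IEnumerable<int>.
    public static int[] SequentialEvens(IEnumerable<int> source) =>
        Enumerable.ToArray(Enumerable.Where(source, n => n % 2 == 0));

    // ParallelEnumerable.Where expects a ParallelQuery<int>, so the source is wrapped first.
    // AsOrdered keeps the source order so the output matches the sequential version.
    public static int[] ParallelEvens(IEnumerable<int> source)
    {
        ParallelQuery<int> wrapped = ParallelEnumerable.AsOrdered(ParallelEnumerable.AsParallel(source));
        return ParallelEnumerable.ToArray(ParallelEnumerable.Where(wrapped, n => n % 2 == 0));
    }
}
```

With Enumerable.Range(1, 10) as input, both methods should return the even numbers 2, 4, 6, 8 and 10; the only difference is which static class does the work.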

If you have an IEnumerable object and want to switch from applying normal (sequential) Linq methods on it to their parallel counterparts, all you need to do is call the AsParallel method on it. How does this work? AsParallel returns a ParallelQuery object, so from then on the Where, Select, First... extension methods are taken from System.Linq.ParallelEnumerable rather than from System.Linq.Enumerable. Notice that ParallelQuery implements IEnumerable, and both classes live in the System.Linq namespace, so both sets of methods are applicable. However, overload resolution picks the candidate whose parameter type is the most specific match for the given object, and since ParallelQuery is more specific than IEnumerable, Where(ParallelQuery... is chosen over Where(IEnumerable...
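To see the "most specific wins" rule in isolation, here is a miniature copy of the pattern with hypothetical types: MyQuery<T> plays the role of ParallelQuery<T>, SequentialOps and ParallelLikeOps play the roles of Enumerable and ParallelEnumerable, and AsMyQuery plays the role of AsParallel. None of these names exist in the framework; the RealPlinqCheck method at the end uses the real PLINQ API.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for ParallelQuery<T>: it is itself an IEnumerable<T>.
public class MyQuery<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    public MyQuery(IEnumerable<T> source) { _source = source; }
    public IEnumerator<T> GetEnumerator() => _source.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Two extension methods with the same name; only the type of 'this' differs.
public static class SequentialOps
{
    public static string Describe<T>(this IEnumerable<T> source) => "sequential";
}

public static class ParallelLikeOps
{
    public static string Describe<T>(this MyQuery<T> source) => "parallel";

    // Counterpart of AsParallel: wrap any IEnumerable<T> in the more specific type.
    public static MyQuery<T> AsMyQuery<T>(this IEnumerable<T> source) => new MyQuery<T>(source);
}

public static class RealPlinqCheck
{
    // This compiles only because ParallelEnumerable.Where is chosen:
    // Enumerable.Where returns IEnumerable<int>, which is not assignable to ParallelQuery<int>.
    public static ParallelQuery<int> PositiveInParallel(IEnumerable<int> source)
    {
        ParallelQuery<int> query = source.AsParallel().Where(n => n > 0);
        return query;
    }
}
```

Calling Describe() directly on an array can only bind to the IEnumerable<T> version, while calling it after AsMyQuery() binds to the MyQuery<T> version, because an identity conversion to MyQuery<int> beats a reference conversion to IEnumerable<int>. That is the same reasoning that makes AsParallel switch the whole chain over to ParallelEnumerable.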

Let's say you have a collection and, for whatever reason (depending on different factors, the parallel version could be slower than the sequential one), you want to apply some parallel methods and some sequential ones while chaining all those calls. In that case you can use the AsSequential method to convert your ParallelQuery back into a plain and simple IEnumerable. Thinking a bit about this, it seems clear that all the AsParallel and AsSequential methods do is wrap your normal IEnumerable object in a ParallelQuery object and unwrap it again. We can launch ILSpy to confirm this.
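Here is a minimal sketch of such a mixed chain: the filtering runs through ParallelEnumerable, and after AsSequential the projection and the list building go back to Enumerable. The class and method names are hypothetical, and AsOrdered is added only so the output order is deterministic. The second method checks the wrap/unwrap behaviour: on the framework versions I'm aware of, AsSequential on a freshly wrapped source hands back the very same object you started with. Keep in mind that this is an implementation detail visible in the decompiled code, not a documented guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MixedPipeline
{
    // Filter in parallel, then switch back to sequential Linq to Objects for the rest of the chain.
    public static List<string> EvenSquaresAsText(IEnumerable<int> source)
    {
        return source
            .AsParallel()                      // ParallelQuery<int>: ParallelEnumerable operators from here
            .AsOrdered()                       // preserve source order so the output is deterministic
            .Where(n => n % 2 == 0)            // ParallelEnumerable.Where
            .AsSequential()                    // static type is IEnumerable<int> again
            .Select(n => (n * n).ToString())   // Enumerable.Select
            .ToList();                         // Enumerable.ToList
    }

    // AsParallel wraps the source; AsSequential on that wrapper returns the wrapped object.
    // Implementation detail, not a documented contract.
    public static bool RoundTripReturnsSameObject(IEnumerable<int> source)
    {
        return ReferenceEquals(source.AsParallel().AsSequential(), source);
    }
}
```

For the numbers 1 to 6, EvenSquaresAsText should return "4", "16" and "36".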

Indeed, internally the framework uses a ParallelEnumerableWrapper class that inherits from ParallelQuery. I'm not sure about the rationale for this extra class.

You'll see that ParallelEnumerable also has an AsEnumerable method that seems to do just the same as AsSequential. And yes, if you check its code, it just calls AsSequential.
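A quick way to see this from the outside, assuming the wrapper behaviour described above: on a ParallelQuery<int>, AsEnumerable() binds to ParallelEnumerable.AsEnumerable (again the more specific overload) and returns the same object as AsSequential(). Had Enumerable.AsEnumerable been picked instead, you would simply get the ParallelQuery back unchanged. The class name AsEnumerableDemo is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AsEnumerableDemo
{
    // Resolves to ParallelEnumerable.AsEnumerable, which delegates to AsSequential.
    public static IEnumerable<int> ViaAsEnumerable(ParallelQuery<int> query) => query.AsEnumerable();

    // For comparison: the explicit way of leaving the parallel world.
    public static IEnumerable<int> ViaAsSequential(ParallelQuery<int> query) => query.AsSequential();
}
```

Given `var query = list.AsParallel();`, both methods should hand back `list` itself, which is the unwrapping AsSequential performs.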

You could be wondering why I'm using ILSpy to peek into the Framework code when the Framework's source code is publicly available (both online and for download). Well, first, I'm a creature of habit, and second, using a decompiler still gives me a slight sense of being a "hacker" that reading the source code doesn't :-D
