Friday 19 November 2010

Linq (IQueryable) providers

When the whole Linq thing started a few years ago, there was something that confused me quite a lot, and that we can still find in some articles today.

There are articles that claim to explain how to implement a Linq provider. This is not completely accurate: they usually explain how to implement a Linq IQueryable provider. If we do a Google search for "implement linq provider" we get a mixed list of results; many of them add "IQueryable<T>" to the title, but some of them don't.

The IQueryable word is important, because it's only in that case that we have to go through the (complex) process of implementing an IQueryProvider (the interface itself is not generic; its CreateQuery and Execute methods are). This IQueryProvider implementation is what contains the querying logic-magic. This is how Linq to Sql works:
The Table<T> objects in our System.Data.Linq.DataContext implement the IQueryable<T> interface. So, when we call the Where method, we're invoking the System.Linq.Queryable.Where method, which calls into the CreateQuery method of the IQueryProvider referenced by our Table<T>, which I think is:

System.Data.Linq.SqlClient.SqlProvider
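
To see the mechanism without a database, here is a minimal sketch that uses AsQueryable over an in-memory array as a stand-in for a Linq to Sql table (that substitution and the QueryableDemo name are mine, not part of Linq to Sql). It only shows that Where resolves to Queryable.Where and that the resulting query carries an expression tree plus a provider:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableDemo
{
    // Builds a query over an in-memory array exposed as IQueryable<int>.
    // Assumption: AsQueryable stands in for a real queryable source here.
    public static IQueryable<int> BuildQuery()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();
        // Because source is IQueryable<int>, this binds to Queryable.Where,
        // which receives the lambda as an Expression<Func<int, bool>>.
        return source.Where(x => x > 1);
    }

    public static void Main()
    {
        IQueryable<int> query = BuildQuery();

        // The query holds an expression tree describing the call to Where.
        var call = (MethodCallExpression)query.Expression;
        Console.WriteLine(call.Method.DeclaringType.FullName); // System.Linq.Queryable
        Console.WriteLine(call.Method.Name);                   // Where

        // The provider is the object that turns that tree into results.
        Console.WriteLine(query.Provider.GetType().Name);

        // Enumerating asks the provider to execute the tree.
        foreach (int n in query)
        {
            Console.WriteLine(n); // 2, then 3
        }
    }
}
```

A real provider such as the Linq to Sql one would translate that tree into SQL instead of running it in memory.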

However, for other "Linq systems" like Linq to Objects (I used to call it Linq to Collections) there is no IQueryProvider. The querying logic is scattered over multiple classes. Using Reflector we can see that calling System.Linq.Enumerable.Where<T> returns an instance of WhereIterator<T>. This class implements IEnumerable<T> (and also IEnumerator<T>), and it's in its MoveNext method where all the querying logic lies.
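
As a rough illustration of that idea, here is a hand-written filter built with an iterator block. MyWhere is a hypothetical name of mine and this is not the framework's actual code; it is just a sketch of how the filtering ends up inside the compiler-generated MoveNext and only runs when someone enumerates:

```csharp
using System;
using System.Collections.Generic;

public static class MyEnumerable
{
    // Hypothetical re-creation of a Where-like operator.
    // The compiler turns this iterator block into a class whose
    // MoveNext method contains the loop and the predicate test.
    public static IEnumerable<T> MyWhere<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }
}

public static class WhereDemo
{
    public static void Main()
    {
        int calls = 0;
        IEnumerable<int> evens = new[] { 1, 2, 3, 4 }.MyWhere(n => { calls++; return n % 2 == 0; });

        // Nothing has been filtered yet: the query is only a description.
        Console.WriteLine(calls); // 0

        foreach (int n in evens)
        {
            Console.WriteLine(n); // 2, then 4
        }

        // The predicate ran once per source element during enumeration.
        Console.WriteLine(calls); // 4
    }
}
```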

The most important point of all this is that making our classes "Linq friendly" does not necessarily involve implementing an IQueryProvider; in many cases all we need is to make our classes implement IEnumerable<T>, so that all the methods in System.Linq.Enumerable can be applied to them.
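
A minimal sketch of that, assuming a made-up Team class (the class and its contents are purely illustrative): once it implements IEnumerable<string>, the standard operators work on it with no provider involved.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical domain class; all it does for Linq is implement IEnumerable<string>.
public class Team : IEnumerable<string>
{
    private readonly string[] members;

    public Team(params string[] members)
    {
        this.members = members;
    }

    public IEnumerator<string> GetEnumerator()
    {
        foreach (string member in members)
        {
            yield return member;
        }
    }

    // The non-generic interface just forwards to the generic one.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class TeamDemo
{
    public static void Main()
    {
        var team = new Team("Herman", "John", "Gunter");

        // Where and OrderBy here are the System.Linq.Enumerable extension methods.
        List<string> longNames = team.Where(n => n.Length > 4).OrderBy(n => n).ToList();

        Console.WriteLine(string.Join(", ", longNames)); // Gunter, Herman
    }
}
```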

You have a good sample here for querying AD users.

Another sample would be making a CSV document queryable via Linq (so, a very basic form of Linq to CSV; there are several real implementations out there, this is just a read-only toy sample). We could just have a GetEntries method returning an IEnumerable<List<string>>, and then apply our querying logic (I'm not taking headers into account here):


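// Needs: using System.Collections.Generic; using System.IO; using System.Linq;
// Assumes the containing class has a string field "path" and a char field "separator".
public IEnumerable<List<string>> GetEntries()
{
    using (FileStream fs = new FileStream(this.path, FileMode.Open, FileAccess.Read))
    {
        using (StreamReader sr = new StreamReader(fs))
        {
            // Each row is read and split only when the caller asks for the next item.
            while (!sr.EndOfStream)
            {
                string curEntry = sr.ReadLine();
                yield return curEntry.Split(this.separator).ToList();
            }
        }
    }
}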


So, given some data like:
Country;Name;Code
Germany;Gunter;24000
UK;John;18000
Germany;Herman;25000

we could do queries like:


myCsvDao.GetEntries().Where(row => row[0] == "Germany").OrderBy(row => row[1])


A nice feature here is that the IEnumerable<T>-IEnumerator<T> created by our iterator block (yes, it's rather odd to have a class that is both Enumerable and Enumerator, check out this article) does not load all the lines into memory; it keeps the file open and reads the lines one by one as they are requested, which can be pretty useful for performance (and pretty ugly if we had also implemented write access and could have concurrent operations on it...). Bear in mind that buffering operators like OrderBy still have to hold all the rows that reach them before returning the first one.
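
To make the lazy reading visible, here is a small self-contained sketch. The CsvDao class, its constructor and the LinesRead counter are assumptions of mine added for the demonstration; the sample rows are the ones used above, written to a temporary file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical wrapper around a CSV file; LinesRead exists only for this demo.
public class CsvDao
{
    private readonly string path;
    private readonly char separator;

    public int LinesRead { get; private set; }

    public CsvDao(string path, char separator)
    {
        this.path = path;
        this.separator = separator;
    }

    public IEnumerable<List<string>> GetEntries()
    {
        // The file is opened when enumeration starts and closed when
        // the enumerator is disposed, even if we stop early.
        using (var sr = new StreamReader(path))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                LinesRead++;
                yield return line.Split(separator).ToList();
            }
        }
    }
}

public static class CsvDemo
{
    // Returns how many lines had to be read to find the first UK row.
    public static int LinesNeededToFindUk()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[]
            {
                "Country;Name;Code",
                "Germany;Gunter;24000",
                "UK;John;18000",
                "Germany;Herman;25000"
            });

            var dao = new CsvDao(path, ';');
            // First stops enumerating (and disposes the reader) at the first match.
            List<string> firstUk = dao.GetEntries().First(row => row[0] == "UK");
            return dao.LinesRead;
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        Console.WriteLine(LinesNeededToFindUk()); // 3
    }
}
```

The last line of the file is never read, because First stops pulling rows as soon as it finds a match.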

Update: I've found this good recent article that is closely related to this and to my previous post.
