Monday 14 June 2010

Closure in a Loop


Closures are one of those things that, once you get used to them, make it seem absurd that a modern language would not implement them (hey Java, wake up, it's time for you too...).
They're really important in JavaScript for asynchronous Ajax calls and callbacks, or for binding "this" to an event handler.
I've also made good use of them in C#, where they save me from writing "intermediate classes" just to hold the parameters for a method that has to be called from a parameterless delegate. Yes, it sounds confusing, but this sample with the ThreadStart delegate makes it clear:



//we have this method that expects two parameters:
public static void Format(string title, string txt)
{
    Console.WriteLine("<h1>" + title + "</h1>"
        + "<p>" + txt + "</p>");
}

 

//and we want to run it in its own thread, but the ThreadStart delegate does not accept any parameters.
//So what? Just create a parameterless closure that captures the needed values and passes them on to our method:
string s1 = "title";
string s2 = "description";
Thread th1 = new Thread(new ThreadStart(() => HtmlFormatter.Format(s1, s2)));
th1.Start();
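The same trick can be sketched in JavaScript. This is only an illustration: runLater is a made-up stand-in for any API that accepts nothing but a parameterless callback, and format returns the string instead of printing it so the result is easy to inspect.

```javascript
// Two-parameter function, mirroring the C# Format method above.
function format(title, txt) {
  return "<h1>" + title + "</h1>" + "<p>" + txt + "</p>";
}

// Hypothetical API (an assumption for this sketch): it only knows how to
// invoke a callback with no arguments.
function runLater(callback) {
  return callback();
}

var s1 = "title";
var s2 = "description";

// The parameterless closure captures s1 and s2 and passes them on.
var html = runLater(function () {
  return format(s1, s2);
});

console.log(html);
```

No intermediate object is needed to carry s1 and s2: the closure carries them.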



How a language implements closures is interesting and bizarre at the same time: scope chains in JavaScript, auxiliary classes generated by the C# compiler...

Well, several times now I've come across a case that does not work the same way in JavaScript and C#, and it makes you think about how language semantics and internal implementation shape the behavior. It happens when you create a closure in a loop.

Let's say we have a loop like this in C#:


for(int i=0; i<5; i++)

    actions.Add(()=>print("value = " + i));



or in JavaScript:


for(var i=0; i<5; i++)

    actions.push(function(){print("value = " + i);});



If we now invoke the functions in the actions array, we don't get 0, 1, 2, 3, 4 (which in principle would be the desired behavior), but 5, 5, 5, 5, 5.

This makes good sense in both cases if we think in terms of scope.
In the JavaScript case the scope of i is the whole function; in the C# case it is the whole loop. So in both cases all the actions capture the same i variable, and all of them see any change made to it.
Thinking in implementation terms, this makes perfect sense in JavaScript if we know how closures are implemented: when each function is declared it captures the execution context object of the function where it's declared, so all these new functions point to the same execution context, which contains that single i variable.
In C#, however, this behavior caught me by surprise. In principle, the compiler generates a class that holds the method for the delegate and the free variables, so I expected one instance of that class per delegate, each with its own i (a data field). I was wrong: in this case a single instance of the class is used, so i is shared.
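A small JavaScript sketch makes the sharing visible. It assumes nothing beyond plain functions and arrays; the sixth closure is added here only for illustration, and the closures return their text instead of calling print so the results can be collected.

```javascript
// All closures created in this loop capture the same variable i.
function buildActions() {
  var actions = [];
  for (var i = 0; i < 5; i++) {
    actions.push(function () { return "value = " + i; });
  }
  // Extra closure (illustrative): it writes to the very same i.
  actions.push(function () { i = 42; });
  return actions;
}

var actions = buildActions();

// Invoke the five readers after the loop has finished: i is already 5.
var before = actions.slice(0, 5).map(function (f) { return f(); });

// Change the shared variable through one closure...
actions[5]();
// ...and another closure sees the change.
var after = actions[0]();

console.log(before.join(", ")); // value = 5, five times
console.log(after);             // value = 42
```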

To obtain the intended behavior, we would have to go this way in C#:


for (int i = 0; i<5; i++)

{

int k = i;

actions.Add(() => print("value = " + k));

}



In this case we're using a new variable k that is scoped to each loop iteration, so each closure captures a different k variable.

However, if we do the same in JavaScript:


for (var i = 0; i<5; i++)

 {

  var j = i;

  actions.push(function(){ print("value = " + j);});

 }



we still don't get what we wanted: every function prints the same value (4, the last value assigned to j, rather than 5).
This is the right thing according to both semantics and implementation. In spite of being declared inside the loop, the scope of the j variable is the whole function, so this is essentially the same as the first case. Likewise, from the implementation standpoint, we have a single j variable in the execution context of the main function, so that's fine.
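The function-wide scope of a var declared inside the loop can be checked directly. This is a minimal sketch; the run function and the names of the fields it returns are made up for the example. Note that the shared j ends up holding the last value assigned to it, which is 4, because j is never incremented the way i is.

```javascript
function run() {
  // j is declared further down with var, yet it already exists here:
  // reading it gives undefined instead of throwing a ReferenceError.
  var existsBeforeLoop = (j === undefined);

  var actions = [];
  for (var i = 0; i < 5; i++) {
    var j = i; // one j for the whole function, reassigned on each pass
    actions.push(function () { return "value = " + j; });
  }

  return {
    existsBeforeLoop: existsBeforeLoop,
    jAfterLoop: j, // still visible after the loop
    results: actions.map(function (f) { return f(); })
  };
}

var outcome = run();
console.log(outcome.existsBeforeLoop);    // true
console.log(outcome.jAfterLoop);          // 4
console.log(outcome.results.join(", "));  // value = 4, five times
```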

So in JavaScript, to achieve the intended behavior, we need to add an extra function with its own extra scope (execution context):


for (var i = 0; i<5; i++)

  actions.push(function first(it){return function second(){print("value = " + it);}}(i));



When the "second" function is declared, it captures the "it" parameter passed to the "first" function where it's declared. "first" is invoked 5 times, each invocation with its own "it" parameter, which is captured by the "second" function created in that invocation.
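The same idea may be easier to read with the outer function pulled out and given a name. This is just a sketch: makePrinter is a made-up name, and the inner function returns its text instead of calling print so the output can be checked.

```javascript
// Each call to makePrinter creates a new scope with its own "it".
function makePrinter(it) {
  return function () {
    return "value = " + it;
  };
}

var printers = [];
for (var i = 0; i < 5; i++) {
  // i is passed by value, so each closure keeps the value it was given.
  printers.push(makePrinter(i));
}

var outputs = printers.map(function (f) { return f(); });
console.log(outputs.join(", ")); // value = 0, value = 1, value = 2, value = 3, value = 4
```

The inline version above does exactly this, only defining and invoking the outer function in a single expression.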

The equivalent C# code would be something like this:


for (int i = 0; i<5; i++)

{

 Func<int, Action> intermediate = it => (() => Console.WriteLine(it));

 actions.Add(intermediate(i));

}



You can check the source code:


Hey, I've got an update to this
