Saturday 25 August 2012

Avoid creating Custom Delegate Types

    I used to think of custom delegate types as something useful because of how they can convey semantics; that's why I used to create delegate types like "DrawingStrategy", "DateFormatter" and so on. Unfortunately, the .Net Framework BCL, with its huge number of delegate types (ThreadStart, TimerCallback...), is a good source of inspiration for following this wrong pattern. Why do I say it's wrong? Because, contrary to what we might expect, there is no automatic conversion between different delegate types whose signatures are exactly the same. I mean, if we have 2 delegate types like these:

    delegate bool FormatValidator(string st);

    delegate bool OrthographyValidator(string st);

    although the same method can be wrapped in either of them, once the delegate is constructed we can't pass a FormatValidator to a method expecting an OrthographyValidator, and vice versa.

    This means that you should avoid custom delegate types and use the generic delegates instead: Action, Action<T1>... and Func<TResult>, Func<T1, TResult>...

    As stated above, the BCL is full of custom delegate types, so more often than not you'll face a situation where you need to obtain a WhateverFancyDelegateName1 delegate from a WhateverFancyDelegateName2 delegate. You have two options: either use a lambda/anonymous method to wrap the DelegateName2 instance into a DelegateName1, or use the slower Delegate.CreateDelegate. I could spend time writing sample code here, but this excellent post does just that, so there's no need to rewrite what is already properly explained there.
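    Java's functional interfaces are nominal in exactly the same way as .Net delegate types, so the pitfall and the wrapping workaround can be shown in a short, self-contained sketch. FormatValidator and OrthographyValidator mirror the C# delegates above. The emptiness check and the helper names (checkOrthography, check, demo) are made-up placeholders.

```java
import java.util.function.Predicate;

// Two functional interfaces with identical signatures, mirroring the C# delegates above.
interface FormatValidator {
    boolean validate(String st);
}

interface OrthographyValidator {
    boolean validate(String st);
}

class ValidatorDemo {
    // Hypothetical method that only accepts the second "delegate" type.
    static boolean checkOrthography(OrthographyValidator v, String text) {
        return v.validate(text);
    }

    // Hypothetical method written against the standard generic interface instead.
    static boolean check(Predicate<String> p, String text) {
        return p.test(text);
    }

    static boolean demo(String text) {
        // The same lambda could target either interface...
        FormatValidator format = st -> !st.isEmpty();

        // ...but once created, it is a FormatValidator and nothing else:
        // checkOrthography(format, text);   // does not compile: incompatible types

        // Workaround: wrap it in a new instance of the target type.
        OrthographyValidator orthography = format::validate;

        // Avoiding custom types altogether: any Predicate<String> fits check().
        Predicate<String> generic = format::validate;

        return checkOrthography(orthography, text) && check(generic, text);
    }

    public static void main(String[] args) {
        System.out.println(demo("Asturies")); // true
        System.out.println(demo(""));         // false
    }
}
```

    The C# counterpart of the format::validate wrap is a lambda such as st => format(st) passed where an OrthographyValidator is expected.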

    I think a good source of the confusion that could wrongly lead us to think that the compiler accepts any delegate type when the signatures match is the always complex topic of covariance and contravariance, so let's review a bit how delegate variance works:

    • C# 2.0 added delegate return type covariance and delegate parameter type contravariance. Indeed, this has nothing to do with the issue at hand, as it doesn't deal with different delegate types, but with which methods can be used for a specific delegate type. We can see a sample here
    • C# 4.0 added variance for generic delegates. This is a bit more related to our case, as in a sense we're talking about different delegate types, but only a subset of them (generic delegates whose type parameters are marked in/out, such as Func<in T, out TResult>).

    I've put together this sample with the different kinds of delegate variance that are supported. To my surprise, it fails to compile under Mono (both on Windows and Linux)!
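    Java draws the same two lines in a different place, which may help to tell them apart. Method references get C# 2.0-style parameter contravariance and return covariance. Generic types stay invariant, so variance has to be requested at each use site with wildcards rather than declared once with C# 4.0's in/out modifiers. A small sketch follows; describe is a made-up example method.

```java
import java.util.function.Function;

class VarianceDemo {
    // Hypothetical example method: wider parameter type, narrower return type.
    static String describe(Object o) {
        return "value: " + o;
    }

    static String demo() {
        // Like C# 2.0 method-group variance: describe(Object) -> String is accepted
        // for Function<String, Object> (contravariant parameter, covariant return).
        Function<String, Object> f1 = VarianceDemo::describe;

        // Generic interfaces themselves are invariant in Java...
        Function<Object, String> exact = VarianceDemo::describe;
        // Function<String, Object> f2 = exact;   // does not compile

        // ...so variance is requested at the use site with wildcards, where C# 4.0
        // declares it once on the delegate type (Func<in T, out TResult>).
        Function<? super String, ? extends Object> f3 = exact;

        return f1.apply("a") + " / " + f3.apply("b");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // value: a / value: b
    }
}
```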

    Tuesday 14 August 2012

    Ödön Lechner

    This brilliant architect is probably not well known outside Hungary, but if you've ever been to Budapest or Bratislava, have a certain taste for Art Nouveau (or just for beauty) and have run into any of his works, you will surely have added him to your list of the most astonishing creators of beauty. He's probably the most praised Hungarian Secessionist (Art Nouveau, Jugendstil... you know) and earned himself the nickname "the Hungarian Gaudí".

    I knew of him because of Bratislava's Blue Church, so before visiting Budapest last June I added a visit to some of his works there to my ToDo list.
    I have to admit that my first encounter with one of Ödön's creations was unplanned. I was indulging in some aimless wandering around Pest when a dreamlike green and golden roof some blocks away caught my attention. It was the Postal Savings Bank. The facade is gorgeous, but for me it's the fairy-tale roof that really stands out. Indeed, I think I had never realized before how astonishingly beautiful a roof can be. I've always loved onion domes, and living in a city where we haven't had a decent snow cover in more than 20 years... it's easy to understand how charming I find any kind of steep-slope roof (even more so if it's a slate one, like these in Namur). The mosaic roofs, relatively common in Hungary, had already amazed me on my first visit, but this artist knew how to take them to the next level.

    The following day it was time to visit one of his most famous buildings, the Geological Institute. Unfortunately, I had to abandon the idea when I was already on my way there. After getting off the Metro at the beautiful Keleti station, I began my stroll along an avenue flanked by old, dark, nice, worn-out buildings. I was already short of time, and if we add my continuous stops to take one pic after another, and the 35 degrees we were suffering... I ended up retracing my steps and heading for something different... so this should go first on my list for the next time I'm lucky enough to be in Budapest.

    The next day, after a nice stroll along Ráday utca and its surroundings, I made my way to the lovely Museum of Applied Arts. Once again, the green mosaic roof stands out beautifully, but another element that I really appreciated is a certain neo-Gothic feeling that I think is mainly due to those elongated windows. I had a busy agenda that day, and it didn't seem worth paying for a ticket to spend time viewing the collection, so I just stood for a while at the entrance, marvelling at the organic, imaginative lamps and handrails. Surely another must-see.

    While preparing this post I found this excellent blog about Art Nouveau, check it out!

    Thursday 9 August 2012

    Recursive anonymous methods/functions

    At first, the idea of a recursive anonymous function can seem rather unfeasible: how can it call itself if it has no name? Well, in the old days, the way to achieve this in JavaScript was arguments.callee. Its use has been deprecated, and the recommended solution is naming the function expression, you know:

    var f1 = function f1(){
    };
    

    That's fine, but if, for the sake of learning, we wanted to use an anonymous function, we still could, thanks to closures. A function can trap itself in its own closure (I've already used this technique before), so we could just write:

    var printNTimes = function(txt, n) {
       if (n > 0){
        console.log(n + " - " + txt);
        printNTimes(txt, --n);
       }
      };
    printNTimes("Asturies", 3);
    

    Our anonymous function traps in its closure the printNTimes variable, which points to the function itself. Thinking in terms of the [[Scope]] property and Execution Contexts, the mechanism is clear: by the time the function runs and looks up printNTimes, the variable has already been assigned.

    Could we do the same in C# with Anonymous Methods and Closures?
    Yes, we can. We already saw in this sample what a brilliant beast the C# compiler is when it comes to creating the oddly named classes underlying the closure mechanism in .Net. There's a small detail to take into account, though:

    Action<string, int> printNTimes = (txt, n) => {
       if (n > 0)
       {
        Console.WriteLine(n.ToString() + " - " + txt);
        printNTimes(txt, --n);
       }
      };
      printNTimes("Asturies", 3);
    

    The compiler will spit out this error when going through the code above:

    Error CS0165: Use of unassigned local variable...

    The fix is simple: we just need to declare the variable (initialized to null) in a previous statement:

    Action<string, int> printNTimes = null;
    printNTimes = (txt, n) => {
       if (n > 0)
       {
        Console.WriteLine(n.ToString() + " - " + txt);
        printNTimes(txt, --n);
       }
      };
      printNTimes("Asturies", 3);
    

    The structure generated by the compiler is interesting. We have a class with a method containing the code of the anonymous method, and a data field that points to the delegate created for that method (and of course, that method uses that data field), so all in all we have a cute recursive structure.
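    For comparison, javac is stricter here. It rejects the self-referencing initializer, and the "declare it first" fix doesn't help either, because a lambda may only capture effectively final locals. Below is a sketch of two common Java workarounds; run and runAnonymous are made-up names, and a list stands in for the console. The first uses a one-element array holder, which is essentially a hand-written version of the field in the compiler-generated closure class described above. The second uses an anonymous class that calls itself through this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

class RecursiveLambdaDemo {
    // Neither of these compiles in Java:
    //   BiConsumer<String, Integer> f = (txt, n) -> f.accept(txt, n - 1);  // f might not be initialized
    //   BiConsumer<String, Integer> f = null;
    //   f = (txt, n) -> f.accept(txt, n - 1);                               // f is not effectively final

    // Workaround 1: the lambda captures an effectively final one-element array and
    // reaches itself through its slot, much like the delegate field in the closure
    // class that the C# compiler generates.
    static List<String> run(String text, int times) {
        List<String> lines = new ArrayList<>();
        @SuppressWarnings("unchecked")
        BiConsumer<String, Integer>[] self = new BiConsumer[1];
        self[0] = (txt, n) -> {
            if (n > 0) {
                lines.add(n + " - " + txt);
                self[0].accept(txt, n - 1);
            }
        };
        self[0].accept(text, times);
        return lines;
    }

    // Workaround 2: an anonymous class has no name either, but it can call itself via "this".
    static List<String> runAnonymous(String text, int times) {
        List<String> lines = new ArrayList<>();
        BiConsumer<String, Integer> printNTimes = new BiConsumer<String, Integer>() {
            @Override
            public void accept(String txt, Integer n) {
                if (n > 0) {
                    lines.add(n + " - " + txt);
                    this.accept(txt, n - 1);
                }
            }
        };
        printNTimes.accept(text, times);
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(run("Asturies", 3));          // [3 - Asturies, 2 - Asturies, 1 - Asturies]
        System.out.println(runAnonymous("Asturies", 3)); // same output
    }
}
```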

    Watch out! Testing this on Linux, I've seen that with the Mono compiler (gmcs) we don't need the extra declaration: the first version compiles fine (and the same happens with the Mono compiler on Windows).

    Sunday 5 August 2012

    When is an Assembly or a Class Loaded?

    I began to discuss this in my previous post, but I preferred to devote a whole post to a more complete discussion, including some code.

    So, from my previous post:
    In .Net, an assembly is loaded the first time that a method referencing classes in that assembly is Jitted. Jitting happens before running the method, so the runtime does not know whether the instructions in that method that need that assembly will ever really be executed (they could be inside an if that never turns out to be true...). When developing code with critical performance requirements, we should keep this in mind, because a minor modification to our code can save us an unnecessary assembly load. With the HotSpot JVM, due to the initial interpretation, class loading is not done at the method boundary, but at the instruction boundary. I mean, a class is not loaded until the first instruction that needs it is interpreted.

    OK, first let's go with the Java part:

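    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    // Class name assumed; Person and Book below are minimal stand-ins for the real ones.
    public class App {
        public static void main(String[] args) throws IOException {
            System.out.println("Started");
            System.out.println("insert option: B or P");

            BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
            String input = br.readLine();
            System.out.println(input.length());
            if (input.equals("P")) {
                Person p = new Person("Xana");
                System.out.println("Person created");
            }
            else {
                Book b = new Book();
                System.out.println("Book created");
            }
            // Keeps the process alive until another line is entered.
            input = br.readLine();
        }
    }

    class Person { Person(String name) { } }
    class Book { }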
    

    When you run the code above with the -verbose option, you'll see that the Person or the Book class gets loaded on its first use (the new Person() or new Book() statement). So, depending on whether the condition is true or not, only the Person class or only the Book class gets loaded. This behaviour makes sense, given that at first it's the interpreter that is running, not the JIT.

    You can grab the code here if you want to play with it yourself
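    If you'd rather check this from inside the program than by reading the -verbose output, static initializers give a rough, self-contained sketch. Bear in mind that it observes class initialization. Initialization is a separate step, and the Java Language Specification pins it to first active use (here, the first new). The exact moment of loading is left to the JVM, and loading is what -verbose reports. LazyInitDemo, run and events are made-up names; the nested Person and Book stand in for the classes in the sample above.

```java
import java.util.ArrayList;
import java.util.List;

class LazyInitDemo {
    // Records, in order, what happened during a run.
    static final List<String> events = new ArrayList<>();

    static class Person {
        // Runs once, when Person is initialized (on the first "new Person").
        static { events.add("Person initialized"); }

        Person(String name) { }
    }

    static class Book {
        // Runs once, when Book is initialized (on the first "new Book").
        static { events.add("Book initialized"); }
    }

    static List<String> run(String option) {
        events.clear();
        events.add("started");
        if (option.equals("P")) {
            new Person("Xana");
            events.add("Person created");
        } else {
            new Book();
            events.add("Book created");
        }
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run("P")); // [started, Person initialized, Person created]
        System.out.println(run("P")); // [started, Person created]
    }
}
```

    As long as only "P" is passed, "Book initialized" never appears, and "Person initialized" shows up only on the first call.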

    For the .Net side, I had tested this some years ago, but as I found that sample a bit convoluted, I decided to write a new one:

    using System;
    using System.Runtime.CompilerServices;
    using System.Xml;

    class Parser
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void ParseFile(string fileType)
        {
            Console.WriteLine("ParseFile");
            if (fileType == "xml")
            {
                // XmlDocument is referenced directly in this method's body
                XmlDocument doc = new XmlDocument();
            }
            else { /*do something here*/ }
        }

        [MethodImpl(MethodImplOptions.NoInlining)]
        public void ParseFileAssemblyAware(string fileType)
        {
            Console.WriteLine("ParseFileAssemblyAware");
            if (fileType == "xml")
                this.ParseXml();
            else { /*do something here*/ }
        }

        // XML-specific code isolated in its own method
        [MethodImpl(MethodImplOptions.NoInlining)]
        private void ParseXml()
        {
            Console.WriteLine("ParseXml");
            XmlDocument doc = new XmlDocument();
        }
    }

    class App
    {
        public static void Main(string[] args)
        {
            Console.WriteLine("started");
            AppDomain.CurrentDomain.AssemblyLoad += new AssemblyLoadEventHandler(delegate(object sender, AssemblyLoadEventArgs args1)
            {
                Console.WriteLine("ASSEMBLY LOADED: " + args1.LoadedAssembly.FullName);
            });

            Parser parser = new Parser();
            Console.WriteLine("file type (xml or other)?");
            string fileType = Console.ReadLine();

            Console.WriteLine("AssemblyLoadingAware (Y or N)?");
            bool assemblyLoadingAware = Console.ReadLine().ToLower() == "y";

            if (assemblyLoadingAware)
                parser.ParseFileAssemblyAware(fileType);
            else
                parser.ParseFile(fileType);
        }
    }
    
    The source

    If you run the code above on Microsoft's CLR you'll find this:

    When Main starts, the System.Xml assembly has not been loaded yet

    If we answer No to the "AssemblyLoadingAware" option, the System.Xml assembly gets loaded as we call into the ParseFile method, irrespective of whether we'll be parsing an XML file or something else. That's when the whole method is Jitted, and the Jitter sees a reference to XmlDocument there, without taking into account that it's inside a conditional and might never run.

    If we answer Yes to the "AssemblyLoadingAware" option and have entered "other" instead of "xml", the System.Xml assembly will not be loaded. Notice that what we've done in that case is move the XML code to a separate method. That way, when the ParseFileAssemblyAware method is Jitted, no reference to XmlDocument is found there and the assembly is not loaded; the load is put off until the ParseXml method runs for the first time (which in this case never happens).

    The bottom line here is that with .Net, if our software is performance critical (in terms of memory, since a new assembly obviously takes space, or in terms of time, since loading the assembly also takes time...), we should take some extra care with cases like the one shown above.

    Notice that I'm using the MethodImpl attribute for the methods in Parser. I'm doing so to prevent the Jitter from inlining those methods (they are very short, so they are candidates for inlining) when it Jits the Main method. If you run the sample without those attributes, you won't see any message about the System.Xml assembly getting loaded in any of the cases. That's just because it gets loaded when the Main method is Jitted, before we add the handler to the AssemblyLoad event (you can check with Process Explorer that System.Xml is already in memory in that case).

    Saturday 4 August 2012

    CLR vs JVM: Loading, unloading...

    I'll try to sum up here some things I've been reading/thinking about lately, trying to glue them with some order/sense.

    - Loading Unit

    The CLR manages assemblies (.exe's and .dll's) as loading units, while the JVM manages classes (.class files). Pondering this a bit, it makes a huge difference. An assembly can contain thousands of classes, so loading an assembly can be costly, while loading a single JVM class has to be a rather fast process. When an assembly is loaded, the CLR has to create in-memory structures for the classes it contains (EEClass, MethodTable...), so it can be lengthy. I assume the JVM uses quite similar memory structures.

    - Interfering in the loading process.

    In the JVM we can alter how classes are loaded by creating our own ClassLoaders (this article is a pretty good reference). For example, when running Groovy code from the Groovy interpreter, the GroovyClassLoader loads the .groovy file and compiles it on the fly to JVM bytecode.

    In the CLR, we don't have the notion of custom assembly loaders; there is no AssemblyLoader class that you can subclass. You can load an assembly dynamically via Reflection with Assembly.Load, but, as with the automatic loading done by the runtime, it doesn't seem simple to hook your own code into the process. Some Google results seem to indicate that aspect weaving at assembly load time is possible, so there must be some sort of hack to get into the process, but it's far from obvious.
    Java class loaders are responsible both for finding the code to be loaded (you could fetch it on the fly from the network, for example) and for loading it. This sample is from the Oracle documentation:

     class NetworkClassLoader extends ClassLoader {
             String host;
             int port;
    
             public Class findClass(String name) {
                 byte[] b = loadClassData(name);
                 return defineClass(name, b, 0, b.length);
             }
    
             private byte[] loadClassData(String name) {
                 // load the class data from the connection
                  . . .
             }
         }
    

    Indeed, this is not so different in the .Net world, because you could also fetch the bytes of your assembly from wherever you wanted: you would just create a byte[] and pass it to Assembly.Load. With this in mind, it occurs to me that we could do load-time assembly modification by getting the assembly into memory (I'm avoiding the verb "load" here, because I'm not talking about a normal Assembly.Load), modifying it with Cecil, saving it to a byte[], and then loading the assembly from that in-memory byte[] using Assembly.Load(byte[]).
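    On the Java side, the counterpart of Assembly.Load(byte[]) is ClassLoader.defineClass. The Oracle sample above elides where the bytes come from, so here is a self-contained sketch that builds them in memory instead of fetching them over the network. It hand-assembles the smallest valid class file: an empty class with no methods, compiled for class-file version 52 (Java 8). InMemoryClassLoader, InMemoryClassDemo and the class name InMemoryHello are all made up for this example. A real scenario would get the bytes from a network, a database or a bytecode rewriting tool.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical loader that defines classes from byte arrays it was handed.
class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes;

    InMemoryClassLoader(Map<String, byte[]> classBytes) {
        super(InMemoryClassLoader.class.getClassLoader());
        this.classBytes = classBytes;
    }

    // The inherited loadClass asks the parent first and only calls findClass on a miss.
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        byte[] b = classBytes.get(name);
        if (b == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, b, 0, b.length);
    }
}

class InMemoryClassDemo {
    // Hand-assembles a minimal class file: a public class extending Object with no
    // fields and no methods (not even a constructor), which is still loadable.
    // Only handles names in the default package (no '.' to convert to '/').
    static byte[] emptyClassFile(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                           // magic
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version (Java 8)
            out.writeShort(5);                                  // constant pool count: entries #1..#4
            out.writeByte(1); out.writeUTF(name);               // #1 Utf8: class name
            out.writeByte(7); out.writeShort(1);                // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);                // #4 Class -> #3
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                  // this_class
            out.writeShort(4);                                  // super_class
            out.writeShort(0);                                  // interfaces_count
            out.writeShort(0);                                  // fields_count
            out.writeShort(0);                                  // methods_count
            out.writeShort(0);                                  // attributes_count
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the class through a brand-new loader each time.
    static Class<?> loadInFreshLoader(String name, byte[] bytes) {
        try {
            return new InMemoryClassLoader(Map.of(name, bytes)).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = emptyClassFile("InMemoryHello");
        Class<?> first = loadInFreshLoader("InMemoryHello", bytes);
        Class<?> second = loadInFreshLoader("InMemoryHello", bytes);
        System.out.println(first.getName() + " defined by " + first.getClassLoader().getClass().getSimpleName());
        System.out.println("same Class object? " + (first == second)); // false
    }
}
```

    Loading the same bytes through two different loaders yields two distinct Class objects with the same name. That per-loader identity is what the loader-based reloading approaches for the JVM rely on.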

    - When is an Assembly or a Class loaded?

    On the Java side I'll only talk about Oracle's HotSpot VM, as I have no idea how other JVMs work (though I'm quite interested in how Oracle ends up integrating HotSpot and JRockit). That said, it's quite important to understand the huge differences between the CLR and HotSpot. On the CLR there isn't any interpretation step: all code is Jitted (apart from code already Ngened) before its first execution. On the JVM, code is interpreted first, and many methods will always stay interpreted. Nevertheless, those sections of code that run frequently enough (hot spots) end up being Jitted to native code. HotSpot's adaptive optimizations make heavy use of inlining, and it goes one step further by replacing already compiled code with a new optimized version (for example, it can revert inlined code if conditions change). This is an interesting fast read

    What I've described above clearly influences when the CLR or the JVM load a "code unit" (assembly or class).

    In .Net, an assembly is loaded the first time that a method referencing classes in that assembly is Jitted. Jitting happens before running the method, so the runtime does not know whether the instructions in that method that need that assembly will ever really be executed (they could be inside an if that never turns out to be true...). When developing code with critical performance requirements, we should keep this in mind, because a minor modification to our code can save us an unnecessary assembly load.

    With the HotSpot JVM, due to the initial interpretation, class loading is not done at the method boundary, but at the instruction boundary. I mean, a class is not loaded until the first instruction that needs it is interpreted.

    I intend to show some samples about this in a separate post

    - Assembly unloading/reloading:

    This is an interesting topic, particularly for long-lived applications, but at first sight neither the CLR nor the JVM is very collaborative in this regard. There's no Assembly.Unload or Class.Unload method... so what? Well, we all should know at this point that in the .Net world assemblies are loaded into Application Domains, which act almost as ".Net processes" inside OS processes. We can unload an Application Domain, and all the assemblies loaded there will be unloaded. This is what the w3wp process does: when several Asp.Net applications run within the same IIS App Pool, each one is loaded into a different Application Domain. This way, we can reload an Asp.Net application by unloading its App Domain and loading the app again into a new App Domain. In the Java world this unloading feature seems to be achieved through custom ClassLoaders, as explained here. In both cases, achieving this is not trivial, but things like MEF or OSGi seem to help.

    I should clarify that unloading an assembly would mainly mean removing from memory all the metadata structures created for it (EEClasses, MethodTables...) and removing the Jitted code.

    Many people have posed this question before: why isn't there an Assembly.Unload method? It was answered in detail many years ago. Long story short: first, you would need to make sure that at that point there aren't any references to instances of classes defined in that assembly, and second, apart from removing the metadata structures from memory, you would need to remove the Jitted code... and both things are quite complex.

    From the article above, and from this one, I found out about the .Net Loader Heap. I was aware of the 3 generational heaps and the Large Object Heap (collectively known as the Managed Heap), but I didn't know that statics, metadata and Jitted code were put in this separate Loader Heap (there's one per Application Domain). You can read more here:

    Another one that you can't entirely ignore in managed code is the heap that stores static variables. It is associated with the AppDomain, static variables live as long as the AppDomain lives. Commonly named "loader heap" in .NET literature. It actually consists of 3 heaps (high frequency, low frequency and stub heap), jitted code and type data is stored there too but that's getting to the nitty gritty.