Saturday 29 March 2014

Lambdas, InvokeDynamic and the DLR

While reading about Java 8, I was hugely surprised to see that lambdas depend on invokedynamic, the bytecode instruction added in Java 7 that, in principle, would only be used by implementors of dynamic languages on the JVM and not by Java itself (notice that Java 8 has not added any sort of dynamic support like C#'s dynamic keyword).

For example, we experimented with different alternative implementations of Lambda on the VM, and found that we could use InvokeDynamic. In doing so, we found things that could be improved in InvokeDynamic. The desire to use Lambdas in a natural way with collections led to the design of the Streams API, and so to extension methods, which required support in the language.

I found it surprising because lambdas have existed in other languages (C#/VB.Net on the CLR, Groovy/Scala on the JVM) since long before additional dynamic support was added to their respective platforms (the DLR for the CLR and invokedynamic for the JVM). In these languages lambdas are implemented at the compiler level: the compiler creates new methods, or new classes if it is dealing with closures and therefore needs state. This prompted me to look for information on the reasons for and advantages of this different approach, and I came across a very interesting post. The most informative part comes in the comments, where one of the guys on the Scala team states:

From that document you can see that Java 8 does create an anonymous class that implements SAM interface for each lambda separately. The difference between Scala and Java 8 is that Java 8 will create those anonymous classes at runtime using LambdaMetaFactory. The class is named "meta factory" for two reasons: 1. LambdaMetaFactory is being called during program linkage (so at meta level). 2. LambdaMetaFactory is "a factory of factories" (hence meta factory) which means, it creates classes that have constructors and those constructors are factories for each lambda they represent. Therefore, the invokedynamic instruction is there to get the code that will create an instance of a lambda. As mentioned above, that invokedynamic instruction will get expanded to a call to a constructor of anonymous class that LambdaMetaFactory creates at runtime. This means that at runtime the bytecode you get for Scala and Java lambdas looks very similar. It also means that Java 8 doesn't have any more efficient way of implementing lambdas if you care about runtime performance. Java 8's strategy does result in smaller JAR files because all anonymous classes are created on the fly. However, the key thing about invokedynamic is that it's essentially a JVM-level macro that defers lambda translation strategy to LambdaMetaFactory which is a library class. If Java 9 or 10 gets more direct (and performant) support for lambdas that won't require classes and object allocations then it can swap implementation of LambdaMetaFactory and all _existing_ code written for Java 8 will get performance boost. That is the brilliance of using invokedynamic in context of translating lambdas. We'll have to wait for future versions of Java to experience those benefits, though.

So it seems that in the end additional (anonymous) classes are still being created, only at runtime rather than at compile time, and the main reason for this is to be ready to take advantage of some future, hypothetical changes in the JVM. This decision to create support classes at runtime rather than at compile time also caught me a bit by surprise, but well, that's also what the DLR does.
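To make the linkage step described in the quote more concrete, here is a minimal sketch that calls the metafactory by hand, roughly the way an invokedynamic bootstrap would. The class name LambdaLinkSketch and its helper methods are hypothetical names made up for this illustration; the real JDK class is spelled java.lang.invoke.LambdaMetafactory. This is not what javac literally emits, just an approximation of the same mechanism under those assumptions.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.Function;

// Hypothetical demo class: links a Function by calling the metafactory directly.
public class LambdaLinkSketch {

    // Stands in for the private static method javac would generate for a lambda body.
    private static Integer length(String s) {
        return s.length();
    }

    @SuppressWarnings("unchecked")
    public static Function<String, Integer> link() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodHandle impl = lookup.findStatic(
                    LambdaLinkSketch.class, "length",
                    MethodType.methodType(Integer.class, String.class));

            // The metafactory spins a class implementing Function at runtime and
            // returns a call site whose target is a factory for its instances.
            CallSite site = LambdaMetafactory.metafactory(
                    lookup,
                    "apply",                                           // SAM method name
                    MethodType.methodType(Function.class),             // factory: () -> Function
                    MethodType.methodType(Object.class, Object.class), // erased SAM signature
                    impl,                                              // the lambda body
                    MethodType.methodType(Integer.class, String.class)); // specialised signature

            return (Function<String, Integer>) site.getTarget().invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException("lambda linkage failed", t);
        }
    }

    public static void main(String[] args) {
        Function<String, Integer> f = link();
        System.out.println(f.apply("lambda"));      // prints 6
        System.out.println(f.getClass().getName()); // runtime-generated name, JVM specific
    }
}
```

The generated class never exists as a .class file on disk, which is the point the quote makes about smaller JAR files.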

All this has awakened my interest in how invokedynamic works and how it compares to other approaches. The CLR lacks a similar instruction at the bytecode level, and all the dynamic stuff (IronPython, IronRuby, C#'s dynamic) tends to be based on the DLR, which was introduced with .Net 4.0 and did not involve any changes (i.e. new bytecode instructions) in the underlying VM. There's a good explanation here of what the DLR brought to the table. This paragraph, comparing it to the JVM equivalent (at that time still under development), is noteworthy.

The Davinci project will add "method handles" to the JVM which the CLR has already had in the form of delegates. It will also add a new "invokedynamic" opcode which the CLR doesn't have and didn't gain w/ the DLR work. Instead the DLR just uses the existing primitives (delegates, reified generics) plus libraries for defining the interop protocol. Both add the concept of call sites and those may be fairly similar between the two.

One of the elements used by the DLR, call sites and call site caching (you'll see them, for example, if you check the bytecode generated for C# code using dynamic), should also be familiar to anyone who has ever decompiled the code generated by pre-invokedynamic Groovy. Groovy is one of the most dynamic (and to my taste most advanced) languages that I can think of, and it has worked great with that call site caching approach, but it has now been ported to use invokedynamic (the indy version, as it's called), and important performance improvements are expected.
Notice that invokedynamic seems to be based on MethodHandles and CallSites (indeed, MethodHandles were added to Java in order to provide support for invokedynamic).
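As a rough illustration of those two building blocks, the sketch below uses the java.lang.invoke API directly: a MethodHandle points at a method, and a MutableCallSite holds a target that can be swapped later, which is loosely what a dynamic language runtime does when it relinks a call. The class CallSiteSketch and its two helper methods are hypothetical names chosen for this example, and real invokedynamic call sites are set up by a bootstrap method rather than by hand like this.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical demo class: one call site, relinked to a different target.
public class CallSiteSketch {

    static String upper(String s) {
        return s.toUpperCase();
    }

    static String reversed(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static String[] demo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle upper = lookup.findStatic(CallSiteSketch.class, "upper", type);
            MethodHandle reversed = lookup.findStatic(CallSiteSketch.class, "reversed", type);

            // The call site starts out linked to upper().
            MutableCallSite site = new MutableCallSite(upper);
            // The invoker always dispatches to whatever the current target is.
            MethodHandle invoker = site.dynamicInvoker();

            String first = (String) invoker.invokeExact("indy");
            site.setTarget(reversed); // relink the same call site
            String second = (String) invoker.invokeExact("indy");
            return new String[] { first, second };
        } catch (Throwable t) {
            throw new IllegalStateException("call site demo failed", t);
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println(results[0]); // prints INDY
        System.out.println(results[1]); // prints ydni
    }
}
```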

I've found this write-up by a JRuby guy stating that the JVM is way superior to the CLR as an environment for dynamic languages. Honestly, I don't know enough about either the JVM and invokedynamic or the CLR and the DLR to judge for myself, so I've compiled a few interesting links that hopefully I'll be able to dig into sometime in the future:

Thursday 20 March 2014

Reflection and Parameters Name

Reading the What's New? for the loooooooong awaited Java 8 release, I came across a new feature I hadn't heard about before: method parameter reflection. A quick search shows that it just means getting the names of the parameters of a method. In order to enable this (by storing the parameter names in the .class file), you have to compile your code with the -parameters option (notice that, contrary to .Net, debug information in Java is also stored inside the .class file).

To store formal parameter names in a particular .class file, and thus enable the Reflection API to retrieve formal parameter names, compile the source file with the -parameters option to the javac compiler.
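A small sketch of what this looks like from the reflection side, using java.lang.reflect.Parameter. The class ParameterNamesSketch and its join method are hypothetical names for this illustration. Whether the real names come back depends on how the class was compiled: without -parameters, isNamePresent() returns false and getName() yields synthesized names such as arg0 and arg1.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the parameter names of one of its own methods.
public class ParameterNamesSketch {

    public static String join(String first, String second) {
        return first + second;
    }

    public static List<String> parameterNames() {
        try {
            Method m = ParameterNamesSketch.class.getMethod("join", String.class, String.class);
            List<String> names = new ArrayList<>();
            for (Parameter p : m.getParameters()) {
                names.add(p.getName());
            }
            return names;
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the class file actually carries the parameter names.
    public static boolean namesPresent() {
        try {
            Method m = ParameterNamesSketch.class.getMethod("join", String.class, String.class);
            return m.getParameters()[0].isNamePresent();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The output depends on whether javac was run with -parameters.
        System.out.println(parameterNames() + " present=" + namesPresent());
    }
}
```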

This brought some .Net echoes to my mind, so digging a bit into the recent past, I came up with this previous post. In .Net, retrieving the parameter names has always been possible, as these names are stored in the Assembly metadata along with the rest of the method signature.

The method signature is part of metadata. Just to call the method it would be enough to know the binary offset of the method and its number and size of parameters. However, .NET stores the full signature: method name, return type, each parameter's exact type and name, any attributes on the method or parameters, etc.

So far JavaScript lacks a Reflection API per se, as reflection is just an integral part of the language. So integral that we can even get the source of a function by calling its toString method. This way, we can get the names of the parameters with a simple regular expression.

Sunday 2 March 2014

Perl is Cool 3, Destructuring Assignment

Destructuring Assignment (DS, also known by other names like Multiple Assignment) is one of those features that, the first time you come across it in one of the languages providing it (Python, Groovy, ES6...), makes you wonder why in hell other languages (C#, Java...) don't implement it. It's even harder to understand why it was left out of these languages when you find that it's not a new idea at all, and that a language that predates them, Perl, features it. Indeed, Perl allows for some very interesting uses of this feature. Let's take a look:

Assigning from an Array

This is the best-known use case of DS. It's massively employed in Perl for assigning arguments (given that a subroutine receives all its arguments in the @_ array), but we can also use it the other way around, to unpack a subroutine's return values:

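# Function and parameters: unpack the argument list @_ into named variables.
# (Named formatPair here because "format" is a reserved Perl keyword.)
sub formatPair{
 my ($f1, $f2) = @_;
 ...
}
formatPair("aa", "bb");

# Function and return values: unpack the returned list.
sub getBestFilms{
 return "Incendies", "Seven";
}
my ($film1, $film2) = getBestFilms();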

Assigning from a Hash

This one is pretty interesting, and I had never seen it in other languages. Using a hash slice, you can easily assign values in a hash to different variables:

my %countries = (
 Asturies => "Xixon",
 Germany => "Berlin"
);

my ($astCity, $gerCity) = @countries{"Asturies", "Germany"};
($gerCity, $astCity) = @countries{"Germany", "Asturies"};

Create a Hash from 2 Arrays

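Again, pretty interesting, and I had never seen it before in any other language. Assigning to a hash slice pairs each key in one array with the value at the same position in the other: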

my @people = ("Xuan", "Iyan");
my @cities = ("Uvieu", "Berlin");
my %peopleToCity;
@peopleToCity{ @people } = @cities;
foreach my $person (keys(%peopleToCity)){
 print $person . ": " . $peopleToCity{$person}. "\n";
}

Iterate a Hash

DS provides a very convenient way to iterate over the keys and values of a hash:

while ( my ( $person, $city ) = each (%peopleToCity) ) {
 print $person . ": " . $city. "\n";
}

Probably there are many more tricks/idioms in Perl involving DS, but so far these are the ones I can think of.