Thursday 31 October 2013

Generators

It seems there's been quite a lot of excitement lately around one of the beautiful new features with which ES6 will one day delight us: Generators. They've been added to the latest version of node.js, and have been available in Firefox since time immemorial (fuck you Chrome), though notice that Firefox will have to update its implementation to the syntax agreed for ES6 (function*...).

I first learnt about Generators back in the days when I was a spare time pythonist. Then the feature was added to C# 2.0, and since then all my dealings with the yield statement have been in the .Net arena. Notice that unfortunately Microsoft decided to use the term Iterator for what most other languages call Generators, just as they decided to use the terms Enumerable-Enumerator for what most other languages call Iterable-Iterator... frankly, quite a misleading decision.

I've been quite happy with C# Generators (Iterators) ever since, especially with all the cool black magic that the compiler does under the covers (create aux class implementing IEnumerator (and IEnumerable if needed), __this property pointing to the "original" this...), but reading about JavaScript generators I've found some powerful features that are missing in the C# version.

It's no wonder that in JavaScript generator functions (functions with yield statements) and iterator objects (__iterator__) are closely related. The documentation talks about generator functions creating a generator-iterator object. To clarify, what they mean by generator-iterator object is an iterator object with some additional methods apart from next:
send, throw, and close

These extra methods are quite interesting and are not present in the C# implementation. send seems particularly appealing to me. As explained in different samples, it basically allows you to pass a value to the generator object, which it can use to determine the next steps in the iteration. You can use it to reset your iterator or to jump to a different stage. I've said reset; well, IEnumerator has a Reset method, but the IEnumerator/IEnumerable classes that the C# compiler generates when we define an Iterator (generator) method lack a functional Reset method (it just throws a NotSupportedException).
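
For what it's worth, in the syntax finally agreed for ES6 the role of send is played by passing a value to next (and close became return). Here's a minimal sketch of the reset idea under that assumption; the fibonacci name and the "passing true means reset" convention are just mine for illustration:

```javascript
// Fibonacci generator that can be repositioned by passing `true` to next().
function* fibonacci() {
  let prev = 0;
  let cur = 1;
  while (true) {
    // The value passed to next() becomes the result of this yield expression.
    const reset = yield prev;
    if (reset) {
      prev = 0;
      cur = 1;
    } else {
      [prev, cur] = [cur, prev + cur];
    }
  }
}

const fib = fibonacci();
const firstRun = [];
for (let i = 0; i < 5; i++) {
  firstRun.push(fib.next().value);
}
console.log(firstRun); // [ 0, 1, 1, 2, 3 ]

// Repositions the generator and returns a value in the same call.
const afterReset = fib.next(true).value;
const afterResetNext = fib.next().value;
console.log(afterReset, afterResetNext); // 0 1
```

Notice that the very first call to next can't deliver a value to the generator, as there's no pending yield to receive it yet.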

Pondering a bit over this, we can sort of add a Send/Reset method to our C# iterators. From the iterator method we have normal access to properties in the class where it's defined (as the generated Enumerator class keeps a __this reference to that class), so we can add the Send/Reset method directly there. This means that if we want to Reset the Enumerator created from an iterator method, we'll have to do it through the class where that iterator method is defined, rather than directly through the Enumerator. Obviously it's not a very appealing solution, but well, it can be useful on some occasions.

public class FibonacciHelper
{
 private bool reset;
 public void Reset()
 {
  this.reset = true;
 }
 public IEnumerator<int> FibonacciGeneratorSendEnabled()
 {
  int prev = -1;
  int cur = 1;
  while (true)
  {
   int aux = cur;
   cur = cur + prev;
   prev = aux;
   yield return cur;
   if (this.reset)
   {
    this.reset = false;
    prev = -1;
    cur = 1;
   }
  }
 }
}

FibonacciHelper fibHelper = new FibonacciHelper();
IEnumerator<int> fibEnumerator = fibHelper.FibonacciGeneratorSendEnabled();
for (var i=0; i<10; i++)
{
 fibEnumerator.MoveNext();
 Console.WriteLine(fibEnumerator.Current);
}
Console.WriteLine("- Reset()");
fibHelper.Reset();
for (var i=0; i<10; i++)
{
 fibEnumerator.MoveNext();
 Console.WriteLine(fibEnumerator.Current);
}

I've got a full sample here

Another difference is that while JavaScript's send will both reposition the iterator and return a value, my implementation above will just reposition the enumerator, and no value will be returned until the following call to MoveNext.

Another interesting feature in JavaScript generators is the additional syntax for composing generators, aka "delegated yield": yield* (seems Python's equivalent is yield from)

function* myGenerator1(){
 yield* myGenerator2();
 yield* myGenerator3();
}
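
To see the delegation in action, here's a small self-contained sketch in the ES6 syntax (the letters, numbers and combined generators are made up just for this example):

```javascript
function* letters() {
  yield "a";
  yield "b";
}

function* numbers() {
  yield 1;
  yield 2;
}

// yield* hands control to each inner generator until it is exhausted.
function* combined() {
  yield* letters();
  yield* numbers();
}

const combinedValues = Array.from(combined());
console.log(combinedValues); // [ 'a', 'b', 1, 2 ]
```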

As C# lacks that cutie, we have to write:

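IEnumerable<T> MyGenerator1()
{
 // assumes MyGenerator2 and MyGenerator3 return IEnumerable<T>, which foreach needs
 foreach (var it in MyGenerator2())
 {
  yield return it;
 }
 foreach (var it in MyGenerator3())
 {
  yield return it;
 }
}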
Saturday 26 October 2013

Make me a German

Make me a German is an equally funny and informative BBC documentary. Curiosity about why the German economy is doing so well while the economies of the rest of the European Union are doing so dramatically badly compels a British family to move to Germany and try to turn themselves into the average German family, in an attempt to understand the country from the inside.

The experiment is pretty funny, and brings up some interesting points, at least for someone like me who, in spite of having an obsessive fascination with Berlin, having read quite a lot about German history and society and counting some German artists (Caspar David Friedrich) among my all-time favorites, has never had much intense interaction with the locals (I've been to Germany many times, but as a solitary person without significant social skills... I've hardly scratched the surface of the German mind). I'll list below some of their findings:

  • Germans really work fewer hours than most Europeans (at least than the British and Asturians), but they work in such a focused and hard way that they are much more productive. It's astonishing how outraged one German lady felt when talking about her experience in the U.K., in an office where people were checking their personal emails or talking about their private lives during the work day. Quite hilarious.
  • There seems to be a very strong sense of community at work, and also an identification with the company. You are part of a team and as everyone in the team is working hard you can't fail them. I guess most people will appreciate this, but notice that taking this feeling of unity and belonging, and the denial of individualism, to the extreme helped set the backdrop for the Nazi regime. I'm a very individualist person, so I'm a bit biased on this point.

  • Germans are cautious with money and save more than the rest of Europe. This can be easily traced to the brutal crises after WWI and WWII. Germans are not very fond of credit cards (hum, that's a rather Germanic trait of mine). This background also explains something that I really enjoy when I'm there: supermarkets are cheap. Indeed, it turns out that German supermarkets have the tightest profit margins in Europe.
  • It's easier to be a mother in Germany. Families with kids get enormous fiscal advantages, Kindergartens are really cheap, and there's a sense of pride in being a mother who has left her job to take care of her little kids and the house. It's so common that mothers who decide to carry on with their jobs are generally viewed with a certain disapproval. What seems odd to me is that, with all these advantages, the birthrate continues to be so low.
  • I'd never noticed that Sundays as a rest day were so important to Germans, probably because they're not as sacred in Anarchist Berlin as they are in Christian southern Germany. Combine this with that almost genetic obsession with abiding by the rules and civic behaviour, and making more noise than expected on a Sunday morning can end up with the Polizei paying you a visit and giving you a fine. On a personal note, the unruly Astur-Gallaecian in me can't help enjoying the bad looks I get there each time I cross a red light :-D (and what to say about travelling without a ticket on Berlin's BVG)

It really caught my attention that a British family were looking at Germany as a better place. For an Asturian like me, who lives in a place with 27% unemployment, where youngsters are much more ignorant now than they were 100 years ago, having as their main aspiration in life to turn into a crap TV celebrity and to party as hard as possible, where politicians are mainly a bunch of thieves, where "picaresca" (that is, getting whatever you want by means of tricks and cheating rather than by effort) is a chronic illness... it's normal to perceive Germany as a better place and look at it with a certain sense of inferiority (though based on culture, history and geography Asturies is NOT Southern Europe, in the end we share and suffer too many traits with the rest of Southern Europeans). But I also perceive the UK as a better place/society, so it seemed funny to me to see the Brits envious of the Germans. Who knows, maybe Swedes are also envious of Norwegians or Danes... but for us, they're all just "first class countries".

Sunday 20 October 2013

Stalin's Birthday Cakes

My fascination with architecture (I'm talking about buildings today, not about Software) has grown steadily over the years. My main sources of fascination are Gothic Cathedrals (and to a lesser extent Baroque structures) and slim skyscrapers (aka "Business Cathedrals"). I'll also leverage this entry to make public my discomfort with simplistic buildings that for some reason "self proclaimed intellectuals" decided to consider "revolutionary". I'm talking about the main current in functionalism and that sort of Bauhaus crap. For me, aesthetic pleasure should be one of the main aims of architecture; indeed, it's one of its basic functions.

That said, it's easy to understand my fascination with Stalinist-style skyscrapers (aka Stalin's Birthday Cakes). This wikipedia article gives an excellent introduction to the broader subject of Socialist Realism. Though I've never been to Russia, "the Soviet sphere of influence" after WW II (aka occupation and puppet states) has meant that I've been able to indulge myself with views of some extraordinary pieces of this style with no need to leave the European Union.

Along with the 2 most well known buildings, Warsaw's Palace of Culture and Science and its little brother (or sister) Latvian Academy of Science in Riga, I've also set my eyes on 3 other beautiful (though obviously not so magnificent) constructions in:

  • Prague (the Crowne Plaza Hotel). I knew about this building through some web research, otherwise it's a bit far from city centre and I don't think I would have come across it just by chance.
  • Tallinn (nice residential building close to city centre). You can read more about Tallinn's "Soviet legacy" here.
  • and Vilnius. I just came across this building by chance. It's close to the city centre, by the Neris river, just next to the pedestrian bridge crossing to the business district (by the way, that bridge gives you a pretty nice view of that area). I haven't found any additional information about it.

I've created a new Picasa Gallery with some more related pictures.

While I find these "Birthday cakes" style buildings the most noticeable of the genre, the whole Socialist Classicism style seems fascinating to me. My visits to the Soviet War Memorial in Berlin have been sort of spiritual experiences (apart from the imposing architecture, it confronts you with the miserable condition of human beings when you think about how the heroes that liberated Europe from fascism turned into the brutal rapists of millions of German women...), and visiting the memorials in Tallinn or Riga, or just strolling along Karl Marx Allee in Berlin or Nowa Huta in Krakow, are absolutely recommendable activities.

Saturday 12 October 2013

Debug, Release and PDBs

Many people (obviously I was among them) feel surprised when they build a Release version of a .Net Application with Visual Studio and find that .pdb (Program Database) files have been output to the Release folder along with the generated binaries. A better understanding of Release builds and pdb files will explain it.

Based on some online resources it seems like the Release configuration in Visual Studio invokes the compiler like this:
csc /optimize+ /debug:pdbonly

The /optimize flag tells the compiler whether to generate optimized MSIL code. My understanding is that the C# compiler applies few optimizations when generating MSIL; I think the main difference is that a good bunch of NOPs are added to non-optimized MSIL in order to make subsequent debugging easier. Take into account that most of the optimization work is done by the JIT when compiling from MSIL to native code. I'm not sure what effect unoptimized MSIL has on JIT compilation, as what I've read is that in principle the JIT always tries to generate optimized code, except when a method is decorated with MethodImplAttribute set to NoOptimization, or while debugging with the "Suppress JIT optimization on module load" option. Also, I'm not sure whether the /optimize flag has any effect on the JIT's behaviour (it could set some additional metadata instructing the JIT whether or not to optimize the native code). Based on this article you can also manipulate the JIT behavior by means of a .ini file.

The /debug flag tells the compiler whether it has to generate pdb files, and how complete the debug info should be (full vs pdbonly). This excellent post gives a neat analysis. It mentions another attribute that tells the JIT to what extent it must perform optimizations, the DebuggableAttribute. Related to this, it seems like the addition of MSIL NOPs has more to do with the /debug flag than with the /optimize one.

PDBs are a fundamental piece for any debugging attempt. This article will teach you almost everything you need to know about PDBs. Basically, you should always generate PDBs for your release versions and keep them stored along with your source code, in case you ever need to debug your Release binaries.

PDBs are used for a few more things other than debugging itself:

  • The .Net runtime itself uses the information in pdb files in order to generate complete stack traces (that include file names and line numbers). I guess the stack trace is built from StackFrame objects, about which we can read this:

    A StackFrame is created and pushed on the call stack for every function call made during the execution of a thread. The stack frame always includes MethodBase information, and optionally includes file name, line number, and column number information.

    StackFrame information will be most informative with Debug build configurations. By default, Debug builds include debug symbols, while Release builds do not. The debug symbols contain most of the file, method name, line number, and column information used in constructing StackFrame objects.

    I would say that when they talk about release/debug they should really talk about the presence or absence of pdb files, because as explained in the previous article, both the full and pdbonly options generate complete stack traces.
  • ILSpy makes use of PDBs to get the names of the local variables (as these are not part of the Assembly Metadata). Assembly Metadata includes method names and parameter names, but not local variable names, so when decompiling an assembly into C# code ILSpy will read the variable names from the associated pdbs. I found these related paragraphs somewhere:

    Local variable names are not persisted in metadata. In Microsoft intermediate language (MSIL), local variables are accessed by their position in the local variable signature.

    The method signature is part of metadata. Just to call the method it would be enough to know the binary offset of the method and its number and size of parameters. However, .NET stores the full signature: method name, return type, each parameter's exact type and name, any attributes on the method or parameters, etc.

    Given this source code:

    ILSpy will decompile a Debug build like this when PDBs are present

    like this for a Release build also with PDBs present

    and like this when PDBs do not exist

It's interesting to note that Java does not use separate files for its debugging information; debug information (if present) is stored inside .class files. More on this here

Sunday 6 October 2013

Windows vs Linux: Processes and Threads

I'm both a Windows and Linux (Ubuntu, of course) user, and I'm pretty happy with both systems. I find strengths and weaknesses in both of them, and love to try to understand how similar and how different both systems are. It's important to note that I don't have any sort of "moral bias" against Commercial Software. I deeply appreciate Open Source and almost all the software I run on my home PCs is Open Source, but I have absolutely nothing against selling software; on the contrary, provided that it's sold at a fair price, I fully support it (until the day capitalism is overthrown and we start to live in a perfect "communist with a human face" society...) People buy and sell hardware, so what's the problem with buying/selling software?

What really annoys me (so much that it made me move away from Linux for several years) are the typical open source bigots that spend the whole day bashing Microsoft (a company where employees earn pretty decent salaries and enjoy a huge level of respect from their employer) because of the inherent evilness in selling software, but don't give a shit about wearing clothes produced by people earning 2 dollars a month under enslavement conditions... It's obvious that if you're involved in an anarchist hacklab you should avoid Closed Software, but someone with an iPhone in the pocket of his Levi's trousers is not entitled to give moral lessons to Microsoft, Adobe or whatever... well, enough philosophy, let's get down to business :-)

There are a few Windows/Linux differences that I find interesting and that I'd like to touch upon. I'll start off today with Processes and Threads:

For years I've had the impression that Threads in Linux play a rather less important role than in Windows. I can think of a handful of reasons for this:

  • It seems to be common knowledge that Process creation is cheaper in Linux; this discussion makes a pretty enriching read. In short, fork and even fork + exec seem cheaper than CreateProcess, and some aspects of Windows (like security) are far more complicated (which does not necessarily mean better) than in Linux, which adds overhead. Regarding fork, when a process A starts a second copy of itself it's just a simple fork not followed by an exec, so my understanding is that no hard disk access is involved, while a CreateProcess will always involve disk access.
  • Traditionally Linux threads have been far from optimal, though all this seems to have changed since the introduction of NPTL in Kernel 2.6.
  • I think we could say that for the Linux Kernel a Thread and a Process are much more similar than they are for the Windows Kernel. In Linux both Process creation and Thread creation make use of the clone syscall (invoked by fork for the former and by pthread_create for the latter), though the calls are made with different arguments so that some data structures (memory space, processor state, stack, PID, open files, etc) are shared or not. This paragraph I found somewhere is good to note:

    Most of today's operating systems provide multi-threading support and linux is not different from these operating systems. Linux support threads as they provide concurrency or parallelism on multiple processor systems. Most of the operating systems like Microsoft Windows or Sun Solaris differentiate between a process and a thread i.e. they have an explicit support for threads which means different data structures are used in the kernel to represent a thread and a process.
    Linux implementation of threads is totally different as compared to the above-mentioned operating systems. Linux implements threads as a process that shares resources among themselves. Linux does not have a separate data structure to represent a thread. Each thread is represented with task_struct and the scheduling of these is the same as that of a process. It means the scheduler does not differentiate between a thread and a process.

    With respect to the last sentence, please notice that the Windows Scheduler does not differentiate between threads and processes either; it just schedules threads, irrespective of their process. It's nicely confirmed here:

    Scheduling in Windows is at the thread granularity. The basic idea behind this approach is that processes don't run but only provide resources and a context in which their threads run. Coming back to your question, because scheduling decisions are made strictly on a thread basis, no consideration is given to what process the thread belongs to. In your example, if process A has 1 runnable thread and process B has 50 runnable threads, and all 51 threads are at the same priority, each thread would receive 1/51 of the CPU time—Windows wouldn't give 50 percent of the CPU to process A and 50 percent to process B. To understand the thread-scheduling algorithms, you must first understand the priority levels that Windows uses.

    This is another good read about Linux Threads and Processes

One consequence of these differences in importance is that getting thread figures is more straightforward in Windows.
Viewing the threads associated with a process is pretty simple in Windows: you don't even need the almighty ProcessExplorer and can get by with Task Manager if you add the Threads column to it. This is not that out of the box in Linux. Ubuntu's System Monitor does not have a Threads column, and most command line tools do not show the thread count by default, so you'll need to use some additional parameters:

With ps you can use the o option to specify the nlwp column, so you can end up with something like this:
ps axo pid,ppid,rss,vsz,nlwp,cmd
When using top in principle you can pass the -H parameter so that it'll show threads rather than processes, but I find the output confusing.
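
As an alternative to ps, the kernel also exposes the thread count in the Threads field of /proc/<pid>/status. Here's a rough sketch of reading it from Node.js; the parseThreadCount helper and the sample text are mine, just for illustration, and the real file has many more fields:

```javascript
// Extracts the thread count from the text of a /proc/<pid>/status file.
// Returns null when no Threads field is found.
function parseThreadCount(statusText) {
  const match = /^Threads:\s*(\d+)$/m.exec(statusText);
  return match ? Number(match[1]) : null;
}

// Made-up sample, shaped like the real file.
const sampleStatus = "Name:\tnode\nPid:\t4242\nThreads:\t7\n";
const sampleThreads = parseThreadCount(sampleStatus);
console.log(sampleThreads); // 7

// On a Linux box you could feed it the real thing:
// const fs = require("fs");
// console.log(parseThreadCount(fs.readFileSync("/proc/self/status", "utf8")));
```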

I think another clear example of the differences in "thread culture" between the Linux and Windows communities is Node.js. Its asynchronous programming model is great for many scenarios, but it's easy to get to a point where you really need two "tasks" running in parallel (2 CPU-bound tasks like decrypting 2 separate streams). When I first read that the only solution for those cases is spawning a new process, the answer came as a shock, as I've got mainly a Windows background. When you consider that, though it's now massively used on Windows, Node.js started with Linux as its main target, the answer is not that surprising.
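
To make the "spawn a new process" approach a bit more tangible, here's a minimal sketch using the child_process module. The sumInChildProcess helper and the toy summing workload are invented for the example; a real case would run something heavier (like the decryption mentioned above) in each child:

```javascript
const { spawn } = require("child_process");

// Runs a CPU-bound sum of 1..n in a separate node process and
// resolves with the number the child prints to stdout.
function sumInChildProcess(n) {
  const script =
    "let s = 0; for (let i = 1; i <= " + n + "; i++) s += i; console.log(s);";
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ["-e", script]);
    let output = "";
    child.stdout.on("data", (chunk) => { output += chunk; });
    child.on("error", reject);
    child.on("close", (code) => {
      if (code === 0) resolve(Number(output.trim()));
      else reject(new Error("child exited with code " + code));
    });
  });
}

// Both children are started right away, so the OS can run them in parallel
// while the parent's event loop stays free.
const bothSums = Promise.all([sumInChildProcess(1000), sumInChildProcess(2000)]);
bothSums.then((results) => console.log(results)); // [ 500500, 2001000 ]
```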

Wednesday 2 October 2013

Delay-Loaded Dlls

Something I love about using several similar (sometimes competing) technologies (C#-Java, Windows-Linux...) is that every time I learn something new in one of them I try to find out how it's done in its counterpart.

Some days ago I came across Delay-Loaded Dlls in native Windows applications. It's sort of a middle ground between the 2 normal dll loading techniques, that is:

  • The most common/obvious case: statically Linked Dlls. When you compile your application, references to the dlls that it needs get added to the Import Table of your PE. As soon as your Application is loaded these dlls will get loaded into its address space, irrespective of whether they'll end up being used in that particular execution.
  • Dynamically Loaded Dlls. In this case the Dll is not loaded when the application starts, but you decide dynamically not only when, but also what, to load (you can decide to load one or another dll based on runtime conditions). This is all done with the LoadLibrary and GetProcAddress Win32 functions.

As I've said, Delay-loaded Dlls are something in between supported by our cute linker. You have to know at compile time the name of the Dll that you'll be delay-loading and it'll be added to the Import Table, but it won't be loaded until one of its functions is used for the first time. This means that it's like a transparent lazy version of the first case described above. Indeed, this is very similar to how .Net applications load assemblies "statically" (I mean, not using Assembly.Load...). The Assembly needs to be present at compile time and it'll be referenced from the metadata, but it won't be loaded until the first time that one of its functions is close to being executed (it's better explained here).
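
Just as an analogy (this is not how the linker does it), the "known up front, loaded on first use" idea can be sketched in JavaScript with a stand-in object that only requires the real module when one of its members is first touched. The delayLoad helper and the loadCount counter are hypothetical names of mine:

```javascript
// Returns a stand-in object that loads the real module on first property access.
function delayLoad(name, loader) {
  let loaded = null;
  return new Proxy({}, {
    get(_target, prop) {
      if (loaded === null) {
        loaded = loader(name); // happens only once, on first use
      }
      return loaded[prop];
    },
  });
}

let loadCount = 0;
// The module name is fixed here, much like an entry in the Import Table...
const path = delayLoad("path", (name) => {
  loadCount++;
  return require(name);
});

// ...but nothing has been loaded yet.
const loadsBeforeUse = loadCount;

// The first use triggers the load transparently.
const fileName = path.basename("/tmp/file.txt");
console.log(loadsBeforeUse, loadCount, fileName); // 0 1 file.txt
```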

Linux can load Shared Objects (SO's) also in a purely static manner (statically linked) or in a purely dynamic fashion (using dlopen rather than LoadLibrary and dlsym rather than GetProcAddress)

And now the question is, how do we delay-load Shared Objects in the Linux world?
Well, to my surprise, based on this question/answer it can't be done!

You can read more about the loading of Shared Libraries here