Friday 25 January 2013

Lazy Object Construction

It's funny to see how an initially interesting read (about the present and future of JavaScript's arguments object) took me to a pretty good StackOverflow answer on how to invoke a constructor using apply, which in turn led me to an old post of mine and to rethinking it in JavaScript.

The idea behind Lazy objects is simple. We have an object whose construction/initialization can be rather costly, so we create an empty facade for that object that we can pass around, and when the client really needs the real object, this facade takes care of creating it.

Drawing inspiration from the Lazy class in .Net, and from the aforementioned article, I ended up with 2 different approaches to the problem:

Approach 1

function Lazy(type){
	var _arguments = [].slice.call(arguments, 1);
	this.create = function(){
		var inst = Object.create(type.prototype);
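		//need this extra inst2 to follow the correct behaviour in case the constructor returns something
		//(like the new operator, we only honour the returned value if it is an object or a function)
		var inst2 = type.apply(inst, _arguments);
		return (inst2 !== null && (typeof inst2 === "object" || typeof inst2 === "function")) ? inst2 : inst;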
	};
}

var p = new Lazy(Person, "Xuan", 37);
p = p instanceof Lazy ? p.create() : p;
console.log(p.sayHi());

I would say this first approach seems cleaner to me based on the usage pattern. Consumers have to be aware that they may be dealing with a Lazy wrapper, but once they obtain the real object, they work directly with it.

Approach 2

function Lazy2(type){
	var _arguments = [].slice.call(arguments, 1);
	Object.defineProperty(this, "val",{
		get: function(){
			//replace the property on first access (lazy access...)
			Object.defineProperty(this, "val", {
				value: (function(){
					var inst1 = Object.create(type.prototype);
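					//need this extra inst2 to follow the correct behaviour in case the constructor returns something
					//(like the new operator, we only honour the returned value if it is an object or a function)
					var inst2 = type.apply(inst1, _arguments);
					return (inst2 !== null && (typeof inst2 === "object" || typeof inst2 === "function")) ? inst2 : inst1;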
				}()),
				writable: false
			});
			return this.val;
		},
		enumerable: true,
		configurable: true
	});
}

var p2 = new Lazy2(Person, "Xuan", 37);
if (p2 instanceof Lazy2){
	console.log(p2.val.sayHi());
}
else
	console.log(p2.sayHi());

This other approach, though more interesting in its implementation (for the val property I'm using an accessor descriptor that on first access replaces itself with a data descriptor, yes, pure JavaScript magic...), seems less appealing to me because we're forcing the client to always access the functionality in the object through "val". Well, the same thing happens with .Net's Lazy.

In both cases our Lazy function receives a constructor function as parameter, and we use the same technique to properly invoke it:

//create an object with its __proto__ pointing to type.prototype
var inst1 = Object.create(type.prototype);

//pass that object as "this" to the constructor, along with the other arguments
var inst2 = type.apply(inst1, _arguments);

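//let's be consistent with how the new operator works: if the constructor returns an object (or a function), let's return it instead of our initial object
//(a primitive return value is ignored by new, so we ignore it too)
return (inst2 !== null && (typeof inst2 === "object" || typeof inst2 === "function")) ? inst2 : inst1;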
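
To see how the new operator treats the different kinds of return value, here is a small self-contained check. Person, ReturnsPrimitive and ReturnsObject are hypothetical constructors made up for illustration, and construct is just a made-up name for the three steps above. This sketch tests explicitly for an object instead of relying on truthiness, because new ignores any primitive that the constructor returns.

```javascript
// Hypothetical constructors, used only for illustration
function Person(name, age) {
  this.name = name;
  this.age = age;
}
Person.prototype.sayHi = function () {
  return "Hi, I'm " + this.name;
};

// Returns a primitive: the new operator would ignore this value
function ReturnsPrimitive() {
  this.tag = "real";
  return "ignored";
}

// Returns an object: the new operator would hand back this object
function ReturnsObject() {
  return { tag: "replacement" };
}

// The three steps above wrapped in a helper (the name "construct" is an assumption)
function construct(type, args) {
  var inst = Object.create(type.prototype);
  var result = type.apply(inst, args);
  var isObject = result !== null &&
    (typeof result === "object" || typeof result === "function");
  return isObject ? result : inst;
}

var person = construct(Person, ["Xuan", 37]);
console.log(person.sayHi(), person instanceof Person);
console.log(construct(ReturnsPrimitive, []).tag, new ReturnsPrimitive().tag);
console.log(construct(ReturnsObject, []).tag, new ReturnsObject().tag);
```

In each of the last two lines both values should agree, since the helper is meant to mirror what new does.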

Notice that I'm using a Lazy constructor function instead of a lazy factory function. It seems more natural to me, but I've also included here the factory function approach.

A better solution would be using proxies, something like what I implemented for .Net in that previous entry. I'm not familiar at all with ES6 proxies, so maybe I'll give it a go in the future (but with ES6 still far on the horizon, I prefer to focus on other tasks). I've complained before about JavaScript not exposing any sort of Invoke Method or Get Property hooks. If it did, implementing the proxy technique would be drastically simpler. The clear advantage of proxies is that the "laziness" of the object becomes invisible to users, who interact with it exactly as they would with a non-lazy version. This is ideal, as being "Lazy" is an implementation detail that should not leak to the client.
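
For reference, a minimal sketch of the proxy idea, assuming an engine that implements the ES6 Proxy and Reflect objects. The names lazyProxy and constructions, and the Person constructor, are made up for illustration; this is one possible shape for the technique, not a finished implementation (it only traps get, set and has).

```javascript
// Hypothetical constructor; the counter shows when it really runs
var constructions = 0;
function Person(name, age) {
  constructions++;
  this.name = name;
  this.age = age;
}
Person.prototype.sayHi = function () {
  return "Hi, I'm " + this.name;
};

// lazyProxy is a made-up name; the real object is built on first property access
function lazyProxy(type) {
  var args = [].slice.call(arguments, 1);
  var instance = null;
  function real() {
    if (instance === null) {
      instance = Reflect.construct(type, args); // same semantics as new
    }
    return instance;
  }
  // The target is a throwaway object with the right prototype,
  // so instanceof works without forcing the construction
  return new Proxy(Object.create(type.prototype), {
    get: function (target, prop) {
      var obj = real();
      var value = obj[prop];
      // bind methods so that "this" inside them is the real object
      return typeof value === "function" ? value.bind(obj) : value;
    },
    set: function (target, prop, value) {
      real()[prop] = value;
      return true;
    },
    has: function (target, prop) {
      return prop in real();
    }
  });
}

var lp = lazyProxy(Person, "Xuan", 37);
console.log(lp instanceof Person, constructions); // nothing built yet
console.log(lp.sayHi(), constructions);           // first access builds it
lp.age = 38;
console.log(lp.age, constructions);
```

The client code never mentions create or val: it just uses lp as if it were a Person, which is the point made above about the laziness not leaking.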

Thursday 24 January 2013

Vinyan

Vinyan is a rather interesting film. It's directed by the guy behind Calvaire, one of the films clearly classified as part of the New French Extremity "movement". I had also read somewhere that Vinyan was related to that style (indeed that's what prompted me to download it), but I wouldn't say so.

This is a film about how pain turns love into obsession, and how this degrades into madness. The story is rather original, not that the plot has brilliant sharp turns or a frenetic pace, but it's fresh and different, and the settings (a touristic Thai zone first, and the dark, damp and wild Burmese jungle) are really appropriate. There are some fantastic elements from the beginning, but the horror components don't come into action until the last third of the film. That's probably the best developed part of the film. The "evil tribal kids" are quite original and terrifying, as terrifying as seeing beauty torn apart by madness.

Not much more to say, ah, yes, that the main female character is quite a cutie.

Saturday 19 January 2013

Interesting Object Initialization Trick

One of the reasons why I love JavaScript as a language so much is because you constantly find awesome tricks that make you exercise your brain. Some months ago I wrote here about how to circumvent the limitations of Object literals when it comes to initializing a field based on other fields.

var foo = {
   a: 5,
   b: 6,
   init: function() {
       this.c = this.a + this.b;
       delete this.init; //do this to keep the foo object "clean"
       return this;
   }
}.init();

I was quite happy with the solution that I copy-pasted there, but today just by chance I've come across another interesting one, that has the advantage that it will most certainly make the eyes of whoever reads your code blink quite a few times :-D

The technique discussed here:

var o = new function() {
    this.num = Math.random();
    this.isLow = this.num < .5; // you couldn't reference num with literal syntax
}();

It makes a very intelligent use of what we could call "anonymous constructors" or "temporal constructor" as the author says.

Friday 18 January 2013

.Net DSV DAO

Delimiter-separated Values files (CSV, TSV...) are a really convenient way to store data (I've started to use the term DSV recently, I used to wrongly employ the CSV term when I was referring to TSV's...).
They're simple to understand, read and modify, both manually and by code. As I mentioned in my previous entry, dealing with Excel files tends to be more hassle than it should, so on many occasions when presented with a large Excel file to read, I'll save it to TSV and use a home-made parser to read it.

You can write a basic DSV reader in just a few lines of code, just using ReadLine and Split:

public IEnumerable<List<string>> GetDataEntries(string filePath)
{
    using (StreamReader sr = new StreamReader(filePath))
    {
        while (!sr.EndOfStream)
        {
            string curEntry = sr.ReadLine();
            yield return curEntry.Split(this.separator).ToList();
        }
    }
}

The code above will work fine with simple data where your fields don't need to include line breaks or field separators; otherwise, the results will be a mess. The standard way to store these special characters in your fields is wrapping the fields in quotes (usually double quotes, I'll call it the "quote character"). If you also want to have "quote characters" as part of the normal content of your field, you'll have to double them (i.e. you'll have 2 double quotes or 2 single quotes).
With this in mind, I decided to write a "decent" DSV parser that contemplates these cases. For the fun of it, I ended up adding some Add-Remove functionality to it, so it turned into a DSV DAO (Data Access Object).
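
The quoting rules described above can be sketched in a few lines. This is written in JavaScript (like the snippets in the other entries here) and it is not the code of the AdvancedDsvDao; parseDsv and its parameters are names made up for this illustration, and it assumes the whole file fits in a string.

```javascript
// Splits DSV text into records (arrays of fields), honouring quoted fields.
// Inside quotes, separators and line breaks are plain content,
// and a doubled quote character stands for one literal quote.
function parseDsv(text, separator, quote) {
  var records = [];
  var record = [];
  var field = "";
  var inQuotes = false;
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (inQuotes) {
      if (ch === quote) {
        if (text[i + 1] === quote) {
          field += quote; // doubled quote: keep one and skip the other
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch;
      }
    } else if (ch === quote) {
      inQuotes = true; // opening quote
    } else if (ch === separator) {
      record.push(field);
      field = "";
    } else if (ch === "\n") {
      record.push(field);
      field = "";
      records.push(record);
      record = [];
    } else if (ch !== "\r") { // drop the \r of Windows line endings
      field += ch;
    }
  }
  // flush the last record when the text does not end with a line break
  if (field !== "" || record.length > 0) {
    record.push(field);
    records.push(record);
  }
  return records;
}

var sample = 'name\tnote\n' +
             'Xuan\t"likes ""quotes"", tabs\tand\nline breaks"\n';
var rows = parseDsv(sample, "\t", '"');
console.log(rows.length, rows[1].length);
console.log(JSON.stringify(rows[1][1]));
```

The second record keeps its embedded tab, line break and quotes in a single field, which is exactly the case where the ReadLine and Split version breaks down.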

I've uploaded it to my GitHub account, so you can view the code and download it in case you think it can be of any interest to you.

The BasicDsvDao class should only be used for those cases when we're 100% certain that our fields won't include "special" characters; otherwise use the AdvancedDsvDao, which carefully tries to deal with those cases (line breaks or separators as part of your fields...). Though I think that the standard uses double quotes as field delimiters, I've given the option to use single quotes.

Both DAO's will work with files with or without headers and with your Delimiter of choice. You can read the Headers, read the Data Entries, Update/Remove the headers, and Add/Remove Data Entries.

These DAO's have no knowledge about what the data in your DSV's represent, so they just return (or expect, if used for adding new entries) an IEnumerable of Strings for each "real line" in your DSV. If these Collections of Strings can be mapped to Objects, that's your job.

After writing code for mapping those collections to Objects multiple times, I set out to find some way to generalize this a little bit. I wound up with a CollectionToFromObjectMapper<T>, that will map one of those string collections obtained by your DAO into a T object, and vice versa. For each field in your DSV-Object that you want to map, you'll need a function that maps from String to Object, a function that maps from Object to String, the name of the Property in the Object and, as the DAO works with both header and headerless DSV's, the position of the field in the DSV line. We use the CollectionItemToObjectMappingInfo class to express this.

public class CollectionItemToObjectMappingInfo 
    {
        public int Index { get; set; }
        public string PropertyName { get; set; }
        public Func<string, Object> FromStringMapping { get; set; }
        public Func<Object, string> ToStringMapping { get; set; }
    }

When working with DSVs with a header, we need the name of the header corresponding to that DSV field instead of its position, so for that we'll use this DictionaryItemToObjectMappingInfo class:

public class DictionaryItemToObjectMappingInfo /* where T : new()*/
    {
        public string Key { get; set; }
        public string PropertyName { get; set; }
        public Func<string, Object> FromStringMapping { get; set; }
        public Func<Object, string> ToStringMapping { get; set; }
    }

and then we'll use the MappingInfoConverter class to obtain a CollectionItemToObjectMappingInfo from a DictionaryItemToObjectMappingInfo and the List of Headers in the DSV. We'll use it all like this:

var dicToObjMappingInfos = new List<DictionaryItemToObjectMappingInfo>()
            {
                new DictionaryItemToObjectMappingInfo()
                {
                    Key = "person name",
                    PropertyName = "Name",
                    FromStringMapping = st => st,
                    ToStringMapping = ob => (string)ob
                }, 
                new DictionaryItemToObjectMappingInfo()
                {
                    Key = "how old",
                    PropertyName = "Age",
                    FromStringMapping = st => Convert.ToInt32(st),
                    ToStringMapping = num => num.ToString()
                }
            };

            var dao = new AdvancedDsvDao(filePath, '\t', true);
            var headersList = dao.GetHeaders();

            var colToObjMappingInfos = new MappingInfoConverter().Convert(dicToObjMappingInfos, headersList).ToList();
            
            //the "real life" usage would go on like this:
            var mapper = new CollectionToFromObjectMapper<Person>(colToObjMappingInfos);
            var people = dao.GetDataEntries()
                .Select(personRow => mapper.Map(personRow))
                .ToList();

And well, I think that's all. I haven't made thorough use of this code yet, but so far it has served me well.

Saturday 12 January 2013

Some Very Useful Libraries

I'm talking about the .Net ecosystem here, but I guess it applies to most other development environments. I think we tend to talk about, discuss and praise mainly the frameworks and libraries dealing with the latest cool and fun stuff (IoC, Dynamic Proxies, RIA development, REST...) and tend to forget about other smaller pieces of software tackling more mundane tasks that oddly enough are not covered by the BCL. At the end of the day these less popular libraries are as important as, or even more important than, the "trendy" ones, and given that they get far less fuss around them, I thought I could pay them a little homage (as if someone was going to read this... :-) by listing here the ones I've found myself using most intensively in the last few years:

  • HTMLAgilityPack. Bearing in mind the vast amount of Xml support provided by the .Net BCL (XmlDocument, XmlReader, Linq To Xml...) it's really hard to explain why there's no html parsing and manipulation support, but fortunately the open source HTML Agility Pack brilliantly fills the gap. It provides a powerful Html parser that will create a read/write DOM for you. It copes pretty neatly with not too well formed (and sadly too common) html documents.
  • ExcelDataReader. Over the years I've used all sorts of approaches to a task as common as reading Excel files: Interop and COM (please, avoid this by all means if your code is going to run on a server and you don't want to end up with tons of excel.exe processes running in the background), using odbc or ado, saving the excel file to a CSV or TSV file and reading it with an in-house parser... but lately I've moved to this small library. It's not perfect (it's given me problems with some very large files), but overall I recommend it.
  • ClosedXml. If reading excel data is more problematic than it should be, creating or updating excel files is much worse. Using COM is a mess (as it is for just reading) so years ago we used to leverage an in-house library built by a very clever guy at work. With the advent of Office Open Xml, I expected things would be much easier, but seriously, OpenXml is a rather complicated format, and any basic manipulation involves tons of repetitive lines of code. Luckily, some clever guy has made the effort to write a beautiful library that makes writing non-trivial Excel OpenXml files a piece of cake. I've used it quite extensively, and it works like a charm, so, thumbs up and many thanks!
    Notice that (it should be obvious) it relies on Microsoft's DocumentFormat.OpenXml.dll library (you don't need to install the OpenXml SDK, just get hold of this dll).
  • MigraDoc and PDFSharp. Last year I had to decide on some free tool for generating Pdf documents from .Net. The list of options is not too large, and in the end we settled on MigraDoc. It perfectly served our needs (which I have to admit were not too complex, we were generating simple pdf documents), so it well deserves a mention here.
  • MS LogParser I assume that the developers of this tool would get a fair salary from Microsoft, so probably some praise in an irrelevant blog won't have much effect on them :-) but anyway, here I go.
    Another blogger explains pretty well in a few words how powerful this tool is

    Log Parser is often misunderstood and underestimated. It could possibly be the best forensic analysis tool ever devised. Imagine having the ability to take almost any chunk of data and quickly search it using SQL-based grammar. That's Log Parser in a nutshell. It is a lightweight SQL-based search engine that operates on a staggering number of different input types

    The LogParser.exe tool comes in really handy for some stand-alone parsing, but in my case I needed to integrate that parsing in a custom reporting application I was developing, and for that, you can leverage all the power of LogParser through its COM interface. The functionality lives in the LogParser.dll COM library, so we .Net developers will have to generate the .Net wrapper for this COM library by means of tlbimp:

    tlbimp "C:\Program Files (x86)\Log Parser 2.2\LogParser.dll" /out:Interop.LogParser.dll

    Now add the reference to Interop.LogParser.dll (or whatever you've called it) to your project and you're ready to go.

Sunday 6 January 2013

Pausable Async Loop

I already did a write up about async loops in JavaScript a good while ago. These days, while working on a toy project of mine, it occurred to me that it would be useful to be able to pause-continue asynchronous loops. I wrote a first implementation that didn't seem too natural to use, so I decided to write a new non pausable-continuable function to run async functions in a loop, and based on that write the pausable-continuable version.

So, the thing is that we have an asynchronous function, that once finished invokes a callback, and we want to run it n times (each new iteration will be launched when the previous async operation is done, so it will be invoked through the mentioned callback). We'll leverage for this (as so many other times) the power of closures and the arguments pseudo array.

If we don't need the pause-continue functionality, we can write a function like this:

//runs an async function n times
//(debugPrint is a logging helper not shown here; console.log would do)
var runInLoop = function(fn, times /*, parameters to fn, the last one is a callback, and is compulsory (but can be null)*/){
 var timesDone = 0;
 
 //remove from arguments the 2 first and the last parameters, and add a new callback
 //the new callback calls the old callback and calls the next iteration (_run)
 var _arguments = Array.prototype.slice.call(arguments, 0);
 _arguments.splice(0, 2);
 var initialCallback = _arguments.pop();
 _arguments.push(function(){
  if (typeof initialCallback == "function"){
   initialCallback();
  }
  run();
 });
  
 var run = function(){
  if (timesDone < times){
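   debugPrint("iteration: " + timesDone);
   //increment before invoking fn, so the count is right even if fn invokes its callback synchronously
   timesDone++;
   fn.apply(null, _arguments);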
  }
 };
 run();
};

which will be used like this:

var asyncPrint = function(txt1, txt2, onDone){
 console.log("print start: " + txt1 + " - " + txt2);
 setTimeout(function(){
  console.log("print end: " + txt1 + " - " + txt2);
  onDone();
 }, 400);
};

//callback to be invoked once asyncPrint is done
var asyncPrintOnDone = function(){
 console.log("asyncPrintOnDone");
};

runInLoop(asyncPrint, 20, "hi there", "guys", asyncPrintOnDone);

Notice that if we don't want to invoke a callback (asyncPrintOnDone) after the function is done, we need to pass a null parameter (we can't simply skip it), as our runInLoop function needs to distinguish the normal parameters from the (optional) callback. We wrap the invocation of this normal callback and the invocation of the next iteration into a new function. This new function is the one that ends up being passed as the callback to our async function, which invokes it when it completes.

Now, we can improve this by adding pause-continue functionality. For this, we'll have a factory function that returns a new function that can be invoked (just use ()), paused and continued. This returned function is intended to be invoked (started) only once, so further invocations will have no effect (but we can pause and continue it at will).
The function looks like this:

//creates a "pausable loop" that runs an async function "fn" for x times
var runInPausableLoop = function(fn, times /*, parameters to fn, the last one is a callback, and is compulsory (but can be null)*/){
 var paused = false;
 var timesDone = 0;
 
 //remove from arguments the 2 first and the last parameters, and add a new callback
 //the new callback calls the old callback and calls the next iteration (run)
 var _arguments = Array.prototype.slice.call(arguments, 0);
 _arguments.splice(0, 2);
 var initialCallback = _arguments.pop();
 _arguments.push(function(){
  if (typeof initialCallback == "function"){
   initialCallback();
  }
  run();
 });
 var self; //I don't think this self will be of much use, but well, in case we want fn to run with a "this"
 //the loop should be launched only once, so use this started flag so that on second invocation it does nothing
 var started = false;
 var me = function(){
  if (!started){
   self = this;
   started = true;
   run();
  }
  return me; //to allow chaining
 };
 var run = function(){
  if (!paused && timesDone < times){
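   debugPrint("iteration: " + timesDone);
   //increment before invoking fn, so the count is right even if fn invokes its callback synchronously
   timesDone++;
   fn.apply(self, _arguments);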
  }
 };
 
 //"public functions"
 me.pause = function(){
  debugPrint(fn.name + " paused");
  paused = true;
 };
 me.continue = function(){
  debugPrint(fn.name + " continued");
  paused = false;
  run();
 };
 
 return me;
};

I think the most natural (and elegant) use case is a one-liner where the function is created and immediately invoked (that's why the invocation returns the function itself, so that we get a reference that we'll use later on to pause and continue). Anyway, for cases where you want to store the reference to the returned function and invoke it later on, I've also contemplated that you might want to invoke it via myObj.myPausableAsyncLoop... and want it to properly use myObj as "this", so you'll see the "self" pattern in the code.

Let's see it in action:

//async function that we want to run n times in a pausable loop
//the last parameter is a callback to be run after each iteration
var asyncPrint = function(txt1, txt2, onDone){
 console.log("print start: " + txt1 + " - " + txt2);
 setTimeout(function(){
  console.log("print end: " + txt1 + " - " + txt2);
  onDone();
 }, 400);
};

var asyncPrintOnDone = function(){
 console.log("asyncPrintOnDone");
};

var pausablePrintLoop = runInPausableLoop(asyncPrint, 20, "hi there", "guys", asyncPrintOnDone)();

//let's try the pause-continue functionality
setTimeout(function(){
 console.log("pausing");
 pausablePrintLoop.pause();
}, 1000);

setTimeout(function(){
 console.log("continuing");
 pausablePrintLoop.continue();
}, 4000);

Note that in both cases I've tried to make the looped invocation of the async function as similar as possible to a simple invocation:

asyncPrint("hi there", "guys", asyncPrintOnDone);

runInLoop(asyncPrint, 20, "hi there", "guys", asyncPrintOnDone);

var pausablePrintLoop = runInPausableLoop(asyncPrint, 20, "hi there", "guys", asyncPrintOnDone)();

I've uploaded this to GitHub along with a couple of samples (samples to run under node so far, but it should work smoothly on any modern browser).

Update, 2013/01/31

I've made quite a few additions to this code and updated it on GitHub:

  • We can repeat the Loop by calling its "repeat" method.
  • I've added the possibility of invoking another callback (onLoopDoneCallback) function when the loop is done (this is mainly useful for the repeat case)
  • Now I also contemplate the possibility of this callback or the onIterationDone callback being asynchronous. I identify this by annotating the function with an "isAsync" property
  • The asynchronous function to be run inside the loop could expect as first parameter the iteration index. We let the loop know about this by annotating the function with an "isLoopAware" property.
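
To illustrate the last point only, this is a rough sketch of how a loop could consume such an annotation. It is not the code in the GitHub repository: runAnnotatedLoop, its signature and the record function are made up here, and the only thing taken from the list above is the idea of flagging the function with an isLoopAware property.

```javascript
// Hypothetical helper: runs fn "times" times, one iteration after another.
// If fn.isLoopAware is truthy, the iteration index is passed as first argument.
function runAnnotatedLoop(fn, times, args, onLoopDone) {
  var index = 0;
  function next() {
    if (index >= times) {
      if (typeof onLoopDone === "function") {
        onLoopDone();
      }
      return;
    }
    var callArgs = fn.isLoopAware ? [index].concat(args) : args.slice();
    index++;
    callArgs.push(next); // the per-iteration callback launches the next iteration
    fn.apply(null, callArgs);
  }
  next();
}

// Async function that wants to know which iteration it is running
var seen = [];
var record = function (i, label, onDone) {
  seen.push(label + i);
  setTimeout(onDone, 10);
};
record.isLoopAware = true;

runAnnotatedLoop(record, 3, ["step"], function () {
  console.log("loop done: " + seen.join(","));
});
```

A function without the annotation would simply receive its normal parameters followed by the callback, as in the examples above.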