
Visitor pattern considered harmful (unless functionalized)

To kick off my first post, I thought I'd cover a post on another blog that I found interesting.

Basically, the gist of the post is that the object-oriented visitor pattern does not lend itself to walking trees and general symbolic manipulation, despite that being its claim to fame. Algebraic data types and pattern matching are generally much simpler and more concise.

I agree; however, I think that post goes a bit too far in claiming that visitors simply can't do this elegantly, because there is in fact a straightforward translation from a pattern matching function to a visitor, as long as you keep your objects "functional", i.e. treat them as variants.

For instance, let's say we are constructing an evaluator:

//the Expression type
public abstract class Exp
{
    public abstract Val Visit(IVisitor v);
}

//the type of Values
public abstract class Val : Exp
{
}

//the type representing an application
public sealed class App : Exp
{
    readonly Exp e0;
    readonly Exp e1;

    public App(Exp e0, Exp e1)
    {
        this.e0 = e0;
        this.e1 = e1;
    }

    public override Val Visit(IVisitor v)
    {
        return v.App(e0, e1);
    }
}

//the pattern matching visitor
public interface IVisitor
{
    Val App(Exp e0, Exp e1);
    //...
}


This is slightly different from the typical visitor pattern you see in books, where the object itself would be passed in to IVisitor.App(). In fact, this looks almost exactly like a pattern matching function in a language like OCaml:

type exp = App of exp * exp | Val   (* | ... *)

let rec eval exp =
  match exp with
  | App (e0, e1) -> (* ... evaluate the application *);;


Thus, properly designed, a visitor is simply a pattern matching function. By convention, we use the class name as the visitor's method name, so it matches the convention in functional languages (i.e. the constructor name). The visitor method's parameters are the object's private fields, just like a variant's. I'm using this approach for an interpreter I'm implementing, and it works quite well (unfortunately no source is publicly available at the moment).
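To make the correspondence concrete, here's a self-contained sketch of the same "functionalized" visitor applied to a small arithmetic language. The names here (Lit, Add, IArithVisitor, Eval) are mine, not from the interpreter above; I've used int for the result type so the whole thing runs on its own:

```csharp
using System;

// the "pattern matching" visitor: one method per constructor
public interface IArithVisitor
{
    int Lit(int value);
    int Add(ArithExp left, ArithExp right);
}

public abstract class ArithExp
{
    public abstract int Visit(IArithVisitor v);
}

public sealed class Lit : ArithExp
{
    readonly int value;
    public Lit(int value) { this.value = value; }
    // the visitor receives the private field, just like a variant's payload
    public override int Visit(IArithVisitor v) { return v.Lit(value); }
}

public sealed class Add : ArithExp
{
    readonly ArithExp left, right;
    public Add(ArithExp left, ArithExp right) { this.left = left; this.right = right; }
    public override int Visit(IArithVisitor v) { return v.Add(left, right); }
}

// the moral equivalent of:
//   let rec eval = function Lit n -> n | Add (l, r) -> eval l + eval r
public sealed class Eval : IArithVisitor
{
    public int Lit(int value) { return value; }
    public int Add(ArithExp left, ArithExp right)
    {
        return left.Visit(this) + right.Visit(this);
    }
}

public static class Program
{
    public static void Main()
    {
        var e = new Add(new Add(new Lit(1), new Lit(2)), new Lit(3)); // (1 + 2) + 3
        Console.WriteLine(e.Visit(new Eval())); // prints 6
    }
}
```

Note that the recursion lives entirely in the Eval visitor, just as it would in the eval function, rather than being baked into the tree's Visit methods.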

In terms of efficiency, the pattern matching function is probably faster. After all, it's simply a tag check plus a branch, whereas the visitor must perform a double dispatch (two virtual method lookups, incurring two indirections and two indirect branches).

Clearly, the visitor is also far more verbose: you must create the "variant classes" and manually add a Visit method to each, and since there is no "default" case as in pattern matching (or a switch statement), each visitor has to handle every variant explicitly, even when a case isn't relevant. You can mitigate the latter problem by implementing a "Default" abstract base class with dummy implementations of all cases, and then overriding only the cases you are interested in:

public abstract class Default : IVisitor
{
    //overridden in my Interpreter visitor
    public virtual Val App(Exp e0, Exp e1)
    {
        throw new NotSupportedException();
    }
}

As a potentially interesting extension, the variant classes may not even have to be in the same inheritance hierarchy; they only have to implement a marker interface with the appropriate Visit() method. This may increase extensibility along similar lines to polymorphic variants: only the visitor has to be kept up to date with all the relevant cases, and it can evolve separately from the core.
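A minimal sketch of that idea follows; INode, Num, Neg, and Calc are hypothetical names. The two variant classes share only the marker interface, not a base class, so new variants could in principle live in entirely separate modules:

```csharp
using System;

public interface ICalcVisitor
{
    int Num(int value);
    int Neg(INode operand);
}

// the marker interface: the only thing the variant classes share
public interface INode
{
    int Visit(ICalcVisitor v);
}

// two otherwise-unrelated classes acting as variants
public sealed class Num : INode
{
    readonly int value;
    public Num(int value) { this.value = value; }
    public int Visit(ICalcVisitor v) { return v.Num(value); }
}

public sealed class Neg : INode
{
    readonly INode operand;
    public Neg(INode operand) { this.operand = operand; }
    public int Visit(ICalcVisitor v) { return v.Neg(operand); }
}

public sealed class Calc : ICalcVisitor
{
    public int Num(int value) { return value; }
    public int Neg(INode operand) { return -operand.Visit(this); }
}

public static class Program
{
    public static void Main()
    {
        INode e = new Neg(new Num(5));
        Console.WriteLine(e.Visit(new Calc())); // prints -5
    }
}
```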

Summary:

So if you're doing symbolic manipulation and are forced to operate in an OO language like the C# above, you can still approximate the elegance of algebraic data types and pattern matching; it just takes a little more work on your part. Pretty much par for the course in the OO world. :-)

I've recently been researching optimization and automatic differentiation (AD) , and decided to take a crack at distilling its essence in C#. Note that automatic differentiation (AD) is different than numerical differentiation . Math.NET already provides excellent support for numerical differentiation . C# doesn't seem to have many options for automatic differentiation, consisting mainly of an F# library with an interop layer, or paid libraries . Neither of these are suitable for learning how AD works. So here's a simple C# implementation of AD that relies on only two things: C#'s operator overloading, and arrays to represent the derivatives, which I think makes it pretty easy to understand. It's not particularly efficient, but it's simple! See the "Optimizations" section at the end if you want a very efficient specialization of this technique. What is Automatic Differentiation? Simply put, automatic differentiation is a technique for calcu