2 is a smell


Clojure is a sequence processing language: sequences are the core abstraction between data and function. I have come to the notion that code that handles 1 or N of something is far more likely to be correct than code that handles 2 (or any fixed number > 1).

In general, any time you use the second function, your warning sensors should go off. Take a function that adds x-y vectors:

(defn add-vectors [va vb]
 [(+ (first va) (first vb)) (+ (second va) (second vb))])

(add-vectors [1 2] [3 4])
;; [4 6]

Those first and second functions should be SCREAMING at you that there is implicit structure here. Destructuring is sometimes a worthy next step that allows you to make that implicit structure explicit:

(defn add-vectors [[vax vay] [vbx vby]]
 [(+ vax vbx) (+ vay vby)])

However, we are still explicitly handling only two-dimensional vectors. The code would be better handling N-dimensional vectors instead:

(defn add-vectors [va vb]
 (vec (map + va vb)))
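
As a quick sketch of what the N-dimensional version buys you, the same two-argument definition can be tried on inputs of different sizes. The sample inputs are made up for illustration. The last call shows a caveat worth knowing: map stops at the shorter input, so mismatched dimensions are truncated silently rather than rejected.

```clojure
(defn add-vectors [va vb]
  (vec (map + va vb)))

(add-vectors [1 2] [3 4])        ;; => [4 6]
(add-vectors [1 2 3] [4 5 6])    ;; => [5 7 9]
(add-vectors [] [])              ;; => []
;; map stops at the end of the shorter collection
(add-vectors [1 2 3] [10 20])    ;; => [11 22]
```

If silent truncation is not what you want, a dimension check would have to be added separately; nothing in this definition enforces it.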

And I sincerely hope that you are now asking yourself, “why are we adding only two vectors and not N?”


(defn add-vectors [& vs]
 (vec (apply map + vs)))
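
A short usage sketch for the variadic version, with made-up inputs, shows that one definition covers one, two, or many vectors of any dimension:

```clojure
(defn add-vectors [& vs]
  (vec (apply map + vs)))

(add-vectors [1 2 3])                    ;; => [1 2 3]
(add-vectors [1 2] [3 4])                ;; => [4 6]
(add-vectors [1 2 3] [4 5 6] [7 8 9])    ;; => [12 15 18]
```

One edge is left open in this sketch: calling it with no vectors at all ends up as (map +), which returns a transducer rather than a sequence, so the zero case would need its own handling if it matters to you.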


12 Responses to “2 is a smell”
  1. Bret Young says:

    This is similar to one of my coding rules that I’ve called, “There is no 2.” By that phrase I mean that there is 1 and there are many but there is no 2. This came about after noticing many times that when you need to extend some ‘single’ case to handle more, I would see (and made the error myself) that the code would only be extended to the ‘double’ case. Most of the time this is wrong and what you really want is the ‘many’ case. Fortunately, Clojure’s abstractions make this choice clearer up front even when experience might be lacking.

  2. Alan Dipert says:

    The pattern I have noticed is 1, 2, N, particularly for associative functions. 2 as an arity is meaningful because it’s what you need to define N as a fold. Clojure’s math functions are examples of this: https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L907

    There are performance and other reasons to do things differently, but this approach has personally served me well as a way to think about associative functions with uniformly typed arguments as axioms: 1,2,N equate to base, relation, and induction.

  3. Bill Ricker says:

    In general I agree that 0 1 or N are the only safe explicit or implicit limits. I forget if it was Perlis, Hoare/Dijkstra, or Knuth that turned me onto that decades ago.

    However, there are cases where 2 is the right limit – alternation and oscillation. Sometimes if-then-else is the correct solution; you shouldn’t always have to use case/switch, which the 0 1 N rule would require. Buffer-swapping for graphics or heap-swapping for GC requires exactly 2 instances, so writing a generic N version would be wrong.

  4. I’ve heard of a similar idea for designing relational databases and object structures, called the ‘zero, one, infinity’ rule: when designing relations between different objects and tables, you should only account for three cases: 0 (no relation), 1 (direct relation), or ‘infinite’ (or n-to-n, rather).

  5. Edwin says:

    Dijkstra: Two or more, use a for.

  6. sova says:

    Hey man, very insightful. I think you just saved me about 4 years of my [future] life tracking down bugs because I did not none-one-many.

    Perhaps an interesting aside, in Arabic the counting system uses 1, 2, many. So to say 1 boy, 2 boys, and 5 boys one would say
    1 walid,
    2 walidayn
    3 ewlad
    4 ewlad
    5 ewlad

    whereas in English it looks like
    1 boy
    2 boys
    3 boys

    Thanks for the post, mate

  7. Anonymous says:

    I wrote this today

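    (defmacro cond-bind [x & clauses]
      (when clauses
        ;; bind x to the value of the test form; run the paired expression if truthy
        (list 'if-let [x (first clauses)]
              (if (next clauses)
                (second clauses)
                (throw (IllegalArgumentException.
                        "cond-bind requires an even number of forms")))
              ;; otherwise recurse on the remaining clauses, spliced into the call
              `(cond-bind ~x ~@(nthnext clauses 2)))))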

    and would claim that the 2 at the end is OK. It could of course be replaced by (next (next %)), but would that be any better?

    But I agree that the rule makes sense in general. I guess there is an exception to every rule…

  8. Jed Wesley-Smith says:

    A co-worker has an apt post-work drink rule that we have since adopted and apply quite generally – “none, one or many”.

  9. This is what I like about functional langs (well just lisp that I’ve looked at so far). I remember trying to figure out HTH to do things like this in PHP with $$$$var definitions. And in C++ I could never figure out how to do basic template-y abstraction like this.

  10. It’s a good rule and I use it all the time, but you have to be careful applying it.

    I’ve seen programmers try to generalize things to N when really it should have been two-state.

    Binary/boolean logic is an obvious case — there are two states for a boolean variable. Really, I’m just pointing out that the rule is subject to developers committing the fencepost fallacy.

  11. Tagore Smith says:

    This is probably true in Clojure. I’m not sure it’s a valid general principle though- depends a bit on what set of primitives you are working with, and how you choose to build abstractions from them. There’s a long tradition of using lists positionally (but defining named selectors for their positions) in some Lisps. Thus you’ll find a number of functions in SICP that take a list and return its cadr (or cddr, or caddr, etc. ;) .) I’ve written a certain amount of Scheme that looked like this myself.

    It would be weird to do this in Clojure, given that Clojure has a rich set of data structures that act like maps. But there are times when it’s nice to be able to write programs with a very small set of primitives- you can write a working implementation of a small subset of Scheme pretty quickly in almost any environment, particularly if you’re not all that concerned with efficiency.

    I’ve actually done something quite like this in Python a fair bit recently, though admittedly for quick and dirty prototype code I knew would eventually get swapped for C. I don’t think it’s all that strange to return a pair of bezier curves from a function that splits bezier curves at a certain parameter, particularly in quick and dirty code. I’ll note that in the C that replaced it I passed in two pointers to already allocated curves. And if I were planning on keeping that code in Python I might have returned a map with the curves named “left” and “right.” So I won’t argue that that was a particularly good way of writing things, but it was convenient, and I’m not sure that writing something like split_bezier(some_bezier, u)[0] is really all that smelly. There’s maybe a faint smell there and like a pile of lightly soiled socks it could come to smell pretty bad if you piled it too high but, just like the pile of socks, good judgment and moderation (and a good nose) are key.
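
The 1, 2, N pattern described in comment 2 can be sketched against the add-vectors example from the post. This is only an illustration, not code from the post or from the commenter: the two-argument arity is the relation, and the N case is defined as a fold over it, much like the arity layout clojure.core uses for +.

```clojure
(defn add-vectors
  ;; 1: base case, a single vector is its own sum
  ([va] va)
  ;; 2: the relation, element-wise addition of two vectors
  ([va vb] (mapv + va vb))
  ;; N: induction, fold the two-argument case over the rest
  ([va vb & more]
   (reduce add-vectors (add-vectors va vb) more)))

(add-vectors [1 2])                  ;; => [1 2]
(add-vectors [1 2] [3 4])            ;; => [4 6]
(add-vectors [1 2] [3 4] [5 6])      ;; => [9 12]
```

Here 2 appears as an arity, but only as the building block for N, which is arguably the distinction several of the comments are circling: a fixed 2 in the interface is the smell, a 2 inside the fold is not.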