Programming

Boolean flags hide two functions in one

Early in my programming career I remember a senior developer arguing that functions generally shouldn't have boolean parameters. He believed the presence of a boolean parameter indicated two functions being forced into one and instead of having call sites pass a true or false to a single function the function should be split in two. That's not a hard and fast rule I program by today, but it has sometimes helped me diagnose why certain code feels awkward.

Here's a structure I encountered the other day:

(defn update-data [new-data save?]
  ; update in-memory data
  ...
  (when save?
    ; update data in database
    ...))

In a function like this, the boolean flag isn't necessary. The function can easily (and maybe should) separate into two functions called, say, update-in-memory-data and update-db-data. Anywhere that calls update-data can call update-in-memory-data, and anywhere that calls it with a save? parameter of true can follow the call to update-in-memory-data with a call to update-db-data. The code won't be quite as succinct, but it will be more transparent and self-documenting. It will also be easier to insert logic between the in-memory update and the database update (though whether or not that's desirable depends on the situation).

Not every boolean flag is used so simply, of course, especially not in the long run, as in my experience functions with boolean flags tend to accumulate logic. The presence of a flag suggests that the function is the right place to put any logic that depends on such a flag, which results in more and more code piling into the function, whether it should or not. While I don't aggressively factor out boolean flags, I will often try to nip them in the bud to prevent functions from developing longer, branchier logic.

For example, the update-data function above may suggest that in-memory updates need to be performed together with database updates, which can result in operations being interleaved:

(defn update-data [new-data save?]
  ; update in-memory data
  ...
  (when save?
    ; update data in database
    ...)
  ; write in-memory data to cache
  ...
  (when save?
    ; flush writes to database
    ...)
  ; update subscribers of in-memory data
  ...
  (when save?
    ; trigger database indexing
    ...))

In this case, the function can be broken out into smaller chunks which can be used by two new functions:

(defn update-in-memory-data [new-data]
  (update-in-memory-data* new-data)
  (write-in-memory-data* new-data)
  (update-subscribers* new-data))

(defn update-db-data [new-data]
  (update-in-memory-data* new-data)
  (update-database-data* new-data)
  (write-in-memory-data* new-data)
  (flush-database-writes* new-data)
  (update-subscribers* new-data)
  (trigger-db-indexing* new-data))

Most interleaved logic won't be so easy to separate, and you may need a more complex approach, such as polymorphic dispatch or the template pattern. The effort may not be worth it. Functions that aren't used much may offer no benefits from being split, but a function that's collected lots of logic frequently has useful tidbits hiding in it, and splitting out the logic can make the code more understandable, more reusable, and more easily modified without breaking existing uses.

July 2023