exupero's blog

Transducers

Part of my original interest in lisp came from its name being short for "list processing". I didn't realize until later that "list processing" referred only to its syntax; at the time I thought the term had to do with how lisp handled sequential data, what I would now call "stream processing". While Clojure's built-in collection functions did teach me a lot, my original interest wasn't really satisfied until the introduction of transducers.

The beauty of transducers is that they encapsulate stateful, imperative, and mutative logic behind a simple, composable interface. For me they were a bit difficult to understand at first, but hands-on experimentation made a huge difference, so in the next few posts I'll demo some of the custom transducers I've written, though I admit that the majority haven't become particularly crucial to my day-to-day programming. Generally my needs are met by either the transducing arities of Clojure's core functions or by Christophe Grand's xforms library.
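As a quick illustration of that composable interface, here's a minimal sketch using only the transducing arities of core functions. The names xf, as-vector, as-sum, and as-seq are mine, chosen just for this example.

```clojure
;; A transducer built by composing core transducers.
;; With comp, data flows through the steps top to bottom.
(def xf
  (comp (filter even?)
        (map #(* % %))
        (take 3)))

;; The same xf works in different transducing contexts.
(def as-vector (into [] xf (range 100)))        ; [0 4 16]
(def as-sum    (transduce xf + 0 (range 100)))  ; 20
(def as-seq    (sequence xf (range 100)))       ; (0 4 16)
```

The transducer itself knows nothing about where its input comes from or where its output goes, which is what makes it reusable across into, transduce, and sequence.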

Probably the most useful transducer I've written and not found elsewhere is split-by:

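(defn split-by
  "Returns a transducer that starts a new group at every item for which pred returns true."
  [pred]
  (fn [rf]
    (let [buffer (volatile! nil)] ; the group being built; nil until the first item arrives
      (fn
        ([] (rf))
        ([res]
         ;; Completion: flush the last group, if there is one, then complete downstream.
         (if-let [buffer' @buffer]
           (do
             (vreset! buffer nil)
             ;; unreduced, because the final step may return a reduced value
             (rf (unreduced (rf res buffer'))))
           (rf res)))
        ([res item]
         (if-let [buffer' @buffer]
           (if (pred item)
             ;; item starts a new group: emit the finished one
             (do
               (vreset! buffer [item])
               (rf res buffer'))
             ;; item belongs to the current group
             (do
               (vswap! buffer conj item)
               res))
           ;; first item: start the first group
           (do
             (vreset! buffer [item])
             res)))))))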

split-by is similar to Clojure core's partition-by transducer, but it splits the sequence whenever the given predicate returns true, rather than every time it returns a new value. My most common uses for it are splitting a body of text on blank lines or grouping the lines of multiline log messages by a pattern on the first line.
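To make the difference from partition-by concrete, here's a small sketch that groups multiline log messages. The log lines and the message-start? predicate are made up for illustration, and split-by is repeated so the block runs on its own; the completion arity in this copy skips the flush when nothing was buffered.

```clojure
;; split-by, repeated here so this block is self-contained.
(defn split-by [pred]
  (fn [rf]
    (let [buffer (volatile! nil)]
      (fn
        ([] (rf))
        ([res]
         (if-let [buffer' @buffer]
           (do
             (vreset! buffer nil)
             (rf (unreduced (rf res buffer'))))
           (rf res)))
        ([res item]
         (if-let [buffer' @buffer]
           (if (pred item)
             (do (vreset! buffer [item])
                 (rf res buffer'))
             (do (vswap! buffer conj item)
                 res))
           (do (vreset! buffer [item])
               res)))))))

;; Hypothetical sample data: continuation lines are indented.
(def log-lines
  ["ERROR boom" "  at foo" "  at bar" "INFO ok" "ERROR again" "  at baz"])

;; Hypothetical predicate: a line that starts with a non-space character
;; begins a new message. boolean keeps the result to true/false so that
;; partition-by compares predicate results rather than matched strings.
(defn message-start? [line]
  (boolean (re-find #"^\S" line)))

;; split-by cuts before every line where the predicate is true.
(def by-message (into [] (split-by message-start?) log-lines))
;; [["ERROR boom" "  at foo" "  at bar"] ["INFO ok"] ["ERROR again" "  at baz"]]

;; partition-by cuts wherever the predicate's value changes.
(def by-change (into [] (partition-by message-start?) log-lines))
;; [["ERROR boom"] ["  at foo" "  at bar"] ["INFO ok" "ERROR again"] ["  at baz"]]
```

With partition-by, the two consecutive message starts ("INFO ok" and "ERROR again") land in one group and the continuation lines are separated from their headers, which is exactly what I don't want for log messages.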

Transducers can feed zero, one, or many values to the reducer. map always supplies exactly one value per input, while split-by above acts like filter and sometimes feeds in one value, sometimes none. For an example of a transducer that feeds in multiple values, see mapcat. Notice that we can use volatile references in transducers because a transducer's step function is never called from multiple threads at the same time.
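Here's a rough sketch of those three cases using core transducers, plus a tiny hand-written transducer that feeds two values per input. The name duplicate is hypothetical, invented for this example.

```clojure
;; map: exactly one output per input.
(def mapped (into [] (map inc) [1 2 3]))                   ; [2 3 4]

;; filter: zero or one output per input.
(def filtered (into [] (filter odd?) [1 2 3]))             ; [1 3]

;; mapcat: zero, one, or many outputs per input.
(def expanded (into [] (mapcat #(repeat % %)) [0 1 2 3]))  ; [1 2 2 3 3 3]

;; A hypothetical transducer that calls rf twice per input.
(defn duplicate []
  (fn [rf]
    (fn
      ([] (rf))
      ([res] (rf res))
      ([res item]
       (let [res' (rf res item)]
         ;; Stop feeding values once downstream has signalled it is done.
         (if (reduced? res')
           res'
           (rf res' item)))))))

(def doubled (into [] (duplicate) [1 2]))                        ; [1 1 2 2]
(def doubled-3 (into [] (comp (duplicate) (take 3)) [1 2 3]))    ; [1 1 2]
```

A step function that calls rf more than once should check for a reduced result between calls, as duplicate does, so that transducers like take can stop the process early.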

In the next post we'll look at some more workaday transducers.