exupero's blog

A Clojure reader with string interpolation

Between str and format, I don't often feel the need for Clojure to have Ruby-style string interpolation. However, when I saw how Observable uses JavaScript's tagged template literals to let users write Markdown within JS code, it made me curious to try something similar in Clojure. The necessary changes to tools.reader turned out to be relatively minor.

Instead of parsing backticks the same way Clojure parses double quotes, we'll add a dispatch macro that handles #` the same way the default reader handles #" and #(:

(defn- dispatch-macros [ch]
  (case ch
    \^ read-meta                ;deprecated
    \' (wrapping-reader 'var)
    \( read-fn
    \= read-eval
    \{ read-set
    \< (throwing-reader "Unreadable form")
    \" read-regex
    \! read-comment
    \_ read-discard
    \? read-cond
    \: read-namespaced-map
    \# read-symbolic-value
    \` read-backtick-string     ;new: #` starts an interpolated string
    nil))

Our syntax will also differ from JavaScript's, which denotes interpolation with ${}. We'll use a more Clojure-like syntax and parse ~{x} as x, and ~(x) as (x).

Also, we don't need support for tagged template literals, since Clojure already has user-defined reader macros that can be used on any form. Our #` dispatch macro will just return a list of forms it parsed, allowing reader macros to decide how those forms should be evaluated.
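To make the target concrete, here is a sketch of the data the finished reader should produce, written as ordinary quoted vectors so it runs in stock Clojure. The input strings, and the names parsed and parsed-trailing, are made up for illustration; the shape of the vectors follows the rules above and the final read-backtick-string.

```clojure
;; Hypothetical input, readable only by the modified reader:
;;   #`Hello ~{name}, you have ~(count items) items.`
;; What the reader should hand back: strings and unevaluated forms, in order.
(def parsed
  '["Hello " name ", you have " (count items) " items."])

;; Hypothetical input that ends with an interpolation:
;;   #`Total: ~{n}`
;; The text after the last form is always appended, even when it is empty.
(def parsed-trailing
  '["Total: " n ""])
```

Nothing in these vectors has been evaluated; name, n, and (count items) are still just a symbol, a symbol, and a list.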

As a start to read-backtick-string, we'll copy read-string*:

(defn- read-backtick-string [reader _ opts pending-forms]
  (loop [sb (StringBuilder.)
         ch (read-char reader)]
    (case ch
      nil (err/throw-eof-reading reader :string sb)
      \\ (recur (doto sb (.append (escape-char sb reader)))
                (read-char reader))
      \" (str sb)
      (recur (doto sb (.append ch)) (read-char reader)))))

From this we can see a couple of minor changes we'll need to make. To improve error reporting, we'll make a fork of err/throw-eof-reading specifically for backtick strings:

(defn throw-eof-reading [rdr & start]
  (err/eof-error rdr
    "Unexpected EOF reading :backtick-string starting "
    (apply str "#`" start)
    "."))

Also, to allow escaping backticks and tildes, which are part of our syntax, we need to update escape-char:

(defn- escape-char [sb rdr]
  (let [ch (read-char rdr)]
    (case ch
      \` "`"
      \~ "~"
      \t "\t"
      \r "\r"
      \n "\n"
      \\ "\\"
      \" "\""
      \b "\b"
      \f "\f"
      \u (let [ch (read-char rdr)]
           (if (== -1 (Character/digit (int ch) 16))
             (err/throw-invalid-unicode-escape rdr ch)
             (read-unicode-char rdr ch 16 4 true)))
      (if (numeric? ch)
        (let [ch (read-unicode-char rdr ch 8 3 false)]
          (if (> (int ch) 0377)
            (err/throw-bad-octal-number rdr)
            ch))
        (err/throw-bad-escape-char rdr ch)))))

The rest of our changes will be isolated to read-backtick-string. We'll stop reading at a closing backtick rather than a closing double quote, and we'll start threading a vector of forms through the loop, which we'll return once interpolation is handled:

(defn- read-backtick-string [reader _ opts pending-forms]
  (loop [sb (StringBuilder.)
         ch (read-char reader)
         forms []]
    (case ch
      nil (throw-eof-reading reader sb)
      \\ (recur (doto sb (.append (escape-char sb reader)))
                (read-char reader)
                forms)
      \` (str sb)
      (recur (doto sb (.append ch))
             (read-char reader)
             forms))))

Now for the heavy lifting. When the reader encounters a ~, it should read the next character to determine if it's reading a curly brace form or a parentheses form. If we see a left curly brace, we'll read one form and look for a closing curly brace. If we see an open paren, we'll back up one character to capture it, then read one form. If any other character follows a ~, we'll append the tilde and character to the current string buffer, ignoring any interpolation. The resulting code looks like this:

(defn- read-backtick-string [rdr _ opts pending-forms]
  (loop [sb (StringBuffer.)
         ch (read-char rdr)
         forms []]
    (case ch
      nil (throw-eof-reading rdr (str forms))
      \\ (recur (doto sb (.append (escape-char sb rdr)))
                (read-char rdr)
                forms)
      \~ (let [ch (read-char rdr)]
           (case ch
             ;; ~{x}: read one form, then require the closing brace
             \{ (let [form (read* rdr true nil opts pending-forms)]
                  (if (= (read-char rdr) \})
                    (recur (StringBuffer.)
                           (read-char rdr)
                           (conj forms (str sb) form))
                    (throw-eof-reading rdr (str forms))))
             ;; ~(x): push the paren back so read* sees the whole list
             \( (let [form (read* (doto rdr (unread ch))
                                  true nil opts pending-forms)]
                  (recur (StringBuffer.)
                         (read-char rdr)
                         (conj forms (str sb) form)))
             ;; anything else: keep the tilde and the character as literal text
             (recur (doto sb
                      (.append \~)
                      (.append ch))
                    (read-char rdr)
                    forms)))
      ;; closing backtick: return the collected strings and forms
      \` (conj forms (str sb))
      (recur (doto sb (.append ch))
             (read-char rdr)
             forms))))

The resulting vector of strings and unevaluated forms can be passed to a reader macro for further handling. I've mostly used this with a #md macro for authoring Markdown in Clojure files; the macro function only has to evaluate forms and concatenate the resulting values into a single string.
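As a rough illustration of how little such a macro function needs to do, here is a minimal sketch of a data-reader function. It is an assumption about the general shape, not the actual #md implementation: the name md and the plain str concatenation are stand-ins, and the demo feeds it an ordinary vector literal through the stock reader instead of a backtick string.

```clojure
;; Hypothetical reader function: wraps the parsed parts in a (str ...) call.
;; It returns code, so the interpolated forms are evaluated where the
;; tagged literal appears.
(defn md [parts]
  `(str ~@parts))

;; Stock-reader demo: a vector literal stands in for the #`...` string.
(def expanded
  (binding [*data-readers* {'md md}]
    (read-string "#md [\"Total: \" (+ 1 2) \" items\"]")))

;; expanded is the list (clojure.core/str "Total: " (+ 1 2) " items");
;; evaluating it concatenates the pieces.
(def rendered (eval expanded))
```

A real Markdown macro would presumably do more than concatenate, but the evaluation step can stay this small because the reader has already separated text from forms.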

Alternatively, this kind of string interpolation could be done by a macro on a string, rather than requiring a special reader. In the case of using regular strings with double quotes, any double quotes within interpolated values will have to be escaped.
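A sketch of that alternative, assuming the same ~{x} and ~(x) syntax: a helper parses an ordinary string literal at macroexpansion time with clojure.core/read, and a macro splices the result into a str call. The names parse-interpolated and interp are hypothetical, and backslash escapes for tildes are left out to keep it short.

```clojure
(import '(java.io PushbackReader StringReader))

(defn parse-interpolated
  "Splits s into a vector of literal strings and unevaluated forms."
  [^String s]
  (let [rdr (PushbackReader. (StringReader. s))]
    (loop [sb (StringBuilder.)
           forms []]
      (let [c (.read rdr)]
        (cond
          ;; end of input: flush the remaining text
          (= c -1) (conj forms (str sb))

          (= c (int \~))
          (let [n (.read rdr)]
            (cond
              ;; ~{x}: read one form, then require the closing brace
              (= n (int \{))
              (let [form (read rdr)]
                (when (not= (.read rdr) (int \}))
                  (throw (ex-info "Expected } after interpolated form"
                                  {:string s})))
                (recur (StringBuilder.) (conj forms (str sb) form)))

              ;; ~(x): push the paren back so read sees the whole list
              (= n (int \())
              (do (.unread rdr (int n))
                  (recur (StringBuilder.) (conj forms (str sb) (read rdr))))

              ;; lone tilde at end of input
              (= n -1) (conj forms (str (.append sb \~)))

              ;; tilde followed by anything else is literal text
              :else (recur (doto sb (.append \~) (.append (char n))) forms)))

          :else (recur (doto sb (.append (char c))) forms))))))

(defmacro interp
  "Expands a string literal into (str ...) over its text and forms."
  [s]
  `(str ~@(parse-interpolated s)))
```

For example, (let [who "world"] (interp "Hello ~{who}, ~(+ 1 2)!")) expands to a str call over "Hello ", who, ", ", (+ 1 2), and "!". Because the argument is a normal string, any double quote inside an interpolated form has to be written as \".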

A fork of tools.reader with the above functionality is available here. Suggestions welcome.