exupero's blog
RSSApps

Refactoring Instaparse trees

In the previous post I used rewrite-clj to refactor some Clojure code with functional zippers. Rewrite-clj is specific to Clojure, but when refactoring code in other languages, we can use Clojure's built-in clojure.zip on similar concrete syntax trees, such as those output by Instaparse.

By default, Instaparse produces a tree of nested vectors, in which the first element is a keyword identifying the lexical token:

[:S
 [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
 [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]

We can navigate this using clojure.zip/vector-zip:

(require '[clojure.zip :as z])
(def zloc
  (z/vector-zip
    [:S
     [:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]
     [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]))
(-> zloc z/next z/next z/node)
[:AB [:A "a" "a" "a" "a" "a"] [:B "b" "b" "b"]]

The hardest part of refactoring using Instaparse is constructing a grammar for the language to parse. For most programming languages, that's a non-trivial task.

When I write a grammar for refactoring, I include all the text in the code as nodes in the concrete syntax tree, including whitespace, which makes the CST easier to turn back into code later.

Clojure.zip doesn't have the convenience functions rewrite-clj provides for finding nodes, so let's write our own simple function to move to a node matching a predicate:

(defn find-node [zloc direction pred]
  (loop [zloc zloc]
    (if (and zloc (not (z/end? zloc)))
      (if (pred (z/node zloc))
        zloc
        (recur (direction zloc)))
      zloc)))
(-> zloc
    (find-node z/next #{:B})
    z/up
    z/node)
[:B "b" "b" "b"]

find-node moves node to node using the given direction function, returning when it lands on a node satisfying pred. If it doesn't find such a node, it returns the end of the zipper.

When navigating concrete syntax trees with clojure.zip, I haven't needed much more than this function.

Clojure.zip has several functions for changing CSTs, specifically append-child, edit, insert-child, insert-left, insert-right, remove, and replace. Most often I use edit and replace:

(-> zloc
    (find-node z/next #{:B})
    z/up
    (z/edit dedupe)
    z/root)
[:S
 [:AB [:A "a" "a" "a" "a" "a"] (:B "b")]
 [:AB [:A "a" "a" "a" "a"] [:B "b" "b"]]]

I'm typically not very particular about the exact structure of the replacement nodes I put into the tree, because the final step of turning the CST back into code I do by stringing together all the non-keywords in the tree:

(defn unparse [cst]
  (->> cst
       flatten
       (remove keyword?)
       clojure.string/join))

If the CST omits some tokens needed in the final code, such as quotes around strings, you can add processing to generate them:

(defn unparse [cst]
  (->> cst
       (clojure.walk/postwalk
         (fn [node]
           (if (and (vector? node)
                    (= :string (first node)))
             (str \" (second node) \")
             node)))
       flatten
       (remove keyword?)
       clojure.string/join))