exupero's blog
RSSApps

More flexible rewrite rules

In the previous post we transformed stack operations using substitution sequences and patterns. In this post we'll generalize it using protocols, which will let us make new kinds of rewrite rules.

Here's the protocol we'll use:

(defprotocol RewriteRule
  (length [_])
  (replacement [_ ops]))

And an updated rewrite function that uses it:

(defn rewrite [ops rule]
  (let [len (length rule)
        [[i r]] (sequence
                  (comp
                    (map-indexed
                      (fn [i ops-window]
                        [i (replacement rule ops-window)]))
                    (filter second))
                  (partition len 1 ops))]
    (if r
      (splice ops i (+ i len) r)
      ops)))

This version of rewrite takes a rule rather than a pattern and a replacement, and it asks the rule how many operations it needs to match against and what its replacement is for those operations. Figuring out bindings will be the job of a rewrite rule.

We can generate a rule with reify. Here's a function that creates the pattern replacemetns we used in the previous posts.

(defn pattern [pat replacement]
  (reify RewriteRule
    (length [_]
      (count pat))
    (replacement [_ ops]
      (when-let [binds (bindings pat ops)]
        (clojure.walk/postwalk-replace binds replacement)))))
(rewrite '[9 * 5 / 32 +]
         (pattern '[9 *] '[dup [10 *] dip -]))
[dup [10 *] dip - 5 / 32 +]

Now let's add a new kind of rewrite rule, one that we couldn't have written with pattern matching. This one will factor whole numbers into smaller multiplication steps:

(def factor
  (reify RewriteRule
    (length [_] 1)
    (replacement [_ [n]]
      (when (integer? n)
        (when-let [[x y] (some #(when (and (not= n %)
                                           (zero? (mod n %)))
                                  [% (/ n %)])
                               (range 2 11))]
          [x y '*])))))

Updating rewrite-first-rule and rewrites to use rules rather than pattern and replacement pairs lets us apply this rule as many times as it makes a difference.

(rewrites '[3600 *] [factor])
[2 2 2 2 3 3 5 5 * * * * * * * *]

Combine with a pattern rule to multiply by each number one at a time:

(rewrites '[3600 *]
          [factor
           (pattern '[?x * *] '[* ?x *])])
[2 * 2 * 2 * 2 * 3 * 3 * 5 * 5 *]

If you prefer your multiplication to start with greater numbers first, we can write an ordered-multiplication rule:

(def ordered-multiplication
  (reify RewriteRule
    (length [_] 4)
    (replacement [_ ops]
      (when-let [{:syms [?x ?y]} (bindings '[?x * ?y *] ops)]
        (when (< ?x ?y)
          [?y '* ?x '*])))))
(rewrites '[3600 *]
          [factor
           (pattern '[?x * *] '[* ?x *])
           ordered-multiplication])
[5 * 5 * 3 * 3 * 2 * 2 * 2 * 2 *]

Multiplying by 5, 5, 3, 3, 2, 2, 2, and 2 doesn't seem like the easiest way to multiply by 3600 mentally, so we can try more kinds of rules in hopes that rewrites finds a better algorithm. Here's a rule that breaks out an integer's highest place value:

(def expand
  (reify RewriteRule
    (length [_] 1)
    (replacement [_ [n]]
      (when (and (integer? n) (< 10 n))
        (let [p (int (Math/pow 10 (int (Math/log10 n))))
              m (rem n p)]
          (when (pos? m)
            [(- n m) m '+]))))))
(rewrites '[3600 *] [expand])
[3000 600 + *]

And here's one to factor out the greatest common divisor of two numbers that are added together:

(defn gcd [a b]
  (if (zero? b)
    a
    (recur b (mod a b))))
(def factor-gcd
  (reify RewriteRule
    (length [_] 3)
    (replacement [_ ops]
      (when-let [{:syms [?x ?y]} (bindings '[?x ?y +] ops)]
        (when (and (integer? ?x) (integer? ?y))
          (let [common-divisor (gcd ?x ?y)]
            (when (< 1 common-divisor)
              [(/ ?x common-divisor)
               (/ ?y common-divisor)
               '+
               common-divisor
               '*])))))))
(rewrites '[3000 600 + *] [factor-gcd])
[5 1 + 600 * *]

Using those together with some hand-crafted pattern rules gives a somewhat different algorithm than the sequence of multiplications above:

(rewrites '[3600 *]
          [expand
           factor-gcd
           (pattern '[?x * *] '[* ?x *])
           (pattern '[600 *] '[6 100 * *])
           (pattern '[6 *] '[5 1 + *])
           (pattern '[?x ?y + *] '[dup [?x *] dip ?y * +])
           (pattern '[1 *] [])
           (pattern '[dup [?x *] dip +] '[dup ?x * +])])
[dup 5 * + dup 5 * + 100 *]

Is this better than multiplying by 5 twice, 3 twice, and 2 four times? Perhaps marginally. It's turned the two pairs of multiplication by 2 and 3 (multiplication by 6) into multiplication by 5 plus the current number, which the above algorithm does twice before multiplying by 100. Some practitioners might find that easier, but maybe there's a better way we haven't found yet? Play around with rewrite rules and see what you can find.

A word of caution, however: watch out that no rules undo each other or expand parts of each other, otherwise rewrites will fall into an infinite loop. We can defend against that by tracking what sequences of operations we've seen and terminating if there's a repeat, and by setting a limit on the number of iterations. Or you can just split rules that might interact infinitely into separate batches, like so:

(-> '[3600 *]
  (rewrites [factor
             (pattern '[?x * *] '[* ?x *])
             ordered-multiplication])
  (rewrites [(pattern '[3 * 3 *] '[9 *])]))
[5 * 5 * 9 * 2 * 2 * 2 * 2 *]