Insta-parsing with Janet

I haven't used Janet much, though I'm occasionally intrigued enough to try something with it. A while ago I needed to do a mass refactoring of Java and Kotlin annotations. Since Instaparse didn't work in Babashka, I wrote a parsing expression grammar (PEG) in Janet that emitted an Instaparse-like concrete syntax tree (CST), which I then piped to a Babashka script.

By default Instaparse produces CSTs as nested vectors with a node's type as the first element. For example, parsing a Java annotation like @Test might output:

[:annotation "@" [:name "Test"]]

Nodes can also be strings, without a type, such as "@" above.

To parse Java annotations with Janet, let's define a simple grammar that only recognizes annotations and treats the words and whitespace around them as unparsed:

(def grammar
  ~{:main (* (any (+ :annotation :unparsed)) -1)
    :annotation (* "@" :name)
    :name :w+
    :unparsed (+ :w+ :s+)})

(pp (peg/match grammar "An @Annotation"))

@[]

That's an empty array: the PEG matched, but nothing was captured. (If the input didn't match at all, peg/match would return nil instead.) To capture the matched text, we can use <-, an alias for capture:

(def grammar
  ~{:main (* (any (+ :annotation :unparsed)) -1)
    :annotation (<- (* "@" :name))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})

(pp (peg/match grammar "An @Annotation"))

@["An" " " "Annotation" "@Annotation"]

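That gives us the matched text, but doesn't tell us which rule matched it. Let's write a tagged function that returns a pattern which pushes the tag as a constant, matches the rule of the same name, and passes all of the resulting captures to tuple. Then we can splice it into the quasiquoted grammar with ,: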

(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))

(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    :annotation (<- (* "@" ,(tagged :name)))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})

(pp (peg/match grammar "An @Annotation"))

@[(:unparsed "An") (:unparsed " ") (:annotation (:name "Annotation") "@Annotation")]

To avoid the repeated "Annotation", we have to stop capturing the whole :annotation match, leaving only the tagged :name node inside it:

(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))

(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    :annotation (* "@" ,(tagged :name))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})

(pp (peg/match grammar "An @Annotation"))

@[(:unparsed "An") (:unparsed " ") (:annotation (:name "Annotation"))]

Now we have something that looks like an Instaparse syntax tree.
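One difference remains from the Instaparse example at the top: the "@" literal, which Instaparse kept as an untyped string node, is no longer captured. A small variation (not part of the original script) wraps just the literal in <-, which brings that node back without reintroducing the repeated text:

```janet
(defn tagged [tag]
  # Push the tag, match the rule with the same name, and collect
  # every capture into a tuple headed by the tag.
  ~(replace (* (constant ,tag) ,tag) ,tuple))

(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    # Variation: capture only the "@" literal, not the whole annotation
    :annotation (* (<- "@") ,(tagged :name))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})

(pp (peg/match grammar "An @Annotation"))
# prints @[(:unparsed "An") (:unparsed " ") (:annotation "@" (:name "Annotation"))]
```

The :annotation node now has the same shape as [:annotation "@" [:name "Test"]] in the Instaparse output.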

Java annotations can be much more complex than just a name, but aside from additional grammar rules, this demonstrates all I had to do to get a CST with the desired structure. To turn it into EDN, see the next post.
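As for those additional rules: the grammars above only accept word characters and whitespace outside annotations, so peg/match returns nil for input such as @Test void f() {} because of the parentheses and braces. The sketch below is a hypothetical extension, not the rules from the original refactoring. It assumes a catch-all single-character case in :unparsed, dotted names such as org.junit.Test, and an optional parenthesized argument list captured as one raw string. The :args, :arg-char, and :parens rule names are illustrative; they balance nested parentheses but don't understand Java string or character literals, so an argument containing ")" would confuse them.

```janet
# Hypothetical rules for illustration, not the original refactoring grammar.
(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))

(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    # "@" as an untyped string node, a name, then optional arguments
    :annotation (* (<- "@") ,(tagged :name) (? ,(tagged :args)))
    # Qualified names such as org.junit.Test
    :name (<- (* :w+ (any (* "." :w+))))
    # Everything between the outer parentheses, as one raw string.
    # Nested parentheses are balanced; string literals are not special-cased.
    :args (* "(" (<- (any :arg-char)) ")")
    :arg-char (+ :parens (if-not ")" 1))
    :parens (* "(" (any :arg-char) ")")
    # Words, whitespace, or any other single character
    :unparsed (<- (+ :w+ :s+ 1))})

(pp (peg/match grammar "@SuppressWarnings(\"unchecked\") void f() {}"))
```

Here the first node is (:annotation "@" (:name "SuppressWarnings") (:args "\"unchecked\"")), and every other word, whitespace run, or punctuation character becomes its own :unparsed node, so the match no longer fails on ordinary Java source.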

(If you're interested in Janet, I recommend the book Janet for Mortals. It's a great overview that introduces the language without also introducing basic programming concepts, an intermediate level of software education that often feels overlooked.)

May 2023