I haven't used Janet much, though I am occasionally intrigued enough to attempt something with it. A while ago I needed to do a mass refactoring of Java and Kotlin annotations, and since Instaparse didn't work in Babashka, I wrote a parsing expression grammar (PEG) in Janet and emitted an Instaparse-like concrete syntax tree (CST) for piping to a Babashka script.
By default, Instaparse produces CSTs as nested vectors with a node's type as the first element. For example, parsing a Java annotation like @Test might output:
[:annotation "@" [:name "Test"]]
Nodes can also be untyped strings, such as "@" above.
To parse Java annotations with Janet, let's define a simple grammar that only finds annotations and leaves everything else unparsed:
(def grammar
  ~{:main (* (any (+ :annotation :unparsed)) -1)
    :annotation (* "@" :name)
    :name :w+
    :unparsed (+ :w+ :s+)})
(pp (peg/match grammar "An @Annotation"))
@[]
That's an empty array, which means the PEG matched but didn't capture anything (peg/match returns nil when the match fails). To capture matches, we can wrap patterns in <-, Janet's shorthand for capture:
(def grammar
  ~{:main (* (any (+ :annotation :unparsed)) -1)
    :annotation (<- (* "@" :name))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})
(pp (peg/match grammar "An @Annotation"))
@["An" " " "Annotation" "@Annotation"]
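Note the order in that output: "Annotation" comes before "@Annotation". A capture is pushed only after its pattern finishes matching, so the inner (<- :w+) from the :name rule pushes first and the enclosing (<- (* "@" :name)) pushes second. Here is a minimal standalone sketch of the same nesting (the name nested is just for illustration):

```janet
# The outer capture wraps "@" plus an inner capture of the name.
# The inner capture is pushed as soon as :w+ matches; the outer one is
# pushed only after the whole (* "@" ...) sequence has matched.
(def nested ~(* (<- (* "@" (<- :w+))) -1))

(pp (peg/match nested "@Test"))
# prints @["Test" "@Test"]
```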
The capturing grammar gives us the matched text, but doesn't tell us which rule each match came from. Let's create a tagged function and splice its result into the quasiquoted (~) grammar with ,:
(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))
(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    :annotation (<- (* "@" ,(tagged :name)))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})
(pp (peg/match grammar "An @Annotation"))
@[(:unparsed "An") (:unparsed " ") (:annotation (:name "Annotation") "@Annotation")]
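Before dealing with the extra "@Annotation" in that output, it may help to look at tagged on its own. It only builds PEG data: (constant tag) pushes the tag keyword, the rule with the same name pushes its captures, and replace passes all of those captures to the tuple function, leaving a single tuple in their place. The sketch below inlines a pattern where tagged would reference a grammar rule; name-node is a hypothetical name used for illustration.

```janet
(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))

# (tagged :name) is plain data: a replace form whose last element is the
# tuple function itself (spliced in by the unquote), not the symbol tuple.
(pp (tagged :name))

# Hypothetical inline equivalent of (tagged :name) where :name is (<- :w+).
# constant pushes :name, <- pushes "Test", and replace hands both to tuple.
(def name-node ~(replace (* (constant :name) (<- :w+)) ,tuple))

(pp (peg/match name-node "Test"))
# prints @[(:name "Test")]
```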
To avoid the redundant "@Annotation" string, we have to stop capturing the :annotation node itself:
(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))
(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    :annotation (* "@" ,(tagged :name))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})
(pp (peg/match grammar "An @Annotation"))
@[(:unparsed "An") (:unparsed " ") (:annotation (:name "Annotation"))]
Now we have something that looks like an Instaparse syntax tree.
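Because the result is ordinary Janet data, it can be walked with ordinary sequence functions. As a minimal sketch that repeats the final grammar so it runs on its own, here is a hypothetical annotation-names helper that keeps only :annotation nodes and pulls out their names:

```janet
(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))

# Same grammar as the last version above: the :annotation node itself is
# not captured, so each annotation becomes (:annotation (:name "...")).
(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    :annotation (* "@" ,(tagged :name))
    :name (<- :w+)
    :unparsed (<- (+ :w+ :s+))})

# Hypothetical helper: keep only top-level :annotation nodes and return
# the string inside each one's (:name ...) child.
(defn annotation-names [cst]
  (seq [node :in cst
        :when (= :annotation (first node))]
    (get-in node [1 1])))

(pp (annotation-names (peg/match grammar "@Override public @Deprecated void f")))
# prints @["Override" "Deprecated"]
```

Each node is a tuple with its tag first, so (first node) plays the role that the first vector element plays in an Instaparse tree.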
Java annotations can be much more complex than just a name, but aside from additional grammar rules, this demonstrates all I had to do to get a CST with the desired structure. To turn it into EDN, see the next post.
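As an illustration of what those additional rules might look like (a hypothetical sketch, not the grammar from the actual refactoring), the version below accepts dotted names such as @org.junit.Test and captures the raw text of an argument list as an :args node. It assumes parentheses inside arguments are balanced and never appear inside string literals. It also widens :unparsed to consume any single character, so punctuation in real source doesn't make the -1 end-of-input check fail.

```janet
(defn tagged [tag]
  ~(replace (* (constant ,tag) ,tag) ,tuple))

(def grammar
  ~{:main (* (any (+ ,(tagged :annotation) ,(tagged :unparsed))) -1)
    # Hypothetical extension: an optional argument list after the name
    :annotation (* "@" ,(tagged :name) (? ,(tagged :args)))
    # Dotted names like org.junit.Test
    :name (<- (* :w+ (any (* "." :w+))))
    # Raw text between the outer parentheses (parentheses excluded)
    :args (* "(" (<- (any :arg-char)) ")")
    # Either a balanced nested (...) group or any non-parenthesis character
    :arg-char (+ (* "(" (any :arg-char) ")") (if-not (set "()") 1))
    # Fall back to a single character so punctuation doesn't stop the match
    :unparsed (<- (+ :w+ :s+ 1))})

(pp (peg/match grammar "@SuppressWarnings(\"unchecked\") @org.junit.Test void f() {}"))
```

With this input, the first node should come out as (:annotation (:name "SuppressWarnings") (:args "\"unchecked\"")), and @org.junit.Test becomes (:annotation (:name "org.junit.Test")) with no :args child, since the ? makes the argument list optional.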
(If you're interested in Janet, I recommend the real book Janet for Mortals. It's a great overview of Janet that introduces the language without also introducing basic programming concepts, an intermediate level of software education that often feels overlooked.)