20.1 Interpreting Jsonnet
20.2 Jsonnet Language Features
20.3 Parsing Jsonnet
20.4 Evaluating the Syntax Tree
20.5 Serializing to JSON
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
case Expr.Str(s) => Value.Str(s)
case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
case Expr.Plus(left, right) =>
val Value.Str(leftStr) = evaluate(left, scope)
val Value.Str(rightStr) = evaluate(right, scope)
Value.Str(leftStr + rightStr)
20.1.scala
Snippet 20.1: evaluating a syntax tree using pattern matching
This chapter builds upon the simple parsers we learned in Chapter 19: Parsing Structured Text, and walks you through the process of implementing a simple programming language in Scala.
Working with programming language source code, whether parsing, analyzing, compiling, or interpreting it, is a strength of Scala. This chapter will show you how easy it is to write a simple interpreter that parses and evaluates program source code in Scala. Even if your goal is not to implement an entirely new programming language, these techniques are still useful: for writing linters, program analyzers, query engines, and other such tools.
The goal of this chapter is to implement an interpreter for a subset of the Jsonnet programming language.
Jsonnet is a simple language meant to construct JSON configuration files: the
output of evaluating a .jsonnet file is a single JSON structure containing
dictionaries, lists, strings, numbers, and booleans. The output JSON can then be
used to configure Kubernetes,
CloudFormation,
Terraform, or other software systems.
While you can construct JSON structures in any programming language, Jsonnet makes doing so much more convenient than a general-purpose language like Java or Python. Jsonnet is used heavily in industry to manage large and complex system configurations.
Our interpreter will roughly follow these steps:
Parsing the input String into a syntax tree structure called Expr
Evaluating the Expr into a Jsonnet value called Value
Serializing the Value into an output JSON String
We can visualize these steps as follows:
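In code form, the pipeline we will assemble by the end of the chapter looks roughly like this sketch, where Parser.expr, evaluate, and serialize are the parser, evaluator, and serializer we build in the next three sections, and the intermediate vals are only there to label each stage:
// A rough sketch of the full pipeline; the pieces are defined later in the chapter
def jsonnet(input: String): String =
  val expr: Expr = fastparse.parse(input, Parser.expr(using _)).get.value // String -> Expr
  val value: Value = evaluate(expr, Map.empty)                            // Expr -> Value
  serialize(value)                                                        // Value -> JSON String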
This chapter will walk you through all three phases of implementing our simple interpreter, giving you an intuition for the techniques, data structures and algorithms involved. While not a comprehensive or production-ready implementation, this chapter should give you enough foundation to get started with your own simple language-related projects.
Jsonnet is similar to JSON, but introduces the constructs that help you tidy up
verbose or repetitive portions of your JSON config: local variables, function
definitions, and basic operations like concatenating strings with +. You can
learn more about the syntax and experiment with the language interactively on
the official website:
For the purpose of this exercise, we will stop at implementing these three simple features. Further features and improvements can be implemented in the same way, and we will leave the task of exhaustively implementing the entire Jsonnet language to production interpreters like google/jsonnet.
Jsonnet has a similar set of primitives as JSON. For this chapter, we will consider just a subset of them:
"hello world" // strings
{"key": "value", "thing": "123"} // dictionaries
20.4.jsonnet
We will assume that strings do not contain escape sequences like \n or \",
and leave out additional data types like numbers, booleans, arrays, or null. We
will also assume that strings only support a single operation: concatenation via
+.
You can define locals using the local keyword, and define local functions by
combining the local keyword with a function expression. As in most
programming languages, functions are called using parentheses containing the
arguments you wish to pass.
Local variables and functions can also return structured data in dictionaries.
The language features described here can be combined in arbitrary ways: for example, a function can be called inside a dictionary value, and that function may itself return a dictionary that gets nested within the enclosing one.
This quick tour is only a small subset of the full Jsonnet language, but will be enough for this chapter.
First, let us look at parsing:
To parse Jsonnet, we will be using the FastParse library introduced in Chapter 19: Parsing Structured Text.
We will do all our work inside the Scala CLI REPL. Let's start by defining
a syntax tree of what our minimal Jsonnet syntax looks like: strings,
dictionaries, functions, and local definitions. We will use a Scala enum:
Exprs.scala
enum Expr:
  case Str(s: String)
  case Ident(name: String)
  case Plus(left: Expr, right: Expr)
  case Dict(pairs: Map[String, Expr])
  case Local(name: String, assigned: Expr, body: Expr)
  case Func(argNames: Seq[String], body: Expr)
  case Call(expr: Expr, args: Seq[Expr])
20.13.scala
Here, the Expr data structure (short for "expression") represents the
meaningful parts of the Jsonnet syntax. As stated earlier, we want to be able to
support strings with +, dictionaries, local identifiers, and function
definitions and calls. Each of these language constructs maps straightforwardly
to one of the case classes in the Expr AST.
The Expr type is recursive: a Plus is an Expr, but also contains
left: Expr, right: Expr within it. Dict is an Expr that contains a
pairs: Map[String, Expr].
Here are some example snippets and what we expect them to parse to.
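The exact examples below are illustrative (written out by hand as the Expr values we expect); we will verify them against the real parser later in this section:
// Illustrative source snippets and the Expr trees we expect them to parse to
// "hello"
val e1: Expr = Expr.Str("hello")
// a + "b"
val e2: Expr = Expr.Plus(Expr.Ident("a"), Expr.Str("b"))
// {"k": "v"}
val e3: Expr = Expr.Dict(Map("k" -> Expr.Str("v")))
// local x = "y"; x
val e4: Expr = Expr.Local("x", Expr.Str("y"), Expr.Ident("x"))
// function(a) a + a
val e5: Expr = Expr.Func(Seq("a"), Expr.Plus(Expr.Ident("a"), Expr.Ident("a")))
// f("x")
val e6: Expr = Expr.Call(Expr.Ident("f"), Seq(Expr.Str("x")))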
As Jsonnet syntax is recursive, we expect to be able to parse any combination of these language features combined together.
To begin using FastParse, we will add the dependency, and use the following imports:
$ scala --import com.lihaoyi::fastparse:3.1.1 --repl
> import fastparse.*, MultiLineWhitespace.*
20.18.scala
For this exercise, we will be using FastParse's support for automatic whitespace
skipping, using MultiLineWhitespace. That lets us skip all spaces, tabs, and
newlines automatically when parsing so we can write our grammar without needing
an explicit ws parser like we did in Chapter 19: Parsing Structured Text.
It is simplest to parse the non-recursive parts of our Jsonnet syntax, often
called "terminals". Of the case classes defined above, only two of them are
terminals: Str and Ident.
First, let's write the parser for Str. We will ignore escape characters for
simplicity, meaning a string is simply a ", followed by zero-or-more non-"
characters, closed by another ":
> def str[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" ).map(Expr.Str(_))
> fastparse.parse("\"hello\"", str(using _))
res0: fastparse.Parsed[Expr.Str] = Success(value = Str("hello"), index = 7)
> fastparse.parse("\"hello world\"", str(using _))
res1: fastparse.Parsed[Expr.Str] = Success(
value = Str("hello world"),
index = 13
)
> fastparse.parse("\"\"", str(using _))
res2: fastparse.Parsed[Expr.Str] = Success(value = Str(""), index = 2)
> fastparse.parse("123", str(using _))
res3: fastparse.Parsed[Expr.Str] = Parsed.Failure(Position 1:1, found "123")
20.19.scala
Note how we use the ~~/ operator after the "\"" open quote. The ~~ means
we do not want to consume whitespace here (since we are inside a string) and the
/ is a FastParse Cut meaning we want
to avoid backtracking if the parse fails. This is because only string literals
can start with a double quote character ", so the parser does not need to
backtrack and try other alternatives if the parse fails after seeing a ". That
helps the parser report more informative error messages when the parse fails.
Identifiers are similar: a letter or underscore, followed by zero or more alphanumeric or underscore characters.
> def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident(_))
> fastparse.parse("hello", ident(using _))
res4: fastparse.Parsed[Expr.Ident] = Success(
value = Ident("hello"),
index = 5
)
> fastparse.parse("123", ident(using _)) // Identifiers cannot start with a number
res5: fastparse.Parsed[Expr.Ident] = Parsed.Failure(Position 1:1, found "123")
20.20.scala
Now that we have the ability to parse the terminal non-recursive syntax tree
nodes, Strs and Idents, we need to move on to the recursive parts of the
syntax tree. The simplest recursive syntax tree node is Expr.Plus, used to
model the a + b syntax.
The Expr.Plus nodes representing +, like all the other case classes in our
syntax tree, have a recursive definition: a Plus contains left and right Exprs,
and each of those Exprs could itself be another Plus node. We handle this by
making our parsers recursive as well, as follows:
> {
def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
case (left, rights) => rights.foldLeft(left)(Expr.Plus(_, _))
def plus[T: P] = P( "+" ~ prefixExpr )
def prefixExpr[T: P] = P( str | ident )
def str[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
.map(Expr.Str(_))
def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
.map(Expr.Ident(_))
}
20.21.scala
In the above snippet, we define an expr parser that contains a prefixExpr,
followed by zero or more plus parsers, each of which parses a + followed by
another prefixExpr. The expr parser is recursive, so to satisfy the compiler
we need to annotate its type as : P[Expr] representing a parser that produces
an Expr value on success.
That lets us parse standalone strings and identifiers, as well as + expressions combining them.
Note that prefixExpr can be either a str or ident, but cannot be an expr
node with its own plus nodes within. That is because any plus nodes will
already get parsed by the existing plus.rep parser. We also cannot simply
define plus as expr ~ "+" ~ expr: this is because the plus parser would
then be Left Recursive, causing an infinite recursion at parse time. For more
details, see the following article:
Since prefixExpr is a P[Expr], and plus is a P[Expr], the combined
prefixExpr ~ plus.rep would by default be a P[(Expr, Seq[Expr])]: a parser
that produces a tuple of (Expr, Seq[Expr]). The first item in the tuple
represents the Expr parsed by the first prefixExpr, and the second item
represents the Seq[Expr] parsed by the plus.rep, each repetition of which
produces an Expr. To allow our def expr parser to produce a single Expr,
we need to transform the produced (Expr, Seq[Expr]) using .map and the following
foldLeft expression:
rights.foldLeft(left)(Expr.Plus(_, _))
This can also be written out more verbosely as follows:
rights.foldLeft(left)((lhs, rhs) => Expr.Plus(lhs, rhs))
foldLeft starts off with the value left, and for every item in the rights
list, it wraps both of them in a Plus node. This builds a tree of Plus nodes
from the flat list of rights. In a more sophisticated parser the shape of the
tree may depend on the precedence of the operators, but for this exercise we end
up building a simple left-associative binary tree.
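For example (a small hand-run illustration using the Expr enum above), parsing a + b + c hands us the first operand plus the list of remaining operands, and foldLeft nests them into a left-leaning tree:
// What the parser gives us for `a + b + c`: the first operand, plus the rest
val left: Expr = Expr.Ident("a")
val rights: Seq[Expr] = Seq(Expr.Ident("b"), Expr.Ident("c"))

// foldLeft wraps them pairwise, producing a left-associative tree
rights.foldLeft(left)(Expr.Plus(_, _))
// Plus(Plus(Ident("a"), Ident("b")), Ident("c"))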
We can test this out as follows:
> fastparse.parse("a + b", expr(using _))
res6: fastparse.Parsed[Expr] = Success(
value = Plus(left = Ident("a"), right = Ident("b")),
index = 5
)
> fastparse.parse("""a + " " + c""", expr(using _))
res7: fastparse.Parsed[Expr] = Success(
value = Plus(
left = Plus(left = Ident("a"), right = Str(" ")),
right = Ident("c")
),
index = 11
)
20.24.scala
Expr.Dict nodes are also recursive, with each comma-separated key-value pair
containing a String key and an Expr value. We can extend our parser to
handle those via:
def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
case (left, items) => items.foldLeft(left)(Expr.Plus(_, _))
def plus[T: P] = P( "+" ~ prefixExpr )
- def prefixExpr[T: P] = P( str | ident )
+ def prefixExpr[T: P] = P( str | ident | dict )
+ def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
+ .map(kvs => Expr.Dict(kvs.toMap))
- def str[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
- .map(Expr.Str(_))
+ def str[T: P] = P( str0 ).map(Expr.Str(_))
+ def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
.map(Expr.Ident(_))
20.25.scala
The three main changes we have here are:
Adding dict as one of the alternatives in def prefixExpr
Extracting the str0 parser from str, where str0 returns the raw String
that was parsed, while str wraps it in an Expr.Str syntax tree node. We do
this because Expr.Dict keys have the same syntax as Expr.Strs, but do not
need to be wrapped in Expr.Str nodes.
Parsing dictionary literals as a {, followed by zero-or-more key-value pairs
separated by ,s. Each key-value pair is a str0, followed by :, followed by
any expr. This repetition of key-value pairs would by default give us a
Seq[(String, Expr)], which we need to .map on to transform into an
Expr.Dict node.
We can test these changes as follows:
> fastparse.parse("""{"a": "b", "cde": id, "nested": {}}""", expr(using _))
res8: fastparse.Parsed[Expr] = Success(
  value = Dict(Map("a" -> Str("b"), "cde" -> Ident("id"), "nested" -> Dict(Map()))),
  index = 35
)
20.26.scala
Adding the parsers for func, local and call to this code requires the
following changes:
def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep).map:
case (left, items) => items.foldLeft(left)(Expr.Plus(_, _))
def plus[T: P] = P( "+" ~ prefixExpr )
- def prefixExpr[T: P] = P( str | ident | dict )
+ def prefixExpr[T: P]: P[Expr] = P( callExpr ~ call.rep ).map:
+ case (left, items) => items.foldLeft(left)(Expr.Call(_, _))
+ def callExpr[T: P] = P( str | dict | local | func | ident )
def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
.map(kvs => Expr.Dict(kvs.toMap))
+ def call[T: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
+ def local[T: P] =
+ P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.apply)
+ def func[T: P] =
+ P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.apply)
def str[T: P] = P( str0 ).map(Expr.Str(_))
def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
- def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
- .map(Expr.Ident(_))
+ def ident[T: P] = P( ident0 ).map(Expr.Ident(_))
+ def ident0[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
20.27.scala
The main changes here are:
Turning prefixExpr into a callExpr followed by zero or more function call
suffixes
Splitting out ident0 from ident, since func uses the same syntax as
ident for parsing its parameter list but does not need the identifiers to be
boxed into Expr.Ident nodes
Defining a function call suffix as an open (, a comma-separated list of
exprs, and a closing )
Defining function and local: each starting with a keyword, followed by
their respective syntaxes
Like we saw earlier when parsing Plus nodes, we use foldLeft to convert the
(Expr, Seq[Expr]) tuple that the callExpr ~ call.rep parser gives us into
nested Call nodes.
Wrapping all of this up into an object Parser for tidiness gives us:
Parse.scala
//| moduleDeps: [./Exprs.scala]
//| mvnDeps:
//| - com.lihaoyi::fastparse:3.1.1
object Parser:
  import fastparse.*, MultiLineWhitespace.*
  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Plus(_, _))
  def prefixExpr[T: P]: P[Expr] = P( callExpr ~ call.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Call(_, _))
  def callExpr[T: P] = P( str | dict | local | func | ident )
  def str[T: P] = P( str0 ).map(Expr.Str(_))
  def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[T: P] = P( ident0 ).map(Expr.Ident(_))
  def ident0[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
  def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
    .map(kvs => Expr.Dict(kvs.toMap))
  def local[T: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.apply)
  def func[T: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.apply)
  def plus[T: P] = P( "+" ~ prefixExpr )
  def call[T: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
20.28.scala
We can test the parser manually and see that it does what we want:
> fastparse.parse("""{"a": "A", "b": "bee"}""", Parser.expr(using _))
res9: fastparse.Parsed[Expr] = Success(
value = Dict(Map("a" -> Str("A"), "b" -> Str("bee"))),
index = 22
)
> fastparse.parse("""f()(a) + g(b, c)""", Parser.expr(using _))
res10: fastparse.Parsed[Expr] = Success(
value = Plus(
left = Call(
expr = Call(expr = Ident("f"), args = List()),
args = List(Ident("a"))
),
right = Call(expr = Ident("g"), args = List(Ident("b"), Ident("c")))
),
index = 16
)
20.29.scala
The syntax of our programming language is recursive: local, function,
plus, and dict expressions can contain other expressions, nested arbitrarily
deeply. We can test this out by feeding such nested examples into the expr
parser:
> val input = """local thing = "kay"; {"f": function(a) a + a, "nested": {"k": v}}"""
> fastparse.parse(input, Parser.expr(using _))
res11: fastparse.Parsed[Expr] = Success(
value = Local(
name = "thing",
assigned = Str("kay"),
body = Dict(
Map(
"f" -> Func(
argNames = List("a"),
body = Plus(left = Ident("a"), right = Ident("a"))
),
"nested" -> Dict(Map("k" -> Ident("v")))
)
)
),
index = 65
)
20.30.scala
Now that we have a syntax tree of Expr nodes, the next step is to evaluate the
syntax tree to produce a runtime data structure of Values.
We will define Value as follows:
Values.scala
enum Value:
  case Str(s: String)
  case Dict(pairs: Map[String, Value])
  case Func(call: Seq[Value] => Value)
20.31.scala
Note that while the Expr syntax tree contains nodes that represent
identifiers, locals, function applications, and so on, a Value can only be a
Str, a Dict, or a Func. This is similar to the ujson.Value we saw in
Chapter 8: JSON and Binary Data Serialization, but simplified and with the
addition of the Func class.
The key thing to know about Value is that it doesn't matter where a Value
came from. For example, a Value.Str could have been:
written as a literal Expr.Str in the source code
returned from calling an Expr.Func
bound to a name by an Expr.Local declaration
produced by concatenating two strings with an Expr.Plus node
Regardless of where the Value.Str was produced syntactically, it is still the
same Value.Str. The final Value for the entire Jsonnet program will later be
converted to a JSON string as the output.
The contents of Value.Str and Value.Dict should be self-explanatory.
Value.Func is a bit less obvious: by defining it as
Func(call: Seq[Value] => Value), we are saying that
"a function is something you can pass a list of argument values to and get back a value".
We will see how to instantiate these Value.Func nodes later.
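As a quick illustration (a hand-written example, not how the interpreter will construct them), here is a Value.Func that expects two string arguments and concatenates them:
// A hand-constructed Value.Func: it expects exactly two Value.Strs and
// returns their concatenation as a new Value.Str
val concat = Value.Func {
  case Seq(Value.Str(a), Value.Str(b)) => Value.Str(a + b)
  case other => throw new Exception(s"expected two strings, got $other")
}

// Calling it through its `call` field, the same way the interpreter later will
concat.call(Seq(Value.Str("hello, "), Value.Str("world")))
// Str("hello, world")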
The basic task here is to write a function that converts an Expr into a
Value. Naively, such a function would look something like this:
def evaluate(expr: Expr): Value
However, the Value returned by evaluating an Expr isn't only dependent on
the contents of that Expr: it also depends on the enclosing scope. Consider a
loose identifier:
foo
The value of this Expr.Ident doesn't just depend on the name of the identifier
("foo"), but also on what value was bound to that name. For example. this
binding may be done via a local declaration:
local foo = "I am Cow";
foo
20.32.jsonnet
This mapping of names to Values is often known as the "lexical scope" within
your program. Thus we might instead define evaluate as:
def evaluate(expr: Expr, scope: Map[String, Value]): Value
From here we can start fleshing it out.
The simplest case is a literal Expr.Str, which evaluates to the corresponding
Value.Str:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
case Expr.Str(s) => Value.Str(s)
20.33.scala
Literal dictionaries are also straightforward: Expr.Dicts become
Value.Dicts, with the same keys, except we need to evaluate each value
expression into its corresponding Value:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
case Expr.Str(s) => Value.Str(s)
+ case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
20.34.scala
Dictionary literals do not add or remove anything from the lexical scope, so the
scope parameter is passed through unchanged. We can test these two cases as
follows:
> evaluate(fastparse.parse("\"hello\"", Parser.expr(using _)).get.value, Map.empty)
res12: Value = Str("hello")
> val input = """{"hello": "world", "key": "value"}"""
> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res13: Value = Dict(Map("hello" -> Str("world"), "key" -> Str("value")))
20.35.scala
For now, we are passing in Map.empty as the scope, since evaluating literal
strings and dictionaries always results in the same thing regardless of what
values are in scope.
Next, we will look at the Expr.Plus nodes. We have only defined their behavior
to work on string values (Value.Str), so evaluating them involves evaluating the
left and right children, each of which must produce a Value.Str, and then
concatenating the two strings into a new Value.Str:
 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
...
case Expr.Dict(kvs) => ...
+ case Expr.Plus(left, right) =>
+ val Value.Str(leftStr) = evaluate(left, scope).runtimeChecked
+ val Value.Str(rightStr) = evaluate(right, scope).runtimeChecked
+ Value.Str(leftStr + rightStr)
20.36.scala
> evaluate(
fastparse.parse("\"hello\" + \"world\"", Parser.expr(using _)).get.value,
Map.empty
)
res14: Value = Str("helloworld")
20.37.scala
If one of the children of Expr.Plus does not evaluate to a Value.Str, the val
assignments in the above code fail with a MatchError. Providing a nicer error
message is left as an exercise to the reader.
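For instance (an illustrative failure, using the parser and the evaluation cases defined so far):
// The right-hand side evaluates to a Value.Dict rather than a Value.Str, so
// the `val Value.Str(rightStr) = ...` pattern fails and throws a MatchError
evaluate(
  fastparse.parse("\"hello\" + {}", Parser.expr(using _)).get.value,
  Map.empty
)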
So far, nothing we have evaluated depends on what values are in the
scope of the evaluation. This will change when we start handling local
declarations, which add values to the scope, and identifiers like foo which
use those values.
Let's consider the following local syntax:
> val input = """local greeting = "Hello "; greeting + greeting"""
> fastparse.parse(input, Parser.expr(using _))
res15: fastparse.Parsed[Expr] = Success(
value = Local(
name = "greeting",
assigned = Str("Hello "),
body = Plus(left = Ident("greeting"), right = Ident("greeting"))
),
index = 46
)
20.38.scala
The point of local is to evaluate the assigned expression to a value, assign
that value to the name, and then evaluate the body expression with that value
bound to that name in the scope. We can write that in code as follows:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
...
case Expr.Plus(left, right) => ...
+ case Expr.Local(name, assigned, body) =>
+ val assignedValue = evaluate(assigned, scope)
+ evaluate(body, scope + (name -> assignedValue))
20.39.scala
Note that because scope is a Scala immutable Map,
+ (name -> assignedValue) doesn't modify the existing Map, but instead creates a new
Map with the additional key-value pair included. This is a relatively
efficient O(log n) operation due to how immutable Maps are implemented.
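For example (a quick illustration of the immutable update):
// `+` returns an extended copy; the original scope Map is left untouched
val scope = Map("x" -> Value.Str("a"))
val extended = scope + ("y" -> Value.Str("b"))

scope    // Map("x" -> Str("a"))
extended // Map("x" -> Str("a"), "y" -> Str("b"))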
Once a local has put a name into scope, evaluating Ident nodes is
straightforward: we just fetch the value of that name in the scope:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
...
case Expr.Local(name, assigned, body) => ...
+ case Expr.Ident(name) => scope(name)
20.40.scala
We can test it to see that it works when the identifiers and declarations line
up, even when multiple locals are chained together:
> val input = """local greeting = "Hello "; greeting + greeting"""
> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res16: Value = Str("Hello Hello ")
> val input = """local x = "Hello "; local y = "world"; x + y"""
> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res17: Value = Str("Hello world")
20.41.scala
Lastly, we can verify that it fails when an identifier is not found:
> val input = """local greeting = "Hello "; nope + nope"""
> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
java.util.NoSuchElementException: key not found: nope
at scala.collection.immutable.Map$Map1.apply(Map.scala:267)
at rs$line$31$.evaluate(rs$line$31:11)
at rs$line$31$.evaluate(rs$line$31:5)
... 30 elided
20.42.scala
The last things we have left to evaluate are the Expr.Func function literals
and Expr.Call function applications:
case Func(argNames: Seq[String], body: Expr)
case Call(expr: Expr, args: Seq[Expr])
20.43.scala
Evaluating an Expr.Func should give us a Value.Func, and evaluating an
Expr.Call on a Value.Func should give us the result of evaluating that
function. The result could be a Value.Str, a Value.Dict, or even another
Value.Func.
Evaluating an Expr.Call can be thought of as the following steps:
Evaluating its expr: Expr into a Value.Func
Evaluating its args: Seq[Expr] into a Seq[Value] of argument values
Calling the Value.Func#call function on the evaluated argument values to produce the result
 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
...
case Expr.Ident(name) => ...
+ case Expr.Call(expr, args) =>
+ val Value.Func(call) = evaluate(expr, scope).runtimeChecked
+ val evaluatedArgs = args.map(evaluate(_, scope))
+ call(evaluatedArgs)
20.44.scala
The tricky thing here is: how do we evaluate the Expr.Func to produce a
Value.Func whose call attribute does what we want?
When you think about what calling a function really means, it boils down to
taking the argument values that were passed in, binding them to the function's
parameter names in the scope the function was defined in, and evaluating the
function's body in that extended scope:
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
...
case Expr.Call(expr, args) => ...
+ case Expr.Func(argNames, body) =>
+ Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
20.45.scala
Again, scope ++ argNames.zip(args) does not modify the existing scope Map but
returns an extended copy, adding each argument binding in O(log n) time due to
how immutable Maps are implemented. We can then test this with some of the functions we saw
earlier:
> val input = """local f = function(a) a + "1"; f("123")"""
> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res19: Value = Str("1231")
> val input = """local f = function(a, b) a + " " + b; f("hello", "world")"""
> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res20: Value = Str("hello world")
20.46.scala
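Because functions are ordinary Values, a function can also return another function, so currying works without any extra support in the interpreter (an illustrative example using the definitions above):
// `prefix("a ")` evaluates to another Value.Func, which we immediately call
val input = """local prefix = function(p) function(s) p + s; prefix("a ")("b")"""
evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
// Str("a b")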
The evaluate function now looks like:
Evaluate.scala
//| moduleDeps: [./Parse.scala, ./Values.scala]
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
  case Expr.Plus(left, right) =>
    val Value.Str(leftStr) = evaluate(left, scope).runtimeChecked
    val Value.Str(rightStr) = evaluate(right, scope).runtimeChecked
    Value.Str(leftStr + rightStr)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope)
    evaluate(body, scope + (name -> assignedValue))
  case Expr.Ident(name) => scope(name)
  case Expr.Call(expr, args) =>
    val Value.Func(call) = evaluate(expr, scope).runtimeChecked
    val evaluatedArgs = args.map(evaluate(_, scope))
    call(evaluatedArgs)
  case Expr.Func(argNames, body) =>
    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
20.47.scala
In the last two sections, we saw how to parse an input String of Jsonnet
source code into an Expr syntax tree structure, and then evaluate the Expr
into a Value data structure that represents a JSON value. In this last section
of the chapter, we will see how to serialize the Value data structure to an
output JSON String:
The signature of our serialize function looks like this:
def serialize(v: Value): String = v.runtimeChecked match
case Value.Str(s) => ...
case Value.Dict(kvs) => ...
20.48.scala
A Value can only be a Value.Str, Value.Dict, or Value.Func. For now, we
will assume there are no longer any Value.Func nodes in the final data
structure, as functions cannot be serialized to JSON. Production implementations
of Jsonnet like google/jsonnet simply raise
an error if an un-called function is present in the output, and the above
snippet would fail with a MatchError, signalled by the use of .runtimeChecked.
Since earlier we assumed there are no
escape sequences in our strings, we can serialize Value.Str nodes by adding
double quotes "s around them:
def serialize(v: Value): String = v.runtimeChecked match
+ case Value.Str(s) => "\"" + s + "\""
20.49.scala
As for Value.Dicts, we should serialize each key-value pair as "foo": "bar",
separated by commas, and wrap the whole thing in curly brackets:
def serialize(v: Value): String = v.runtimeChecked match
case Value.Str(s) => "\"" + s + "\""
+ case Value.Dict(kvs) =>
+ kvs.map((k, v) => "\"" + k + "\": " + serialize(v)).mkString("{", ", ", "}")
20.50.scala
Now all we need to do is wrap the entire parse -> evaluate -> serialize
pipeline in a nice helper function, and it is ready for use!
> def jsonnet(input: String): String =
serialize(evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty))
20.51.scala
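For example (an illustrative end-to-end run):
// Running the whole pipeline: parse, evaluate, then serialize
jsonnet("""local greeting = "Hello "; {"msg": greeting + "world"}""")
// returns the JSON string: {"msg": "Hello world"}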
The complete code of the Jsonnet interpreter is given below, combined into a single self-contained file:
Jsonnet.scala
//| mvnDeps:
//| - com.lihaoyi::fastparse:3.1.1
enum Expr:
  case Str(s: String)
  case Ident(name: String)
  case Plus(left: Expr, right: Expr)
  case Dict(pairs: Map[String, Expr])
  case Local(name: String, assigned: Expr, body: Expr)
  case Func(argNames: Seq[String], body: Expr)
  case Call(expr: Expr, args: Seq[Expr])

object Parser:
  import fastparse.*, MultiLineWhitespace.*
  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Plus(_, _))
  def prefixExpr[T: P]: P[Expr] = P( callExpr ~ call.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Call(_, _))
  def callExpr[T: P] = P( str | dict | local | func | ident )
  def str[T: P] = P( str0 ).map(Expr.Str(_))
  def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[T: P] = P( ident0 ).map(Expr.Ident(_))
  def ident0[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
  def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
    .map(kvs => Expr.Dict(kvs.toMap))
  def local[T: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.apply)
  def func[T: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.apply)
  def plus[T: P] = P( "+" ~ prefixExpr )
  def call[T: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )

enum Value:
  case Str(s: String)
  case Dict(pairs: Map[String, Value])
  case Func(call: Seq[Value] => Value)

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
  case Expr.Plus(left, right) =>
    val Value.Str(leftStr) = evaluate(left, scope).runtimeChecked
    val Value.Str(rightStr) = evaluate(right, scope).runtimeChecked
    Value.Str(leftStr + rightStr)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope)
    evaluate(body, scope + (name -> assignedValue))
  case Expr.Ident(name) => scope(name)
  case Expr.Call(expr, args) =>
    val Value.Func(call) = evaluate(expr, scope).runtimeChecked
    val evaluatedArgs = args.map(evaluate(_, scope))
    call(evaluatedArgs)
  case Expr.Func(argNames, body) =>
    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))

def serialize(v: Value): String = v.runtimeChecked match
  case Value.Str(s) => "\"" + s + "\""
  case Value.Dict(kvs) =>
    kvs.map((k, v) => "\"" + k + "\": " + serialize(v)).mkString("{", ", ", "}")

def jsonnet(input: String): String =
  serialize(evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty))
20.54.scala
In this chapter, we have walked through how to use Scala to implement an
interpreter for a simple programming language: a subset of Jsonnet. We used
FastParse to parse the source code string into an Expr syntax tree,
recursively evaluated the Expr syntax tree into a Value data structure, and
serialized the Value data structure to a JSON String.
Parsing and evaluating programming language source code is a very different use case than the distributed real-time file synchronizer we wrote in Chapter 18: Building a Real-time File Synchronizer. The style of Scala we have written is correspondingly different, utilizing recursive parsing and tree traversals rather than concurrent actor-based event pipelines. Nevertheless, the Scala language and libraries make this easy, and in 70 lines of code we have a working implementation of a simple interpreted programming language.
While the interpreter we implemented in this chapter is relatively simple, the same techniques apply to implementing most interpreted programming languages, including production-ready implementations like google/jsonnet. Hopefully this chapter has given you an intuition for how a simple programming language is implemented. Even if you never need to implement your own programming language, these techniques may still be useful if you find yourself writing linters, program analyzers, query engines, and other such tools.
Exercise: Modify our Jsonnet interpreter to be able to handle basic numbers as well. For
simplicity assume numbers can only be positive integers, and the only
operation they support is addition via a + b.
Exercise: Modify def serialize to generate the ujson.Values we used in
Chapter 8: JSON and Binary Data Serialization, and make it use
ujson.write(..., indent = 2) to pretty-print the output JSON
rather than having it all generated on a single line.
Exercise: Modify our Jsonnet interpreter to provide simple Jsonnet stack traces when the
interpretation fails for whatever reason. You can use the Index parser from
FastParse to capture the character-index of each Expr node we are parsing,
and combine that with the input string to find the line and column number for
any Expr node whose evaluation failed with an exception.