20

Implementing a Programming Language


20.1 Interpreting Jsonnet381
20.2 Jsonnet Language Features381
20.3 Parsing Jsonnet383
20.4 Evaluating the Syntax Tree392
20.5 Serializing to JSON397

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
  case Expr.Plus(left, right) =>
    val Value.Str(leftStr) = evaluate(left, scope)
    val Value.Str(rightStr) = evaluate(right, scope)
    Value.Str(leftStr + rightStr)
20.1.scala

Snippet 20.1: evaluating a syntax tree using pattern matching

This chapter builds upon the simple parsers we learned in Chapter 19: Parsing Structured Text, and walks you through the process of implementing a simple programming language in Scala.

Working with programming language source code is a strength of Scala: parsing, analyzing, compiling, or interpreting it. This chapter should will you how easy it is to write a simple interpreter to parse and evaluate program source code in Scala. Even if your goal is not to implement an entirely new programming language, these techniques are still useful: for writing linters, program analyzers, query engines, and other such tools.

20.1 Interpreting Jsonnet

The goal of this chapter is to implement an interpreter for a subset of the Jsonnet programming language.

jsonnetlocal greeting = "Hello ";
local person = function(name) {
  "name": name,
  "welcome": greeting + name + "!"
};
{
  "person1": person("Alice"),
  "person2": person("Bob"),
  "person3": person("Charlie")
}20.2.jsonnet
json{
  "person1": {
    "name": "Alice",
    "welcome": "Hello Alice!"
  },
  "person2": {
    "name": "Bob",
    "welcome": "Hello Bob!"
  },
  "person3": {
    "name": "Charlie",
    "welcome": "Hello Charlie!"
  }
}20.3.json

Jsonnet is a simple language meant to construct JSON configuration files: the output of evaluating a .jsonnet file is a single JSON structure containing dictionaries, lists, strings, numbers, and booleans. The output JSON can then be used to configure Kubernetes, CloudFormation, Terraform, or other software systems.

While you can construct JSON structures in any programming language, Jsonnet makes it much more convenient than doing it in a general-purpose language like Java or Python. Jsonnet is used heavily in industry in order to manage large and complex system configurations.

Our interpreter will roughly follow the following steps:

  • Parse a Jsonnet source code String into a syntax tree structure called Expr
  • Evaluate the Expr into a Jsonnet value called Value
  • Serialize the final Jsonnet Value into an output JSON String

We can visualize these steps as follows:

G String1 String parse parse String1->parse String2 String Expr Expr evaluate evaluate Expr->evaluate Value Value serialize serialize Value->serialize parse->Expr evaluate->Value serialize->String2

This chapter will walk you through all three phases of implementing our simple interpreter, giving you an intuition for the techniques, data structures and algorithms involved. While not a comprehensive or production-ready implementation, this chapter should give you enough foundation to get started with your own simple language-related projects.

20.2 Jsonnet Language Features

Jsonnet is similar to JSON, but introduces the constructs that help you tidy up verbose or repetitive portions of your JSON config: local variables, function definitions, and basic operations like concatenating strings with +. You can learn more about the syntax and experiment with the language interactively on the official website:

For the purpose of this exercise, we will stop at implementing these three simple features. Further features and improvements can be implemented the same way, and we will leave the task of exhaustively implementing the entire Jsonnet language to production intepreters like google/jsonnet.

20.2.1 Primitives

Jsonnet has a similar set of primitives as JSON. For this chapter, we will consider just a subset of them:

"hello world" // strings
{"key": "value", "thing": "123"} // dictionaries
20.4.jsonnet

We will assume that strings do not contain escape sequences like \n or \", and leave out additional data types like numbers, booleans, arrays, or null. We will also assume that strings only support a single operation: concatenation via +.

jsonnet"hello" + "world"
json"helloworld"

20.2.2 Locals and Functions

You can define locals using the local keyword, and define local functions by combining the local keyword with a function expression. As in most programming languages, functions are called using parentheses containing the arguments you wish to pass:

jsonnetlocal greeting = "Hello ";
greeting + greeting20.5.jsonnet
json"Hello Hello "
jsonnetlocal hello =
  function(name) "Hello " + name;
hello("Bob")20.6.jsonnet
json"Hello Bob"

Local variables and functions can also return structured data in dictionaries:

jsonnetlocal person = function(name) {
  "name": name,
  "welcome": "Hello " + name + "!",
};
person("Bob")20.7.jsonnet
json{
  "name": "Bob",
  "welcome": "Hello Bob!",
};20.8.json

20.2.3 Composition

The language features described here can be combined in arbitrary ways, e.g. a function can be called inside a dictionary value. And that function may itself return a dictionary, that gets nested within:

jsonnet
local f = function(x) "Hello " + x;
{"key": "value", "thing": f("World")}20.9.jsonnet
json{
  "key": "value",
  "thing": "Hello World"
}20.10.json
jsonnetlocal f = function(x) {
  "nested key": "Hello " + x
};
{"key": "value", "thing": f("World")}20.11.jsonnet
json{
  "key": "value",
  "thing": {
    "nested key": "Hello World"
  }
}20.12.json

This quick tour is only a small subset of the full Jsonnet language, but will be enough for this chapter.

20.3 Parsing Jsonnet

First, let us look at parsing:

G String1 String parse parse String1->parse String2 String Expr Expr parse->Expr evaluate evaluate Expr->evaluate Value Value serialize serialize Value->serialize evaluate->Value serialize->String2

To parse Jsonnet, we will be using the FastParse library introduced in Chapter 19: Parsing Structured Text.

20.3.1 Defining the Syntax Tree

We will do all our work inside the Scala CLI REPL. Let's start by defining a syntax tree of what our minimal Jsonnet syntax looks like: strings, dictionaries, functions, and local definitions. We will use a Scala enum:

Exprs.scalaenum Expr:
  case Str(s: String)
  case Ident(name: String)
  case Plus(left: Expr, right: Expr)
  case Dict(pairs: Map[String, Expr])
  case Local(name: String, assigned: Expr, body: Expr)
  case Func(argNames: Seq[String], body: Expr)
  case Call(expr: Expr, args: Seq[Expr])20.13.scala

Here, the Expr data structure (short for "expression") represents the meaningful parts of the Jsonnet syntax. As stated earlier, we want to be able to support strings with +, dictionaries, local identifiers, and function definitions and calls. Each of these language constructs maps straightforwardly to one of the case classes in the Expr AST.

The Expr type is recursive: a Plus is an Expr, but also contains left: Expr, right: Expr within it. Dict is an Expr that contains a pairs: Map[String, Expr].

20.3.2 Example Parses

Here are some example snippets and what we expect them to parse to.

20.3.2.1 Strings

jsonnet"hello"
parsedExpr.Str("hello")

20.3.2.2 Dictionaries

jsonnet{"hello": "world", "123": "456"}
parsedExpr.Dict(Map(
  "hello" -> Expr.Str("world"),
  "123" -> Expr.Str("456")
))20.14.scala

20.3.2.3 Functions

jsonnetfunction(a, b) a + " " + b
parsedExpr.Func(
  List("a", "b"),
  Expr.Plus(
    Expr.Plus(
      Expr.Ident("a"),
      Expr.Str(" ")
    ),
    Expr.Ident("b")
  )
)20.15.scala

20.3.2.4 Locals

jsonnetlocal f = function(a) a + "1";
f("123")20.16.jsonnet
parsedExpr.Local(
  "f",
  Expr.Func(
    List("a"),
    Expr.Plus(
      Expr.Ident("a"),
      Expr.Str("1")
    )
  ),
  Expr.Call(
    Expr.Ident("f"),
    List(Expr.Str("123"))
  )
)20.17.scala

As Jsonnet syntax is recursive, we expect to be able to parse any combination of these language features combined together.

20.3.3 Parsing Terminals

To begin using FastParse, we will add the dependency, and use the following imports:

$ scala --import com.lihaoyi::fastparse:3.1.1 --repl

> import fastparse.*, MultiLineWhitespace.*
20.18.scala

For this exercise, we will be using FastParse's support for automatic whitespace skipping, using MultiLineWhitespace. That lets us skip all spaces, tabs, and newlines automatically when parsing so we can write our grammar without needing an explicit ws parser like we did in Chapter 19: Parsing Structured Text.

It is simplest to parse the non-recursive parts of our Jsonnet syntax, often called "terminals". Of the case classes defined above, only two of them are terminals: Str and Ident.

20.3.3.1 Parsing Strings

First, let's write the parser for Str. We will ignore escape characters for simplicity, meaning a string is simply a ", followed by zero-or-more non-" characters, closed by another ":

> def str[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" ).map(Expr.Str(_))

> fastparse.parse("\"hello\"", str(using _))
res0: fastparse.Parsed[Expr.Str] = Success(value = Str("hello"), index = 7)

> fastparse.parse("\"hello world\"", str(using _))
res1: fastparse.Parsed[Expr.Str] = Success(
  value = Str("hello world"),
  index = 13
)

> fastparse.parse("\"\"", str(using _))
res2: fastparse.Parsed[Expr.Str] = Success(value = Str(""), index = 2)

> fastparse.parse("123", str(using _))
res3: fastparse.Parsed[Expr.Str] = Parsed.Failure(Position 1:1, found "123")
20.19.scala

Note how we use the ~~/ operator after the "\"" open quote. The ~~ means we do not want to consume whitespace here (since we are inside a string) and the / is a FastParse Cut meaning we want to avoid backtracking if the parse fails. This is because only string literals can start with a double quote character ", so the parser does not need to backtrack and try other alternatives if the parse fails after seeing a ". That helps the parser report more informative error messages when the parse fails.

20.3.3.2 Parsing Identifiers

Identifiers are similar: an alphabet or underscore, followed by zero or more alphanumeric characters.

> def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!.map(Expr.Ident(_))

> fastparse.parse("hello", ident(using _))
res4: fastparse.Parsed[Expr.Ident] = Success(
  value = Ident("hello"),
  index = 5
)

> fastparse.parse("123", ident(using _)) // Identifiers cannot start with a number
res5: fastparse.Parsed[Expr.Ident] = Parsed.Failure(Position 1:1, found "123")
20.20.scala

20.3.4 Parsing Plus

Now that we have the ability to parse the terminal non-recursive syntax tree nodes, Strs and Idents, we need to move on to the recursive parts of the syntax tree. The simplest recursive syntax tree node is Expr.Plus, used to model the a + b syntax.

The Expr.Plus nodes representing + and all other case classes in our syntax tree have a recursive definition. Plus contains left and right Exprs, but an Expr could be a Plus node. This can be solved by making our parsers recursive, as follows:

> {
  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
    case (left, rights) => rights.foldLeft(left)(Expr.Plus(_, _))
  def plus[T: P] = P( "+" ~ prefixExpr )
  def prefixExpr[T: P] = P( str | ident )
  def str[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
    .map(Expr.Str(_))
  def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
    .map(Expr.Ident(_))
  }
20.21.scala

In the above snippet, we define an expr parser that contains a prefixExpr, followed by zero or more plus parsers, each of which parses a + followed by another prefixExpr. The expr parser is recursive, so to satisfy the compiler we need to annotate its type as : P[Expr] representing a parser that produces an Expr value on success.

That lets us parse the following:

jsonnet"hello" + "world"
parsedExpr.Plus(
  Expr.Str("hello"),
  Expr.Str("world")
)20.22.scala
jsonnethello + " " + world
parsedExpr.Plus(
  Expr.Plus(
    Expr.Ident("hello"),
    Expr.Str(" ")
  ),
  Expr.Ident("world")
)20.23.scala

Note that prefixExpr can be either a str or ident, but cannot be an expr node with its own plus nodes within. That is because any plus nodes will already get parsed by the existing plus.rep parser. We also cannot simply define plus as expr ~ "+" ~ expr: this is because the plus parser would then be Left Recursive, causing an infinite recursion at parse time. For more details, see the following article:

20.3.4.1 Constructing Parse Nodes with Fold Left

Since prefixExpr is a P[Expr], and plus is a P[Expr], the combined prefixExpr ~ plus.rep would by default be a P[(Expr, Seq[Expr)]: a parser that produces a tuple of (Expr, Seq[Expr]). The first item in the tuple represents the Expr parsed by the first prefixExpr, and the second item represents the Seq[Expr] parsed by the plus.rep, each repetition of which produces an Expr. To allow our def expr parser to produce a single Expr, we need to transform the produced (Expr, Seq[Expr]) using .map. The foldLeft expression:

rights.foldLeft(left)(Expr.Plus(_, _))

This can also be written out more verbosely as follows:

rights.foldLeft(left)((lhs, rhs) => Expr.Plus(lhs, rhs))

foldLeft starts off with the value left, and for every item in the rights list, it wraps both of them in a Plus node. This builds a tree of Plus nodes from the flat list of rights. In a more sophisticated parser the shape of the tree may depend on the precedence of the operators, but for this exercise we end up building a simple left-associative binary tree.

G Plus1 Plus a a Plus1->a b b Plus1->b Plus2 Plus Plus2->Plus1 c c Plus2->c Plus3 Plus a + b + c + d Plus3->Plus2 d d Plus3->d

We can test this out as follows:

> fastparse.parse("a + b", expr(using _))
res6: fastparse.Parsed[Expr] = Success(
  value = Plus(left = Ident("a"), right = Ident("b")),
  index = 5
)

> fastparse.parse("""a + " " + c""", expr(using _))
res7: fastparse.Parsed[Expr] = Success(
  value = Plus(
    left = Plus(left = Ident("a"), right = Str(" ")),
    right = Ident("c")
  ),
  index = 11
)
20.24.scala

20.3.5 Parsing Dictionaries

Expr.Dict nodes are also recursive, with each comma-separated key-value pair containing an String key and an Expr value. We can extend our parser to handle those via:

  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
    case (left, items) => items.foldLeft(left)(Expr.Plus(_, _))
  def plus[T: P] = P( "+" ~ prefixExpr )
- def prefixExpr[T: P] = P( str | ident )
+ def prefixExpr[T: P] = P( str | ident | dict )
+ def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
+   .map(kvs => Expr.Dict(kvs.toMap))
- def str[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
-   .map(Expr.Str(_))
+ def str[T: P] = P( str0 ).map(Expr.Str(_))
+ def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
    .map(Expr.Ident(_))
20.25.scala

The three main changes we have here are:

  • Adding dict as one of the alternatives in def prefixExpr

  • Extracting the str0 parser from str, where str0 returns the raw String that was parsed, while str wraps it in an Expr.Str syntax tree node. We do this because Expr.Dict keys have the same syntax as Expr.Strs, but do not need to be wrapped in Expr.Str nodes.

  • Parsing dictionary literals as a {, followed by zero-or-more key-value pairs separated by ,s. Each key-value pair is a str0, followed by :, followed by any expr. This repetition of key-value pairs would by default give us a Seq[(String, Expr)], which we need to .map on to transform into an Expr.Dict node.

We can test these changes as follows:

> fastparse.parse("""{"a": "b", "cde": id, "nested": {}}""", expr(using _))
res8: fastparse.Parsed[Expr] = Parsed.Failure(Position 1:1, found "{\"a\": \"b\",")
20.26.scala

20.3.6 Parsing Functions, Locals, and Function Calls

Adding the parsers for func, local and call to this code requires the following changes:

  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep).map:
    case (left, items) => items.foldLeft(left)(Expr.Plus(_, _))
  def plus[T: P] = P( "+" ~ prefixExpr )
- def prefixExpr[T: P] = P( str | ident | dict )
+ def prefixExpr[T: P]: P[Expr] = P( callExpr ~ call.rep ).map:
+   case (left, items) => items.foldLeft(left)(Expr.Call(_, _))
+ def callExpr[T: P] = P( str | dict | local | func | ident )
  def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
    .map(kvs => Expr.Dict(kvs.toMap))
+ def call[T: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )
+ def local[T: P] =
+   P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr ).map(Expr.Local.apply)
+ def func[T: P] =
+   P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr ).map(Expr.Func.apply)
  def str[T: P] = P( str0 ).map(Expr.Str(_))
  def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )

- def ident[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
-   .map(Expr.Ident(_))
+ def ident[T: P] = P( ident0 ).map(Expr.Ident(_))
+ def ident0[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!
20.27.scala

The main changes here are:

  • Turning prefixExpr into a callExpr followed by one or more function calls suffix

  • Splitting out ident0 from ident, since func uses the same syntax as ident for parsing its parameter list but does not need the identifiers to be boxed into Expr.Ident nodes

  • Defining a function call suffix as an open (, a comma-separated list of exprs, and a closing )

  • Defining function and local: each starting with a keyword, followed by their respective syntaxes

Like we saw earlier when parsing Plus nodes, we use foldLeft to convert the (Expr, Seq[Expr]) tuple that the callExpr ~ call.rep parser gives us into nested Call nodes.

Wrapping all of this up into an object Parser for tidyness gives us:

Parse.scala//| moduleDeps: [./Exprs.scala]
//| mvnDeps:
//| - com.lihaoyi::fastparse:3.1.1
object Parser:
  import fastparse.*, MultiLineWhitespace.*
  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Plus(_, _))
  def prefixExpr[T: P]: P[Expr] = P( callExpr ~ call.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Call(_, _))
  def callExpr[T: P] = P( str | dict | local | func | ident )

  def str[T: P] = P( str0 ).map(Expr.Str(_))
  def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[T: P] = P( ident0 ).map(Expr.Ident(_))
  def ident0[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!

  def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
    .map(kvs => Expr.Dict(kvs.toMap))
  def local[T: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr )
    .map(Expr.Local.apply)
  def func[T: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr )
    .map(Expr.Func.apply)

  def plus[T: P] = P( "+" ~ prefixExpr )
  def call[T: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )20.28.scala

20.3.7 Testing the Parser

We can test the parser manually and see that it does what we want:

> fastparse.parse("""{"a": "A", "b": "bee"}""", Parser.expr(using _))
res9: fastparse.Parsed[Expr] = Success(
  value = Dict(Map("a" -> Str("A"), "b" -> Str("bee"))),
  index = 22
)

> fastparse.parse("""f()(a) + g(b, c)""", Parser.expr(using _))
res10: fastparse.Parsed[Expr] = Success(
  value = Plus(
    left = Call(
      expr = Call(expr = Ident("f"), args = List()),
      args = List(Ident("a"))
    ),
    right = Call(expr = Ident("g"), args = List(Ident("b"), Ident("c")))
  ),
  index = 16
)
20.29.scala

The syntax of our programming language is recursive: local, function, plus, and dict expressions can contain other expressions, nested arbitrarily deeply. We can test this out by feeding such nested examples into the expr parser:

> val input = """local thing = "kay"; {"f": function(a) a + a, "nested": {"k": v}}"""

> fastparse.parse(input, Parser.expr(using _))
res11: fastparse.Parsed[Expr] = Success(
  value = Local(
    name = "thing",
    assigned = Str("kay"),
    body = Dict(
      Map(
        "f" -> Func(
          argNames = List("a"),
          body = Plus(left = Ident("a"), right = Ident("a"))
        ),
        "nested" -> Dict(Map("k" -> Ident("v")))
      )
    )
  ),
  index = 65
)
20.30.scala
See example 20.1 - Parse

20.4 Evaluating the Syntax Tree

Now that we have a syntax tree of Expr nodes, the next step is to evaluate the syntax tree to produce a runtime data structure of Values.

G String1 String parse parse String1->parse String2 String evaluate evaluate Value Value evaluate->Value Expr Expr Expr->evaluate serialize serialize Value->serialize parse->Expr serialize->String2

We will define Value as follows:

Values.scalaenum Value:
  case Str(s: String)
  case Dict(pairs: Map[String, Value])
  case Func(call: Seq[Value] => Value)20.31.scala

20.4.1 Expr vs Value

Note that while the Expr syntax tree contains nodes that represent identifiers, locals, function applications, and so on, a Value can only be a Str, a Dict, or a Func. This is similar to the ujson.Value we saw in Chapter 8: JSON and Binary Data Serialization, but simplified and with the addition of the Func class.

The key thing to know about Value is that it doesn't matter where a Value came from. For example, a Value.Str could have been:

  • A literal Expr.Str in the source code
  • Passed in as a function parameter to an Expr.Func
  • Bound to a local variable via Expr.Local
  • Constructed via Expr.Plus node

Regardless of where the Value.Str was produced syntactically, it is still the same Value.Str. The final Value for the entire Jsonnet program will later be converted to a JSON string as the output.

The contents of Value.Str and Value.Dict should be self explanatory. Value.Func is a bit less obvious: by defining it as Func(call: Seq[Value] => Value), we are saying "a function is something you can pass a list of argument values to and returns a value". We will see how to instantiate these Value.Func nodes later.

20.4.2 Defining Evaluate

The basic task here is to write a function that converts an Expr into a Value. Naively, such a function would look something like this:

def evaluate(expr: Expr): Value

However, the Value returned by evaluating an Expr isn't only dependent on the contents of that Expr: it also depends on the enclosing scope. Consider a loose identifier:

foo

The value of this Expr.Ident doesn't just depend on the name of the identifier ("foo"), but also on what value was bound to that name. For example. this binding may be done via a local declaration:

local foo = "I am Cow";
foo
20.32.jsonnet

This mapping of names to Values is often known as the "lexical scope" within your program. Thus we might instead define evaluate as:

def evaluate(expr: Expr, scope: Map[String, Value]): Value

From here we can start fleshing it out.

20.4.3 Evaluating Literals

The simplest case is literal Expr.Str, which are evaluated to the identical Value.Str:

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
  case Expr.Str(s) => Value.Str(s)
20.33.scala

Literal dictionaries are also straightforward: Expr.Dicts become Value.Dicts, with the same keys, except we need to evaluate each value into its corresponding expression:

 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
   case Expr.Str(s) => Value.Str(s)
+  case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
20.34.scala

Dictionary literals do not add or remove anything from the lexical scope, so the scope parameter is passed through unchanged. We can test these two cases as follows:

> evaluate(fastparse.parse("\"hello\"", Parser.expr(using _)).get.value, Map.empty)
res12: Value = Str("hello")

> val input = """{"hello": "world", "key": "value"}"""

> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res13: Value = Dict(Map("hello" -> Str("world"), "key" -> Str("value")))
20.35.scala

For now, we are passing in Map.empty as the scope, since evaluating literal strings and dictionaries always results in the same thing regardless of what values are in scope.

20.4.4 Evaluating Plus

Next, we will look at the Expr.Plus nodes. We have only defined their behavior to work on string values (Value.Str), so evaluating them involves:

  • Evaluating each child expression into a Value.Str
  • Extracting the contents of each string
  • Concatenating them
  • Wrapping the concatenated string back into a Value.Str
 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
   ...
   case Expr.Dict(kvs) => ...
+  case Expr.Plus(left, right) =>
+    val Value.Str(leftStr) = evaluate(left, scope).runtimeChecked
+    val Value.Str(rightStr) = evaluate(right, scope).runtimeChecked
+    Value.Str(leftStr + rightStr)
20.36.scala
> evaluate(
    fastparse.parse("\"hello\" + \"world\"", Parser.expr(using _)).get.value,
    Map.empty
  )
res14: Value = Str("helloworld")
20.37.scala

If one of the children of Expr.Plus is not a Value.Str, the val assignments in the above code fails with a MatchError. Providing a nicer error message is left as an exercise to the reader.

So far, nothing we have evaluated so far depends on what values are in the scope of the evaluation. This will change when we start handling local declarations, which add values to the scope, and identifiers like foo which use those values.

20.4.5 Evaluating Locals and Identifiers

Let's consider the following local syntax:

> val input = """local greeting = "Hello "; greeting + greeting"""

> fastparse.parse(input, Parser.expr(using _))
res15: fastparse.Parsed[Expr] = Success(
  value = Local(
    name = "greeting",
    assigned = Str("Hello "),
    body = Plus(left = Ident("greeting"), right = Ident("greeting"))
  ),
  index = 46
)
20.38.scala

20.4.5.1 Evaluating Locals

The point of local is to evaluate the assigned expression to a value, assign that value to the name, and then evaluate the body expression with that value bound to that name in the scope. We can write that in code as follows:

 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
   ...
   case Expr.Plus(left, right) => ...
+  case Expr.Local(name, assigned, body) =>
+    val assignedValue = evaluate(assigned, scope)
+    evaluate(body, scope + (name -> assignedValue))
20.39.scala

Note that because scope is a Scala immutable Map, + (name -> assignedValue) doesn't modify the existing Map, but instead creates a new Map with the additional key-value pair included. This is a relatively efficient O(log n) operation due to how immutable Maps are implemented.

20.4.5.2 Evaluating Identifiers

Once a local has put a name into scope, evaluating Ident nodes is straightforward: we just fetch the value of that name in the scope:

 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
   ...
   case Expr.Local(name, assigned, body) => ...
+  case Expr.Ident(name) => scope(name)
20.40.scala

We can test it to see that it works when the identifiers and declarations line up, even when multiple locals are chained together:

> val input = """local greeting = "Hello "; greeting + greeting"""

> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res16: Value = Str("Hello Hello ")

> val input = """local x = "Hello "; local y = "world"; x + y"""

> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res17: Value = Str("Hello world")
20.41.scala

Lastly, we can verify that it fails when an identifier is not found:

> val input = """local greeting = "Hello "; nope + nope"""

> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
java.util.NoSuchElementException: key not found: nope
  at scala.collection.immutable.Map$Map1.apply(Map.scala:267)
  at rs$line$31$.evaluate(rs$line$31:11)
  at rs$line$31$.evaluate(rs$line$31:5)
  ... 30 elided
20.42.scala

20.4.6 Evaluating Functions

The last thing we have left to evaluate are the Expr.Func function literals and Expr.Call function applications:

case class Func(params: Seq[String], body: Expr) extends Expr
case class Call(expr: Expr, args: Seq[Expr]) extends Expr
20.43.scala

Evaluating an Expr.Func should give us a Value.Func, and evaluating an Expr.Call on a Value.Func should give us the result of evaluating that function. The result could be a Value.Str, a Value.Dict, or even another Value.Func.

20.4.6.1 Evaluating Function Calls

Evaluating a Expr.Call can be thought of as the following steps:

  • Evaluate the expr: Expr into a Value.Func
  • Evaluate the args: Seq[Expr] into a Seq[Value] of argument values
  • Call the Value.Func#call function on the evaluated argument values to produce the result
 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
   ...
   case Expr.Ident(name) => ...
+  case Expr.Call(expr, args) =>
+    val Value.Func(call) = evaluate(expr, scope).runtimeChecked
+    val evaluatedArgs = args.map(evaluate(_, scope))
+    call(evaluatedArgs)
20.44.scala

The tricky thing here is: how do we evaluate the Expr.Func to produce a Value.Func whose call attribute does what we want?

20.4.6.2 Evaluating Function Definitions

When you think about what calling a function really means, it boils down to four steps:

  • Store the original scope of the function at the definition-site
  • Take the values of the arguments passed to the function at call-site
  • Create a modified copy of the original scope at definition-site with the values of the arguments passed at call-site bound to the names of the arguments specified at definition-site
  • Evaluate the body of the function with the modified copy of the scope
 def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
   ...
   case Expr.Call(expr, args) => ...
+  case Expr.Func(argNames, body) =>
+    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))
20.45.scala

Again, scope ++ argNames.zip(args) is an efficient O(log n) operation on the scope Map. We can then test this with some of the functions we saw earlier:

> val input = """local f = function(a) a + "1";  f("123")"""

> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res19: Value = Str("1231")

> val input = """local f = function(a, b) a + " " + b; f("hello", "world")"""

> evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty)
res20: Value = Str("hello world")
20.46.scala

The evaluate function now looks like:

Evaluate.scala//| moduleDeps: [./Parse.scala, ./Values.scala]
def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
  case Expr.Plus(left, right) =>
    val Value.Str(leftStr) = evaluate(left, scope).runtimeChecked
    val Value.Str(rightStr) = evaluate(right, scope).runtimeChecked
    Value.Str(leftStr + rightStr)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope)
    evaluate(body, scope + (name -> assignedValue))
  case Expr.Ident(name) => scope(name)
  case Expr.Call(expr, args) =>
    val Value.Func(call) = evaluate(expr, scope).runtimeChecked
    val evaluatedArgs = args.map(evaluate(_, scope))
    call(evaluatedArgs)
  case Expr.Func(argNames, body) =>
    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))20.47.scala
See example 20.2 - Evaluate

20.5 Serializing to JSON

In the last two sections, we saw how to parse an input String of Jsonnet source code into an Expr syntax tree structure, and then evaluate the Expr into a Value data structure that represents a JSON value. In this last section of the chapter, we will see how to serialize the Value data structure to an output JSON String:

G String1 String parse parse String1->parse String2 String serialize serialize serialize->String2 Expr Expr evaluate evaluate Expr->evaluate Value Value Value->serialize parse->Expr evaluate->Value

The signature of our serialize function looks like this:

def serialize(v: Value): String = v.runtimeChecked match
  case Value.Str(s) => ...
  case Value.Dict(kvs) => ...
20.48.scala

A Value can only be a Value.Str, Value.Dict, or Value.Func. For now, we will assume there are no longer any Value.Func nodes in the final data structure, as functions cannot be serialized to JSON. Production implementations of Jsonnet like google/jsonnet simply raise an error if an un-called function is present in the output, and the above snippet would fail with a MatchError, signalled by the use of .runtimeChecked. Since earlier we assumed there are no escape sequences in our strings, we can serialize Value.Str nodes by adding double quotes "s around them:

 def serialize(v: Value): String = v.runtimeChecked match
+  case Value.Str(s) => "\"" + s + "\""
20.49.scala

As for Value.Dicts, we should serialize each key-value pair as "foo": "bar", separated by commas, and wrap the whole thing in curly brackets:

 def serialize(v: Value): String = v.runtimeChecked match
   case Value.Str(s) => "\"" + s + "\""
+  case Value.Dict(kvs) =>
+    kvs.map((k, v) => "\"" + k + "\": " + serialize(v)).mkString("{", ", ", "}")
20.50.scala

Now all we need to do is wrap the entire parse -> evaluate -> serialize pipeline in a nice helper function, and it is ready for use!

> def jsonnet(input: String): String =
    serialize(evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty))
20.51.scala
> println(jsonnet(
  """local greet = "Hello ";
     local person = function(name) {
       "name": name,
       "welcome": greet + name + "!"
     };
     {
       "person1": person("Alice"),
       "person2": person("Bob"),
       "person3": person("Charlie")
     }"""
  ))
20.52.scala
{"person1": {
   "name": "Alice",
   "welcome": "Hello Alice!"},
 "person2": {
   "name": "Bob",
   "welcome": "Hello Bob!"},
 "person3": {
   "name": "Charlie",
   "welcome": "Hello Charlie!"}}
20.53.output-json
See example 20.3 - Serialize

20.5.1 Complete Jsonnet Interpreter

The complete code of the Jsonnet interpreter is given below, combined into a single self-contained file:

Jsonnet.scala//| mvnDeps:
//| - com.lihaoyi::fastparse:3.1.1
enum Expr:
  case Str(s: String)
  case Ident(name: String)
  case Plus(left: Expr, right: Expr)
  case Dict(pairs: Map[String, Expr])
  case Local(name: String, assigned: Expr, body: Expr)
  case Func(argNames: Seq[String], body: Expr)
  case Call(expr: Expr, args: Seq[Expr])

object Parser:
  import fastparse.*, MultiLineWhitespace.*
  def expr[T: P]: P[Expr] = P( prefixExpr ~ plus.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Plus(_, _))
  def prefixExpr[T: P]: P[Expr] = P( callExpr ~ call.rep ).map:
    case (e, items) => items.foldLeft(e)(Expr.Call(_, _))
  def callExpr[T: P] = P( str | dict | local | func | ident )

  def str[T: P] = P( str0 ).map(Expr.Str(_))
  def str0[T: P] = P( "\"" ~~/ CharsWhile(_ != '"', 0).! ~~ "\"" )
  def ident[T: P] = P( ident0 ).map(Expr.Ident(_))
  def ident0[T: P] = P( CharIn("a-zA-Z_") ~~ CharsWhileIn("a-zA-Z0-9_", 0) ).!

  def dict[T: P] = P( "{" ~/ (str0 ~ ":" ~/ expr).rep(0, ",") ~ "}" )
    .map(kvs => Expr.Dict(kvs.toMap))
  def local[T: P] = P( "local" ~/ ident0 ~ "=" ~ expr ~ ";" ~ expr )
    .map(Expr.Local.apply)
  def func[T: P] = P( "function" ~/ "(" ~ ident0.rep(0, ",") ~ ")" ~ expr )
    .map(Expr.Func.apply)

  def plus[T: P] = P( "+" ~ prefixExpr )
  def call[T: P] = P( "(" ~/ expr.rep(0, ",") ~ ")" )

enum Value:
  case Str(s: String)
  case Dict(pairs: Map[String, Value])
  case Func(call: Seq[Value] => Value)

def evaluate(expr: Expr, scope: Map[String, Value]): Value = expr match
  case Expr.Str(s) => Value.Str(s)
  case Expr.Dict(kvs) => Value.Dict(kvs.map((k, v) => (k, evaluate(v, scope))))
  case Expr.Plus(left, right) =>
    val Value.Str(leftStr) = evaluate(left, scope).runtimeChecked
    val Value.Str(rightStr) = evaluate(right, scope).runtimeChecked
    Value.Str(leftStr + rightStr)
  case Expr.Local(name, assigned, body) =>
    val assignedValue = evaluate(assigned, scope)
    evaluate(body, scope + (name -> assignedValue))
  case Expr.Ident(name) => scope(name)
  case Expr.Call(expr, args) =>
    val Value.Func(call) = evaluate(expr, scope).runtimeChecked
    val evaluatedArgs = args.map(evaluate(_, scope))
    call(evaluatedArgs)
  case Expr.Func(argNames, body) =>
    Value.Func(args => evaluate(body, scope ++ argNames.zip(args)))

def serialize(v: Value): String = v.runtimeChecked match
  case Value.Str(s) => "\"" + s + "\""
  case Value.Dict(kvs) =>
    kvs.map((k, v) => "\"" + k + "\": " + serialize(v)).mkString("{", ", ", "}")

def jsonnet(input: String): String =
  serialize(evaluate(fastparse.parse(input, Parser.expr(using _)).get.value, Map.empty))20.54.scala
See example 20.4 - Jsonnet

20.6 Conclusion

In this chapter, we have walked through how to use Scala to implement an interpreter for a simple programming language: a subset of Jsonnet. We used FastParse to parse the source code string into an Expr syntax tree, recursively evaluated the Expr syntax tree into a Value data structure, and serialized the Value data structure to a JSON String.

Parsing and evaluating programming language source code is a very different use case than the distributed real-time file synchronizer we wrote in Chapter 18: Building a Real-time File Synchronizer. The style of Scala we have written is correspondingly different, utilizing recursive parsing and tree traversals rather than concurrent actor-based event pipelines. Nevertheless, the Scala language and libraries makes this easy, and in 70 lines of code we have a working implementation of a simple interpreted programming language.

While the interpreter we implemented in this chapter is relatively simple, the same techniques apply to implementing most interpreted programming languages, including production-ready implementations like google/jsonnet. Hopefully this chapter has given you an intuition for how a simple programming language is implemented. Even if you never need to implement your own programming language, these techniques may still be useful if you find yourself writing linters, program analyzers, query engines, and other such tools.

Exercise: Modify our Jsonnet interpreter to be able to handle basic numbers as well. For simplicity assume numbers can only be positive integers, and the only operation they support is addition via a + b.

jsonnetlocal bonus = 15000;
local person = function (name, baseSalary) {
  "name": name,
  "totalSalary": baseSalary + bonus
};
{"person1": person("Alice", 50000)}20.55.jsonnet
json{
  "person1": {
    "name": "Alice",
    "totalSalary": 65000
  }
}20.56.json
See example 20.5 - JsonnetNumbers

Exercise: Modify def serialize to generate the ujson.Values we used in Chapter 8: JSON and Binary Data Serialization, and make it use ujson.write(..., indent = 2) to pretty-print the output JSON rather than having it all generated on a single line

See example 20.6 - JsonnetPrettyPrinting

Exercise: Modify our Jsonnet interpreter to provide simple Jsonnet stack traces when the interpretation fails for whatever reason. You can use the Index parser from FastParse to capture the character-index of each Expr node we are parsing, and combine that with the input string to find the line and column number for any Expr node whose evaluation failed with an exception.

jsonnetlocal f = function(x) y;
f("abc")20.57.jsonnet
Jsonnet error at line 1 column 23
at line 2 column 2
at line 1 column 1
20.58.error
See example 20.7 - JsonnetStackTraces
Discuss Chapter 20 online at https://www.handsonscala.com/discuss/20