8.1 Manipulating JSON
8.2 JSON Serialization of Scala Data Types
8.3 Writing your own Generic Serialization Methods
8.4 Binary Serialization
> val output = ujson.Arr(
ujson.Obj("hello" -> "world", "answer" -> 42),
true
)
> output(0)("hello") = "goodbye"
> output(0)("tags") = ujson.Arr("cool", "yay", "nice")
> println(output)
[{"hello":"goodbye","answer":42,"tags":["cool","yay","nice"]},true]
8.1.scala
Snippet 8.1: manipulating a JSON tree structure in the Scala REPL
Data serialization is an important tool in any programmer's toolbox. While variables and classes are enough to store data within a process, most data tends to outlive a single program process: whether saved to disk, exchanged between processes, or sent over the network. This chapter will cover how to serialize your Scala data structures to two common data formats - textual JSON and binary MessagePack - and how you can interact with the structured data in a variety of useful ways.
The JSON workflows we learn in this chapter will be used later in Chapter 12: Working with HTTP APIs and Chapter 14: Simple Web and API Servers, while the binary serialization techniques we learn here will be used later in Chapter 17: Multi-Process Applications.
The easiest way to work with JSON and binary serialization is through the uPickle library (and its uJson structured data type).
To begin, first download the sample JSON data at:
To use uJson and uPickle with the Scala CLI REPL, they need to be added as a
library dependency via the command line flag --dep com.lihaoyi::upickle:4.4.2.
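For example, assuming Scala CLI is installed, such a REPL session might be started with a command along these lines (the exact invocation may differ on your setup):
scala-cli repl --dep com.lihaoyi::upickle:4.4.2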
To explore the libraries, you can type ujson.<tab> and
upickle.<tab> to see the listing of available operations.
Given a JSON string, you can parse it into a ujson.Value using ujson.read:
> val jsonString = os.read(os.pwd / "ammonite-releases.json")
jsonString: String = """[
{
"url": "https://api.github.com/repos/.../releases/17991367",
"assets_url": "https://api.github.com/repos/.../releases/17991367/assets",
"upload_url": "https://uploads.github.com/repos/.../releases/17991367/assets",
...
"""
> val data = ujson.read(jsonString)
data: ujson.Value.Value = Arr(
ArrayBuffer(
Obj(
Map(
"url" -> Str("https://api.github.com/repos/.../17991367"),
"assets_url" -> Str("https://api.github.com/repos/.../17991367/assets"),
...
8.2.scala
You can also construct JSON data structures directly using the ujson.*
constructors. The constructors for primitive types like numbers, strings, and
booleans are optional and can be elided:
> val small = ujson.Arr(
ujson.Obj("hello" -> ujson.Str("world"), "answer" -> ujson.Num(42)),
ujson.Bool(true)
)
> val small = ujson.Arr(
ujson.Obj("hello" -> "world", "answer" -> 42),
true
)
8.3.scala
These can be serialized back to a string using the ujson.write function, or
written directly to a file without needing to first serialize it to a String
in-memory:
> println(ujson.write(small))
[{"hello":"world","answer":42},true]
> os.write(os.pwd / "out.json", small)
> os.read(os.pwd / "out.json")
res0: String = "[{\"hello\":\"world\",\"answer\":42},true]"
8.4.scala
A ujson.Value can be one of several types:
sealed trait Value
case class Str(value: String) extends Value
case class Obj(value: mutable.LinkedHashMap[String, Value]) extends Value
case class Arr(value: mutable.ArrayBuffer[Value]) extends Value
case class Num(value: Double) extends Value
sealed trait Bool extends Value
case object False extends Bool
case object True extends Bool
case object Null extends Value
8.5.scala
Value is a sealed trait, indicating that this set of classes and objects
encompasses all the possible JSON Values. You can conveniently cast a
ujson.Value to a specific sub-type and get its internal data by using the
.bool, .num, .arr, .obj, or .str methods:
> data.
apply boolOpt num render update
arr formatted numOpt str value
arrOpt httpContentType obj strOpt writeBytesTo
bool isNull objOpt transform
8.6.scala
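For example, here is a small sketch (re-using the data value parsed above) of the ...Opt variants, which return an Option instead of failing when the value has a different type:
> data(0)("url").strOpt   // Some("https://api.github.com/repos/.../17991367")
> data(0)("url").numOpt   // None: this value is a ujson.Str, not a ujson.Num
> data(0)("author").objOpt.map(_.keys)   // Some(...): the keys of the author object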
If you are working with your JSON as an opaque tree - indexing into it by index
or key, updating elements by index or key - you can do that directly using the
data(...) and data(...) = ... syntax.
You can look up entries in the JSON data structure using data(...) syntax,
similar to how you look up entries in an Array or Map:
> data(0)
res1: ujson.Value = Obj(
Map(
"url" -> Str("https://api.github.com/repos/.../17991367"),
"assets_url" -> Str("https://api.github.com/repos/.../17991367/assets"),
...
> data(0)("url")
res2: ujson.Value = Str(
"https://api.github.com/repos/lihaoyi/Ammonite/releases/17991367"
)
> data(0)("author")("id")
res3: ujson.Value = Num(2.0607116E7)
8.7.scala
ujson.Values are mutable:
> println(small)
[{"hello":"world","answer":42},true]
> small(0)("hello") = "goodbye"
> small(0)("tags") = ujson.Arr("cool", "yay", "nice")
> println(small)
[{"hello":"goodbye","answer":42,"tags":["cool","yay","nice"]},true]
8.8.scala
If you want to assume your JSON value is of a particular type and do
type-specific operations like "iterate over array", "get length of string",
or "get keys of object", you need to use .arr, .str, or .obj to cast
your JSON structure to the specific type and extract the value.
For example, fetching and manipulating the fields of a ujson.Obj requires use
of .obj:
> small(0).obj.remove("hello")
> small.arr.append(123)
> println(small)
[{"answer":42,"tags":["cool","yay","nice"]},true,123]
8.9.scala
Extracting values as primitive types requires use of .str or .num. Note that
ujson.Nums are stored as doubles. You can call .toInt to convert the
ujson.Nums to integers:
> data(0)("url").str
res6: String = "https://api.github.com/repos/.../releases/17991367"
> data(0)("author")("id").num
res7: Double = 2.0607116E7
> data(0)("author")("id").num.toInt
res8: Int = 20607116
8.10.scala
If the type of the JSON value is not parseable into the str or num type we
are expecting, the call throws a runtime exception.
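For example, a rough sketch of such a failure (the exact exception type may vary between uPickle versions):
> data(0)("url").num
// fails at runtime with an error along the lines of "Expected ujson.Num",
// since this value is actually a ujson.Str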
To traverse the tree structure of the ujson.Value (8.1.1), we can use a
recursive function. For example, here is one that recursively traverses the
data we parsed earlier and collects all the ujson.Str nodes in the JSON
structure.
> def traverse(v: ujson.Value): Iterable[String] = v match
case a: ujson.Arr => a.arr.map(traverse).flatten
case o: ujson.Obj => o.obj.values.map(traverse).flatten
case s: ujson.Str => Seq(s.str)
case _ => Nil
> traverse(data)
res9: Iterable[String] = ArrayBuffer(
"https://api.github.com/repos/.../releases/17991367",
"https://api.github.com/repos/.../releases/17991367/assets",
"https://uploads.github.com/repos/.../releases/17991367/assets",
"https://github.com/.../releases/tag/1.6.8",
...
8.11.scala
Often you do not just want dynamically-typed JSON trees: rather, you usually
want Scala collections or case classes, with fields of known types.
Serializing values of type T is done by looking up given serializers of
type ReadWriter[T]. Some of these serializers are provided by the library,
while others you have to define yourself in your own code.
Given ReadWriters are already defined for most common Scala data types:
Ints, Doubles, Strings, Seqs, Lists, Maps, tuples, etc. You can thus
serialize and deserialize collections of primitives and other builtin types
automatically.
> val numbers = upickle.read[Seq[Int]]("[1, 2, 3, 4]")
numbers: Seq[Int] = List(1, 2, 3, 4)
> upickle.write(numbers)
res10: String = "[1,2,3,4]"
> val tuples = upickle.read[Seq[(Int, Boolean)]](
"[[1, true], [2, false]]"
)
tuples: Seq[(Int, Boolean)] = List((1, true), (2, false))
> upickle.write(tuples)
res11: String = "[[1,true],[2,false]]"
8.12.scala
Serialization is done via the Typeclass Inference technique we covered in Chapter 5: Notable Scala Features, and thus can work for arbitrarily deep nested data structures:
> val input = """{"weasel": ["i", "am"], "baboon": ["i", "r"]}"""
> val parsed = upickle.read[Map[String, Seq[String]]](input)
parsed: Map[String, Seq[String]] = Map(
"weasel" -> List("i", "am"),
"baboon" -> List("i", "r")
)
> upickle.write(parsed)
res12: String = "{\"weasel\":[\"i\",\"am\"],\"baboon\":[\"i\",\"r\"]}"
8.13.scala
To convert a JSON structure into a case class, there are a few steps:
Define a case class representing the fields and types you expect to be present in the JSON
Define a given upickle.ReadWriter for that case class
Use upickle.read to deserialize the JSON structure
For example, the author value in the JSON data we saw earlier has the
following fields:
> println(ujson.write(data(0)("author"), indent = 4))
{
"login": "Ammonite-Bot",
"id": 20607116,
"node_id": "MDQ6VXNlcjIwNjA3MTE2",
"gravatar_id": "",
"type": "User",
"site_admin": false,
...
}
8.14.scala
Which can be (partially) modeled as the following case class:
> case class Author(login: String, id: Int, site_admin: Boolean) derives upickle.ReadWriter
For every case class you want to serialize, you have to define a contextual
upickle.ReadWriter to mark it as serializable. With Scala 3 we can use the
derives keyword, which generates a given ReadWriter that serializes and deserializes the
case class with its field names mapped to corresponding JSON object keys,
but you could also do it manually via a Mapped Serializer (8.2.3) if you need more
flexibility or customization.
> val author = upickle.read[Author](data(0)("author")) // read uJson
author: Author = Author(
login = "Ammonite-Bot",
id = 20607116,
site_admin = false
)
> author.login
res14: String = "Ammonite-Bot"
> val author2 = upickle.read[Author]( // read directly from a String
"""{"login": "lihaoyi", "id": 313373, "site_admin": true}"""
)
author2: Author = Author(login = "lihaoyi", id = 313373, site_admin = true)
> upickle.write(author2)
res15: String = "{\"login\":\"lihaoyi\",\"id\":313373,\"site_admin\":true}"
8.15.scala
Once you have defined a ReadWriter[Author], you can then also serialize and
de-serialize Authors as part of any larger data structure:
> upickle.read[Map[String, Author]]("""{
"haoyi": {"login": "lihaoyi", "id": 1337, "site_admin": true},
"bot": {"login": "ammonite-bot", "id": 31337, "site_admin": false}
}""")
res16: Map[String, Author] = Map(
"haoyi" -> Author(login = "lihaoyi", id = 1337, site_admin = true),
"bot" -> Author(login = "ammonite-bot", id = 31337, site_admin = false)
)
8.16.scala
In general, you can serialize any arbitrarily nested tree of case classes,
collections, and primitives, as long as every value within that structure is
itself serializable.
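For example, a hypothetical Release case class (illustrative only, modeling a small slice of the release JSON above) can nest Author and a collection, derive its own ReadWriter, and round-trip through JSON just like the flat examples above:
> case class Release(tag_name: String, author: Author, assets: Seq[String]) derives upickle.ReadWriter
> val release = Release("1.6.8", Author("Ammonite-Bot", 20607116, false), Seq("example.jar"))
> println(upickle.write(release))
{"tag_name":"1.6.8","author":{"login":"Ammonite-Bot","id":20607116,"site_admin":false},"assets":["example.jar"]}
> upickle.read[Release](upickle.write(release)) == release   // true: the value round-trips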
uPickle allows you to easily construct given serializers for new types based
on existing ones. For example, by default uPickle does not have support for
serializing os.Paths:
> upickle.write(os.pwd)
-- [E172] Type Error: ----------------------------------------------------------
1 |upickle.write(os.pwd)
| ^
|No given instance of type upickle.Writer[os.Path] was found for a
|context parameter of method write in trait Api.
8.17.scala
The compiler will try to find compatible givens and suggest you import them.
However, because os.Paths can be trivially converted to and from Strings, we
can use the bimap function to construct a ReadWriter[os.Path] from the
existing ReadWriter[String]:
> given pathRw: upickle.ReadWriter[os.Path] =
upickle.readwriter[String].bimap[os.Path](
p => ... /* convert os.Path to String */,
s => ... /* convert String to os.Path */
)
8.18.scala
bimap needs you to specify what your existing serializer is (here String),
and what new type you want to serialize (os.Path), and provide conversion
functions to convert back and forth between the two types. In this case, we
could use the following converters:
> given pathRw: upickle.ReadWriter[os.Path] =
upickle.readwriter[String].bimap[os.Path](
p => p.toString,
s => os.Path(s)
)
8.19.scala
With this given pathRw defined, we can now serialize and deserialize
os.Paths. This applies recursively, so any case classes or collections
containing os.Paths can now be serialized as well:
> val str = upickle.write(os.pwd)
str: String = "\"/Users/lihaoyi/test\""
> upickle.read[os.Path](str)
res17: os.Path = /Users/lihaoyi/test
> val str2 = upickle.write(Array(os.pwd, os.home, os.root))
str2: String = "[\"/Users/lihaoyi/test\",\"/Users/lihaoyi\",\"/\"]"
> upickle.read[Array[os.Path]](str2)
res18: Array[os.Path] = Array(
/Users/lihaoyi/test, /Users/lihaoyi, /
)
8.20.scala
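For instance, a hypothetical case class with an os.Path field (purely illustrative) can also derive a ReadWriter now that pathRw is in scope:
> case class Backup(dest: os.Path, enabled: Boolean) derives upickle.ReadWriter
> upickle.write(Backup(os.home, true))   // e.g. "{\"dest\":\"/Users/lihaoyi\",\"enabled\":true}"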
If you want more flexibility in how your JSON is deserialized into your data
type, you can use upickle.readwriter[ujson.Value].bimap to work with
the raw ujson.Values:
> given thingRw: upickle.ReadWriter[Thing] =
upickle.readwriter[ujson.Value].bimap[Thing](
p => ... /* convert a Thing to ujson.Value */,
s => ... /* convert a ujson.Value to Thing */
)
8.21.scala
You then have full freedom in how you want to convert a ujson.Value into a
Thing, and how you want to serialize the Thing back into a ujson.Value.
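As a sketch of what that might look like, here is a hypothetical Thing case class (illustrative only) whose JSON representation uses custom key names:
> case class Thing(id: Int, label: String)
> given thingRw: upickle.ReadWriter[Thing] =
    upickle.readwriter[ujson.Value].bimap[Thing](
      t => ujson.Obj("thing-id" -> t.id, "thing-label" -> t.label),
      v => Thing(v("thing-id").num.toInt, v("thing-label").str)
    )
> upickle.write(Thing(1, "hello"))   // {"thing-id":1,"thing-label":"hello"}
> upickle.read[Thing]("""{"thing-id": 2, "thing-label": "world"}""")   // Thing(2, "world")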
You can define your own methods that are able to serialize (or deserialize)
values of various types by making them generic with a context bound of Reader,
Writer, or ReadWriter.
The key context bounds relevant to the uPickle serialization library are:
def foo[T: upickle.Reader]: allows use of upickle.read[T]
def foo[T: upickle.Writer]: allows use of upickle.write[T]
def foo[T: upickle.ReadWriter]: allows use of both upickle.read[T] and upickle.write[T]
As we discussed in Chapter 5: Notable Scala Features, the context bound syntax above is equivalent to the following context parameter:
def foo[T](using reader: upickle.Reader[T])
This allows the compiler to infer the parameter if it is not explicitly provided, and saves us the inconvenience of having to pass serializers around manually.
Using context bounds, we can write generic methods that can operate on any input type, as long as that type is JSON serializable. For example, if we want to write a method that serializes a value and prints out the JSON to the console, we can do that as follows:
> case class Asset(id: Int, name: String) derives upickle.ReadWriter
> def myPrintJson[T: upickle.Writer](t: T) = println(upickle.write(t))
8.22.scala
> myPrintJson(Asset(1, "hello"))
{"id":1,"name":"hello"}
> myPrintJson(Seq(1, 2, 3))
[1,2,3]
> myPrintJson(Seq(Asset(1, "hello"), Asset(2, "goodbye")))
[{"id":1,"name":"hello"},{"id":2,"name":"goodbye"}]
8.23.scala
If we want to write a method that reads input from the console and parses it to JSON of a particular type, we can do that as well:
> def myReadJson[T: upickle.Reader](): T =
print("Enter some JSON: ")
upickle.read[T](Console.in.readLine())
> myReadJson[Seq[Int]]()
Enter some JSON: [1, 2, 3, 4, 5]
res19: Seq[Int] = List(1, 2, 3, 4, 5)
> myReadJson[Author]()
Enter some JSON: {"login": "Haoyi", "id": 1337, "site_admin": true}
res20: Author = Author("Haoyi", 1337, true)
8.24.scala
Note that when calling myReadJson(), we have to pass in the type parameter
[Seq[Int]] or [Author] explicitly, whereas when calling myPrintJson() the
compiler can infer the type parameter based on the type of the given value
Asset(1, "hello"), Seq(1, 2, 3), etc.
In general, we do not need a context bound when we are writing code that
operates on a single concrete type, as the compiler will already be able to
infer the correct concrete serializer. We only need a context bound if the
method is generic, to indicate to the compiler that it should be callable only
with concrete types that have a given Reader[T] or Writer[T] available.
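For example, a small sketch using the Author class from earlier: a method that only ever works with one concrete type can call upickle.write directly, with no context bound required:
> // no context bound needed: a given ReadWriter[Author] is already in scope
> def printAuthorJson(author: Author) = println(upickle.write(author))
> printAuthorJson(Author("lihaoyi", 313373, true))
{"login":"lihaoyi","id":313373,"site_admin":true}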
We will be using this ability to write generic methods dealing with serialization and de-serialization to write generic RPC (Remote Procedure Call) logic in Chapter 17: Multi-Process Applications.
The advantage of using context bounds over other ways of serializing data types is that they allow the serialization logic to be inferred statically. That has three consequences:
uPickle's serializers being resolved at compile time using Scala's givens
gives you the convenience of reflection-based frameworks with the performance of
hand-written serialization code.
Unlike hand-written serializers, the compiler does most of the busy-work constructing the serialization logic for you. You only need to teach it how to serialize and deserialize your basic primitives and collections and it will know how to serialize all combinations of these without additional boilerplate.
Unlike reflection-based serializers, uPickle's serializers are fast: they avoid runtime reflection which has significant overhead in most languages, and can be optimized by the compiler to generate lean and efficient code to execute at run time.
The compiler is able to reject non-serializable data types early during compilation, rather than blowing up later after the code has been deployed to production. For example, trying to serialize an open process output stream results in a compile error telling us that what we are doing is invalid before the code even runs:
> myPrintJson(System.out)
-- [E172] Type Error: ----------------------------------------------------------
1 |myPrintJson(System.out)
| ^
|No given instance of type upickle.Writer[java.io.PrintStream] was
|found for a context parameter of method myPrintJson.
8.25.scala
Because every upickle.read call has a statically-specified type, we
will never deserialize a value of unexpected type: this rules out a class of
security issues where an attacker can force your code to accidentally
deserialize an unsafe object able to compromise your application.
For example, if we accidentally try to deserialize a sun.misc.Unsafe instance
from JSON, we get an immediate compile time error:
> myReadJson[sun.misc.Unsafe]()
-- [E172] Type Error: ----------------------------------------------------------
1 |myReadJson[sun.misc.Unsafe]()
| ^
|No given instance of type upickle.Reader[sun.misc.Unsafe] was found
|for a context parameter of method myReadJson.
8.26.scala
In general, the Scala language allows you to check the serializability of your data structures at compile time, avoiding an entire class of bugs and security vulnerabilities. Rather than finding your serialization logic crashing or misbehaving in production due to an unexpected value appearing in your data structure, the Scala compiler surfaces these issues at compile time, making them much easier to diagnose and fix.
Apart from serializing Scala data types as JSON, uPickle also supports serializing them to MessagePack binary blobs. These are often more compact than JSON, especially for binary data that would need to be Base64 encoded to fit in a JSON string, at the expense of losing human readability.
Serializing data structures to binary blobs is done via the writeBinary and
readBinary methods:
> val blob = upickle.writeBinary(Author("haoyi", 31337, true))
blob: Array[Byte] = Array(-125, -91, 108, 111, ...)
> upickle.readBinary[Author](blob)
res21: Author = Author(login = "haoyi", id = 31337, site_admin = true)
8.27.scala
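As a rough illustration of the compactness mentioned above (exact sizes depend on the data being serialized), you can compare the length of the two encodings of the same value:
> upickle.write(Author("haoyi", 31337, true)).length         // length of the JSON string
> upickle.writeBinary(Author("haoyi", 31337, true)).length   // number of MessagePack bytes, typically smaller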
writeBinary and readBinary work on any data type that can be converted
to JSON. The following example demonstrates
serialization and de-serialization of Map[Int, List[Author]]s:
> val data = Map(
1 -> Nil,
2 -> List(Author("haoyi", 1337, true), Author("lihaoyi", 31337, true))
)
> val blob2 = upickle.writeBinary(data)
blob2: Array[Byte] = Array(-126, 1, -112, 2, -110, ...)
> upickle.readBinary[Map[Int, List[Author]]](blob2)
res22: Map[Int, List[Author]] = Map(
1 -> List(),
2 -> List(
Author(login = "haoyi", id = 1337, site_admin = true),
Author(login = "lihaoyi", id = 31337, site_admin = true)
)
)
8.28.scala
Unlike JSON, MessagePack binary blobs are not human readable by default:
Array(-110, -110, 1, -112, ...) is not something you can quickly skim and see
what it contains! If you are working with a third-party server returning
MessagePack binaries with an unknown or unusual structure, this can make it
difficult to understand what a MessagePack blob contains so you can properly
deserialize it.
To help work with the MessagePack blobs of unknown structure, uPickle comes with
a uPack library that lets you read the blobs into an in-memory upack.Msg
structure (similar to ujson.Value) that is easy to inspect:
> upack.read(blob)
res23: upack.Msg = Obj(
Map(Str("login") -> Str("haoyi"), Str("id") -> Int32(31337), Str("site_admin") -> True)
)
8.29.scala
> upack.read(blob2)
res24: upack.Msg = Obj(
Map(
Int32(1) -> Arr(ArrayBuffer()),
Int32(2) -> Arr(
ArrayBuffer(
Obj(
Map(
Str("login") -> Str("haoyi"),
Str("id") -> Int32(1337),
Str("site_admin") -> True
)
),
...
8.30.scala
Reading the binary blobs into upack.Msgs is a great debugging tool, and can
help you figure out what is going on under the hood if your
writeBinary/readBinary serialization is misbehaving.
Like ujson.Values, you can manually construct upack.Msgs from scratch using
their constituent parts upack.Arr, upack.Obj, upack.Bool, etc. This can be
useful if you need to interact with some third-party systems and need full
control of the MessagePack messages you are sending:
> val msg = upack.Obj(
upack.Str("login") -> upack.Str("haoyi"),
upack.Str("id") -> upack.Int32(31337),
upack.Str("site_admin") -> upack.True
)
> val blob3 = upack.write(msg)
blob3: Array[Byte] = Array(-125, -91, 108, 111, ...)
> val deserialized = upickle.readBinary[Author](blob3)
deserialized: Author = Author(
login = "haoyi",
id = 31337,
site_admin = true
)
8.31.scala
Serializing data is one of the core tools that any programmer needs to have.
This chapter introduces you to the basics of working with data serialization in
a Scala program, using the uPickle library. uPickle focuses on providing
convenient serialization for built-in data structures and user-defined case classes, though with Mapped Serializers (8.2.3) you can extend
it yourself to support any arbitrary data type. For more details on using the
uPickle serialization library to work with JSON or MessagePack data, you can
refer to the reference documentation:
uPickle is also available for you to use in projects built using Mill or other build tools at the following coordinates:
Mill: mvn"com.lihaoyi::upickle:4.4.2"
We will use the JSON APIs we learned in this chapter later in Chapter 12: Working with HTTP APIs, Chapter 14: Simple Web and API Servers, and use the MessagePack binary serialization techniques in Chapter 17: Multi-Process Applications.
There are many other JSON or binary serialization libraries in the Scala ecosystem. For simplicity the rest of this book will be using uPickle, but you can try these other libraries if you wish:
This flow chart covers most of the common workflows for working with textual JSON and binary MessagePack data in Scala:
Exercise: Given a normal class class Foo(val i: Int, val s: String) with two public
fields, use the bimap method we saw earlier to define a given
ReadWriter for it, allowing instances to be serialized to JSON objects
{"i": ..., "s": ...}.
Exercise: Often JSON data structures have fields that you do not care about, which make
skimming through the JSON verbose and tedious: e.g. the
ammonite-releases.json we receive from GitHub comes loaded with lots of
verbose and often-not-very-useful URLs, shown below.
"followers_url": "https://api.github.com/users/Ammonite-Bot/followers",
"following_url": "https://api.github.com/users/Ammonite-Bot/following{/other_user}",
"gists_url": "https://api.github.com/users/Ammonite-Bot/gists{/gist_id}",
8.32.json
Write a method that takes a ujson.Value, and removes any values which are
strings beginning with "https://". You can do so either in a mutable or
immutable style: either modifying the ujson.Value in place, or constructing
and returning a new ujson.Value with those values elided.