Part II: Local Development


6 Implementing Algorithms in Scala
7 Files and Subprocesses
8 JSON and Binary Data Serialization
9 Self-Contained Scala Scripts
10 Static Build Pipelines

The second part of this book explores the core tools and techniques necessary for writing Scala applications that run on a single computer. We will cover algorithms, files and subprocess management, data serialization, scripts and build pipelines. This part builds towards a capstone project where we write an efficient incremental static site generator using the Scala language.

6

Implementing Algorithms in Scala


6.1 Merge Sort
6.2 Prefix Tries
6.3 Breadth First Search
6.4 Shortest Paths

def breadthFirstSearch[T](start: T, graph: Map[T, Seq[T]]): Set[T] = {
  val seen = collection.mutable.Set(start)
  val queue = collection.mutable.ArrayDeque(start)
  while (queue.nonEmpty) {
    val current = queue.removeHead()
    for (next <- graph(current) if !seen.contains(next)) {
      seen.add(next)
      queue.append(next)
    }
  }
  seen.toSet
}
</> 6.1.scala

Snippet 6.1: a simple breadth-first-search algorithm we will implement using Scala in this chapter

In this chapter, we will walk you through the implementation of a number of common algorithms using the Scala programming language. These algorithms are commonly taught in schools and tested at professional job interviews, so you have likely seen them before.

By implementing them in Scala, we aim to get you more familiar with using the Scala programming language to solve small problems in isolation. We will also see how some of the unique language features we saw in Chapter 5: Notable Scala Features can be applied to simplify the implementation of these well-known algorithms. This will prepare us for subsequent chapters which will expand in scope to include many different kinds of systems, APIs, tools and techniques.
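
As a small illustration of how the function in Snippet 6.1 might be exercised, the sketch below repeats it and runs it on a tiny graph. The graph and its node names are made up for this example, and the code is written in script style (for example as a `.sc` file or pasted into the REPL).

```scala
// Same function as Snippet 6.1, repeated so this example is self-contained
def breadthFirstSearch[T](start: T, graph: Map[T, Seq[T]]): Set[T] = {
  val seen = collection.mutable.Set(start)
  val queue = collection.mutable.ArrayDeque(start)
  while (queue.nonEmpty) {
    val current = queue.removeHead()
    for (next <- graph(current) if !seen.contains(next)) {
      seen.add(next)
      queue.append(next)
    }
  }
  seen.toSet
}

// Hypothetical graph. Every node must appear as a key, because
// graph(current) throws an exception on a missing key
val exampleGraph = Map(
  "a" -> Seq("b", "c"),
  "b" -> Seq("d"),
  "c" -> Seq("d"),
  "d" -> Seq.empty[String],
  "e" -> Seq("a") // "e" points at "a", but nothing reachable from "a" points at "e"
)

// All nodes reachable from "a", including "a" itself
val reachable = breadthFirstSearch("a", exampleGraph)
println(reachable)
```

Here `reachable` contains `a`, `b`, `c` and `d` but not `e`, since the search only follows edges outward from the start node.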

7

Files and Subprocesses


7.1 Paths
7.2 Filesystem Operations
7.3 Folder Syncing
7.4 Simple Subprocess Invocations
7.5 Interactive and Streaming Subprocesses

@ os.walk(os.pwd).filter(os.isFile).map(p => (os.size(p), p)).sortBy(-_._1).take(5)
res60: IndexedSeq[(Long, os.Path)] = ArrayBuffer(
  (6340270L, /Users/lihaoyi/test/post/Reimagining/GithubHistory.gif),
  (6008395L, /Users/lihaoyi/test/post/SmartNation/routes.json),
  (5499949L, /Users/lihaoyi/test/post/slides/Why-You-Might-Like-Scala.js.pdf),
  (5461595L, /Users/lihaoyi/test/post/slides/Cross-Platform-Development-in-Scala.js.pdf),
  (4576936L, /Users/lihaoyi/test/post/Reimagining/FluentSearch.gif)
)
</> 7.1.scala

Snippet 7.1: a short Scala code snippet to find the five largest files in a directory tree

Working with files and subprocesses is one of the most common things you do in programming: from the Bash shell, to Python or Ruby scripts, to large applications written in a compiled language. At some point everyone will have to write to a file or talk to a subprocess. This chapter will walk you through how to perform basic file and subprocess operations in Scala.

This chapter finishes with two small projects: building a simple file synchronizer, and building a streaming subprocess pipeline. These projects will form the basis for Chapter 17: Multi-Process Applications and Chapter 18: Building a Real-time File Synchronizer.
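
To give a rough feel for the two basic operations mentioned above, writing a file and talking to a subprocess, here is a minimal sketch using the same `os` package that appears in Snippet 7.1. It assumes the OS-Lib library is available: the first line is a Scala CLI directive that is only needed outside environments where OS-Lib is already on the classpath. It also assumes an `echo` executable exists on the PATH, which holds on most Unix-like systems. The file name is hypothetical.

```scala
//> using dep com.lihaoyi::os-lib:0.9.1

// Work in a fresh temporary folder so nothing on disk is overwritten
val tmp = os.temp.dir()

// Write a file, then read it back as a String
os.write(tmp / "greeting.txt", "hello world")
val contents = os.read(tmp / "greeting.txt")

// Run a subprocess, wait for it to finish, and capture its standard output.
// `echo` is assumed to be on the PATH
val echoed = os.proc("echo", contents).call().out.text().trim

println(echoed)
```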

8

JSON and Binary Data Serialization


8.1 Manipulating JSON
8.2 JSON Serialization of Scala Data Types
8.3 Writing your own Generic Serialization Methods
8.4 Binary Serialization

@ val output = ujson.Arr(
    ujson.Obj("hello" -> "world", "answer" -> 42),
    true
  )

@ output(0)("hello") = "goodbye"

@ output(0)("tags") = ujson.Arr("awesome", "yay", "wonderful")

@ println(output)
[{"hello":"goodbye","answer":42,"tags":["awesome","yay","wonderful"]},true]
</> 8.1.scala

Snippet 8.1: manipulating a JSON tree structure in the Scala REPL

Data serialization is an important tool in any programmer's toolbox. While variables and classes are enough to store data within a process, most data tends to outlive a single program process: whether saved to disk, exchanged between processes, or sent over the network. This chapter will cover how to serialize your Scala data structures to two common data formats - textual JSON and binary MessagePack - and how you can interact with the structured data in a variety of useful ways.

The JSON workflows we learn in this chapter will be used later in Chapter 12: Working with HTTP APIs and Chapter 14: Simple Web and API Servers, while the binary serialization techniques we learn here will be used later in Chapter 17: Multi-Process Applications.
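
Snippet 8.1 manipulates a raw JSON tree. Serializing a Scala data type to the two formats mentioned above could look roughly like the following sketch. It assumes the uPickle library, which provides the `ujson` package used in Snippet 8.1, and the `Post` case class is a hypothetical type invented for this illustration. The first line is a Scala CLI directive that is only needed where uPickle is not already on the classpath.

```scala
//> using dep com.lihaoyi::upickle:3.1.0

// Hypothetical data type, used only for this illustration
case class Post(title: String, tags: Seq[String], views: Int)

// Tell uPickle how to read and write a Post
implicit val postRW: upickle.default.ReadWriter[Post] = upickle.default.macroRW

val post = Post("Hello", Seq("scala", "json"), 42)

// Textual JSON: serialize to a String and parse it back
val json: String = upickle.default.write(post)
val fromJson = upickle.default.read[Post](json)

// Binary MessagePack: serialize to bytes and parse them back
val bytes: Array[Byte] = upickle.default.writeBinary(post)
val fromBytes = upickle.default.readBinary[Post](bytes)

println(json)
```

Both round trips should give back a value equal to the original `post`, which is the property that makes these formats useful for saving data to disk or sending it to another process.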

9

Self-Contained Scala Scripts


9.1 Reading Files Off Disk
9.2 Rendering HTML with Scalatags
9.3 Rendering Markdown with Commonmark-Java
9.4 Links and Bootstrap
9.5 Optionally Deploying the Static Site

os.write(
  os.pwd / "out" / "index.html",
  doctype("html")(
    html(
      body(
        h1("Blog"),
        for ((_, suffix, _) <- postInfo)
        yield h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix))
      )
    )
  )
)
</> 9.1.scala

Snippet 9.1: rendering an HTML page using the third-party Scalatags HTML library

Scala Scripts are a great way to write small programs. Each script is self-contained and can download its own dependencies when necessary, and make use of both Java and Scala libraries. This lets you write and distribute scripts without spending time fiddling with build configuration or library installation.

In this chapter, we will write a static site generator script that uses third-party libraries to process Markdown input files and generate a set of HTML output files, ready for deployment on any static file hosting service. This will form the foundation for Chapter 10: Static Build Pipelines, where we will turn the static site generator into an efficient incremental build pipeline by using the Mill build tool.
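
The Markdown-to-HTML step at the heart of such a generator can be sketched in a few lines. This is a minimal sketch, assuming the Commonmark-Java library named in the chapter outline. The Markdown string is made up for the example, and the first line is a Scala CLI directive (one possible way to fetch the dependency, with a version chosen for illustration).

```scala
//> using dep org.commonmark:commonmark:0.21.0

import org.commonmark.parser.Parser
import org.commonmark.renderer.html.HtmlRenderer

// Hypothetical Markdown input; a real script would read this from a file
val markdown = "# My First Post\n\nHello *world*."

val parser = Parser.builder().build()
val renderer = HtmlRenderer.builder().build()

// Parse the text into a document tree, then render that tree as an HTML string
val rendered: String = renderer.render(parser.parse(markdown))

println(rendered)
```

The resulting string is an HTML fragment rather than a full page, so a generator would still need to wrap it in the surrounding `html` and `body` structure, for example with Scalatags as in Snippet 9.1.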

10

Static Build Pipelines


10.1 Mill Build Pipelines
10.2 Mill Modules
10.3 Revisiting our Static Site Script
10.4 Conversion to a Mill Build Pipeline
10.5 Extending our Static Site Pipeline

import mill._

def srcs = T.source(millSourcePath / "src")

def concat = T{
  os.write(T.dest / "concat.txt", os.list(srcs().path).map(os.read(_)))
  PathRef(T.dest / "concat.txt")
}
</> 10.1.scala

Snippet 10.1: the definition of a simple Mill build pipeline

Build pipelines are a common pattern: you have files and assets to process, and you want to process them efficiently, incrementally, and in parallel. This usually means only re-processing files when they change, and re-using the already processed assets as much as possible. Whether you are compiling Scala, minifying JavaScript, or compressing tarballs, many of these file-processing workflows can be slow. Parallelizing these workflows and avoiding unnecessary work can greatly speed up your development cycle.

This chapter will walk through how to use the Mill build tool to set up these build pipelines, and demonstrate the advantages of a build pipeline over a naive build script. We will take the simple static site generator we wrote in Chapter 9: Self-Contained Scala Scripts and convert it into an efficient build pipeline that can incrementally update the static site as you make changes to the sources. We will be using the Mill build tool in several of the projects later in the book, starting with Chapter 14: Simple Web and API Servers.
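
To make the idea of only re-processing files when they change more concrete, here is a hand-rolled sketch using only the standard library. It is not how Mill is implemented. It is a simplified illustration of the caching idea, and the names `contentHash`, `IncrementalProcessor` and `runs` are hypothetical. Inputs are modelled as an in-memory map from file name to contents to keep the example short.

```scala
import java.security.MessageDigest

// Hex-encoded SHA-256 of an input's contents, used to detect changes
def contentHash(contents: String): String =
  MessageDigest.getInstance("SHA-256")
    .digest(contents.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString

class IncrementalProcessor(process: String => String) {
  // name -> (hash of the input last processed, output produced from it)
  private val cache = collection.mutable.Map.empty[String, (String, String)]
  var runs = 0 // how many times `process` actually ran

  def build(inputs: Map[String, String]): Map[String, String] =
    inputs.map { case (name, contents) =>
      val hash = contentHash(contents)
      val output = cache.get(name) match {
        case Some((`hash`, cached)) => cached // unchanged input: reuse output
        case _ =>
          runs += 1
          val fresh = process(contents)
          cache(name) = (hash, fresh)
          fresh
      }
      name -> output
    }
}

// Upper-casing stands in for a slow step such as compiling or minifying
val processor = new IncrementalProcessor(_.toUpperCase)
val first = processor.build(Map("a.md" -> "hello", "b.md" -> "world"))
val second = processor.build(Map("a.md" -> "hello", "b.md" -> "there"))
println(processor.runs)
```

The first build processes both inputs and the second re-processes only `b.md`, so `process` runs three times in total instead of four. A real build tool also has to track dependencies between steps and persist its cache between runs, which this sketch leaves out.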