10

Static Build Pipelines


10.1 Mill Build Pipelines
10.2 Mill Modules
10.3 Revisiting our Static Site Script
10.4 Conversion to a Mill Build Pipeline
10.5 Extending our Static Site Pipeline

import mill.*

def srcs = Task.Source("src")

def concat = Task{
  os.write(Task.dest / "concat.txt",  os.list(srcs().path).map(os.read(_)))
  PathRef(Task.dest / "concat.txt")
}
10.1.scala

Snippet 10.1: the definition of a simple Mill build pipeline

Build pipelines are a common pattern, where you have files and assets you want to process but want to do so efficiently, incrementally, and in parallel. This usually means only re-processing files when they change, and re-using the already processed assets as much as possible. Whether you are compiling Scala, minifying Javascript, or compressing tarballs, many of these file-processing workflows can be slow. Parallelizing these workflows and avoiding unnecessary work can greatly speed up your development cycle.

This chapter will walk through how to use the Mill build tool to set up these build pipelines, and demonstrate the advantages of a build pipeline over a naive build script. We will take the simple static site generator we wrote in Chapter 9: Self-Contained Scala Scripts and convert it into an efficient build pipeline that can incrementally update the static site as you make changes to the sources. We will be using the Mill build tool in several of the projects later in the book, starting with Chapter 14: Simple Web and API Servers.

10.1 Mill Build Pipelines

We will be using the Mill Build Tool to define our build pipelines. While Mill can be used to compile/test/package Scala code (which we will see in subsequent chapters) it can also be used as a general-purpose tool for efficiently and incrementally keeping static assets up to date.

In this chapter we will be managing the compilation of markdown files into an HTML static site, but any workflow which you would like to run once and re-use can benefit from Mill. Minifying Javascript files, pre-processing CSS, generating source code, preparing tar or zip archives for deployment: these are all workflows which are slow enough that you do not want to repeat them unnecessarily. Avoiding unnecessary re-processing is exactly what Mill helps you do.

10.1.1 Defining a Build Pipeline

To introduce the core concepts in Mill, we'll first look at a trivial Mill build that takes files in a source folder and concatenates them together:

build.mill
import mill.*

def srcs = Task.Source("src")

def concat = Task{
  os.write(Task.dest / "concat.txt",  os.list(srcs().path).map(os.read(_)))
  PathRef(Task.dest / "concat.txt")
}
10.2.scala

You can read this snippet as follows:

  • This build.mill file defines one set of sources, named srcs, and one downstream target named concat.

  • We make use of srcs in the body of concat via the srcs() syntax, which tells Mill that concat depends on srcs and makes the value of srcs available to concat.

  • Inside concat, we list the src/ folder, read all the files, and concatenate their contents into a single file concat.txt.

  • The final concat.txt file is wrapped in a PathRef, which tells Mill that we care not just about the name of the file, but also its contents.

This results in the following simple dependency graph:

(Build graph: src/ → srcs → concat)

10.1.1.1 Targets

The Mill build tool is built around the concept of targets. Targets are the nodes within the build graph, and represent individual steps that make up your build pipeline.

Every target you define via the def foo = Task{...} syntax gives you the following things for free:

  • It is made available for you to run from the command line via ./mill foo
  • The return value of the Task{...} block is made printable via ./mill show foo
  • It automatically evaluates any necessary upstream tasks, before its own Task{...} block evaluates
  • It automatically caches the computed result on disk, and only re-evaluates when its inputs change

In general, this helps automate a lot of the tedious book-keeping that is normally needed when writing incremental build scripts. Rather than spending your time writing command-line argument parsers or hand-rolling a caching and invalidation strategy, Mill handles all that for you and allows you to focus on the logical structure of your build.
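To make concrete what Mill is automating, here is a sketch (plain Scala, not Mill code; the needsRebuild helper is hypothetical) of the kind of hand-rolled invalidation check a manual build script would otherwise need:

```scala
// Hand-rolled invalidation: rebuild only if the output is missing or
// older than the source. This is the sort of ad-hoc book-keeping that
// Mill's target caching replaces.
import java.nio.file.{Files, Path}

def needsRebuild(src: Path, out: Path): Boolean =
  !Files.exists(out) ||
    Files.getLastModifiedTime(src).compareTo(Files.getLastModifiedTime(out)) > 0
```

Timestamp comparisons like this are fragile (they miss same-second edits and deleted inputs), which is one reason it pays to use a tool that tracks input changes for you.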

10.1.1.2 Target Destination Folders

Note that the concat.txt file is created within the concat target's destination folder Task.dest. Every target has its own destination folder, named after the fully-qualified path to the target (in this case, out/concat.dest/). This means that we do not have to worry about the concat target's files being accidentally over-written by other targets.

In general, any Mill target should only create and modify files within its own Task.dest, to avoid collisions and interference with other targets. The contents of Task.dest are deleted before each evaluation (which only happens when the target's inputs change), ensuring that the target always starts from a fresh destination folder and isn't affected by the outcome of previous evaluations.

10.1.2 Using Your Build Pipeline

We can install Mill in the current folder via curl, and create a src/ folder with some files inside:

$ REPO=https://repo1.maven.org/maven2/com/lihaoyi/mill-dist/1.0.6

$ curl -L "$REPO/mill-dist-1.0.6-mill.sh" -o mill

$ chmod +x mill

$ mkdir src

$ echo "hear me moo" > src/iamcow.txt

$ echo "hello world" > src/hello.txt
10.3.bash

We can now build the concat target, ask Mill to print the path to its output file, and inspect its contents:

$ ./mill concat

$ ./mill show concat
"ref:fd0201e7:/Users/lihaoyi/test/out/concat.dest/concat.txt"

$ cat out/concat.dest/concat.txt
hear me moo
hello world
10.4.bash

Mill re-uses output files whenever possible: in this case, since the concat target only depends on srcs, calling ./mill concat repeatedly returns the already generated concat.txt file. However, if we change the contents of src/ by adding a new file to the folder, Mill automatically re-builds concat.txt to take the new input into account:

$ echo "twice as much as you" > src/iweigh.txt

$ ./mill concat

$ cat out/concat.dest/concat.txt
hear me moo
twice as much as you
hello world
10.5.bash
See example 10.1 - Simple

10.1.3 Non-linear Build Pipelines

While our build pipeline above only has one set of sources and one target, we can also define more complex builds. For example, here is a build with two source folders (src/ and resources/) and three targets (concat, compress and zipped):

build.mill
 import mill.*
 def srcs = Task.Source("src")
+def resources = Task.Source("resources")

 def concat = Task{
   os.write(Task.dest / "concat.txt",  os.list(srcs().path).map(os.read(_)))
   PathRef(Task.dest / "concat.txt")
 }
+def compress = Task{
+  for p <- os.list(resources().path) do
+    val copied = Task.dest / p.relativeTo(resources().path)
+    os.copy(p, copied)
+    os.call(cmd = ("gzip", copied))
+  PathRef(Task.dest)
+}
+def zipped = Task{
+  val temp = Task.dest / "temp"
+  os.makeDir(temp)
+  os.copy(concat().path, temp / "concat.txt")
+  for p <- os.list(compress().path) do
+    os.copy(p, temp / p.relativeTo(compress().path))
+  os.call(cmd = ("zip", "-r", Task.dest / "out.zip", "."), cwd = temp)
+  PathRef(Task.dest / "out.zip")
+}
10.6.scala

In addition to concatenating files, we also gzip compress the contents of our resources/ folder. We then take the concatenated sources and compressed resources and zip them all up into a final out.zip file:

(Build graph: src/ → srcs → concat → zipped; resources/ → resources → compress → zipped)

Given files in both src/ and resources/:

$ mkdir resources

$ echo "# title" > resources/foo.md

$ echo "print(123)" > resources/thing.py
10.7.bash
$ find . -type f | grep -v out
.
./build.mill
./mill
./resources/foo.md
./resources/thing.py
./src/iamcow.txt
./src/iweigh.txt
./src/hello.txt
10.8.bash

We can run ./mill zipped and see the expected concat.txt and *.gz files in the output out.zip:

$ ./mill show zipped
"ref:a3771625:/Users/lihaoyi/test/out/zipped.dest/out.zip"

$ unzip -l out/zipped.dest/out.zip
Archive:  out/zipped.dest/out.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       35  11-30-2019 13:10   foo.md.gz
       45  11-30-2019 13:10   concat.txt
       40  11-30-2019 13:10   thing.py.gz
---------                     -------
      120                     3 files
10.9.bash

10.1.4 Incremental Re-Computation

As shown earlier, out.zip is re-used as long as none of the inputs (src/ and resources/) change. However, because our pipeline has two branches, the concat and compress targets are independent: concat is only re-generated if the src/ folder changes:

(Build graph with the src/ → srcs → concat → zipped branch highlighted)

And the compress target is only re-generated if the resources/ folder changes:

(Build graph with the resources/ → resources → compress → zipped branch highlighted)

While in these examples our Task{...} targets all returned PathRefs to files or folders, you can also define targets that return any JSON-serializable data type compatible with the uPickle library we went through in Chapter 8: JSON and Binary Data Serialization. Mill also supports a -j <n> flag to parallelize independent targets over multiple threads, e.g. ./mill -j 2 zipped would spin up 2 threads to work through the two branches of the target graph in parallel.
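For instance, here is a hypothetical build.mill fragment (not part of this chapter's build; the LineCount class and stats target are invented for illustration) showing a target that returns a plain JSON-serializable value instead of a PathRef:

```scala
import mill.*
import upickle.default.ReadWriter

// A plain data type; the derived ReadWriter lets Mill cache it as JSON.
case class LineCount(files: Int, lines: Int) derives ReadWriter

def srcs = Task.Source("src")

// Returns a LineCount value rather than a PathRef to a file.
def stats = Task{
  val paths = os.list(srcs().path)
  LineCount(paths.length, paths.map(os.read.lines(_).length).sum)
}
```

Running ./mill show stats would then print the target's cached JSON value.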

See example 10.2 - Nonlinear

10.2 Mill Modules

Mill also supports the concept of modules. You can use modules to define repetitive sets of build targets.

It is very common for certain sets of targets to be duplicated within your build: perhaps for every folder of source files, you want to compile them, lint them, package them, test them, and publish them. By defining a trait that extends Module, you can apply the same set of targets to different folders on disk, making it easy to manage the build for larger and more complex projects.

Here we are taking the set of srcs/resources and concat/compress/zipped targets we defined earlier and wrapping them in a trait FooModule so they can be re-used:

build.mill
 import mill.*
+trait FooModule extends Module:
   def srcs = Task.Source("src")
   def resources = Task.Source("resources")

   def concat = Task{ ... }
   def compress = Task{ ... }
   def zipped = Task{ ... }
+
+object bar extends FooModule
+object qux extends FooModule
10.10.scala

object bar and object qux extend trait FooModule, and have source paths (accessible via the inherited moduleDir property) of bar/ and qux/ respectively. The srcs and resources definitions above thus point to the following folders:

  • bar/src/
  • bar/resources/
  • qux/src/
  • qux/resources/

You can ask Mill to list out the possible targets for you to build via ./mill resolve __ (that's two _s in a row):

$ ./mill resolve __
bar.compress
bar.concat
bar.resources
bar.srcs
bar.zipped
qux.compress
qux.concat
qux.resources
qux.srcs
qux.zipped
10.11.bash

Any of the targets above can be built from the command line, e.g. via

$ mkdir -p bar/src bar/resources

$ echo "Hello" > bar/src/hello.txt; echo "World" > bar/src/world.txt

$ ./mill show bar.zipped
"ref:efdf1f3c:/Users/lihaoyi/test/out/bar/zipped.dest/out.zip"

$ unzip out/bar/zipped.dest/out.zip
Archive:  out/bar/zipped.dest/out.zip
 extracting: concat.txt

$ cat concat.txt
Hello
World
10.12.bash
See example 10.3 - Modules

10.2.1 Nested Modules

Modules can also be nested to form arbitrary hierarchies:

build.mill
//| mill-version: 1.0.6
import mill.*

trait FooModule extends Module:
  def srcs = Task.Source("src")

  def concat = Task{
    os.write(Task.dest / "concat.txt",  os.list(srcs().path).map(os.read(_)))
    PathRef(Task.dest / "concat.txt")
  }

object bar extends FooModule:
  object inner1 extends FooModule
  object inner2 extends FooModule

object wrapper extends Module:
  object qux extends FooModule
10.13.scala

Here we have four FooModules: bar, bar.inner1, bar.inner2, and wrapper.qux. This exposes the following source folders and targets:

Source Folders

  • bar/src/
  • bar/inner1/src/
  • bar/inner2/src/
  • wrapper/qux/src/

Targets

  • bar.concat
  • bar.inner1.concat
  • bar.inner2.concat
  • wrapper.qux.concat

Note that wrapper itself is a Module but not a FooModule, and thus does not itself define a wrapper/src/ source folder or a wrapper.concat target. In general, every object in your module hierarchy needs to inherit from Module, although you can inherit from a custom subtype such as FooModule if you want some common targets already defined.

The moduleDir made available within each module differs: while in the top-level build pipelines we saw earlier moduleDir was always equal to os.pwd, within a module the moduleDir reflects the module path, e.g. the moduleDir of bar is bar/, the moduleDir of wrapper.qux is wrapper/qux/, and so on.

10.2.2 Cross Modules

The last basic concept we will look at is cross modules. These are most useful when the number or layout of modules in your build isn't fixed, but can vary based on e.g. the files on the filesystem:

build.mill
//| mill-version: 1.0.6
import mill.*
import mill.api.BuildCtx, BuildCtx.workspaceRoot

val items = BuildCtx.watchValue{ os.list(workspaceRoot / "foo").map(_.last) }

object foo extends Cross[FooModule](items)
trait FooModule extends Cross.Module[String]:
  def moduleDir = super.moduleDir / crossValue
  def srcs = Task.Source("src")

  def concat = Task{
    os.write(Task.dest / "concat.txt",  os.list(srcs().path).map(os.read(_)))
    PathRef(Task.dest / "concat.txt")
  }
10.14.scala

Here, we define a cross module foo that takes a set of items found by listing the sub-folders in foo/. This set of items is dynamic, and can change if the folders on disk change, without needing to update the build.mill file for every change.

Note the BuildCtx.watchValue call; this is necessary to tell Mill to take note in case the number or layout of modules within the foo/ folder changes. Without it, we would need to restart the Mill process using ./mill shutdown to pick up changes in how many entries the cross-module contains.

The Cross class that foo inherits is a Mill builtin that automatically generates a set of Mill modules corresponding to the items we passed in.

10.2.3 Modules Based on Folder Layout

Typically, a cross module has the same moduleDir as an ordinary module, but in the example above it is overridden to include the crossValue as part of the path, giving each module a unique srcs directory.

As written, given a filesystem layout on the left, it results in the source folders and concat targets on the right:

$ mkdir -p foo/bar

$ mkdir -p foo/qux

$ find foo
foo
foo/bar
foo/qux
10.15.bash

sources

  • foo/bar/src
  • foo/qux/src

targets

$ ./mill resolve __.concat
foo[bar].concat
foo[qux].concat
10.16.bash

If we then add a new source folder via mkdir -p, Mill picks up the additional module and concat target:

$ mkdir -p foo/thing/src

$ ./mill resolve __.concat
foo[bar].concat
foo[qux].concat
foo[thing].concat
10.17.bash

10.3 Revisiting our Static Site Script

We have now gone through the basics of how to use Mill to define simple asset pipelines to incrementally perform operations on a small set of files. Next, we will return to the Blog.scala static site script we wrote in Chapter 9, and see how we can use these techniques to make it incremental: to only re-build the pages of the static site whose inputs changed since the last time they were built.

While Blog.scala works fine in small cases, there is one big limitation: the entire script runs every time. Even if you only change one blog post's .md file, every file will need to be re-processed. This is wasteful, and can be slow as the number of blog posts grows. On a large blog, re-processing every post can take upwards of 20-30 seconds: a long time to wait every time you tweak some wording!

It is possible to manually keep track of which .md file was converted into which .html file, and thus avoid re-processing .md files unnecessarily. However, this kind of book-keeping is tedious and easy to get wrong. Luckily, this is the kind of book-keeping and incremental re-processing that Mill is good at!

10.4 Conversion to a Mill Build Pipeline

We will now walk through a step-by-step conversion of this Blog.scala script file into a Mill build.mill. First, we must rename Blog.scala to build.mill to convert it into a Mill build pipeline, and add the import mill.* declaration:

Blog.scala -> build.mill
+import mill.*
 import scalatags.Text.all.*
10.18.scala

Note that the dependencies on mainargs and os-lib are dropped, as these are automatically available in Mill.

Second, since we can rely on Mill invalidating and deleting stale files and folders as they fall out of date, we no longer need the os.remove.all and os.makeDir.all calls:

build.mill
-os.remove.all(os.pwd / "out")
-os.makeDir.all(os.pwd / "out/post")
10.19.scala

We will also remove the @main method wrapper and publishing code for now. Mill build pipelines use a different syntax for taking command-line arguments than Scala scripts do, and porting this functionality to our Mill build pipeline is left as an exercise at the end of the chapter.

build.mill
-import mainargs.*

-ParserForMethods(this).runOrExit(args) // `args` is available at the top-level

-def main(targetGitRepo: String = "") =
   ...
-
-  if targetGitRepo != "" then
-    os.call(cmd = ("git", "init"), cwd = os.pwd / "out")
-    os.call(cmd = ("git", "add", "-A"), cwd = os.pwd / "out")
-    os.call(cmd = ("git", "commit", "-am", "."), cwd = os.pwd / "out")
-    os.call(cmd = ("git", "push", targetGitRepo, "head", "-f"), cwd = os.pwd / "out")
10.20.scala

10.4.1 For-Loop to Cross Modules

Third, we convert the for-loop that we previously used to iterate over the files in the postInfo list into a cross module. That allows every blog post's .md file to be processed, invalidated, and re-processed independently, only when the original .md file changes:

build.mill
-for (_, suffix, path) <- postInfo do
+object post extends Cross[PostModule](postInfo.map(_(0)))
+trait PostModule extends Cross.Module[String]:
+  def number = crossValue
+  val Some((_, suffix, markdownPath)) = postInfo.find(_(0) == number)
+  def path = Task.Source(markdownPath)
+  def render = Task{
     val parser = org.commonmark.parser.Parser.builder().build()
-    val document = parser.parse(os.read(path))
+    val document = parser.parse(os.read(path().path))
     val renderer = org.commonmark.renderer.html.HtmlRenderer.builder().build()
     val output = renderer.render(document)
     os.write(
-      os.pwd / "out/post" / mdNameToHtml(suffix),
+      Task.dest / mdNameToHtml(suffix),
       doctype("html")(
         ...
       )
     )
+    PathRef(Task.dest / mdNameToHtml(suffix))
+  }
10.21.scala

Note how the items in the Cross[PostModule](...) declaration are the numbers corresponding to each post in our postInfo list. For each item, we define a path source pointing at the post's .md file, as well as a def render target which returns a PathRef to the generated HTML. In the conversion from a hardcoded script to a Mill build pipeline, all the hardcoded references writing files to os.pwd / "out" have been replaced by the Task.dest of each target.

10.4.2 An Index Page Target

Fourth, we wrap the generation of the index.html file into a target as well:

build.mill
+def links = Task.Input{ postInfo.map(_(1)) }
+
+def index = Task{
   os.write(
-    os.pwd / "out/index.html",
+    Task.dest / "index.html",
     doctype("html")(
       html(
         head(bootstrapCss),
         body(
           h1("Blog"),
-          for (_, suffix, _) <- postInfo
+          for suffix <- links()
           yield h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix))
         )
       )
     )
   )
+  PathRef(Task.dest / "index.html")
+}
10.22.scala

Note that we need to define a def links target that is a Task.Input: this tells Mill that the contents of the postInfo.map expression may change (since it depends on the files present on disk) and to make sure to re-evaluate it every time to check for changes. Again, the hardcoded references to os.pwd / "out" have been replaced by the Task.dest of the individual target.

10.4.3 Arranging Files For Distribution

Lastly, we need to aggregate all our individual posts and the index.html file into a single target, which we will call dist (short for "distribution"):

build.mill
+val posts = Task.sequence(postInfo.map(_(0)).map(post(_).render))
+
+def dist = Task {
+  for post <- posts() do
+    os.copy(post.path, Task.dest / "post" / post.path.last, createFolders = true)
+  os.copy(index().path, Task.dest / "index.html")
+
+  PathRef(Task.dest)
+}
10.23.scala

This is necessary because while previously we created the HTML files for the individual posts and index "in place", now they are each created in separate Task.dest folders assigned by Mill so they can be separately invalidated and re-generated. Thus we need to copy them all into a single folder that we can open locally in the browser or upload to a static site host.

Note that we need to use the helper method Task.sequence to turn the Seq[Task[PathRef]] into a Task[Seq[PathRef]] for us to use in def dist.
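Task.sequence performs the classic "turn the collection inside-out" transformation. As a plain-Scala illustration (not Mill code; this sequence helper is written for Option just to show the shape):

```scala
// Turn a Seq of Options into an Option of Seq: the same shape of
// transformation that Task.sequence performs for a Seq of Mill tasks.
def sequence[T](xs: Seq[Option[T]]): Option[Seq[T]] =
  xs.foldRight(Option(Seq.empty[T])) { (opt, acc) =>
    for x <- opt; rest <- acc yield x +: rest
  }
```

Here sequence(Seq(Some(1), Some(2))) yields Some(Seq(1, 2)), while a single None collapses the whole result to None; Task.sequence does the analogous thing for tasks.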

10.4.4 Using Your Static Build Pipeline

We now have a static site pipeline with the following shape:

(Build graph:
  1 - My First Post.md → post[1].render → dist
  2 - My Second Post.md → post[2].render → dist
  3 - My Third Post.md → post[3].render → dist
  index → dist)

We can now take the same set of posts we used earlier, and build them into a static website using ./mill dist. Note that the output is now in the out/dist.dest/ folder, which is the Task.dest folder for the dist target.

$ find post -type f
post/1 - My First Post.md
post/3 - My Third Post.md
post/2 - My Second Post.md

$ ./mill show dist
"ref:b33a3c95:/Users/lihaoyi/Github/blog/out/dist.dest"

$ find out/dist.dest -type f
out/dist.dest/index.html
out/dist.dest/post/my-first-post.html
out/dist.dest/post/my-second-post.html
out/dist.dest/post/my-third-post.html
10.24.bash

We can then open the index.html in our browser to view the blog. Every time you run ./mill dist, Mill will only re-process the blog posts that have changed since you last ran it. You can also use ./mill --watch dist or ./mill -w dist to have Mill watch the filesystem and automatically re-process the files every time they change.

See example 10.6 - Blog

10.5 Extending our Static Site Pipeline

Now that we've defined a simple pipeline, let's consider two extensions:

  • Download the bootstrap.css file at build time and bundle it with the static site, to avoid a dependency on the third party hosting service

  • Extract a preview of each blog post and include it on the home page

10.5.1 Bundling Bootstrap

Bundling Bootstrap is simple. We define a bootstrap target to download the file and include it in our dist:

build.mill
- val bootstrapCss = link(
-   rel := "stylesheet",
-   href := "https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.css"
- )
+ def bootstrap = Task{
+   os.write(
+     Task.dest / "bootstrap.css",
+     requests.get(
+       "https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.css"
+     )
+   )
+   PathRef(Task.dest / "bootstrap.css")
+ }
10.25.scala
build.mill
 def dist = Task {
   for post <- posts() do
     os.copy(post.path, Task.dest / "post" / post.path.last, createFolders = true)
   os.copy(index().path, Task.dest / "index.html")
+  os.copy(bootstrap().path, Task.dest / "bootstrap.css")
   PathRef(Task.dest)
 }
10.26.scala

And then update our two bootstrapCss links to use a local URL:

build.mill
-head(bootstrapCss),
+head(link(rel := "stylesheet", href := "../bootstrap.css")),
10.27.scala
build.mill
-head(bootstrapCss),
+head(link(rel := "stylesheet", href := "bootstrap.css")),
10.28.scala

Now, when you run ./mill dist, you can see that the bootstrap.css file is downloaded and bundled with your dist folder, and we can see in the browser that we are now using a locally-bundled version of Bootstrap:

$ find out/dist.dest -type f
out/dist.dest/bootstrap.css
out/dist.dest/index.html
out/dist.dest/post/my-first-post.html
out/dist.dest/post/my-second-post.html
out/dist.dest/post/my-third-post.html
10.29.bash

(Image: LocalBootstrap.png)

Since it does not depend on any Task.Source, the bootstrap = Task{} target never invalidates. This is usually what you want when depending on a stable URL like bootstrap/4.5.0. If you are depending on something unstable that needs to be regenerated every build, define it as a Task.Input{} task.
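For contrast, here is a hypothetical build.mill fragment (not part of our blog build; the name and URL are invented) showing how such an unstable dependency would be declared:

```scala
// A Task.Input is re-evaluated on every run, so downstream targets
// invalidate whenever the fetched contents actually change.
def unstableCss = Task.Input{
  requests.get("https://example.com/latest/style.css").text()
}
```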

We now have the following build pipeline, with the additional bootstrap step:

(Build graph:
  1 - My First Post.md → post[1].render → dist
  2 - My Second Post.md → post[2].render → dist
  3 - My Third Post.md → post[3].render → dist
  bootstrap → dist
  index → dist)

10.5.2 Post Previews

To render a paragraph preview of each blog post in the index.html page, the first step is to generate such a preview for each PostModule. We will simply take everything before the first empty line in the Markdown file, treat that as the "first paragraph" of the post, and feed it through our Markdown parser:

build.mill
 trait PostModule extends Cross.Module[String]:
   def number = crossValue
   val Some((_, suffix, markdownPath)) = postInfo.find(_(0) == number)
   def path = Task.Source(markdownPath)
+  def preview = Task{
+    val parser = org.commonmark.parser.Parser.builder().build()
+    val firstPara = os.read.lines(path().path).takeWhile(_.nonEmpty)
+    val document = parser.parse(firstPara.mkString("\n"))
+    val renderer = org.commonmark.renderer.html.HtmlRenderer.builder().build()
+    val output = renderer.render(document)
+    output
+  }
   def render = Task{
10.30.scala

Here we are leaving the preview as a plain String output, rather than writing it to a file and returning a PathRef.
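The slicing logic can be isolated as a plain helper function (a sketch; the firstParagraph name is ours, not the book's):

```scala
// Keep the lines up to the first blank line: the post's "first paragraph".
def firstParagraph(markdown: String): String =
  markdown.linesIterator.takeWhile(_.nonEmpty).mkString("\n")
```

For a post that opens with a short summary followed by a blank line, only that summary gets fed through the Markdown renderer.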

Next, we need to aggregate the previews the same way we aggregated the renders earlier:

build.mill
 def links = Task.Input{ postInfo.map(_(1)) }
+val previews = Task.sequence(postInfo.map(_(0)).map(post(_).preview))
 def index = Task{
10.31.scala

Lastly, in index, zip each preview together with the corresponding entry in links in order to render them:

build.mill
-for suffix <- links()
-yield h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix))
+for (suffix, preview) <- links().zip(previews())
+yield frag(
+  h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix)),
+  raw(preview) // include markdown-generated HTML "raw" without HTML-escaping
+)
10.32.scala

Now we get pretty previews in index.html!

(Image: PostPreviews.png)

The build pipeline now looks like:

(Build graph: each post's .md file feeds both post[n].render → dist and post[n].preview → index; bootstrap → dist; index → dist)

Note how we now have both post[n].preview and post[n].render targets, with the preview targets being used in index to generate the home page and the render targets only being used in the final dist. As we saw earlier, any change to a file only results in that file's downstream targets being re-generated, saving time over naively re-generating the entire static site from scratch. This also demonstrates the value that Mill modules (10.2) bring, allowing repetitive sets of targets like preview and render to be defined for all blog posts without boilerplate.

10.5.3 A Complete Static Site Pipeline

Here's the complete code, with the repetitive org.commonmark.parser.Parser.builder() code extracted into a shared def renderMarkdown function, and the repetitive HTML rendering code extracted into a shared def renderHtmlPage function:

build.mill
//| mill-version: 1.0.6
//| mvnDeps:
//| - com.lihaoyi::scalatags:0.13.1
//| - org.commonmark:commonmark:0.26.0
import mill.*
import mill.api.BuildCtx, BuildCtx.workspaceRoot
import scalatags.Text.all.*

def mdNameToHtml(name: String) =
  name.replace(" ", "-").toLowerCase + ".html"

val postInfo = BuildCtx.watchValue {
  os.list(workspaceRoot / "post")
    .map: p =>
      val s"$prefix - $suffix.md" = p.last
      (prefix, suffix, p)
    .sortBy(_(0).toInt)
}
def bootstrap = Task{
  os.write(
    Task.dest / "bootstrap.css",
    requests.get(
      "https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.css"
    )
  )
  PathRef(Task.dest / "bootstrap.css")
}
def renderMarkdown(s: String) = {
  val parser = org.commonmark.parser.Parser.builder().build()
  val document = parser.parse(s)
  val renderer = org.commonmark.renderer.html.HtmlRenderer.builder().build()
  renderer.render(document)
}
def renderHtmlPage(dest: os.Path, bootstrapUrl: String, contents: Frag*) = {
  os.write(
    dest,
    doctype("html")(
      html(head(link(rel := "stylesheet", href := bootstrapUrl)), body(contents))
    )
  )
  PathRef(dest)
}
object post extends Cross[PostModule](postInfo.map(_(0)))
trait PostModule extends Cross.Module[String]:
  def number = crossValue
  val Some((_, suffix, markdownPath)) = postInfo.find(_(0) == number)
  def path = Task.Source(markdownPath)
  def preview = Task{
    renderMarkdown(os.read.lines(path().path).takeWhile(_.nonEmpty).mkString("\n"))
  }
  def render = Task{
    renderHtmlPage(
      Task.dest / mdNameToHtml(suffix),
      "../bootstrap.css",
      h1(a(href := "../index.html")("Blog"), " / ", suffix),
      raw(renderMarkdown(os.read(path().path)))
    )
  }

def links = Task.Input{ postInfo.map(_(1)) }
val posts = Task.sequence(postInfo.map(_(0)).map(post(_).render))
val previews = Task.sequence(postInfo.map(_(0)).map(post(_).preview))

def index = Task {
  renderHtmlPage(
    Task.dest / "index.html",
    "bootstrap.css",
    h1("Blog"),
    for (suffix, preview) <- links().zip(previews())
    yield frag(
      h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix)),
      raw(preview) // include markdown-generated HTML "raw" without HTML-escaping
    )
  )
}
def dist = Task {
  for post <- posts() do
    os.copy(post.path, Task.dest / "post" / post.path.last, createFolders = true)
  os.copy(index().path, Task.dest / "index.html")
  os.copy(bootstrap().path, Task.dest / "bootstrap.css")
  PathRef(Task.dest)
}
10.33.scala

10.6 Conclusion

In this chapter, we have learned how to define simple incremental build pipelines using Mill. We then took the script in Chapter 9: Self-Contained Scala Scripts and converted it into a Mill build pipeline. Unlike a naive script, this pipeline allows fast incremental updates whenever the underlying sources change, along with easy parallelization, all in less than 90 lines of code. We have also seen how to extend the Mill build pipeline, adding additional build steps to do things like bundling CSS files or showing post previews, all while preserving the efficient incremental nature of the build pipeline.

Mill is a general-purpose build tool and can be used to create build pipelines for all sorts of data. In later chapters we will be using the Mill build tool to compile Java and Scala source code into executables. For a more thorough reference, you can browse the Mill online documentation.

This chapter marks the end of the second section of this book: Part II Local Development. You should hopefully be confident using the Scala programming language to perform general housekeeping tasks on a single machine, manipulating files, subprocesses, and structured data to accomplish your goals. The next section of this book, Part III Web Services, will explore using Scala in a networked, distributed world: where your fundamental tools are not files and folders, but HTTP APIs, servers, and databases.

Exercise: Mill tasks can also take command-line arguments, by defining def name(...) = Task.Command{...} methods. Similar to @main methods in Scala files, the arguments to name are taken from the command line. Define a Task.Command in our build.mill that allows the user to specify a remote git repository from the command line, and uses os.call operations to push the static site to that repository.

See example 10.8 - Push

Exercise: You can use the Puppeteer Javascript library to convert HTML web pages into PDFs, e.g. for printing or publishing as a book. Integrate Puppeteer into our static blog, using the subprocess techniques we learned in Chapter 7: Files and Subprocesses, to add a ./mill pdfs target that creates a PDF version of each of our blog posts.

Puppeteer can be installed via npm, and its documentation can be found online.

The following script can be run via node, assuming you have the puppeteer library installed via NPM, and takes a src HTML file path and dest output PDF path as command line arguments to perform the conversion from HTML to PDF:

const puppeteer = require('puppeteer');
const [src, dest] = process.argv.slice(2)
puppeteer.launch().then(async function(browser){
  const page = await browser.newPage();
  await page.goto("file://" + src, {waitUntil: 'load'});
  await page.pdf({path: dest, format: 'A4'});
  process.exit(0)
})
10.34.javascript
See example 10.9 - PostPdf

Exercise: The Apache PDFBox library is a convenient way to manipulate PDFs from Java or Scala code, and can easily be added as a dependency with --dep for use with Scala CLI REPL or scripts via the coordinates org.apache.pdfbox:pdfbox:2.0.18. Add a new target to our build pipeline that uses the class org.apache.pdfbox.multipdf.PDFMergerUtility from PDFBox to concatenate the PDFs for each individual blog post into one long multi-page PDF that contains all of the blog posts one after another.

See example 10.10 - ConcatPdf
Discuss Chapter 10 online at https://www.handsonscala.com/discuss/10