9

Self-Contained Scala Scripts


9.1 Reading Files Off Disk
9.2 Rendering HTML with Scalatags
9.3 Rendering Markdown with Commonmark-Java
9.4 Links and Bootstrap
9.5 Optionally Deploying the Static Site

os.write(
  os.pwd / "out/index.html",
  doctype("html")(
    html(
      body(
        h1("Blog"),
        for (_, suffix, _) <- postInfo
        yield h2(a(href := ("post/" + mdNameToHtml(suffix)))(suffix))
      )
    )
  )
)
9.1.scala

Snippet 9.1: rendering an HTML page using the third-party Scalatags HTML library

Scala CLI Scripts are a great way to write small programs. Each script is self-contained and can download its own dependencies when necessary, and make use of both Java and Scala libraries. This lets you write and distribute scripts without spending time fiddling with build configuration or library installation.

In this chapter, we will write a static site generator script that uses third-party libraries to process Markdown input files and generate a set of HTML output files, ready for deployment on any static file hosting service. This will form the foundation for Chapter 10: Static Build Pipelines, where we will turn the static site generator into an efficient incremental build pipeline by using the Mill build tool.

We will start from the simplest possible working script:

Blog.scala
println("Hello!")
$ ./mill Blog.scala
Hello!
9.2.bash

Starting from this simple script, we will extend it to:

  • Read markdown files from the filesystem
  • Generate the HTML page skeleton using the Scalatags library
  • Parse and render the markdown using the Atlassian Commonmark library
  • Link our pages together using hyperlinks and add some CSS to make it look good
  • Add an optional flag that deploys the static site to a Git repository

9.1 Reading Files Off Disk

Typically, static site generators take their input as markdown files, possibly with additional metadata, and use that to generate HTML. For this exercise, let's assume that there will be a post/ folder that contains any markdown files we want to convert into blog posts, and each one will be named following the convention:

  • 1 - My First Post.md
  • 2 - My Second Post.md
  • 3 - My Third Post.md
  • etc.

The number before the " - " indicates the order of the blog post in the final site, while the text after it indicates the title of the post. We can create some sample posts at the command line as follows:

$ mkdir post
$ touch "post/1 - My First Post.md"
$ touch "post/2 - My Second Post.md"
$ touch "post/3 - My Third Post.md"

$ ls
Blog.scala	post

$ ls post/
1 - My First Post.md	2 - My Second Post.md	3 - My Third Post.md
9.3.bash

Finding these posts is easy with the filesystem operations provided by OS-Lib, which can be imported directly into the script file with the //> using dep ... directive at the top of the file:

Blog.scala
//> using dep com.lihaoyi::os-lib:0.11.6
val postInfo = os
  .list(os.pwd / "post")
  .map: p =>
    val s"$prefix - $suffix.md" = p.last
    (prefix, suffix, p)
  .sortBy(_(0).toInt)

println("POSTS")
postInfo.foreach(println)
9.4.scala

Here, we are listing all files in the post/ folder, splitting each file name on the " - " separator, and using the two segments as the number and name of each blog post. We can run the script to see that it is able to understand the layout of the blog posts, extract their number and name, and sort them in order.

$ ./mill --watch Blog.scala
POSTS
(1,My First Post,/Users/haoyi/test/post/1 - My First Post.md)
(2,My Second Post,/Users/haoyi/test/post/2 - My Second Post.md)
(3,My Third Post,/Users/haoyi/test/post/3 - My Third Post.md)
Program exited with return code 0.
Watching sources, press Ctrl+C to exit, or press Enter to re-run.
9.5.bash

Note that in the above snippet we used the --watch flag. With --watch, pressing Enter re-runs the script, which is useful when, for example, the contents of the post/ folder have changed. The script is also re-run whenever you edit its source file.

The flag can make iterating on the script a much smoother experience than having to re-run the script manually each time.
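
The file-name parsing used above can also be tried in isolation. The following is a minimal sketch using only the Scala standard library; the file names and the parsePostName helper are hypothetical and stand in for the result of listing the post/ folder. Unlike the val pattern in the script, which throws a MatchError on a non-conforming name, this variant returns None for files that do not follow the convention:

```scala
// Hypothetical file names, standing in for os.list(os.pwd / "post").map(_.last)
val fileNames = Seq(
  "10 - Tenth Post.md",
  "2 - My Second Post.md",
  "notes.txt",
  "1 - My First Post.md"
)

// Hypothetical helper: None for names that do not follow "<number> - <title>.md"
def parsePostName(fileName: String): Option[(Int, String)] = fileName match
  case s"$prefix - $suffix.md" => prefix.toIntOption.map(n => (n, suffix))
  case _ => None

// Sorting on the Int places post 10 after post 2,
// which sorting on the raw String prefix would not
val posts = fileNames.flatMap(parsePostName).sortBy(_._1)
```

This also shows why the script converts the prefix with .toInt before sorting: compared as strings, "10" would sort before "2".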

See example 9.1 - Printing

9.2 Rendering HTML with Scalatags

A static site generator needs to generate a static site, and static sites are made of HTML pages. We could generate HTML by stitching together strings like "<div>" + content + "</div>", but doing so is tedious and unsafe, prone to XSS injection if we're not careful. Luckily, in Scala scripts we can easily import whatever Java or Scala libraries we want. For now we will import the Scalatags HTML generation library.

We do this by listing com.lihaoyi::scalatags:0.13.1 under //| mvnDeps: at the top of the script. This header is a special syntax for downloading third-party dependencies, and in this case we use it to download the Scalatags HTML library. After the header, we then import scalatags.Text.all.* to bring the necessary functions into scope, which lets us use Scalatags to generate our first HTML file:

Blog.scala
+//| mvnDeps:
+//| - com.lihaoyi::scalatags:0.13.1
+import scalatags.Text.all.*
-println("POSTS")
-postInfo.foreach(println)
+os.remove.all(os.pwd / "out")
+os.makeDir.all(os.pwd / "out/post")
+os.write(
+  os.pwd / "out/index.html",
+  doctype("html")(
+    html(
+      body(
+        h1("Blog"),
+        for (_, suffix, _) <- postInfo
+        yield h2(suffix)
+      )
+    )
+  )
+)
9.6.scala
See example 9.2 - Index

This snippet writes a small HTML blob, with <html>, <body> and <h1>/<h2> tags, to an out/index.html file. Each function call html(...), body(...), h1(...), etc. defines a pair of opening/closing HTML tags, with strings such as "Blog" becoming the textual contents of the enclosing tag. Put together, this constructs a Scalatags fragment, or Frag. Scalatags Frags satisfy the Writable interface, and can be directly written to a file via os.write, or serialized into an in-memory String via .render.

For now we will be using an out/ folder to store all our output HTML files, so every run we will first delete the out/ folder with os.remove.all and re-create the out/ and out/post/ folders with os.makeDir.all in preparation for writing the HTML files.

We can run the script to see it in action:

$ ./mill Blog.scala

$ cat out/index.html
<!DOCTYPE html><html><body>
<h1>Blog</h1><h2>My First Post</h2><h2>My Second Post</h2>
<h2>My Third Post</h2></body></html>
9.7.bash

FirstHtml.png

9.3 Rendering Markdown with Commonmark-Java

While the skeleton of the page is written in HTML using Scalatags, for long-form blog posts it is more convenient to write them in Markdown. As sample blog posts, we will take some generic text from the GitHub Markdown Guide:

post/1 - My First Post.md
Sometimes you want numbered lists:

1. One
2. Two
3. Three

Sometimes you want bullet points:

* Start a line with a star
* Profit!
9.8.md
post/2 - My Second Post.md
# Structured documents

Sometimes it's useful to have different levels of headings to structure your
documents. Start lines with a `#` to create headings. Multiple `##` in a row
denote smaller heading sizes.

### This is a third-tier heading
9.9.md
post/3 - My Third Post.md
There are many different ways to style code with GitHub's markdown. If you have
inline code blocks, wrap them in backticks: `var example = true`.  If you've got
a longer block of code, you can indent with four spaces:

    if (isAwesome) {
      return true
    }
9.10.md

Perhaps not the most insightful thought-pieces, but they will do for now. The next question would be, how can we parse the markdown? There are perfectly good markdown parsers in Java, and we can pick any we want to use from Scala. For now, we will use the commonmark/commonmark-java library.

The library's readme gives you the Maven snippet necessary to use this parser:

<dependency>
    <groupId>org.commonmark</groupId>
    <artifactId>commonmark</artifactId>
    <version>0.26.0</version>
</dependency>
9.11.xml

This directly corresponds to the //> using dep directive:

//> using dep org.commonmark:commonmark:0.26.0

Note that it's a single : between the groupId and the artifactId, as this is a Java library. Scala libraries like Scalatags need a double :: instead.
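
As a rough illustration of the difference, the double :: is conventionally shorthand for appending the Scala binary version to the artifact name, while Java coordinates are used unchanged. The sketch below only mimics that convention with string matching. The expandCoordinate helper and its scalaBinaryVersion parameter are hypothetical, and real dependency resolution is performed by the build tool rather than by code like this:

```scala
// Illustrative only: mimics how the "::" shorthand is conventionally expanded.
// `scalaBinaryVersion` is an assumed parameter; "3" is the suffix for Scala 3 artifacts.
def expandCoordinate(coordinate: String, scalaBinaryVersion: String = "3"): String =
  coordinate match
    // Scala library: the artifact name gains a _<scalaBinaryVersion> suffix
    case s"$group::$artifact:$version" =>
      s"$group:${artifact}_$scalaBinaryVersion:$version"
    // Java library: the coordinate is already complete
    case javaCoordinate => javaCoordinate

val scalatagsCoordinate = expandCoordinate("com.lihaoyi::scalatags:0.13.1")
val commonmarkCoordinate = expandCoordinate("org.commonmark:commonmark:0.26.0")
```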

9.3.1 Translating Java Snippets to Scala

The Commonmark-Java library gives us some Java sample code to get started using the library:

import org.commonmark.node.*;
import org.commonmark.parser.Parser;
import org.commonmark.renderer.html.HtmlRenderer;

Parser parser = Parser.builder().build();
Node document = parser.parse("This is *Sparta*");
HtmlRenderer renderer = HtmlRenderer.builder().build();
renderer.render(document);  // "<p>This is <em>Sparta</em></p>\n"
9.12.java

Translating this Java code into Scala basically involves replacing all the local variables with vals. Starting a REPL with the library passed via --import lets us easily test this:

$ ./mill --import "org.commonmark:commonmark:0.26.0" --repl

> val parser = org.commonmark.parser.Parser.builder().build()

> val document = parser.parse("This is *Sparta*")

> val renderer = org.commonmark.renderer.html.HtmlRenderer.builder().build()

> val output = renderer.render(document)
output: String = """<p>This is <em>Sparta</em></p>
"""
9.13.scala

Now that we have it working, we can use this in our code: reading the .md files, transforming them into HTML and writing them into HTML files:

Blog.scala
 //| mvnDeps:
 //| - com.lihaoyi::scalatags:0.13.1
+//| - org.commonmark:commonmark:0.26.0
 import scalatags.Text.all.*
9.14.scala
Blog.scala
+def mdNameToHtml(name: String) =
+  name.replace(" ", "-").toLowerCase + ".html"
+
+for (_, suffix, path) <- postInfo do
+  val parser = org.commonmark.parser.Parser.builder().build()
+  val document = parser.parse(os.read(path))
+  val renderer = org.commonmark.renderer.html.HtmlRenderer.builder().build()
+  val output = renderer.render(document)
+  os.write(
+    os.pwd / "out/post" / mdNameToHtml(suffix),
+    doctype("html")(
+      html(
+        body(
+          h1("Blog", " / ", suffix),
+          raw(output)
+        )
+      )
+    )
+  )
9.15.scala

You can see the new for loop in the middle with all the code adapted from the commonmark/commonmark-java docs, basically verbatim. We are converting the "raw" names of the files to URL-friendly names in the mdNameToHtml method. For now we will ignore the possibility of name collisions.
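
To make the name-collision caveat concrete, here is a small sketch that reuses mdNameToHtml as defined above and checks a list of titles for clashes. The titles are hypothetical, chosen so that several of them map to the same output file name:

```scala
def mdNameToHtml(name: String) =
  name.replace(" ", "-").toLowerCase + ".html"

// Hypothetical post titles, chosen so that several map to the same file name
val titles = Seq("My First Post", "my first post", "My-First Post", "Another Post")

// Group titles by generated file name, keeping only names claimed by more than one title
val collisions = titles
  .groupBy(mdNameToHtml)
  .filter { case (_, sources) => sources.size > 1 }
```

If collisions were non-empty for real posts, later posts would silently overwrite earlier ones in out/post/, so a check along these lines is one possible way to fail early.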

Note that we are including the generated HTML strings provided by commonmark/commonmark-java wrapped in raw(...). By default any strings we include in Scalatags fragments are sanitized, which protects you from Cross-Site Scripting and other attacks that arise from unexpected HTML being present in the strings. However, in this case we want the HTML in the rendered markdown, and thus we use raw(...) to opt in to including the rendered markdown un-sanitized.
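
The difference between sanitized and un-sanitized inclusion can be sketched with plain strings. The escapeHtml helper below is hypothetical and is not how Scalatags is implemented; it only illustrates the kind of escaping that sanitization implies, and what skipping it means:

```scala
// Hypothetical helper for illustration; this is not Scalatags' implementation
def escapeHtml(text: String): String =
  text.iterator.map {
    case '<' => "&lt;"
    case '>' => "&gt;"
    case '&' => "&amp;"
    case '"' => "&quot;"
    case other => other.toString
  }.mkString

val untrusted = "<script>alert(1)</script>"

// Sanitized: the markup characters are escaped, so the browser shows them as text
val sanitized = "<h2>" + escapeHtml(untrusted) + "</h2>"

// Un-sanitized: the string is included verbatim, so the browser would run the script
val unsanitized = "<h2>" + untrusted + "</h2>"
```

Including the rendered markdown verbatim is appropriate here only because we trust the markdown files in our own post/ folder.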

9.3.2 Testing our Java Markdown Parser

Running this, it will download the commonmark/commonmark-java library the first time, and use it to render our markdown blog posts to HTML:

$ ./mill Blog.scala

$ find out -type f
out/index.html
out/post/my-first-post.html
out/post/my-second-post.html
out/post/my-third-post.html
9.16.bash

We can see on the filesystem that our my-first-post.html, my-second-post.html and my-third-post.html files are all in place. We can browse the generated HTML below:

$ cat out/post/my-first-post.html
<!DOCTYPE html><html><body>
<h1>Blog / My First Post</h1>
<p>Sometimes you want numbered lists:</p>
<ol>
<li>One</li>
<li>Two</li>
<li>Three</li>
</ol>
<p>Sometimes you want bullet points:</p>
<ul>
<li>Start a line with a star</li>
<li>Profit!</li>
</ul>
</body></html>
9.17.xml

Or open them in the browser:

Post.png

See example 9.3 - Markdown

9.4 Links and Bootstrap

9.4.1 Links

To turn our folder of generated HTML files into a proper static site, we need to add links between the pages. At the very least, we need links from the index.html page to each individual blog post, and a link from each post back to the index.html page. This is a matter of adding an <a href> tag inside the h2 header tags in index.html, with the href being the relative path to the HTML file of each individual blog post:

Blog.scala
     html(
       body(
         h1("Blog"),
         for (_, suffix, _) <- postInfo
-        yield h2(suffix)
+        yield h2(a(href := ("post/" + mdNameToHtml(suffix)), suffix))
       )
     )
9.18.scala

:= is a custom operator provided by Scalatags, and is used to specify HTML attributes and styles. We need to perform a similar change on each individual post's HTML page:

Blog.scala
       html(
         body(
-          h1("Blog", " / ", suffix),
+          h1(a(href := "../index.html")("Blog"), " / ", suffix),
           raw(output)
         )
       )
9.19.scala

After re-generating the HTML files using ./mill Blog.scala, we can see that the post listing in index.html links to the respective post's HTML files:

IndexLinkToPost.png

And each individual post has the Blog header at the top left linking back to the index.html page:

PostLinkToIndex.png

See example 9.4 - Links
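
The two relative paths used in the href attributes above, "post/..." and "../index.html", follow from where each page lives inside out/. As a quick sanity check, and only as a sketch using java.nio.file from the standard library rather than anything the script itself needs, they can be derived as follows. The output assumes a Unix-style path separator:

```scala
import java.nio.file.Paths

val indexPage = Paths.get("out", "index.html")
val postPage = Paths.get("out", "post", "my-first-post.html")

// A link is resolved relative to the folder containing the page it appears in
val indexToPost = indexPage.getParent.relativize(postPage).toString
val postToIndex = postPage.getParent.relativize(indexPage).toString
```

Because both links are relative, the out/ folder can be served from any base URL or opened straight from disk without changing the generated HTML.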

9.4.2 Bootstrap

To make our static site look a bit more presentable, we can layer on some Bootstrap CSS over our ugly unstyled page, in order to pretty it up. The Bootstrap getting started page provides the following HTML fragment:

<link
    rel="stylesheet"
    href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.css"
>
9.20.html

Which translates into the following Scalatags fragments that we need to include in our head tag:

Blog.scala
+val bootstrapCss = link(
+  rel := "stylesheet",
+  href := "https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.css"
+)
9.21.scala
Blog.scala
   os.write(
     os.pwd / "out/post" / mdNameToHtml(suffix),
     doctype("html")(
       html(
+        head(bootstrapCss),
         body(
           h1(a("Blog", href := "../index.html"), " / ", suffix),
9.22.scala
Blog.scala
 os.write(
   os.pwd / "out/index.html",
   doctype("html")(
     html(
+      head(bootstrapCss),
       body(
         h1("Blog"),
9.23.scala

Here, we're including the link as described in the Bootstrap docs, converted to Scalatags template syntax. We can see it take effect in the browser:

Bootstrap.png

You can paste the code so far into a Blog.scala file, put markdown files following the naming convention (1 - Hello.md, 2 - Post.md) in the post/ folder, and run ./mill Blog.scala to generate the HTML pages. Once generated, those pages can go anywhere: viewed locally, pushed to GitHub Pages, or deployed elsewhere. The website we are generating is static and can be deployed on any static content host.

The first time you run the script, it will take 2-3 seconds to compile. After that first compile, executing the script should take about a second. You can edit the markdown files and the HTML pages will be re-generated quickly.

See example 9.5 - Bootstrap

9.5 Optionally Deploying the Static Site

Scala CLI Scripts can read command line arguments from a magic args variable, an array of strings corresponding to the arguments passed after --.

Using the Mainargs library, we can process these arguments, and even produce a help text automatically. We will define a method with the @mainargs.main annotation for our Blog.scala script that allows the user to specify a remote Git repository from the command line, and uses os.call operations to push the static site to that repository.

This can be done as follows:

Blog.scala
 //| mvnDeps:
 //| - com.lihaoyi::scalatags:0.13.1
 //| - org.commonmark:commonmark:0.26.0
 import scalatags.Text.all.*
+import mainargs.*

+ParserForMethods(this).runOrExit(args) // `args` is available at the top-level

+@main
+def main(targetGitRepo: String = "") =
   ...
+
+  if targetGitRepo != "" then
+    os.call(cmd = ("git", "init"), cwd = os.pwd / "out")
+    os.call(cmd = ("git", "add", "-A"), cwd = os.pwd / "out")
+    os.call(cmd = ("git", "commit", "-am", "."), cwd = os.pwd / "out")
+    os.call(cmd = ("git", "push", targetGitRepo, "HEAD", "-f"), cwd = os.pwd / "out")
9.24.scala

Rather than writing our code top-level, we put it inside a @main method that takes parameters. Parameter types can be simple primitives (Strings, Ints, Booleans, etc.) and parameters can have default values to make them optional. In this case, the default value targetGitRepo = "" simply skips the deployment step if the user does not pass in that argument from the command line.
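
To give a feel for what the library saves us from writing, here is a hand-rolled sketch of the same optional-flag behavior using only the standard library. The parseTargetGitRepo helper is hypothetical and far less capable than Mainargs, which derives the flag name, help text and error handling from the method signature:

```scala
// Hypothetical hand-rolled equivalent, for illustration only
def parseTargetGitRepo(args: Seq[String]): String =
  args
    .sliding(2) // look at each adjacent (flag, value) pair of arguments
    .collectFirst { case Seq("--target-git-repo", repo) => repo }
    .getOrElse("") // same default as the `targetGitRepo` parameter

val noDeploy = parseTargetGitRepo(Seq())
val deploy = parseTargetGitRepo(Seq("--target-git-repo", "git@github.com:lihaoyi/test.git"))
```

Even this single flag needs care around missing values and defaults, which is why delegating to a library becomes attractive as soon as a script takes more than one argument.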

Blog.scala can now be called as follows:

$ ./mill Blog.scala # Generate the static site, do not deploy it

$ ./mill Blog.scala --target-git-repo git@github.com:lihaoyi/test.git

$ ./mill Blog.scala --help
main
  --target-git-repo <str>
9.25.bash

This code is suitable for deployment to a GitHub Pages site, where the static site content is hosted in a Git repository. If you want to deploy elsewhere, it should be straightforward to adapt to whatever deployment logic you need.

The final static site generator script now looks like this:

Blog.scala
//| mvnDeps:
//| - com.lihaoyi::scalatags:0.13.1
//| - org.commonmark:commonmark:0.26.0
import scalatags.Text.all.*

def main(targetGitRepo: String = "") =
  val postInfo = os
    .list(os.pwd / "post")
    .map: p =>
      val s"$prefix - $suffix.md" = p.last
      (prefix, suffix, p)
    .sortBy(_(0).toInt)

  def mdNameToHtml(name: String) =
    name.replace(" ", "-").toLowerCase + ".html"

  val bootstrapCss = link(
    rel := "stylesheet",
    href := "https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.css"
  )

  os.remove.all(os.pwd / "out")
  os.makeDir.all(os.pwd / "out/post")

  for (_, suffix, path) <- postInfo do
    val parser = org.commonmark.parser.Parser.builder().build()
    val document = parser.parse(os.read(path))
    val renderer = org.commonmark.renderer.html.HtmlRenderer.builder().build()
    val output = renderer.render(document)
    os.write(
      os.pwd / "out/post" / mdNameToHtml(suffix),
      doctype("html")(
        html(
          head(bootstrapCss),
          body(
            h1(a(href := "../index.html")("Blog"), " / ", suffix),
            raw(output)
          )
        )
      )
    )

  os.write(
    os.pwd / "out/index.html",
    doctype("html")(
      html(
        head(bootstrapCss),
        body(
          h1("Blog"),
          for (_, suffix, _) <- postInfo
          yield h2(a(href := ("post/" + mdNameToHtml(suffix)), suffix))
        )
      )
    )
  )

  if targetGitRepo != "" then
    os.call(cmd = ("git", "init"), cwd = os.pwd / "out")
    os.call(cmd = ("git", "add", "-A"), cwd = os.pwd / "out")
    os.call(cmd = ("git", "commit", "-am", "."), cwd = os.pwd / "out")
    os.call(cmd = ("git", "push", targetGitRepo, "HEAD", "-f"), cwd = os.pwd / "out")
9.26.scala
See example 9.6 - Deploy

9.6 Conclusion

In this chapter, we've written a Scala script that implements a small static blog generator. It takes Markdown files in a folder and renders them into an HTML website we can host online. We used the filesystem APIs we learned about in Chapter 7: Files and Subprocesses, along with the third-party Scala library Scalatags to render HTML and the third-party Java library Commonmark to parse and render the markdown. The end result was a single self-contained script that can download its own dependencies and run in any environment without prior setup. Scala scripts are a great way to "try out" third-party libraries, testing them out without the complexity of a larger project or build system.

Next, in Chapter 10: Static Build Pipelines, we will re-visit this static site generator in order to make it incremental. This will allow us to re-use previously rendered HTML files, speeding up re-generation of the static site even as the number of posts and pages grows. We will re-visit the Scalatags HTML library in Chapter 14: Simple Web and API Servers, and use it to render the HTML of our interactive chat website.

Exercise: Use the filesystem last-modified timestamp of each blog post's .md file to automatically provide a "Written On YYYY-MM-DD" annotation at the bottom of each blog post, as well as under each post's preview on the index page.
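
As a possible starting point for the formatting part of this exercise, the sketch below turns a timestamp into the requested annotation using java.time. It assumes the last-modified time is available as milliseconds since the epoch and that UTC is an acceptable time zone; the writtenOn helper is hypothetical, and reading the timestamp from each file and wiring the result into the pages are left to the exercise:

```scala
import java.time.{Instant, LocalDate, ZoneOffset}

// Hypothetical helper: formats epoch milliseconds as "Written On YYYY-MM-DD".
// Assumes UTC; LocalDate.toString produces the ISO YYYY-MM-DD form.
def writtenOn(lastModifiedMillis: Long): String =
  val date = LocalDate.ofInstant(Instant.ofEpochMilli(lastModifiedMillis), ZoneOffset.UTC)
  s"Written On $date"

val epochAnnotation = writtenOn(0L)
```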

See example 9.7 - DeployTimestamp
Discuss Chapter 9 online at https://www.handsonscala.com/discuss/9