
You can now browse my site by tags!

Ever since I saw the way Irregular Webcomic lets one browse through the list of strips by tag ("theme"), I knew I wanted to do something similar for my blog too. The idea is that for each tag, I can have a separate "Previous"/"Next" link, instead of only having a Previous/Next link for moving between successive blog articles. Sort of like a skip list! 😁

That would be simple enough... if I didn't have a wildly complex website build system which insists that each Markdown file is turned into an HTML file separately, in isolation, without access to any of the other Markdown files.

In an earlier post, I explored the newest version of said build system. It first parses each Markdown file into Pandoc's internal JSON format, then feeds the parsed JSON files into a homegrown Lua template system that outputs the final HTML files. A few extra templates generate the blog's fancy listing page as well as the Atom feed.

The whole build happens through Tup, a make-like build tool that tracks dependencies through fancy FUSE mounts, which lets it do very fast incremental rebuilds without running the risk of missing a dependency.

*inhales*

...That said, Tup can be overzealous at times: if something depends on just part of a file, it still gets rebuilt when a different part of that file is updated, as Tup has no way to tell the difference.

The article and tag processing pipeline

Extracting the tags (with Jq)

In order to add tags, I figured I would need to store a list of all pages sharing the same tag. Then, when I'm rendering an individual page, I can search the list for the current page, and use its position to get the adjacent Next and Previous pages.

To generate the list, I needed to get the parsed pages' JSON files (that already include the tags as metadata), then aggregate them into one large list, outputting it as another JSON file. While I could have used Pandoc for that, I decided that for JSON manipulation, I should probably use jq instead, since jq is the Swiss Army knife in my toolbox for manipulating JSON files, just like Pandoc is the Swiss Army knife for manipulating markup formats.

On my website, tags are part of the "metadata" of a page, which looks something like this:

---
tags:
  - website
  - 100DaysToOffload
---

...Rest of the markdown file...

The tags of the page then get parsed into a Pandoc JSON file, which ends up looking like this:

{
  "pandoc-api-version": [1, 23, 1],
  "meta": {
    "tags": {
      "t": "MetaList",
      "c": [
        {
          "t": "MetaInlines",
          "c": [ { "t": "Str", "c": "website" } ]
        },
        {
          "t": "MetaInlines",
          "c": [ { "t": "Str", "c": "100DaysToOffload" } ]
        }
      ]
    }
  },
  "blocks": [...]
}

...it's a mess. 🤡 But hey, I can see the "website" and "100DaysToOffload" strings that I want to extract; surely it won't be that bad? 😂
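To see why the strings sit three levels deep, it can help to walk the structure by hand in a general-purpose language first. Here's a rough Python sketch of that walk. The `extract_tags` helper is a name I made up for illustration, and it assumes every tag is a single word (one Str inline), as in the JSON above:

```python
import json

# A trimmed-down stand-in for the parsed article (only the parts we need).
ARTICLE_JSON = """
{
  "meta": {
    "tags": {
      "t": "MetaList",
      "c": [
        {"t": "MetaInlines", "c": [{"t": "Str", "c": "website"}]},
        {"t": "MetaInlines", "c": [{"t": "Str", "c": "100DaysToOffload"}]}
      ]
    }
  },
  "blocks": []
}
"""


def extract_tags(doc):
    """Collect the text of every Str inline inside every tag (hypothetical helper)."""
    return [
        inline["c"]                          # level 3: the text of a Str inline
        for tag in doc["meta"]["tags"]["c"]  # level 1: each MetaInlines in the MetaList
        for inline in tag["c"]               # level 2: each inline inside that tag
    ]


tags = extract_tags(json.loads(ARTICLE_JSON))
print(tags)
```

Every "t"/"c" pair is a type name and its contents, so getting to the actual text means unwrapping "c" once per level of nesting.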

Extracting the tags of a page can be done by chaining enough of Jq's filters. For example, I can use a command like the following:

jq '.meta.tags.c[].c[].c' build/blog/article.json
"website"
"100DaysToOffload"

I can also add the path to the page, which I've thoughtfully added as a metadata field called path already:

jq '[.meta.tags.c[].c[].c, .meta.path.c]' build/blog/article.json
[ "website",  "100DaysToOffload", "/blog/article/" ]

Err... close enough. I would really want to have one path for every tag. I want to first iterate over tags, then make arrays including the path; if I use an as binding and a pipeline it somehow works: (Jq magic at its finest)

jq '.meta.tags.c[].c[].c as $tag | [$tag, .meta.path.c]' build/blog/article.json
[ "website", "/blog/article/" ]
[ "100DaysToOffload", "/blog/article/" ]

Now I only need to aggregate the tags into an object. Looking around Jq's documentation, I find the reduce feature, which should do the trick: (from_entries is not an option, since we need to get an array of paths that all share the same key)

jq 'reduce (.meta.tags.c[].c[].c as $tag | [$tag, .meta.path.c]) as $i ({}; .[$i[0]] |= . + [$i[1]])' build/blog/article.json
{
  "website": [
    "/blog/article/"
  ],
  "100DaysToOffload": [
    "/blog/article/"
  ]
}

Perfect!

Now I just need to input multiple articles, and the list of articles will be complete! Thankfully, jq supports multiple input files out of the box:

jq 'reduce (.meta.tags.c[].c[].c as $tag | [$tag, .meta.path.c]) as $i ({}; .[$i[0]] |= . + [$i[1]])' build/blog/article.json build/blog/other-article.json
{
  "website": [
    "/blog/article/"
  ],
  "100DaysToOffload": [
    "/blog/article/"
  ]
}
{
  "other-tag": [
    "/blog/other-article/"
  ],
  "100DaysToOffload": [
    "/blog/other-article/"
  ]
}

🤔 That's not what I wanted, though! I need to output just one combined JSON, in which the two 100DaysToOffload lists are merged.

Browsing Jq's documentation one last time, I find the -n/--null-input flag which, when used with the inputs function, can produce e.g. an array of all inputs. Seems useful; retrying:

jq -n 'reduce (inputs | .meta.tags.c[].c[].c as $tag | [$tag, .meta.path.c]) as $i ({}; .[$i[0]] |= . + [$i[1]])' build/blog/article.json build/blog/other-article.json
{
  "website": [
    "/blog/article/"
  ],
  "100DaysToOffload": [
    "/blog/article/",
    "/blog/other-article/"
  ],
  "other-tag": [
    "/blog/other-article/"
  ]
}

Hooray! We have a list of all tags!
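If the jq one-liner feels too magical, the same aggregation can be spelled out in Python. This is only a sketch of the idea, not what my build runs: the `tag_path_pairs` and `build_tag_index` names are made up, and the documents are simplified to plain dicts instead of the nested Pandoc JSON.

```python
from collections import defaultdict


def tag_path_pairs(doc):
    """Yield one (tag, path) pair per tag, like `... as $tag | [$tag, .meta.path.c]`."""
    # Simplification: real Pandoc JSON wraps these values in "t"/"c" objects.
    for tag in doc["tags"]:
        yield tag, doc["path"]


def build_tag_index(docs):
    """Fold the pairs from all documents into one {tag: [paths]} mapping."""
    index = defaultdict(list)
    for doc in docs:                      # plays the role of `inputs`
        for tag, path in tag_path_pairs(doc):
            index[tag].append(path)       # same effect as `.[$tag] |= . + [$path]`
    return dict(index)


docs = [
    {"path": "/blog/article/", "tags": ["website", "100DaysToOffload"]},
    {"path": "/blog/other-article/", "tags": ["other-tag", "100DaysToOffload"]},
]

# One accumulator shared by every document: the merged result.
merged = build_tag_index(docs)

# One accumulator per document: what I got before adding -n and `inputs`.
separate = [build_tag_index([doc]) for doc in docs]

print(merged)
print(separate)
```

The difference between the two attempts is just where the accumulator lives. Without -n, jq runs the whole filter once per input file, so each file gets its own fresh `{}`. With -n and `inputs`, there is a single reduce that sees every file.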

Getting the tags to the template

Now that I had the list of tags, I needed to get them to the article template that was going to use them to display a "Browse more articles" section at the end of every article.

Typically, I would do that using a metadata file, which I pass to Pandoc together with the source Markdown file. But for the list of tags, doing it like so would result in a dependency cycle! After all, I need the source Markdown file to produce the page's JSON file, which I then use to produce the list of tags; I cannot use the list of tags at the step which parses the Markdown file.
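The cycle is easy to see once the steps are written down as a graph. Here's a small Python sketch using the standard library's graphlib; the file names are illustrative stand-ins, not the actual rules from my Tupfile.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the files in its set.
working = {
    "article.json": {"article.md"},
    "tags.json": {"article.json"},
    "article.html": {"article.json", "tags.json"},
}

# Feeding the tag list back into the Markdown-parsing step closes a loop:
# article.json needs tags.json, which needs article.json.
cyclic = {
    "article.json": {"article.md", "tags.json"},
    "tags.json": {"article.json"},
}

# A valid build order exists only when the graph has no cycle.
order = list(TopologicalSorter(working).static_order())
print(order)

try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print("cycle detected:", has_cycle)
```

In the working graph, the tag list only feeds into the last step, so every file can be built after the things it needs.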

Clearly, I need to inject the list of tags during the stage that takes the JSON file and produces an HTML page out of it; labeled pandoc lua in the diagram at the start of this article.
Unfortunately, this stage uses my homegrown Lua template system, which doesn't support metadata files.
Fortunately, my homegrown system supports a cascade of templates, where each template can recursively include the next template in the list. 😁

So, all I needed to do was create a template that would add the list of tags into the metadata of the page we are about to render:

---
template: true
---
<?
self = current_template() -- Get a reference to the currently processed document
doc = in_template() -- Include the next template recursively (in this case, the final page)
doc.meta.globalTags = self.meta.globalTags -- copy the list of tags into the metadata
self.meta = doc.meta -- copy the metadata into the currently-processed document
-- (that way, the template including this template would be able to use the metadata from the final page)
?>
<?= doc ?>
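For anyone who hasn't seen my template system, here's a toy model of the cascade idea in Python. It is a loose analogy under my own assumptions, not the real implementation: `render_cascade`, `include_next` and the three template functions are all invented for this sketch, with `include_next` standing in for in_template().

```python
def render_cascade(cascade):
    """Process cascade[0]; it may pull in the rest of the list via include_next()."""
    head, rest = cascade[0], cascade[1:]

    def include_next():
        return render_cascade(rest)

    return head(include_next)


def article(include_next):
    # Innermost document: metadata and a body, nothing further to include.
    return {"meta": {"path": "/blog/article/"}, "body": "<p>Hello</p>"}


def tags_template(include_next):
    own_meta = {"globalTags": {"website": ["/blog/article/"]}}
    doc = include_next()                                # include the article
    doc["meta"]["globalTags"] = own_meta["globalTags"]  # inject the tag list
    return doc                                          # hand doc and metadata outward


def page_template(include_next):
    doc = include_next()
    count = len(doc["meta"]["globalTags"]["website"])
    return f'<main>{doc["body"]}</main><footer>{count} article(s) tagged website</footer>'


html = render_cascade([page_template, tags_template, article])
print(html)
```

The middle template never renders anything itself; its only job is to smuggle extra metadata into the document on its way up to the page template.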

Then, I added a command to Tup which would parse the template using the generated tags metadata, and it was all set:

# List tags using Jq:
: build/article.json ... |> ^o^ jq '...command from before... | {globalTags: .}' %f > %o |> build/_tags.metadata.json

# Compile tags template using the metadata:
: tags.luatmpl.md | build/_tags.metadata.json |> pandoc %f --metadata-file=%i -o %o |> build/_tags.json

# Add the tags template to the cascade (so it processes the main template, which then uses the tags template, which then includes the article itself):
: build/article.json |> pandoc lua output.lua article.luatmpl.html build/_tags.json %f |> dist/article.html

Here, the ^o^ flag is very special (even if it looks like a face!). It instructs Tup not to rebuild the commands that depend on build/_tags.metadata.json unless the content of that file has actually changed. Given that build/_tags.metadata.json depends on every single page, I would rather not rebuild the whole site every time I make a small tweak to just one page.
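The idea behind that flag can be sketched in a few lines of Python. To be clear, this is my mental model and not how Tup works internally; the `Step` class and the hashing approach are assumptions made purely for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Step:
    """A build step whose dependents rerun only when its output bytes change."""

    def __init__(self, command):
        self.command = command      # callable producing the output bytes
        self.last_digest = None

    def run(self) -> bool:
        """Rerun the command; return True if dependents need rebuilding."""
        new_digest = digest(self.command())
        changed = new_digest != self.last_digest
        self.last_digest = new_digest
        return changed


pages = {"/blog/article/": {"tags": ["website"], "body": "first draft"}}


def list_tags() -> bytes:
    # Only tags and paths reach the output; article bodies do not.
    return repr(sorted((p, m["tags"]) for p, m in pages.items())).encode()


tags_step = Step(list_tags)
first_build = tags_step.run()                 # nothing cached yet: dependents run
pages["/blog/article/"]["body"] = "typo fix"  # an edit that leaves the tags alone
after_body_edit = tags_step.run()
pages["/blog/article/"]["tags"].append("technology")
after_tag_edit = tags_step.run()

print(first_build, after_body_edit, after_tag_edit)
```

The tag-listing step itself still reruns on every page edit, since it reads every page. The saving is that everything downstream of it stays put unless the list of tags really is different.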

Putting the tags on the page

The last part was the simplest; now that I had the tag data accessible in the page template, all I needed to do was use it to make a neatly formatted display. I experimented a bit with different wordings and designs, but my final template looks like the following:

<? for _, tag in ipairs(doc.meta.tags) do ?>
<p class="browse-more">
  <?
  -- Find the current document in the list of documents by tag
  local items = doc.meta.globalTags[stringify(tag)] or {}
  -- (if we don't find it (happens for WIP pages), we count it as one past the end)
  local doc_i = #items + 1
  for i,item in ipairs(items) do
    if stringify(item) == stringify(doc.meta.path) then
      doc_i = i
      break
    end
  end
  ?>
  <span class="current">
    Articles tagged #<?= tag ?> (<?= doc_i ?>/<?= #items ?>)
  </span>
  <? if items[doc_i - 1] then ?>
    <a class="prev" href="<?= items[doc_i - 1] ?>">← Previous</a>
  <? end ?>
  <? if items[doc_i + 1] then ?>
    <a class="next" href="<?= items[doc_i + 1] ?>">Next →</a>
  <? end ?>
  <? end ?>
</p>
<? end ?>
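Stripped of the template syntax, the lookup logic is small enough to check in isolation. Here's a Python sketch of it; `browse_links` is a hypothetical name, and the paths in the example are invented.

```python
def browse_links(tag_index, tag, path):
    """Return (position, total, previous, next) for `path` within `tag`."""
    items = tag_index.get(tag, [])
    try:
        position = items.index(path) + 1   # 1-based, to match the Lua template
    except ValueError:
        position = len(items) + 1          # unknown page: one past the end
    # Convert the 1-based position back to 0-based list indices.
    previous = items[position - 2] if position >= 2 else None
    following = items[position] if position < len(items) else None
    return position, len(items), previous, following


tag_index = {"100DaysToOffload": ["/blog/a/", "/blog/b/", "/blog/c/"]}

middle = browse_links(tag_index, "100DaysToOffload", "/blog/b/")
first = browse_links(tag_index, "100DaysToOffload", "/blog/a/")
wip = browse_links(tag_index, "100DaysToOffload", "/blog/wip/")

print(middle)
print(first)
print(wip)
```

Counting a missing page as one past the end means a work-in-progress article still gets a sensible Previous link (to the latest published article) and no Next link.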

I then used a CSS grid to put the Previous and Next links on the sides, where they should be.

Final result

I based the design itself on Tracy Durnell's blog, which similarly includes links for the previous and next page at the bottom of the article; I just extended it a bit to also include tag names. Her blog also includes a feature for jumping to a random page, which I absolutely want to include in mine too, but that will have to wait until I've migrated to self-hosting my blog.

To match her design, I ended up complicating all the code so far to also extract page titles (which you can see in the final commit), to achieve the following:

The new "Browse more articles?" section of a recent article (With a light vignette, so it's clearer it's just an image 😅)

Was it worth it? That's for you to decide! I'll have to go back and fix the tags of some of the earlier pages; but other than that, I'm quite happy with the way it's working now.


This is my 20th post of #100DaysToOffload—but, guess what, I don't even need to announce that anymore! I can just use the number from the Browse section to figure out which number I'm at 😁

Articles tagged website (5/5) →|

Articles tagged technology (12/12) →|

Articles tagged 100DaysToOffload (20/20) →|

Articles on this blog (27/27) →|
