tlakh/SingleFile.org
2024-03-20 21:24:46 +01:00

SingleFile

Prologue

I am using the Zettelkasten methodology for personal knowledge management, with Org-roam to implement it.

While the Internet does not forget things in general, it might be difficult to find things again. Maybe someone published some information in a GitHub gist or has their whole blog there. Suddenly GitHub falls out of favour because it got bought by an evilcorp and information disappears.

For those reasons I store a direct link to the information, a link to the Wayback Machine, and a local copy. My Zettelkasten and all references, including the local copies, are stored in git.

SingleFile

SingleFile is a Firefox[fn::Other browsers are supported as well, I just think Firefox is the least-evil one.] extension to save a website, including images and CSS, into a single HTML file. It's perfect for personal archival purposes. When run as a plugin it will save the page as it is currently rendered. For example, dismissed cookie banners will not be part of the saved page.

The downside is that getting the backup into the Zettelkasten note is a semi-manual process.

SingleFile-CLI

There is also a CLI version; like all modern stuff, it runs in Docker. Of course I have neither Docker on OpenBSD nor the alternative, Podman.

So I set up a Fedora VM, installed Podman, and now run singlefile-cli via ssh. On the Fedora VM I use this wrapper:

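  #! /bin/sh

  set -e

  # Fetch the page and capture the HTML from stdout.
  content=$(podman run --privileged -u 0:0 singlefile "$@")

  now=$(date --iso-8601=seconds -u)
  # Extract the title, keep its first non-empty line with whitespace
  # squeezed, and replace everything that is not alphanumeric with "_".
  title=$(printf '%s\n' "${content}" | /home/florian/.cargo/bin/htmlq --text title | \
              awk 'NF {$1=$1; print; exit}' | \
              tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128)

  fn="/tmp/${title}_${now}.html"

  # Quote the expansion so the HTML is written unmodified (an unquoted
  # expansion is subject to word splitting and globbing).
  printf '%s\n' "${content}" > "${fn}"
  echo "${fn}"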

It fetches a copy of the website, uses htmlq to extract the title, removes special characters and saves the page in /tmp. It then spits out the file-name of the backup.
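
To make the title clean-up concrete, here is a minimal sketch of those steps in isolation. The function name sanitize_title and the sample title are made up for illustration; the sketch assumes a POSIX shell with awk, tr and cut.

```shell
#!/bin/sh
# sanitize_title (hypothetical helper): turn a page title into a string
# that is safe to use in a file name.
sanitize_title() {
    printf '%s\n' "$1" |
        # squeeze runs of whitespace, keep only the first non-empty line
        awk 'NF {$1=$1; print; exit}' |
        # drop the newline, map every non-alphanumeric character to "_",
        # and cap the length at 128 characters
        tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128
}

sanitize_title '  SingleFile: save a page!  '
# prints: SingleFile__save_a_page_
```

Note that punctuation and spaces are each replaced individually, so ": " becomes two underscores.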

On my laptop I have this script:

  #! /bin/sh

  set -e

  fn=$(ssh fedora bin/singlefile "$@")
  bn=$(basename "${fn}")
  scp "fedora:${fn}" .
  open "${bn}"

This has the Fedora VM fetch a backup of the website, copies the file over into the current working directory, and finally opens it in the browser for inspection.
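
One caveat when handing "$@" to ssh: ssh joins its arguments with spaces and the remote shell parses the resulting string again, so a URL containing characters such as & or ? can get mangled on the way. A minimal sketch of one way to guard against that, assuming a POSIX shell on both ends; quote_arg and remote_cmd are hypothetical helper names, not part of the script above:

```shell
#!/bin/sh
# quote_arg (hypothetical helper): wrap one argument in single quotes,
# escaping embedded single quotes, so a remote shell reads it verbatim.
quote_arg() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# remote_cmd (hypothetical helper): build a command line in which every
# argument after the command name is quoted for the remote shell.
remote_cmd() {
    cmd=$1
    shift
    for arg in "$@"; do
        cmd="${cmd} $(quote_arg "${arg}")"
    done
    printf '%s\n' "${cmd}"
}

remote_cmd bin/singlefile 'https://example.com/?a=1&b=2'
# prints: bin/singlefile 'https://example.com/?a=1&b=2'
```

With these helpers the ssh call could become fn=$(ssh fedora "$(remote_cmd bin/singlefile "$@")").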

org-cliplink

We still do not have links stored in the Zettelkasten note.

For that I use org-cliplink to insert org mode links from the clipboard.

A bit of elisp code massages the link text to add "Archive: " or "Local: " in front of the title. It also rewrites the URL of the local backup to be relative, so that it works no matter where the git repo with the Zettelkasten is checked out.

  (use-package org-cliplink
    :config
    (defun custom-org-cliplink ()
      (interactive)
      (org-cliplink-insert-transformed-title
       (org-cliplink-clipboard-content)     ;take the URL from the CLIPBOARD
       (lambda (url title)
         (let* ((parsed-url (url-generic-parse-url url)) ;parse the url
                (turl
                 (cond
                  ;; if type is file make the url relative to zettelkasten
                  ((string= (url-type parsed-url) "file")
                   (url-unhex-string (replace-regexp-in-string "\\(.+\\)/zettelkasten/\\(.+\\)" "file:\\2" url)))
                  ;; otherwise keep the original url
                  (t url)))
                (ttitle
                 (cond
                  ;; if type is file, add Local: to title
                  ((string= (url-type parsed-url) "file")
                   (replace-regexp-in-string "\\(.+\\)" "Local: \\1" title))
                  ;; otherwise keep the original title
                  (t title))))
           ;; forward to the default org-cliplink transformer
           (org-cliplink-org-mode-link-transformer turl ttitle)))))
    :custom
    (org-cliplink-title-replacements
     '(("https://github.com/.+/?"
        ("\\(.*\\) · \\(?:Issue\\|Pull Request\\) #\\([0-9]+\\) · \\(.*\\) · GitHub" "\\3#\\2 \\1"))
       ("https://twitter.com/.+/status/[[:digit:]]+/?"
        (".+ on Twitter: \\(.+\\)" "\\1"))
       ("https://web.archive.org/.+/?"
        ("\\(.+\\)" "Archive: \\1"))))
    :bind (:map org-mode-map
                ("C-c C-S-L" . custom-org-cliplink)))
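
To illustrate what the file: rewrite does to a URL, here is a rough shell equivalent using sed. It is only a sketch of the substitution: the function name relativize and the path /home/user are made up, and the percent-decoding done by url-unhex-string is left out.

```shell
#!/bin/sh
# relativize (hypothetical helper): for file: URLs, keep only the part
# after ".../zettelkasten/" and prefix it with "file:"; every other URL
# is passed through unchanged.
relativize() {
    case $1 in
        file:*) printf '%s\n' "$1" |
                    sed 's|^..*/zettelkasten/\(..*\)$|file:\1|' ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

relativize 'file:///home/user/zettelkasten/refs/page.html'
# prints: file:refs/page.html
relativize 'https://example.com/zettelkasten/page.html'
# prints: https://example.com/zettelkasten/page.html
```

The resulting link is relative to the zettelkasten directory, which is why it keeps working wherever the repository is checked out.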

Epilogue

I only recently added singlefile-cli to my workflow; so far it is a big improvement. Time will tell whether the backups end up cluttered with banners and I have to go back to semi-manual mode.