#+TITLE: SingleFile
#+DATE: 2024-03-20
* Prologue
I am using the [[https://en.wikipedia.org/wiki/Zettelkasten#Use_in_personal_knowledge_management][Zettelkasten]] methodology for personal knowledge
management, with [[https://www.orgroam.com/][Org-roam]] to implement it.

While the Internet does not forget things in general, it can be
difficult to find them again. Maybe someone published some
information in a GitHub gist or hosts their whole blog there.
Suddenly GitHub falls out of favour because it got bought by an
evilcorp, and the information disappears.

For those reasons I store a direct link to the information, a link
to the [[https://web.archive.org/][Wayback Machine]] and a local copy. My Zettelkasten and all
references, including the local copies, are stored in git.

* SingleFile
[[https://github.com/gildas-lormeau/SingleFile][SingleFile]] is a Firefox[fn::Other browsers are supported as well; I
just think Firefox is the least evil one.] extension that saves a web
page, including images and CSS, into a single HTML file. It's perfect
for personal archival purposes. When run as a browser extension, it
saves the page as it is currently rendered; for example, a cookie
banner you have already dismissed will not be part of the saved page.

The downside is that getting the backup into the Zettelkasten note is
a semi-manual process.

* SingleFile-CLI
There is also a [[https://github.com/gildas-lormeau/single-file-cli][CLI]] version; like all modern stuff, it runs in
Docker. Of course I have neither Docker nor its alternative, Podman,
on OpenBSD.

So I set up a Fedora VM, installed Podman, and run single-file-cli
via ssh. On the Fedora VM I use this wrapper:
#+begin_src sh
  #! /bin/sh

  set -e

  # Run single-file-cli in the "singlefile" image and capture the page
  # it outputs.
  content=$(podman run --privileged -u 0:0 singlefile "$@")

  now=$(date --iso-8601=seconds -u)
  # Extract the <title>, trim and squeeze whitespace, keep the first
  # line, replace every non-alphanumeric character with "_" and cut
  # the result to 128 characters.
  title=$(printf '%s\n' "${content}" | /home/florian/.cargo/bin/htmlq --text title | \
      awk '{$1=$1};1' | \
      head -1 | tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128)

  fn="/tmp/${title}_${now}.html"

  # Quote the variable so the HTML is written verbatim, without word
  # splitting or glob expansion.
  printf '%s\n' "${content}" > "${fn}"
  echo "${fn}"
#+end_src

It fetches a copy of the website, uses [[https://github.com/mgdm/htmlq][htmlq]] to extract the title,
replaces special characters with underscores and saves the page in
~/tmp~. It then spits out the file name of the backup.

On my laptop I have this script:
#+begin_src sh
  #! /bin/sh

  set -e

  # ssh joins its arguments into one string that the remote shell
  # parses again, so single-quote each argument; otherwise characters
  # such as "&" in a URL's query string are interpreted remotely.
  args=
  for a in "$@"; do
      args="${args} '$(printf '%s' "$a" | sed "s/'/'\\\\''/g")'"
  done

  fn=$(ssh fedora "bin/singlefile${args}")
  bn=$(basename "${fn}")
  scp "fedora:${fn}" .
  open "${bn}"
#+end_src
This makes the Fedora VM fetch a backup of the website, copies the
backup into the current working directory and finally opens it in the
browser for inspection.

* org-cliplink
We still do not have links stored in the Zettelkasten note.

For that I use [[https://github.com/rexim/org-cliplink][org-cliplink]] to insert Org mode links from the
clipboard.

A bit of elisp code massages the link text to put "Archive: " or
"Local: " in front of the title. It also makes the URL of the local
backup relative to the Zettelkasten directory, so that it works no
matter where the git repo with the Zettelkasten is checked out.
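To make the rewrite concrete, here is a rough equivalent in plain shell. It is only a hypothetical illustration, not part of the workflow described here (the elisp below does the real work). The function names ~relativize_url~ and ~label_title~ and the example path are made up. The sketch assumes the local copies live somewhere below a directory called =zettelkasten=, and it does not reproduce the percent-decoding done by ~url-unhex-string~.

```shell
#!/bin/sh
# Hypothetical sketch of the link rewrite done by the elisp in this post.
# Assumption: local backups live below a directory named "zettelkasten".

# For file: URLs, replace everything up to and including the last
# "/zettelkasten/" with "file:"; other URLs pass through unchanged.
# Percent-decoding (url-unhex-string in the elisp) is not reproduced.
relativize_url() {
    case "$1" in
        file:*) printf '%s\n' "$1" | sed 's|^.*/zettelkasten/|file:|' ;;
        *) printf '%s\n' "$1" ;;
    esac
}

# For file: URLs, prefix the title with "Local: "; otherwise keep it.
label_title() {
    case "$1" in
        file:*) printf 'Local: %s\n' "$2" ;;
        *) printf '%s\n' "$2" ;;
    esac
}

# Hypothetical example path:
relativize_url 'file:///home/florian/zettelkasten/refs/Example_2024.html'
# prints: file:refs/Example_2024.html
label_title 'file:///home/florian/zettelkasten/refs/Example_2024.html' 'Example'
# prints: Local: Example
```

Like the greedy first group in the elisp regexp, ~^.*/zettelkasten/~ strips everything up to the last =zettelkasten= directory in the path.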

#+begin_src elisp
  (use-package org-cliplink
    :config
    (defun custom-org-cliplink ()
      (interactive)
      (org-cliplink-insert-transformed-title
       (org-cliplink-clipboard-content) ;take the URL from the CLIPBOARD
       (lambda (url title)
         (let* ((parsed-url (url-generic-parse-url url)) ;parse the url
                (turl
                 (cond
                  ;; if type is file make the url relative to zettelkasten
                  ((string= (url-type parsed-url) "file")
                   (url-unhex-string (replace-regexp-in-string "\\(.+\\)/zettelkasten/\\(.+\\)" "file:\\2" url)))
                  ;; otherwise keep the original url
                  (t url)))
                (ttitle
                 (cond
                  ;; if type is file, add Local: to title
                  ((string= (url-type parsed-url) "file")
                   (replace-regexp-in-string "\\(.+\\)" "Local: \\1" title))
                  ;; otherwise keep the original title
                  (t title))))
           ;; forward to the default org-cliplink transformer
           (org-cliplink-org-mode-link-transformer turl ttitle)))))
    :custom
    (org-cliplink-title-replacements
     '(("https://github.com/.+/?"
        ("\\(.*\\) · \\(?:Issue\\|Pull Request\\) #\\([0-9]+\\) · \\(.*\\) · GitHub" "\\3#\\2 \\1"))
       ("https://twitter.com/.+/status/[[:digit:]]+/?"
        (".+ on Twitter: \\(.+\\)" "\\1"))
       ("https://web.archive.org/.+/?"
        ("\\(.+\\)" "Archive: \\1"))))
    :bind (:map org-mode-map
                ("C-c C-S-L" . custom-org-cliplink)))
#+end_src

* Epilogue
I only recently added single-file-cli to my workflow; so far it is a
big improvement. Time will tell whether the backups get cluttered with
banners and I have to go back to semi-manual mode.