tlakh/SingleFile.org
2024-03-20 21:24:46 +01:00


#+TITLE: SingleFile
#+DATE: 2024-03-20
* Prologue
I am using the [[https://en.wikipedia.org/wiki/Zettelkasten#Use_in_personal_knowledge_management][Zettelkasten]] methodology for personal knowledge
management, with [[https://www.orgroam.com/][Org-roam]] to implement it.
While the Internet does not forget things in general, it might be
difficult to find things again. Maybe someone published some
information in a GitHub gist or has their whole blog there.
Suddenly GitHub falls out of favour because it got bought by an
evilcorp and information disappears.
For those reasons I store three things for every reference: a direct
link to the information, a link to its copy in the [[https://web.archive.org/][Wayback Machine]]
and a local copy. My Zettelkasten and all references, including the
local copies, are stored in git.
* SingleFile
[[https://github.com/gildas-lormeau/SingleFile][SingleFile]] is a Firefox[fn::Other browsers are supported as well; I
just think Firefox is the least-evil one.] extension that saves a website,
including images and CSS, into a single HTML file. It's perfect for
personal archival purposes. When run as a browser extension it saves
the page as it is currently rendered. For example, dismissed cookie
banners will not be part of the saved page.
The downside is that getting the backup into the Zettelkasten note is
a semi-manual process.
* SingleFile-CLI
There is also a [[https://github.com/gildas-lormeau/single-file-cli][CLI]] version. Like all modern stuff, it runs in
docker. Of course I have neither docker nor the alternative, podman,
on OpenBSD.
So I set up a Fedora VM, installed podman, and now run singlefile-cli
via ssh. On the Fedora VM I use this wrapper:
#+begin_src sh
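#! /bin/sh
set -e
# Capture the HTML that the container prints on stdout.
content=$(podman run --privileged -u 0:0 singlefile "$@")
now=$(date --iso-8601=seconds -u)
# Take the first non-empty line of the <title> text, squeeze whitespace,
# replace non-alphanumeric characters with "_" and cut to 128 characters.
title=$(printf '%s\n' "${content}" | /home/florian/.cargo/bin/htmlq --text title | \
	awk 'NF {$1=$1; print}' | \
	head -n 1 | tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128)
fn="/tmp/${title}_${now}.html"
# Quote the expansion so the whitespace in the HTML is written unchanged.
printf '%s\n' "${content}" > "${fn}"
echo "${fn}"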
#+end_src
It fetches a copy of the website, uses [[https://github.com/mgdm/htmlq][htmlq]] to extract the title,
replaces every non-alphanumeric character in it with an underscore and
saves the page in ~/tmp~ under a name made of the title and a UTC
timestamp. It then prints the file name of the backup.
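The title clean-up can be tried on its own, without podman or htmlq.
The following is only a minimal sketch along the lines of the wrapper's
pipeline; the function name ~sanitize_title~ is made up for
illustration and does not exist in the wrapper.
#+begin_src sh
#! /bin/sh
# Hypothetical helper, for illustration only: turn a page title into a
# string that is safe to use as part of a file name.
sanitize_title() {
	# awk drops empty lines and squeezes runs of whitespace,
	# head keeps the first remaining line,
	# tr replaces everything that is not a letter or digit with "_",
	# cut limits the result to 128 characters.
	printf '%s\n' "$1" | \
	    awk 'NF {$1=$1; print}' | \
	    head -n 1 | tr -d '\n' | tr -c '[:alnum:]' '_' | cut -c -128
}

sanitize_title "  Hello, World!  "
# prints Hello__World_
#+end_src
The comma, the space and the exclamation mark each become an
underscore, which is why two underscores end up next to each other.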
On my laptop I have this script:
#+begin_src sh
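#! /bin/sh
set -e
# The wrapper on the VM prints the path of the backup it created.
fn=$(ssh fedora bin/singlefile "$@")
bn=$(basename "${fn}")
# Copy the backup into the current directory and open it.
scp "fedora:${fn}" .
open "${bn}"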
#+end_src
This makes the Fedora VM fetch a backup of the website, copies the
backup into the current working directory and finally opens it in the
browser for inspection.
* org-cliplink
We still do not have links stored in the Zettelkasten note.
For that I use [[https://github.com/rexim/org-cliplink][org-cliplink]] to insert Org mode links from the
clipboard.
A bit of elisp code massages the link text to add "Archive: " or
"Local: " in front of the title. It also rewrites the URL of the local
backup to make it relative, so that it works correctly no matter where
the git repo with the Zettelkasten is checked out.
#+begin_src elisp
(use-package org-cliplink
  :config
  (defun custom-org-cliplink ()
    (interactive)
    (org-cliplink-insert-transformed-title
     (org-cliplink-clipboard-content)     ;take the URL from the CLIPBOARD
     (lambda (url title)
       (let* ((parsed-url (url-generic-parse-url url)) ;parse the url
              (turl
               (cond
                ;; if type is file make the url relative to zettelkasten
                ((string= (url-type parsed-url) "file")
                 (url-unhex-string (replace-regexp-in-string "\\(.+\\)\/zettelkasten\/\\(.+\\)" "file:\\2" url)))
                ;; otherwise keep the original url
                (t url)))
              (ttitle
               (cond
                ;; if type is file, add Local: to title
                ((string= (url-type parsed-url) "file")
                 (replace-regexp-in-string "\\(.+\\)" "Local: \\1" title))
                ;; otherwise keep the original title
                (t title))))
         ;; forward to the default org-cliplink transformer
         (org-cliplink-org-mode-link-transformer turl ttitle)))))
  :custom
  (org-cliplink-title-replacements
   '(("https://github.com/.+/?"
      ("\\(.*\\) · \\(?:Issue\\|Pull Request\\) #\\([0-9]+\\) · \\(.*\\) · GitHub" "\\3#\\2 \\1"))
     ("https://twitter.com/.+/status/[[:digit:]]+/?"
      (".+ on Twitter: \\(.+\\)" "\\1"))
     ("https://web.archive.org/.+/?"
      ("\\(.+\\)" "Archive: \\1"))))
  :bind (:map org-mode-map
              ("C-c C-S-L" . custom-org-cliplink)))
#+end_src
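To see what the regular expression for local files does, roughly the
same rewrite can be tried on the command line with sed. This is only a
sketch: the function name ~relative_link~ and the path under
~/home/user~ are made up for illustration, and unlike the elisp code it
does not decode percent-escapes such as ~%20~.
#+begin_src sh
#! /bin/sh
# Hypothetical helper, for illustration only: everything up to and
# including the last "/zettelkasten/" is replaced by "file:".
relative_link() {
	printf '%s\n' "$1" | sed -E 's|.+/zettelkasten/(.+)|file:\1|'
}

relative_link "file:///home/user/zettelkasten/backup/page.html"
# prints file:backup/page.html
#+end_src
A URL without ~/zettelkasten/~ in it does not match and is printed
unchanged.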
* Epilogue
I only recently added singlefile-cli to my workflow; so far it is a
big improvement. Time will tell whether the backups end up cluttered
with banners and I have to go back to semi-manual mode.